Roadmap
Items below summarise the roadmap of Data Caterer. As each task gets completed, it will be documented and linked.
Feature | Description | Sub Tasks |
---|---|---|
Data source support | Batch or real-time data sources that can be added to Data Caterer, prioritising the data sources that users request | - AWS, GCP and Azure related data services - RabbitMQ - ActiveMQ - MongoDB - Elasticsearch - Snowflake - Databricks - Pulsar |
Metadata discovery | Allow for schema and data profiling from external metadata sources | - JMS - Read from samples - Amundsen - DataHub - Solace Event Portal - Airflow - DBT - Manually insert `CREATE TABLE` statement from UI |
Developer API | Scala/Java interface for developers/testers to create data generation and validation tasks | - Python - JavaScript |
Report generation | Generate a report that summarises the data generation or validation results | |
UI portal | Allow users to access a UI to input data generation or validation tasks and to view report results | - Metadata stored in database - Preview of generated data - Additional dialog to confirm delete and execute plan |
Integration with data validation tools | Derive data validation rules from existing data validation tools | - DBT constraints - SodaCL - Monte Carlo |
Data validation rule suggestions | Based on metadata, generate data validation rules appropriate for the dataset | |
Wait conditions before data validation | Define conditions that must be met before data validations start | |
Validation types | Ability to define simple/complex data validations | - Ordering (transactions are ordered by date) - Data profile (how closely the generated data profile matches the expected data profile) |
Data generation record count | Generate scenarios with one-to-many and many-to-many relationships in record counts, with the ability to cover all edge cases or scenarios | - Ability to override edge cases |
Alerting | Ability to define alerts, based on certain conditions, that trigger when tasks have completed | |
Metadata enhancements | Add to existing metadata based on data profiling or inference | - PII detection (can integrate with Presidio) - Relationship detection across data sources - SQL generation - Ordering information |
Data cleanup | Ability to clean up generated data | - Clean up data from real-time sources (e.g. DELETE HTTP endpoint, delete events in JMS) |
Trial version | Trial version of the full app for users to test out all the features | |
Code generation | Generate code for data generation and validation based on metadata or existing classes | - Code generation - Schema generation from Scala/Java class |
Real-time response data validations | Ability to define data validations based on the response from real-time data sources (e.g. HTTP response) | |