

Below is a list of guides you can follow to set up data generation for your use case.

For any of the paid tier guides, you can use the trial version of the app to try it out. Details on how to get the trial can be found here.


Data Sources

YAML Files

Base Concept

The execution of the data generator is based on the concept of plans and tasks. A plan represents the set of tasks that need to be executed, along with other information that spans tasks, such as foreign keys between data sources.
A task represents the component(s) of a data source and its associated metadata, which describe what the data should look like and how many steps (sub data sources) there are (e.g. tables in a database, topics in Kafka). A task can define one or more steps.


Foreign Keys

Define foreign keys across data sources in your plan so that generated values match between them.
Link to associated task 1
Link to associated task 2


Data Source Type | Data Source | Sample Task | Notes
Database | Postgres | Sample |
Database | MySQL | Sample |
Database | Cassandra | Sample |
File | CSV | Sample |
File | JSON | Sample | Contains nested schemas and use of SQL for generated values
File | Parquet | Sample | Partition by year column
Kafka | Kafka | Sample | Specific base schema to be used, define headers, key, value, etc.
JMS | Solace | Sample | JSON formatted message
HTTP | PUT | Sample | JSON formatted PUT body


Basic configuration


To see how it runs against different data sources, run it with docker-compose and set DATA_SOURCE, for example:

./gradlew build
cd docker
DATA_SOURCE=postgres docker-compose up -d datacaterer

DATA_SOURCE can be set to one of the following:

  • postgres
  • mysql
  • cassandra
  • solace
  • kafka
  • http
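
As a minimal sketch, the DATA_SOURCE value could be checked against the list above before starting the container, so that a typo fails fast instead of being passed to docker-compose. The function names `is_supported_source` and `run_data_source` are hypothetical helpers introduced here for illustration; they are not part of the project.

```shell
# Hypothetical helpers (assumption: not provided by the project itself).
# Values taken from the list of supported DATA_SOURCE settings above.
SUPPORTED_SOURCES="postgres mysql cassandra solace kafka http"

# Return 0 if the first argument is one of the supported data sources, 1 otherwise.
is_supported_source() {
  for supported in $SUPPORTED_SOURCES; do
    if [ "$1" = "$supported" ]; then
      return 0
    fi
  done
  return 1
}

# Start the datacaterer service for the given data source.
# Assumes it is called from the docker directory, after ./gradlew build.
run_data_source() {
  if ! is_supported_source "$1"; then
    echo "Unsupported DATA_SOURCE: $1" >&2
    return 1
  fi
  DATA_SOURCE="$1" docker-compose up -d datacaterer
}
```

With these functions loaded in the current shell, `run_data_source kafka` would run the same command as shown earlier with DATA_SOURCE set to kafka, while an unlisted value prints an error and returns a non-zero status without calling docker-compose.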