
Quick Start

Get started with Data Caterer in minutes. Choose your preferred approach:

  • Java/Scala API (Recommended)
    Full programmatic control for complex scenarios and test integration.

  • YAML
    Configuration-based approach. Great for CI/CD pipelines.

  • UI
    Point-and-click interface. No coding required.


Java/Scala API

The recommended approach for full control over data generation. Write your data generation logic in Scala or Java.

Run

git clone git@github.com:data-catering/data-caterer.git
cd data-caterer/example
./run.sh

Press Enter to run the default example, or enter a class name (e.g., CsvPlan).

What Happens

  1. Builds your Scala/Java code into a JAR
  2. Runs it via Docker with the Data Caterer engine
  3. Generates data and reports to docker/sample/

Example Code

import io.github.datacatering.datacaterer.api.PlanRun
import io.github.datacatering.datacaterer.api.model.DoubleType

class CsvPlan extends PlanRun {
  // Write 100 account records to a headered CSV under /opt/app/data/accounts
  val accountTask = csv("accounts", "/opt/app/data/accounts", Map("header" -> "true"))
    .fields(
      field.name("account_id").regex("ACC[0-9]{8}").unique(true),
      field.name("name").expression("#{Name.name}"),
      field.name("balance").`type`(DoubleType).min(10).max(1000),
      field.name("status").oneOf("open", "closed", "pending")
    )
    .count(count.records(100))

  execute(accountTask)
}
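Validations can be attached to the same task so the data is checked right after it is generated (this is what ValidationPlanRun below demonstrates). A minimal sketch, assuming the validation builder and the enableValidation configuration flag from the Scala DSL; the class name is illustrative and method names may differ slightly between versions:

import io.github.datacatering.datacaterer.api.PlanRun
import io.github.datacatering.datacaterer.api.model.DoubleType

class CsvGenerateAndValidatePlan extends PlanRun {
  val accountTask = csv("accounts", "/opt/app/data/accounts", Map("header" -> "true"))
    .fields(
      field.name("account_id").regex("ACC[0-9]{8}").unique(true),
      field.name("balance").`type`(DoubleType).min(10).max(1000)
    )
    .count(count.records(100))
    .validations(
      validation.field("balance").greaterThan(0),   // every balance must be positive
      validation.unique("account_id")               // no duplicate account ids
    )

  // Turn on the validation phase that runs after generation
  val config = configuration.enableValidation(true)

  execute(config, accountTask)
}

Run it the same way as any other class: add it under src/main/scala/io/github/datacatering/plan/ and pass its name to ./run.sh.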

More Examples

Class                  Description
DocumentationPlanRun   JSON + CSV with foreign keys (default)
CsvPlan                CSV files with relationships
PostgresPlanRun        PostgreSQL tables
KafkaPlanRun           Kafka messages
ValidationPlanRun      Generate and validate data

Run any example: ./run.sh <ClassName>

All example classes are in src/main/scala/io/github/datacatering/plan/.
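The DocumentationPlanRun and CsvPlan classes relate files through foreign keys so that generated values line up across data sources. A minimal sketch of that pattern, assuming the plan.addForeignKeyRelationship builder from the Scala DSL; class, path, and field names here are illustrative:

import io.github.datacatering.datacaterer.api.PlanRun
import io.github.datacatering.datacaterer.api.model.DoubleType

class AccountsAndTransactionsPlan extends PlanRun {
  val accountTask = csv("accounts", "/opt/app/data/accounts", Map("header" -> "true"))
    .fields(
      field.name("account_id").regex("ACC[0-9]{8}").unique(true),
      field.name("name").expression("#{Name.name}")
    )
    .count(count.records(100))

  val transactionTask = csv("transactions", "/opt/app/data/transactions", Map("header" -> "true"))
    .fields(
      field.name("account_id"),
      field.name("amount").`type`(DoubleType).min(10).max(1000)
    )
    .count(count.records(1000))

  // account_id values in transactions are drawn from the generated accounts
  val myPlan = plan.addForeignKeyRelationship(
    accountTask, List("account_id"),
    List(transactionTask -> List("account_id"))
  )

  execute(myPlan, configuration, accountTask, transactionTask)
}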


YAML

Define data generation using YAML configuration files.

Run

git clone git@github.com:data-catering/data-caterer.git
cd data-caterer/example
./run.sh csv.yaml

What Happens

  1. Builds the example JAR
  2. Runs the YAML plan via Docker
  3. Generates data and reports to docker/data/custom/

Example YAML

Plan file (docker/data/custom/plan/csv.yaml):

name: "csv_example_plan"
description: "Create transaction data in CSV file"
tasks:
  - name: "csv_transaction_file"
    dataSourceName: "csv"
    enabled: true

Task file (in docker/data/custom/task/file/csv/):

name: "csv_transaction_file"
steps:
  - name: "transactions"
    type: "csv"
    options:
      path: "/opt/app/data/transactions"
      header: "true"
    count:
      records: 1000
    fields:
      - name: "account_id"
        options:
          regex: "ACC[0-9]{8}"
      - name: "amount"
        type: "double"
        options:
          min: 10
          max: 1000

More Examples

Plan File          Description
csv.yaml           CSV files
parquet.yaml       Parquet files
postgres.yaml      PostgreSQL tables
kafka.yaml         Kafka messages
foreign-key.yaml   Data with relationships
validation.yaml    Generate and validate

Run any example: ./run.sh <filename>.yaml

All plan files are in docker/data/custom/plan/. Task definitions are in docker/data/custom/task/.


UI

A web interface for creating and running data generation plans.

Run

docker run -d -p 9898:9898 -e DEPLOY_MODE=standalone --name datacaterer datacatering/data-caterer:0.18.0

Open http://localhost:9898 in your browser.

What You Can Do

  • Create connections to databases, files, Kafka, and more
  • Define data schemas with field types and constraints
  • Generate test data with a single click
  • View results and reports in the browser

Try the UI demo


View Results

After running, check the generated report:

  • Java/Scala examples: docker/sample/report/index.html
  • YAML examples: docker/data/custom/report/index.html

Sample report preview


Next Steps