Migrating to Unified YAML Format
Data Caterer v1.0 introduces a new Unified YAML Format that simplifies configuration by combining plans, tasks, and connections into a single file.
Why Migrate?
The new unified format provides:
- ✅ Single-file configuration - No need for separate plan and task files
- ✅ Simpler structure - More intuitive hierarchy and naming
- ✅ Better validation - JSON schema support for IDE autocomplete
- ✅ Easier maintenance - All configuration in one place
- ✅ Improved readability - Cleaner, more concise syntax
Migration Tool
We provide an automated migration script (migrate_yaml.py) that converts your existing YAML plans to the new format.
Requirements
- Python 3.6 or higher
- PyYAML library (install with pip install pyyaml)
Basic Usage
# Migrate a single plan file
python3 migrate_yaml.py old_plan.yaml new_plan.yaml
# Auto-generate output filename
python3 migrate_yaml.py old_plan.yaml
# Creates: old_plan_unified.yaml
# Migrate with explicit task folder
python3 migrate_yaml.py old_plan.yaml --task-folder ./tasks
# Dry run (preview without writing)
python3 migrate_yaml.py old_plan.yaml --dry-run
Batch Migration
Migrate entire directories:
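One way to do this with only the single-file invocation from Basic Usage is a small wrapper. This is a sketch, not the script's own batch mode: `migration_commands` and `migrate_directory` are hypothetical helper names, and the code assumes the directory holds only legacy plan files and that migrate_yaml.py is in the working directory.

```python
import subprocess
import sys
from pathlib import Path


def migration_commands(plan_dir, script="migrate_yaml.py"):
    """Build one single-file migration command per legacy plan in plan_dir."""
    commands = []
    for plan in sorted(Path(plan_dir).glob("*.yaml")):
        # Skip files that look like output from an earlier run
        # (the script names its output <input>_unified.yaml by default).
        if plan.stem.endswith("_unified"):
            continue
        commands.append([sys.executable, script, str(plan)])
    return commands


def migrate_directory(plan_dir, script="migrate_yaml.py"):
    """Run each command in turn and stop at the first failure."""
    for command in migration_commands(plan_dir, script):
        subprocess.run(command, check=True)
```

Building the command list separately from running it makes it easy to print the commands first and review them before anything is written.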
Format Comparison
Old Format (Legacy)
Plan file: plan.yaml
name: "my_plan"
description: "Generate test data"
tasks:
  - name: "json_task"
    dataSourceName: "json_output"
    enabled: true
flagsConfig:
  enableGenerateData: true
  enableCount: true
Task file: task/json_output.yaml
name: "json_output"
type: "json"
path: "/tmp/data/output"
steps:
  - name: "users"
    count:
      records: 100
    schema:
      fields:
        - name: "id"
          generator:
            type: "regex"
            options:
              regex: "USR[0-9]{6}"
New Format (Unified)
Single file: plan_unified.yaml
version: "1.0"
name: "my_plan"
description: "Generate test data"
configuration:
  flags:
    enableGenerateData: true
    enableCount: true
dataSources:
  - name: "json_output"
    connection:
      type: "json"
      options:
        path: "/tmp/data/output"
    steps:
      - name: "users"
        count:
          records: 100
        fields:
          - name: "id"
            options:
              regex: "USR[0-9]{6}"
Key Changes
1. Version Field
All unified YAML files must start with:
version: "1.0"
2. Tasks → Data Sources
Old: tasks array with references to separate task files
New: dataSources array with inline connection details
# Old
tasks:
  - name: "my_task"
    dataSourceName: "postgres_db"
# New
dataSources:
  - name: "postgres_db"
    connection:
      type: "postgres"
      options:
        url: "jdbc:postgresql://localhost:5432/mydb"
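The same change can be sketched on parsed YAML (plain dicts). `to_data_source` is a hypothetical helper, and it assumes that every top-level key of a legacy task file other than name, type and steps is a connection option, as path is in the Format Comparison example.

```python
def to_data_source(task_file):
    """Turn a parsed legacy task file (a dict) into a unified dataSources entry."""
    # Keys that describe the task itself rather than the connection.
    reserved = {"name", "type", "steps"}
    # Assumption: everything else (e.g. path, url) is a connection option.
    options = {k: v for k, v in task_file.items() if k not in reserved}
    return {
        "name": task_file["name"],
        "connection": {"type": task_file["type"], "options": options},
        # Steps are copied unchanged here; field renames are a separate change.
        "steps": task_file.get("steps", []),
    }
```

Applied to the legacy task/json_output.yaml content, this yields a json connection whose options hold the path.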
3. Schema → Fields
Old: schema.fields with generator wrapper
New: Direct fields with options
# Old
schema:
  fields:
    - name: "email"
      generator:
        options:
          expression: "#{Internet.emailAddress}"
# New
fields:
  - name: "email"
    options:
      expression: "#{Internet.emailAddress}"
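As a rough illustration of this rename on parsed YAML, the sketch below flattens one step. `convert_field` and `convert_step` are hypothetical names. The code drops generator.type, which is what the Format Comparison example appears to do, and it does not handle nested fields.

```python
def convert_field(old_field):
    """Flatten a legacy field: generator.options becomes options."""
    new_field = {k: v for k, v in old_field.items() if k != "generator"}
    options = (old_field.get("generator") or {}).get("options")
    if options:
        new_field["options"] = dict(options)
    return new_field


def convert_step(old_step):
    """Replace schema.fields with a fields list directly on the step."""
    new_step = {k: v for k, v in old_step.items() if k != "schema"}
    old_fields = (old_step.get("schema") or {}).get("fields", [])
    new_step["fields"] = [convert_field(f) for f in old_fields]
    return new_step
```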
4. Configuration Consolidation
All configuration now lives under the configuration key:
configuration:
  flags:
    enableGenerateData: true
    enableFastGeneration: true
  validation:
    numSampleErrorRecords: 5
  saveMode: "append"
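A minimal sketch of this consolidation on a parsed legacy plan follows. `consolidate_configuration` is a hypothetical helper. The flagsConfig to flags mapping comes from the Format Comparison example; validationConfig to validation is an assumption based on the block above, and sinkOptions is left out because its target keys are not shown here.

```python
def consolidate_configuration(plan):
    """Collect legacy top-level config blocks under one configuration dict."""
    key_map = {
        "flagsConfig": "flags",            # shown in the format comparison
        "validationConfig": "validation",  # assumption, inferred from the example
    }
    configuration = {}
    for old_key, new_key in key_map.items():
        if plan.get(old_key):
            configuration[new_key] = dict(plan[old_key])
    return configuration
```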
Manual Migration Steps
If you prefer to migrate manually:
- Create new file with version: "1.0"
- Copy plan metadata: name, description
- Move configuration: Combine flagsConfig, sinkOptions, validationConfig under configuration
- Merge tasks: For each task:
  - Load the corresponding task file from task/ folder
  - Create a dataSource entry
  - Copy connection info to connection block
  - Copy steps with field definitions
- Rename keys:
  - schema.fields → fields
  - generator.options → options
- Remove enabled flags (disabled items should be removed)
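The first two steps and the last one can be sketched on parsed YAML as below. The function names are hypothetical, and treating a missing enabled key as enabled is an assumption about the legacy default.

```python
def start_unified_plan(plan):
    """Create the unified skeleton: version marker plus copied plan metadata."""
    unified = {"version": "1.0", "name": plan["name"]}
    if "description" in plan:
        unified["description"] = plan["description"]
    return unified


def enabled_data_source_names(plan):
    """Names of the task files to merge, skipping tasks with enabled: false."""
    return [
        task["dataSourceName"]
        for task in plan.get("tasks", [])
        # Assumption: a task without an enabled key counts as enabled.
        if task.get("enabled", True)
    ]


def strip_enabled(item):
    """Return a copy of a dict without its enabled flag."""
    return {k: v for k, v in item.items() if k != "enabled"}
```

Filtering disabled tasks before loading their task files avoids merging data sources that would only be deleted afterwards.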
Testing Your Migration
After migrating, verify the unified YAML works:
# Set the plan file path
export PLAN_FILE_PATH=/path/to/unified_plan.yaml
# Run Data Caterer
./gradlew :app:run
Or use the manual test runner:
Common Issues
Issue: Missing connection details
Solution: The migration script auto-detects task folders. If your tasks are in a non-standard location, use --task-folder:
python3 migrate_yaml.py old_plan.yaml --task-folder ./tasks
Issue: Unsupported field options
Solution: Some advanced options may need manual adjustment. Check the unified YAML schema for supported options.
Issue: Validation references
Solution: Update validation dataSourceName to match the new data source names in the unified format.
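Several of these issues are structural, so a quick check on the parsed file can catch them before a run. The sketch below is not a replacement for the unified YAML schema: `find_problems` is a hypothetical helper that only looks for the keys described in this guide.

```python
def find_problems(unified):
    """Return a list of human-readable problems found in a parsed unified plan."""
    problems = []
    if unified.get("version") != "1.0":
        problems.append('version must be "1.0"')
    for source in unified.get("dataSources", []):
        name = source.get("name", "<unnamed>")
        # A data source without a connection type usually means the
        # legacy task file was not found during migration.
        if not (source.get("connection") or {}).get("type"):
            problems.append(f"{name}: missing connection.type")
        for step in source.get("steps", []):
            step_name = step.get("name", "<unnamed>")
            if "schema" in step:
                problems.append(f"{name}/{step_name}: legacy schema key")
            for field in step.get("fields", []):
                if "generator" in field:
                    problems.append(f"{name}/{step_name}: legacy generator key")
    return problems
```

An empty list only means none of these particular leftovers were found; the schema remains the reference for what is valid.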
Getting Help
- Documentation: See Configuration Guide
- Examples: Check misc/schema/examples/ for unified YAML examples
- Issues: Report migration problems at GitHub Issues
Backward Compatibility
The old format is still supported in Data Caterer v1.0, but we recommend migrating to the unified format for:
- Better IDE support
- Simplified configuration
- A future-proof setup
The legacy format will be deprecated in a future release.