0.17.2
Deployed: 30-10-2025
Latest features and fixes for Data Caterer include Custom Transformations for post-generation data processing, significant architectural improvements to the sink layer, and enhanced UI capabilities for sample data generation with relationship awareness.
Custom Transformations
- Post-Generation Processing: Execute custom transformations after data generation and file writing, enabling "last mile" data processing for format conversions, business logic, and custom formatting
- Transformation Modes:
  - Per-Record: Transform each line/record independently (memory efficient, suitable for line-by-line modifications)
  - Whole-File: Transform entire files as single units (ideal for JSON array wrapping, header/footer addition, format conversions)
- API Integration: Extended TaskBuilder and ConnectionBuilder with .transformationPerRecord() and .transformationWholeFile() methods, plus .transformationOptions() for custom configuration
- Configuration: New TransformationConfig model supporting custom class/method specification, transformation mode selection, output path customization, and runtime enable/disable control
- Use Cases: JSON array wrapping, CSV header addition, legacy format conversion (e.g., fixed-width), field encryption, and custom business logic application
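The two modes differ mainly in what a transformer sees. The sketch below assumes a per-record transformer maps one line of generated text to one new line, and a whole-file transformer maps the complete file content to new content. The class and method names (TransformerSketches, upperCaseRecord, wrapJsonLinesAsArray, addCsvHeader) are hypothetical and are not the Data Caterer transformer interface; see docs/generator/transformation.md for the actual signatures.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical transformer logic; the real Data Caterer class/method signatures may differ.
final class TransformerSketches {

    // Per-record mode (assumed String-in, String-out for each line of output).
    static String upperCaseRecord(String record) {
        return record.toUpperCase(Locale.ROOT);
    }

    // Whole-file mode: turns newline-delimited JSON objects into one JSON array.
    // Assumes each non-blank line already holds a complete JSON object.
    static String wrapJsonLinesAsArray(String fileContent) {
        List<String> records = fileContent.lines()
                .map(String::strip)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
        return "[" + String.join(",", records) + "]";
    }

    // Whole-file mode: prepends a CSV header line to the generated content.
    static String addCsvHeader(String fileContent, String header) {
        return header + "\n" + fileContent;
    }
}
```

For example, wrapJsonLinesAsArray("{\"id\":1}\n{\"id\":2}\n") returns [{"id":1},{"id":2}]. JSON array wrapping needs the whole-file mode because the brackets and the commas between records depend on knowing where the first and last records are, which a per-record transformer cannot see.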
Architecture & Code Quality
- Sink Layer Refactoring: Decomposed the monolithic SinkFactory into specialized components for improved maintainability and testability:
  - BatchSinkWriter: Batch data writes with format support
  - RealTimeSinkWriter: Streaming writes with rate limiting
  - FileConsolidator: Consolidation of Spark part files into single files
  - TransformationApplicator: Post-write transformation orchestration
- UI Sample Generation: Enhanced sample data generation with new modular components:
  - RelationshipAwareSampleGenerator: Foreign key and relationship handling in samples
  - SampleSizeCalculator: Intelligent sample size determination
  - SampleDataConverter: Data format conversion for UI preview
  - StepParser: Improved YAML/plan parsing for sample requests
- Data Models: New TransformationModels.scala with transformation configuration types and serialization support
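Spark writes each output as a directory of part files (alongside marker files such as _SUCCESS), which is what a consolidation step such as FileConsolidator has to merge. The following is a minimal sketch of that idea, not the Data Caterer implementation: it assumes plain-text part files whose names start with part-, merges them in name order, and can optionally drop the repeated CSV header from every part after the first. The class PartFileConsolidatorSketch and its parameters are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical consolidation of Spark-style part files into one text file.
final class PartFileConsolidatorSketch {

    static void consolidate(Path partDir, Path target, boolean skipRepeatedHeader) throws IOException {
        List<Path> parts;
        try (Stream<Path> files = Files.list(partDir)) {
            // Keep only data files (part-00000-..., part-00001-...); this skips _SUCCESS
            // and the hidden ".part-...crc" checksum files. Sort so parts stay in order.
            parts = files.filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            boolean firstPart = true;
            for (Path part : parts) {
                try (Stream<String> lines = Files.lines(part, StandardCharsets.UTF_8)) {
                    // Assumes every part starts with the same header line when skipping.
                    Stream<String> body = (skipRepeatedHeader && !firstPart) ? lines.skip(1) : lines;
                    for (String line : (Iterable<String>) body::iterator) {
                        out.write(line);
                        out.write('\n');
                    }
                }
                firstPart = false;
            }
        }
    }
}
```

Lines are streamed from each part rather than read whole, so merging many large parts does not require holding them in memory.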
API & UI Integration
- REST API Transformation Support: Sample endpoints automatically apply transformations when configured, enabling preview of transformed data:
  - GET /sample/plans/{planName}, /sample/tasks/{taskName}, /sample/steps/{stepName}
  - Transformations execute during sample generation for accurate preview
- UI Enhancements: Updated web interface components to support transformation configuration and preview of transformed sample data
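A client can request a transformed preview by calling one of the sample endpoints. The sketch below only builds java.net.http requests for those paths; the host and port (localhost:9898) are an assumption about a local deployment, and SampleEndpointsSketch, Kind, and the helper names are hypothetical. The response format is not described in these notes, so the sketch does not parse it.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpRequest;
import java.util.Locale;

// Hypothetical helper for the GET /sample/... endpoints.
final class SampleEndpointsSketch {

    // Assumption: host and port of a local Data Caterer instance; change for your deployment.
    static final String AUTHORITY = "localhost:9898";

    // Maps to the path segments /sample/plans, /sample/tasks and /sample/steps.
    enum Kind { PLANS, TASKS, STEPS }

    static URI sampleUri(Kind kind, String name) throws URISyntaxException {
        String path = "/sample/" + kind.name().toLowerCase(Locale.ROOT) + "/" + name;
        // The multi-argument URI constructor percent-encodes illegal characters such as spaces.
        return new URI("http", AUTHORITY, path, null, null);
    }

    static HttpRequest sampleRequest(Kind kind, String name) throws URISyntaxException {
        return HttpRequest.newBuilder(sampleUri(kind, name)).GET().build();
    }
}
```

To execute a request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). When a transformation is configured for the plan, task, or step, the returned sample is expected to reflect the transformed output.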
Deployment & Integration
- Custom Transformer Deployment: Multiple deployment options, including Docker volume mounts (/opt/app/custom), custom Docker images, Kubernetes ConfigMaps/PersistentVolumes, and the local classpath
- Example Transformers: Comprehensive examples in the repository:
  - UpperCasePerRecordTransformer.scala - Per-record text transformation
  - JsonArrayWrapperTransformer.scala - Whole-file JSON array wrapping
  - CsvToFixedWidthTransformer.java - Format conversion example
  - TransformationExamplePlan.scala - Complete usage patterns
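CsvToFixedWidthTransformer.java in the repository is the reference for format conversion; its code is not reproduced in these notes. As an illustration of the idea only, the sketch below converts one CSV line into a fixed-width record by padding or truncating each field to an assumed column width. It assumes simple CSV without quoted commas, and the class name and widths are hypothetical.

```java
// Hypothetical per-record CSV-to-fixed-width conversion; not the repository's implementation.
final class CsvToFixedWidthSketch {

    // Pads each field with trailing spaces (or truncates it) to the matching column width.
    // Assumes plain CSV: fields contain no quoted commas or embedded newlines.
    static String toFixedWidth(String csvLine, int[] widths) {
        String[] fields = csvLine.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length != widths.length) {
            throw new IllegalArgumentException(
                    "expected " + widths.length + " fields but got " + fields.length);
        }
        StringBuilder record = new StringBuilder();
        for (int i = 0; i < widths.length; i++) {
            String field = fields[i].length() > widths[i]
                    ? fields[i].substring(0, widths[i])
                    : fields[i];
            record.append(field);
            record.append(" ".repeat(widths[i] - field.length()));
        }
        return record.toString();
    }
}
```

Because each CSV line maps to exactly one output line, this kind of conversion fits the per-record mode.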
Documentation
- Transformation Guide: New comprehensive documentation at docs/generator/transformation.md covering the API reference, implementation guides for Java/Scala/YAML, deployment strategies, and best practices
- Updated Guides: Enhanced API Documentation with transformation support and an updated Deployment Guide with transformer deployment examples
Testing & Examples
- Integration Tests: New TransformationIntegrationTest for end-to-end transformation workflows
- Unit Tests: PerRecordTransformerTest and WholeFileTransformerTest for transformer component validation
- API Tests: Updated PlanApiEndToEndTest with transformation scenarios
- Example Plans: TransformationExamplePlan.scala demonstrating both transformation modes with real-world use cases
Migration Notes
This release is fully backward compatible. Existing plans continue to work without modification. The transformation feature is opt-in and only activates when explicitly configured in task/step definitions.
Performance Characteristics
Transformations execute after data generation, and their performance depends on the mode:
- Per-Record: Memory efficient, scaling linearly with record count
- Whole-File: May require loading the full file into memory, which can be costly for large datasets
- Custom Logic: Performance depends on the complexity of the transformer implementation
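The memory difference between the modes comes from how much of the file is held at once. The sketch below contrasts the two access patterns using standard java.nio.file calls; it illustrates the trade-off and is not Data Caterer's TransformationApplicator. The class ModeAccessPatternsSketch and its method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

// Hypothetical illustration of per-record streaming vs whole-file loading.
final class ModeAccessPatternsSketch {

    // Per-record: reads, transforms and writes one line at a time, so memory use is bounded
    // by the I/O buffers and the longest line, not by the number of records.
    static long applyPerRecord(Path input, Path output, UnaryOperator<String> transform) throws IOException {
        long count = 0;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(transform.apply(line));
                out.write('\n');
                count++;
            }
        }
        return count;
    }

    // Whole-file: loads the entire content as one String before transforming it,
    // so peak memory grows with the file size.
    static void applyWholeFile(Path input, Path output, UnaryOperator<String> transform) throws IOException {
        String content = Files.readString(input, StandardCharsets.UTF_8);
        Files.writeString(output, transform.apply(content), StandardCharsets.UTF_8);
    }
}
```

For very large outputs, a per-record transformer keeps memory flat, while a whole-file transformer trades memory for the ability to see the complete file.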