0.17.2

Deployed: 30-10-2025

Data Caterer 0.17.2 adds Custom Transformations for post-generation data processing, refactors the sink layer into smaller, specialized components, and improves UI sample data generation with relationship awareness.

Custom Transformations

Post-Generation Processing: Execute custom transformations after data generation and file writing, enabling "last mile" data processing for format conversions, business logic, and custom formatting
Transformation Modes:
Per-Record: Transform each line/record independently (memory efficient, suitable for line-by-line modifications)
Whole-File: Transform entire files as single units (ideal for JSON array wrapping, header/footer addition, format conversions)
API Integration: Extended TaskBuilder and ConnectionBuilder with .transformationPerRecord() and .transformationWholeFile() methods, plus .transformationOptions() for custom configuration
Configuration: New TransformationConfig model supporting custom class/method specification, transformation mode selection, output path customization, and runtime enable/disable control
Use Cases: JSON array wrapping, CSV header addition, legacy format conversion (e.g., fixed-width), field encryption, and custom business logic application
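To make the difference between the two modes concrete, here is a minimal Java sketch. It is not Data Caterer's transformer interface: `TransformationModesSketch`, `toFixedWidth` and `wrapJsonLinesInArray` are hypothetical names, the field widths are passed directly instead of through `.transformationOptions()`, and the CSV split is naive (no quoted fields). A per-record function sees one line at a time, while a whole-file function sees the complete content.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of the two transformation modes.
// Not Data Caterer's transformer API.
final class TransformationModesSketch {

    // Per-record mode: one CSV line in, one fixed-width line out.
    // Naive split on ',' (quoted fields are not handled).
    static String toFixedWidth(String csvLine, int[] widths) {
        String[] fields = csvLine.split(",", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < widths.length; i++) {
            String value = i < fields.length ? fields[i] : "";
            // Truncate values that are too long, right-pad short ones with spaces.
            if (value.length() > widths[i]) {
                value = value.substring(0, widths[i]);
            }
            out.append(value).append(" ".repeat(widths[i] - value.length()));
        }
        return out.toString();
    }

    // Whole-file mode: needs all records at once to wrap
    // newline-delimited JSON objects into a single JSON array.
    static String wrapJsonLinesInArray(String fileContent) {
        List<String> objects = fileContent.lines()
                .map(String::strip)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
        return "[" + String.join(",", objects) + "]";
    }

    public static void main(String[] args) {
        int[] widths = {5, 8};
        // Per-record: the function is applied to each line independently.
        List.of("1,alice", "2,bob").stream()
                .map(line -> toFixedWidth(line, widths))
                .forEach(System.out::println);
        // Whole-file: the function is applied once to the full content.
        System.out.println(wrapJsonLinesInArray("{\"id\":1}\n{\"id\":2}\n"));
    }
}
```

The per-record function never needs to know about other lines, which is what makes that mode memory efficient; the JSON wrapper cannot emit its closing bracket until it has seen every record.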

Architecture & Code Quality

Sink Layer Refactoring: Decomposed the monolithic SinkFactory into specialized components for better maintainability and testability:
BatchSinkWriter: Batch data writes with format support
RealTimeSinkWriter: Streaming writes with rate limiting
FileConsolidator: Consolidation of Spark part files into a single file
TransformationApplicator: Post-write transformation orchestration
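The FileConsolidator's job can be illustrated with a short sketch that concatenates Spark's `part-*` output files into one file in name order. This is an assumed approach for illustration only, not the actual FileConsolidator code; `PartFileConsolidatorSketch` and `consolidate` are hypothetical, and the sketch assumes headerless part files (concatenating CSV parts that each carry a header would repeat the header).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of part-file consolidation; not Data Caterer's FileConsolidator.
final class PartFileConsolidatorSketch {

    // Concatenates part files from outputDir into target, in file-name order.
    // Spark's checksum files start with '.' and _SUCCESS does not start with
    // "part-", so both are skipped by the prefix filter.
    static Path consolidate(Path outputDir, Path target) throws IOException {
        List<Path> parts;
        try (Stream<Path> files = Files.list(outputDir)) {
            parts = files
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out); // stream each part's bytes into the target
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spark-output");
        Files.writeString(dir.resolve("part-00001.csv"), "3,carol\n");
        Files.writeString(dir.resolve("part-00000.csv"), "1,alice\n2,bob\n");
        Files.writeString(dir.resolve("_SUCCESS"), "");
        Path merged = consolidate(dir, Files.createTempFile("merged", ".csv"));
        System.out.print(Files.readString(merged));
    }
}
```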
UI Sample Generation: Enhanced sample data generation with new modular components:
RelationshipAwareSampleGenerator: Foreign key and relationship handling in samples
SampleSizeCalculator: Intelligent sample size determination
SampleDataConverter: Data format conversion for UI preview
StepParser: Improved YAML/plan parsing for sample requests
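One common way to keep foreign keys valid in a sample is to generate parent records first and draw each child's key from that parent set. The sketch below shows only that general technique; it is an assumption for illustration, not how RelationshipAwareSampleGenerator is implemented, and the account/transaction schema and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of relationship-aware sampling (parents before children).
final class RelationshipAwareSampleSketch {

    // Child record; accountId is a foreign key that must match a parent id.
    static final class Transaction {
        final String transactionId;
        final String accountId;
        final double amount;

        Transaction(String transactionId, String accountId, double amount) {
            this.transactionId = transactionId;
            this.accountId = accountId;
            this.amount = amount;
        }

        @Override
        public String toString() {
            return transactionId + "," + accountId + "," + amount;
        }
    }

    // Step 1: generate the parent sample (primary keys) first.
    static List<String> sampleAccountIds(int count) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(String.format("ACC%05d", i));
        }
        return ids;
    }

    // Step 2: draw each child's foreign key from the parent sample,
    // so every preview row joins to an existing parent.
    static List<Transaction> sampleTransactions(List<String> accountIds, int count, Random random) {
        if (accountIds.isEmpty()) {
            throw new IllegalArgumentException("need at least one parent record");
        }
        List<Transaction> txns = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String fk = accountIds.get(random.nextInt(accountIds.size()));
            txns.add(new Transaction("TXN" + i, fk, random.nextInt(10_000) / 100.0));
        }
        return txns;
    }

    public static void main(String[] args) {
        List<String> accounts = sampleAccountIds(3);
        sampleTransactions(accounts, 5, new Random(42)).forEach(System.out::println);
    }
}
```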
Data Models: New TransformationModels.scala with transformation configuration types and serialization support

API & UI Integration

REST API Transformation Support: Sample endpoints automatically apply transformations when configured, enabling preview of transformed data:
GET /sample/plans/{planName}, /sample/tasks/{taskName}, /sample/steps/{stepName}
Transformations execute during sample generation for accurate preview
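A client could request a transformed preview with a plain HTTP GET against these endpoints. This Java sketch uses the JDK's `java.net.http.HttpClient`; the base URL `http://localhost:9898` and the plan name `my-plan` are assumptions, `SampleEndpointClientSketch` is a hypothetical name, and the response body format is not specified in these notes, so the sketch simply prints it. Running `main` requires a reachable server.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Set;

// Hypothetical client for the sample endpoints listed above.
final class SampleEndpointClientSketch {

    // Assumption: where the Data Caterer UI/API is reachable; adjust for your deployment.
    static final String BASE_URL = "http://localhost:9898";

    // Path segments from the release notes: /sample/plans, /sample/tasks, /sample/steps.
    static final Set<String> KINDS = Set.of("plans", "tasks", "steps");

    static HttpRequest sampleRequest(String baseUrl, String kind, String name) {
        if (!KINDS.contains(kind)) {
            throw new IllegalArgumentException("kind must be one of " + KINDS);
        }
        // URLEncoder does form encoding (space -> '+', literal '+' -> %2B),
        // so swapping '+' for %20 only affects spaces in the path segment.
        String encoded = URLEncoder.encode(name, StandardCharsets.UTF_8).replace("+", "%20");
        return HttpRequest.newBuilder(URI.create(baseUrl + "/sample/" + kind + "/" + encoded))
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpRequest request = sampleRequest(BASE_URL, "plans", "my-plan");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```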
UI Enhancements: Updated web interface components to support configuring transformations and previewing transformed sample data

Deployment & Integration

Custom Transformer Deployment: Multiple deployment options including Docker volume mounts (/opt/app/custom), custom Docker images, Kubernetes ConfigMaps/PersistentVolumes, and local classpath
Example Transformers: Examples in the repository:
UpperCasePerRecordTransformer.scala - Per-record text transformation
JsonArrayWrapperTransformer.scala - Whole-file JSON array wrapping
CsvToFixedWidthTransformer.java - Format conversion example
TransformationExamplePlan.scala - Complete usage patterns
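Because TransformationConfig names a class and a method, the transformer is presumably resolved by name at runtime, which is why each deployment option is about placing the class on the classpath. A minimal sketch of such a lookup with standard Java reflection follows. The method shape `transformRecord(String, Map<String, String>)` and all class names are hypothetical; check the transformation guide for the signature Data Caterer actually expects.

```java
import java.lang.reflect.Method;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of resolving a transformer by class and method name.
final class ReflectiveTransformerLoaderSketch {

    // Stand-in for a transformer shipped in a JAR on the classpath
    // (for example, one mounted under /opt/app/custom).
    public static final class UpperCaseTransformer {
        public String transformRecord(String record, Map<String, String> options) {
            return options.getOrDefault("prefix", "") + record.toUpperCase(Locale.ROOT);
        }
    }

    // Loads the class, creates an instance via its no-arg constructor,
    // and invokes the named (String, Map) method on one record.
    static String invoke(String className, String methodName, String record,
                         Map<String, String> options) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        Object instance = clazz.getDeclaredConstructor().newInstance();
        Method method = clazz.getMethod(methodName, String.class, Map.class);
        return (String) method.invoke(instance, record, options);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        String className = UpperCaseTransformer.class.getName();
        System.out.println(invoke(className, "transformRecord", "hello", Map.of("prefix", "TEST_")));
    }
}
```

If the class is missing from the classpath, `Class.forName` throws `ClassNotFoundException`, which is the usual symptom of a transformer JAR that was not mounted or copied into the image.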

Documentation

Transformation Guide: New comprehensive documentation at docs/generator/transformation.md covering API reference, implementation guides for Java/Scala/YAML, deployment strategies, and best practices
Updated Guides: API Documentation extended to cover transformations, and Deployment Guide updated with transformer deployment examples

Testing & Examples

Integration Tests: New TransformationIntegrationTest for end-to-end transformation workflows
Unit Tests: PerRecordTransformerTest and WholeFileTransformerTest for transformer component validation
API Tests: Updated PlanApiEndToEndTest with transformation scenarios
Example Plans: TransformationExamplePlan.scala demonstrating both transformation modes with real-world use cases

Migration Notes

This release is fully backward compatible. Existing plans continue to work without modification. The transformation feature is opt-in and only activates when explicitly configured in task/step definitions.

Performance Characteristics

Transformations execute after data generation, and their performance depends on the mode:
Per-Record: Memory efficient; scales linearly with record count
Whole-File: May require loading the full file into memory, which can be costly for large datasets
Custom Logic: Performance depends on the complexity of the transformer implementation
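The memory difference between the modes follows from how the input is read. In this sketch (hypothetical names, plain `java.nio.file` I/O, not Data Caterer's implementation), the per-record path streams one line at a time, while the whole-file path loads the full content into a `String` first, here to add a header and footer.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical sketch contrasting the memory profile of the two modes.
final class TransformationMemorySketch {

    // Per-record: only the current line is held in memory, so memory use stays
    // roughly constant and work grows linearly with the number of records.
    static void transformPerRecord(Path input, Path output, UnaryOperator<String> fn) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(fn.apply(line));
                writer.write('\n'); // always '\n' for predictable output across platforms
            }
        }
    }

    // Whole-file: the entire content is read into one String before transforming,
    // so peak memory grows with file size.
    static void transformWholeFile(Path input, Path output, UnaryOperator<String> fn) throws IOException {
        String content = Files.readString(input, StandardCharsets.UTF_8);
        Files.writeString(output, fn.apply(content), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path input = Files.createTempFile("records", ".txt");
        Files.writeString(input, "alpha\nbeta\n");
        Path perRecordOut = Files.createTempFile("per-record", ".txt");
        Path wholeFileOut = Files.createTempFile("whole-file", ".txt");
        transformPerRecord(input, perRecordOut, s -> s.toUpperCase(Locale.ROOT));
        transformWholeFile(input, wholeFileOut, content -> "BEGIN\n" + content + "END\n");
        System.out.print(Files.readString(perRecordOut));
        System.out.print(Files.readString(wholeFileOut));
    }
}
```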