
0.17.2

Deployed: 30-10-2025

Latest features and fixes for Data Caterer include Custom Transformations for post-generation data processing, significant architectural improvements to the sink layer, and enhanced UI capabilities for sample data generation with relationship awareness.

Custom Transformations

  • Post-Generation Processing: Execute custom transformations after data generation and file writing, enabling "last mile" data processing for format conversions, business logic, and custom formatting

  • Transformation Modes:
    • Per-Record: Transform each line/record independently (memory efficient, suitable for line-by-line modifications)
    • Whole-File: Transform entire files as single units (ideal for JSON array wrapping, header/footer addition, format conversions)

  • API Integration: Extended TaskBuilder and ConnectionBuilder with .transformationPerRecord() and .transformationWholeFile() methods, plus .transformationOptions() for custom configuration

  • Configuration: New TransformationConfig model supporting custom class/method specification, transformation mode selection, output path customization, and runtime enable/disable control

  • Use Cases: JSON array wrapping, CSV header addition, legacy format conversion (e.g., fixed-width), field encryption, and custom business logic application
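The two modes differ mainly in what a transformer receives: one record, or the whole file. The following is a minimal Java sketch of that difference. It assumes hypothetical PerRecordTransformer and WholeFileTransformer interfaces and a made-up "minify" option. The actual Data Caterer contracts, class names and method signatures may differ, so follow the transformation guide for the real API.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical contract for per-record mode (not the actual Data Caterer interface).
interface PerRecordTransformer {
    String transformRecord(String record, Map<String, String> options);
}

// Hypothetical contract for whole-file mode (not the actual Data Caterer interface).
interface WholeFileTransformer {
    String transformFile(String content, Map<String, String> options);
}

// Per-record: each line is transformed on its own; no state is shared between records.
class UpperCaseRecordSketch implements PerRecordTransformer {
    @Override
    public String transformRecord(String record, Map<String, String> options) {
        return record.toUpperCase(Locale.ROOT);
    }
}

// Whole-file: wrapping JSON lines in a single array needs every record at once.
class JsonArrayWrapperSketch implements WholeFileTransformer {
    @Override
    public String transformFile(String content, Map<String, String> options) {
        // "minify" is an illustrative option, not a documented one
        boolean minify = Boolean.parseBoolean(options.getOrDefault("minify", "false"));
        String body = Arrays.stream(content.split("\\R"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.joining(minify ? "," : ",\n"));
        if (body.isEmpty()) {
            return "[]";
        }
        return minify ? "[" + body + "]" : "[\n" + body + "\n]";
    }
}
```

The upper-casing transformer never needs more than one line. The JSON wrapper, by contrast, cannot emit the closing bracket until it has seen the last record, which is why it belongs to whole-file mode.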

Architecture & Code Quality

  • Sink Layer Refactoring: Decomposed the monolithic SinkFactory into specialized components for improved maintainability and testability:
    • BatchSinkWriter: Batch data writes with format support
    • RealTimeSinkWriter: Streaming writes with rate limiting
    • FileConsolidator: Consolidates Spark part files into a single output file
    • TransformationApplicator: Orchestrates transformations after data is written

  • UI Sample Generation: Enhanced sample data generation with new modular components:
    • RelationshipAwareSampleGenerator: Foreign key and relationship handling in samples
    • SampleSizeCalculator: Intelligent sample size determination
    • SampleDataConverter: Data format conversion for UI preview
    • StepParser: Improved YAML/plan parsing for sample requests

  • Data Models: New TransformationModels.scala with transformation configuration types and serialization support
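Spark writes a dataset as a directory of part-* files alongside metadata such as _SUCCESS and .crc checksums. The following is a minimal sketch of the kind of consolidation FileConsolidator performs. PartFileConsolidatorSketch is a hypothetical name, and the code does not reproduce Data Caterer's implementation. It assumes line-oriented output (for example CSV or JSON lines) where a header, if present, is the first line of each part.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper illustrating part-file consolidation; not Data Caterer's FileConsolidator.
final class PartFileConsolidatorSketch {

    // Keep only data files; skip _SUCCESS markers and checksum files.
    static List<Path> partFiles(Path sparkOutputDir) {
        try (Stream<Path> entries = Files.list(sparkOutputDir)) {
            return entries
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.startsWith("part-") && !name.endsWith(".crc");
                    })
                    .sorted() // part-00000, part-00001, ... preserves partition order
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Streams every part into one target file; with hasHeader, only the first header is kept.
    static void consolidate(Path sparkOutputDir, Path target, boolean hasHeader) {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            boolean headerWritten = false;
            for (Path part : partFiles(sparkOutputDir)) {
                try (BufferedReader in = Files.newBufferedReader(part, StandardCharsets.UTF_8)) {
                    String line = in.readLine();
                    if (hasHeader && line != null) {
                        if (!headerWritten) {
                            out.write(line);
                            out.newLine();
                            headerWritten = true;
                        }
                        line = in.readLine(); // drop repeated headers from later parts
                    }
                    while (line != null) {
                        out.write(line);
                        out.newLine();
                        line = in.readLine();
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Streaming line by line keeps memory flat regardless of how many part files Spark produced. Writing the target outside the Spark output directory avoids mixing the consolidated file with the parts it was built from.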

API & UI Integration

  • REST API Transformation Support: Sample endpoints automatically apply transformations when configured, enabling preview of transformed data:
    • GET /sample/plans/{planName}, /sample/tasks/{taskName}, /sample/steps/{stepName}
    • Transformations execute during sample generation for accurate preview

  • UI Enhancements: Updated web interface components to support transformation configuration and preview transformed sample data
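Because the sample endpoints apply configured transformations, fetching a sample is a quick way to preview transformed output. The following is a minimal Java 11 sketch using java.net.http.HttpClient against the three documented paths. SampleEndpointClientSketch is a hypothetical name. The base URL you pass in (for example http://localhost:9898) is an assumption about your deployment. The response format is not described in these notes, so the body is returned as a raw string.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the documented GET /sample/{plans|tasks|steps}/{name} endpoints.
final class SampleEndpointClientSketch {
    private final String baseUrl;
    private final HttpClient client = HttpClient.newHttpClient();

    SampleEndpointClientSketch(String baseUrl) {
        // Normalise so paths are joined with exactly one slash
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    // kind must be "plans", "tasks" or "steps", matching the documented paths
    URI sampleUri(String kind, String name) {
        if (!kind.equals("plans") && !kind.equals("tasks") && !kind.equals("steps")) {
            throw new IllegalArgumentException("Unknown sample kind: " + kind);
        }
        // URLEncoder does form encoding, so convert '+' (space) to %20 for a path segment
        String segment = URLEncoder.encode(name, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(baseUrl + "/sample/" + kind + "/" + segment);
    }

    // Performs the GET and returns the raw body; its structure is not assumed here
    String fetchSample(String kind, String name) throws java.io.IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(sampleUri(kind, name)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```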

Deployment & Integration

  • Custom Transformer Deployment: Multiple deployment options including Docker volume mounts (/opt/app/custom), custom Docker images, Kubernetes ConfigMaps/PersistentVolumes, and local classpath

  • Example Transformers: Examples included in the repository:
    • UpperCasePerRecordTransformer.scala - Per-record text transformation
    • JsonArrayWrapperTransformer.scala - Whole-file JSON array wrapping
    • CsvToFixedWidthTransformer.java - Format conversion example
    • TransformationExamplePlan.scala - Complete usage patterns
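CSV-to-fixed-width conversion is a natural per-record task, because each line can be converted without seeing the rest of the file. The following is a minimal sketch of the idea. It is not the repository's CsvToFixedWidthTransformer.java. The class name and the "widths" option are hypothetical, and the comma split assumes no quoted fields that contain commas.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical per-record CSV -> fixed-width conversion (not the repository's example).
final class CsvToFixedWidthSketch {

    // "widths" is an assumed option, e.g. "5,10,8": one column width per CSV field
    static String transformRecord(String record, Map<String, String> options) {
        int[] widths = Arrays.stream(options.getOrDefault("widths", "").split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .mapToInt(Integer::parseInt)
                .toArray();
        // Naive split; -1 keeps trailing empty fields so column positions stay aligned
        String[] fields = record.split(",", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            // Fields without a configured width are emitted as-is
            int width = i < widths.length ? widths[i] : fields[i].length();
            // Truncate values longer than the column
            String value = fields[i].length() > width ? fields[i].substring(0, width) : fields[i];
            out.append(value);
            // Right-pad with spaces up to the column width
            for (int pad = value.length(); pad < width; pad++) {
                out.append(' ');
            }
        }
        return out.toString();
    }
}
```

For example, with widths "3,4,4" the record 1,alice,NY becomes "1  alicNY  ". A production version would parse the widths once rather than on every record and would use a proper CSV parser.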

Documentation

  • Transformation Guide: New comprehensive documentation at docs/generator/transformation.md covering API reference, implementation guides for Java/Scala/YAML, deployment strategies, and best practices

  • Updated Guides: Added transformation support to the API Documentation and transformer deployment examples to the Deployment Guide

Testing & Examples

  • Integration Tests: New TransformationIntegrationTest for end-to-end transformation workflows
  • Unit Tests: PerRecordTransformerTest and WholeFileTransformerTest for transformer component validation
  • API Tests: Updated PlanApiEndToEndTest with transformation scenarios
  • Example Plans: TransformationExamplePlan.scala demonstrating both transformation modes with real-world use cases

Migration Notes

This release is fully backward compatible. Existing plans continue to work without modification. The transformation feature is opt-in and only activates when explicitly configured in task/step definitions.

Performance Characteristics

Transformations execute after data generation, so their cost depends on the mode:

  • Per-Record: Memory efficient; processing time scales linearly with record count
  • Whole-File: May require loading the full file into memory, which can be costly for large datasets
  • Custom Logic: Performance depends on the complexity of the transformer implementation
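The memory difference between the modes comes down to how much of the file is held at once. The following is a minimal Java 11 sketch contrasting the two access patterns. TransformationModeCostSketch is a hypothetical name, and the code does not reproduce how Data Caterer applies transformations internally.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

// Hypothetical illustration of per-record vs whole-file memory behaviour.
final class TransformationModeCostSketch {

    // Per-record: only the current line is in memory, so memory stays flat
    // and total work grows linearly with the number of records.
    static long applyPerRecord(Path input, Path output, UnaryOperator<String> perRecord) throws IOException {
        long count = 0;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(perRecord.apply(line));
                out.newLine();
                count++;
            }
        }
        return count;
    }

    // Whole-file: the entire file becomes one String, so memory grows with file size.
    static void applyWholeFile(Path input, Path output, UnaryOperator<String> wholeFile) throws IOException {
        String content = Files.readString(input, StandardCharsets.UTF_8);
        Files.writeString(output, wholeFile.apply(content), StandardCharsets.UTF_8);
    }
}
```

For multi-gigabyte outputs, the whole-file pattern needs heap proportional to the file. Prefer per-record mode unless the transformation genuinely depends on the full content, such as adding a footer or wrapping records in an array.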