0.18.0
Deployed: 12-12-2025
Latest features and fixes for Data Caterer include advanced execution strategies with load patterns for performance testing, complete foreign key architecture refactoring with strategy patterns, unified YAML configuration with inline connections and environment variables, and comprehensive performance metrics collection with validation capabilities.
Advanced Execution Strategies & Load Patterns
- Duration-Based Execution: New execution mode supporting time-based data generation with rate limiting, enabling realistic performance testing scenarios
  - DurationBasedExecutionStrategy with configurable duration (seconds/minutes/hours) and target rates
  - Rate limiting with RateLimiter supporting various time units (1s, 1m, etc.)
  - DurationTracker for precise execution time management
- Load Pattern Framework: Comprehensive load testing capabilities with multiple pattern types:
  - Ramp Pattern: Linear load increase from start rate to end rate for capacity testing
  - Spike Pattern: Sudden load spikes with configurable duration and intensity
  - Wave Pattern: Sinusoidal load variations for stress testing
  - Stepped Pattern: Staircase load increases with configurable steps
  - Constant Pattern: Steady load maintenance for baseline testing
  - Breaking Point Pattern: Aggressive load escalation to find system limits
- Weighted Task Execution: Task prioritization and distribution control:
  - WeightedTaskSelector for proportional task execution based on assigned weights
  - Enhanced StageCoordinator for managing multi-task execution phases
  - WarmupCooldownManager for gradual load introduction and teardown
- Execution Strategy Architecture: Modular design with pluggable strategies:
  - ExecutionStrategy trait with calculateNumBatches(), shouldContinue(), and metrics collection
  - GenerationMode enum (Batched, AllUpfront, Progressive) for different data generation approaches
  - Strategy factory pattern with ExecutionStrategyFactory
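
To make the load pattern types more concrete, here is a minimal Python sketch of how a target rate (records per second) could be derived from elapsed time. It is illustrative only and is not Data Caterer's implementation: the function names, parameters, and formulas (linear ramp, rectangular spike, sine wave, fixed-width steps) are all assumptions. A constant pattern would simply return the same rate for every value of t.

```python
import math


def ramp_rate(t: float, duration: float, start_rate: float, end_rate: float) -> float:
    """Linear interpolation from start_rate to end_rate over the run."""
    progress = min(max(t / duration, 0.0), 1.0)
    return start_rate + (end_rate - start_rate) * progress


def spike_rate(t: float, base_rate: float, peak_rate: float,
               spike_start: float, spike_duration: float) -> float:
    """Base rate, except during one rectangular spike window."""
    in_spike = spike_start <= t < spike_start + spike_duration
    return peak_rate if in_spike else base_rate


def wave_rate(t: float, base_rate: float, amplitude: float, period: float) -> float:
    """Sinusoidal variation around base_rate, clamped at zero."""
    return max(0.0, base_rate + amplitude * math.sin(2 * math.pi * t / period))


def stepped_rate(t: float, start_rate: float, increment: float, step_duration: float) -> float:
    """Staircase: the rate rises by `increment` every `step_duration` seconds."""
    return start_rate + increment * math.floor(t / step_duration)


if __name__ == "__main__":
    # Print the ramp and wave rates at a few points of a 60-second run.
    for second in (0, 15, 30, 45, 60):
        print(second, ramp_rate(second, 60, 10, 100), wave_rate(second, 50, 20, 60))
```

For example, ramp_rate(30, 60, 10, 100) gives 55.0, the midpoint of a 60-second ramp from 10 to 100 records per second.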
Foreign Key Strategy Architecture
- Strategy Pattern Refactoring: Complete architectural overhaul of foreign key processing:
  - ForeignKeyProcessorV2 with modular strategy composition
  - ForeignKeyStrategy trait with specialized implementations
  - Strategy selection based on configuration and data characteristics
- Cardinality Strategy: Advanced one-to-many relationship creation:
  - CardinalityStrategy with group-based and index-based assignment modes
  - Support for perField count configuration for maintaining group structure
  - Configurable min/max ratios and distribution patterns (uniform, varying)
- Generation Mode Strategy: Flexible foreign key value assignment:
  - All-Exist Mode: All foreign key references are valid (default)
  - Partial Mode: Configurable percentage of null/invalid FK values
  - All-Combinations Mode: Exhaustive FK value combinations (future enhancement)
  - GenerationModeStrategy for mode-specific logic
- Nullability Strategy: Intelligent null handling for foreign keys:
  - NullabilityStrategy with post-processing null application
  - Configurable null percentages per relationship
  - Preservation of cardinality structure when applying nulls
- Enhanced Foreign Key Context: Comprehensive relationship metadata:
  - EnhancedForeignKeyRelation with detailed configuration support
  - ForeignKeyConfig with violation ratios, strategies, and broadcast optimization
  - ForeignKeyContext for passing plan, data, and task information
- Utility Architecture: Specialized components for FK processing:
  - InsertOrderCalculator for determining safe insertion sequences
  - MetadataUtil for data source metadata extraction
  - NestedFieldUtil for complex nested field handling
  - DataFrameSizeEstimator for memory-efficient processing
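
As a rough illustration of how a cardinality step and a nullability post-processing step might fit together, the Python sketch below assigns each parent key a number of child references between a minimum and maximum, then nulls out a fraction of them without changing row count or order. It uses plain lists rather than DataFrames, and every name in it (assign_foreign_keys, apply_nulls, the parameters) is hypothetical; the actual strategies in Data Caterer may work quite differently.

```python
import random
from typing import List, Optional


def assign_foreign_keys(parent_keys: List[str], min_per_parent: int,
                        max_per_parent: int, rng: random.Random) -> List[str]:
    """Give each parent key between min and max child references (uniform)."""
    child_fks: List[str] = []
    for key in parent_keys:
        count = rng.randint(min_per_parent, max_per_parent)  # bounds are inclusive
        child_fks.extend([key] * count)
    return child_fks


def apply_nulls(child_fks: List[str], null_fraction: float,
                rng: random.Random) -> List[Optional[str]]:
    """Post-process: null out a fraction of references, keeping row count and order."""
    null_count = round(len(child_fks) * null_fraction)
    null_positions = set(rng.sample(range(len(child_fks)), null_count))
    return [None if i in null_positions else fk for i, fk in enumerate(child_fks)]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs match
    fks = assign_foreign_keys(["ACC1", "ACC2", "ACC3"], 1, 3, rng)
    print(apply_nulls(fks, 0.25, rng))
```

Applying nulls after the cardinality assignment, rather than during it, is what lets the group structure stay intact: the surviving references still point at the same parents in the same positions.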
Unified YAML Configuration
- Inline Connections: Define connections directly within task configurations:
  - No separate connection files required for simple setups
  - Environment variable interpolation in connection URLs and options
  - Support for all connection types (JDBC, Kafka, HTTP, etc.)
- Environment Variable Support: Dynamic configuration through environment variables:
  - ${VAR_NAME} syntax throughout YAML configurations
  - Default values with ${VAR_NAME:-default} syntax
  - Secure configuration management for different environments
- Comprehensive Validation Framework: Inline validation definitions:
  - Field Validations: Unique, null checks, regex matching, range validation
  - Expression Validations: SQL expressions with error thresholds
  - GroupBy Validations: Aggregated validations by grouping fields
  - Metric Validations: Performance and data quality metrics
- Performance Test Configuration: Dedicated performance testing setup:
  - testType: "performance" for load testing scenarios
  - testConfig with warmup/cooldown periods and execution modes
  - Weighted task distribution for realistic workload simulation
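
The ${VAR_NAME} and ${VAR_NAME:-default} syntax can be pictured with a small Python sketch of the substitution step. This is an assumption about the general behaviour, not the parser Data Caterer ships: the interpolate function is hypothetical, and raising an error for an unset variable with no default is a choice made for this example.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} and ${NAME:-default}
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def interpolate(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${VAR} and ${VAR:-default} using `env` (os.environ if omitted)."""
    values = os.environ if env is None else env

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in values:
            return values[name]
        if default is not None:
            return default
        # Assumption: fail loudly rather than leave the placeholder in place.
        raise KeyError(f"Environment variable {name} is not set and has no default")

    return _PATTERN.sub(substitute, text)


if __name__ == "__main__":
    url = "jdbc:postgresql://${DB_HOST:-localhost}:5432/${DB_NAME}"
    print(interpolate(url, {"DB_NAME": "customer"}))
    # prints jdbc:postgresql://localhost:5432/customer
```

Note that this sketch only falls back to the default when the variable is missing; POSIX shells also apply the `:-` default when the variable is set but empty.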
Performance Metrics & Validation
- Advanced Metrics Collection: Comprehensive performance tracking:
  - PerformanceMetrics with batch-level granularity
  - Throughput, latency percentiles (P50, P75, P90, P95, P99, P99.9)
  - Total records, duration tracking, and error rate monitoring
- Percentile Calculation: Memory-efficient percentile calculation:
  - SimplePercentileCalculator for large datasets (>100k samples)
  - Automatic fallback to exact calculation for smaller datasets
  - Configurable threshold for algorithm selection
- Performance Validation: Metric-based validation rules:
  - Throughput validation with configurable thresholds
  - Latency percentile validation (P95, P99, etc.)
  - Error rate validation for reliability testing
  - Custom validation expressions with pre-filters
- Metrics Exporter: Enhanced reporting capabilities:
  - PerformanceMetricsExporter for structured metrics output
  - HTML report generation with performance charts
  - Integration with existing report framework
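
The metric validations above boil down to comparing computed values against thresholds. The Python sketch below shows one way that could look, using an exact nearest-rank percentile; the function names, argument names, and threshold semantics are assumptions made for illustration and do not reflect the MetricValidator API.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def validate_metrics(latencies_ms: List[float], errors: int, duration_s: float,
                     max_p95_ms: float, min_throughput: float,
                     max_error_rate: float) -> Dict[str, bool]:
    """Return pass/fail per rule for latency, throughput, and error rate."""
    total = len(latencies_ms)
    return {
        "p95_latency": percentile(latencies_ms, 95) <= max_p95_ms,
        "throughput": total / duration_s >= min_throughput,
        "error_rate": errors / total <= max_error_rate,
    }


if __name__ == "__main__":
    latencies = [float(ms) for ms in range(1, 101)]  # 100 requests, 1..100 ms
    print(validate_metrics(latencies, errors=1, duration_s=10,
                           max_p95_ms=95, min_throughput=20, max_error_rate=0.05))
```

With these inputs the P95 latency is 95 ms and the error rate is 1%, so both pass, while the throughput of 10 records per second fails the minimum of 20.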
Architecture & Code Quality
- Sink Layer Enhancements: Improved real-time and batch data writing:
  - Enhanced PekkoStreamingSinkWriter with performance optimizations
  - SinkRouter for intelligent sink selection
- Connection Management: Flexible connection handling:
  - ConnectionDeserializer for YAML-based connection parsing
  - ConnectionResolver for plan-level connection management
  - Support for reusable and inline connection definitions
- Plan Processing Architecture: Modular plan execution:
  - CardinalityCountAdjustmentProcessor for data expansion
  - ForeignKeyUniquenessProcessor for FK relationship validation
  - MutatingPrePlanProcessor for plan transformation
- API Enhancements: Builder pattern improvements:
  - ForeignKeyConfigBuilders for declarative FK configuration
  - Enhanced PlanBuilder and SinkOptionsBuilder with new options
  - Type-safe configuration with ConnectionDeserializer
Data Generation & Validation
- Enhanced Sample Generation: Relationship-aware data preview:
  - RelationshipAwareSampleGenerator with FK relationship handling
  - SampleSizeCalculator for intelligent sample size determination
  - Improved UI sample generation with transformation support
- Validation Processor Updates: Advanced validation capabilities:
  - MetricValidator for performance and data quality metrics
  - Enhanced expression evaluation with error thresholds
  - Support for complex aggregation validations
- Parser Enhancements: Improved YAML and plan parsing:
  - LoadPatternParser for load pattern configuration
  - Enhanced PlanParser with unified format support
  - Better error handling and validation
Examples & Documentation
- Foreign Key Examples: Relationship configuration patterns:
  - foreign-key-advanced-example.yaml: Complex FK scenarios
  - foreign-key-cardinality-example.yaml: Cardinality patterns
  - foreign-key-generation-modes-example.yaml: Generation mode variations
  - foreign-key-nullability-example.yaml: Null handling strategies
- Test Plan Enhancements: Comprehensive test coverage:
  - HTTP execution strategy test plans with various load patterns
  - Metric validation test plans (pass/fail scenarios)
  - Duration-based execution examples
  - Warmup/cooldown and weighted task demonstrations
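
The weighted task behaviour that these demonstrations exercise can be pictured with a short Python sketch of proportional selection. The pick_task helper and the task names are hypothetical, and the real WeightedTaskSelector may use a different selection scheme.

```python
import random
from collections import Counter
from typing import Dict


def pick_task(weights: Dict[str, float], rng: random.Random) -> str:
    """Select one task name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so repeated runs match
    # Hypothetical workload: mostly reads, some updates, few creates.
    weights = {"create_account": 1, "read_account": 7, "update_account": 2}
    counts = Counter(pick_task(weights, rng) for _ in range(10_000))
    print(counts)
```

Over many selections the counts approach the 1:7:2 ratio of the weights, which is what makes a mixed workload resemble realistic traffic.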
Migration Notes
This release introduces significant architectural improvements while maintaining backward compatibility. Existing plans continue to work without modification. New features are opt-in and require explicit configuration:
- Execution Strategies: Default to count-based; use count.duration for duration-based execution
- Load Patterns: Optional enhancement to duration-based execution
- Foreign Key Strategies: Automatic strategy selection; V2 implementation enabled by default
- Unified YAML: New format supported alongside existing formats
- Performance Metrics: Automatically collected for duration-based execution
Performance Characteristics
- Execution Strategies: Duration-based execution with load patterns enables realistic performance testing with memory-efficient streaming
- Foreign Key Processing: Strategy-based approach reduces memory usage through specialized algorithms
- Metrics Collection: SimplePercentileCalculator provides bounded memory usage for percentile calculations on large datasets
- Load Patterns: Configurable rate changes with minimal performance overhead
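
These notes do not say which algorithm SimplePercentileCalculator uses to keep memory bounded. One common technique is reservoir sampling, sketched below in Python purely as an illustration: the ReservoirPercentiles class, its capacity parameter, and the nearest-rank percentile are assumptions, not the library's implementation.

```python
import math
import random
from typing import List


class ReservoirPercentiles:
    """Keeps at most `capacity` samples (Algorithm R), so memory stays bounded."""

    def __init__(self, capacity: int = 10_000, seed: int = 0) -> None:
        self.capacity = capacity
        self.seen = 0
        self.samples: List[float] = []
        self._rng = random.Random(seed)

    def add(self, value: float) -> None:
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(value)
        else:
            # Keep the new value with probability capacity / seen.
            slot = self._rng.randrange(self.seen)
            if slot < self.capacity:
                self.samples[slot] = value

    def percentile(self, pct: float) -> float:
        """Nearest-rank percentile over retained samples (exact while seen <= capacity)."""
        if not self.samples:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(pct * len(ordered) / 100))
        return ordered[rank - 1]


if __name__ == "__main__":
    tracker = ReservoirPercentiles(capacity=1_000)
    for latency_ms in range(100_000):
        tracker.add(float(latency_ms))
    print(len(tracker.samples), tracker.percentile(95))
```

The trade-off is accuracy for memory: once more values have been seen than the reservoir holds, percentiles become estimates whose error shrinks as the capacity grows.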
Testing & Quality
- Integration Tests: New test suites for execution strategies, foreign key strategies, and unified YAML parsing
- Unit Tests: Comprehensive coverage for all new components and strategies
- Performance Tests: Load pattern validation and metrics collection verification
- Example Validation: All examples tested and verified for correctness