0.18.0

Deployed: 12-12-2025

This release of Data Caterer adds execution strategies with load patterns for performance testing, refactors foreign key processing around the strategy pattern, unifies YAML configuration with inline connections and environment variables, and collects performance metrics that can be checked with validation rules.

Advanced Execution Strategies & Load Patterns

  • Duration-Based Execution: New execution mode supporting time-based data generation with rate limiting, enabling realistic performance testing scenarios
  • DurationBasedExecutionStrategy with configurable duration (seconds/minutes/hours) and target rates
  • Rate limiting with RateLimiter supporting various time units (1s, 1m, etc.)
  • DurationTracker for precise execution time management
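
The idea behind duration-based execution can be sketched in a few lines of Python. This is an illustration only, not Data Caterer's implementation: `run_for_duration`, its parameters, and the one-second tick are all assumptions made for the example. The clock and sleep functions are injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable


def run_for_duration(
    generate_batch: Callable[[int], None],
    duration_seconds: float,
    records_per_second: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Request one batch per one-second tick until the duration elapses.

    Returns the total number of records requested.
    """
    start = clock()
    total = 0
    tick = 0
    while clock() - start < duration_seconds:
        generate_batch(records_per_second)
        total += records_per_second
        tick += 1
        # Wait for the start of the next tick so the target rate is not exceeded.
        remaining = (start + tick) - clock()
        if remaining > 0:
            sleep(remaining)
    return total
```

With the defaults, `run_for_duration(print, 3, 10)` would request 10 records roughly once per second for about three seconds.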

  • Load Pattern Framework: Comprehensive load testing capabilities with multiple pattern types:

  • Ramp Pattern: Linear load increase from start rate to end rate for capacity testing
  • Spike Pattern: Sudden load spikes with configurable duration and intensity
  • Wave Pattern: Sinusoidal load variations for stress testing
  • Stepped Pattern: Staircase load increases with configurable steps
  • Constant Pattern: Steady load maintenance for baseline testing
  • Breaking Point Pattern: Aggressive load escalation to find system limits
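
Each pattern above is essentially a function from elapsed time to a target rate. The sketch below shows one plausible shape for five of them; the function and parameter names are hypothetical and are not taken from Data Caterer's configuration. The breaking point pattern is left out because the notes do not describe its escalation rule.

```python
import math


def constant_rate(rate: float, t: float) -> float:
    """Steady load: the same rate at every point in time."""
    return rate


def ramp_rate(start_rate: float, end_rate: float, duration: float, t: float) -> float:
    """Linear increase from start_rate to end_rate over `duration` seconds."""
    fraction = min(max(t / duration, 0.0), 1.0)
    return start_rate + (end_rate - start_rate) * fraction


def spike_rate(base_rate: float, peak_rate: float,
               spike_start: float, spike_duration: float, t: float) -> float:
    """Base load with a single spike of fixed length and intensity."""
    in_spike = spike_start <= t < spike_start + spike_duration
    return peak_rate if in_spike else base_rate


def wave_rate(base_rate: float, amplitude: float, period: float, t: float) -> float:
    """Sinusoidal variation around base_rate with the given period."""
    return base_rate + amplitude * math.sin(2 * math.pi * t / period)


def stepped_rate(start_rate: float, step_size: float,
                 step_duration: float, t: float) -> float:
    """Staircase: the rate rises by step_size every step_duration seconds."""
    return start_rate + step_size * math.floor(t / step_duration)
```

A rate limiter would sample one of these functions at the current elapsed time to decide how many records to emit next.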

  • Weighted Task Execution: Task prioritization and distribution control:

  • WeightedTaskSelector for proportional task execution based on assigned weights
  • Enhanced StageCoordinator for managing multi-task execution phases
  • WarmupCooldownManager for gradual load introduction and teardown
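
Proportional selection by weight can be illustrated with the standard library. This is a sketch of the general technique, not the WeightedTaskSelector code; the task names and the `select_task` helper are made up for the example.

```python
import random
from collections import Counter


def select_task(weights: dict[str, float], rng: random.Random) -> str:
    """Pick one task name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical workload: reads should run about three times as often as writes.
rng = random.Random(42)
counts = Counter(select_task({"read": 3, "write": 1}, rng) for _ in range(10_000))
read_share = counts["read"] / 10_000
```

Over many selections `read_share` settles near 0.75, which is the 3:1 ratio expressed by the weights.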

  • Execution Strategy Architecture: Modular design with pluggable strategies:

  • ExecutionStrategy trait with calculateNumBatches(), shouldContinue(), and metrics collection
  • GenerationMode enum (Batched, AllUpfront, Progressive) for different data generation approaches
  • Strategy factory pattern with ExecutionStrategyFactory
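
A pluggable strategy design of this kind might look like the following Python sketch. Only the method names mirror the notes (`calculateNumBatches()`, `shouldContinue()`); the signatures, the two concrete classes, and the `create_strategy` factory are assumptions for illustration and do not reproduce the Scala trait.

```python
import math
import time
from abc import ABC, abstractmethod
from typing import Callable, Optional


class ExecutionStrategy(ABC):
    @abstractmethod
    def calculate_num_batches(self) -> Optional[int]:
        """Number of batches if known upfront, otherwise None."""

    @abstractmethod
    def should_continue(self, batches_done: int) -> bool:
        """Whether another batch should be generated."""


class CountBasedStrategy(ExecutionStrategy):
    def __init__(self, total_records: int, records_per_batch: int):
        self.num_batches = math.ceil(total_records / records_per_batch)

    def calculate_num_batches(self) -> Optional[int]:
        return self.num_batches

    def should_continue(self, batches_done: int) -> bool:
        return batches_done < self.num_batches


class DurationBasedStrategy(ExecutionStrategy):
    def __init__(self, duration_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._deadline = clock() + duration_seconds

    def calculate_num_batches(self) -> Optional[int]:
        return None  # not known upfront: the run ends when the deadline passes

    def should_continue(self, batches_done: int) -> bool:
        return self._clock() < self._deadline


def create_strategy(total_records: int, records_per_batch: int,
                    duration_seconds: Optional[float] = None) -> ExecutionStrategy:
    """Factory: duration-based when a duration is given, count-based otherwise."""
    if duration_seconds is not None:
        return DurationBasedStrategy(duration_seconds)
    return CountBasedStrategy(total_records, records_per_batch)
```

The generation loop then depends only on `should_continue`, so new strategies can be added without touching it.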

Foreign Key Strategy Architecture

  • Strategy Pattern Refactoring: Complete architectural overhaul of foreign key processing:
  • ForeignKeyProcessor V2 with modular strategy composition
  • ForeignKeyStrategy trait with specialized implementations
  • Strategy selection based on configuration and data characteristics

  • Cardinality Strategy: Advanced one-to-many relationship creation:

  • CardinalityStrategy with group-based and index-based assignment modes
  • Support for perField count configuration for maintaining group structure
  • Configurable min/max ratios and distribution patterns (uniform, varying)
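
To make the one-to-many idea concrete, here is a small sketch of group-based assignment: every parent key receives between a minimum and maximum number of child rows. The helper `assign_children` and its uniform choice of group size are assumptions for the example, not the CardinalityStrategy implementation.

```python
import random
from typing import Hashable, Sequence


def assign_children(
    parent_keys: Sequence[Hashable],
    min_per_parent: int,
    max_per_parent: int,
    rng: random.Random,
) -> list[tuple[Hashable, int]]:
    """Return (parent_key, child_index) rows, grouped by parent.

    Each parent gets a uniformly chosen number of children in
    [min_per_parent, max_per_parent].
    """
    rows = []
    for key in parent_keys:
        count = rng.randint(min_per_parent, max_per_parent)
        rows.extend((key, i) for i in range(count))
    return rows
```

Setting the minimum and maximum to the same value yields a fixed ratio, for example exactly two child rows per parent.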

  • Generation Mode Strategy: Flexible foreign key value assignment:

  • All-Exist Mode: All foreign key references are valid (default)
  • Partial Mode: Configurable percentage of null/invalid FK values
  • All-Combinations Mode: Exhaustive FK value combinations (planned for a future release)
  • GenerationModeStrategy for mode-specific logic
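
The difference between the all-exist and partial modes can be shown with a short sketch: start from valid foreign key values, then replace a configured fraction with a null or invalid marker. The function name, the `violation_ratio` parameter, and the choice to pick positions at random are assumptions for illustration.

```python
import random
from typing import Any, Sequence


def apply_partial_mode(
    fk_values: Sequence[Any],
    violation_ratio: float,
    rng: random.Random,
    invalid_value: Any = None,
) -> list[Any]:
    """Replace a fraction of valid FK values with invalid_value (None by default).

    A violation_ratio of 0.0 corresponds to all-exist mode: nothing is changed.
    The number of rows is preserved.
    """
    n_violations = round(len(fk_values) * violation_ratio)
    bad_positions = set(rng.sample(range(len(fk_values)), n_violations))
    return [
        invalid_value if i in bad_positions else value
        for i, value in enumerate(fk_values)
    ]
```

Passing a value that does not exist in the parent table as `invalid_value` produces dangling references instead of nulls.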

  • Nullability Strategy: Intelligent null handling for foreign keys:

  • NullabilityStrategy with post-processing null application
  • Configurable null percentages per relationship
  • Preservation of cardinality structure when applying nulls

  • Enhanced Foreign Key Context: Comprehensive relationship metadata:

  • EnhancedForeignKeyRelation with detailed configuration support
  • ForeignKeyConfig with violation ratios, strategies, and broadcast optimization
  • ForeignKeyContext for passing plan, data, and task information

  • Utility Architecture: Specialized components for FK processing:

  • InsertOrderCalculator for determining safe insertion sequences
  • MetadataUtil for data source metadata extraction
  • NestedFieldUtil for complex nested field handling
  • DataFrameSizeEstimator for memory-efficient processing
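
A safe insertion sequence is a topological ordering of the foreign key graph: every parent table is written before the tables that reference it. The sketch below uses Python's standard `graphlib` module to show the technique; the table names are invented and this is not the InsertOrderCalculator code.

```python
from graphlib import TopologicalSorter


def insert_order(foreign_keys: dict[str, set[str]]) -> list[str]:
    """Order tables so that referenced (parent) tables come first.

    foreign_keys maps each child table to the set of parent tables it references.
    Raises graphlib.CycleError if the relationships form a cycle.
    """
    return list(TopologicalSorter(foreign_keys).static_order())


# Hypothetical schema: order_items references orders and products,
# and orders references customers.
order = insert_order({
    "orders": {"customers"},
    "order_items": {"orders", "products"},
})
```

Here `customers` precedes `orders`, and both `orders` and `products` precede `order_items`; the relative position of unrelated tables is not fixed.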

Unified YAML Configuration

  • Inline Connections: Define connections directly within task configurations:
  • No separate connection files required for simple setups
  • Environment variable interpolation in connection URLs and options
  • Support for all connection types (JDBC, Kafka, HTTP, etc.)

  • Environment Variable Support: Dynamic configuration through environment variables:

  • ${VAR_NAME} syntax throughout YAML configurations
  • Default values with ${VAR_NAME:-default} syntax
  • Secure configuration management for different environments
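
The two placeholder forms can be resolved with a single regular expression. The sketch below is an illustration of the syntax, not Data Caterer's parser. Two behaviours are assumptions: an unset variable without a default raises an error, and a default applies only when the variable is unset (a shell would also apply it to an empty value).

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} and ${NAME:-default}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def interpolate(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Substitute ${VAR} and ${VAR:-default} placeholders in text."""
    env = os.environ if env is None else env

    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name} is not set and has no default")

    return _PLACEHOLDER.sub(replace, text)


# Hypothetical connection URL and variable names.
url = interpolate(
    "jdbc:postgresql://${DB_HOST:-localhost}:${DB_PORT}/app",
    {"DB_PORT": "5432"},
)
```

In this example `url` becomes `jdbc:postgresql://localhost:5432/app`, because `DB_HOST` is unset and falls back to its default.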

  • Comprehensive Validation Framework: Inline validation definitions:

  • Field Validations: Unique, null checks, regex matching, range validation
  • Expression Validations: SQL expressions with error thresholds
  • GroupBy Validations: Aggregated validations by grouping fields
  • Metric Validations: Performance and data quality metrics
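
An error threshold lets a validation pass even when a small share of rows fails the expression. The sketch below uses a Python predicate as a stand-in for the SQL expression and treats the threshold as a fraction of rows; both choices are assumptions, since the notes do not say whether the threshold is a ratio or a row count.

```python
from typing import Any, Callable, Mapping, Sequence

Row = Mapping[str, Any]


def validate_with_threshold(
    rows: Sequence[Row],
    predicate: Callable[[Row], bool],
    error_threshold: float = 0.0,
) -> bool:
    """Pass when the fraction of rows failing the predicate is within the threshold."""
    if not rows:
        return True
    failures = sum(1 for row in rows if not predicate(row))
    return failures / len(rows) <= error_threshold


# Hypothetical data: one of four rows has a non-positive amount.
rows = [{"amount": a} for a in (10, 20, -5, 40)]
passes_lenient = validate_with_threshold(rows, lambda r: r["amount"] > 0, 0.25)
passes_strict = validate_with_threshold(rows, lambda r: r["amount"] > 0, 0.10)
```

With one failing row out of four, the validation passes at a 25% threshold and fails at 10%.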

  • Performance Test Configuration: Dedicated performance testing setup:

  • testType: "performance" for load testing scenarios
  • testConfig with warmup/cooldown periods and execution modes
  • Weighted task distribution for realistic workload simulation

Performance Metrics & Validation

  • Advanced Metrics Collection: Comprehensive performance tracking:
  • PerformanceMetrics with batch-level granularity
  • Throughput, latency percentiles (P50, P75, P90, P95, P99, P99.9)
  • Total records, duration tracking, and error rate monitoring

  • Percentile Calculation: Memory-efficient computation of latency percentiles:

  • SimplePercentileCalculator for large datasets (>100k samples)
  • Automatic fallback to exact calculation for smaller datasets
  • Configurable threshold for algorithm selection
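
The switch between an exact and a bounded-memory calculation can be sketched as follows. The notes do not say which approximation SimplePercentileCalculator uses, so reservoir sampling here is a stand-in chosen for the example, as are the function names and the nearest-rank definition of a percentile.

```python
import math
import random
from typing import Iterable, Optional, Sequence


def exact_percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def percentile(
    samples: Iterable[float],
    p: float,
    threshold: int = 100_000,
    rng: Optional[random.Random] = None,
) -> float:
    """Exact when there are at most `threshold` samples, approximate beyond that.

    Memory stays bounded by `threshold` because larger inputs are reduced to a
    uniform random sample of that size (reservoir sampling, Algorithm R).
    """
    rng = rng or random.Random()
    reservoir: list[float] = []
    for i, value in enumerate(samples):
        if i < threshold:
            reservoir.append(value)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < threshold:
                reservoir[j] = value
    return exact_percentile(reservoir, p)
```

For example, `percentile(latencies_ms, 99)` is exact for a short run and an estimate once the run produces more than `threshold` samples.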

  • Performance Validation: Metric-based validation rules:

  • Throughput validation with configurable thresholds
  • Latency percentile validation (P95, P99, etc.)
  • Error rate validation for reliability testing
  • Custom validation expressions with pre-filters
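
Metric-based rules of this kind reduce to comparing a few derived numbers against thresholds. The following sketch shows the shape of such a check; the `RunMetrics` fields and the `check_metrics` helper are hypothetical names for illustration and do not reflect Data Caterer's metric model.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    total_records: int
    failed_records: int
    duration_seconds: float
    latency_p95_ms: float


def check_metrics(
    m: RunMetrics,
    min_throughput: float,
    max_p95_ms: float,
    max_error_rate: float,
) -> list[str]:
    """Return the names of the rules that failed (an empty list means all passed)."""
    throughput = m.total_records / m.duration_seconds
    error_rate = m.failed_records / m.total_records if m.total_records else 0.0
    failed = []
    if throughput < min_throughput:
        failed.append("throughput")
    if m.latency_p95_ms > max_p95_ms:
        failed.append("latency_p95")
    if error_rate > max_error_rate:
        failed.append("error_rate")
    return failed


# Hypothetical run: 10,000 records in 10 seconds, 50 failures, P95 of 180 ms.
metrics = RunMetrics(10_000, 50, 10.0, 180.0)
```

For this run the throughput is 1,000 records per second and the error rate is 0.5%, so `check_metrics(metrics, 500, 200, 0.01)` reports no failures.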

  • Metrics Exporter: Enhanced reporting capabilities:

  • PerformanceMetricsExporter for structured metrics output
  • HTML report generation with performance charts
  • Integration with existing report framework

Architecture & Code Quality

  • Sink Layer Enhancements: Improved real-time and batch data writing:
  • Enhanced PekkoStreamingSinkWriter with performance optimizations
  • SinkRouter for intelligent sink selection

  • Connection Management: Flexible connection handling:

  • ConnectionDeserializer for YAML-based connection parsing
  • ConnectionResolver for plan-level connection management
  • Support for reusable and inline connection definitions

  • Plan Processing Architecture: Modular plan execution:

  • CardinalityCountAdjustmentProcessor for data expansion
  • ForeignKeyUniquenessProcessor for FK relationship validation
  • MutatingPrePlanProcessor for plan transformation

  • API Enhancements: Builder pattern improvements:

  • ForeignKeyConfigBuilders for declarative FK configuration
  • Enhanced PlanBuilder and SinkOptionsBuilder with new options
  • Type-safe configuration with ConnectionDeserializer

Data Generation & Validation

  • Enhanced Sample Generation: Relationship-aware data preview:
  • RelationshipAwareSampleGenerator with FK relationship handling
  • SampleSizeCalculator for intelligent sample size determination
  • Improved UI sample generation with transformation support

  • Validation Processor Updates: Advanced validation capabilities:

  • MetricValidator for performance and data quality metrics
  • Enhanced expression evaluation with error thresholds
  • Support for complex aggregation validations

  • Parser Enhancements: Improved YAML and plan parsing:

  • LoadPatternParser for load pattern configuration
  • Enhanced PlanParser with unified format support
  • Better error handling and validation

Examples & Documentation

  • Foreign Key Examples: Relationship configuration patterns:
  • foreign-key-advanced-example.yaml: Complex FK scenarios
  • foreign-key-cardinality-example.yaml: Cardinality patterns
  • foreign-key-generation-modes-example.yaml: Generation mode variations
  • foreign-key-nullability-example.yaml: Null handling strategies

  • Test Plan Enhancements: Comprehensive test coverage:

  • HTTP execution strategy test plans with various load patterns
  • Metric validation test plans (pass/fail scenarios)
  • Duration-based execution examples
  • Warmup/cooldown and weighted task demonstrations

Migration Notes

This release introduces significant architectural improvements while maintaining backward compatibility. Existing plans continue to work without modification. New features are opt-in and require explicit configuration:

  • Execution Strategies: Count-based execution remains the default; set count.duration to switch to duration-based execution
  • Load Patterns: Optional enhancement to duration-based execution
  • Foreign Key Strategies: Automatic strategy selection; V2 implementation enabled by default
  • Unified YAML: New format supported alongside existing formats
  • Performance Metrics: Automatically collected for duration-based execution

Performance Characteristics

  • Execution Strategies: Duration-based execution with load patterns enables realistic performance testing with memory-efficient streaming
  • Foreign Key Processing: Strategy-based approach reduces memory usage through specialized algorithms
  • Metrics Collection: SimplePercentileCalculator provides bounded memory usage for percentile calculations on large datasets
  • Load Patterns: Configurable rate changes with minimal performance overhead

Testing & Quality

  • Integration Tests: New test suites for execution strategies, foreign key strategies, and unified YAML parsing
  • Unit Tests: Comprehensive coverage for all new components and strategies
  • Performance Tests: Load pattern validation and metrics collection verification
  • Example Validation: All examples tested and verified for correctness