0.17.1
Deployed: 26-10-2025
Latest features and fixes for Data Caterer include major performance optimizations for regex-based data generation, architectural improvements to the UI layer, and enhanced testing infrastructure.
Performance Optimization
- Intelligent Regex Generation: Automatically converts common regex patterns to pure SQL expressions (no UDFs), with automatic fallback to a UDF for complex patterns. Supports character classes (`\d`, `[A-Z]`, `[0-9]`), quantifiers (`{n}`, `{m,n}`), alternations `(A|B|C)`, alphanumeric sets, and custom character sets. Docs
- Performance Benchmark: 1M records with the `ACC[0-9]{8}` pattern. UDF mode: ~45s, SQL mode: ~8s (~5-6x faster)
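To illustrate the idea behind the regex-to-SQL conversion, here is a minimal sketch in Python. It is not Data Caterer's implementation: `regex_to_sql`, `ALPHABETS`, and `TOKEN` are hypothetical names, the supported subset is deliberately tiny, and the emitted expression is only an assumption about what a UDF-free Spark SQL expression could look like (built from the standard `CONCAT`, `SUBSTRING`, `FLOOR`, and `RAND` functions).

```python
import re
from typing import Optional

# Hypothetical lookup of supported character classes (illustration only).
ALPHABETS = {
    r"\d": "0123456789",
    "[0-9]": "0123456789",
    "[A-Z]": "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "[a-z]": "abcdefghijklmnopqrstuvwxyz",
}

# A character class with an optional {n} quantifier, or a run of literal characters.
TOKEN = re.compile(r"(\\d|\[0-9\]|\[A-Z\]|\[a-z\])(?:\{(\d+)\})?|([A-Za-z0-9_-]+)")


def regex_to_sql(pattern: str) -> Optional[str]:
    """Return a SQL expression for a simple pattern, or None to signal UDF fallback."""
    parts = []
    pos = 0
    while pos < len(pattern):
        match = TOKEN.match(pattern, pos)
        if match is None:
            return None  # unsupported construct: the caller would use the UDF path
        char_class, count, literal = match.groups()
        if literal is not None:
            parts.append("'" + literal + "'")
        else:
            alphabet = ALPHABETS[char_class]
            # Pick one random character from the alphabet (SUBSTRING is 1-indexed).
            pick = (
                f"SUBSTRING('{alphabet}', "
                f"CAST(FLOOR(RAND() * {len(alphabet)}) AS INT) + 1, 1)"
            )
            parts.extend([pick] * int(count or 1))
        pos = match.end()
    if not parts:
        return None
    return "CONCAT(" + ", ".join(parts) + ")"
```

With this sketch, `regex_to_sql("ACC[0-9]{8}")` produces a `CONCAT` of the literal `'ACC'` and eight random digit picks, while a pattern such as `(A|B|C)+` returns `None` and would be left to the UDF path.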
UI & Service Layer Architecture
- Service Components: New centralized services for connection management, YAML plan/task loading, and Spark DataFrame lifecycle management
- Multi-Level Caching: In-memory (LRU), file-backed, and resource-specific caching strategies for reduced I/O and faster plan/task lookups
- HTTP Utilities: Cache headers (ETags, Cache-Control), structured error responses, and request rate limiting
- Resource Management: Centralized Spark session lifecycle management with proper resource cleanup
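The multi-level caching idea (an in-memory LRU in front of a file-backed store) can be sketched as follows. This is an illustration in Python under stated assumptions, not the project's code: the `TwoLevelCache` class, its JSON file layout, and the `load` callback are all hypothetical, and keys are assumed to be safe to use as file names.

```python
import json
from collections import OrderedDict
from pathlib import Path
from typing import Callable


class TwoLevelCache:
    """In-memory LRU backed by JSON files on disk (hypothetical sketch)."""

    def __init__(self, directory: Path, capacity: int = 128) -> None:
        self.directory = directory
        self.capacity = capacity
        self.memory: "OrderedDict[str, dict]" = OrderedDict()

    def _path(self, key: str) -> Path:
        # Assumes the key is a safe file name.
        return self.directory / f"{key}.json"

    def get(self, key: str, load: Callable[[], dict]) -> dict:
        # Level 1: memory hit, refresh recency and return.
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        # Level 2: file hit avoids the expensive load.
        path = self._path(key)
        if path.exists():
            value = json.loads(path.read_text())
        else:
            value = load()
            path.write_text(json.dumps(value))
        self._remember(key, value)
        return value

    def _remember(self, key: str, value: dict) -> None:
        self.memory[key] = value
        self.memory.move_to_end(key)
        # Evict least recently used entries beyond capacity.
        while len(self.memory) > self.capacity:
            self.memory.popitem(last=False)
```

An entry evicted from memory is still served from its file on the next lookup, so the expensive `load` (for example, parsing a YAML plan) runs at most once per key in this sketch.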
Testing Infrastructure
- Performance Testing: New `performanceTest` Gradle source set with benchmark tests for data generation and foreign key operations, plus automated benchmark execution and comparison scripts
- Integration Testing: Enhanced test isolation with per-test temporary directories and unique actor names, JVM forking per test class to prevent state conflicts, and improved end-to-end tests for the plan repository and YAML processing
- CI/CD Pipeline: Added Gradle integration tests to the CI workflow, validation failure checks with error exit codes, and a separate step for insta-integration testing
- Comprehensive Coverage: Regex pattern parsing, HTTP utilities, caching, rate limiting, and fast mode data generation
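As a rough illustration of what a benchmark comparison step with an error exit code might do, the Python sketch below flags benchmarks that are slower than a baseline by more than a tolerance. The function names, the 10% tolerance, and the sample timings in the usage note are assumptions for illustration; they are not taken from the project's scripts.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.10,
) -> List[str]:
    """Return names of benchmarks slower than baseline by more than `tolerance`."""
    regressions = []
    for name, base_seconds in sorted(baseline.items()):
        now_seconds = current.get(name)
        if now_seconds is None:
            continue  # benchmark was not run this time; nothing to compare
        if now_seconds > base_seconds * (1 + tolerance):
            regressions.append(name)
    return regressions


def exit_code(regressions: List[str]) -> int:
    """Non-zero exit code so a CI step fails when any regression is found."""
    return 1 if regressions else 0
```

For example, with an illustrative baseline of `{"regex_sql": 8.0, "foreign_key": 20.0}` and current timings of `{"regex_sql": 8.5, "foreign_key": 25.0}`, only `foreign_key` exceeds the 10% tolerance, so the step would exit with code 1.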
Code Quality
- Zero Compiler Warnings: Resolved unreachable code and type erasure warnings in both the `app` and `api` modules
- Foreign Key Generation: Improved per-field foreign key handling with accurate record counting and batch processing logic for precise total record generation
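The release notes do not describe the batching logic itself, so the following Python sketch only shows the general arithmetic that makes batched generation hit an exact total. Both helpers (`batch_sizes`, `spread_children`) are hypothetical and are not taken from Data Caterer.

```python
from typing import List


def batch_sizes(total_records: int, max_batch: int) -> List[int]:
    """Split a total into batches whose sizes sum exactly to the total."""
    if total_records < 0 or max_batch <= 0:
        raise ValueError("total_records must be >= 0 and max_batch must be > 0")
    full, remainder = divmod(total_records, max_batch)
    sizes = [max_batch] * full
    if remainder:
        sizes.append(remainder)  # final partial batch carries the leftover records
    return sizes


def spread_children(total_children: int, parent_count: int) -> List[int]:
    """Distribute child records across parent keys without losing the remainder."""
    if parent_count <= 0:
        raise ValueError("parent_count must be > 0")
    base, extra = divmod(total_children, parent_count)
    # The first `extra` parents get one additional child each.
    return [base + 1 if i < extra else base for i in range(parent_count)]
```

Rounding each batch or each parent's share independently can drift from the requested total; carrying the remainder explicitly, as above, keeps the generated count exact.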
Build & Deployment
- Gradle 9.1.0: Updated wrapper version with dedicated integration and performance test tasks
- Multi-Stage Dockerfile: Optimized build with Alpine-based runtime for smaller images. Docs
- CI/CD: Updated GitHub Actions workflow for automated performance testing and benchmark tracking