0.17.1

Deployed: 26-10-2025

Latest features and fixes for Data Caterer include major performance optimizations for regex-based data generation, architectural improvements to the UI layer, and enhanced testing infrastructure.

Performance Optimization

  • Intelligent Regex Generation: Automatically converts common regex patterns to pure SQL expressions (no UDFs), with automatic fallback to UDF for complex patterns. Supports character classes (\d, [A-Z], [0-9]), quantifiers ({n}, {m,n}), alternations (A|B|C), alphanumeric sets, and custom character sets. Docs

  • Performance Benchmark: Generating 1M records with the ACC[0-9]{8} pattern takes ~45s in UDF mode and ~8s in SQL mode (~5-6x faster)
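
To illustrate the idea behind regex-to-SQL conversion, here is a minimal sketch of how a simple pattern such as ACC[0-9]{8} could be rewritten as a SQL expression, with a fallback signal for anything more complex. This is not Data Caterer's actual implementation: the function name regex_to_sql, the token grammar, and the choice of SQL functions (CONCAT, LPAD, RAND, FLOOR) are assumptions made for illustration, and only literals plus fixed-length digit classes are handled.

```python
import re

# Hypothetical token grammar (an assumption, not Data Caterer's parser):
# a digit class (\d or [0-9]) with an optional {n} quantifier, or a literal run.
_TOKEN = re.compile(
    r"(?P<digits>(?:\\d|\[0-9\])(?:\{(?P<n>\d+)\})?)"
    r"|(?P<literal>[A-Za-z0-9_]+)"
)


def regex_to_sql(pattern):
    """Return a SQL expression string for a simple pattern,
    or None to signal that the caller should fall back to a UDF."""
    parts = []
    pos = 0
    while pos < len(pattern):
        match = _TOKEN.match(pattern, pos)
        if match is None:
            return None  # unsupported construct, e.g. groups or ranges like {2,4}
        if match.group("digits"):
            n = int(match.group("n") or 1)
            if n < 1 or n > 18:
                return None  # keep the random value within a 64-bit integer
            # Random integer in [0, 10^n), left-padded with zeros to n digits.
            parts.append(
                f"LPAD(CAST(FLOOR(RAND() * {10 ** n}) AS STRING), {n}, '0')"
            )
        else:
            parts.append("'" + match.group("literal") + "'")
        pos = match.end()
    if not parts:
        return None
    return parts[0] if len(parts) == 1 else "CONCAT(" + ", ".join(parts) + ")"


print(regex_to_sql("ACC[0-9]{8}"))
print(regex_to_sql("(a|b)+"))  # None, so a UDF would be used instead
```

The benefit of this kind of rewrite is that the expression is evaluated entirely by the SQL engine, which avoids the per-row overhead of calling a UDF.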

UI & Service Layer Architecture

  • Service Components: New centralized services for connection management, YAML plan/task loading, and Spark DataFrame lifecycle management

  • Multi-Level Caching: In-memory (LRU), file-backed, and resource-specific caching strategies for reduced I/O and faster plan/task lookups

  • HTTP Utilities: Cache headers (ETags, Cache-Control), structured error responses, and request rate limiting

  • Resource Management: Centralized Spark session lifecycle management with proper resource cleanup
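
As a rough picture of the in-memory (LRU) layer described above, the sketch below puts a small least-recently-used cache in front of a slower loader, such as a function that reads a plan file from disk. The class name LruCache, its loader parameter, and the misses counter are hypothetical and are not taken from the Data Caterer codebase.

```python
from collections import OrderedDict


class LruCache:
    """Illustrative in-memory LRU cache in front of a slower loader."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader          # called on a miss, e.g. reads a YAML plan
        self.entries = OrderedDict()  # ordered from least to most recently used
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.loader(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return value


# Hypothetical loader standing in for a file read.
cache = LruCache(capacity=2, loader=lambda name: f"plan:{name}")
for name in ["a", "b", "a", "c"]:
    cache.get(name)
print(list(cache.entries), cache.misses)
```

With a capacity of two, the second lookup of "a" is served from memory, and loading "c" evicts "b" because it was used least recently.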

Testing Infrastructure

  • Performance Testing: New performanceTest Gradle source set with benchmark tests for data generation and foreign key operations, plus scripts for automated benchmark execution and comparison

  • Integration Testing: Enhanced test isolation with per-test temporary directories and unique actor names, JVM forking per test class to prevent state conflicts, and improved end-to-end tests for the plan repository and YAML processing

  • CI/CD Pipeline: Added Gradle integration tests to the CI workflow, added validation failure checks with error exit codes, and separated insta-integration testing into a distinct step

  • Comprehensive Coverage: New tests cover regex pattern parsing, HTTP utilities, caching, rate limiting, and fast mode data generation
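
For a sense of what a benchmark comparison step might do, the sketch below flags benchmarks whose run time grew beyond a tolerance relative to a baseline. The function name compare_benchmarks, the 10% tolerance, and the sample timings are all assumptions for illustration and do not describe the project's actual scripts or measurements.

```python
def compare_benchmarks(baseline, current, tolerance=0.10):
    """Return the names of benchmarks whose time (in seconds) grew by more
    than `tolerance` (a fraction) compared with the baseline run."""
    regressions = []
    for name, base_seconds in baseline.items():
        new_seconds = current.get(name)
        if new_seconds is None:
            continue  # benchmark not present in the current run
        if new_seconds > base_seconds * (1 + tolerance):
            regressions.append(name)
    return sorted(regressions)


# Made-up timings, used only to exercise the function.
baseline = {"regex_sql": 8.0, "foreign_key": 20.0}
current = {"regex_sql": 8.4, "foreign_key": 25.0}
print(compare_benchmarks(baseline, current))
```

A CI job could exit with a non-zero code when the returned list is non-empty, so that a regression fails the build.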

Code Quality

  • Zero Compiler Warnings: Resolved unreachable code and type erasure warnings in both app and api modules

  • Foreign Key Generation: Improved per-field foreign key handling with accurate record counting and batch processing logic for precise total record generation

Build & Deployment

  • Gradle 9.1.0: Updated wrapper version with dedicated integration and performance test tasks

  • Multi-Stage Dockerfile: Optimized build with Alpine-based runtime for smaller images. Docs

  • CI/CD: Updated GitHub Actions workflow for automated performance testing and benchmark tracking

Documentation

  • Regex Patterns Guide: Comprehensive documentation with standard vs fast mode comparison, supported patterns, performance benchmarks, and configuration examples. Docs

  • Deployment Guide: Enhanced Docker documentation with multi-stage build examples. Docs