
Data Caterer is a metadata-driven data generation and validation tool that helps create production-like data across both batch and event data systems. Run data validations to ensure your systems have ingested the data as expected, then clean up the data afterwards.

Simplify your data testing

Take away the pain and complexity of your data landscape and let Data Caterer handle it

Try now Demo

Data testing is difficult and fragmented

  • Data being sent via messages, HTTP requests or files and getting stored in databases, file systems, etc.
  • Maintaining and updating tests with the latest schemas and business definitions
  • Different testing tools for services, jobs or data sources
  • Complex relationships between datasets and fields
  • Different scenarios, permutations, combinations and edge cases to cover

Current solutions only cover half the story

  • Specific testing frameworks that support only one or a limited number of data sources or transport protocols
  • Underutilizing metadata from data catalogs or metadata discovery services
  • Testing teams having difficulty understanding failures when they occur
  • Integration tests relying on external teams/services
  • Manually generating data, or worse, copying/masking production data into lower environments
  • Observability being reactive rather than proactive, catching issues only after they occur


What you need is a reliable tool that can handle changes to your data landscape

High-level overview of Data Caterer

With Data Caterer, you get:

  • The ability to connect to any type of data source: files, SQL or NoSQL databases, messaging systems, HTTP APIs
  • Metadata discovery from your existing infrastructure and services
  • Confidence that bugs do not propagate to production
  • Synthetic data generation that is production-like, without ever connecting to production
  • Proactive assurance that changes do not affect other data producers or consumers
  • Configurability to run the way you want


Tech Summary

Use the Java or Scala API, or YAML files, for setup and customisation, all run via a Docker image. Want to get into the details? Check out the setup pages here for code examples and guides that will take you through scenarios and data sources.
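As a flavour of the YAML route, a data generation task might look like the sketch below. The step type, file path, field names, and option keys here are illustrative assumptions, not taken from this page; the setup pages have the exact schema and full option list.

```yaml
# Illustrative sketch of a Data Caterer generation task (names/paths are assumptions)
name: "csv_account_task"
steps:
  - name: "accounts"
    type: "csv"                      # target format for generated data
    options:
      path: "/opt/app/data/accounts" # where generated files are written
    fields:
      - name: "account_id"
        type: "string"
        options:
          regex: "ACC[0-9]{8}"       # production-like identifier pattern
      - name: "balance"
        type: "double"
        options:
          min: 0
          max: 10000
      - name: "created_date"
        type: "date"
```

The same task definition can equally be expressed through the Scala or Java API if you prefer code over configuration.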

Main features include:

  • Metadata discovery
  • Batch and event data generation
  • Referential integrity maintained across any dataset
  • Custom data generation scenarios
  • Clean-up of generated data
  • Data validation
  • Suggested data validations
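Validations can be declared in a similar style. The sketch below is again illustrative, assuming a SQL-like expression syntax and hypothetical data source and option names; consult the setup pages for the actual validation schema.

```yaml
# Illustrative sketch of Data Caterer validations (names/keys are assumptions)
name: "account_checks"
dataSources:
  my_csv:                              # hypothetical data source name
    - options:
        path: "/opt/app/data/accounts" # dataset to validate after ingestion
      validations:
        - expr: "ISNOTNULL(account_id)" # every record must have an identifier
        - expr: "balance >= 0"          # balances should never go negative
```

Running validations like these after your pipelines consume the generated data is what closes the loop between generation and testing.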

Check other run configurations here.

What is it

  • Data generation and testing tool

    Generate synthetic production-like data to be consumed and validated.

  • Designed for any data source

    We aim to support pushing data to any data source, in any format.

  • Low/no code solution

    Use the tool via Scala, Java or YAML. Connect to data or metadata sources to generate data and validate it.

  • Developer productivity tool

    If you are a new developer or seasoned veteran, cut down on your feedback loop when developing with data.

What it is not

  • Metadata storage/platform

    You could store and use metadata within the data generation/validation tasks, but this is not the recommended approach. Rather, metadata should be gathered from existing services that handle it on behalf of Data Caterer.

  • Data contract

    The focus of Data Caterer is on data generation and testing, which can include details about what the data looks like and how it behaves. But it does not encompass all the additional metadata that comes with a data contract, such as SLAs, security, etc.

  • Metrics from load testing

    Although millions of records can be generated, capabilities for capturing performance metrics are limited.


Data Catering vs Other tools vs In-house

|                 | Data Catering                                                                 | Other tools                                | In-house                            |
|-----------------|-------------------------------------------------------------------------------|--------------------------------------------|-------------------------------------|
| Data flow       | Batch and events generation with validation                                    | Batch generation only or validation only   | Depends on architecture and design  |
| Time to results | 1 day                                                                          | 1+ month to integrate, deploy and onboard  | 1+ month to build and deploy        |
| Solution        | Connect with your existing data ecosystem, automatic generation and validation | Manual UI data entry or via SDK            | Depends on engineer(s) building it  |