Open Data Contract Standard (ODCS) Source
This guide creates a data generator for a CSV file based on metadata stored in an Open Data Contract Standard (ODCS) file.
Requirements
- 10 minutes
- Git
- Gradle
- Docker
Get Started
First, clone the data-caterer repo, which already contains the base project setup.
Open Data Contract Standard (ODCS) Setup
We will be using the following ODCS file for this example.
Plan Setup
Create a new Java/Scala class or YAML file.
- Java: src/main/java/io/github/datacatering/plan/MyAdvancedODCSJavaPlanRun.java
- Scala: src/main/scala/io/github/datacatering/plan/MyAdvancedODCSPlanRun.scala
- YAML: docker/data/custom/plan/my-odcs.yaml
Make sure your class extends PlanRun.
In docker/data/custom/plan/my-odcs.yaml:
name: "my_odcs_plan"
description: "Create account data in CSV via ODCS metadata"
tasks:
  - name: "csv_account_file"
    dataSourceName: "customer_accounts"
In docker/data/custom/application.conf:
In the UI:
- Click on Advanced Configuration towards the bottom of the screen
- Click on Flag and click on Unique Check
- Click on Folder and enter /tmp/data-caterer/report for Generated Reports Folder Path
We enable the generate plan and tasks flag so that Data Caterer can read metadata from external sources, and we save the reports under a folder we can easily access.
Schema
We can point the schema of a data source to our Open Data Contract Standard (ODCS) file.
In docker/data/custom/task/file/csv/csv-odcs-account-task.yaml:
In the UI:
- Click on Connection tab at the top
- Select ODCS as the data source and enter example-odcs
- Copy this file into /tmp/odcs/full-example.yaml
- Enter /tmp/odcs/full-example.yaml as the Contract File
The above specifies that the schema comes from Open Data Contract Standard (ODCS), a type of metadata source
that contains information about schemas.
Specifically, it points to the schema provided here
in the docker/mount/odcs folder of the data-caterer repo.
Run
Let's run it and see what happens.
It should look something like this.
txn_ref_dt,rcvr_id,rcvr_cntry_code
2023-07-11,PB0Wo dMx,nWlbRGIinpJfP
2024-05-01,5GtkNkHfwuxLKdM,1a
2024-05-01,OxuATCLAUIhHzr,gSxn2ct
2024-05-22,P4qe,y9htWZhyjW
Looks like we have some data now. But we can do better and add some enhancements to it.
Custom metadata
The generated data isn't quite what we want. Sometimes the metadata alone is not sufficient to produce production-like data, and we need to edit it manually.
Let's make the rcvr_id field follow the regex RC[0-9]{8} and restrict the rcvr_cntry_code field to one of
AU, US or TW. For the full guide on data generation options,
check the following page.
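To make the two rules concrete, here is a rough Python illustration of what values satisfying them look like. It is only a sketch of the rules' meaning, not how Data Caterer generates data internally; the function names generate_rcvr_id and generate_rcvr_cntry_code are hypothetical.

```python
import random

# Hypothetical helpers that mimic the two rules; not part of Data Caterer.
ALLOWED_COUNTRY_CODES = ["AU", "US", "TW"]


def generate_rcvr_id(rng):
    """Return a value matching the regex RC[0-9]{8}: 'RC' plus eight digits."""
    digits = "".join(rng.choice("0123456789") for _ in range(8))
    return "RC" + digits


def generate_rcvr_cntry_code(rng):
    """Return one of the allowed country codes (the oneOf rule)."""
    return rng.choice(ALLOWED_COUNTRY_CODES)


if __name__ == "__main__":
    rng = random.Random()
    # Print a few example pairs in CSV form.
    for _ in range(3):
        print(f"{generate_rcvr_id(rng)},{generate_rcvr_cntry_code(rng)}")
```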
In docker/data/custom/task/file/csv/csv-odcs-account-task.yaml:
name: "csv_account_file"
steps:
  - name: "accounts"
    type: "csv"
    options:
      path: "/opt/app/data/csv/account-odcs"
      metadataSourceType: "openDataContractStandard"
      dataContractFile: "/opt/app/mount/odcs/full-example.yaml"
    count:
      records: 100
    fields:
      - name: "rcvr_id"
        options:
          regex: "RC[0-9]{8}"
      - name: "rcvr_cntry_code"
        options:
          oneOf:
            - "AU"
            - "US"
            - "TW"
In the UI:
- Click on Generation and tick the Manual checkbox
- Click on + Field
  - Go to rcvr_id field
  - Click on + dropdown next to string data type
  - Click Regex and enter RC[0-9]{8}
- Click on + Field
  - Go to rcvr_cntry_code field
  - Click on + dropdown next to string data type
  - Click One Of and enter AU,US,TW
Let's test it out by running it again.
txn_ref_dt,rcvr_id,rcvr_cntry_code
2024-02-15,RC02579393,US
2023-08-18,RC14320425,AU
2023-07-07,RC17915355,TW
2024-06-07,RC47347046,TW
Great! Now we have the ability to get schema information from an external source, add our own metadata and generate data.
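If you want to confirm that every generated row follows the custom rules, a small script can check the output. The sketch below uses only the Python standard library and runs against the sample rows shown in this guide; the function name find_violations is hypothetical, and to check a real run you would pass in the contents of the generated CSV file instead.

```python
import csv
import io
import re

# Sample rows copied from the generated output shown in this guide.
SAMPLE = """txn_ref_dt,rcvr_id,rcvr_cntry_code
2024-02-15,RC02579393,US
2023-08-18,RC14320425,AU
2023-07-07,RC17915355,TW
2024-06-07,RC47347046,TW
"""

RCVR_ID_PATTERN = re.compile(r"RC[0-9]{8}")
ALLOWED_COUNTRY_CODES = {"AU", "US", "TW"}


def find_violations(csv_text):
    """Return (line_number, field, value) for every value that breaks a rule."""
    violations = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        if not RCVR_ID_PATTERN.fullmatch(row["rcvr_id"]):
            violations.append((line_number, "rcvr_id", row["rcvr_id"]))
        if row["rcvr_cntry_code"] not in ALLOWED_COUNTRY_CODES:
            violations.append(
                (line_number, "rcvr_cntry_code", row["rcvr_cntry_code"])
            )
    return violations


if __name__ == "__main__":
    print(find_violations(SAMPLE))
```

An empty list means every row satisfies both rules. Running the same check on the output of the first run, before the custom metadata was added, would report the rcvr_id and rcvr_cntry_code values that do not match.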
Data validation
To find out what data validation options are available, check this link.
Another aspect of Open Data Contract Standard (ODCS) that can be leveraged is the definition of data quality rules.
Once the latest version of ODCS is released (version 3.x), there should be a vendor-neutral definition of data quality
rules that Data Caterer can use. Once available, using them will be as easy as enabling data validations
via enableGenerateValidations in the configuration.
In the UI:
- Click on Advanced Configuration towards the bottom of the screen
- Click on Flag and click on Generate Validations
Check out the full example under ODCSSourcePlanRun in the example repo.