Skip to content

Delete Generated Data

Info

Deleting generated data is a paid feature.

As you generate and validate data, you may want to clean up the data that has been generated. This helps you:

  1. Keep your test environments clean
  2. Reduce chance of existing data interacting with your validations
  3. Creates a simple workflow for developers and testers to follow

Foreign Keys/Relationships

You can either define a foreign key for data generation (i.e. create same account numbers across accounts and transactions table) or for data deletion (i.e. account numbers generated in Postgres are consumed by a job and pushed into a Parquet file, you can delete the Postgres and Parquet data via the account numbers generated).

Generate

In scenarios where you have defined foreign keys for multiple data sources, when data is generated, Data Caterer will ensure that the values generated in one data source, will be the same in the other. When you want to delete the data, data will be deleted in reverse order of how the data was inserted. This ensures that for data sources, such as Postgres, no errors will occur whilst deleting data.

graph LR
  subgraph plan ["Plan/Scenario"]
    postgresAccount["Generate Postgres accounts"]
    postgresTransaction["Generate Postgres transactions"]
  end

  dataCaterer["Data Caterer"]

  subgraph postgres ["Postgres"]
    subgraph postgresAccTable ["Accounts table"]
      accountA["ACC12345,2024-01-01"] ~~~
      accountB["ACC98765,2024-01-23"] ~~~
      accountC["..."]
    end
    subgraph postgresTxnTable ["Transactions table"]
      accountATxn["ACC12345,10.23"] ~~~
      accountBTxn["ACC98765,93.51"] ~~~
      accountCTxn["ACC98765,5.72"] ~~~
      accountDTxn["..."]
    end
  end

  postgresAccount --> dataCaterer
  postgresTransaction --> dataCaterer
  dataCaterer --> postgresAccTable
  dataCaterer --> postgresTxnTable

Configuration

To define the generated data that should be deleted, follow the below configurations:

var postgresAcc = postgres("my_postgres", "jdbc:...")
  .table("public.accounts")
  .schema(
    field().name("account_id"),
    field().name("name"),
    ...
  );
var postgresTxn = postgres(postgresAcc)
  .table("public.transactions")
  .schema(
    field().name("account_id"),
    field().name("full_name"),
    ...
  );

var deletePlan = plan().addForeignKeyRelationship(
  postgresAcc, "account_id",
  List.of(Map.entry(postgresTxn, "account_id"))
);

var deleteConfig = configuration()
  .enableRecordTracking(true)
  .enableDeleteGeneratedRecords(true)
  .enableGenerateData(false);

execute(deletePlan, deleteConfig, postgresAcc, postgresTxn);
val postgresAcc = postgres("my_postgres", "jdbc:...")
  .table("public.accounts")
  .schema(
    field.name("account_id"),
    field.name("name"),
    ...
  )
val postgresTxn = postgres(postgresAcc)
  .table("public.transactions")
  .schema(
    field.name("account_id"),
    field.name("full_name"),
    ...
  )

val deletePlan = plan.addForeignKeyRelationship(
  postgresAcc, "account_id",
  List(postgresTxn -> "account_id")
)

val deleteConfig = configuration
  .enableRecordTracking(true)
  .enableDeleteGeneratedRecords(true)
  .enableGenerateData(false)

execute(deletePlan, deleteConfig, postgresAcc, postgresTxn)
---
name: "postgres_data"
steps:
  - name: "accounts"
    type: "postgres"
    options:
      dbtable: "account.accounts"
    schema:
      fields:
        - name: "account_id"
        - name: "name"
  - name: "transactions"
    type: "postgres"
    options:
      dbtable: "account.transactions"
    schema:
      fields:
        - name: "account_id"
        - name: "full_name"
---
name: "customer_create_plan"
description: "Create customers in JDBC"
tasks:
  - name: "postgres_data"
    dataSourceName: "my_postgres"

sinkOptions:
  foreignKeys:
    - - "postgres.accounts.account_id"
      - - "postgres.transactions.account_id"
      - []
UI

Generate relationship in UI

Delete

Once you have generated data, you may consume it via a job or service and push that data down into other data sources. You can choose to also delete the data that is pushed down into the other data sources by defining a delete relationship.

graph LR
  subgraph plan ["Plan/Scenario"]
    postgresAccount["Generate Postgres accounts"]
  end

  dataCaterer["Data Caterer"]

  subgraph postgresAccTable ["Postgres accounts table"]
    direction TB
    accountA["ACC12345,2024-01-01"] ~~~
    accountB["ACC98765,2024-01-23"] ~~~
    accountC["..."]
  end

  consumerJob["Consumer"]

  subgraph parquetAcc ["Parquet accounts file"]
    direction TB
    accountParquetA["ACC12345,2024-01-01"] ~~~
    accountParquetB["ACC98765,2024-01-23"] ~~~
    accountParquetC["..."]
  end

  postgresAccount --> dataCaterer
  dataCaterer --> postgresAccTable
  consumerJob --> postgresAccTable
  consumerJob --> parquetAcc

Configuration

We will use the scenario that we generate data for accounts table in Postgres and a job will insert a record into the balances table for each record generated. To define the consumed data that should also be deleted, follow the below example:

var postgresAcc = postgres("my_postgres", "jdbc:...")
  .table("public.accounts")
  .schema(
    field().name("account_id"),
    field().name("name"),
    ...
  );
var postgresBal = postgres(postgresAcc)
  .table("public.balances");

var deletePlan = plan().addForeignKeyRelationship(
  postgresAcc, "account_id",
  List.of(),
  List.of(Map.entry(postgresBal, "account_id"))
);

var deleteConfig = configuration()
  .enableRecordTracking(true)
  .enableDeleteGeneratedRecords(true)
  .enableGenerateData(false);

execute(deletePlan, deleteConfig, postgresAcc);
val postgresAcc = postgres("my_postgres", "jdbc:...")
  .table("public.accounts")
  .schema(
    field.name("account_id"),
    field.name("name"),
    ...
  )
val postgresBal = postgres(postgresAcc)
  .table("public.balances")

val deletePlan = plan.addForeignKeyRelationship(
  postgresAcc, "account_id",
  List(),
  List(postgresBal -> "account_id")
)

val deleteConfig = configuration
  .enableRecordTracking(true)
  .enableDeleteGeneratedRecords(true)
  .enableGenerateData(false)

execute(deletePlan, deleteConfig, postgresAcc)
---
name: "postgres_data"
steps:
  - name: "accounts"
    type: "postgres"
    options:
      dbtable: "account.accounts"
    schema:
      fields:
        - name: "account_id"
        - name: "name"
  - name: "balances"
    type: "postgres"
    options:
      dbtable: "account.balances"
---
name: "customer_create_plan"
description: "Create customers in JDBC"
tasks:
  - name: "postgres_data"
    dataSourceName: "my_postgres"

sinkOptions:
  foreignKeys:
    - - "postgres.accounts.account_id"
      - []
      - - "postgres.balances.account_id"
UI

Delete relationship in UI