# Custom Transformations

Apply custom logic to generated files after they are written. This "last mile" step lets you convert output to custom formats, apply business-specific logic, or restructure data.
## Overview
Transformations execute after Data Caterer generates and writes data to files. They operate in two modes:
- **Per-Record**: Transform each line/record individually
- **Whole-File**: Transform the entire file as a unit
## Per-Record Transformation
Transform each record/line in a file independently. Best for line-by-line modifications like formatting, prefixing, or simple data mapping.
name: "csv_task"
steps:
- name: "accounts"
type: "csv"
options:
path: "/tmp/accounts.csv"
count:
records: 100
fields:
- name: "account_id"
- name: "name"
transformation:
className: "com.example.MyPerRecordTransformer"
methodName: "transformRecord"
mode: "per-record"
options:
prefix: "BANK_"
suffix: "_VERIFIED"
### Implementing Per-Record Transformer
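Below is a minimal sketch in Java. The method signature is an assumption inferred from the configuration above: the configured `methodName` is invoked once per record with the record text and the transformation `options`, and its return value replaces the record.

```java
package com.example;

import java.util.Map;

public class MyPerRecordTransformer {

    // Assumed signature: called once per generated line with the configured
    // transformation options; the return value replaces the original line
    public String transformRecord(String record, Map<String, String> options) {
        String prefix = options.getOrDefault("prefix", "");
        String suffix = options.getOrDefault("suffix", "");
        return prefix + record + suffix;
    }
}
```

With the YAML above, the record `ACC123,Jane` would become `BANK_ACC123,Jane_VERIFIED`.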
## Whole-File Transformation
Transform the entire file as a single unit. Best for operations requiring full file context like wrapping JSON objects in arrays, adding headers/footers, or format conversions.
### Implementing Whole-File Transformer
**Java**

```java
package com.example;

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class JsonArrayWrapperTransformer {

    public void transformFile(String inputPath, String outputPath,
                              Map<String, String> options) throws Exception {
        String content = Files.readString(Path.of(inputPath));
        String[] lines = content.split("\\n");
        boolean minify = Boolean.parseBoolean(
                options.getOrDefault("minify", "false")
        );

        // Wrap the newline-delimited records in a JSON array
        StringBuilder result = new StringBuilder("[");
        if (!minify) result.append("\n");
        for (int i = 0; i < lines.length; i++) {
            if (!minify) result.append(" ");
            result.append(lines[i]);
            if (i < lines.length - 1) result.append(",");
            if (!minify) result.append("\n");
        }
        result.append("]");
        Files.writeString(Path.of(outputPath), result.toString());
    }
}
```
**Scala**

```scala
package com.example

import java.nio.file.{Files, Paths}
import scala.io.Source

class JsonArrayWrapperTransformer {

  def transformFile(inputPath: String, outputPath: String,
                    options: Map[String, String]): Unit = {
    // Read the whole file, closing the source to avoid leaking the handle
    val source = Source.fromFile(inputPath)
    val content = try source.mkString finally source.close()
    val lines = content.split("\n")
    val minify = options.getOrElse("minify", "false").toBoolean

    val separator = if (minify) "" else "\n"
    val indent = if (minify) "" else " "

    // Join records with commas, indenting each line unless minified
    val wrapped = lines.zipWithIndex.map { case (line, idx) =>
      val comma = if (idx < lines.length - 1) "," else ""
      s"$indent$line$comma"
    }.mkString(separator)

    val result =
      if (minify) s"[$wrapped]"
      else s"[$separator$wrapped$separator]"

    Files.writeString(Paths.get(outputPath), result)
  }
}
```
## Task-Level Transformation
Apply the same transformation to all steps in a task.
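A sketch of what this could look like in YAML, assuming the same `transformation` block shown above is accepted at the task level and inherited by every step:

```yaml
name: "csv_task"
# Hypothetical placement: declared once, applied to all steps below
transformation:
  className: "com.example.MyPerRecordTransformer"
  mode: "per-record"
steps:
  - name: "accounts"
    type: "csv"
    options:
      path: "/tmp/accounts.csv"
  - name: "transactions"
    type: "csv"
    options:
      path: "/tmp/transactions.csv"
```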
## Custom Output Path
Save transformed files to a different location. Optionally delete the original.
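For example, using the `outputPath` and `deleteOriginal` properties from the configuration reference below:

```yaml
transformation:
  className: "com.example.JsonArrayWrapperTransformer"
  mode: "whole-file"
  outputPath: "/tmp/transformed/accounts.json"  # write result here instead
  deleteOriginal: true                          # remove the source file
```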
## Enable/Disable Transformation
Control transformation execution dynamically.
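For example, keep the configuration in place but skip execution by setting `enabled`:

```yaml
transformation:
  className: "com.example.MyPerRecordTransformer"
  mode: "per-record"
  enabled: false  # leave config intact, skip the transformation
```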
## Configuration Reference

| Property | Type | Default | Description |
|---|---|---|---|
| `className` | String | Required | Fully qualified class name of transformer |
| `methodName` | String | `"transformRecord"` (per-record), `"transformFile"` (whole-file) | Method to invoke |
| `mode` | String | `"whole-file"` | `"per-record"` or `"whole-file"` |
| `outputPath` | String | Original path | Custom output path (optional) |
| `deleteOriginal` | Boolean | `false` | Delete original after transformation |
| `options` | Map | `{}` | Custom options passed to transformer |
| `enabled` | Boolean | `true` | Enable/disable transformation |
## Use Cases

### Convert JSON Lines to Array

```java
// Wraps individual JSON objects into a JSON array
.transformationWholeFile("com.example.JsonArrayWrapperTransformer")
```

### Add CSV Header

```java
// Prepend custom header to CSV file
.transformationWholeFile("com.example.CsvHeaderTransformer")
.transformationOptions(Map.of("header", "ID,Name,Amount"))
```

### Format for Legacy Systems

```java
// Convert to fixed-width format
.transformationPerRecord("com.example.FixedWidthFormatter")
.transformationOptions(Map.of(
    "id_width", "10",
    "name_width", "30",
    "amount_width", "15"
))
```

### Encrypt Sensitive Fields

```java
// Apply encryption line-by-line
.transformationPerRecord("com.example.FieldEncryptor")
.transformationOptions(Map.of(
    "fields", "ssn,credit_card",
    "algorithm", "AES"
))
```
## Deployment with Custom Transformers

To use custom transformers in production, you need to make them available on the Data Caterer classpath.

### Docker Deployment
#### Option 1: Volume Mount Custom JARs

Mount your transformer JAR(s) into the `/opt/app/custom` directory, which is automatically included in the classpath.
```bash
# Build your transformer JAR
./gradlew clean build

# Run Data Caterer with custom transformer
docker run -d \
  -v /path/to/your-transformers.jar:/opt/app/custom/transformers.jar \
  -v /path/to/config:/opt/DataCaterer/plan \
  datacatering/data-caterer:0.17.0
```
#### Option 2: Build Custom Docker Image

Create a multi-stage Dockerfile that includes your transformers:
```dockerfile
# Stage 1: Build your transformer JAR
FROM gradle:8.11.1-jdk17 AS builder
WORKDIR /build
COPY gradle ./gradle
COPY gradlew gradlew.bat build.gradle.kts settings.gradle.kts ./
COPY buildSrc ./buildSrc

# Download dependencies (cached layer)
RUN ./gradlew dependencies --no-daemon || true

# Copy source and build
COPY src ./src
RUN ./gradlew clean build --no-daemon

# Stage 2: Runtime with Data Caterer
ARG DATA_CATERER_VERSION=0.17.0
FROM datacatering/data-caterer:${DATA_CATERER_VERSION}

# Copy transformer JAR to custom directory
COPY --from=builder --chown=app:app /build/build/libs/transformers.jar /opt/app/custom/transformers.jar

# Optionally copy your plan files
COPY --chown=app:app plan /opt/DataCaterer/plan
```
Build and run:
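For example (the image tag here is illustrative):

```bash
# Build the combined image
docker build -t my-data-caterer:latest .

# Run it; plan files were copied into the image above
docker run -d my-data-caterer:latest
```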
#### Option 3: Extend Base Image

For simple cases, extend the base image directly:
```dockerfile
ARG DATA_CATERER_VERSION=0.17.0
FROM datacatering/data-caterer:${DATA_CATERER_VERSION}

# Copy pre-built transformer JAR
COPY --chown=app:app target/my-transformers.jar /opt/app/custom/

# Copy configuration
COPY --chown=app:app config/plan.yaml /opt/DataCaterer/plan/
```
### Kubernetes/Helm Deployment

#### Using ConfigMaps and PersistentVolumes
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: data-caterer-plan
data:
  plan.yaml: |
    name: "my_plan"
    # ... your plan configuration
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: custom-transformers
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: data-caterer
spec:
  selector:
    matchLabels:
      app: data-caterer
  template:
    metadata:
      labels:
        app: data-caterer
    spec:
      containers:
        - name: data-caterer
          image: datacatering/data-caterer:0.17.0
          volumeMounts:
            - name: plan
              mountPath: /opt/DataCaterer/plan
            - name: custom-jars
              mountPath: /opt/app/custom
      volumes:
        - name: plan
          configMap:
            name: data-caterer-plan
        - name: custom-jars
          persistentVolumeClaim:
            claimName: custom-transformers
```
#### Using InitContainers

Download transformer JARs at runtime:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: data-caterer
spec:
  selector:
    matchLabels:
      app: data-caterer
  template:
    metadata:
      labels:
        app: data-caterer
    spec:
      initContainers:
        - name: fetch-transformers
          image: curlimages/curl:latest
          command:
            - sh
            - -c
            - |
              curl -L -o /custom/transformers.jar \
                https://your-repo.com/transformers.jar
          volumeMounts:
            - name: custom-jars
              mountPath: /custom
      containers:
        - name: data-caterer
          image: datacatering/data-caterer:0.17.0
          volumeMounts:
            - name: custom-jars
              mountPath: /opt/app/custom
      volumes:
        - name: custom-jars
          emptyDir: {}
```
### Local/Application Deployment

When running Data Caterer as a standalone application or via `spark-submit`:

#### Add to Classpath
```bash
# Using spark-submit
spark-submit \
  --class io.github.datacatering.datacaterer.App \
  --master local[*] \
  --jars /path/to/your-transformers.jar \
  data-caterer.jar

# Using java directly
java -cp "data-caterer.jar:/path/to/your-transformers.jar" \
  io.github.datacatering.datacaterer.App
```
#### Gradle Configuration

Add the transformer dependency to your `build.gradle.kts`:
```kotlin
dependencies {
    implementation("io.github.data-catering:data-caterer-api:0.17.0")
    implementation(files("libs/your-transformers.jar"))
}

tasks.register<JavaExec>("runWithTransformers") {
    mainClass.set("io.github.datacatering.datacaterer.App")
    classpath = sourceSets["main"].runtimeClasspath
}
```
### Project Structure for Transformers

Recommended project structure:
```text
my-data-project/
├── build.gradle.kts
├── src/
│   ├── main/
│   │   ├── java/com/mycompany/
│   │   │   └── transformers/
│   │   │       ├── CustomPerRecordTransformer.java
│   │   │       └── CustomWholeFileTransformer.java
│   │   └── scala/com/mycompany/
│   │       └── plans/
│   │           └── MyDataPlan.scala
│   └── test/
│       └── java/com/mycompany/
│           └── transformers/
│               └── TransformerTest.java
├── plan/
│   └── my-plan.yaml
└── Dockerfile
```
### Example `build.gradle.kts`
```kotlin
plugins {
    java
    scala
    id("com.github.johnrengelman.shadow") version "8.1.1"
}

group = "com.mycompany"
version = "1.0.0"

repositories {
    mavenCentral()
}

dependencies {
    implementation("io.github.data-catering:data-caterer-api:0.17.0")
    implementation("org.scala-lang:scala-library:2.12.18")
    testImplementation("junit:junit:4.13.2")
}

tasks.shadowJar {
    archiveBaseName.set("my-transformers")
    archiveClassifier.set("")
    archiveVersion.set(version.toString())
}
```
## Best Practices

- **Use per-record for line-by-line operations** - More memory efficient
- **Use whole-file for structural changes** - When context across records is needed
- **Keep transformers stateless** - For reliability and reusability
- **Handle errors gracefully** - Log issues without failing the entire transformation
- **Make transformers configurable** - Use options for flexibility
- **Test transformers independently** - Unit test transformation logic separately
- **Package transformers separately** - Keep transformers in their own JAR for reusability
- **Use meaningful class names** - Makes configuration and debugging easier
- **Document transformer options** - Clearly specify required and optional parameters
- **Version your transformers** - Especially when deploying to production
## API/Sample Endpoint Support
Transformations are automatically applied when using Data Caterer's REST API sample endpoints. This enables you to preview transformed data through the API.
### How It Works

When you configure a transformation on a step, it is applied when generating samples via:

- `GET /sample/plans/{planName}` - Entire plan with all steps
- `GET /sample/plans/{planName}/tasks/{taskName}` - Specific task
- `GET /sample/plans/{planName}/tasks/{taskName}/steps/{stepName}` - Specific step
- `GET /sample/tasks/{taskName}` - By task name
- `GET /sample/steps/{stepName}` - By step name
### Example Usage
1. Configure a transformation on the step in your task, as in the earlier example:
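```yaml
steps:
  - name: "accounts"
    type: "csv"
    transformation:
      className: "com.example.MyPerRecordTransformer"
      mode: "per-record"
```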
2. Generate a sample with the transformation applied:
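For example, assuming the plan, task, and step names used earlier and that the Data Caterer API is reachable at `localhost:8080` (adjust host and port for your deployment):

```bash
# Returns sample records with the configured transformation applied
curl http://localhost:8080/sample/plans/my_plan/tasks/csv_task/steps/accounts
```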
### Benefits

- ✅ **Preview transformations** - See transformed output before full generation
- ✅ **Test transformers** - Validate transformer logic with sample data
- ✅ **API integration** - External systems can request transformed samples
- ✅ **Development workflow** - Quick feedback during transformer development
### Notes
- Transformations are applied to the generated sample data before returning it
- Per-record transformations process each line individually
- Whole-file transformations process the entire sample as a unit
- Transformation options (from configuration) are passed to the transformer
- Same transformer classes work for both full generation and sampling
See the API Documentation for more details on sample endpoints.
## Example Transformers

See the example transformers in the repository.