Deployment
Three main ways to deploy and run Data Caterer:
- Application
- Docker
- Helm
Application
Run the OS-native application by downloading the application for your specific OS here.
Docker
Building Your Own Docker Image
Data Caterer provides a multi-stage Dockerfile that automatically builds your custom Scala/Java project and packages it with the Data Caterer runtime. You can use this as a template for your own projects.
The example Dockerfile demonstrates a two-stage build:
- Build Stage: Uses Gradle to compile your Scala/Java code and create a JAR
- Runtime Stage: Copies the JAR into the Data Caterer base image
Using the Multi-Stage Dockerfile
The multi-stage approach offers several advantages:
- No need to pre-build the JAR locally
- Consistent build environment across different machines
- Optimized Docker layer caching for faster rebuilds
- Smaller final image size (build tools not included)
To build your own image:
```bash
# Build the Docker image (no pre-build needed)
docker build -t <my_image_name>:<my_image_tag> .

# Run your custom image
docker run -d \
  -e PLAN_CLASS=io.github.datacatering.plan.YourPlanClass \
  -e DEPLOY_MODE=client \
  <my_image_name>:<my_image_tag>
```
Customizing for Your Project
To adapt this for your own project, create a similar Dockerfile structure:
```dockerfile
# Declared before the first FROM so it is available to the
# second stage's FROM instruction
ARG DATA_CATERER_VERSION=0.17.0

# Stage 1: Build
FROM gradle:8.11.1-jdk17 AS builder

WORKDIR /build

# Copy build configuration
COPY gradle ./gradle
COPY gradlew gradlew.bat build.gradle.kts settings.gradle.kts* ./
COPY buildSrc ./buildSrc

# Download dependencies (cached layer)
RUN ./gradlew dependencies --no-daemon || true

# Copy source and build
COPY src ./src
RUN ./gradlew clean build --no-daemon

# Stage 2: Runtime
FROM datacatering/data-caterer:${DATA_CATERER_VERSION}

# Copy your JAR (adjust path if needed)
COPY --from=builder --chown=app:app /build/build/libs/your-project.jar /opt/app/job.jar
```
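To get the most out of the layer caching mentioned above, it can also help to keep the Docker build context small. A minimal `.dockerignore` sketch (the entries are suggestions; adjust for your project layout):

```bash
# .dockerignore (sketch): exclude local build output and VCS metadata
# so they are not sent to the Docker daemon as part of the build context
build/
.gradle/
.git/
```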
Running with Java/Scala Locally
If you prefer to run locally without Docker, you can build and execute the JAR directly:
```bash
# Build the JAR
./gradlew clean build

# Run with Spark submit (requires Spark installation)
spark-submit \
  --class io.github.datacatering.plan.YourPlanClass \
  --master local[*] \
  build/libs/your-project.jar

# Or run via Gradle
./gradlew run --args="YourPlanClass"
```
Build Configuration Tips
- Gradle Version: The example uses Gradle 8.11.1 with JDK 17. Adjust based on your project needs.
- Dependencies: The `buildSrc` directory contains custom Gradle build logic. Include it if present in your project.
- JAR Name: Update the `COPY` command in Stage 2 to match your JAR name from `build.gradle.kts` (see the sketch after this list).
- Data Caterer Version: Change `DATA_CATERER_VERSION` to match your required version.
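For example, to confirm the JAR name your build actually produces before updating the `COPY` command, you can list Gradle's build output (a quick sketch; `build/libs` is the default Gradle output directory):

```bash
# Build the project, then list the JARs Gradle produced
./gradlew clean build
ls build/libs/
```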
Docker Pre and Post Processing Scripts
Data Caterer supports running custom scripts before and after the main data generation process when deployed via Docker. This is useful for setup tasks, cleanup operations, notifications, or integrating with external systems.
Configuration
Configure pre and post processing scripts using environment variables:
| Environment Variable | Description | Default |
|---|---|---|
| `PRE_PROCESSOR_SCRIPT` | Path to script to run before Data Caterer execution | (empty) |
| `POST_PROCESSOR_SCRIPT` | Path to script to run after Data Caterer execution | (empty) |
| `POST_PROCESSOR_CONDITION` | When to run post processor: `success`, `failure`, `always` | `success` |
Usage Example
```bash
docker run -d \
  -e PRE_PROCESSOR_SCRIPT="/opt/app/scripts/setup.sh" \
  -e POST_PROCESSOR_SCRIPT="/opt/app/scripts/cleanup.sh" \
  -e POST_PROCESSOR_CONDITION="always" \
  -v /path/to/scripts:/opt/app/scripts \
  datacatering/data-caterer:0.17.0
```
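The scripts themselves are plain bash. A minimal sketch of what `setup.sh` and `cleanup.sh` might contain (the contents are hypothetical; the requirements are only that the files are executable and exit with code 0 on success):

```bash
#!/usr/bin/env bash
# setup.sh - hypothetical pre-processor: prepare an output location
# before Data Caterer starts; a non-zero exit stops the run
set -euo pipefail

mkdir -p /opt/app/output
echo "setup complete"
```

```bash
#!/usr/bin/env bash
# cleanup.sh - hypothetical post-processor: remove temporary files
# regardless of how the run ended (POST_PROCESSOR_CONDITION="always")
set -euo pipefail

rm -rf /opt/app/output/tmp
echo "cleanup complete"
```

Remember to make the scripts executable on the host (`chmod +x /path/to/scripts/*.sh`) before mounting them into the container.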
Script Execution Behavior
- Pre-processor: Runs before Data Caterer starts
    - If the script fails, Data Caterer execution is stopped
    - Script must be executable and return exit code 0 for success
- Post-processor: Runs after Data Caterer completes, based on condition:
    - `success`: Only runs if Data Caterer exit code is 0
    - `failure`: Only runs if Data Caterer exit code is non-zero
    - `always`: Runs regardless of Data Caterer exit code
    - If the post-processor fails, the original Data Caterer exit code is preserved
Error Handling
- Scripts are executed with bash and include comprehensive error logging
- Missing script files generate warnings but don't stop execution
- All script output is logged with clear prefixes (pre/post processor)
- The final exit code is always Data Caterer's original exit code
Helm
A sample Helm chart is available on GitHub here.
Update the configuration with your own data connections and settings, or point it at your own image created from the steps above.
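For example, a minimal install sketch (the release name and value keys such as `image.repository` are assumptions based on common Helm conventions; check the sample chart's values.yaml for the actual keys):

```bash
# Hypothetical example: install the sample chart with a custom image
helm install data-caterer ./data-caterer \
  --set image.repository=<my_image_name> \
  --set image.tag=<my_image_tag>
```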