Containerizing ML Models with Docker

“It Works on My Machine”

You spend a week tuning a churn prediction model with PyTorch 2.0 and CUDA 11.8. Your results are excellent. You hand the code to the deployment team. They try to run it on the production server: Python 3.9 (you used 3.11), PyTorch 1.13 (you used 2.0), and no CUDA at all. Four hours of debugging later, they give up.

This is the dependency hell problem, and it hits ML projects harder than regular software because:

ML models have deep dependency chains — PyTorch, CUDA, cuDNN, numpy, scikit-learn, all version-sensitive
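GPU stacks are version-coupled: a PyTorch build compiled for CUDA 11.8 needs a host NVIDIA driver that supports CUDA 11.8, and custom CUDA extensions built against one CUDA or PyTorch version usually have to be recompiled for another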
Preprocessing must mirror training — if you preprocess with pandas 2.0 during training and pandas 1.3 during serving, the behavior may differ subtly

Docker solves this by packaging your code, Python runtime, and all dependencies into a single image. The same image runs the same way on a developer laptop, a CI runner, a staging server, and a production Kubernetes cluster, as long as the hosts share a CPU architecture (and, for GPU workloads, a compatible driver).

Docker Concepts in 5 Minutes

Image: A static snapshot of a filesystem. Think of it as a zip file that contains Python 3.11, your dependencies, and your code. Immutable.

Container: A running instance of an image. You can run 10 containers from the same image simultaneously.

Dockerfile: A script that describes how to build an image. Instructions that change the filesystem (RUN, COPY, ADD) each add a layer; the others only record metadata.

Registry: A server that stores and distributes images. Docker Hub is public. AWS ECR, Google Artifact Registry, and GitHub Container Registry are common private options.

The workflow:

Dockerfile  →  docker build  →  Image  →  docker run  →  Container
                                  ↓
                              docker push  →  Registry
                                                  ↓
                                          Production server pulls and runs

A Complete Dockerfile for an ML API

Let’s build a production-quality Dockerfile for the churn prediction FastAPI service from the next lesson. We’ll use the model trained in the CI pipeline.

# Dockerfile

# ─── Stage 1: Build dependencies ─────────────────────────────────────────────
FROM python:3.11-slim AS builder

# Prevent Python from writing .pyc files and from buffering stdout/stderr
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

# Install system dependencies needed to compile some Python packages
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /build

# Copy requirements first — this layer is cached if requirements don't change
COPY requirements.txt .
RUN pip install --upgrade pip \
    && pip install --no-cache-dir --prefix=/install -r requirements.txt

# ─── Stage 2: Runtime image ───────────────────────────────────────────────────
FROM python:3.11-slim AS runtime

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# Lets Python import the application package (src) from /app
ENV PYTHONPATH=/app

# Install only runtime system libraries (not compilers)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgomp1 \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Create a non-root user for security
RUN useradd --create-home --shell /bin/bash appuser
WORKDIR /app
RUN chown appuser:appuser /app

# Copy installed packages from builder stage
COPY --from=builder /install /usr/local

# Copy application source code
COPY --chown=appuser:appuser src/ ./src/
COPY --chown=appuser:appuser models/ ./models/

# Switch to non-root user
USER appuser

# Expose the port FastAPI will run on
EXPOSE 8000

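# Health check: Docker marks the container healthy or unhealthy based on this command.
# Kubernetes ignores HEALTHCHECK; configure liveness/readiness probes in the pod spec instead.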
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Start the FastAPI server
CMD ["uvicorn", "src.api.app:app", "--host", "0.0.0.0", "--port", "8000"]
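
The HEALTHCHECK above shells out to curl, which is the only reason curl is installed in the runtime stage. One possible alternative, sketched here under the assumption that the service answers GET /health with a 2xx status, is a small standard-library probe. The file name healthcheck.py and the function names are hypothetical.

```python
"""healthcheck.py: hypothetical stdlib-only probe for the /health endpoint."""
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


def main() -> int:
    """Exit code for HEALTHCHECK: 0 means healthy, 1 means unhealthy."""
    return 0 if is_healthy("http://localhost:8000/health") else 1

# To use the file as a script, finish it with: raise SystemExit(main())
```

If this file were copied to /app/healthcheck.py, the check could become CMD python /app/healthcheck.py, and curl could be dropped from the runtime stage.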

Why Layer Order Matters

The most important performance optimization in Docker: put things that change rarely near the top, things that change often near the bottom.

Docker builds images layer by layer. When you rebuild, it reuses cached layers until it hits the first changed layer, then rebuilds everything after that.

Wrong order (slow builds):

COPY . .                   # Layer 3: copies all source files
RUN pip install -r requirements.txt  # Layer 4: reinstalls every package EVERY time

Every time you change one line of code, Docker reinstalls all your packages. With heavy dependencies such as torch, a fresh pip install can take several minutes.

Right order (fast builds):

COPY requirements.txt .    # Layer 3: only requirements.txt
RUN pip install -r requirements.txt  # Layer 4: cached unless requirements.txt changes
COPY src/ ./src/           # Layer 5: source code (changes often, that's fine)

Now, when you change src/api/app.py, Docker reuses the cached pip install layer and only rebuilds the final layer. Build time drops from minutes to seconds.
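
The caching rule can be modelled in a few lines of Python. This is a simplified sketch, not Docker's actual implementation: it assumes a layer is reused only when its instruction, its input content, and every earlier layer are unchanged. The function name and the instruction strings are illustrative labels.

```python
import hashlib


def layers_to_rebuild(instructions, old_inputs, new_inputs):
    """Return the instructions that would re-run after the inputs change.

    `old_inputs` and `new_inputs` map an instruction to a string standing in
    for the files it reads. Each layer's key chains in the previous layer's
    key, so one change invalidates everything after it.
    """
    def digest(parent, instruction, content):
        return hashlib.sha256(f"{parent}|{instruction}|{content}".encode()).hexdigest()

    old_key = new_key = ""
    rebuilt = []
    for instruction in instructions:
        old_key = digest(old_key, instruction, old_inputs.get(instruction, ""))
        new_key = digest(new_key, instruction, new_inputs.get(instruction, ""))
        if old_key != new_key:
            rebuilt.append(instruction)
    return rebuilt


wrong = ["COPY . .", "RUN pip install -r requirements.txt"]
right = ["COPY requirements.txt .", "RUN pip install -r requirements.txt", "COPY src/ ./src/"]

# Editing app.py changes what "COPY . ." and "COPY src/ ./src/" read.
print(layers_to_rebuild(wrong, {"COPY . .": "app v1"}, {"COPY . .": "app v2"}))
# ['COPY . .', 'RUN pip install -r requirements.txt']
print(layers_to_rebuild(right, {"COPY src/ ./src/": "app v1"}, {"COPY src/ ./src/": "app v2"}))
# ['COPY src/ ./src/']
```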

The requirements.txt for Production

Don’t just pip freeze your dev environment — it includes every package you ever installed, including dev tools. Create a clean production requirements file:

# requirements.txt
fastapi==0.111.0
uvicorn[standard]==0.29.0
pydantic==2.7.1
scikit-learn==1.5.0
numpy==1.26.4
pandas==2.2.2
mlflow==2.13.0
joblib==1.4.2

Pin exact versions. In production, scikit-learn>=1.0 is a liability — you don’t know what version will be installed next month when a new release drops.
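
A pinning rule like this is easy to enforce mechanically. The sketch below is a hypothetical helper, not part of pip: it flags any requirement line that is not of the form name==version. It is deliberately strict, so lines with environment markers or URLs would also be reported.

```python
import re

# Matches "name==version" with optional extras, e.g. "uvicorn[standard]==0.29.0".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[A-Za-z0-9.!+_-]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines that are not pinned with `==`."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems
```

Running it over requirements.txt in CI and failing the build when the list is non-empty would catch a stray scikit-learn>=1.0 before it reaches an image.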

Building and Running the Container

# Build the image
# -t tags the image with a name and version
docker build -t churn-predictor:1.0.0 .

# See the image
docker images

# Run the container locally
# -p maps host port 8080 to container port 8000
# --rm removes the container when it stops
docker run -p 8080:8000 --rm churn-predictor:1.0.0

# The API is now available at http://localhost:8080

# Test it
curl http://localhost:8080/health
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"tenure_months": 12, "monthly_charges": 65.0, "total_charges": 780.0, "num_products": 2, "has_support_calls": 1}'
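
The same smoke test can be scripted in Python using only the standard library, which is convenient in CI. This is a sketch: the helper names are hypothetical, and it assumes the /predict endpoint returns a JSON body, whose exact shape is defined by the API and not shown here.

```python
import json
import urllib.request


def build_predict_request(base_url: str, features: dict) -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, features: dict, timeout: float = 10.0):
    """Send the request and return the parsed JSON response (assumed JSON)."""
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


SAMPLE = {
    "tenure_months": 12,
    "monthly_charges": 65.0,
    "total_charges": 780.0,
    "num_products": 2,
    "has_support_calls": 1,
}
# With the container running: predict("http://localhost:8080", SAMPLE)
```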

Environment Variables and Secrets

Never bake secrets into Docker images. Use environment variables instead:

# Dockerfile: set non-secret defaults only
ENV MODEL_PATH=/app/models/churn_model.pkl
ENV LOG_LEVEL=INFO
# Never: ENV DATABASE_PASSWORD=supersecret

Pass secrets at runtime:

# -e NAME with no value forwards NAME from the host shell,
# which keeps the secret itself out of the command line
docker run \
  -p 8080:8000 \
  -e MLFLOW_TRACKING_URI=http://mlflow-server:5000 \
  -e AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY \
  churn-predictor:1.0.0

Or use a .env file (never commit it to Git):

docker run --env-file .env -p 8080:8000 churn-predictor:1.0.0
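
On the application side, the container reads these values from its environment at startup. The sketch below shows one way to do that. The Settings class and load_settings function are hypothetical, and treating MLFLOW_TRACKING_URI as required is an assumption made for illustration. The defaults mirror the ENV lines in the Dockerfile.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model_path: str
    log_level: str
    mlflow_tracking_uri: str


def load_settings(env=None) -> Settings:
    """Read configuration from the environment, failing fast if a value is missing."""
    env = os.environ if env is None else env
    required = ("MLFLOW_TRACKING_URI",)  # assumption: the service cannot start without it
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        model_path=env.get("MODEL_PATH", "/app/models/churn_model.pkl"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        mlflow_tracking_uri=env["MLFLOW_TRACKING_URI"],
    )
```

Failing at startup with a clear message is usually easier to debug than a missing value surfacing on the first prediction request.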

The .dockerignore File

Like .gitignore but for Docker. Without it, docker build sends your entire project directory to the Docker daemon as the build context, and COPY . . copies all of it into the image, including data/ (gigabytes of training data), .git/ (version history), and virtual environments.

# .dockerignore
.git/
.github/
.dvc/
data/
notebooks/
tests/
*.pyc
__pycache__/
.env
.env.*
venv/
.venv/
*.egg-info/
dist/
build/
mlruns/
.pytest_cache/

This reduces your Docker build context from potentially gigabytes to just a few megabytes. It also prevents secrets in .env files from accidentally being baked into the image.
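
Because a missing entry fails silently, it can be worth guarding the most important ones in CI. The following is a minimal sketch with hypothetical names. It compares entries literally and does not evaluate glob patterns or ! negations the way Docker does, so treat it as a coarse safety net.

```python
REQUIRED_PATTERNS = (".git/", "data/", ".env")


def missing_ignore_patterns(dockerignore_text: str, required=REQUIRED_PATTERNS) -> list[str]:
    """Return the required entries that do not appear in a .dockerignore file."""
    entries = set()
    for raw in dockerignore_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            # Compare without a trailing slash so "data" and "data/" both count.
            entries.add(line.rstrip("/"))
    return [pattern for pattern in required if pattern.rstrip("/") not in entries]
```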

Multi-Stage Builds for Smaller Images

The builder and runtime stages in our Dockerfile above are a multi-stage build. The key benefit: the final image only contains what’s needed to run, not the compilers and build tools used to compile packages.

Compare:

Single-stage image with compilers: ~1.8 GB
Multi-stage image (runtime only): ~450 MB

Smaller images mean faster pulls in CI, faster deployments, and a smaller attack surface.

To verify: docker images | grep churn-predictor

Handling GPU Models

If your model uses PyTorch with CUDA, use the CUDA base image:

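# For GPU-accelerated models
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04 AS runtime

# Ubuntu 22.04's python3-pip targets the distro default Python 3.10, so create a Python 3.11
# virtual environment instead (assumes python3.11-venv is available from the Ubuntu repositories)
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3.11 python3.11-venv \
    && rm -rf /var/lib/apt/lists/* \
    && python3.11 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Install PyTorch with the matching CUDA version
RUN pip install --no-cache-dir torch==2.0.1+cu118 --extra-index-url https://download.pytorch.org/whl/cu118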

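Pinning the base image and the PyTorch wheel keeps the CUDA libraries in the container on the version you trained with. The host still has to supply an NVIDIA GPU, a driver recent enough for that CUDA version, and the NVIDIA Container Toolkit, and the container must be started with GPU access (for example, docker run --gpus all). On any host that meets those requirements, the container behaves the same way.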
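
A quick way to confirm that the container actually sees the GPU is to log what PyTorch reports at startup. The helper below is a hypothetical sketch. It imports torch lazily, so it also runs, with a reduced report, in an environment where PyTorch is not installed.

```python
def cuda_report() -> dict:
    """Summarise what PyTorch can see; safe to call when torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


print(cuda_report())
```

If cuda_available is False inside a GPU image, the usual suspects are a container started without GPU access or a host driver that is too old for the CUDA version in the image.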

Pushing to a Registry

# Tag the image for your registry
docker tag churn-predictor:1.0.0 ghcr.io/your-org/churn-predictor:1.0.0

# Authenticate
echo "$GITHUB_TOKEN" | docker login ghcr.io -u your-username --password-stdin

# Push
docker push ghcr.io/your-org/churn-predictor:1.0.0

Add this to your GitHub Actions CI pipeline after the quality gates pass:

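      - name: Build and push Docker image
        if: github.ref == 'refs/heads/main'
        # The job needs `permissions: packages: write` for the push to GHCR to succeed
        run: |
          # Image names must be lowercase; github.repository keeps the repo's casing
          IMAGE="ghcr.io/$(echo '${{ github.repository }}' | tr '[:upper:]' '[:lower:]'):${{ github.sha }}"
          echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin
          docker build -t "$IMAGE" .
          docker push "$IMAGE"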

Now every passing CI run on main produces a versioned Docker image in the registry. Because the tag is the commit SHA, you can trace any running container back to the exact commit that produced it.
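
If deployment scripts need to compute the same image reference, the naming rule fits in a small helper. This is a sketch with a hypothetical function name. It lowercases the repository because image names must be lowercase, while GitHub repository names can contain uppercase letters.

```python
import re


def image_ref(registry: str, repository: str, sha: str) -> str:
    """Build `registry/repository:sha`, lowercasing the repository name."""
    if not re.fullmatch(r"[0-9a-f]{7,40}", sha):
        raise ValueError(f"not a Git commit SHA: {sha!r}")
    return f"{registry}/{repository.lower()}:{sha}"


print(image_ref("ghcr.io", "Your-Org/Churn-Predictor", "0123abc"))
# ghcr.io/your-org/churn-predictor:0123abc
```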

Summary

Docker solves the “it works on my machine” problem by packaging your model, code, and dependencies into a portable, reproducible image. The key practices:

Use multi-stage builds to keep images small
Copy requirements.txt before source code to maximize layer caching
Use a non-root user for security
Use .dockerignore to exclude data, notebooks, and secrets
Pin exact dependency versions in requirements.txt
Pass secrets as environment variables, never bake them into the image

With your model containerized, the next step is building the FastAPI application that lives inside that container and serves predictions.
