🐳 Why Containerization Matters
Your ML API works perfectly on your laptop. But when you deploy it to a server, everything breaks. Different Python version. Missing libraries. Incompatible CUDA drivers. Environment variables not set. It's the classic "works on my machine" nightmare.
Docker solves this problem by packaging your application, dependencies, and environment into a single, portable container that runs identically everywhere - on your laptop, production servers, or in the cloud.
⚠️ Problems Docker Solves:
- "Works on my machine" - different environments cause failures
- Dependency conflicts between projects
- Complex setup instructions for team members
- Inconsistent production environments
- Difficult to scale and replicate services
- No isolation between applications on same server
💡 Real-World Impact: Netflix runs 100,000+ containers. Uber migrated 4,000 microservices to containers. Google starts 2 billion containers per week. Docker is the foundation of modern ML deployment.
🏗️ Docker Fundamentals
Key Concepts
Image
Read-only template with application code, dependencies, and OS. Like a snapshot or class definition.
Container
Running instance of an image. Isolated process with its own filesystem, network, and resources.
Dockerfile
Text file with instructions to build a Docker image. Defines base image, dependencies, and commands.
Registry
Repository for storing and distributing images. Docker Hub is the public registry.
Installation
# macOS (using Homebrew)
brew install --cask docker
# Or download Docker Desktop from docker.com
# Verify installation
docker --version
docker run hello-world
# Check if Docker daemon is running
docker ps
Essential Docker Commands
# Build an image
docker build -t myapp:v1 .
# Run a container
docker run -p 8000:8000 myapp:v1
# List running containers
docker ps
# List all containers (including stopped)
docker ps -a
# List images
docker images
# Stop a container
docker stop container_id
# Remove container
docker rm container_id
# Remove image
docker rmi image_id
# View logs
docker logs container_id
# Execute command in running container
docker exec -it container_id bash
# View container resource usage
docker stats
📝 Your First ML Dockerfile
Simple FastAPI ML Service
# Dockerfile
# Use official Python runtime as base image
FROM python:3.10-slim
# Set working directory in container
WORKDIR /app
# Copy requirements file
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose port 8000
EXPOSE 8000
# Command to run the application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
Project Structure
ml-api/
├── Dockerfile
├── requirements.txt
├── app.py
└── models/
    └── model.joblib
requirements.txt
fastapi==0.104.1
uvicorn[standard]==0.24.0
scikit-learn==1.3.2
joblib==1.3.2
numpy==1.26.2
pydantic==2.5.0
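app.py (example)
The tutorial does not show app.py itself, so here is a minimal sketch that fits the Dockerfile and requirements above. The /health route, the iris-style feature names, and the PredictionRequest schema are assumptions for illustration, not code from the original project.
# app.py - minimal FastAPI service (illustrative sketch)
import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# Load the model that gets copied into the image under models/
model = joblib.load("models/model.joblib")

class PredictionRequest(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/predict")
def predict(request: PredictionRequest):
    features = np.array([[request.sepal_length, request.sepal_width,
                          request.petal_length, request.petal_width]])
    return {"prediction": int(model.predict(features)[0])}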
Building and Running
# Build the image
docker build -t ml-api:v1 .
# Run the container
docker run -d \
--name ml-api-container \
-p 8000:8000 \
ml-api:v1
# Test the API
curl http://localhost:8000/health
# View logs
docker logs ml-api-container
# Stop and remove
docker stop ml-api-container
docker rm ml-api-container
✅ What Just Happened: You packaged your ML API into a Docker image. Now anyone can run your exact environment with a single command - no Python installation, no pip install, no configuration needed!
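Sharing the Image via a Registry
To actually hand that environment to someone else, push the image to a registry. A sketch with Docker Hub is shown below; yourname/ml-api is a hypothetical repository name.
# Tag the image with your (hypothetical) Docker Hub repository
docker tag ml-api:v1 yourname/ml-api:v1
# Log in and push it to the registry
docker login
docker push yourname/ml-api:v1
# Anyone can now run the exact same environment with one command
docker run -p 8000:8000 yourname/ml-api:v1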
🏗️ Multi-Stage Builds for Smaller Images
Multi-stage builds allow you to create leaner production images by separating build-time dependencies from runtime dependencies.
Problem: Large Image Sizes
# ❌ Simple Dockerfile - Results in 1.5GB+ image
FROM python:3.10
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
# Includes build tools, compilers, dev headers
# All unused in production!
Solution: Multi-Stage Build
# ✅ Multi-stage Dockerfile - Results in ~500MB image
# Stage 1: Build environment
FROM python:3.10 as builder
WORKDIR /app
# Install build dependencies
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
# Stage 2: Production environment
FROM python:3.10-slim
WORKDIR /app
# Non-root user for security
RUN useradd -m -u 1000 appuser
# Copy only the installed packages from the builder into the app user's home
# (leaving them under /root would make them unreadable once we switch users)
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
# Copy application code
COPY --chown=appuser:appuser app.py .
COPY --chown=appuser:appuser models/ models/
# Make sure scripts in .local are on PATH
ENV PATH=/home/appuser/.local/bin:$PATH
USER appuser
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
Advanced: Compile NumPy/SciPy in Build Stage
# Multi-stage for ML with compiled dependencies
# Build stage
FROM python:3.10 as builder
# Install system dependencies for building
RUN apt-get update && apt-get install -y \
gcc \
g++ \
gfortran \
libopenblas-dev \
liblapack-dev \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /build
COPY requirements.txt .
# Build wheels for expensive packages
RUN pip wheel --no-cache-dir --wheel-dir /wheels \
numpy==1.26.2 \
scipy==1.11.4 \
scikit-learn==1.3.2
RUN pip wheel --no-cache-dir --wheel-dir /wheels \
-r requirements.txt
# Runtime stage
FROM python:3.10-slim
# Install only runtime dependencies
RUN apt-get update && apt-get install -y \
libopenblas0 \
libgomp1 \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Copy pre-built wheels
COPY --from=builder /wheels /wheels
# Install from wheels (much faster!)
RUN pip install --no-cache-dir /wheels/*.whl && \
rm -rf /wheels
# Copy application
COPY app.py .
COPY models/ models/
# Security: non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
Image Size Comparison
| Approach | Image Size | Build Time |
|---|---|---|
| Simple (python:3.10) | 1.8 GB | 5 min |
| Slim base (python:3.10-slim) | 800 MB | 6 min |
| ✅ Multi-stage build | 500 MB | 7 min (first), 2 min (cached) |
| Alpine-based (advanced) | 300 MB | 15 min (compilation) |
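Exact numbers depend on your dependencies; you can check your own builds with a standard Docker formatting string:
# Compare the sizes of your local images
docker images --format "{{.Repository}}:{{.Tag}}  {{.Size}}" | grep ml-api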
🎼 Docker Compose for Multi-Container Apps
Real ML systems often need multiple services: API server, Redis cache, PostgreSQL database, monitoring tools. Docker Compose orchestrates them all.
Complete ML Stack with Docker Compose
# docker-compose.yml
version: '3.8'
services:
  # ML API Service
  api:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    environment:
      - REDIS_URL=redis://redis:6379
      - POSTGRES_URL=postgresql://user:pass@postgres:5432/mlops
    depends_on:
      - redis
      - postgres
    volumes:
      - ./models:/app/models
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Redis for caching predictions
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    restart: unless-stopped

  # PostgreSQL for logging predictions
  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=mlops
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
    restart: unless-stopped

  # Prometheus for metrics
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    restart: unless-stopped

  # Grafana for dashboards
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - prometheus
    restart: unless-stopped

volumes:
  redis-data:
  postgres-data:
  prometheus-data:
  grafana-data:
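prometheus.yml (example)
The Compose file mounts a ./prometheus.yml that this tutorial doesn't list. A minimal sketch is shown below; it assumes the API exposes Prometheus metrics at /metrics on port 8000, which requires adding an exporter (such as prometheus-fastapi-instrumentator) to the app.
# prometheus.yml - minimal scrape config (illustrative sketch)
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'ml-api'
    static_configs:
      - targets: ['api:8000']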
Enhanced API with Redis Caching
# app.py with Redis caching
import hashlib
import json
import os

import joblib
import numpy as np
import redis
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the trained model baked into the image
model = joblib.load("models/model.joblib")

# Connect to Redis (REDIS_URL is set in docker-compose.yml)
redis_client = redis.from_url(
    os.getenv("REDIS_URL", "redis://redis:6379"), decode_responses=True
)

class PredictionRequest(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

@app.post("/predict")
def predict_with_cache(request: PredictionRequest):
    # Create cache key from input
    input_str = json.dumps(request.model_dump(), sort_keys=True)
    cache_key = f"pred:{hashlib.md5(input_str.encode()).hexdigest()}"
    # Check cache
    cached = redis_client.get(cache_key)
    if cached:
        result = json.loads(cached)
        result["cached"] = True
        return result
    # Make prediction
    features = np.array([[
        request.sepal_length,
        request.sepal_width,
        request.petal_length,
        request.petal_width,
    ]])
    prediction = model.predict(features)[0]
    probabilities = model.predict_proba(features)[0]
    result = {
        "prediction": int(prediction),
        "confidence": float(max(probabilities)),
        "cached": False,
    }
    # Cache the prediction for 1 hour
    redis_client.setex(cache_key, 3600, json.dumps(result))
    return result
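Once the stack is up, you can watch the cache work by sending the same request twice; the second response should come back with "cached": true (field names follow the sketch above):
# First call computes the prediction, the repeat is served from Redis
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'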
Docker Compose Commands
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f api
# Scale API instances (drop the fixed "8000:8000" host port mapping first, or the extra replicas will fail to bind port 8000)
docker-compose up -d --scale api=3
# Stop all services
docker-compose down
# Stop and remove volumes (data)
docker-compose down -v
# Rebuild images
docker-compose up -d --build
# Check service status
docker-compose ps
💡 Benefits: With one command (docker-compose up), you start: API server, Redis cache, PostgreSQL database, Prometheus monitoring, and Grafana dashboards. Perfect for local development matching production!
🎮 GPU Support for Deep Learning
Running deep learning models in Docker requires GPU access for acceptable performance.
Prerequisites
# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
# Test GPU access
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Dockerfile for PyTorch with GPU
# GPU-enabled PyTorch Dockerfile
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
# Install Python
RUN apt-get update && apt-get install -y \
python3.10 \
python3-pip \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Install PyTorch with CUDA support
RUN pip3 install --no-cache-dir \
torch==2.1.0 \
torchvision==0.16.0 \
--index-url https://download.pytorch.org/whl/cu118
# Install other dependencies
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
# Sanity-check that PyTorch imports; the GPU is only visible at runtime, so this prints False during the build
RUN python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
Running with GPU
# Run container with GPU access
docker run -d \
--name ml-gpu \
--gpus all \
-p 8000:8000 \
ml-api-gpu:v1
# Specify specific GPU
docker run -d \
--gpus '"device=0"' \
ml-api-gpu:v1
# Limit container memory (--memory caps host RAM; Docker cannot cap GPU memory directly)
docker run -d \
--gpus all \
--memory=8g \
ml-api-gpu:v1
# Check GPU usage
docker exec ml-gpu nvidia-smi
Docker Compose with GPU
# docker-compose.yml with GPU
version: '3.8'
services:
  ml-gpu-api:
    build: .
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
⚠️ GPU Image Sizes: CUDA images are large (5-10GB). Use cudnn-runtime instead of cudnn-devel (includes compilers) to save ~3GB. Only include CUDA if you actually need GPU inference.
⚡ Image Optimization Strategies
1. Layer Caching
# ❌ Poor caching - requirements change = rebuild everything
FROM python:3.10-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
# ✅ Good caching - code changes don't rebuild deps
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
2. Minimize Layers
# ❌ Many layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y wget
RUN apt-get clean
# ✅ Single layer
RUN apt-get update && apt-get install -y \
curl \
wget \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
3. .dockerignore File
# .dockerignore - exclude from image
__pycache__/
*.pyc
*.pyo
*.pyd
.git/
.gitignore
.vscode/
.idea/
*.md
tests/
docs/
.pytest_cache/
*.log
.env
venv/
.DS_Store
notebooks/
*.ipynb
4. Use Specific Base Images
| Base Image | Size | Use Case |
|---|---|---|
| python:3.10 | 920 MB | Development, includes build tools |
| ✅ python:3.10-slim | 130 MB | Production, minimal packages |
| python:3.10-alpine | 50 MB | Smallest, compilation issues common |
| nvidia/cuda:11.8-runtime | 1.5 GB | GPU inference only |
| nvidia/cuda:11.8-devel | 4.5 GB | GPU with compilation tools |
5. Remove Package Managers Cache
RUN pip install --no-cache-dir -r requirements.txt
RUN apt-get update && apt-get install -y package \
&& rm -rf /var/lib/apt/lists/*
6. Optimize Model Files
# Compress models before adding to image
import joblib
# Standard save (large)
joblib.dump(model, 'model.joblib')
# Compressed save (smaller)
joblib.dump(model, 'model.joblib', compress=3)
# Or use pickle protocol 5 for large arrays
joblib.dump(model, 'model.joblib', protocol=5)
Image Size Audit
# Analyze image layers
docker history ml-api:v1
# See which layers are largest
docker history ml-api:v1 --human --no-trunc
# Use dive for interactive analysis
brew install dive
dive ml-api:v1
✅ Docker Best Practices for ML
1. Security
# Run as non-root user
RUN useradd -m -u 1000 appuser && \
chown -R appuser:appuser /app
USER appuser
# Don't include secrets
# ❌ Never: COPY .env .
# ✅ Use: docker run -e API_KEY=$API_KEY ...
# Scan for vulnerabilities
# docker scan ml-api:v1
2. Health Checks
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
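Docker records the result of the health check on the container, so you can inspect it from the host:
# Show the current health status of a running container
docker inspect --format '{{.State.Health.Status}}' ml-api-container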
3. Logging
# Log to stdout (Docker captures it)
import logging
import sys
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[logging.StreamHandler(sys.stdout)]
)
logger = logging.getLogger(__name__)
4. Environment Variables
# Set defaults
ENV MODEL_PATH=/app/models/model.joblib \
LOG_LEVEL=INFO \
WORKERS=4
# Override at runtime
# docker run -e LOG_LEVEL=DEBUG ml-api:v1
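On the application side, a short sketch of how the service might read these variables (the names match the ENV defaults above; the config.py module itself is an assumption):
# config.py - read settings from environment variables (illustrative)
import os

MODEL_PATH = os.getenv("MODEL_PATH", "/app/models/model.joblib")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
WORKERS = int(os.getenv("WORKERS", "4"))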
5. Model Versioning
# Tag images with model version
docker build -t ml-api:v1.0-model-rf-2023-12 .
# Use semantic versioning
docker build -t ml-api:1.0.0 .
docker tag ml-api:1.0.0 ml-api:latest
# Include metadata
docker build \
--label "model.version=1.0" \
--label "model.type=random-forest" \
--label "training.date=2023-12-10" \
-t ml-api:v1 .
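The labels travel with the image, so the model metadata can be recovered later:
# Read the labels back from the image
docker inspect --format '{{json .Config.Labels}}' ml-api:v1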
6. Volume Mounts for Development
# Mount code so host edits are visible inside the container (dev only; pair with uvicorn --reload for live reloading)
docker run -d \
-v $(pwd):/app \
-p 8000:8000 \
ml-api:v1
# Mount model directory (swap models without rebuild)
docker run -d \
-v $(pwd)/models:/app/models:ro \
-p 8000:8000 \
ml-api:v1
7. CI/CD Integration
# .github/workflows/docker-build.yml
name: Build and Push Docker Image
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: |
            myorg/ml-api:latest
            myorg/ml-api:${{ github.sha }}
          cache-from: type=registry,ref=myorg/ml-api:latest
          cache-to: type=inline
🎯 Summary
You've mastered Docker for ML deployment:
Docker Basics
Images, containers, Dockerfiles, and essential commands
Multi-Stage Builds
Reduce image sizes from 1.8GB to 500MB
Docker Compose
Orchestrate multi-service ML stacks
GPU Support
Run deep learning models with CUDA
Optimization
Layer caching, minimal images, security
Best Practices
Production-ready containerization
Key Takeaways
- Docker solves "works on my machine" by packaging everything
- Use multi-stage builds to minimize image size
- Docker Compose orchestrates complex ML stacks
- GPU support requires nvidia-container-toolkit
- Optimize with layer caching and .dockerignore
- Always run as non-root user for security
- Tag images with model versions for traceability
🚀 Next Steps:
Your ML service is containerized! Next, you'll learn cloud deployment - taking your Docker containers to AWS, Google Cloud, and Azure for production-scale serving.
Test Your Knowledge
Q1: What's the main benefit of Docker for ML deployment?
Q2: What's the purpose of multi-stage builds?
Q3: Why should you copy requirements.txt before copying application code?
Q4: What does Docker Compose help with?
Q5: For GPU support in Docker, what do you need?