Docker & CI/CD Best Practices
After three years of containerizing applications and countless hours debugging CI/CD pipelines, I've accumulated a mental checklist of practices that separate smooth deployments from 2 AM production incidents. Let me share the lessons that cost me sleep so you can rest easy.
The Wake-Up Call
It was 11 PM on a Friday when our deployment pipeline broke. The build that took 5 minutes that morning was now taking 45 minutes. Our developers were waiting to push critical bug fixes. I was frantically Googling "why is Docker build so slow." That night taught me that Docker optimization isn't optional—it's essential.
Multi-Stage Builds: The Game Changer
Multi-stage builds transformed our Docker workflow. Before discovering them, our Python application images were 1.2 GB. Afterward? 180 MB. Here's the pattern:
```dockerfile
# ❌ Bad: Single-stage build (1.2 GB)
FROM python:3.11
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]

# ✅ Good: Multi-stage build (180 MB)
# Stage 1: Build stage with all build dependencies
FROM python:3.11-slim as builder
WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Stage 2: Runtime stage with only necessary files
FROM python:3.11-slim
WORKDIR /app

# Copy only the installed packages from builder
COPY --from=builder /root/.local /root/.local
COPY . .

# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH

CMD ["python", "app.py"]
```
For Node.js applications, the difference is even more dramatic:
```dockerfile
# Build stage
FROM node:18-alpine as builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

# Runtime stage
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
```
The `node_modules` directory is copied straight from the builder stage, where `npm ci --only=production` produced a clean, reproducible install with no dev dependencies.

Layer Caching: The Hidden Performance Multiplier
Understanding Docker layer caching changed everything. Docker builds layers from top to bottom, and if a layer hasn't changed, it reuses the cache. The trick is ordering your Dockerfile intelligently.
The Wrong Way
```dockerfile
FROM python:3.11-slim
WORKDIR /app

# ❌ This invalidates cache every time code changes
COPY . .
RUN pip install -r requirements.txt

CMD ["python", "app.py"]
```
Every code change rebuilds dependencies, even though `requirements.txt` hasn't changed.

The Right Way
```dockerfile
FROM python:3.11-slim
WORKDIR /app

# ✅ Copy dependency file first
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy code last
COPY . .

CMD ["python", "app.py"]
```
Now dependencies stay cached until `requirements.txt` changes.

For more complex projects, be even more strategic:
```dockerfile
FROM node:18-alpine
WORKDIR /app

# Layer 1: Package files (changes rarely)
COPY package*.json ./

# Layer 2: Dependencies (cached unless package.json changes)
RUN npm ci

# Layer 3: Source code (changes frequently)
COPY src/ ./src/
COPY public/ ./public/

# Layer 4: Build step (only reruns if source changes)
RUN npm run build

CMD ["npm", "start"]
```
Think about your layers in order of change frequency: least frequently changing first.
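The rule behind this ordering can be modeled in a few lines: a layer is reused only if its own input and everything above it are unchanged, so a single miss invalidates every layer below it. Here is a rough Python simulation of that behavior (`plan_build` is an illustrative helper, not Docker's actual algorithm):

```python
import hashlib

def plan_build(layers, cache):
    """Simulate Docker's layer cache: walk layers top to bottom and
    reuse cached results until the first layer whose input changed."""
    results = []
    cache_valid = True
    prev_digest = ""
    for name, content in layers:
        # A layer's identity depends on its own content AND every layer above it.
        digest = hashlib.sha256((prev_digest + content).encode()).hexdigest()
        if cache_valid and cache.get(name) == digest:
            results.append((name, "CACHED"))
        else:
            cache_valid = False  # one miss invalidates all layers below
            cache[name] = digest
            results.append((name, "REBUILT"))
        prev_digest = digest
    return results

# First build populates the cache; a source-only change leaves deps cached.
cache = {}
v1 = [("package.json", "deps-v1"), ("npm ci", "install"), ("src/", "code-v1")]
plan_build(v1, cache)
v2 = [("package.json", "deps-v1"), ("npm ci", "install"), ("src/", "code-v2")]
print(plan_build(v2, cache))  # only the src/ layer is rebuilt
```

This is exactly why copying source code before installing dependencies is so costly: the `COPY . .` layer changes on every commit, dragging the install layer down with it.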
.dockerignore: The Forgotten Performance Booster
I once spent 30 minutes debugging why builds were slow, only to discover we were copying 500 MB of `node_modules` and the entire `.git` history into the build context. A `.dockerignore` file fixes this:

```
# .dockerignore
node_modules
npm-debug.log
.git
.gitignore
.env
.env.local
*.md
.vscode
.idea
__pycache__
*.pyc
.pytest_cache
.coverage
dist
build
.DS_Store
```
This reduced our build context from 800 MB to 15 MB. The Docker daemon thanks you for not sending gigabytes of unnecessary files.
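If you want to estimate what your build context weighs before and after, it's easy to script. The sketch below uses `fnmatch`, which only approximates Docker's real `.dockerignore` matching (no `!` negation or `**` handling), so treat the numbers as a rough guide:

```python
import fnmatch
import os

def context_size(root, ignore_patterns):
    """Rough size in bytes of the Docker build context under `root`,
    skipping paths matched by .dockerignore-style patterns."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Prune ignored directories so we never descend into node_modules etc.
        dirnames[:] = [d for d in dirnames
                       if not _ignored(os.path.normpath(os.path.join(rel_dir, d)),
                                       ignore_patterns)]
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if not _ignored(rel, ignore_patterns):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total

def _ignored(rel_path, patterns):
    # Match either the whole relative path or any single path component.
    parts = rel_path.split(os.sep)
    return any(fnmatch.fnmatch(rel_path, p) or p in parts for p in patterns)

print(context_size(".", ["node_modules", ".git", "*.pyc", "dist", "build"]))
```

Run it once with your ignore patterns and once with an empty list; the gap is what `.dockerignore` is saving you on every build.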
GitHub Actions: From Slow to Fast
Our GitHub Actions workflow initially took 15 minutes. Here's how we got it under 4 minutes:
Strategy 1: Dependency Caching
```yaml
name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # Cache Python dependencies
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
          cache: 'pip'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install -r requirements-dev.txt

      - name: Run tests
        run: pytest --cov=. --cov-report=xml
```
That single `cache: 'pip'` line caches pip's download cache between runs, keyed on a hash of `requirements.txt`, and saved us minutes on every build.

For Node.js:
```yaml
- uses: actions/setup-node@v3
  with:
    node-version: '18'
    cache: 'npm'
- run: npm ci
- run: npm test
```
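Under the hood, these setup actions derive a cache key from a hash of the dependency lockfile, so the cache is reused until dependencies actually change. A minimal Python sketch of that idea (the real actions use GitHub's `hashFiles()` expression; `cache_key` here is a hypothetical helper):

```python
import hashlib

def cache_key(prefix, *lockfiles):
    """Build a cache key from the contents of dependency lockfiles:
    same lockfiles -> same key -> cache hit; any change -> new key."""
    h = hashlib.sha256()
    for path in lockfiles:
        with open(path, "rb") as f:
            h.update(f.read())
    return f"{prefix}-{h.hexdigest()[:16]}"

# Hypothetical lockfile for demonstration.
open("requirements.example.txt", "w").write("flask==3.0.0\n")
print(cache_key("pip", "requirements.example.txt"))
```

This is also why you should pin dependencies in a lockfile: with loose version ranges, the key stays stable while the resolved packages drift underneath it.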
Strategy 2: Docker Layer Caching in CI
```yaml
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v2

- name: Build and push
  uses: docker/build-push-action@v4
  with:
    context: .
    push: true
    tags: myapp:latest
    cache-from: type=registry,ref=myapp:buildcache
    cache-to: type=registry,ref=myapp:buildcache,mode=max
```
This caches Docker layers between builds. Our Docker builds went from 12 minutes to 3 minutes.
Strategy 3: Parallel Jobs
```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.9', '3.10', '3.11']
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -r requirements.txt
      - run: pytest

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install flake8 black
      - run: flake8 .
      - run: black --check .

  build:
    runs-on: ubuntu-latest
    needs: [test, lint]
    steps:
      - uses: actions/checkout@v3
      - run: docker build -t myapp .
```
Tests, linting, and builds run simultaneously. Total pipeline time reduced by 60%.
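That 60% figure falls out of simple critical-path math: with parallel jobs, the pipeline takes as long as its slowest dependency chain rather than the sum of all jobs. A small sketch of that calculation (the durations are hypothetical):

```python
def pipeline_time(jobs):
    """Wall-clock time of a CI pipeline where independent jobs run in
    parallel: each job starts when the slowest of its dependencies finishes.
    `jobs` maps name -> (duration_minutes, [dependency names])."""
    finish = {}

    def done(name):
        if name not in finish:
            duration, needs = jobs[name]
            finish[name] = max((done(n) for n in needs), default=0) + duration
        return finish[name]

    return max(done(name) for name in jobs)

# Matrix tests and lint run side by side; build waits for both.
jobs = {
    "test": (6, []),
    "lint": (2, []),
    "build": (4, ["test", "lint"]),
}
print(pipeline_time(jobs))  # 10 minutes, vs 12 if run one after another
```

The practical takeaway: shortening the longest chain (here, test → build) speeds up the pipeline; shortening jobs off the critical path does nothing.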
Security Scanning: Catching Vulnerabilities Early
Nothing ruins your day like a security audit finding critical vulnerabilities in production. Integrate scanning into CI:
```yaml
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: 'myapp:latest'
    format: 'sarif'
    output: 'trivy-results.sarif'
    severity: 'CRITICAL,HIGH'

- name: Upload Trivy results to GitHub Security
  uses: github/codeql-action/upload-sarif@v2
  with:
    sarif_file: 'trivy-results.sarif'
```
We caught a critical vulnerability in a Python dependency this way—before it hit production.
For even more comprehensive scanning:
```yaml
- name: Run Snyk to check for vulnerabilities
  uses: snyk/actions/python@master
  env:
    SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
  with:
    args: --severity-threshold=high
```
Docker Compose for Local Development
One frustration I hear constantly: "It works on my machine." Docker Compose solves this:
```yaml
version: '3.8'

services:
  app:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - .:/app
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/mydb
      - REDIS_URL=redis://redis:6379
    depends_on:
      - db
      - redis
    command: uvicorn app.main:app --host 0.0.0.0 --reload

  db:
    image: postgres:15
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=mydb
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

volumes:
  postgres_data:
```
Now every developer runs `docker-compose up` and gets an identical environment: app, database, and cache, wired together the same way on every machine.

Production Optimization: The Details Matter
1. Health Checks
```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
```
This lets orchestrators like Kubernetes detect and restart unhealthy containers.
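For that HEALTHCHECK to pass, the app needs a `/health` endpoint to answer. A minimal sketch using only the Python standard library (a real check would also probe the database, queues, and other dependencies, not just return a constant):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Tiny /health endpoint: 200 when healthy, 503 otherwise."""
    healthy = True  # a real app would check DB connections, queues, etc.

    def do_GET(self):
        if self.path == "/health":
            status = 200 if self.healthy else 503
            body = b"ok" if status == 200 else b"unhealthy"
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep container logs quiet
        pass

def serve(port=0):
    """Start the server on a background thread; return the bound port."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.server_address[1]

port = serve()
with urllib.request.urlopen(f"http://127.0.0.1:{port}/health") as resp:
    print(resp.status, resp.read())
```

Keep the endpoint cheap: orchestrators hit it every few seconds, and a health check that does heavy work becomes its own source of load.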
2. Non-Root User
```dockerfile
# Create a non-root user
RUN adduser --disabled-password --gecos '' appuser

# Change ownership of app files
RUN chown -R appuser:appuser /app

# Switch to non-root user
USER appuser

CMD ["python", "app.py"]
```
Running as root is a security risk. This simple change significantly improves your security posture.
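As a belt-and-braces complement to the `USER` directive, the application itself can refuse to start as root, which catches images where someone forgot the directive. A small sketch (`assert_not_root` is a hypothetical helper; `os.geteuid` is POSIX-only):

```python
import os

def assert_not_root(uid=None):
    """Fail fast if running as root. Pass uid explicitly for testing;
    defaults to the real effective uid of the process."""
    uid = os.geteuid() if uid is None else uid
    if uid == 0:
        raise SystemExit("refusing to run as root; set USER in the Dockerfile")

# Demonstrate with an explicit non-root uid so this runs anywhere:
assert_not_root(uid=1000)
print("startup check passed")
```

Call it once at startup, before the app binds any ports; a loud immediate failure beats a silent root process in production.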
3. Environment-Specific Configurations
```yaml
# docker-compose.prod.yml
version: '3.8'

services:
  app:
    image: myapp:latest
    restart: always
    environment:
      - NODE_ENV=production
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '1'
          memory: 512M
```
The Build-Test-Deploy Pipeline
Here's our complete production pipeline:
```yaml
name: Production Deployment

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
          cache: 'pip'
      - run: pip install -r requirements.txt
      - run: pytest --cov=. --cov-report=xml

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: docker build -t myapp:scan .
      - uses: aquasecurity/trivy-action@master
        with:
          image-ref: 'myapp:scan'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'

  build:
    needs: [test, security]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/setup-buildx-action@v2
      - uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: myapp:latest,myapp:${{ github.sha }}
          cache-from: type=registry,ref=myapp:buildcache
          cache-to: type=registry,ref=myapp:buildcache,mode=max

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to production
        run: |
          echo "Deploying to Kubernetes/ECS/your platform"
          # Your deployment commands here
```
This pipeline ensures every production deployment is tested, secure, and reproducible.
Lessons from the Trenches
- Optimize for the common case: Most builds are incremental. Cache aggressively.
- Fail fast: Run quick tests first, slow tests later.
- Security is not optional: Scan every image before it reaches production.
- Local-prod parity: Docker Compose should mirror production as closely as possible.
- Monitor your builds: If builds start slowing down, investigate immediately.
After implementing these practices across a dozen projects, our deployment confidence went from "fingers crossed" to "ship it with confidence." Docker and CI/CD done right aren't obstacles—they're accelerators.