Rolling Updates and Zero Downtime
LEVEL 0
The Problem
You need to deploy a new version of your app.
Naive approach:
docker compose down # Stop everything
docker compose up # Start new version
Your app is down for 30 seconds while containers restart. Users see errors.
You need zero-downtime deployments.
LEVEL 1
The Concept — The Bridge Replacement
Imagine replacing a bridge while keeping traffic flowing.
Bad approach: Close bridge, demolish, rebuild, reopen. Traffic stops for months.
Good approach:
- Build new bridge next to old one
- Redirect traffic to new bridge
- Verify new bridge works
- Demolish old bridge
Rolling updates are a gradual bridge replacement.
Old version keeps running while new version starts. Traffic gradually shifts. No downtime.
LEVEL 2
The Mechanics — Rolling Update Strategy
Docker Swarm / Kubernetes approach:
services:
  app:
    image: myapp:v2
    deploy:
      replicas: 6
      update_config:
        parallelism: 2          # Update 2 at a time
        delay: 10s              # Wait 10s between batches
        failure_action: rollback
        monitor: 30s
        max_failure_ratio: 0.3
What happens:
- You have 6 replicas running v1
- Update starts
- Stop 2 replicas running v1
- Start 2 replicas running v2
- Wait for health checks
- If healthy, wait 10s
- Stop next 2 v1 replicas
- Start 2 more v2 replicas
- Repeat until all 6 are v2
At any point, you have 4-6 healthy replicas serving traffic. No downtime.
If update fails:
- Health checks fail
- Automatic rollback to v1
- Alert operators
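The batch sequence above can be sketched in plain shell. The helpers stop_replica, start_replica, and healthy are hypothetical stand-ins for what the orchestrator actually does; the loop shape mirrors parallelism: 2 over 6 replicas.

```shell
# Hedged sketch of the batch loop behind update_config.
REPLICAS=6
PARALLELISM=2

stop_replica()  { echo "stopping v1 replica $1"; }   # stand-in
start_replica() { echo "starting v2 replica $1"; }   # stand-in
healthy()       { true; }   # stand-in for the container health check

i=0
while [ "$i" -lt "$REPLICAS" ]; do
  j=0
  while [ "$j" -lt "$PARALLELISM" ] && [ "$((i + j))" -lt "$REPLICAS" ]; do
    stop_replica  "$((i + j))"
    start_replica "$((i + j))"
    j=$((j + 1))
  done
  if ! healthy; then
    echo "health check failed: rolling back"   # failure_action: rollback
    exit 1
  fi
  # sleep 10   # delay: 10s between batches (skipped in the sketch)
  i=$((i + PARALLELISM))
done
echo "all $REPLICAS replicas updated to v2"
```

On a real Swarm you never write this loop yourself: docker service update --image myapp:v2 <service> triggers it, honoring the update_config in the stack file.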
LEVEL 3
Blue-Green Deployment
Setup:
version: '3.9'

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro

  blue:
    image: myapp:v1
    deploy:
      replicas: 3

  green:
    image: myapp:v2
    deploy:
      replicas: 3
nginx.conf (pointing to blue):
upstream backend {
    server blue:8000;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
    }
}
Deployment process:
- Blue (v1) is live, serving traffic
- Deploy green (v2) alongside
- Test green internally
- Update nginx config to point to green
- Reload nginx: docker exec nginx nginx -s reload
- Green is now live
- Monitor for issues
- If good, remove blue
- If bad, switch back to blue
Instant rollback by switching nginx config back.
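The switch itself is a one-line config edit plus a reload. A minimal sketch, run against a throwaway copy of the upstream block so the edit is visible; the commented docker exec line assumes the container is named nginx, as in the compose file above.

```shell
# Write a throwaway copy of the upstream block, then flip blue -> green.
cat > nginx-demo.conf <<'EOF'
upstream backend {
    server blue:8000;
}
EOF

# The cutover: point the upstream at green
sed -i 's/server blue:8000;/server green:8000;/' nginx-demo.conf

grep 'server green:8000;' nginx-demo.conf

# Apply without dropping in-flight requests (against the real container):
# docker exec nginx nginx -s reload
```

Rolling back is the same edit in reverse: flip the config back to blue and reload again.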
LEVEL 4
Canary Deployments
Gradual traffic shift:
upstream backend {
    server blue:8000 weight=90;    # 90% to old version
    server green:8000 weight=10;   # 10% to new version
}
Process:
- Deploy v2 to a small percentage of traffic (10%)
- Monitor metrics: errors, latency, user feedback
- If good, increase to 25%
- If still good, increase to 50%
- Then 75%
- Finally 100%
- Remove old version
If issues detected at any stage, rollback immediately.
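The staged rollout can be scripted by regenerating the upstream block at each step. A sketch, with write_weights as a hypothetical helper; the percentages mirror the stages above, and the final stage drops blue from the pool entirely rather than setting its weight to zero.

```shell
# Regenerate the upstream block for each canary stage.
write_weights() {
  cat > upstream-demo.conf <<EOF
upstream backend {
    server blue:8000 weight=$((100 - $1));
    server green:8000 weight=$1;
}
EOF
}

for green_pct in 10 25 50 75; do
  write_weights "$green_pct"
  # docker exec nginx nginx -s reload   # apply this stage
  # ...watch error rate and latency before moving on...
done

# Final stage: 100% to green, blue removed from the pool
cat > upstream-demo.conf <<'EOF'
upstream backend {
    server green:8000;
}
EOF
```

Each reload takes effect without dropping in-flight requests, so the whole ramp from 10% to 100% happens with zero downtime.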
LEVEL 5
Database Migrations
The hard part: Schema changes.
Bad approach:
# Stop app
docker compose down
# Run migration
docker compose run app python manage.py migrate
# Start app
docker compose up
Downtime while migration runs.
Good approach: Backward-compatible migrations
Step 1: Add new column (nullable)
ALTER TABLE users ADD COLUMN email VARCHAR(255);
Old code doesn’t know about email, but it’s nullable so old code still works.
Step 2: Deploy new app version
New code uses email column when present, falls back to old behavior if not.
Step 3: Backfill data
UPDATE users SET email = username || '@example.com' WHERE email IS NULL;
Step 4: Make column non-nullable (if needed)
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
Zero downtime because old and new code work with both schemas.
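The four steps can be replayed end to end against a throwaway SQLite database (the sqlite3 CLI is assumed available). SQLite has no ALTER COLUMN ... SET NOT NULL, so step 4 stays as the Postgres statement in a comment.

```shell
# Replay the expand/contract sequence against a throwaway database.
db=migration-demo.sqlite
rm -f "$db"

sqlite3 "$db" "CREATE TABLE users (username TEXT NOT NULL);"
sqlite3 "$db" "INSERT INTO users (username) VALUES ('alice'), ('bob');"

# Step 1: add the new column, nullable, so old code keeps working
sqlite3 "$db" "ALTER TABLE users ADD COLUMN email VARCHAR(255);"

# (Step 2 is the app deploy: new code writes email, tolerates NULLs)

# Step 3: backfill rows written before the new code shipped
sqlite3 "$db" "UPDATE users SET email = username || '@example.com' WHERE email IS NULL;"

sqlite3 "$db" "SELECT username, email FROM users;"

# Step 4 (Postgres syntax; run only after the backfill completes):
# ALTER TABLE users ALTER COLUMN email SET NOT NULL;
```

At every point in the sequence, whichever app version is running finds a schema it understands.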