Rolling Updates and Zero Downtime
LEVEL 0
The Problem
You need to deploy a new version of your app.
Naive approach:
docker compose down # Stop everything
docker compose up # Start new version
Your app is down for 30 seconds while containers restart. Users see errors.
You need zero-downtime deployments.
LEVEL 1
The Concept — The Bridge Replacement
Imagine replacing a bridge while keeping traffic flowing.
Bad approach: Close bridge, demolish, rebuild, reopen. Traffic stops for months.
Good approach:
- Build new bridge next to old one
- Redirect traffic to new bridge
- Verify new bridge works
- Demolish old bridge
Rolling updates are a gradual bridge replacement.
Old version keeps running while new version starts. Traffic gradually shifts. No downtime.
LEVEL 2
The Mechanics — Rolling Update Strategy
Docker Swarm / Kubernetes approach:
services:
  app:
    image: myapp:v2
    deploy:
      replicas: 6
      update_config:
        parallelism: 2          # Update 2 at a time
        delay: 10s              # Wait 10s between batches
        failure_action: rollback
        monitor: 30s
        max_failure_ratio: 0.3
What happens:
- You have 6 replicas running v1
- Update starts
- Stop 2 replicas running v1
- Start 2 replicas running v2
- Wait for health checks
- If healthy, wait 10s
- Stop next 2 v1 replicas
- Start 2 more v2 replicas
- Repeat until all 6 are v2
At any point, you have 4-6 healthy replicas serving traffic. No downtime.
If update fails:
- Health checks fail
- Automatic rollback to v1
- Alert operators
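The batch sequence above can be sketched in plain shell. The helpers stop_replica, start_replica, and healthy are hypothetical stand-ins for what the orchestrator actually does; the loop shape mirrors parallelism: 2 over 6 replicas.

```shell
# Hedged sketch of the batch loop behind update_config.
REPLICAS=6
PARALLELISM=2

stop_replica()  { echo "stopping v1 replica $1"; }   # stand-in
start_replica() { echo "starting v2 replica $1"; }   # stand-in
healthy()       { true; }   # stand-in for the container health check

i=0
while [ "$i" -lt "$REPLICAS" ]; do
  j=0
  while [ "$j" -lt "$PARALLELISM" ] && [ "$((i + j))" -lt "$REPLICAS" ]; do
    stop_replica  "$((i + j))"
    start_replica "$((i + j))"
    j=$((j + 1))
  done
  if ! healthy; then
    echo "health check failed: rolling back"   # failure_action: rollback
    exit 1
  fi
  # sleep 10   # delay: 10s between batches (skipped in the sketch)
  i=$((i + PARALLELISM))
done
echo "all $REPLICAS replicas updated to v2"
```

On a real Swarm you never write this loop yourself: docker service update --image myapp:v2 <service> triggers it, honoring the update_config in the stack file.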
LEVEL 3
Blue-Green Deployment
Setup:
version: '3.9'

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro

  blue:
    image: myapp:v1
    deploy:
      replicas: 3

  green:
    image: myapp:v2
    deploy:
      replicas: 3
nginx.conf (pointing to blue):
upstream backend {
    server blue:8000;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
    }
}
Deployment process:
- Blue (v1) is live, serving traffic
- Deploy green (v2) alongside
- Test green internally
- Update nginx config to point to green
- Reload nginx: docker exec nginx nginx -s reload
- Green is now live
- Monitor for issues
- If good, remove blue
- If bad, switch back to blue
Instant rollback by switching nginx config back.
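The switch itself is a one-line config edit plus a reload. A minimal sketch, run against a throwaway copy of the upstream block so the edit is visible; the commented docker exec line assumes the container is named nginx, as in the compose file above.

```shell
# Write a throwaway copy of the upstream block, then flip blue -> green.
cat > nginx-demo.conf <<'EOF'
upstream backend {
    server blue:8000;
}
EOF

# The cutover: point the upstream at green
sed -i 's/server blue:8000;/server green:8000;/' nginx-demo.conf

grep 'server green:8000;' nginx-demo.conf

# Apply without dropping in-flight requests (against the real container):
# docker exec nginx nginx -s reload
```

Rolling back is the same edit in reverse: flip the config back to blue and reload again.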
LEVEL 4
Canary Deployments
Gradual traffic shift:
upstream backend {
    server blue:8000 weight=90;    # 90% to old version
    server green:8000 weight=10;   # 10% to new version
}
Process:
- Deploy v2 to a small percentage of traffic (10%)
- Monitor metrics: errors, latency, user feedback
- If good, increase to 25%
- If still good, increase to 50%
- Then 75%
- Finally 100%
- Remove old version
If issues detected at any stage, rollback immediately.
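The staged rollout can be scripted by regenerating the upstream block at each step. A sketch, with write_weights as a hypothetical helper; the percentages mirror the stages above, and the final stage drops blue from the pool entirely rather than setting its weight to zero.

```shell
# Regenerate the upstream block for each canary stage.
write_weights() {
  cat > upstream-demo.conf <<EOF
upstream backend {
    server blue:8000 weight=$((100 - $1));
    server green:8000 weight=$1;
}
EOF
}

for green_pct in 10 25 50 75; do
  write_weights "$green_pct"
  # docker exec nginx nginx -s reload   # apply this stage
  # ...watch error rate and latency before moving on...
done

# Final stage: 100% to green, blue removed from the pool
cat > upstream-demo.conf <<'EOF'
upstream backend {
    server green:8000;
}
EOF
```

Each reload takes effect without dropping in-flight requests, so the whole ramp from 10% to 100% happens with zero downtime.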
LEVEL 5
Database Migrations
The hard part: Schema changes.
Bad approach:
# Stop app
docker compose down
# Run migration
docker compose run app python manage.py migrate
# Start app
docker compose up
Downtime while migration runs.
Good approach: Backward-compatible migrations
Step 1: Add new column (nullable)
ALTER TABLE users ADD COLUMN email VARCHAR(255);
Old code doesn’t know about email, but it’s nullable so old code still works.
Step 2: Deploy new app version
New code uses email column when present, falls back to old behavior if not.
Step 3: Backfill data
UPDATE users SET email = username || '@example.com' WHERE email IS NULL;
Step 4: Make column non-nullable (if needed)
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
Zero downtime because old and new code work with both schemas.
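The four steps can be replayed end to end against a throwaway SQLite database (the sqlite3 CLI is assumed available). SQLite has no ALTER COLUMN ... SET NOT NULL, so step 4 stays as the Postgres statement in a comment.

```shell
# Replay the expand/contract sequence against a throwaway database.
db=migration-demo.sqlite
rm -f "$db"

sqlite3 "$db" "CREATE TABLE users (username TEXT NOT NULL);"
sqlite3 "$db" "INSERT INTO users (username) VALUES ('alice'), ('bob');"

# Step 1: add the new column, nullable, so old code keeps working
sqlite3 "$db" "ALTER TABLE users ADD COLUMN email VARCHAR(255);"

# (Step 2 is the app deploy: new code writes email, tolerates NULLs)

# Step 3: backfill rows written before the new code shipped
sqlite3 "$db" "UPDATE users SET email = username || '@example.com' WHERE email IS NULL;"

sqlite3 "$db" "SELECT username, email FROM users;"

# Step 4 (Postgres syntax; run only after the backfill completes):
# ALTER TABLE users ALTER COLUMN email SET NOT NULL;
```

At every point in the sequence, whichever app version is running finds a schema it understands.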