High Availability and Scaling
LEVEL 0
The Problem
You have one instance of your app running. It’s handling requests fine.
Then:
- Traffic spikes 10x during a sale
- The container crashes due to a bug
- The host machine dies
Your app is down. Users get errors. Revenue is lost.
Single instances are single points of failure.
LEVEL 1
The Concept — The Restaurant Analogy
Imagine a restaurant.
Single chef (no HA):
- One chef cooks all orders
- If chef gets sick, restaurant closes
- During dinner rush, orders pile up, customers wait 2 hours
Multiple chefs (HA + scaling):
- Three chefs share the work
- If one chef gets sick, others cover
- During rush, hire temporary chefs
- Orders processed faster
High availability = Multiple instances, automatic failover
Scaling = Adjust capacity based on load
LEVEL 2
The Mechanics — Running Multiple Replicas
Docker Compose (simple scaling):
docker compose up --scale app=3
Creates 3 instances of the app service.
Requirements:
- No fixed container_name
- Don’t map to fixed host ports
services:
  app:
    image: myapp
    # No container_name
    # No ports: "8000:8000"
    expose:
      - "8000"  # Expose to other services, not to host
    deploy:
      replicas: 3
Load balancer in front:
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - app

  app:
    image: myapp
    deploy:
      replicas: 3
    expose:
      - "8000"
nginx.conf:
upstream app_servers {
    server app:8000;  # Docker DNS resolves "app" to all replica IPs
}

server {
    listen 80;

    location / {
        proxy_pass http://app_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
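With the load balancer in place, one way to sanity-check the distribution is to hit it repeatedly and tally which replica answered. A minimal Python sketch, assuming a hypothetical endpoint on the app that returns the container’s hostname (fetch stands in for that HTTP call):

```python
from collections import Counter
from itertools import cycle

def tally_backends(fetch, requests=30):
    """Call fetch() repeatedly and count responses per backend.

    fetch is any zero-argument callable returning a backend
    identifier (e.g. the container hostname from a /whoami route).
    """
    return dict(Counter(fetch() for _ in range(requests)))

# Stand-in fetch that cycles like an ideal round-robin balancer:
fake_lb = cycle(["app-1", "app-2", "app-3"])
print(tally_backends(lambda: next(fake_lb), requests=9))
# {'app-1': 3, 'app-2': 3, 'app-3': 3}
```

Against a real deployment you would replace the lambda with an HTTP GET to the nginx port; a roughly even tally confirms requests are reaching all replicas.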
LEVEL 3
Health Checks and Auto-Recovery
Restart policies:
services:
  app:
    image: myapp
    restart: unless-stopped  # Auto-restart if crashes
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]  # curl must exist in the image
      interval: 30s
      timeout: 3s
      retries: 3
    deploy:
      replicas: 3
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
What happens:
- Container crashes
- Docker detects exit
- Waits 5 seconds
- Restarts container
- Health check runs
- If healthy, mark as ready
- If fails 3 times, give up
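That recovery loop is easy to picture in code. A small Python sketch of the same on-failure policy, with run standing in for launching the container (this is the logic, not how Docker is implemented):

```python
import time

def supervise(run, delay=5, max_attempts=3):
    """Re-run a failing task, mirroring the on-failure restart policy.

    run() should return True when the process stays healthy and
    False (or raise) when it exits with an error.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if run():
                return True          # healthy: stop retrying
        except Exception:
            pass                     # treat an exception as a crash
        if attempt < max_attempts:
            time.sleep(delay)        # wait before restarting
    return False                     # gave up after max_attempts

# A task that crashes twice, then comes up healthy on attempt 3:
state = {"runs": 0}
def flaky():
    state["runs"] += 1
    return state["runs"] >= 3

print(supervise(flaky, delay=0))  # True
```

delay and max_attempts correspond directly to the restart_policy fields above.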
LEVEL 4
Resource Limits and Autoscaling
Set resource limits:
services:
  app:
    image: myapp
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '1'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M
Prevents:
- One container using all CPU
- Memory leak crashing the host
- Resource starvation
Horizontal Pod Autoscaling (Kubernetes):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Automatically scales from 2 to 10 replicas based on CPU usage.
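The controller’s scaling decision follows the documented formula: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped between minReplicas and maxReplicas. In Python:

```python
import math

def desired_replicas(current, utilization, target=70,
                     min_replicas=2, max_replicas=10):
    """Core HPA formula: scale in proportion to metric pressure,
    then clamp to the configured bounds."""
    desired = math.ceil(current * utilization / target)
    return max(min_replicas, min(max_replicas, desired))

# 3 replicas averaging 140% CPU against a 70% target -> double up:
print(desired_replicas(3, 140))  # 6
# Load falls to 20%: scale down, but never below minReplicas:
print(desired_replicas(6, 20))   # 2
```

The real controller adds stabilization windows and tolerances to avoid flapping, but the proportional formula is the heart of it.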
LEVEL 5
Distributed Systems Considerations
Session management:
With multiple replicas, users might hit different instances. Session data must be shared.
Solutions:
- Stateless apps: store the session in a signed token (JWT) or client-side cookie
- Shared session store: Redis, Memcached
- Sticky sessions: the load balancer pins each user to one instance (not recommended; breaks failover and skews load)
# Shared session store
from flask import Flask
from flask_session import Session
import redis

app = Flask(__name__)
app.config['SESSION_TYPE'] = 'redis'
app.config['SESSION_REDIS'] = redis.from_url('redis://cache:6379')  # 'cache' = Redis service name
Session(app)
Database connection pooling:
Each replica needs DB connections. Pool them:
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    'postgresql://user:password@db:5432/myapp',  # credentials are placeholders
    poolclass=QueuePool,  # SQLAlchemy's default pool, shown explicitly
    pool_size=10,         # connections kept open per replica
    max_overflow=20       # extra connections allowed during bursts
)
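Pool sizes multiply across replicas: with pool_size=10 and max_overflow=20, three replicas can open up to 3 × (10 + 20) = 90 connections, which has to fit under the database’s max_connections (100 by default in PostgreSQL). A small helper to check the arithmetic (illustrative, not part of SQLAlchemy):

```python
def max_db_connections(replicas, pool_size, max_overflow, reserved=5):
    """Worst-case connections the app tier can open, plus headroom
    reserved for admin and monitoring sessions."""
    return replicas * (pool_size + max_overflow) + reserved

# The config above at 3 replicas, against Postgres' default of 100:
print(max_db_connections(3, pool_size=10, max_overflow=20))  # 95
```

Run this check before raising replica counts: autoscaling the app tier without resizing the pools (or the database) is a common way to exhaust connections.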