High Availability and Scaling
LEVEL 0
The Problem
You have one instance of your app running. It’s handling requests fine.
Then:
- Traffic spikes 10x during a sale
- The container crashes due to a bug
- The host machine dies
Your app is down. Users get errors. Revenue is lost.
Single instances are single points of failure.
LEVEL 1
The Concept — The Restaurant Analogy
Imagine a restaurant.
Single chef (no HA):
- One chef cooks all orders
- If chef gets sick, restaurant closes
- During dinner rush, orders pile up, customers wait 2 hours
Multiple chefs (HA + scaling):
- Three chefs share the work
- If one chef gets sick, others cover
- During rush, hire temporary chefs
- Orders processed faster
High availability = Multiple instances, automatic failover
Scaling = Adjust capacity based on load
LEVEL 2
The Mechanics — Running Multiple Replicas
Docker Compose (simple scaling):
docker compose up --scale app=3
Creates 3 instances of the app service.
Requirements:
- No fixed container_name
- Don’t map to fixed host ports
services:
  app:
    image: myapp
    # No container_name
    # No ports: "8000:8000"
    expose:
      - "8000"  # Expose to other services, not to host
    deploy:
      replicas: 3
Load balancer in front:
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - app

  app:
    image: myapp
    deploy:
      replicas: 3
    expose:
      - "8000"
nginx.conf:
upstream app_servers {
    server app:8000;  # Docker DNS resolves "app" to all replica IPs
}

server {
    listen 80;

    location / {
        proxy_pass http://app_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
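With the load balancer in place, one way to sanity-check the distribution is to hit it repeatedly and tally which replica answered. A minimal Python sketch, assuming a hypothetical endpoint on the app that returns the container’s hostname (fetch stands in for that HTTP call):

```python
from collections import Counter
from itertools import cycle

def tally_backends(fetch, requests=30):
    """Call fetch() repeatedly and count responses per backend.

    fetch is any zero-argument callable returning a backend
    identifier (e.g. the container hostname from a /whoami route).
    """
    return dict(Counter(fetch() for _ in range(requests)))

# Stand-in fetch that cycles like an ideal round-robin balancer:
fake_lb = cycle(["app-1", "app-2", "app-3"])
print(tally_backends(lambda: next(fake_lb), requests=9))
# {'app-1': 3, 'app-2': 3, 'app-3': 3}
```

Against a real deployment you would replace the lambda with an HTTP GET to the nginx port; a roughly even tally confirms requests are reaching all replicas.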
LEVEL 3
Health Checks and Auto-Recovery
Restart policies:
services:
  app:
    image: myapp
    restart: unless-stopped  # Auto-restart if crashes
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]  # curl must exist in the image
      interval: 30s
      timeout: 3s
      retries: 3
    deploy:
      replicas: 3
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
What happens:
- Container crashes
- Docker detects exit
- Waits 5 seconds
- Restarts container
- Health check runs
- If healthy, mark as ready
- If fails 3 times, give up
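That recovery loop is easy to picture in code. A small Python sketch of the same on-failure policy, with run standing in for launching the container (this is the logic, not how Docker is implemented):

```python
import time

def supervise(run, delay=5, max_attempts=3):
    """Re-run a failing task, mirroring the on-failure restart policy.

    run() should return True when the process stays healthy and
    False (or raise) when it exits with an error.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if run():
                return True          # healthy: stop retrying
        except Exception:
            pass                     # treat an exception as a crash
        if attempt < max_attempts:
            time.sleep(delay)        # wait before restarting
    return False                     # gave up after max_attempts

# A task that crashes twice, then comes up healthy on attempt 3:
state = {"runs": 0}
def flaky():
    state["runs"] += 1
    return state["runs"] >= 3

print(supervise(flaky, delay=0))  # True
```

delay and max_attempts correspond directly to the restart_policy fields above.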
LEVEL 4
Resource Limits and Autoscaling
Set resource limits:
services:
  app:
    image: myapp
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '1'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M
Prevents:
- One container using all CPU
- Memory leak crashing the host
- Resource starvation
Horizontal Pod Autoscaling (Kubernetes):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Automatically scales from 2 to 10 replicas based on CPU usage.
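The controller’s scaling decision follows the documented formula: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped between minReplicas and maxReplicas. In Python:

```python
import math

def desired_replicas(current, utilization, target=70,
                     min_replicas=2, max_replicas=10):
    """Core HPA formula: scale in proportion to metric pressure,
    then clamp to the configured bounds."""
    desired = math.ceil(current * utilization / target)
    return max(min_replicas, min(max_replicas, desired))

# 3 replicas averaging 140% CPU against a 70% target -> double up:
print(desired_replicas(3, 140))  # 6
# Load falls to 20%: scale down, but never below minReplicas:
print(desired_replicas(6, 20))   # 2
```

The real controller adds stabilization windows and tolerances to avoid flapping, but the proportional formula is the heart of it.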
LEVEL 5
Distributed Systems Considerations
Session management:
With multiple replicas, users might hit different instances. Session data must be shared.
Solutions:
- Stateless apps: store the session in a signed token (JWT) or client-side cookie
- Shared session store: Redis, Memcached
- Sticky sessions: the load balancer pins each user to one instance (not recommended; breaks failover and skews load)
# Shared session store
from flask import Flask
from flask_session import Session
import redis

app = Flask(__name__)
app.config['SESSION_TYPE'] = 'redis'
app.config['SESSION_REDIS'] = redis.from_url('redis://cache:6379')  # 'cache' = Redis service name
Session(app)
Database connection pooling:
Each replica needs DB connections. Pool them:
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    'postgresql://user:password@db:5432/myapp',  # credentials are placeholders
    poolclass=QueuePool,  # SQLAlchemy's default pool, shown explicitly
    pool_size=10,         # connections kept open per replica
    max_overflow=20       # extra connections allowed during bursts
)
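Pool sizes multiply across replicas: with pool_size=10 and max_overflow=20, three replicas can open up to 3 × (10 + 20) = 90 connections, which has to fit under the database’s max_connections (100 by default in PostgreSQL). A small helper to check the arithmetic (illustrative, not part of SQLAlchemy):

```python
def max_db_connections(replicas, pool_size, max_overflow, reserved=5):
    """Worst-case connections the app tier can open, plus headroom
    reserved for admin and monitoring sessions."""
    return replicas * (pool_size + max_overflow) + reserved

# The config above at 3 replicas, against Postgres' default of 100:
print(max_db_connections(3, pool_size=10, max_overflow=20))  # 95
```

Run this check before raising replica counts: autoscaling the app tier without resizing the pools (or the database) is a common way to exhaust connections.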