The challenge

What you are building

Business scenario

TaskFlow Inc. needs a task-management product that supports team collaboration, task assignment, deadlines, priorities, comments, and analytics. You are the lead DevOps engineer responsible for moving it from local development into a hardened production release.

Your role

You are making architecture and reliability decisions, not following a scripted checklist. You need to build, verify, secure, and iterate as you would on a real project.

System design

Architecture overview

The target is a production-style stack with a reverse proxy, API, frontend, background worker, database, cache, and monitoring services. Requests should enter through the ingress proxy and flow across service boundaries with explicit dependencies, and every service should expose enough signals for health checks, observability, and controlled deployment.

  • nginx (80/443): reverse proxy, SSL termination, static file entry point
  • api (8000): REST API and business logic service
  • frontend (3000): web application for team task workflows
  • worker (no exposed port): async jobs for notifications, reporting, and maintenance tasks
  • PostgreSQL (5432): durable datastore and transactional source of truth
  • Redis (6379): cache and background queue broker
  • Prometheus (9090): metrics collection and scrape target
  • Grafana (3001): dashboards and operational visibility
  • Loki (3100): log aggregation and queryable operational stream
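
The "explicit dependencies" idea can be sketched as a small dependency graph that yields a valid startup order. The ports below come from the service list above; the dependency edges, the lowercase service names, and the helper names `startup_order` and `port_conflicts` are assumptions made for illustration, not part of the course material. Your own wiring may differ.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Ports taken from the service list; the worker exposes none.
PORTS = {
    "nginx": (80, 443),
    "api": (8000,),
    "frontend": (3000,),
    "worker": (),
    "postgres": (5432,),
    "redis": (6379,),
    "prometheus": (9090,),
    "grafana": (3001,),
    "loki": (3100,),
}

# ASSUMPTION: illustrative edges only. Each service maps to the set of
# services that should be up before it starts.
DEPENDS_ON = {
    "postgres": set(),
    "redis": set(),
    "loki": set(),
    "api": {"postgres", "redis"},
    "worker": {"postgres", "redis"},
    "frontend": {"api"},
    "nginx": {"api", "frontend"},
    "prometheus": {"api"},
    "grafana": {"prometheus", "loki"},
}


def startup_order(depends_on):
    """Return service names so every dependency precedes its dependents.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


def port_conflicts(ports):
    """Return {port: [services]} for any port claimed by more than one service."""
    first_owner = {}
    conflicts = {}
    for service, service_ports in ports.items():
        for port in service_ports:
            if port in first_owner:
                conflicts.setdefault(port, [first_owner[port]]).append(service)
            else:
                first_owner[port] = service
    return conflicts


if __name__ == "__main__":
    print("startup order:", startup_order(DEPENDS_ON))
    print("port conflicts:", port_conflicts(PORTS))
```

Writing the graph down before touching any orchestration file makes circular dependencies visible early, since the sorter raises an error on a cycle instead of producing an order.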

Before you start

Prerequisites

Confirm these checkpoints are complete before beginning Phase 1.

  • Module 0: understand containers vs VMs and why containers exist
  • Module 1: Docker installed and `docker run hello-world` works
  • Modules 2-6: write Dockerfiles, understand layering, caching, multi-stage builds, .dockerignore
  • Module 7: production image optimization and security-conscious image design
  • Modules 8-10: persistence strategies and container networking
  • Module 11: Compose orchestration and service wiring
  • Module 12: health checks, logging, and operational debugging
  • Module 13: security practices and image risk management
  • Module 14: scaling, zero-downtime deployment, observability patterns

7 phases

Phase sequence

Complete each phase before advancing. Each phase has explicit success criteria.

  1. Local Development Environment (M2, M3, M4, M5, M6, M8, M11)

    Run a baseline multi-container TaskFlow stack locally with repeatable startup behavior and end-to-end task management working through the UI.

  2. Health Checks and Resilience (M12)

    Introduce lifecycle awareness, dependency sequencing, and graceful service shutdown behavior.

  3. Background Workers and Job Queue (M11)

    Add asynchronous processing so expensive tasks do not block user-facing flows.

  4. Logging and Debugging (M12)

    Make failures diagnosable by implementing structured JSON logs with request tracing and a queryable log aggregation service.

  5. Security Hardening (M13)

    Turn the stack into a hardened, least-privilege runtime with policy and secrets discipline.

  6. Monitoring and Observability (M14)

    Add dashboards, metrics, and alerting to make production behavior measurable.

  7. Production Deployment and CI/CD (M14)

    Deliver a reproducible production release path with automated validation, rolling updates, and a documented rollback procedure.

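
As one concrete picture of what Phase 4 asks for, here is a minimal sketch of structured JSON logging with a per-request ID, using only the Python standard library. The names `request_id_var`, `JsonFormatter`, `build_logger`, `handle_request`, and the `taskflow.api` logger are hypothetical, and the field names in the JSON payload are an assumption. The course does not prescribe a language or a log schema.

```python
import json
import logging
import time
import uuid
from contextvars import ContextVar

# Hypothetical context variable holding the ID of the request being served.
request_id_var = ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    converter = time.gmtime  # emit timestamps in UTC

    def format(self, record):
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": request_id_var.get(),
        }
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name="taskflow.api", stream=None):
    """Create a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, incoming_id=None):
    """Simulate one request: reuse the caller's ID or generate a new one."""
    token = request_id_var.set(incoming_id or uuid.uuid4().hex)
    try:
        logger.info("task created")
        return request_id_var.get()
    finally:
        request_id_var.reset(token)  # do not leak the ID into the next request


if __name__ == "__main__":
    handle_request(build_logger(), incoming_id="demo-123")
```

Because Docker captures each container's stdout and stderr, emitting one JSON object per line keeps the application unaware of the aggregation backend. Whatever agent you choose to ship logs to Loki can then filter on `request_id` to follow a single request across services.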