Phase 4 of 7

Logging and Debugging

Make failures diagnosable by implementing structured JSON logs with request tracing and a queryable log aggregation service.

Overview

Move all service output from unstructured text to structured JSON with consistent fields: timestamp, level, service, requestId, and message. Add Loki and Promtail to the compose stack so logs from every service are queryable in one place. Implement log-level filtering and strip sensitive values before they reach the log stream. After this phase, you should be able to trace a single HTTP request across the API, worker, and database layers using a shared requestId.
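
As a rough illustration of the target log shape, the sketch below builds one JSON line per entry with the five fields named above and binds a logger to a single request so the API and the worker emit the same requestId. It uses no logging library; formatEntry, requestLogger, and the "req-123" value are hypothetical names chosen for this example, and a real service would more likely get the same result from pino or winston.

```typescript
// Hypothetical helper names; a real service would likely wrap pino or winston.
type Level = "debug" | "info" | "warn" | "error";

interface LogEntry {
  timestamp: string; // ISO 8601, UTC
  level: Level;
  service: string;
  requestId: string;
  message: string;
}

// Serializes one entry as a single-line JSON string.
function formatEntry(
  service: string,
  requestId: string,
  level: Level,
  message: string,
  now: Date = new Date(),
): string {
  const entry: LogEntry = {
    timestamp: now.toISOString(),
    level,
    service,
    requestId,
    message,
  };
  return JSON.stringify(entry);
}

// Returns a logger bound to one service and one request, so every line
// written while handling that request carries the same requestId.
function requestLogger(
  service: string,
  requestId: string,
  write: (line: string) => void = console.log,
) {
  return {
    info: (message: string) =>
      write(formatEntry(service, requestId, "info", message)),
    error: (message: string) =>
      write(formatEntry(service, requestId, "error", message)),
  };
}

// The API and the worker log against the same (assumed) requestId.
const apiLog = requestLogger("api", "req-123");
const workerLog = requestLogger("worker", "req-123");
apiLog.info("task created");
workerLog.info("task picked up");
```

How the requestId travels from the API to the worker (an HTTP header, a job payload field, or similar) is left open here; the sketch only shows that both sides must log it under the same field name.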

What to build

Deliverables

Advance only when these outputs exist in your code or compose definitions.

  1. Adopt structured JSON logging in the API and worker (use a library like pino or winston)
  2. Include requestId, service name, log level, and timestamp in every log entry
  3. Add Loki (the grafana/loki image, a 2.9.x tag) and Promtail (grafana/promtail) to docker-compose.dev.yml, with Promtail scraping all service containers
  4. Implement log level configuration via LOG_LEVEL environment variable (debug, info, warn, error)
  5. Add a log filter that removes any field whose name contains 'password', 'secret', or 'token' (matched case-insensitively)
  6. Add a debug script (scripts/debug.sh) that opens a shell, tails logs, or checks resource usage for a named service
  7. Document how to query logs in Loki and how to trace a request by requestId
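
The redaction filter in deliverable 5 could look roughly like the sketch below. It assumes the filter matches on field names (not values), ignores case, and descends into nested objects and arrays; redact and isSensitiveKey are hypothetical names, and the nesting behavior is an assumption rather than something this phase specifies.

```typescript
// Substrings that mark a field name as sensitive (from deliverable 5).
const SENSITIVE_PARTS = ["password", "secret", "token"];

// True when the field name contains any sensitive substring, ignoring case.
function isSensitiveKey(key: string): boolean {
  const lower = key.toLowerCase();
  return SENSITIVE_PARTS.some((part) => lower.indexOf(part) !== -1);
}

// Returns a copy of a JSON-like value with sensitive fields removed.
// Assumption: input is plain JSON data (objects, arrays, primitives);
// class instances such as Date are not handled specially.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(redact);
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const cleaned: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      if (isSensitiveKey(key)) {
        continue; // drop the whole field, name and value
      }
      cleaned[key] = redact(source[key]);
    }
    return cleaned;
  }
  return value; // primitives pass through unchanged
}

// Example: only non-sensitive fields survive.
console.log(
  JSON.stringify(redact({ user: "ada", password: "hunter2", auth: { apiToken: "abc", ok: true } })),
);
```

Because this works on field names only, a sensitive value interpolated into the message string would still reach the log stream, so the grep check in the verification steps remains worth running.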

Done when

Success criteria

These are acceptance indicators, not a checklist to start from.

  • Every API log line is valid JSON with timestamp, level, service, requestId, and message fields
  • Loki is running and Promtail is successfully shipping logs from all containers
  • You can query logs by service name and log level in the Loki HTTP API or UI
  • A requestId set on an incoming API request appears in all related log entries for that request
  • Running docker compose logs api | grep -i password returns no results
  • Log retention is configured for at least 30 days in the Loki config
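
The first criterion can be checked mechanically. The sketch below parses one log line and reports which of the five required fields are absent; checkLogLine and LineCheck are hypothetical names, and the check only tests for presence, not for a particular type or timestamp format.

```typescript
// Field names required on every log line (from the first success criterion).
const REQUIRED_FIELDS = ["timestamp", "level", "service", "requestId", "message"];

interface LineCheck {
  validJson: boolean; // false when the line does not parse as JSON
  missing: string[];  // required fields that are absent or null
}

// Checks a single log line against the required field list.
function checkLogLine(line: string): LineCheck {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return { validJson: false, missing: REQUIRED_FIELDS.slice() };
  }
  // Valid JSON that is not an object (a number, string, array) has no fields.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return { validJson: true, missing: REQUIRED_FIELDS.slice() };
  }
  const record = parsed as Record<string, unknown>;
  const missing = REQUIRED_FIELDS.filter(
    (field) => record[field] === undefined || record[field] === null,
  );
  return { validJson: true, missing };
}

// Example: a line without requestId is reported as incomplete.
console.log(checkLogLine('{"timestamp":"t","level":"info","service":"api","message":"m"}'));
```

When feeding it output from docker compose logs, pass --no-log-prefix so each line starts with the JSON object rather than the container name.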

Verification

Testing and validation

Run these in order. Confirm each result before moving to the next step.

  1. docker compose -f docker-compose.dev.yml up -d

    — confirm all services including Loki start

  2. for i in {1..10}; do curl -s http://localhost:8000/api/tasks > /dev/null; done

    — generate log traffic

  3. curl -s http://localhost:3100/loki/api/v1/labels

    — expect a JSON response listing available label names including 'service'

  4. curl -sG http://localhost:3100/loki/api/v1/query_range --data-urlencode 'query={service="api"}' | jq -r '.data.result[0].values[0][1]'

    — should return a structured JSON log entry

  5. Pick a requestId from a recent API log entry and search for it across all services to confirm propagation, for example with the LogQL query {service=~".+"} |= "<requestId>"

  6. docker compose logs api | grep -i password

    — should return no output

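  7. Set LOG_LEVEL=debug in .env, recreate the api service with docker compose -f docker-compose.dev.yml up -d api (a plain restart does not pick up environment changes), and confirm debug-level entries appear in docker compose logs api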
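
For reference, the level filtering exercised in step 7 can be sketched as a threshold comparison over the four levels listed in the deliverables. parseLevel and shouldLog are hypothetical names, and falling back to info for a missing or unrecognized LOG_LEVEL is an assumption; pino and winston both provide their own level option that would normally replace this code.

```typescript
// Levels in increasing severity, as listed in the deliverables.
const LEVELS = ["debug", "info", "warn", "error"] as const;
type LogLevel = (typeof LEVELS)[number];

// Normalizes a raw LOG_LEVEL value; unknown or missing values use the fallback.
function parseLevel(raw: string | undefined, fallback: LogLevel = "info"): LogLevel {
  const normalized = (raw ?? "").trim().toLowerCase();
  for (const level of LEVELS) {
    if (level === normalized) {
      return level;
    }
  }
  return fallback;
}

// An entry is written only if it is at least as severe as the threshold.
function shouldLog(entryLevel: LogLevel, threshold: LogLevel): boolean {
  return LEVELS.indexOf(entryLevel) >= LEVELS.indexOf(threshold);
}

// Example: with a threshold of "warn", info entries are dropped.
const threshold = parseLevel("warn");
console.log(shouldLog("info", threshold), shouldLog("error", threshold));
```

In a Node service the threshold would typically come from parseLevel(process.env.LOG_LEVEL), read once at startup.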