Phase 6 of 7
Monitoring and Observability
Add dashboards, metrics, and alerting to make production behavior measurable.
Overview
Extend the stack with telemetry so behavior, latency, queue depth, and system health are visible and actionable.
What to build
Deliverables
Advance only when these outputs exist in your code or compose definitions.
- Add Prometheus metrics endpoints and service instrumentation
- Create Prometheus scrape configuration for core services
- Run Grafana with dashboard structure for latency, queues, DB, and workers
- Define Alertmanager routing and sample alert rules
- Optionally wire tracing with Jaeger
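As a rough illustration of the first deliverable, the sketch below exposes a `/metrics` endpoint in the Prometheus text exposition format using only the Python standard library. The metric name `app_requests_total`, the port `8000`, and the `Counter` helper are hypothetical choices made for this example; a real service would more likely use an official Prometheus client library than hand-render the format.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Counter:
    """A thread-safe, monotonically increasing counter (illustrative helper)."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0.0
        self._lock = threading.Lock()

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        with self._lock:
            self._value += amount

    def render(self):
        """Render the counter in the Prometheus text exposition format."""
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {value}\n"
        )


# Hypothetical metric name, chosen only for this sketch.
REQUESTS = Counter("app_requests_total", "Total HTTP requests handled.")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = REQUESTS.render().encode("utf-8")
            content_type = "text/plain; version=0.0.4; charset=utf-8"
        else:
            # Every non-metrics request counts as application traffic.
            REQUESTS.inc()
            body = b"ok\n"
            content_type = "text/plain; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving the app and its /metrics endpoint (port is an assumption)."""
    ThreadingHTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` starts the server. A Prometheus scrape job pointed at that host and port should then see the counter rise as requests arrive.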
Done when
Success criteria
These are acceptance indicators, not a checklist to start from.
- Prometheus successfully scrapes key service metrics
- Grafana reflects live request, latency, and worker data
- Key alerts fire on health and threshold breaches
- Alert notifications route to a configured endpoint
- Metric history is retained for at least 30 days
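The first criterion can also be checked from a script instead of the Prometheus UI. The sketch below queries the Prometheus HTTP API at `/api/v1/targets` and lists any active target whose health is not `up`. It assumes Prometheus is reachable at `http://localhost:9090`, as in the verification steps. The function names are hypothetical.

```python
import json
import urllib.request


def down_targets(payload):
    """Return (job, scrape_url) pairs for active targets that are not healthy."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("labels", {}).get("job", "unknown"), t.get("scrapeUrl", ""))
        for t in targets
        if t.get("health") != "up"
    ]


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the target list from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def report(base_url="http://localhost:9090"):
    """Print failing targets and return True when every target is up."""
    failing = down_targets(fetch_targets(base_url))
    for job, url in failing:
        print(f"DOWN job={job} url={url}")
    return not failing
```

Running `report()` after the stack is up gives a quick pass or fail signal. An empty result from `down_targets` only means that no discovered target is failing, so it is still worth confirming that the expected jobs appear at all.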
Verification
Testing and validation
Run these in order. Confirm each result before moving to the next step.
- `docker compose -f docker-compose.dev.yml up -d`
- `open http://localhost:3001`
- `open http://localhost:9090` and confirm target discovery
- Generate synthetic load and check dashboard movement
- `docker compose stop api` and confirm alerting behavior
- Optionally open `http://localhost:16686` if Jaeger is wired
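For the synthetic load step, one possible approach is a small script that fires concurrent requests at the API and summarizes the outcome. This is a minimal sketch using only the standard library. The target URL is an assumption, since the API's address is not given here, and `timed_get` and `run_load` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url, timeout=5.0):
    """Issue one GET and return (status_code_or_None, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        status = None  # connection refused, timeout, DNS failure, etc.
    return status, time.perf_counter() - start


def run_load(url, total=200, concurrency=10, fetch=timed_get):
    """Send `total` requests using `concurrency` worker threads and summarize results."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch, [url] * total))
    ok = sum(1 for status, _ in results if status is not None and 200 <= status < 400)
    latencies = [elapsed for _, elapsed in results]
    return {
        "sent": total,
        "ok": ok,
        "failed": total - ok,
        "max_latency_s": max(latencies) if latencies else 0.0,
    }
```

A call such as `run_load("http://localhost:8000/")`, with the URL replaced by wherever the API actually listens, should move the request and latency panels in Grafana. Running it again after `docker compose stop api` should report failures while the alerts fire.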