Monitoring
System monitoring for the stack — Prometheus, Grafana and Loki — with
everything as code in the repo. There is nothing to click together: scrape
config, datasources and dashboards are files under example/monitoring/,
provisioned read-only into the containers. Editing a dashboard means editing
JSON in a pull request, not exporting from a UI.
docker compose --profile monitoring up -d

Grafana → http://localhost:3001 (admin / admin). The profile is opt-in, so
day-to-day development doesn't pay for it.
The pieces
| Piece | What | As code in |
|---|---|---|
| @repo/metrics | prom-client registry: process metrics + one HTTP histogram | packages/metrics/ |
| /api/metrics | the exposition endpoint; 404 unless METRICS_ENABLED is set | example/src/routes/api.metrics.ts |
| request middleware | times every HTTP request (pages, server fns, API routes) | example/src/start.ts |
| Prometheus | scrapes app, Postgres (exporter), MinIO, Grafana, Loki, itself | monitoring/prometheus/prometheus.yml |
| Grafana | datasources + dashboard provisioned read-only; phone-home disabled | monitoring/grafana/ |
| Loki + Promtail | every container's logs, labeled by compose service | monitoring/promtail/promtail.yml |
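To make the request middleware row concrete, here is a minimal sketch of a timing wrapper. It is not the code in example/src/start.ts: `withTiming`, `TimedRequest`, `TimedResponse` and the injectable `now` clock are hypothetical names chosen for illustration, and the observer argument stands in for something like `observeHttpRequest`.

```typescript
// Hypothetical shapes; the real middleware works on the framework's own request type.
interface TimedRequest { method: string; pathname: string }
interface TimedResponse { status: number }
interface Observation { method: string; pathname: string; status: number; durationMs: number }

type Handler = (req: TimedRequest) => Promise<TimedResponse>;

/**
 * Wrap a handler so every call is timed and reported exactly once,
 * whether the handler resolves or throws.
 */
function withTiming(
  handler: Handler,
  observe: (o: Observation) => void,
  now: () => number = () => Date.now(), // injectable clock, handy for tests
): Handler {
  return async (req) => {
    const start = now();
    let status = 500; // assumption: a thrown error is reported as a 500
    try {
      const res = await handler(req);
      status = res.status;
      return res;
    } finally {
      // Runs on both the success and the error path; errors still propagate.
      observe({ method: req.method, pathname: req.pathname, status, durationMs: now() - start });
    }
  };
}
```

Reporting from a `finally` block means failed requests show up in the histogram too, which is what a 5xx-share panel needs.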
The app side
Metrics follow the same graceful-degradation contract as jobs, email and AI:
without METRICS_ENABLED nothing is collected and the endpoint answers 404.
With it set, the request middleware feeds one histogram with normalized
routes. Id-ish path segments collapse to :id so label cardinality stays
bounded:
/** Record one finished HTTP request. No-op while metrics are disabled. */
export function observeHttpRequest(opts: {
  method: string;
  pathname: string;
  status: number;
  durationMs: number;
}): void {
  if (!isMetricsEnabled()) return;
  const labels = {
    method: opts.method.toUpperCase(),
    route: normalizeRoute(opts.pathname),
    status: String(opts.status),
  };
  const state = getState();
  state.httpRequests.inc(labels);
  state.httpDuration.observe(labels, opts.durationMs / 1000);
}

/** The Prometheus exposition document, i.e. what `/api/metrics` returns. */
export async function metricsText(): Promise<{ contentType: string; body: string }> {
  const state = getState();
  return { contentType: state.registry.contentType, body: await state.registry.metrics() };
}

The metrics endpoint is for the internal network. In production, don't route
/api/metrics through the public proxy; scrape it from inside.
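The route normalization that keeps label cardinality bounded could look roughly like the sketch below. The actual `normalizeRoute` in packages/metrics/ is not shown here and may use different rules; `normalizeRouteSketch` and the `ID_PATTERNS` list are assumptions made for illustration.

```typescript
// Hypothetical rules for what counts as an "id-ish" path segment.
const ID_PATTERNS: RegExp[] = [
  /^\d+$/, // purely numeric ids
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i, // UUIDs
  /^[0-9a-f]{16,}$/i, // long hex strings such as hashes
];

/** Replace every id-ish segment of a pathname with the literal ":id". */
function normalizeRouteSketch(pathname: string): string {
  const segments = pathname.split("/").map((segment) =>
    ID_PATTERNS.some((pattern) => pattern.test(segment)) ? ":id" : segment,
  );
  // An empty pathname falls back to the root route.
  return segments.join("/") || "/";
}

// Two different users end up on the same label value:
const a = normalizeRouteSketch("/users/42/posts");
const b = normalizeRouteSketch("/users/1337/posts");
console.log(a === b, a);
```

Without this step every distinct id would create its own time series per method and status, which is the unbounded growth the `route` label is meant to avoid.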
Dashboards as code
monitoring/grafana/provisioning/dashboards/dashboards.yml points Grafana at
the JSON files in monitoring/grafana/dashboards/. The provider sets
allowUiUpdates: false, so the repo is the source of truth and the UI is a viewer.
The bundled Toolkit – Overview dashboard covers:
- HTTP request rate by route, p50/p95 latency, 5xx share — from the histogram
- app CPU, resident memory, event loop lag — from prom-client's defaults
- Postgres backends + transaction rate — from postgres-exporter
- MinIO bucket usage, scrape-target health
- a live container-log panel — from Loki
Datasources get fixed uids (prometheus, loki) in provisioning, so
dashboard JSON can reference them stably across environments.
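Because the uids are fixed, a pull request that edits dashboard JSON can be checked mechanically. The sketch below is one possible check, not something the repo is known to ship: `unknownDatasourceUids` is a hypothetical helper, and it assumes panels carry a `datasource: { type, uid }` object and that row panels may nest child panels under `panels`.

```typescript
// Minimal subset of the dashboard JSON shape this sketch relies on (an assumption).
interface DatasourceRef { type?: string; uid?: string }
interface Panel { title?: string; datasource?: DatasourceRef; panels?: Panel[] }
interface Dashboard { panels?: Panel[] }

// The two uids fixed in provisioning.
const KNOWN_UIDS: readonly string[] = ["prometheus", "loki"];

/** Return "panel title: uid" for every panel that references an unknown datasource uid. */
function unknownDatasourceUids(
  dashboard: Dashboard,
  known: readonly string[] = KNOWN_UIDS,
): string[] {
  const problems: string[] = [];
  const visit = (panels: Panel[] = []): void => {
    for (const panel of panels) {
      const uid = panel.datasource?.uid;
      if (uid !== undefined && known.indexOf(uid) === -1) {
        problems.push(`${panel.title ?? "(untitled)"}: ${uid}`);
      }
      visit(panel.panels); // descend into nested panels, if any
    }
  };
  visit(dashboard.panels);
  return problems;
}
```

A check like this could run over each file in monitoring/grafana/dashboards/ after `JSON.parse`, failing the build when the returned list is not empty.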
Air-gapped notes
Grafana's phone-home is off in compose (GF_ANALYTICS_*, news feed). All
images are pinned to explicit tags; drift on the latest tag is how the local
Hatchet engine once broke. Everything is self-hosted; nothing in the profile needs the internet
once images are mirrored into an internal registry.
Try it
/sandbox/metrics shows the live exposition the profile scrapes — block spec
in e2e/sandbox/metrics.spec.ts. The spec asserts both the process metrics
and that the HTTP histogram saw the sign-in requests it just made.
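The kind of assertion that spec makes can be sketched without a browser: parse the exposition text and look for samples by name and label. This is an illustration only, not the contents of e2e/sandbox/metrics.spec.ts. `parseSamples` and `findSamples` are hypothetical helpers, the histogram name `http_request_duration_seconds` is an assumed name, and the parser only handles the simple lines of the text format.

```typescript
interface Sample { name: string; labels: Record<string, string>; value: number }

/** Parse sample lines of a Prometheus text exposition; comments and blank lines are skipped. */
function parseSamples(text: string): Sample[] {
  const samples: Sample[] = [];
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue; // "# HELP" / "# TYPE" lines
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/.exec(line);
    if (!match) continue;
    const labels: Record<string, string> = {};
    const labelPattern = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g;
    let m: RegExpExecArray | null;
    while ((m = labelPattern.exec(match[2] ?? "")) !== null) {
      labels[m[1]] = m[2]; // escape sequences are kept as written
    }
    // Number() yields NaN for special values such as "+Inf"; fine for a sketch.
    samples.push({ name: match[1], labels, value: Number(match[3]) });
  }
  return samples;
}

/** Samples with the given name whose labels include every given key/value pair. */
function findSamples(
  samples: Sample[],
  name: string,
  labels: Record<string, string> = {},
): Sample[] {
  return samples.filter(
    (s) => s.name === name && Object.keys(labels).every((k) => s.labels[k] === labels[k]),
  );
}
```

With the body of /api/metrics in `text`, a check such as `findSamples(parseSamples(text), "http_request_duration_seconds_count", { route: "/sign-in" })` would return the counters recorded for the sign-in route, assuming that metric name.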