Monitoring

System monitoring for the stack — Prometheus, Grafana and Loki — with everything as code in the repo. There is nothing to click together: scrape config, datasources and dashboards are files under example/monitoring/, provisioned read-only into the containers. Editing a dashboard means editing JSON in a pull request, not exporting from a UI.

docker compose --profile monitoring up -d

Grafana → http://localhost:3001 (admin / admin). The profile is opt-in — day-to-day development doesn't pay for it.

The pieces

| Piece | What | As code in |
| --- | --- | --- |
| `@repo/metrics` | prom-client registry: process metrics + one HTTP histogram | `packages/metrics/` |
| `/api/metrics` | the exposition endpoint, 404 unless `METRICS_ENABLED` is set | `example/src/routes/api.metrics.ts` |
| request middleware | times every HTTP request (pages, server fns, API routes) | `example/src/start.ts` |
| Prometheus | scrapes app, Postgres (exporter), MinIO, Grafana, Loki, itself | `monitoring/prometheus/prometheus.yml` |
| Grafana | datasources + dashboard provisioned read-only; phone-home disabled | `monitoring/grafana/` |
| Loki + Promtail | every container's logs, labeled by compose service | `monitoring/promtail/promtail.yml` |
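
As an illustration of the request middleware row, here is a minimal sketch of a timing wrapper. The `Handler` and `Observation` types and the `withTiming` name are hypothetical; the real code in example/src/start.ts may be structured differently. The point is only that the observation happens in a `finally`, so failed requests are timed too.

```typescript
// Hypothetical shapes for illustration; not the framework's real types.
type Observation = { method: string; pathname: string; status: number; durationMs: number };
type Handler = (req: { method: string; pathname: string }) => Promise<{ status: number }>;

// Wrap a handler so every request is timed and reported, including failures.
function withTiming(handler: Handler, observe: (o: Observation) => void): Handler {
  return async (req) => {
    const started = Date.now();
    let status = 500; // assumption: a thrown error is reported as a 500
    try {
      const res = await handler(req);
      status = res.status;
      return res;
    } finally {
      observe({
        method: req.method,
        pathname: req.pathname,
        status,
        durationMs: Date.now() - started,
      });
    }
  };
}
```

In the app, the `observe` callback would presumably be `observeHttpRequest` from `@repo/metrics`.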

The app side

The same graceful-degradation contract as jobs, email and AI: without METRICS_ENABLED nothing is collected and the endpoint answers 404. With it set, the request middleware feeds one histogram with normalized routes — id-ish path segments collapse to :id so label cardinality stays bounded:

/** Record one finished HTTP request. No-op while metrics are disabled. */
export function observeHttpRequest(opts: {
  method: string;
  pathname: string;
  status: number;
  durationMs: number;
}): void {
  if (!isMetricsEnabled()) return;
  const labels = {
    method: opts.method.toUpperCase(),
    route: normalizeRoute(opts.pathname),
    status: String(opts.status),
  };
  const state = getState();
  state.httpRequests.inc(labels);
  state.httpDuration.observe(labels, opts.durationMs / 1000);
}
 
/** The Prometheus exposition document — what `/api/metrics` returns. */
export async function metricsText(): Promise<{ contentType: string; body: string }> {
  const state = getState();
  return { contentType: state.registry.contentType, body: await state.registry.metrics() };
}
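
`normalizeRoute` itself is not shown above. A minimal sketch of the idea, assuming "id-ish" means an all-digit segment, a UUID, or a long hex string (the real rules in `@repo/metrics` may differ; `normalizeRouteSketch` and `ID_PATTERNS` are names made up for this example):

```typescript
// Assumption: these three patterns define what counts as an id-like segment.
const ID_PATTERNS: RegExp[] = [
  /^\d+$/, // numeric ids
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i, // UUIDs
  /^[0-9a-f]{16,}$/i, // long hex tokens
];

// Replace every id-like path segment with ":id" so that /users/1 and
// /users/2 land in the same `route` label value.
function normalizeRouteSketch(pathname: string): string {
  return pathname
    .split("/")
    .map((segment) => (ID_PATTERNS.some((p) => p.test(segment)) ? ":id" : segment))
    .join("/");
}
```

With these rules, `/users/42/posts/550e8400-e29b-41d4-a716-446655440000` becomes `/users/:id/posts/:id`, while `/api/metrics` is left untouched.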

The metrics endpoint is for the internal network. In production, don't route /api/metrics through the public proxy — scrape it from inside.
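
The 404-unless-enabled contract of the endpoint can be sketched as below. This is not the actual route file: `handleMetricsSketch` and `MetricsResponse` are hypothetical, `enabled` stands in for the METRICS_ENABLED check, and `render` stands in for `metricsText()`.

```typescript
type MetricsResponse = { status: number; headers: Record<string, string>; body: string };

// Hypothetical handler: answers 404 while metrics are disabled, otherwise
// returns the exposition document with the registry's content type.
async function handleMetricsSketch(
  enabled: boolean,
  render: () => Promise<{ contentType: string; body: string }>,
): Promise<MetricsResponse> {
  if (!enabled) {
    return { status: 404, headers: {}, body: "Not Found" };
  }
  const { contentType, body } = await render();
  return { status: 200, headers: { "content-type": contentType }, body };
}
```

Answering 404 rather than 403 keeps the disabled endpoint indistinguishable from a route that does not exist.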

Dashboards as code

monitoring/grafana/provisioning/dashboards/dashboards.yml points Grafana at the JSON files in monitoring/grafana/dashboards/. The provider sets allowUiUpdates: false: the repo is the source of truth, the UI is a viewer. The bundled Toolkit – Overview dashboard covers:

- HTTP request rate by route, p50/p95 latency, 5xx share (from the histogram)
- app CPU, resident memory, event loop lag (from prom-client's defaults)
- Postgres backends + transaction rate (from postgres-exporter)
- MinIO bucket usage, scrape-target health
- a live container-log panel (from Loki)

Datasources get fixed uids (prometheus, loki) in provisioning, so dashboard JSON can reference them stably across environments.
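
To show what the fixed uid buys, here is a sketch of the panel fragment a dashboard JSON file might contain, written as a TypeScript object for readability. The dashboards in the repo are plain JSON; `latencyPanel` is a hypothetical helper, and the metric name `http_request_duration_seconds` is an assumption about how the histogram is named.

```typescript
// Fixed datasource uid from provisioning.
const PROMETHEUS_UID = "prometheus";

// Assumption: the HTTP histogram is exposed as http_request_duration_seconds.
function latencyPanel(quantile: number, metric = "http_request_duration_seconds") {
  return {
    type: "timeseries",
    title: `p${Math.round(quantile * 100)} latency by route`,
    // Referencing the uid, not a generated id, keeps the JSON portable.
    datasource: { type: "prometheus", uid: PROMETHEUS_UID },
    targets: [
      {
        expr: `histogram_quantile(${quantile}, sum by (le, route) (rate(${metric}_bucket[5m])))`,
        legendFormat: "{{route}}",
      },
    ],
  };
}

console.log(JSON.stringify(latencyPanel(0.95), null, 2));
```

Because the `datasource.uid` is the same in every environment, the same file provisions cleanly on a laptop and in production.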

Air-gapped notes

Grafana's phone-home is off in compose (the GF_ANALYTICS_* settings and the news feed). All images are pinned to explicit versions: a drifting latest tag is how the local Hatchet engine once broke. Everything is self-hosted; nothing in the profile needs the internet once the images are mirrored into an internal registry.
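
One way to keep the pinning rule honest is a small check over image references. The sketch below is not part of the repo; `isPinnedImage` is a hypothetical helper that treats a reference as pinned when it carries a digest or an explicit tag other than latest.

```typescript
// Hypothetical check: true when the image reference cannot drift silently.
function isPinnedImage(ref: string): boolean {
  if (ref.includes("@sha256:")) return true; // pinned by digest
  // Only look for a tag after the last "/", so a registry port such as
  // registry.internal:5000 is not mistaken for a tag.
  const lastSegment = ref.slice(ref.lastIndexOf("/") + 1);
  const colon = lastSegment.indexOf(":");
  if (colon === -1) return false; // no tag means an implicit latest
  const tag = lastSegment.slice(colon + 1);
  return tag.length > 0 && tag !== "latest";
}
```

A check like this could run over the image entries of the compose file in CI; how those entries are extracted is left out here.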

Try it

/sandbox/metrics shows the live exposition that the profile scrapes; the block's spec is e2e/sandbox/metrics.spec.ts. The spec asserts that the process metrics are present and that the HTTP histogram saw the sign-in requests it just made.
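
The kind of assertion the spec makes can be sketched as a plain text check against the exposition format. This is not the spec itself: `hasSample` is a hypothetical helper, and the histogram name and the sign-in route in the sample are assumptions. `process_cpu_user_seconds_total` is one of prom-client's default metrics.

```typescript
// True if the exposition text has at least one sample line for `name`.
function hasSample(exposition: string, name: string): boolean {
  return exposition.split("\n").some((line) => {
    if (line.startsWith("#")) return false; // skip HELP / TYPE comments
    // A sample line is the metric name followed by labels or by the value.
    return line.startsWith(name + " ") || line.startsWith(name + "{");
  });
}

// Hand-written sample for illustration, not captured from a real scrape.
const sampleExposition = [
  "# HELP process_cpu_user_seconds_total Total user CPU time spent in seconds.",
  "# TYPE process_cpu_user_seconds_total counter",
  "process_cpu_user_seconds_total 0.42",
  'http_request_duration_seconds_count{method="POST",route="/api/auth/sign-in",status="200"} 2',
].join("\n");

console.log(hasSample(sampleExposition, "process_cpu_user_seconds_total"));
console.log(hasSample(sampleExposition, "http_request_duration_seconds_count"));
```

Matching on the name plus a space or an opening brace avoids treating `http_request_duration_seconds` as present just because `http_request_duration_seconds_count` is.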