Tikab's Toolkit

Monitoring

System monitoring for the stack — Prometheus, Grafana and Loki — with everything as code in the repo. There is nothing to click together: scrape config, datasources and dashboards are files under example/monitoring/, provisioned read-only into the containers. Editing a dashboard means editing JSON in a pull request, not exporting from a UI.

docker compose --profile monitoring up -d

Grafana → http://localhost:3001 (admin / admin). The profile is opt-in — day-to-day development doesn't pay for it.

The pieces

| Piece | What | As code in |
| --- | --- | --- |
| @repo/metrics | prom-client registry: process metrics + one HTTP histogram | packages/metrics/ |
| /api/metrics | the exposition endpoint; 404 unless METRICS_ENABLED is set | example/src/routes/api.metrics.ts |
| request middleware | times every HTTP request (pages, server fns, API routes) | example/src/start.ts |
| Prometheus | scrapes app, Postgres (exporter), MinIO, Grafana, Loki, itself | monitoring/prometheus/prometheus.yml |
| Grafana | datasources + dashboard provisioned read-only; phone-home disabled | monitoring/grafana/ |
| Loki + Promtail | every container's logs, labeled by compose service | monitoring/promtail/promtail.yml |

The app side

The same graceful-degradation contract as jobs, email and AI: without METRICS_ENABLED nothing is collected and the endpoint answers 404. With it set, the request middleware feeds one histogram with normalized routes — id-ish path segments collapse to :id so label cardinality stays bounded:

/** Record one finished HTTP request. No-op while metrics are disabled. */
export function observeHttpRequest(opts: {
  method: string;
  pathname: string;
  status: number;
  durationMs: number;
}): void {
  if (!isMetricsEnabled()) return;
  const labels = {
    method: opts.method.toUpperCase(),
    route: normalizeRoute(opts.pathname),
    status: String(opts.status),
  };
  const state = getState();
  state.httpRequests.inc(labels);
  state.httpDuration.observe(labels, opts.durationMs / 1000);
}
 
/** The Prometheus exposition document — what `/api/metrics` returns. */
export async function metricsText(): Promise<{ contentType: string; body: string }> {
  const state = getState();
  return { contentType: state.registry.contentType, body: await state.registry.metrics() };
}
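
The excerpt above calls normalizeRoute without showing it. As a rough illustration of how id-ish segments might collapse to :id, here is a minimal sketch. The name normalizeRouteSketch and the three patterns (all digits, UUID, long hex string) are assumptions made for this example; the real rules in packages/metrics/ may differ.

```typescript
// Hypothetical sketch, not the toolkit's implementation: the patterns below
// are assumptions about what "id-ish" could mean.
const NUMERIC_ID = /^\d+$/;
const UUID_ID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const LONG_HEX_ID = /^[0-9a-f]{16,}$/i;

function looksLikeId(segment: string): boolean {
  return NUMERIC_ID.test(segment) || UUID_ID.test(segment) || LONG_HEX_ID.test(segment);
}

/** Collapse id-like path segments to ":id" so the route label stays low-cardinality. */
function normalizeRouteSketch(pathname: string): string {
  // Drop any query string so it never leaks into the label.
  const queryStart = pathname.indexOf("?");
  const path = queryStart === -1 ? pathname : pathname.slice(0, queryStart);

  const normalized = path
    .split("/")
    .map((segment) => (looksLikeId(segment) ? ":id" : segment))
    .join("/");

  return normalized === "" ? "/" : normalized;
}
```

With these rules, /users/42 and /users/43 both land in the /users/:id series, so the number of distinct route label values tracks the number of route shapes rather than the number of records.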

The metrics endpoint is for the internal network. In production, don't route /api/metrics through the public proxy — scrape it from inside.
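
The 404-when-disabled contract for the endpoint can be sketched independently of any web framework. Everything named here is hypothetical: metricsEnabledSketch, handleMetricsRequest and the plain response object are stand-ins for this example, and treating any non-empty METRICS_ENABLED value as "set" is an assumption. The real route lives in example/src/routes/api.metrics.ts.

```typescript
// Hypothetical sketch of the gating logic; not the toolkit's route handler.
interface MetricsResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

type Env = Record<string, string | undefined>;

/** Assumption: any non-empty value counts as "set". */
function metricsEnabledSketch(env: Env): boolean {
  const raw = env.METRICS_ENABLED;
  return raw !== undefined && raw !== "";
}

/**
 * Answer 404 while metrics are disabled; otherwise return the exposition
 * document produced by `render` (for example, the metricsText function above).
 */
async function handleMetricsRequest(
  env: Env,
  render: () => Promise<{ contentType: string; body: string }>,
): Promise<MetricsResponse> {
  if (!metricsEnabledSketch(env)) {
    return { status: 404, headers: {}, body: "Not Found" };
  }
  const { contentType, body } = await render();
  return { status: 200, headers: { "content-type": contentType }, body };
}
```

Passing the environment and the renderer in as arguments keeps the decision testable without starting a server or touching process.env.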

Dashboards as code

monitoring/grafana/provisioning/dashboards/dashboards.yml points Grafana at the JSON files in monitoring/grafana/dashboards/. The provider sets allowUiUpdates: false: the repo is the source of truth, the UI is a viewer. The bundled Toolkit – Overview dashboard covers:

  • HTTP request rate by route, p50/p95 latency, 5xx share — from the histogram
  • app CPU, resident memory, event loop lag — from prom-client's defaults
  • Postgres backends + transaction rate — from postgres-exporter
  • MinIO bucket usage, scrape-target health
  • a live container-log panel — from Loki
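
The p50/p95 panels read percentiles out of histogram buckets rather than raw timings. The dashboard's actual queries are not shown here, but the estimate Prometheus makes in histogram_quantile works roughly as follows: find the bucket containing the target rank, then interpolate linearly inside it. The sketch below is a simplified TypeScript rendering of that idea; the Bucket type, the function name and the bucket boundaries in the usage note are illustrative assumptions.

```typescript
// Simplified sketch of bucket-based quantile estimation (linear interpolation
// within the matching bucket). Not the dashboard's query.
interface Bucket {
  le: number;    // upper bound in seconds; the last bucket uses Infinity
  count: number; // cumulative count of observations <= le
}

/** Estimate quantile q (0..1) from cumulative buckets sorted by `le`. */
function estimateQuantile(q: number, buckets: Bucket[]): number {
  if (q < 0 || q > 1) return NaN;
  const total = buckets[buckets.length - 1]?.count ?? 0;
  if (total === 0) return NaN;

  const rank = q * total;
  let prevLe = 0;
  let prevCount = 0;
  for (const bucket of buckets) {
    if (bucket.count >= rank) {
      // The +Inf bucket has no upper bound: fall back to the last finite bound.
      if (bucket.le === Infinity) return prevLe;
      const inBucket = bucket.count - prevCount;
      if (inBucket <= 0) return prevLe;
      return prevLe + (bucket.le - prevLe) * ((rank - prevCount) / inBucket);
    }
    prevLe = bucket.le;
    prevCount = bucket.count;
  }
  return NaN;
}
```

For example, with cumulative counts 50, 90, 100, 100 at bounds 0.1 s, 0.5 s, 1 s and +Inf, the p95 rank of 95 falls in the 0.5 to 1 s bucket and the estimate is about 0.75 s. The result can only be as precise as the bucket layout, which is worth keeping in mind when reading the latency panels.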

Datasources get fixed uids (prometheus, loki) in provisioning, so dashboard JSON can reference them stably across environments.
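
Because the uids are fixed, a pull request check can reject dashboard JSON that references anything else, for instance a generated uid copied from a UI export. The following is a minimal sketch of such a check, not part of the toolkit. It assumes the common dashboard JSON shape where a datasource reference is an object with a uid field; the function names are hypothetical.

```typescript
// Hypothetical lint helper: walks dashboard JSON and reports datasource uids
// outside the provisioned set.
const KNOWN_UIDS = new Set<string>(["prometheus", "loki"]);

function collectDatasourceUids(node: unknown, found: Set<string> = new Set<string>()): Set<string> {
  if (Array.isArray(node)) {
    for (const item of node) collectDatasourceUids(item, found);
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === "datasource" && value !== null && typeof value === "object" && !Array.isArray(value)) {
        const uid = (value as { uid?: unknown }).uid;
        if (typeof uid === "string") found.add(uid);
      }
      // Keep walking: panels, targets and nested rows can all carry references.
      collectDatasourceUids(value, found);
    }
  }
  return found;
}

/** Sorted list of referenced uids that are not in the provisioned set. */
function unknownDatasourceUids(dashboard: unknown): string[] {
  return Array.from(collectDatasourceUids(dashboard))
    .filter((uid) => !KNOWN_UIDS.has(uid))
    .sort();
}
```

A CI step could run this over each file in monitoring/grafana/dashboards/ after JSON.parse and fail when the returned list is non-empty. If a dashboard legitimately references a built-in datasource, the allowlist would need extending.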

Air-gapped notes

Grafana's phone-home is off in compose (GF_ANALYTICS_*, news feed). All images are pinned; drift on the latest tag is how the local Hatchet engine once broke. Everything is self-hosted; nothing in the profile needs the internet once images are mirrored into an internal registry.
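
The pinning rule is easy to check mechanically. Here is a small sketch that classifies an image reference as pinned or not. It assumes "pinned" means an explicit tag other than latest, or a sha256 digest; the function name and the example references in the usage note are illustrative, not taken from the compose file.

```typescript
// Hypothetical helper: decides whether a container image reference is pinned.
function isPinnedImage(image: string): boolean {
  // A digest reference is always pinned.
  if (image.includes("@sha256:")) return true;

  // Only look for a tag after the last "/", so a registry port such as
  // "registry.internal:5000/..." is not mistaken for one.
  const name = image.slice(image.lastIndexOf("/") + 1);
  const colon = name.indexOf(":");
  if (colon === -1) return false; // no tag means an implicit "latest"

  const tag = name.slice(colon + 1);
  return tag !== "" && tag !== "latest";
}
```

For instance, an illustrative reference like registry.internal:5000/prom/prometheus:v2.53.0 counts as pinned, while registry.internal:5000/prom/prometheus and grafana/grafana:latest do not.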

Try it

/sandbox/metrics shows the live exposition the profile scrapes — block spec in e2e/sandbox/metrics.spec.ts. The spec asserts both the process metrics and that the HTTP histogram saw the sign-in requests it just made.