Observability Setup

How to wire each observability surface, and how each one degrades when its credential is unset.

Audit status: Implemented

Status: Implemented. Every adapter exists in src/services/infraHealth/* (Phase 1 audit §6); each is timeboxed at 3 s and degrades to unconfigured instead of crashing.

Surfaces, at a glance

| Surface | Code path | Required env | Behaviour when unset |
| --- | --- | --- | --- |
| Sentry (errors) | src/observability/sentry.ts | SENTRY_DSN | No-op; structured warn at boot |
| Structured logs (Pino) | src/utils/logger.ts | none | Always on; JSON to stdout |
| Request correlation | src/utils/requestContext.ts, src/middleware/requestId.ts | none | Always on; AsyncLocalStorage propagates request_id / tenant_id / user_id / route |
| Railway adapter | src/services/infraHealth/railwayAdapter.ts | RAILWAY_API_TOKEN, RAILWAY_PROJECT_ID | unconfigured status surfaced in /super-admin/infrastructure/health |
| Vercel adapter | src/services/infraHealth/vercelAdapter.ts | VERCEL_API_TOKEN, VERCEL_TEAM_ID | unconfigured |
| Neon adapter | src/services/infraHealth/neonAdapter.ts | NEON_API_KEY, NEON_PROJECT_ID | Falls back to pg_stat_activity query |
| Health endpoints | src/app.ts, src/routes/health.ts | none | /health always-on; /api/v1/health is boot-envelope-gated |

1. Sentry

Sentry is initialised BEFORE any framework module loads. src/index.ts uses zero static imports and dynamically requires ./observability/sentry first, then ./bootApp. This is intentional — Sentry must wrap process exceptions, not just request errors.
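
The load order described above can be sketched as follows. This is an illustration only, not the contents of src/index.ts: the `boot` function and its injectable `load` parameter are hypothetical names, introduced so the ordering can be exercised without the real modules.

```typescript
// Hypothetical sketch of an entry point with no static imports.
// `load` stands in for a dynamic require/import; the real src/index.ts may differ.
type Loader = (modulePath: string) => Promise<unknown>;

async function boot(load: Loader): Promise<string[]> {
  const loaded: string[] = [];

  // 1. Error reporting first, so exceptions thrown while the framework
  //    modules are loading can already be captured.
  await load("./observability/sentry");
  loaded.push("./observability/sentry");

  // 2. Only then load the application bootstrap.
  await load("./bootApp");
  loaded.push("./bootApp");

  return loaded;
}

// In a real entry point the loader would be a dynamic import, for example:
//   boot((p) => import(p));
```

Keeping the entry file free of static imports matters because static imports are resolved before any statement in the file runs, which would let framework code load ahead of the error reporter.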

src/observability/sentry.ts's beforeSend strips:

  • event.request.{data,cookies}
  • Authorization, Cookie, X-API-Key headers
  • event.user.{email,username,ip_address}
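
A scrubber covering the three groups above might look like the sketch below. The `ScrubbableEvent` interface and the `scrubEvent` name are assumptions made for this example; the real event type in the Sentry SDK is richer, and the actual code in src/observability/sentry.ts may be structured differently.

```typescript
// Local stand-in for the parts of an error event that get scrubbed.
// This shape is an assumption for the sketch, not the SDK's own type.
interface ScrubbableEvent {
  request?: {
    url?: string;
    data?: unknown;
    cookies?: Record<string, string>;
    headers?: Record<string, string>;
  };
  user?: {
    id?: string;
    email?: string;
    username?: string;
    ip_address?: string;
  };
}

const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "x-api-key"]);

function scrubEvent(event: ScrubbableEvent): ScrubbableEvent {
  if (event.request) {
    // Request body and cookies are dropped entirely.
    delete event.request.data;
    delete event.request.cookies;

    if (event.request.headers) {
      for (const name of Object.keys(event.request.headers)) {
        // Header names are case-insensitive, so compare in lower case.
        if (SENSITIVE_HEADERS.has(name.toLowerCase())) {
          delete event.request.headers[name];
        }
      }
    }
  }

  if (event.user) {
    // Keep any opaque id, drop the personally identifying fields.
    delete event.user.email;
    delete event.user.username;
    delete event.user.ip_address;
  }

  return event;
}
```

With the Sentry JavaScript SDK, a function of this kind would be supplied as the `beforeSend` option to `Sentry.init`, so it runs on every event before it leaves the process.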

To enable: set SENTRY_DSN and redeploy. To disable: leave SENTRY_DSN unset.

2. Structured logs

Pino emits JSON on stdout. Every line carries the correlation fields propagated through AsyncLocalStorage: request_id, tenant_id, user_id, and route.

Redaction list (src/utils/logger.ts:55-63) covers password, token, secret, authorization, cookie, apiKey at top-level, nested, and metadata.* paths.
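
One way to generate a path list of this shape is sketched below. The helper name `buildRedactPaths` and the exact path spellings are assumptions; this page does not show how src/utils/logger.ts writes its paths, so treat the output as illustrative.

```typescript
// Keys named in the redaction list above.
const SENSITIVE_KEYS = ["password", "token", "secret", "authorization", "cookie", "apiKey"];

// Expand each key into three path shapes (assumed spellings):
//   "password"            top-level field
//   "*.password"          field one level below any top-level key
//   "metadata.*.password" field one level below any key inside metadata
function buildRedactPaths(keys: readonly string[]): string[] {
  const paths: string[] = [];
  for (const key of keys) {
    paths.push(key, `*.${key}`, `metadata.*.${key}`);
  }
  return paths;
}

const redactPaths = buildRedactPaths(SENSITIVE_KEYS);

// With Pino, a list like this is passed through the `redact` option, e.g.:
//   pino({ redact: { paths: redactPaths, censor: "[REDACTED]" } })
```

Generating the paths from one key list keeps the three levels in step, so adding a new sensitive key cannot cover one level and miss another.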

Shipping logs to your aggregator:

  • Railway: logs are captured automatically; no setup. Forward to Datadog / Better Stack / etc. via Railway's log drains.
  • Vercel: frontend logs live in Vercel's dashboard; ship via Vercel Log Drains.
  • Self-hosted Docker: set logging.driver (see docker.md). For ELK, use gelf or fluentd.
  • Kubernetes: standard container-log harvesting (Fluent Bit / Vector → Loki / Elasticsearch).

3. Request correlation

Every HTTP request gets a request_id (header x-request-id, generated if absent). The ID propagates into:

  • Every Pino log line for the request
  • Every audit_log row written during the request
  • The Sentry breadcrumb tags

This makes a single request_id the join key across logs, audit, and error reporting. Operators can search audit_log by request_id directly (src/migrations/036_observability.sql adds the partial index).
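
The propagation mechanism can be sketched with Node's built-in AsyncLocalStorage. The function names here (`resolveRequestId`, `runWithRequestContext`, `currentRequestId`) are hypothetical and are not taken from src/utils/requestContext.ts or src/middleware/requestId.ts.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

interface RequestContext {
  request_id: string;
  tenant_id?: string;
  user_id?: string;
  route?: string;
}

const storage = new AsyncLocalStorage<RequestContext>();

// Reuse the caller's x-request-id when present, otherwise generate one.
function resolveRequestId(headers: Record<string, string | undefined>): string {
  const incoming = headers["x-request-id"];
  return incoming !== undefined && incoming.trim() !== "" ? incoming : randomUUID();
}

// Run `handler` with the context bound. Code called from the handler,
// including code that runs after an await, sees the same store.
function runWithRequestContext<T>(
  headers: Record<string, string | undefined>,
  handler: () => T,
): T {
  const context: RequestContext = { request_id: resolveRequestId(headers) };
  return storage.run(context, handler);
}

// What a logger, audit writer, or error reporter would call to tag its output.
function currentRequestId(): string | undefined {
  return storage.getStore()?.request_id;
}
```

Because the store travels with the async call chain, the logger and the audit writer do not need the ID passed to them as an argument; each reads it at the point of writing.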

4. Infra-health adapters

Aggregator at src/services/infraHealth/index.ts calls each adapter under Promise.allSettled so a slow vendor cannot block the operator surface. Each adapter wraps its HTTP call in a 3 s AbortController-backed timeout.
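
The pattern described above (fan out under Promise.allSettled, each call bounded by an AbortController timeout) can be sketched as follows. The `Adapter` shape, the `withTimeout` and `collectHealth` names, and the `"error"` status string are assumptions for this example, not the contents of src/services/infraHealth/index.ts.

```typescript
interface Adapter {
  name: string;
  // Resolves to a status string; should pass `signal` on to its HTTP call.
  check: (signal: AbortSignal) => Promise<string>;
}

interface AdapterResult {
  name: string;
  status: string;
  detail?: string;
}

// Reject after `ms` milliseconds and abort the underlying work.
function withTimeout<T>(run: (signal: AbortSignal) => Promise<T>, ms: number): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`timed out after ${ms} ms`));
    }, ms);
    run(controller.signal).then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Query every adapter in parallel; one slow or failing vendor never
// prevents the others from reporting.
async function collectHealth(adapters: Adapter[], timeoutMs = 3000): Promise<AdapterResult[]> {
  const settled = await Promise.allSettled(
    adapters.map((adapter) => withTimeout((signal) => adapter.check(signal), timeoutMs)),
  );
  return settled.map((outcome, i) =>
    outcome.status === "fulfilled"
      ? { name: adapters[i].name, status: outcome.value }
      : { name: adapters[i].name, status: "error", detail: String(outcome.reason) },
  );
}
```

Promise.allSettled is the important choice here: Promise.all would reject the whole aggregate on the first failing adapter, whereas allSettled always yields one outcome per adapter.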

The output is exposed at /api/v1/super-admin/infrastructure/health and rendered in /super-admin/infrastructure/health (mounted as a sibling route BEFORE the broader /super-admin mount so the stricter RBAC gate wins — see ../audits/phase1-readiness-audit.md §1).

Configuring each adapter

# Railway visibility (service status, recent deploy state)
RAILWAY_API_TOKEN=<token from railway.app/account/tokens>
RAILWAY_PROJECT_ID=<project id from railway.app/dashboard>

# Vercel visibility (deployment state)
VERCEL_API_TOKEN=<token from vercel.com/account/tokens>
VERCEL_TEAM_ID=<team id, optional for personal accounts>

# Neon visibility (compute state, branch list, storage)
NEON_API_KEY=<key from neon.tech/app/settings/api-keys>
NEON_PROJECT_ID=<project id from neon.tech/app/projects>

The adapters use each token for read-only calls; none mutates infrastructure.
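
A check that turns the variables above into a per-adapter configured/unconfigured answer might look like this sketch. The `ADAPTER_ENV` table and the `describeAdapterConfig` name are hypothetical. Treating VERCEL_TEAM_ID as optional follows the comment in the block above and is an assumption about how the real adapter behaves.

```typescript
type Env = Record<string, string | undefined>;

interface AdapterEnvSpec {
  name: string;
  required: string[];
  optional?: string[];
}

// Variable names are from the configuration block above.
// Assumption: VERCEL_TEAM_ID is optional (personal accounts have no team).
const ADAPTER_ENV: AdapterEnvSpec[] = [
  { name: "railway", required: ["RAILWAY_API_TOKEN", "RAILWAY_PROJECT_ID"] },
  { name: "vercel", required: ["VERCEL_API_TOKEN"], optional: ["VERCEL_TEAM_ID"] },
  { name: "neon", required: ["NEON_API_KEY", "NEON_PROJECT_ID"] },
];

interface AdapterConfigReport {
  name: string;
  configured: boolean;
  missing: string[];
}

// Report which adapters have every required variable set to a non-blank value.
function describeAdapterConfig(env: Env): AdapterConfigReport[] {
  return ADAPTER_ENV.map((spec) => {
    const missing = spec.required.filter((key) => {
      const value = env[key];
      return value === undefined || value.trim() === "";
    });
    return { name: spec.name, configured: missing.length === 0, missing };
  });
}

// At boot this would typically be called as describeAdapterConfig(process.env).
```

Listing the missing variable names, rather than returning a bare boolean, lets the operator surface say exactly which credential to add.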

5. Alerts to actually configure

See ../monitoring-guide.md for the full table. The minimum viable alert set:

| Signal | Threshold | Severity | Action / note |
| --- | --- | --- | --- |
| /api/v1/health non-200 | for >2 min | P1 | Page on-call |
| /api/v1/health data.rls != "ok" | any single sample | P1 | RLS drift; cross-tenant isolation may be compromised |
| 5xx rate | > 1% over 5 min | P2 | Inspect Railway logs |
| Founder alert duplicates | any session alerted >1 time | P2 | One-shot UPDATE has regressed |
| Boot loop | > 3 restarts in 10 min | P1 | Inspect logs for "Production environment validation failed" |
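
The two /api/v1/health rules can be expressed as a small evaluator over probe samples, sketched below. The `HealthSample` shape and the `evaluateHealth` name are hypothetical. In practice these rules would normally live in the alerting tool's own rule language, so this is only a way to make the thresholds concrete.

```typescript
interface HealthSample {
  at: number;         // probe time, epoch milliseconds
  httpStatus: number; // status code returned by /api/v1/health
  rls?: string;       // value of data.rls, when a body was returned
}

interface Alert {
  severity: "P1";
  reason: string;
}

const TWO_MINUTES_MS = 2 * 60 * 1000;

function evaluateHealth(samples: HealthSample[]): Alert[] {
  const alerts: Alert[] = [];
  const ordered = [...samples].sort((a, b) => a.at - b.at);

  // Rule 1: any single sample whose data.rls is present and not "ok".
  if (ordered.some((s) => s.rls !== undefined && s.rls !== "ok")) {
    alerts.push({ severity: "P1", reason: "RLS drift" });
  }

  // Rule 2: the current unbroken run of non-200 responses spans more than 2 min.
  let runStart: number | undefined;
  for (const s of ordered) {
    if (s.httpStatus !== 200) {
      if (runStart === undefined) runStart = s.at;
    } else {
      runStart = undefined; // a healthy sample resets the run
    }
  }
  const last = ordered[ordered.length - 1];
  if (runStart !== undefined && last !== undefined && last.at - runStart > TWO_MINUTES_MS) {
    alerts.push({ severity: "P1", reason: "health endpoint non-200 for more than 2 min" });
  }

  return alerts;
}
```

The asymmetry is deliberate: an outage is only paged once it persists, while an RLS mismatch fires on the first sample because a single occurrence already suggests tenant isolation may be at risk.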

6. Degraded-mode summary

When a vendor token is unset, that surface is unconfigured — not down. The /super-admin/infrastructure/health UI shows the per-surface status; operators get an at-a-glance "what's dark" without the system itself crashing.

This is the right trade-off for air-gapped or self-hosted deployments where reaching out to Railway / Vercel APIs is impossible.

Where to read more

Canonical source: docs/deployment/observability-setup.md

This page mirrors the markdown deployment hub on disk. The full markdown source includes additional code blocks, command examples, and embedded reference tables.

Hub index: /docs/deployment