How to wire each observability surface, and how each one degrades when its credential is unset.

This section is intended for: Technical Team, Management. Unauthorised access is restricted.

Audit status: Implemented

Status: Implemented. Every adapter exists in src/services/infraHealth/* (Phase 1 audit §6); each is timeboxed at 3 s and degrades to unconfigured instead of crashing.

Surfaces, at a glance

Surface	Code path	Required env	Behaviour when unset
Sentry (errors)	`src/observability/sentry.ts`	`SENTRY_DSN`	No-op; structured warn at boot
Structured logs (Pino)	`src/utils/logger.ts`	none	Always on; JSON to stdout
Request correlation	`src/utils/requestContext.ts`, `src/middleware/requestId.ts`	none	Always on; AsyncLocalStorage propagates `request_id` / `tenant_id` / `user_id` / `route`
Railway adapter	`src/services/infraHealth/railwayAdapter.ts`	`RAILWAY_API_TOKEN`, `RAILWAY_PROJECT_ID`	`unconfigured` status surfaced in `/super-admin/infrastructure/health`
Vercel adapter	`src/services/infraHealth/vercelAdapter.ts`	`VERCEL_API_TOKEN`, `VERCEL_TEAM_ID`	`unconfigured`
Neon adapter	`src/services/infraHealth/neonAdapter.ts`	`NEON_API_KEY`, `NEON_PROJECT_ID`	Falls back to `pg_stat_activity` query
Health endpoints	`src/app.ts`, `src/routes/health.ts`	none	`/health` always-on; `/api/v1/health` is boot-envelope-gated

1. Sentry

Sentry is initialised before any framework module loads. src/index.ts has no static imports; it dynamically requires ./observability/sentry first, then ./bootApp. The ordering is intentional: Sentry must wrap process-level exceptions, not just request errors.

src/observability/sentry.ts's beforeSend strips:

event.request.{data,cookies}
Authorization, Cookie, X-API-Key headers
event.user.{email,username,ip_address}
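As a rough illustration of the stripping rules listed above, the sketch below applies them to a plain object. It is not the code in src/observability/sentry.ts: `SketchEvent` and `scrubEvent` are hypothetical names, the event type is a cut-down stand-in for Sentry's real event type, and case-insensitive header matching is an assumption.

```typescript
// Cut-down stand-in for a Sentry event; only the fields the sketch touches.
interface SketchEvent {
  request?: {
    url?: string;
    data?: unknown;
    cookies?: Record<string, string>;
    headers?: Record<string, string>;
  };
  user?: { id?: string; email?: string; username?: string; ip_address?: string };
}

// Header names are compared in lower case (an assumption of this sketch).
const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "x-api-key"]);

function scrubEvent(event: SketchEvent): SketchEvent {
  const scrubbed: SketchEvent = { ...event };

  if (event.request) {
    const request = { ...event.request };
    // Drop the request body and cookies entirely.
    delete request.data;
    delete request.cookies;
    const headers = request.headers;
    if (headers) {
      const kept: Record<string, string> = {};
      for (const name of Object.keys(headers)) {
        if (!SENSITIVE_HEADERS.has(name.toLowerCase())) {
          kept[name] = headers[name];
        }
      }
      request.headers = kept;
    }
    scrubbed.request = request;
  }

  if (event.user) {
    const user = { ...event.user };
    // Keep the opaque user id, drop directly identifying fields.
    delete user.email;
    delete user.username;
    delete user.ip_address;
    scrubbed.user = user;
  }

  return scrubbed;
}
```

In a real setup a function of this shape would be passed as the `beforeSend` option of `Sentry.init`; the input object is copied rather than mutated so the caller's event is left untouched.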

To enable: set SENTRY_DSN and redeploy. To disable: leave SENTRY_DSN unset.

2. Structured logs

Pino emits JSON on stdout. Every line carries the correlation fields propagated through AsyncLocalStorage: request_id, tenant_id, user_id, and route.

Redaction list (src/utils/logger.ts:55-63) covers password, token, secret, authorization, cookie, apiKey at top-level, nested, and metadata.* paths.
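To make "top-level, nested, and metadata.* paths" concrete, here is a small recursive redactor. It is explicitly not Pino's mechanism (Pino redaction is path-based and configured in src/utils/logger.ts); `redactDeep`, `REDACTED_KEYS`, and the `[Redacted]` placeholder are hypothetical, and exact-match key comparison is an assumption.

```typescript
// Keys taken from the redaction list above; matching is exact (assumption).
const REDACTED_KEYS = new Set([
  "password",
  "token",
  "secret",
  "authorization",
  "cookie",
  "apiKey",
]);

const CENSOR = "[Redacted]"; // hypothetical placeholder value

// Returns a copy of `value` with every listed key replaced, at any depth.
function redactDeep(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redactDeep(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = REDACTED_KEYS.has(key) ? CENSOR : redactDeep(source[key]);
    }
    return out;
  }
  return value;
}
```

A log object such as `{ user: { token: "t" }, metadata: { apiKey: "k" } }` comes back with both values replaced, which is the coverage the redaction list is meant to guarantee.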

Shipping logs to your aggregator:

Railway: logs are captured automatically; no setup. Forward to Datadog / Better Stack / etc. via Railway's log drains.
Vercel: frontend logs live in Vercel's dashboard; ship via Vercel Log Drains.
Self-hosted Docker: set logging.driver (see docker.md). For ELK, use gelf or fluentd.
Kubernetes: standard container-log harvesting (Fluent Bit / Vector → Loki / Elasticsearch).

3. Request correlation

Every HTTP request gets a request_id (header x-request-id, generated if absent). The ID propagates into:

Every Pino log line for the request
Every audit_log row written during the request
The Sentry breadcrumb tags

This makes a single request_id the join key across logs, audit, and error reporting. Operators can search audit_log by request_id directly (src/migrations/036_observability.sql adds the partial index).
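The propagation described above can be sketched with Node's AsyncLocalStorage. This is a minimal illustration, not the contents of src/utils/requestContext.ts or src/middleware/requestId.ts: `withRequestContext`, `logLine`, and the `RequestContext` shape are hypothetical, and generating the ID with `randomUUID` is an assumption.

```typescript
import { AsyncLocalStorage } from "async_hooks";
import { randomUUID } from "crypto";

// Hypothetical context shape, using the field names from the table above.
interface RequestContext {
  request_id: string;
  tenant_id?: string;
  user_id?: string;
  route?: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Runs `handler` with a context whose request_id comes from the
// x-request-id header, or is generated when the header is absent or blank.
function withRequestContext<T>(
  headers: Record<string, string | undefined>,
  route: string,
  handler: () => T,
): T {
  const incoming = headers["x-request-id"];
  const request_id =
    incoming !== undefined && incoming.trim() !== "" ? incoming : randomUUID();
  return requestContext.run({ request_id, route }, handler);
}

// Stand-in for a logger: merges the current context into each line.
function logLine(msg: string): string {
  const ctx = requestContext.getStore();
  return JSON.stringify({ msg, ...ctx });
}
```

Anything called inside `handler`, including code after an `await`, sees the same store, so a logger, an audit writer, and an error reporter can all read `request_id` without it being passed as an argument.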

4. Infra-health adapters

Aggregator at src/services/infraHealth/index.ts calls each adapter under Promise.allSettled so a slow vendor cannot block the operator surface. Each adapter wraps its HTTP call in a 3 s AbortController-backed timeout.

The output is exposed at /api/v1/super-admin/infrastructure/health and rendered in /super-admin/infrastructure/health (mounted as a sibling route BEFORE the broader /super-admin mount so the stricter RBAC gate wins — see ../audits/phase1-readiness-audit.md §1).
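The aggregation pattern (Promise.allSettled plus a per-adapter AbortController timeout) might look roughly like the sketch below. The `Adapter` interface, `runWithTimeout`, `collectHealth`, and the two stand-in adapters are hypothetical. The `error` status used for a timed-out adapter is also an assumption; this page does not show how the real aggregator labels that case.

```typescript
type AdapterStatus = "ok" | "unconfigured" | "error";

interface Adapter {
  name: string;
  check: (signal: AbortSignal) => Promise<AdapterStatus>;
}

interface AdapterResult {
  name: string;
  status: AdapterStatus;
  detail?: string;
}

// Aborts the signal after `timeoutMs`. This only bounds the call if the task
// honours the signal, as fetch does when given `{ signal }`.
async function runWithTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await task(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}

// One failing or slow adapter never rejects the whole aggregate.
async function collectHealth(
  adapters: Adapter[],
  timeoutMs = 3000,
): Promise<AdapterResult[]> {
  const settled = await Promise.allSettled(
    adapters.map((adapter) =>
      runWithTimeout((signal) => adapter.check(signal), timeoutMs),
    ),
  );
  return settled.map((outcome, i): AdapterResult => {
    const name = adapters[i].name;
    return outcome.status === "fulfilled"
      ? { name, status: outcome.value }
      : { name, status: "error", detail: String(outcome.reason) };
  });
}

// Stand-in adapters for illustration only.
const fastAdapter: Adapter = { name: "fast", check: async () => "ok" };
const hangingAdapter: Adapter = {
  name: "hanging",
  check: (signal) =>
    new Promise<AdapterStatus>((_resolve, reject) => {
      signal.addEventListener("abort", () => reject(new Error("timed out")));
    }),
};
```

Calling `collectHealth([fastAdapter, hangingAdapter], 50)` resolves after roughly the timeout with one result per adapter, which is the property that keeps a slow vendor from blocking the operator surface.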

Configuring each adapter

# Railway visibility (service status, recent deploy state)
RAILWAY_API_TOKEN=<token from railway.app/account/tokens>
RAILWAY_PROJECT_ID=<project id from railway.app/dashboard>

# Vercel visibility (deployment state)
VERCEL_API_TOKEN=<token from vercel.com/account/tokens>
VERCEL_TEAM_ID=<team id, optional for personal accounts>

# Neon visibility (compute state, branch list, storage)
NEON_API_KEY=<key from neon.tech/app/settings/api-keys>
NEON_PROJECT_ID=<project id from neon.tech/app/projects>

Each token is read-only — none mutates infrastructure.
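A sketch of how a surface could be classified as unconfigured from these variables follows. `configStatus` and `REQUIRED_ENV` are hypothetical names; the variable lists follow the "Required env" column of the table above, and treating VERCEL_TEAM_ID as required (although personal accounts may omit it) is an assumption of this sketch.

```typescript
type Env = Record<string, string | undefined>;

// Variable names per surface, copied from the "Required env" column.
const REQUIRED_ENV: Array<{ surface: string; vars: string[] }> = [
  { surface: "railway", vars: ["RAILWAY_API_TOKEN", "RAILWAY_PROJECT_ID"] },
  { surface: "vercel", vars: ["VERCEL_API_TOKEN", "VERCEL_TEAM_ID"] },
  { surface: "neon", vars: ["NEON_API_KEY", "NEON_PROJECT_ID"] },
];

interface ConfigStatus {
  surface: string;
  status: "configured" | "unconfigured";
  missing: string[];
}

// A surface is "unconfigured" when any of its variables is unset or blank.
function configStatus(env: Env): ConfigStatus[] {
  return REQUIRED_ENV.map(({ surface, vars }): ConfigStatus => {
    const missing = vars.filter((name) => (env[name] ?? "").trim() === "");
    return {
      surface,
      status: missing.length === 0 ? "configured" : "unconfigured",
      missing,
    };
  });
}
```

In an application this would be called with `process.env`; returning the missing variable names lets the health view say which credential to set rather than only that something is dark.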

5. Alerts to actually configure

See ../monitoring-guide.md for the full table. The minimum viable alert set:

Signal	Threshold	Severity	Response / meaning
`/api/v1/health` non-200	for >2 min	P1	Page on-call
`/api/v1/health` `data.rls != "ok"`	any single sample	P1	RLS drift; cross-tenant isolation may be compromised
5xx rate	> 1% over 5 min	P2	Inspect Railway logs
Founder alert duplicates	any session alerted >1 time	P2	One-shot UPDATE has regressed
Boot loop	> 3 restarts in 10 min	P1	Inspect logs for `Production environment validation failed`
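The two `/api/v1/health` rules in the table above can be expressed as a small evaluator over probe samples. This is a sketch under assumptions: `HealthSample`, `Alert`, and `evaluateHealth` are hypothetical, timestamps are epoch milliseconds, and the RLS check is applied only to 200 responses because a failing response may carry no body.

```typescript
interface HealthSample {
  at: number; // probe time, epoch milliseconds
  httpStatus: number;
  rls?: string; // value of data.rls in the response body, if any
}

interface Alert {
  severity: "P1" | "P2";
  reason: string;
}

const NON_200_WINDOW_MS = 2 * 60 * 1000;

function evaluateHealth(samples: HealthSample[]): Alert[] {
  const alerts: Alert[] = [];
  const ordered = [...samples].sort((a, b) => a.at - b.at);
  let streakStart: number | null = null; // start of the current non-200 run
  let streakAlerted = false;

  for (const sample of ordered) {
    if (sample.httpStatus === 200) {
      streakStart = null;
      streakAlerted = false;
      // Any single sample with data.rls other than "ok" raises a P1.
      if (sample.rls !== "ok") {
        alerts.push({ severity: "P1", reason: "data.rls is not ok (RLS drift)" });
      }
      continue;
    }
    if (streakStart === null) {
      streakStart = sample.at;
    }
    // Alert once per run, when the run has lasted longer than two minutes.
    if (!streakAlerted && sample.at - streakStart > NON_200_WINDOW_MS) {
      alerts.push({ severity: "P1", reason: "health endpoint non-200 for more than 2 min" });
      streakAlerted = true;
    }
  }
  return alerts;
}
```

Most monitoring tools express these rules declaratively; the function is only meant to pin down the semantics, in particular that the RLS rule fires on one sample while the non-200 rule needs a sustained run.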

6. Degraded-mode summary

When a vendor token is unset, that surface is unconfigured — not down. The /super-admin/infrastructure/health UI shows the per-surface status; operators get an at-a-glance "what's dark" without the system itself crashing.

This is the right trade-off for air-gapped or self-hosted deployments where reaching out to Railway / Vercel APIs is impossible.

Where to read more

../monitoring-guide.md — alerts + log greps + DB checks
production-hardening.md — what's enforced at boot
troubleshooting-matrix.md — symptom → fix
In-app: /docs/deployment/observability-setup

Canonical source: docs/deployment/observability-setup.md

This page mirrors the markdown deployment hub on disk. The full markdown source includes additional code blocks, command examples, and embedded reference tables.

Hub index: /docs/deployment