Troubleshooting Matrix

Symptom → likely cause → diagnostic → fix. Verified against the canonical Railway+Vercel+Neon topology.

Audit status: Implemented

Every diagnostic command runs against a deployed instance.

Boot failures

| Symptom | Likely cause | Diagnostic | Fix |
|---|---|---|---|
| Process exits immediately; log shows Production environment validation failed | A required env var is missing or too short | Inspect the log; the message names the variable | Set the variable in Railway (or your platform); redeploy |
| [CONFIG ERROR] loadConfig failed | JWT_SECRET missing or PORT non-numeric | Same log line | Set the variable correctly |
| [CONFIG ERROR] loadDeploymentConfig failed | Bad DEPLOYMENT_MODE value or invalid CORS origin format | Inspect the log | Fix the value; redeploy |
| Boot loop, more than 3 restarts in 10 min | One of the above; the platform keeps retrying | Railway → service → Logs | Set the missing var; the loop ends on the next start |
| Server starts and /health returns 200, but /api/v1/health returns 503 with db:"down" | Postgres unreachable | psql "$DATABASE_URL" from your laptop | Check Neon status; check DATABASE_URL; check the IP allowlist |
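
The first three boot-failure rows are keyed on literal log messages, so triage can be partly mechanical. The following is a minimal sketch, not part of the deployed code: `classifyBootLog`, `BootDiagnosis`, and `BOOT_SIGNATURES` are hypothetical names introduced for illustration, and only the marker strings come from the rows above.

```typescript
// Hypothetical triage helper: maps a boot log line to a boot-failure row.
// Only the marker strings are taken from the matrix; all names are illustrative.
type BootDiagnosis = { cause: string; fix: string };

const BOOT_SIGNATURES: Array<{ marker: string; diagnosis: BootDiagnosis }> = [
  {
    marker: "Production environment validation failed",
    diagnosis: {
      cause: "A required env var is missing or too short",
      fix: "Set the variable named in the message; redeploy",
    },
  },
  {
    marker: "[CONFIG ERROR] loadConfig failed",
    diagnosis: {
      cause: "JWT_SECRET missing or PORT non-numeric",
      fix: "Set the variable correctly",
    },
  },
  {
    marker: "[CONFIG ERROR] loadDeploymentConfig failed",
    diagnosis: {
      cause: "Bad DEPLOYMENT_MODE value or invalid CORS origin format",
      fix: "Fix the value; redeploy",
    },
  },
];

// Returns the first matching row, or null when the line matches no known marker.
function classifyBootLog(line: string): BootDiagnosis | null {
  const match = BOOT_SIGNATURES.find((sig) => line.includes(sig.marker));
  return match ? match.diagnosis : null;
}
```

A `null` result means the line is not one of the known boot signatures, in which case the boot-loop and /health rows still need a manual check.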

Live failures

| Symptom | Likely cause | Diagnostic | Fix |
|---|---|---|---|
| /api/v1/health reports rls != "ok" | RLS policy drift detected | curl -s $BACKEND/api/v1/health \| jq | P1: investigate immediately. Cross-tenant isolation may be compromised. Roll back the recent schema change; check for the RLS drift detected log line |
| 5xx rate spikes | Recent deploy introduced an exception | Search logs for [ERROR]; use the Sentry release filter | Roll back per ../rollback-plan.md |
| Persistent 403 on a known-good origin | Origin not in CORS_ALLOWED_ORIGINS | curl -i -H 'Origin: https://your-domain' $BACKEND/health, then look for the CORS reject log | Add the origin to the env var; redeploy. DO NOT roll back. |
| /health responds, /api/v1/health returns 503 with boot envelope | Boot integrity gate failed at startup | Search the log for the boot envelope block; look at the first failed precondition | Fix the precondition (often DB or schema); restart |
| Founder alert email storm (same session ≥ 2 alerts) | One-shot UPDATE claim regressed | SELECT alerted_at, count(*) FROM investor_sessions WHERE alerted_at > now() - interval '1h' GROUP BY 1 HAVING count(*) > 1 | P1: roll back backend immediately |
| Founder alerts silent for a high-intent session | Resend transport unset | Search the log for founder_alert: no transport configured; check RESEND_API_KEY, ALERT_EMAIL_TO, EMAIL_FROM | Set all three; the engine releases its claim, so a config fix heals retroactively |
| Frontend can't reach backend after deploy | Wrong NEXT_PUBLIC_API_URL (build-time!) | Browser network tab → check the API request URL | Update the env var on Vercel; rebuild (not just redeploy) |
| Hydration warnings on /, /investor, or /enterprise/pilot | SSR/CSR mismatch | Browser console | Roll back frontend per ../rollback-plan.md |
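
Several rows above hinge on the combination of the two health endpoints. As a rough sketch of that reasoning, the function below turns the two status codes and the parsed body into a verdict. The response shape is an assumption: only the `db` and `rls` fields are named in this matrix, and `interpretHealth`, `HealthVerdict`, and the `"process-unreachable"` verdict are hypothetical additions for illustration.

```typescript
// Assumed body shape: only `db` and `rls` are named in the matrix.
type ApiHealthBody = { db?: string; rls?: string };

type HealthVerdict =
  | "healthy"
  | "process-unreachable" // not a matrix row; added so the function is total
  | "rls-drift-p1"
  | "db-unreachable"
  | "boot-gate-failed";

function interpretHealth(
  healthStatus: number, // HTTP status of GET /health
  apiHealthStatus: number, // HTTP status of GET /api/v1/health
  body: ApiHealthBody, // parsed JSON body of /api/v1/health
): HealthVerdict {
  if (healthStatus !== 200) return "process-unreachable";
  // RLS drift is checked first because the matrix treats it as P1.
  // Assumption: a missing `rls` field means "not reported", not "drifted".
  if (body.rls !== undefined && body.rls !== "ok") return "rls-drift-p1";
  if (apiHealthStatus === 503 && body.db === "down") return "db-unreachable";
  // Any other 503 is treated here as the boot integrity gate row.
  if (apiHealthStatus === 503) return "boot-gate-failed";
  return "healthy";
}
```

The ordering is a design choice rather than documented behaviour: putting the RLS check ahead of the 503 checks keeps a possible cross-tenant isolation problem from being masked by a simultaneous database outage.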

Performance

| Symptom | Likely cause | Diagnostic | Fix |
|---|---|---|---|
| p95 latency creeping up | DB pool saturation | SELECT count(*) FROM pg_stat_activity WHERE datname=current_database() | Increase DB_MAX_CONNECTIONS if Postgres has headroom; otherwise scale Postgres vertically |
| OOM kills on backend | Memory leak or large response body | Railway memory graph | Inspect recent endpoints with large payloads; cap body size; scale RAM vertically |
| Frequent rate-limit 429s for one tenant | Legitimate traffic above the default limit | Per-route limiter logs | Tune the per-route limiter; consider a per-tenant override |
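
The pool-saturation row amounts to comparing the pg_stat_activity count with the configured pool size and then with the server-side limit. The sketch below makes that comparison explicit. It is illustrative only: `advisePool` is a hypothetical helper, the 0.9 saturation threshold is an assumption, and "headroom" is interpreted here as DB_MAX_CONNECTIONS being below the Postgres `max_connections` setting.

```typescript
type PoolAdvice = "ok" | "raise-DB_MAX_CONNECTIONS" | "scale-postgres";

function advisePool(
  activeConnections: number, // result of the pg_stat_activity count query
  dbMaxConnections: number, // current DB_MAX_CONNECTIONS value
  postgresMaxConnections: number, // server limit, e.g. from SHOW max_connections
  saturationThreshold = 0.9, // assumed cut-off; tune for your workload
): PoolAdvice {
  const utilisation = activeConnections / dbMaxConnections;
  if (utilisation < saturationThreshold) return "ok";
  // Headroom exists only if the pool can still grow below the server limit.
  return dbMaxConnections < postgresMaxConnections
    ? "raise-DB_MAX_CONNECTIONS"
    : "scale-postgres";
}
```

Note that pg_stat_activity counts every connection to the database, not only those from one backend pool, so with several replicas the count should be compared against the combined pool size.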

Database

| Symptom | Likely cause | Diagnostic | Fix |
|---|---|---|---|
| relation "<x>" does not exist after deploy | Schema bootstrap didn't run | Check the boot log for ensureSchema lines | Restart backend (idempotent); if persistent, run ensureSchema manually |
| column "<x>" does not exist in production | Dev added a column; production schema drifted | Compare \d <table> between dev and prod | Restart backend; ensureSchema adds missing columns idempotently |
| connection terminated during bursts | Pool exhausted | pg_stat_activity count | Increase the pool, add backend replicas, or move to the PgBouncer-pooled Neon URL |
| Slow audit_log queries | Table growth without an index | Check the query plan | The migrations include indexes; verify they exist; consider audit_log_partitioning (Aspirational) |
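
The "adds missing columns idempotently" behaviour can be pictured with PostgreSQL's `ADD COLUMN IF NOT EXISTS`, which is a no-op when the column is already present. The sketch below is not the real ensureSchema, whose implementation is not shown on this page; `addMissingColumnsSql`, `quoteIdent`, and `ColumnSpec` are hypothetical names, and the statements are only generated, not executed.

```typescript
type ColumnSpec = { name: string; sqlType: string };

// Double-quote an identifier, doubling embedded quotes (standard SQL quoting).
function quoteIdent(ident: string): string {
  return '"' + ident.replace(/"/g, '""') + '"';
}

// Builds one idempotent ALTER TABLE statement per column.
// `sqlType` is interpolated as-is, so it must come from trusted code, never user input.
function addMissingColumnsSql(table: string, columns: ColumnSpec[]): string[] {
  return columns.map(
    (col) =>
      `ALTER TABLE ${quoteIdent(table)} ADD COLUMN IF NOT EXISTS ${quoteIdent(col.name)} ${col.sqlType};`,
  );
}
```

Because each statement is safe to run repeatedly, a restart that reruns the bootstrap would converge on the same schema, which is the property the restart-based fix above relies on.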

Investor / tracking surface

| Symptom | Likely cause | Diagnostic | Fix |
|---|---|---|---|
| POST /api/v1/investor/track returns a 4xx storm | Frontend payload shape changed | Browser network tab → request body | Roll back frontend; the tracking schema is in src/routes/investorTracking.ts |
| Tracking returns 413 | Body > 1 MB | Inspect the payload size | Cap client-side; do NOT raise the server limit |
| Replay deep-link missing stakeholder=… | generateReplayLink() regression | Inspect the alert email | Roll back backend; the helper is in src/services/investorResponseEngine.ts |
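
For the 413 row, the client-side cap can be as simple as measuring the serialized payload before sending it. The following is a minimal sketch under stated assumptions: `serializedSizeBytes` and `fitsTrackingLimit` are hypothetical helpers, the payload is assumed to be sent as JSON, and the 1 MB limit is taken conservatively as 1,000,000 bytes because the exact server threshold is not stated here.

```typescript
// Conservative reading of "1 MB"; the exact server limit is an assumption.
const ASSUMED_MAX_BODY_BYTES = 1000000;

// Size of the JSON body in UTF-8 bytes (not string length, which undercounts non-ASCII).
function serializedSizeBytes(payload: unknown): number {
  return new TextEncoder().encode(JSON.stringify(payload)).length;
}

// True when the payload can be sent without risking a 413 under the assumed limit.
function fitsTrackingLimit(
  payload: unknown,
  maxBytes: number = ASSUMED_MAX_BODY_BYTES,
): boolean {
  return serializedSizeBytes(payload) <= maxBytes;
}
```

A caller would check `fitsTrackingLimit(event)` before the POST and trim or drop oversized events on the client, in keeping with the guidance not to raise the server limit.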

Observability adapters

| Symptom | Likely cause | Diagnostic | Fix |
|---|---|---|---|
| /super-admin/infrastructure/health shows unconfigured for Railway | RAILWAY_API_TOKEN unset | Check env | Set the token; redeploy |
| Same for Vercel / Neon | Respective token unset | Same | Set the token |
| Adapter shows down (not unconfigured) | Vendor outage or token revoked | Vendor status page | Wait for the vendor or rotate the token |
| Sentry receives no events after deploy | SENTRY_DSN not set or release filter mismatch | Sentry → Releases | Set SENTRY_DSN; verify the release tag in events |
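
The distinction between unconfigured and down matters because the fixes differ: one is an env change, the other is a vendor or token problem. A minimal sketch of that distinction follows. `adapterStatus` is a hypothetical function, the `"ok"` state name is an assumption (only unconfigured and down are named above), and the token is passed in as a parameter because only RAILWAY_API_TOKEN is named in this matrix.

```typescript
type AdapterStatus = "unconfigured" | "down" | "ok";

function adapterStatus(
  token: string | undefined, // e.g. the value of RAILWAY_API_TOKEN
  probeSucceeded: boolean, // outcome of a vendor API call made with that token
): AdapterStatus {
  // No token at all: nothing was attempted, so this is a config gap, not an outage.
  if (token === undefined || token.trim() === "") return "unconfigured";
  // Token present but the call failed: vendor outage or revoked token.
  return probeSucceeded ? "ok" : "down";
}
```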

When to roll back vs. fix forward

See ../rollback-plan.md for the full decision matrix. Quick guide:

  • Hydration error / 5xx storm / founder-alert duplicates → roll back
  • CORS 403 from a legitimate origin → fix env, do not roll back
  • DB down → investigate Neon, do not roll back the backend
  • Resend transport missing → set the three env vars; engine heals
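
The quick guide above can also be read as a lookup table. The sketch below encodes exactly those four bullets. The `Incident` and `Action` labels and the `recommendedAction` function are hypothetical names chosen for illustration, and the full decision matrix in ../rollback-plan.md remains the authority.

```typescript
type Incident =
  | "hydration-error"
  | "5xx-storm"
  | "founder-alert-duplicates"
  | "cors-403-legit-origin"
  | "db-down"
  | "resend-transport-missing";

type Action = "roll-back" | "fix-env" | "investigate-neon" | "set-resend-env";

// One entry per case named in the quick guide; nothing else is inferred.
const QUICK_GUIDE: Record<Incident, Action> = {
  "hydration-error": "roll-back",
  "5xx-storm": "roll-back",
  "founder-alert-duplicates": "roll-back",
  "cors-403-legit-origin": "fix-env", // do not roll back
  "db-down": "investigate-neon", // do not roll back the backend
  "resend-transport-missing": "set-resend-env", // set the three env vars
};

function recommendedAction(incident: Incident): Action {
  return QUICK_GUIDE[incident];
}
```

Typing the table as `Record<Incident, Action>` makes the compiler flag any incident added to the union without a corresponding action.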

Where to read more

Canonical source: docs/deployment/troubleshooting-matrix.md

This page mirrors the markdown deployment hub on disk. The full markdown source includes additional code blocks, command examples, and embedded reference tables.

Hub index: /docs/deployment
