Scaling

This page covers how and when to scale Govula. The backend is stateless by design (every request can be served by any process), so scaling is mostly about Postgres capacity and rate-limit tuning.

Status: Implemented. The stateless backend, the configurable pool, and the per-route and per-tenant rate limiters all exist on main.

Where state actually lives

| State | Lives in | Implication for scaling |
|---|---|---|
| Sessions / refresh tokens | Postgres (auth_sessions) | No sticky sessions required; any backend can serve any request |
| Tenant context | Per-request, set inside a transaction (set_config('app.tenant_id', ..., true) in src/repositories/database.ts) | No cross-pool leak risk |
| Entitlement cache | In-process (60 s TTL, invalidated via EventEmitter) | Per-process by default. Set ENTITLEMENT_BROADCAST_ENABLED=true to enable cross-replica invalidation via Postgres LISTEN/NOTIFY (Task #8); see "Multi-process entitlements" below |
| Rate-limit counters | In-process (default) | For multi-instance, switch to Redis-backed limiter (see "Per-tenant rate limiting") |
| Audit chain | Postgres (audit_log, tenant_operation_log, operator_audit_events) | Append-only; no scaling work |
| Recording-mode events | Browser localStorage + Postgres | Stateless from the backend's perspective |
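
The tenant-context row relies on a transaction-local setting, which is why a pooled connection cannot carry one tenant's ID into another request. The sketch below illustrates the idea only; it is not the code in src/repositories/database.ts. The `Queryable` interface and the `withTenant` helper are hypothetical names introduced here, and a node-postgres client is expected to fit the interface.

```typescript
// Minimal structural type so the sketch does not depend on a specific driver.
// (Hypothetical; a node-postgres client is expected to satisfy it.)
interface Queryable {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

// Runs `work` inside a transaction whose app.tenant_id applies to that
// transaction only. `withTenant` is an illustrative helper name.
async function withTenant<T>(
  client: Queryable,
  tenantId: string,
  work: (client: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // Third argument `true` means is_local: Postgres discards the value
    // when the transaction commits or rolls back.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Because the setting is scoped to the transaction, the connection returns to the pool without any tenant value attached, regardless of whether the work succeeded or failed.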

Vertical scaling first

For most deployments under ~500 RPS, vertical scaling is the right answer:

| Resource | Recommended start | When to upsize |
|---|---|---|
| Backend CPU | 2 vCPU | p99 request latency > 500 ms while CPU-bound |
| Backend RAM | 1 GB | OOM kills, or sustained heap > 700 MB |
| Postgres | Neon free / db.t4g.medium | pg_stat_activity count > pool max, or query latency p95 > 100 ms |

Horizontal scaling

The backend exposes /health (unconditional liveness) and /api/v1/health (boot-envelope-gated readiness). Use:

  • /health for the load balancer — never auth-gated, never blocks
  • /api/v1/health for "should this instance receive traffic?" — returns 503 if boot envelope failed

Replicas can be added without configuration changes. Round-robin is fine; no sticky sessions required.

Multi-process entitlements

entitlementEngine (src/services/entitlementEngine.ts) caches ALLOW decisions for 60s in-process and invalidates via a process-local EventEmitter. With multiple backend instances, an entitlement change on instance A is invisible to instance B for up to 60s — fine for most cases (bounded eventual consistency), but a real gap for high-stakes operations like tier downgrades that should take effect immediately.
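
The consistency gap described above can be reproduced with a small model. This is a sketch under stated assumptions, not the entitlementEngine implementation: the `AllowCache` class, its method names, the cache key format, and the use of the tenant ID as the event argument are all hypothetical. Only the 60 s TTL and the process-local EventEmitter come from the description.

```typescript
import { EventEmitter } from "node:events";

type Clock = () => number;

// Hypothetical model of a per-process ALLOW cache with TTL and event invalidation.
class AllowCache {
  private readonly entries = new Map<string, number>(); // key -> expiry (ms)
  private readonly ttlMs: number;
  private readonly now: Clock;

  constructor(events: EventEmitter, ttlMs = 60_000, now: Clock = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    // A change event for a tenant drops that tenant's cached decisions.
    events.on("entitlement.changed", (tenantId: string) => this.invalidate(tenantId));
  }

  rememberAllow(tenantId: string, feature: string): void {
    this.entries.set(`${tenantId}:${feature}`, this.now() + this.ttlMs);
  }

  isCachedAllow(tenantId: string, feature: string): boolean {
    const key = `${tenantId}:${feature}`;
    const expiry = this.entries.get(key);
    if (expiry === undefined) return false;
    if (this.now() >= expiry) {
      this.entries.delete(key); // expired: forget it and force a fresh decision
      return false;
    }
    return true;
  }

  private invalidate(tenantId: string): void {
    for (const key of this.entries.keys()) {
      if (key.startsWith(`${tenantId}:`)) this.entries.delete(key);
    }
  }
}
```

Two `AllowCache` objects wired to separate emitters behave like two backend instances: an event emitted on one emitter clears only that cache, and the other keeps answering from its stale entry until the TTL expires.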

Status: Implemented (opt-in). src/services/entitlementBroadcast.ts bridges emitEntitlementChange() across replicas using Postgres LISTEN/NOTIFY on the channel govula_entitlement_changed. Set ENTITLEMENT_BROADCAST_ENABLED=true in every replica to enable; default is OFF so single-process deployments and local dev pay zero connection cost.

| Aspect | Behaviour |
|---|---|
| Channel | govula_entitlement_changed (single channel for all 3 event types: entitlement.changed, contract.changed, lifecycle.changed) |
| Echo suppression | Per-boot UUID originToken stamped on every NOTIFY; listener drops self-originated messages |
| Listener connection | Dedicated long-lived pg.Client (Pool sessions can't hold LISTEN state across checkout/release) |
| Reconnect | Exponential backoff 1s → 2s → 4s → … capped at 30s on error/end |
| Shutdown | Closed cleanly from the SIGTERM/SIGINT handler in src/bootApp.ts before database.disconnect() |
| Failure mode | Never fatal. Listener-start failure logs a warning and continues; engine degrades to the pre-#8 60s-TTL behaviour |
| Audit ledger | Untouched. This layer operates purely on the in-memory cache; integrity hash chain is unaffected |
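
Two of the behaviours in this table, echo suppression and the reconnect schedule, are easy to express as pure functions. The following is a minimal sketch, not the contents of src/services/entitlementBroadcast.ts. It assumes the NOTIFY payload is a JSON object; the `ChangeMessage` shape, the `tenantId` field, and both function names are hypothetical.

```typescript
// Assumed payload shape; only originToken and the three event types
// come from the table above. tenantId is a hypothetical field.
interface ChangeMessage {
  originToken: string;
  type: "entitlement.changed" | "contract.changed" | "lifecycle.changed";
  tenantId: string;
}

// Delay before reconnect attempt N (0-based): 1s, 2s, 4s, ... capped at 30s.
function reconnectDelayMs(attempt: number): number {
  return Math.min(1_000 * 2 ** attempt, 30_000);
}

// Returns the message if it came from another replica, or null if it is
// malformed or is an echo of a NOTIFY this process sent itself.
function parseRemoteChange(payload: string, ownToken: string): ChangeMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(payload);
  } catch {
    return null; // not JSON: ignore rather than crash the listener
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const msg = parsed as Partial<ChangeMessage>;
  if (typeof msg.originToken !== "string") return null;
  if (msg.originToken === ownToken) return null; // self-originated: drop
  return msg as ChangeMessage;
}
```

Dropping self-originated messages matters because the originating replica has already invalidated its own cache through the local EventEmitter; handling the echo would repeat that work.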

Full architectural detail in §Entitlement Engine v2 — Distributed cache invalidation of the deep-dives doc.

Postgres pooling

src/config/index.ts exposes:

| Env | Default | Purpose |
|---|---|---|
| DB_MAX_CONNECTIONS | 20 | Max connections per backend instance |
| DB_IDLE_TIMEOUT | 30000 | Idle connection retention (ms) |
| DB_CONNECTION_TIMEOUT | 5000 | New-connection timeout (ms) |
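
As an illustration of how these three variables could feed a node-postgres pool, the sketch below maps them onto the `max`, `idleTimeoutMillis`, and `connectionTimeoutMillis` pool options. The function name `poolSettingsFromEnv` and the fallback behaviour are assumptions; the actual parsing in src/config/index.ts may differ.

```typescript
// Subset of node-postgres Pool options relevant to the table above.
interface PoolSettings {
  max: number;
  idleTimeoutMillis: number;
  connectionTimeoutMillis: number;
}

// Hypothetical helper: reads the documented env vars, falling back to the
// documented defaults when a value is missing or not a number.
function poolSettingsFromEnv(env: Record<string, string | undefined>): PoolSettings {
  const intOr = (name: string, fallback: number): number => {
    const raw = env[name];
    const parsed = raw === undefined ? Number.NaN : Number.parseInt(raw, 10);
    return Number.isFinite(parsed) ? parsed : fallback;
  };
  return {
    max: intOr("DB_MAX_CONNECTIONS", 20),
    idleTimeoutMillis: intOr("DB_IDLE_TIMEOUT", 30_000),
    connectionTimeoutMillis: intOr("DB_CONNECTION_TIMEOUT", 5_000),
  };
}
```

In a real process the argument would typically be `process.env`, and the result would be spread into the pool constructor alongside the connection string.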

Sizing rule: DB_MAX_CONNECTIONS × backend_instances ≤ Postgres max_connections × 0.8. With Neon, prefer the pooled connection string (PgBouncer in transaction mode) so you can run higher per-instance pool sizes without exhausting the upstream.
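
The sizing rule is plain arithmetic, sketched here with two hypothetical helpers (`maxPoolPerInstance` and `fitsBudget` are illustrative names, not part of the codebase). Integer math is used for the 0.8 factor to avoid floating-point rounding at the boundary.

```typescript
// Largest DB_MAX_CONNECTIONS that keeps the fleet within 80% of
// Postgres max_connections.
function maxPoolPerInstance(pgMaxConnections: number, instances: number): number {
  if (!Number.isInteger(instances) || instances < 1) {
    throw new RangeError("instances must be a positive integer");
  }
  return Math.floor((pgMaxConnections * 8) / (10 * instances));
}

// Checks DB_MAX_CONNECTIONS x instances <= max_connections x 0.8.
function fitsBudget(poolSize: number, instances: number, pgMaxConnections: number): boolean {
  return poolSize * instances * 10 <= pgMaxConnections * 8;
}
```

For example, with max_connections = 100 the budget is 80 connections, so four instances can each run the default pool of 20, while a fifth instance at the same pool size would exceed the budget.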

Per-tenant rate limiting

Govula already exposes a per-tenant limiter at src/middleware/perTenantRateLimit.ts. Global limits live in src/app.ts. With multiple backend instances:

  • The default in-memory store is per-process, so a tenant can effectively get limit × instances requests/window.
  • For strict enforcement across a cluster, switch the store to Redis (already supported by the underlying express-rate-limit package via a custom store).
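
The limit × instances effect can be demonstrated with a toy fixed-window counter. This is a simulation written for illustration only; it is not the limiter in src/middleware/perTenantRateLimit.ts, and the class and function names are hypothetical. A single shared limiter object stands in for a cluster-wide store such as Redis.

```typescript
// Toy fixed-window limiter with in-memory counters (one object = one store).
class FixedWindowLimiter {
  private readonly counts = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    const entry = this.counts.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // First request of a new window for this key.
      this.counts.set(key, { windowStart, count: 1 });
      return this.limit >= 1;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// Sends `requests` for one tenant round-robin across the given stores,
// all within the same window, and counts how many are accepted.
function countAllowed(stores: FixedWindowLimiter[], requests: number, tenant: string): number {
  let allowed = 0;
  for (let i = 0; i < requests; i++) {
    const store = stores[i % stores.length];
    if (store && store.allow(tenant, 0)) allowed++;
  }
  return allowed;
}
```

With a limit of 10 per window, three independent stores accept 30 of 40 round-robin requests from one tenant, while one shared store accepts only 10.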

Cron / scheduled jobs

The in-process scheduler (gated by ENABLE_SCHEDULER=1) runs every cron in every backend that has the flag set. To avoid duplicate work in a multi-instance deployment, enable the scheduler on exactly one instance (or one dedicated worker process), and leave it off on the rest. The internal-cron HTTP mount (/api/internal/cron/*, gated by ENABLE_INTERNAL_CRON=1 + CRON_SECRET) is the safer alternative for HA — invoke it from an external scheduler (GitHub Actions cron, AWS EventBridge, etc.) on a single schedule.

Frontend scaling

Vercel handles frontend scaling automatically. For self-hosted, Next.js is also stateless — replicate freely. The page tree is statically prerendered where possible; CDN cache headers are set per-page.

Capacity benchmarks (rough)

These are guideline numbers from internal load tests, not SLOs:

| RPS | Backend pods (2 vCPU) | Postgres |
|---|---|---|
| < 50 | 1 | Neon free / RDS t4g.medium |
| 50–200 | 2 | Neon paid / RDS m6i.large |
| 200–500 | 3–4 | Above + 1 read replica |
| > 500 | Talk to us | Dedicated provisioning |

Where to read more

Canonical source: docs/deployment/scaling.md

This page mirrors the markdown deployment hub on disk. The full markdown source includes additional code blocks, command examples, and embedded reference tables.

Hub index: /docs/deployment
