This section is intended for: Technical Team. Unauthorised access is restricted.

Audit status: Implemented

How and when to scale Govula. The backend is stateless by design — every request can be served by any process — so scaling is mostly about Postgres capacity and rate-limit tuning.

Status: Implemented. Stateless backend, configurable pool, per-route + per-tenant rate limiters all exist on main.

Where state actually lives

State	Lives in	Implication for scaling
Sessions / refresh tokens	Postgres (`auth_sessions`)	No sticky sessions required; any backend can serve any request
Tenant context	Per-request, set inside a transaction (`set_config('app.tenant_id', ..., true)` in `src/repositories/database.ts`)	No cross-pool leak risk
Entitlement cache	In-process (60 s TTL, invalidated via EventEmitter)	Per-process by default. Set `ENTITLEMENT_BROADCAST_ENABLED=true` to enable cross-replica invalidation via Postgres LISTEN/NOTIFY (Task #8) — see "Multi-process entitlements" below
Rate-limit counters	In-process (default)	For multi-instance deployments, switch to a Redis-backed store (see "Per-tenant rate limiting")
Audit chain	Postgres (`audit_log`, `tenant_operation_log`, `operator_audit_events`)	Append-only; no scaling work
Recording-mode events	Browser localStorage + Postgres	Stateless from the backend's perspective
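The tenant-context row depends on `set_config(..., true)` being transaction-local: the setting is discarded at COMMIT or ROLLBACK, so a pooled connection cannot carry one tenant's ID into the next request. Below is a minimal sketch of that pattern. It assumes a `pg`-style client with a `query(text, params)` method. `withTenant` and `Queryable` are hypothetical names, not the actual exports of `src/repositories/database.ts`:

```typescript
// Hypothetical sketch: scope the tenant ID to a single transaction.
// `Queryable` is a minimal structural stand-in for a checked-out pg PoolClient.
interface Queryable {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

async function withTenant<T>(
  client: Queryable,
  tenantId: string,
  work: (tx: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // Third argument true = is_local: the value lives only until COMMIT/ROLLBACK,
    // so the next borrower of this pooled connection never sees it.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Row-level security policies can then read the value with `current_setting('app.tenant_id', true)` for the lifetime of that transaction only.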

Vertical scaling first

For most deployments under ~500 RPS, vertical scaling is the right answer:

Resource	Recommended start	When to upsize
Backend CPU	2 vCPU	p99 request latency > 500ms while CPU-bound
Backend RAM	1 GB	OOM kills, or sustained heap > 700 MB
Postgres	Neon free / db.t4g.medium	`pg_stat_activity` connection count > pool max, or query latency p95 > 100ms
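The "When to upsize" column can be encoded as a simple check, for example in a periodic alerting job. This is a sketch only. The metric field names are hypothetical and would be fed from whatever monitoring you run (see observability-setup.md):

```typescript
// Hypothetical helper mirroring the upsize thresholds in the table above.
interface BackendMetrics {
  p99LatencyMs: number;
  cpuBound: boolean;        // CPU saturated while latency is high
  oomKills: number;         // OOM kills in the observation window
  sustainedHeapMb: number;  // heap size held over the window, in MB
}

interface PostgresMetrics {
  activeConnections: number; // connection count from pg_stat_activity
  poolMax: number;           // DB_MAX_CONNECTIONS
  p95QueryLatencyMs: number;
}

function upsizeSignals(b: BackendMetrics, pg: PostgresMetrics): string[] {
  const signals: string[] = [];
  if (b.p99LatencyMs > 500 && b.cpuBound) signals.push("backend-cpu");
  if (b.oomKills > 0 || b.sustainedHeapMb > 700) signals.push("backend-ram");
  if (pg.activeConnections > pg.poolMax || pg.p95QueryLatencyMs > 100) signals.push("postgres");
  return signals;
}
```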

Horizontal scaling

The backend exposes /health (unconditional liveness) and /api/v1/health (boot-envelope-gated readiness). Use:

/health for the load balancer — never auth-gated, never blocks
/api/v1/health for "should this instance receive traffic?" — returns 503 if boot envelope failed

Replicas can be added without configuration changes. Round-robin is fine; no sticky sessions required.
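The following minimal sketch shows the liveness/readiness split. It uses Node's built-in `node:http` module rather than the backend's actual framework. `bootEnvelopeOk` and `isBootOk` are hypothetical stand-ins for the real boot-envelope result, and the JSON bodies are illustrative:

```typescript
import { createServer, type Server } from "node:http";

type HealthResult = { status: number; body: string };

// Pure routing logic for the two health endpoints (hypothetical response bodies).
function healthResponse(path: string, bootEnvelopeOk: boolean): HealthResult | null {
  if (path === "/health") {
    // Liveness: unconditional, no auth, no database or network I/O.
    return { status: 200, body: JSON.stringify({ status: "ok" }) };
  }
  if (path === "/api/v1/health") {
    // Readiness: 503 signals that this instance should not receive traffic.
    return bootEnvelopeOk
      ? { status: 200, body: JSON.stringify({ status: "ready" }) }
      : { status: 503, body: JSON.stringify({ status: "boot envelope failed" }) };
  }
  return null;
}

// Wires the routing into a server; the caller decides when to call listen().
function createHealthServer(isBootOk: () => boolean): Server {
  return createServer((req, res) => {
    const path = new URL(req.url ?? "/", "http://localhost").pathname;
    const result = healthResponse(path, isBootOk());
    if (result === null) {
      res.writeHead(404).end();
      return;
    }
    res.writeHead(result.status, { "Content-Type": "application/json" }).end(result.body);
  });
}
```

Because `/health` touches neither the database nor the auth layer, a slow dependency cannot cause the load balancer to mark every replica dead at once.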

Multi-process entitlements

entitlementEngine (src/services/entitlementEngine.ts) caches ALLOW decisions for 60s in-process and invalidates via a process-local EventEmitter. With multiple backend instances, an entitlement change on instance A is invisible to instance B for up to 60s — fine for most cases (bounded eventual consistency), but a real gap for high-stakes operations like tier downgrades that should take effect immediately.
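The single-process behaviour described above can be modelled as a TTL map plus an `EventEmitter` listener. This is a simplified sketch, not the real `entitlementEngine`. The class name, the cache-key format and the event payload (a bare tenant ID) are assumptions, and the injectable clock exists only to make the TTL testable:

```typescript
import { EventEmitter } from "node:events";

const TTL_MS = 60_000;
// Process-local bus: only listeners inside THIS process receive its events.
const entitlementEvents = new EventEmitter();

class AllowCache {
  private entries = new Map<string, number>(); // cache key -> expiry timestamp (ms)
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
    entitlementEvents.on("entitlement.changed", (tenantId: string) => this.invalidateTenant(tenantId));
  }

  private key(tenantId: string, feature: string): string {
    return `${tenantId}:${feature}`;
  }

  // True only if an ALLOW decision was cached and has not yet expired.
  isCachedAllow(tenantId: string, feature: string): boolean {
    const k = this.key(tenantId, feature);
    const expiry = this.entries.get(k);
    if (expiry === undefined) return false;
    if (this.now() >= expiry) {
      this.entries.delete(k);
      return false;
    }
    return true;
  }

  recordAllow(tenantId: string, feature: string): void {
    this.entries.set(this.key(tenantId, feature), this.now() + TTL_MS);
  }

  // Drops every cached decision for one tenant (deleting during Map iteration is safe).
  invalidateTenant(tenantId: string): void {
    for (const k of this.entries.keys()) {
      if (k.startsWith(`${tenantId}:`)) this.entries.delete(k);
    }
  }
}
```

Because `entitlementEvents` lives in one process's memory, a second replica's `AllowCache` never hears the event. It keeps serving its entry until the TTL runs out, which is the up-to-60s gap described above.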

Status: Implemented (opt-in). src/services/entitlementBroadcast.ts bridges emitEntitlementChange() across replicas using Postgres LISTEN/NOTIFY on the channel govula_entitlement_changed. Set ENTITLEMENT_BROADCAST_ENABLED=true in every replica to enable; default is OFF so single-process deployments and local dev pay zero connection cost.

Aspect	Behaviour
Channel	`govula_entitlement_changed` (single channel for all 3 event types: `entitlement.changed`, `contract.changed`, `lifecycle.changed`)
Echo suppression	Per-boot UUID `originToken` stamped on every NOTIFY; listener drops self-originated messages
Listener connection	Dedicated long-lived `pg.Client` (Pool sessions can't hold LISTEN state across checkout/release)
Reconnect	Exponential backoff 1s → 2s → 4s → … capped at 30s on `error`/`end`
Shutdown	Closed cleanly from the SIGTERM/SIGINT handler in `src/bootApp.ts` before `database.disconnect()`
Failure mode	Never fatal. Listener-start failure logs a warning and continues; engine degrades to the pre-#8 60s-TTL behaviour
Audit ledger	Untouched. This layer operates purely on the in-memory cache; integrity hash chain is unaffected
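Echo suppression and the reconnect schedule can be sketched independently of the database driver. The channel name, the three event types, the per-boot origin token and the 1s → 30s backoff come from the table above. The JSON payload shape (including `tenantId`) and the helper names are assumptions, not the contents of `src/services/entitlementBroadcast.ts`:

```typescript
import { randomUUID } from "node:crypto";

const CHANNEL = "govula_entitlement_changed";
const ORIGIN_TOKEN = randomUUID(); // generated once per process boot

type EventType = "entitlement.changed" | "contract.changed" | "lifecycle.changed";

interface BroadcastMessage {
  originToken: string;
  type: EventType;
  tenantId: string; // assumed payload field
}

// Parameters for `SELECT pg_notify($1, $2)` on any pooled connection.
function notifyParams(type: EventType, tenantId: string): [string, string] {
  const msg: BroadcastMessage = { originToken: ORIGIN_TOKEN, type, tenantId };
  return [CHANNEL, JSON.stringify(msg)];
}

// Filters a received notification payload. Returns the message to re-emit
// locally, or null when it is malformed or was sent by this same process.
function acceptNotification(payload: string, selfToken: string = ORIGIN_TOKEN): BroadcastMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(payload);
  } catch {
    return null; // ignore malformed payloads instead of crashing the listener
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const msg = parsed as BroadcastMessage;
  if (msg.originToken === selfToken) return null; // our own NOTIFY echoed back
  return msg;
}

// Delay before reconnect attempt n (0-based): 1s, 2s, 4s, ... capped at 30s.
function reconnectDelayMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 30_000);
}
```

With node-postgres, the dedicated client would run `LISTEN govula_entitlement_changed` after every (re)connect. It would pass `msg.payload` from its `notification` event into `acceptNotification`, and reset the attempt counter once connected. If replicas use Neon's pooled connection string (see "Postgres pooling"), the listener likely needs a direct connection, because PgBouncer's transaction mode does not support LISTEN.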

Full architectural detail is in §Entitlement Engine v2 — Distributed cache invalidation of the deep-dives doc.

Postgres pooling

src/config/index.ts exposes:

Env	Default	Purpose
`DB_MAX_CONNECTIONS`	20	Max connections per backend instance
`DB_IDLE_TIMEOUT`	30000	Idle connection retention (ms)
`DB_CONNECTION_TIMEOUT`	5000	New-connection timeout (ms)
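One way these variables might map onto a connection pool is sketched below. The option names `max`, `idleTimeoutMillis` and `connectionTimeoutMillis` are real node-postgres `Pool` options. The parsing helpers are illustrative and may not match `src/config/index.ts`:

```typescript
// Hypothetical env parsing for the pool settings in the table above.
type Env = Record<string, string | undefined>;

interface PoolSettings {
  max: number;
  idleTimeoutMillis: number;
  connectionTimeoutMillis: number;
}

// Reads an integer env var, falling back to a default when unset or blank.
function intFromEnv(env: Env, name: string, fallback: number, min: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < min) {
    throw new Error(`${name} must be an integer >= ${min}, got "${raw}"`);
  }
  return n;
}

function poolSettingsFromEnv(env: Env): PoolSettings {
  return {
    max: intFromEnv(env, "DB_MAX_CONNECTIONS", 20, 1),
    idleTimeoutMillis: intFromEnv(env, "DB_IDLE_TIMEOUT", 30_000, 0),
    connectionTimeoutMillis: intFromEnv(env, "DB_CONNECTION_TIMEOUT", 5_000, 0),
  };
}
```

The result can be spread into `new Pool({ connectionString, ...poolSettingsFromEnv(process.env) })`. Failing fast on a malformed value is safer than silently running with `NaN` as the pool size.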

Sizing rule: DB_MAX_CONNECTIONS × backend_instances ≤ Postgres max_connections × 0.8. With Neon, prefer the pooled connection string (PgBouncer in transaction mode) so you can run larger per-instance pools without exhausting the upstream Postgres connection limit.
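Here is the sizing rule as a worked example, assuming `max_connections` = 100 (the stock PostgreSQL default; managed providers often set their own). Four instances at the default pool size of 20 use exactly the 80-connection budget, and a fifth instance breaks the rule. The helper names are hypothetical:

```typescript
// Sizing rule: DB_MAX_CONNECTIONS × instances ≤ max_connections × 0.8.
// Multiplying by 4/5 in integers avoids floating-point error from 0.8.
function perInstancePoolBudget(pgMaxConnections: number, instances: number): number {
  if (instances < 1) throw new RangeError("instances must be >= 1");
  return Math.floor((pgMaxConnections * 4) / (5 * instances));
}

function fitsSizingRule(poolMax: number, instances: number, pgMaxConnections: number): boolean {
  return poolMax * instances * 5 <= pgMaxConnections * 4;
}
```

With those numbers, `perInstancePoolBudget(100, 4)` is 20 and `perInstancePoolBudget(100, 5)` is 16. Adding a replica therefore means either lowering `DB_MAX_CONNECTIONS` or routing through the pooled endpoint.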

Per-tenant rate limiting

Govula already exposes a per-tenant limiter at src/middleware/perTenantRateLimit.ts. Global limits live in src/app.ts. With multiple backend instances:

The default in-memory store is per-process, so a tenant can effectively get limit × instances requests/window.
For strict enforcement across a cluster, switch to a Redis-backed store. The underlying express-rate-limit package already supports this through its custom `store` option, and the rate-limit-redis package provides a Redis implementation.
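The following framework-free fixed-window limiter shows where the `limit × instances` figure comes from. It is neither express-rate-limit nor `perTenantRateLimit.ts`, and `CounterStore`, `MemoryStore` and `makeTenantLimiter` are hypothetical names. Replacing one store per process with a single shared store is the change that a Redis-backed store makes:

```typescript
// Hypothetical store interface: increments the counter for `key` and returns
// the count within the current window.
interface CounterStore {
  increment(key: string, windowMs: number, now: number): number;
}

// Per-process fixed-window counters, like a default in-memory store.
class MemoryStore implements CounterStore {
  private windows = new Map<string, { count: number; resetAt: number }>();

  increment(key: string, windowMs: number, now: number): number {
    const current = this.windows.get(key);
    if (!current || now >= current.resetAt) {
      this.windows.set(key, { count: 1, resetAt: now + windowMs });
      return 1;
    }
    current.count += 1;
    return current.count;
  }
}

// Returns a checker that allows at most `limit` requests per tenant per window,
// as seen by whichever store it was given.
function makeTenantLimiter(store: CounterStore, limit: number, windowMs: number) {
  return (tenantId: string, now: number): boolean =>
    store.increment(`tenant:${tenantId}`, windowMs, now) <= limit;
}
```

With two instances and a limit of 5, round-robining 20 requests through two separate `MemoryStore`s admits 10, while one shared store admits 5. A real Redis store would be asynchronous and would make the increment and expiry atomic on the server, for example with `INCR` plus `PEXPIRE` inside `MULTI` or a Lua script.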

Cron / scheduled jobs

The in-process scheduler (gated by ENABLE_SCHEDULER=1) runs every cron in every backend that has the flag set. To avoid duplicate work in a multi-instance deployment, enable the scheduler on exactly one instance (or one dedicated worker process), and leave it off on the rest. The internal-cron HTTP mount (/api/internal/cron/*, gated by ENABLE_INTERNAL_CRON=1 + CRON_SECRET) is the safer alternative for HA — invoke it from an external scheduler (GitHub Actions cron, AWS EventBridge, etc.) on a single schedule.
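The two guards described above can be sketched as follows. `ENABLE_SCHEDULER`, `ENABLE_INTERNAL_CRON` and `CRON_SECRET` come from this page. The helper names and the `Authorization: Bearer <CRON_SECRET>` header format are assumptions about how the internal-cron mount checks the secret:

```typescript
import { timingSafeEqual } from "node:crypto";

type Env = Record<string, string | undefined>;

// In-process scheduler: set ENABLE_SCHEDULER=1 on exactly one instance.
function schedulerEnabled(env: Env): boolean {
  return env["ENABLE_SCHEDULER"] === "1";
}

// Internal-cron mount: requires the flag plus a matching secret (header format assumed).
function internalCronAuthorized(env: Env, authorizationHeader: string | undefined): boolean {
  const secret = env["CRON_SECRET"];
  if (env["ENABLE_INTERNAL_CRON"] !== "1" || !secret || !authorizationHeader) return false;
  const expected = Buffer.from(`Bearer ${secret}`);
  const given = Buffer.from(authorizationHeader);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === given.length && timingSafeEqual(expected, given);
}
```

With the HTTP route, only the external scheduler decides when a job runs. Adding replicas behind the load balancer therefore cannot multiply executions: each scheduled tick produces one request, and that request lands on one replica.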

Frontend scaling

Vercel handles frontend scaling automatically. For self-hosted, Next.js is also stateless — replicate freely. The page tree is statically prerendered where possible; CDN cache headers are set per-page.

Capacity benchmarks (rough)

These are rough guidelines from internal load tests, not SLOs:

RPS	Backend pods (2 vCPU)	Postgres
< 50	1	Neon free / RDS t4g.medium
50–200	2	Neon paid / RDS m6i.large
200–500	3–4	Above + 1 read replica
> 500	Talk to us	Dedicated provisioning

Where to read more

backup-recovery.md — replica strategy
observability-setup.md — what to watch as load grows
production-hardening.md — every limiter / pool knob
In-app: /docs/deployment/scaling

Canonical source: docs/deployment/scaling.md

This page mirrors the markdown deployment hub on disk. The full markdown source includes additional code blocks, command examples, and embedded reference tables.

Hub index: /docs/deployment
