How and when to scale Govula. The backend is stateless by design — every request can be served by any process — so scaling is mostly about Postgres capacity and rate-limit tuning.
Status: Implemented. Stateless backend, configurable pool, per-route + per-tenant rate limiters all exist on main.
Where state actually lives
| State | Lives in | Implication for scaling |
|---|---|---|
| Sessions / refresh tokens | Postgres (auth_sessions) | No sticky sessions required; any backend can serve any request |
| Tenant context | Per-request, set inside a transaction (set_config('app.tenant_id', ..., true) in src/repositories/database.ts) | No cross-pool leak risk |
| Entitlement cache | In-process (60 s TTL, invalidated via EventEmitter) | Per-process by default. Set ENTITLEMENT_BROADCAST_ENABLED=true to enable cross-replica invalidation via Postgres LISTEN/NOTIFY (Task #8) — see "Multi-process entitlements" below |
| Rate-limit counters | In-process (default) | For multi-instance, switch to Redis-backed limiter (see "Per-tenant rate limiting") |
| Audit chain | Postgres (audit_log, tenant_operation_log, operator_audit_events) | Append-only; no scaling work |
| Recording-mode events | Browser localStorage + Postgres | Stateless from the backend's perspective |
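The "no cross-pool leak risk" row depends on `set_config(..., true)` being transaction-local. The sketch below shows one way to wrap that pattern. `withTenantTransaction` and the `Queryable` interface are hypothetical names, not the code in `src/repositories/database.ts`. `Queryable` is a minimal stand-in shaped after a pooled node-postgres client.

```typescript
// Minimal query surface, shaped after a pooled node-postgres client (assumption).
interface Queryable {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

// Hypothetical helper: runs `work` inside a transaction with app.tenant_id set.
async function withTenantTransaction<T>(
  client: Queryable,
  tenantId: string,
  work: (client: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // is_local = true: Postgres discards the setting at COMMIT/ROLLBACK,
    // so the connection returns to the pool without a tenant attached.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Because the setting is scoped to the transaction rather than the session, any backend process can check out any pooled connection without first resetting tenant state.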
Vertical scaling first
For most deployments under ~500 RPS, vertical scaling is the right answer:
| Resource | Recommended start | When to upsize |
|---|---|---|
| Backend CPU | 2 vCPU | p99 request latency > 500ms while CPU-bound |
| Backend RAM | 1 GB | OOM kills, or sustained heap > 700 MB |
| Postgres | Neon free / db.t4g.medium | pg_stat_activity count > pool max, or query latency p95 > 100ms |
Horizontal scaling
The backend exposes /health (unconditional liveness) and /api/v1/health (boot-envelope-gated readiness). Use:
- /health for the load balancer: never auth-gated, never blocks.
- /api/v1/health for "should this instance receive traffic?": returns 503 if the boot envelope failed.
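The liveness/readiness split can be expressed as a small pure function. This is a framework-free sketch: `healthFor` and the JSON body strings are hypothetical, and the real routes are mounted in the Express app.

```typescript
interface HealthResponse {
  status: number;
  body: { status: string };
}

// Hypothetical handler logic; body strings are illustrative only.
function healthFor(path: string, bootEnvelopeOk: boolean): HealthResponse | undefined {
  if (path === "/health") {
    // Liveness: the process is up. No auth, no dependency on boot state.
    return { status: 200, body: { status: "ok" } };
  }
  if (path === "/api/v1/health") {
    // Readiness: only accept traffic once the boot envelope succeeded.
    return bootEnvelopeOk
      ? { status: 200, body: { status: "ready" } }
      : { status: 503, body: { status: "boot_envelope_failed" } };
  }
  return undefined; // not a health route
}
```

Keeping liveness unconditional prevents an orchestrator from restarting a process that is merely not ready yet. The 503 from the readiness route is what takes a replica out of rotation.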
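Replicas can be added without configuration changes beyond the multi-instance settings covered below (entitlement broadcast, rate-limit store, scheduler placement). Round-robin load balancing is fine; no sticky sessions are required.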
Multi-process entitlements
entitlementEngine (src/services/entitlementEngine.ts) caches ALLOW decisions for 60s in-process and invalidates via a process-local EventEmitter. With multiple backend instances, an entitlement change on instance A is invisible to instance B for up to 60s — fine for most cases (bounded eventual consistency), but a real gap for high-stakes operations like tier downgrades that should take effect immediately.
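The gap comes from combining a per-process TTL cache with per-process invalidation. The model below is a minimal sketch of that behaviour, not the `entitlementEngine` source. `AllowCache`, `LocalBus` (a stand-in for the process-local EventEmitter) and the injectable clock are hypothetical.

```typescript
type Clock = () => number;

// Stand-in for a process-local EventEmitter: subscribers in this process only.
class LocalBus {
  private handlers: Array<(tenantId: string) => void> = [];
  on(handler: (tenantId: string) => void): void {
    this.handlers.push(handler);
  }
  emit(tenantId: string): void {
    for (const h of this.handlers) h(tenantId);
  }
}

// Hypothetical per-replica cache of ALLOW decisions with a fixed TTL.
class AllowCache {
  private byTenant = new Map<string, Map<string, number>>(); // tenant -> feature -> expiry (ms)
  private ttlMs: number;
  private now: Clock;

  constructor(bus: LocalBus, ttlMs: number, now: Clock) {
    this.ttlMs = ttlMs;
    this.now = now;
    // Invalidation reaches only caches subscribed to this process's bus.
    bus.on((tenantId) => this.byTenant.delete(tenantId));
  }

  remember(tenantId: string, feature: string): void {
    let features = this.byTenant.get(tenantId);
    if (!features) {
      features = new Map<string, number>();
      this.byTenant.set(tenantId, features);
    }
    features.set(feature, this.now() + this.ttlMs);
  }

  // true: cached ALLOW is still fresh. false: miss, so re-evaluate.
  hasFreshAllow(tenantId: string, feature: string): boolean {
    const expiry = this.byTenant.get(tenantId)?.get(feature);
    if (expiry === undefined) return false;
    if (expiry <= this.now()) {
      this.byTenant.get(tenantId)?.delete(feature);
      return false;
    }
    return true;
  }
}
```

With two replicas, each holding its own bus, an event emitted on replica A clears only A's cache. Replica B keeps serving the stale ALLOW until its 60 s TTL runs out.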
Status: Implemented (opt-in). src/services/entitlementBroadcast.ts bridges emitEntitlementChange() across replicas using Postgres LISTEN/NOTIFY on the channel govula_entitlement_changed. Set ENTITLEMENT_BROADCAST_ENABLED=true in every replica to enable; default is OFF so single-process deployments and local dev pay zero connection cost.
| Aspect | Behaviour |
|---|---|
| Channel | govula_entitlement_changed (single channel for all 3 event types: entitlement.changed, contract.changed, lifecycle.changed) |
| Echo suppression | Per-boot UUID originToken stamped on every NOTIFY; listener drops self-originated messages |
| Listener connection | Dedicated long-lived pg.Client (Pool sessions can't hold LISTEN state across checkout/release) |
| Reconnect | Exponential backoff 1s → 2s → 4s → … capped at 30s on error/end |
| Shutdown | Closed cleanly from the SIGTERM/SIGINT handler in src/bootApp.ts before database.disconnect() |
| Failure mode | Never fatal. Listener-start failure logs a warning and continues; engine degrades to the pre-#8 60s-TTL behaviour |
| Audit ledger | Untouched. This layer operates purely on the in-memory cache; integrity hash chain is unaffected |
Full architectural detail in §Entitlement Engine v2 — Distributed cache invalidation of the deep-dives doc.
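Echo suppression and the reconnect schedule from the table can be sketched as plain functions. The JSON payload shape (in particular the `tenantId` field) and all function names are assumptions; the real wire format lives in `src/services/entitlementBroadcast.ts`.

```typescript
// The three event types carried on govula_entitlement_changed.
const EVENT_TYPES: readonly string[] = ["entitlement.changed", "contract.changed", "lifecycle.changed"];

// Hypothetical payload shape.
interface BroadcastPayload {
  type: string; // one of EVENT_TYPES
  tenantId: string; // assumption: events are keyed by tenant
  originToken: string; // per-boot UUID of the sending replica
}

// Sender side: the string used as the NOTIFY payload.
function encodePayload(payload: BroadcastPayload): string {
  return JSON.stringify(payload);
}

// Listener side: returns the event to re-emit locally, or null to drop it.
function acceptNotification(raw: string, ownOriginToken: string): BroadcastPayload | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // malformed payload: ignore rather than crash the listener
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const { type, tenantId, originToken } = parsed as Record<string, unknown>;
  if (typeof type !== "string" || !EVENT_TYPES.includes(type)) return null;
  if (typeof tenantId !== "string" || typeof originToken !== "string") return null;
  // Echo suppression: this replica already applied its own change locally.
  if (originToken === ownOriginToken) return null;
  return { type, tenantId, originToken };
}

// Reconnect schedule: 1s, 2s, 4s, ... capped at 30s.
function reconnectDelayMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 30_000);
}
```

With node-postgres, the listener side would typically issue `LISTEN govula_entitlement_changed` on its dedicated `Client` and pass `msg.payload` from the client's `notification` event into a function like `acceptNotification`. The sender would call `SELECT pg_notify('govula_entitlement_changed', $1)`.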
Postgres pooling
src/config/index.ts exposes:
| Env | Default | Purpose |
|---|---|---|
| DB_MAX_CONNECTIONS | 20 | Max connections per backend instance |
| DB_IDLE_TIMEOUT | 30000 | Idle connection retention (ms) |
| DB_CONNECTION_TIMEOUT | 5000 | New-connection timeout (ms) |
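The three variables map naturally onto node-postgres `Pool` options. The reader below is a hypothetical sketch using the documented defaults. `src/config/index.ts` may parse and validate them differently.

```typescript
type Env = Record<string, string | undefined>;

// Keys follow node-postgres Pool config names.
interface PoolOptions {
  max: number;
  idleTimeoutMillis: number;
  connectionTimeoutMillis: number;
}

// Parses a non-negative integer env var, falling back to the default when unset or blank.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

// Hypothetical reader mirroring the table above.
function readPoolOptions(env: Env): PoolOptions {
  return {
    max: intFromEnv(env, "DB_MAX_CONNECTIONS", 20),
    idleTimeoutMillis: intFromEnv(env, "DB_IDLE_TIMEOUT", 30_000),
    connectionTimeoutMillis: intFromEnv(env, "DB_CONNECTION_TIMEOUT", 5_000),
  };
}
```

In a Node process this would be used as `new Pool({ connectionString, ...readPoolOptions(process.env) })`. Failing fast on a malformed value is preferable to silently running with a pool size of `NaN`.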
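Sizing rule: DB_MAX_CONNECTIONS × backend_instances ≤ Postgres max_connections × 0.8. With Neon, prefer the pooled connection string (PgBouncer in transaction mode) so you can run higher per-instance pool sizes without exhausting the upstream. PgBouncer's transaction mode does not support LISTEN, however, so with ENTITLEMENT_BROADCAST_ENABLED=true the broadcast listener's dedicated client likely needs the direct (unpooled) connection string.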
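The sizing rule reduces to a one-line budget check. This sketch uses hypothetical helper names and integer arithmetic (×8 / ×10) so the 0.8 factor introduces no floating-point rounding.

```typescript
// Largest per-instance pool with pool × instances ≤ max_connections × 0.8.
function maxPoolPerInstance(pgMaxConnections: number, instances: number): number {
  if (!Number.isInteger(instances) || instances < 1) {
    throw new RangeError("instances must be a positive integer");
  }
  return Math.floor((pgMaxConnections * 8) / (instances * 10));
}

// True if the configured pool size stays within the 80% budget.
function poolFits(poolSize: number, instances: number, pgMaxConnections: number): boolean {
  return poolSize * instances * 10 <= pgMaxConnections * 8;
}
```

For example, with max_connections = 100 (the stock Postgres default), the budget is 80 connections. Four instances at the default pool of 20 fit exactly. A fifth instance means dropping DB_MAX_CONNECTIONS to 16, or moving to a pooled connection string.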
Per-tenant rate limiting
Govula already exposes a per-tenant limiter at src/middleware/perTenantRateLimit.ts. Global limits live in src/app.ts. With multiple backend instances:
- The default in-memory store is per-process, so a tenant can effectively get limit × instances requests per window.
- For strict enforcement across a cluster, switch the store to Redis (already supported by the underlying express-rate-limit package via a custom store).
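The `limit × instances` effect follows directly from where the counter lives. The fixed-window sketch below makes this concrete. `CounterStore`, `MemoryStore` and `allowRequest` are hypothetical, and express-rate-limit's real store interface differs in detail. A single shared store plays the role Redis would play in production.

```typescript
// Hypothetical counter store: returns the hit count for `key` in the given window.
interface CounterStore {
  increment(key: string, windowStartMs: number): number;
}

// Per-process store: keeps only the current window per key.
class MemoryStore implements CounterStore {
  private windows = new Map<string, { start: number; hits: number }>();

  increment(key: string, windowStartMs: number): number {
    const current = this.windows.get(key);
    if (!current || current.start !== windowStartMs) {
      this.windows.set(key, { start: windowStartMs, hits: 1 }); // new window resets the count
      return 1;
    }
    current.hits += 1;
    return current.hits;
  }
}

// Fixed-window check for one tenant.
function allowRequest(store: CounterStore, tenantId: string, limit: number, windowMs: number, nowMs: number): boolean {
  const windowStart = nowMs - (nowMs % windowMs);
  return store.increment(`tenant:${tenantId}`, windowStart) <= limit;
}
```

Spread 10 requests from one tenant round-robin across two instances, each with its own `MemoryStore` and a limit of 5, and all 10 are allowed. Route the same 10 through one shared store and only 5 are allowed.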
Cron / scheduled jobs
The in-process scheduler (gated by ENABLE_SCHEDULER=1) runs every cron in every backend that has the flag set. To avoid duplicate work in a multi-instance deployment, enable the scheduler on exactly one instance (or one dedicated worker process), and leave it off on the rest. The internal-cron HTTP mount (/api/internal/cron/*, gated by ENABLE_INTERNAL_CRON=1 + CRON_SECRET) is the safer alternative for HA — invoke it from an external scheduler (GitHub Actions cron, AWS EventBridge, etc.) on a single schedule.
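Both gates can be expressed as small predicates. This is a hypothetical sketch. It assumes the flags must equal exactly `"1"`, and how the caller presents the secret (header name, scheme) is also an assumption. The hand-rolled comparison avoids an early exit on the first mismatching character. In Node, `crypto.timingSafeEqual` is the usual production choice.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: the flag is enabled only by the exact value "1".
function schedulerEnabled(env: Env): boolean {
  return env.ENABLE_SCHEDULER === "1";
}

// Hypothetical guard for /api/internal/cron/*.
function cronRequestAuthorized(env: Env, presentedSecret: string | undefined): boolean {
  if (env.ENABLE_INTERNAL_CRON !== "1") return false; // mount disabled
  const expected = env.CRON_SECRET;
  if (!expected || presentedSecret === undefined) return false; // fail closed
  return constantTimeEqual(presentedSecret, expected);
}

// Compares every character instead of returning at the first difference.
// Note: string length is still revealed by the early length check.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}
```

Failing closed when CRON_SECRET is unset means a misconfigured replica rejects cron calls instead of running them unauthenticated.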
Frontend scaling
Vercel handles frontend scaling automatically. For self-hosted, Next.js is also stateless — replicate freely. The page tree is statically prerendered where possible; CDN cache headers are set per-page.
Capacity benchmarks (rough)
These are rough guideline numbers from internal load tests, not SLOs:
| RPS | Backend pods (2 vCPU) | Postgres |
|---|---|---|
| < 50 | 1 | Neon free / RDS t4g.medium |
| 50–200 | 2 | Neon paid / RDS m6i.large |
| 200–500 | 3–4 | Above + 1 read replica |
| > 500 | Talk to us | Dedicated provisioning |
Where to read more
- backup-recovery.md: replica strategy
- observability-setup.md: what to watch as load grows
- production-hardening.md: every limiter / pool knob
- In-app: /docs/deployment/scaling