Backup & Recovery

Backups must preserve forensic-grade audit chains; restores must be drilled and the hash chain re-validated.

This section is intended for: Technical Team, Management, Auditor. Unauthorised access is restricted.

Audit status: Mixed. Neon PITR: Implemented. Self-hosted WAL shipping: Partial. Restore drill harness: Aspirational.

Govula's data is forensic-grade — audit_log, tenant_operation_log, and operator_audit_events are append-only via Postgres DO INSTEAD NOTHING rules and SHA-256 hash-chained (../audits/phase1-readiness-audit.md §4). Backups must preserve that integrity, and restores must be drilled.

Status:

  • Neon PITR: Implemented (managed by Neon; nothing for the operator to install)
  • Self-hosted pg_basebackup + WAL archiving: Partial (pattern documented; no shipped automation)
  • Restore drill: Aspirational (no automated drill harness; recommended manual cadence below)

RPO / RTO targets

Tier         RPO (data loss)                        RTO (time to restore)  Topology
Standard     5 min                                  30 min                 Neon PITR
Tighter      1 min                                  15 min                 Neon paid + read replica failover
Self-hosted  5–60 min depending on archive cadence  30–120 min             pg_basebackup + WAL shipping
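
For the self-hosted tier, the data-loss window is bounded by how long ago the last WAL segment was archived. The following is a minimal sketch of an RPO check, assuming you measure that lag in seconds yourself; the function name rpo_met is hypothetical and not part of Govula.

```shell
# Hypothetical helper: succeed when the measured archive lag fits the RPO.
# $1 = seconds since the last WAL segment was archived
# $2 = RPO target in minutes (5 for Standard, 1 for Tighter)
rpo_met() {
  local lag_seconds="$1"
  local rpo_minutes="$2"
  [ "$lag_seconds" -le $(( rpo_minutes * 60 )) ]
}

# Possible usage (assumes psql access to the self-hosted primary):
#   lag=$(psql -At "$DATABASE_URL_DIRECT" -c \
#     "SELECT EXTRACT(EPOCH FROM now() - last_archived_time)::int FROM pg_stat_archiver")
#   rpo_met "$lag" 5 || echo "WAL archive lag exceeds the 5 min RPO" >&2
```

On an idle database no segment is archived until one fills, so a check like this is only meaningful if archive_timeout is set to force periodic segment switches.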

1. Neon PITR (canonical Railway+Vercel+Neon path)

Neon takes continuous WAL backups; PITR is a UI operation:

  1. Neon → project → Branches tab → identify a branch named for the desired timestamp.
  2. Restore to create a new branch from a specific point in time (millisecond resolution within the retention window — typically 7 days on free, 30 days on paid).
  3. Update Railway's DATABASE_URL to the restored branch's connection string.
  4. Redeploy the backend.
  5. After verification, promote the restored branch as the new primary in Neon.

This is the same flow described in ../rollback-plan.md §"Database rollback".
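
The verification in step 5 is not spelled out here. One plausible check, sketched below, is that the restored branch contains no audit row newer than the chosen restore point. The helper name restored_within_target and the RESTORED_URL variable are hypothetical, and the sketch assumes GNU date.

```shell
# Hypothetical helper: succeed when the newest audit row in the restored
# branch is not later than the PITR target. Requires GNU date (-d parsing).
# $1 = newest audit_log.created_at in the restored branch (ISO 8601, UTC)
# $2 = the point in time chosen for the restore (ISO 8601, UTC)
restored_within_target() {
  local newest target
  newest=$(date -u -d "$1" +%s) || return 2
  target=$(date -u -d "$2" +%s) || return 2
  [ "$newest" -le "$target" ]
}

# Possible usage (RESTORED_URL is the restored branch's connection string):
#   newest=$(psql -At "$RESTORED_URL" -c \
#     "SELECT to_char(max(created_at) AT TIME ZONE 'UTC', 'YYYY-MM-DD\"T\"HH24:MI:SS\"Z\"') FROM audit_log")
#   restored_within_target "$newest" "2026-05-10T12:00:00Z" || echo "unexpected rows after target" >&2
```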

2. Logical dumps (any topology)

For long-term archive (audit / regulatory):

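# Run daily and store off-host (S3-compatible bucket).
# The 365-day retention is enforced separately, not by these commands.
DUMP_FILE="govula-$(date -u +%Y%m%d).dump"

pg_dump --format=custom --no-owner --no-privileges \
        --file="$DUMP_FILE" \
        "$DATABASE_URL_DIRECT"

# Upload exactly one file: aws s3 cp takes a single source path, so an
# unquoted glob that matches several dumps would fail.
aws s3 cp "$DUMP_FILE" s3://govula-archive/ \
        --storage-class STANDARD_IA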

Logical dumps are slow to restore (all data is reloaded and every index rebuilt) and capture a single point in time with no WAL, so they cannot be rolled forward. They are a complement to PITR, not a replacement.
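
The 365-day retention has to be enforced by something other than pg_dump. As a sketch under stated assumptions, the hypothetical helper below selects dumps older than the retention window from a list of file names. It assumes the govula-YYYYMMDD.dump naming used above and GNU date.

```shell
# Hypothetical helper: print dump names older than the retention window.
# Reads names such as govula-20260510.dump on stdin, one per line.
# $1 = retention in days, $2 = reference date YYYYMMDD (default: today, UTC)
expired_dumps() {
  local retention_days="$1"
  local today="${2:-$(date -u +%Y%m%d)}"
  local iso cutoff name stamp
  # Rewrite YYYYMMDD as YYYY-MM-DD so GNU date parses it unambiguously.
  iso=$(printf '%s' "$today" | sed 's/^\(....\)\(..\)\(..\)$/\1-\2-\3/')
  cutoff=$(date -u -d "$iso -$retention_days days" +%Y%m%d) || return 2
  while IFS= read -r name; do
    # Ignore anything that is not a govula-YYYYMMDD.dump file.
    case $name in
      govula-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].dump) ;;
      *) continue ;;
    esac
    stamp=${name#govula-}
    stamp=${stamp%.dump}
    if [ "$stamp" -lt "$cutoff" ]; then
      printf '%s\n' "$name"
    fi
  done
  return 0
}
```

The output could be fed to a delete loop, for example `aws s3 ls s3://govula-archive/ | awk '{print $4}' | expired_dumps 365`. A bucket lifecycle rule on the provider side is an alternative that needs no script.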

3. Self-hosted: pg_basebackup + WAL archiving

For customer-controlled Postgres without a managed PITR feature:

  1. Configure WAL archiving in postgresql.conf:
    archive_mode = on
    archive_command = 'aws s3 cp %p s3://govula-wal/%f'
    wal_level = replica
    
  2. Take a base backup weekly:
    pg_basebackup -D /backups/$(date -u +%Y%m%d) -F tar -z -P
    
  3. Restore procedure (in DR runbook form):
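    # 1. Stop Postgres
    systemctl stop postgresql
    
    # 2. Move the old data dir aside and create an empty one
    mv /var/lib/postgresql/16/main /var/lib/postgresql/16/main.bak
    mkdir -m 0700 /var/lib/postgresql/16/main
    
    # 3. Restore base backup
    tar xzf /backups/<date>/base.tar.gz -C /var/lib/postgresql/16/main
    
    # 4. Configure the recovery target in postgresql.conf:
    #    restore_command = 'aws s3 cp s3://govula-wal/%f %p'
    #    recovery_target_time = '2026-05-10 12:00:00 UTC'
    #    Recovery pauses at the target by default; optionally set
    #    recovery_target_action = 'promote' to accept writes once it is reached.
    #    PostgreSQL 12+ only enters targeted recovery if recovery.signal exists:
    touch /var/lib/postgresql/16/main/recovery.signal
    chown -R postgres:postgres /var/lib/postgresql/16/main
    
    # 5. Start Postgres; it replays WAL up to the target
    systemctl start postgresql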
    

This is the same procedure used by the backup sidecar in docker-compose.on-prem.yml (see ../ENTERPRISE-DEPLOYMENT.md §"Backup and Recovery").
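
To reduce typing errors during a restore, the recovery settings from step 4 can be generated rather than hand-edited. This is a minimal sketch: the helper name recovery_settings is hypothetical, and the recovery_target_action line is an assumption about the desired end state (PostgreSQL pauses at the target by default).

```shell
# Hypothetical helper: print recovery settings for a point-in-time restore.
# $1 = WAL archive bucket (e.g. govula-wal)
# $2 = recovery target time (e.g. '2026-05-10 12:00:00 UTC')
recovery_settings() {
  local bucket="$1"
  local target_time="$2"
  cat <<EOF
restore_command = 'aws s3 cp s3://${bucket}/%f %p'
recovery_target_time = '${target_time}'
recovery_target_action = 'promote'
EOF
}

# Possible usage: review the output, then append it to postgresql.conf
# before creating recovery.signal and starting Postgres.
#   recovery_settings govula-wal '2026-05-10 12:00:00 UTC'
```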

4. Hash-chain integrity after restore

After ANY restore, run the audit hash-chain replay to confirm no audit row was silently lost:

# This walks audit_log in (created_at ASC, id ASC) order and
# recomputes integrity_hash for every row.
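# Replace <tenantId> first. The URL is quoted so the shell does not treat
# < and > as redirections.
curl -X POST "$BACKEND/api/v1/super-admin/tenants/<tenantId>/operations" \
     -H "Authorization: Bearer $OPERATOR_JWT" \
     -H "Content-Type: application/json" \
     -d '{"action":"force_audit_replay_validation","reason":"post-restore drill"}'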

The validation run is itself written to the audit log; see ../audits/phase1-readiness-audit.md §4 finding F-AU2.
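
To illustrate what a hash-chain replay checks, here is a toy sketch. It is not Govula's implementation: the real input format of integrity_hash is not documented in this section, so the scheme below (each hash is SHA-256 over the previous hash, a "|" separator, and the row payload) is an assumption, as are the function names. It relies on sha256sum from GNU coreutils.

```shell
# Illustrative only. Assumed scheme: hash_n = SHA-256(hash_{n-1} "|" payload_n),
# with an empty string as the hash before the first row.

# Print the chained hash for one row. $1 = previous hash, $2 = row payload
chain_hash() {
  printf '%s|%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

# Read "payload<TAB>stored_hash" lines on stdin in chain order and fail on
# the first row whose stored hash does not match the recomputed one.
verify_chain() {
  local prev="" payload stored expected row=0 tab
  tab=$(printf '\t')
  while IFS="$tab" read -r payload stored; do
    row=$(( row + 1 ))
    expected=$(chain_hash "$prev" "$payload")
    if [ "$expected" != "$stored" ]; then
      echo "hash mismatch at row $row" >&2
      return 1
    fi
    prev=$stored
  done
  return 0
}
```

A chain of this kind exposes a row that was altered or removed from the middle, because every later hash stops matching. It cannot by itself reveal rows missing from the tail, so after a point-in-time restore it is worth comparing the last hash or the row count against a value recorded elsewhere.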

5. Backup of object-storage assets

Govula optionally archives reports to object storage (Replit Object Storage / S3-compatible). The bucket name is DEFAULT_OBJECT_STORAGE_BUCKET_ID. Backups for this surface depend on your provider:

  • Replit Object Storage: managed; no operator action.
  • S3: enable Versioning + cross-region Replication.
  • Self-hosted MinIO: mc mirror to a second site.
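
For the S3 and MinIO options, the commands might look like the sketch below. The wrapper names, bucket names, and MinIO aliases are placeholders. Cross-region replication is left out because it also needs an IAM role and a replication configuration that this section does not describe.

```shell
# Hypothetical wrappers; bucket and alias names are placeholders.

# Turn on object versioning for an S3 bucket. $1 = bucket name
enable_s3_versioning() {
  aws s3api put-bucket-versioning \
    --bucket "$1" \
    --versioning-configuration Status=Enabled
}

# Mirror a MinIO bucket to a second site.
# $1 = source alias/bucket, $2 = target alias/bucket
# (both aliases configured beforehand with `mc alias set`)
mirror_minio_bucket() {
  mc mirror "$1" "$2"
}

# Possible usage:
#   enable_s3_versioning "$DEFAULT_OBJECT_STORAGE_BUCKET_ID"
#   mirror_minio_bucket site-a/reports site-b/reports
```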

6. Restore drill cadence

Backups you have not restored are not backups. Recommended cadence:

  • Quarterly for the canonical Neon path: PITR-restore to a throwaway branch, point a staging backend at it, run the post-deploy verification from ../production-checklist.md.
  • Quarterly for self-hosted: full restore on an isolated VM, verify hash chain.
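
Because no automated drill harness exists yet, a small reminder check can help keep the manual cadence honest. The sketch below assumes you record the date of the last drill somewhere yourself; the helper name drill_overdue is hypothetical, and it requires GNU date.

```shell
# Hypothetical helper: succeed when a restore drill is overdue.
# $1 = date of the last drill (YYYY-MM-DD)
# $2 = maximum days allowed between drills (about 92 for quarterly)
# $3 = reference date (default: today, UTC)
drill_overdue() {
  local last="$1"
  local max_days="$2"
  local today="${3:-$(date -u +%Y-%m-%d)}"
  local last_s today_s
  last_s=$(date -u -d "$last" +%s) || return 2
  today_s=$(date -u -d "$today" +%s) || return 2
  [ $(( (today_s - last_s) / 86400 )) -gt "$max_days" ]
}

# Possible usage in a scheduled job:
#   drill_overdue 2026-01-15 92 && echo "restore drill overdue" >&2
```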

7. What the system does NOT back up

  • In-flight HTTP requests: the load balancer drains them on a graceful instance shutdown, but a sudden kill drops them. Clients can safely retry idempotent endpoints.
  • In-process entitlement cache: rebuilt on first request; no operator action.
  • Sentry breadcrumbs / Pino in-flight logs: Sentry retains for its configured window; Pino logs are only as durable as your log shipper.

Where to read more

Canonical source: docs/deployment/backup-recovery.md

This page mirrors the markdown deployment hub on disk. The full markdown source includes additional code blocks, command examples, and embedded reference tables.

Hub index: /docs/deployment
