
Production Deployment Checklist

Zero-Downtime Deploy Strategy

Cloud Run Settings (Production)

# --image and --region are omitted below; supply them as your pipeline does.
# Engine: single-instance, no-cpu-throttling, graceful shutdown
gcloud run deploy oracle-engine \
  --min-instances=1 --max-instances=1 \
  --no-cpu-throttling \
  --timeout=300 \
  --revision-suffix=$(date +%Y%m%d-%H%M%S)

# Gateway: multi-instance, gradual rollout
gcloud run deploy oracle-gateway \
  --min-instances=1 --max-instances=5 \
  --no-traffic \  # Deploy without routing traffic
  # Then: gcloud run services update-traffic oracle-gateway --to-revisions=NEW=100

# WS: session affinity, min-instances for always-on connections
gcloud run deploy oracle-ws \
  --min-instances=1 --max-instances=3 \
  --session-affinity

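# Settlement: single-instance (one writer to chain)
# Note: during a rollout the new revision starts before the old one is shut down,
# so two instances can briefly coexist despite max-instances=1.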
gcloud run deploy oracle-settlement \
  --min-instances=1 --max-instances=1 \
  --no-cpu-throttling
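The gateway's gradual rollout can be scripted roughly as follows. This is a minimal sketch, assuming gcloud already has project and region defaults; gateway_canary_rollout, CANARY_PERCENT and SOAK_SECONDS are hypothetical names, and the 10% canary share is an illustrative choice, not a project setting.

```shell
#!/bin/sh
# Sketch: gradual (canary) rollout for oracle-gateway.
# Assumes gcloud project/region defaults are set and the image is supplied
# the same way as in the deploy commands above.
# CANARY_PERCENT and SOAK_SECONDS are hypothetical knobs, not gcloud flags.
CANARY_PERCENT="${CANARY_PERCENT:-10}"
SOAK_SECONDS="${SOAK_SECONDS:-300}"

gateway_canary_rollout() {
  local suffix="$1"
  # Cloud Run names the revision <service>-<suffix>, so the name is known up front.
  local new_rev="oracle-gateway-${suffix}"

  # 1. Create the revision without routing any traffic to it.
  gcloud run deploy oracle-gateway \
    --min-instances=1 --max-instances=5 \
    --no-traffic \
    --revision-suffix="${suffix}" || return 1

  # 2. Canary: give the new revision a small share; revisions already
  #    serving are scaled down proportionally to cover the rest.
  gcloud run services update-traffic oracle-gateway \
    --to-revisions="${new_rev}=${CANARY_PERCENT}" || return 1

  # 3. Soak: wait while error rates and latency are watched (manually here),
  #    then promote the new revision to all traffic.
  sleep "${SOAK_SECONDS}"
  gcloud run services update-traffic oracle-gateway \
    --to-revisions="${new_rev}=100"
}

# Usage: gateway_canary_rollout "$(date +%Y%m%d-%H%M%S)"
```

Pinning --revision-suffix makes the new revision's name predictable (oracle-gateway-<suffix>), so update-traffic can target it without a lookup.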

What Protects Existing State

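  1. Engine checkpoint on SIGTERM: When Cloud Run stops an old instance during a deploy, it sends SIGTERM and allows 10 seconds before SIGKILL. In that window the engine saves all state to Redis and exits; the new revision loads that state from Redis on startup. No data loss, provided the checkpoint finishes within the 10-second window.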

  2. Redis is external: All state lives in Redis (Memorystore), not in the container. Containers are disposable.

  3. Gateway is stateless: No state to lose. Just proxies to Redis.

  4. Settlement idempotent batches: Batch IDs are sequential. If a batch is partially submitted and the service restarts, the next instance skips already-submitted batches (on-chain batch_id check).

  5. Frontend versioning: Static assets are served from nginx. Old assets remain cached in browsers until hard refresh. New deploys only affect new page loads.
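The skip rule in item 4 reduces to a comparison against the highest batch_id already on chain. A minimal sketch, assuming the on-chain batch_id only advances once a batch is fully confirmed (so a partially submitted batch is retried, not skipped); pending_batches is a hypothetical helper, not the settlement service's code.

```shell
#!/bin/sh
# Restart-safe skip rule (sketch). pending_batches is a hypothetical helper;
# reading the highest batch_id from chain is left to the settlement service.
pending_batches() {
  local last_onchain="$1"   # highest batch_id already recorded on chain
  shift
  for id in "$@"; do
    # Sequential IDs: anything at or below last_onchain was already submitted.
    if [ "$id" -gt "$last_onchain" ]; then
      echo "$id"
    fi
  done
}

# Batches 41..44 exist locally and the chain already has up to 42:
pending_batches 42 41 42 43 44   # prints 43 and 44, one per line
```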

Deploy Procedure (Production)

# 1. Merge staging → main
git fetch origin
git checkout main && git merge --ff-only origin/main && git merge origin/staging && git push origin main

# 2. The Cloud Build trigger on the main branch deploys to prod
# (Create a separate trigger: main branch → prod GCP project)

# 3. Verify health
curl https://api.parti.com/v1/health

# 4. Verify state restored
curl https://api.parti.com/v1/markets | jq '.markets | length'
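Steps 3 and 4 can be wrapped in a script that retries while the new revision starts and reloads state from Redis. A sketch under assumptions: the retry budget (30 attempts, 5 s apart) is illustrative, wait_for_health, market_count and verify_markets_restored are hypothetical names, and the market count is compared with a snapshot taken just before the deploy.

```shell
#!/bin/sh
# Post-deploy verification (sketch). API, the retry budget and the
# pre-deploy snapshot are assumptions; needs curl and jq.
API="${API:-https://api.parti.com}"

wait_for_health() {
  # Poll /v1/health until curl -f succeeds (no HTTP error),
  # up to $1 attempts spaced $2 seconds apart.
  local attempts="$1" delay="$2" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS "${API}/v1/health" >/dev/null; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

market_count() {
  curl -fsS "${API}/v1/markets" | jq '.markets | length'
}

verify_markets_restored() {
  # Compare with a count taken before the deploy; assumes no markets
  # were created or closed in between.
  local before="$1" after
  after="$(market_count)" || return 1
  [ "$after" = "$before" ]
}

# Usage:
#   before="$(market_count)"        # before merging to main
#   ...merge and let Cloud Build deploy...
#   wait_for_health 30 5 && verify_markets_restored "$before"
```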

Rollback

# Instant rollback — route traffic to previous revision
gcloud run services update-traffic oracle-engine \
  --to-revisions=oracle-engine-PREVIOUS=100
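oracle-engine-PREVIOUS stands for the last known-good revision. One way to look it up, sketched under the assumption that gcloud has project and region defaults set and that the last good revision is the second-newest by creation time (not true if several revisions were created since); previous_revision and rollback are hypothetical helper names.

```shell
#!/bin/sh
# Rollback helper (sketch). "Previous" = second-newest revision by creation
# time; verify it is the last good one before relying on this.
previous_revision() {
  local service="$1"
  gcloud run revisions list \
    --service="$service" \
    --sort-by="~metadata.creationTimestamp" \
    --format="value(metadata.name)" \
    --limit=2 | sed -n '2p'
}

rollback() {
  local service="$1" rev
  rev="$(previous_revision "$service")"
  if [ -z "$rev" ]; then
    echo "no previous revision found for $service" >&2
    return 1
  fi
  # Route all traffic back to the previous revision.
  gcloud run services update-traffic "$service" --to-revisions="${rev}=100"
}

# Usage: rollback oracle-engine
```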

Environment Variables (Production)

# Engine
SKIP_SIG_VERIFY=false  # ENFORCE signatures in prod
ADMIN_API_KEY=<strong-random-key>
RUST_LOG=info

# Gateway
ADMIN_API_KEY=<same-key>  # must match the engine's ADMIN_API_KEY

# Settlement
CHAIN=fogo  # or solana for mainnet
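A minimal sketch for generating one strong key and setting the same value on the engine and gateway. gen_admin_key and apply_admin_key are hypothetical names; the key is 32 bytes from /dev/urandom, hex-encoded. Plain env vars appear in the service configuration, so Secret Manager (via gcloud's --set-secrets flag) is the stricter alternative.

```shell
#!/bin/sh
# Shared ADMIN_API_KEY (sketch; helper names are hypothetical).
gen_admin_key() {
  # 32 random bytes from the kernel, hex-encoded to 64 characters.
  od -An -N32 -tx1 /dev/urandom | tr -d ' \n'
}

apply_admin_key() {
  local key="$1" svc
  for svc in oracle-engine oracle-gateway; do
    # Each update creates a new revision of the service.
    gcloud run services update "$svc" \
      --update-env-vars="ADMIN_API_KEY=${key}" || return 1
  done
}

# Usage: apply_admin_key "$(gen_admin_key)"
```

Because each update creates a new revision, the engine goes through the same SIGTERM checkpoint and Redis reload as a normal deploy.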

Pre-Deploy Checklist

  • [ ] All tests pass (cargo test on engine)
  • [ ] cargo audit clean (or known exceptions documented)
  • [ ] npm audit clean on frontend
  • [ ] ADMIN_API_KEY set (not empty)
  • [ ] SKIP_SIG_VERIFY=false
  • [ ] Fee treasury wallet set to production wallet
  • [ ] Operator keypair is production keypair (not staging)
  • [ ] Redis is production Memorystore (not staging)
  • [ ] Vault program deployed to production chain
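Several items can be checked mechanically before the deploy. A sketch, assuming the production values are set in the current shell; preflight is a hypothetical helper, the wallet, keypair, Redis and vault items still need a human check, and the engine/ and frontend/ directory names in the usage line are hypothetical.

```shell
#!/bin/sh
# Automated subset of the pre-deploy checklist (sketch).
preflight() {
  local failed=0
  if [ -z "${ADMIN_API_KEY:-}" ]; then
    echo "FAIL: ADMIN_API_KEY is empty" >&2; failed=1
  fi
  if [ "${SKIP_SIG_VERIFY:-}" != "false" ]; then
    echo "FAIL: SKIP_SIG_VERIFY must be false (got '${SKIP_SIG_VERIFY:-}')" >&2; failed=1
  fi
  case "${CHAIN:-}" in
    fogo|solana) ;;
    *) echo "FAIL: CHAIN must be fogo or solana (got '${CHAIN:-}')" >&2; failed=1 ;;
  esac
  return "$failed"
}

# Usage (tests and audits first; directory names are hypothetical):
#   (cd engine && cargo test && cargo audit) && (cd frontend && npm audit) && preflight
```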