Production Deployment Checklist
Zero-Downtime Deploy Strategy
Cloud Run Settings (Production)
# Engine: single-instance, no-cpu-throttling, graceful shutdown
gcloud run deploy oracle-engine \
--min-instances=1 --max-instances=1 \
--no-cpu-throttling \
--timeout=300 \
--revision-suffix=$(date +%Y%m%d-%H%M%S)
# Gateway: multi-instance, gradual rollout
gcloud run deploy oracle-gateway \
--min-instances=1 --max-instances=5 \
--no-traffic
# --no-traffic deploys the new revision without routing traffic to it.
# Then (replace NEW with the new revision name):
# gcloud run services update-traffic oracle-gateway --to-revisions=NEW=100
# WS: session affinity, min-instances for always-on connections
gcloud run deploy oracle-ws \
--min-instances=1 --max-instances=3 \
--session-affinity
# Settlement: single-instance (one writer to chain)
gcloud run deploy oracle-settlement \
--min-instances=1 --max-instances=1 \
--no-cpu-throttling
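The gateway comment mentions a gradual rollout, but the follow-up command moves all traffic in one step. A minimal sketch of a staged shift follows. The function name shift_traffic, the DRY_RUN switch, and the percentages in the usage line are illustrative assumptions, not part of the original procedure; the sketch also assumes the project and region are already set as gcloud defaults.

```shell
# Sketch: move traffic to one revision in several steps.
# shift_traffic SERVICE REVISION PERCENT [PERCENT...]
# With DRY_RUN=1 the commands are printed instead of executed (hypothetical switch).
shift_traffic() {
  local service="$1"
  local revision="$2"
  shift 2
  local pct
  for pct in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "gcloud run services update-traffic $service --to-revisions=${revision}=${pct}"
    else
      # Stop the rollout as soon as one step fails.
      gcloud run services update-traffic "$service" \
        --to-revisions="${revision}=${pct}" || return 1
      # Placeholder: check health or error rates here before the next step.
    fi
  done
}
```

For example, `DRY_RUN=1 shift_traffic oracle-gateway NEW 10 50 100` prints the three commands without running them, where NEW is the new revision name as in the comment above.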
What Protects Existing State
- Engine checkpoint on SIGTERM: When Cloud Run sends SIGTERM (during deploy), the engine saves all state to Redis before exiting. The new revision loads from Redis on startup. No data loss.
- Redis is external: All state lives in Redis (Memorystore), not in the container. Containers are disposable.
- Gateway is stateless: No state to lose. Just proxies to Redis.
- Settlement idempotent batches: Batch IDs are sequential. If a batch is partially submitted and the service restarts, the next instance skips already-submitted batches (on-chain batch_id check).
- Frontend versioning: Static assets are served from nginx. Old assets remain cached in browsers until hard refresh. New deploys only affect new page loads.
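The settlement skip rule can be sketched as a simple filter over batch IDs. This is only an illustration of the idea, not the service's actual code: it assumes the on-chain check reduces to comparing each pending ID with the highest batch_id already recorded on chain, and the function name pending_batches is hypothetical. How the on-chain value is read is not covered here, so it is passed in as an argument.

```shell
# Sketch: print only the batch IDs that still need to be submitted.
# pending_batches LAST_ON_CHAIN_ID BATCH_ID [BATCH_ID...]
# Assumption: IDs are sequential integers, so anything at or below the
# last on-chain ID has already been submitted and can be skipped.
pending_batches() {
  local last_on_chain="$1"
  shift
  local id
  for id in "$@"; do
    if [ "$id" -gt "$last_on_chain" ]; then
      echo "$id"
    fi
  done
}
```

Because the filter depends only on the on-chain value, a restarted instance that re-reads the same queue arrives at the same list of remaining batches.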
Deploy Procedure (Production)
# 1. Merge staging → main
git checkout main && git merge staging && git push origin main
# 2. Cloud Build triggers on the main branch and deploys to prod
# (Create separate triggers for the main branch → prod GCP project)
# 3. Verify health
curl https://api.parti.com/v1/health
# 4. Verify state restored
curl https://api.parti.com/v1/markets | jq '.markets | length'
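A new revision may need a short time to start and reload state from Redis, so a single curl right after the deploy can fail spuriously. The following is a hedged sketch of a retry wrapper around the health check; the helper name wait_until_ok and the attempt and delay values in the usage line are assumptions chosen for illustration.

```shell
# Sketch: run a probe command until it succeeds or the attempts run out.
# wait_until_ok ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 on the first success, 1 if every attempt failed.
wait_until_ok() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Usage could look like `wait_until_ok 10 3 curl -fsS https://api.parti.com/v1/health`, where curl's -f flag turns HTTP error statuses into a non-zero exit code so they count as failed attempts.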
Rollback
# Instant rollback: route all traffic back to the previous revision
# (PREVIOUS stands for that revision's suffix)
gcloud run services update-traffic oracle-engine \
--to-revisions=oracle-engine-PREVIOUS=100
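The PREVIOUS placeholder has to be replaced with a real revision name. One way to look it up is sketched below, under these assumptions: the helper names second_newest and previous_revision are hypothetical, the region is set as a gcloud default, and the revision created just before the current one is the one that was serving traffic (which may not hold if a revision was deployed with --no-traffic).

```shell
# Print the second line of stdin. With revisions listed newest first,
# that is the revision created just before the current one.
second_newest() {
  sed -n '2p'
}

# Sketch: list a service's revisions newest first and pick the second.
# previous_revision SERVICE
previous_revision() {
  local service="$1"
  gcloud run revisions list --service="$service" \
    --sort-by='~metadata.creationTimestamp' \
    --format='value(metadata.name)' | second_newest
}
```

The result can then be fed into the rollback command, for example `gcloud run services update-traffic oracle-engine --to-revisions="$(previous_revision oracle-engine)=100"`. Checking the printed name by eye before routing traffic is a sensible precaution.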
Environment Variables (Production)
# Engine
SKIP_SIG_VERIFY=false # ENFORCE signatures in prod
ADMIN_API_KEY=<strong-random-key>
RUST_LOG=info
# Gateway
ADMIN_API_KEY=<same-key>
# Settlement
CHAIN=fogo # or solana for mainnet
Pre-Deploy Checklist
- [ ] All tests pass (cargo test on engine)
- [ ] cargo audit clean (or known exceptions documented)
- [ ] npm audit clean on frontend
- [ ] ADMIN_API_KEY set (not empty)
- [ ] SKIP_SIG_VERIFY=false
- [ ] Fee treasury wallet set to production wallet
- [ ] Operator keypair is production keypair (not staging)
- [ ] Redis is production Memorystore (not staging)
- [ ] Vault program deployed to production chain
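The two environment-variable items on this checklist lend themselves to an automated check. The sketch below covers only ADMIN_API_KEY and SKIP_SIG_VERIFY; the function name preflight_env and the message wording are assumptions, and the remaining items (wallet, keypair, Redis, vault program) still need manual verification.

```shell
# Sketch: verify the environment-variable items of the pre-deploy checklist.
# Prints one line per problem and returns non-zero if any check fails.
preflight_env() {
  local failed=0
  if [ -z "${ADMIN_API_KEY:-}" ]; then
    echo "ADMIN_API_KEY is empty or unset"
    failed=1
  fi
  if [ "${SKIP_SIG_VERIFY:-}" != "false" ]; then
    echo "SKIP_SIG_VERIFY must be false in production"
    failed=1
  fi
  return "$failed"
}
```

Running `preflight_env || exit 1` at the start of a deploy script would stop the deploy before any gcloud command is issued.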