Deploying a Region
This document describes the ordered steps to stand up a new region (or a new environment in an existing region). It assumes familiarity with the architecture overview. For infrastructure details, see Infrastructure (Terraform); for workload deployment, see GitOps (ArgoCD + Helm).
Global Prerequisites
These must exist exactly once, before any region is deployed, and are shared across all environments and regions.
| Prerequisite | Owner | Notes |
|---|---|---|
| cogrion.com root domain registered (currently GoDaddy; where to manage it is undecided, see overview) | Platform | Registrar NS records must point to whichever DNS provider manages the zone |
| cogrion.com DNS zone active (currently Cloudflare) | Platform | Parent DNS zone for all services |
| Cloudflare Workers deployed (auth.cogrion.com, cplane.cogrion.com/lookup) | Platform | Global auth proxy and tenant→region routing |
| Cloudflare KV namespace (tenant→region map) | Platform | Written at workspace provision time |
| Dashboard UI deployed (app.cogrion.com) | Platform | Cloudflare Pages or AWS equivalent |
| Root CA in AWS Secrets Manager (ap-southeast-1) | Platform | Shared PKI root used by all regional OpenBao instances. TODO: link to mTLS / PKI doc |
| Primary region (sgp-1) live | Platform | All other regions call back to sgp-1 for principal resolution and provisioning metadata |
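The tenant→region routing behind cplane.cogrion.com/lookup can be sketched as follows. This is a minimal Python model, not the Worker's code: a plain dict stands in for the Cloudflare KV namespace, and the regional URL scheme (`https://cplane.{region}.cogrion.com` for prod, with an env prefix otherwise) is an assumption extrapolated from the sgp-1 control plane URL used in Phase 6. `lookup_tenant` and `regional_cplane_url` are hypothetical names.

```python
from typing import Mapping, Optional


def regional_cplane_url(region: str, env: str = "prod") -> str:
    """Control plane URL for a region (assumed naming scheme)."""
    # Assumption: prod zones carry no env prefix (sgp-1.cogrion.com);
    # other environments do (staging.sgp-1.cogrion.com).
    zone = f"{region}.cogrion.com" if env == "prod" else f"{env}.{region}.cogrion.com"
    return f"https://cplane.{zone}"


def lookup_tenant(
    tenant_id: str, tenant_regions: Mapping[str, str], env: str = "prod"
) -> Optional[str]:
    """Resolve a tenant to its regional control plane URL, or None if unmapped."""
    region = tenant_regions.get(tenant_id)  # stands in for a KV read
    if region is None:
        return None  # unmapped tenant: the caller decides how to respond
    return regional_cplane_url(region, env)


# Usage with a hypothetical tenant ID
print(lookup_tenant("acme", {"acme": "sgp-1"}))  # https://cplane.sgp-1.cogrion.com
```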
Per-Region Deployment Sequence
Each region goes through these phases in order. Steps within a phase may run in parallel where noted.
Phase 1 — AWS Infrastructure (Terraform)
Provisions the AWS primitives the Kubernetes layer runs on. See Infrastructure (Terraform).
- Copy `envs/prod-sgp-1/` → `envs/prod-{region}/`, then update `tfvars` and `backend.tf`.
- Run `terraform apply`, which provisions:
  - VPC, subnets, NAT, security groups
  - EKS cluster + node groups
  - RDS instance and parameter groups
  - Route53 hosted zone (`{env.}{region}.cogrion.com`)
  - S3 buckets (`cogrion-{env}-{region}-artifacts`, `cogrion-{env}-{region}-exports`)
  - IAM roles (cluster, node, cross-account)
Phase 2 — DNS Delegation
- Add NS records in Cloudflare DNS delegating `{env.}{region}.cogrion.com` to the new Route53 zone's nameservers.
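Before moving on, it is worth confirming that the delegation matches the zone. A minimal sketch of the comparison, assuming you have already collected the Route53 nameservers (for example from Terraform outputs) and the NS records entered in Cloudflare as plain lists; `delegation_diff` is hypothetical.

```python
from typing import Iterable


def _normalize(ns: str) -> str:
    # DNS names are case-insensitive; a trailing dot marks a fully qualified name.
    return ns.strip().lower().rstrip(".")


def delegation_diff(route53_ns: Iterable[str], cloudflare_ns: Iterable[str]) -> dict:
    """Compare Route53's assigned nameservers with the NS records in Cloudflare."""
    expected = {_normalize(n) for n in route53_ns}
    actual = {_normalize(n) for n in cloudflare_ns}
    return {
        "missing": sorted(expected - actual),  # assigned by Route53, not delegated
        "extra": sorted(actual - expected),    # delegated, but not a Route53 NS
    }
```

Route53 assigns four nameservers to each public hosted zone, so both `missing` and `extra` should be empty before starting Phase 3.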
Phase 3 — Cluster Bootstrap
- Bootstrap ArgoCD into the new cluster. See GitOps → Bootstrap: New Cluster.
- Apply cluster-level secrets (image pull, RDS credentials, inter-region service token).
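The cluster-level secrets can be rendered as ordinary Kubernetes Secret manifests. A minimal sketch using only the standard library; the secret name, namespace, and key are hypothetical placeholders, and real values would come from a secret store rather than source control. Image pull secrets normally use the `kubernetes.io/dockerconfigjson` type instead of `Opaque`.

```python
import base64
import json


def opaque_secret(name: str, namespace: str, values: dict) -> dict:
    """Build an Opaque Secret manifest; `kubectl apply -f` accepts it as JSON."""
    # Secret `data` values must be base64-encoded strings.
    data = {k: base64.b64encode(v.encode()).decode() for k, v in values.items()}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": data,
    }


if __name__ == "__main__":
    manifest = opaque_secret(
        "inter-region-service-token",  # hypothetical name
        "cplane",                      # hypothetical namespace
        {"token": "<redacted>"},
    )
    print(json.dumps(manifest, indent=2))
```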
Phase 4 — PKI and TLS
- Deploy OpenBao into the cluster.
- Import the shared root CA from AWS Secrets Manager into OpenBao's PKI backend (`pki` → `pki_int`). TODO: link to OpenBao setup doc
- Deploy cert-manager and configure its issuer to use OpenBao (for Cogrion cluster subdomain certificates).
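OpenBao is a fork of HashiCorp Vault that keeps a Vault-compatible API, so one plausible way to wire cert-manager to it is cert-manager's Vault issuer type. The sketch below builds such a `ClusterIssuer` as a Python dict; the server URL, issuer name, signing role, Kubernetes auth role, and token Secret are all hypothetical, and the field names should be checked against the cert-manager Vault issuer documentation before use.

```python
def openbao_cluster_issuer(server: str, pki_role: str, k8s_auth_role: str) -> dict:
    """Hypothetical cert-manager ClusterIssuer pointed at OpenBao."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": "openbao-issuer"},  # hypothetical name
        "spec": {
            "vault": {
                "server": server,
                # Certificates are signed by the intermediate mount (pki_int).
                "path": f"pki_int/sign/{pki_role}",
                "auth": {
                    "kubernetes": {
                        "role": k8s_auth_role,
                        "mountPath": "/v1/auth/kubernetes",
                        # Hypothetical Secret holding a service account token.
                        "secretRef": {"name": "openbao-issuer-token", "key": "token"},
                    }
                },
            }
        },
    }
```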
Phase 5 — Platform Services (ArgoCD)
- Copy the ArgoCD app manifests and Helm values from an existing region, then update all domain and bucket values.
- Sync ArgoCD, which deploys in dependency order:
  - Keycloak (auth service)
  - Temporal (server + workers)
  - Control plane API
  - Observability stack (Prometheus, Grafana, Loki)
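If the dependency order is expressed with ArgoCD sync waves (the `argocd.argoproj.io/sync-wave` annotation), the wave numbers can be derived from a dependency graph instead of maintained by hand. A minimal sketch using Python's `graphlib` (3.9+); the app names and dependency edges are assumptions chosen to reproduce the order listed above, not the actual manifests.

```python
from graphlib import TopologicalSorter

# app -> apps it depends on (hypothetical edges that reproduce the listed order)
DEPENDS_ON = {
    "keycloak": set(),
    "temporal": set(),
    "cplane-api": {"keycloak", "temporal"},
    "observability": {"cplane-api"},
}


def sync_waves(deps: dict) -> dict:
    """Assign each app a wave one higher than its deepest dependency."""
    waves = {}
    # static_order() yields every app after all of its dependencies and
    # raises graphlib.CycleError if the graph contains a cycle.
    for app in TopologicalSorter(deps).static_order():
        waves[app] = 1 + max((waves[d] for d in deps.get(app, ())), default=-1)
    return waves
```

Each wave number would be written, as a string, into the annotation on the corresponding Application; ArgoCD syncs lower waves first.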
Phase 6 — Region Registration
- Update Cloudflare KV to route new tenants to this region at signup.
- If this is a secondary region (not sgp-1), set `PRIMARY_CPLANE_API_URL=https://cplane.sgp-1.cogrion.com` and `INTER_REGION_SERVICE_TOKEN` in the Helm values so the auth middleware resolves principals from sgp-1.
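The secondary-region rule can be expressed as a function of the region name. A minimal sketch, assuming the control plane chart accepts a standard Kubernetes container `env` list and that the inter-region service token from Phase 3 lives in a Secret with a `token` key; `auth_env` and the Secret name are hypothetical.

```python
PRIMARY_REGION = "sgp-1"
PRIMARY_CPLANE_API_URL = "https://cplane.sgp-1.cogrion.com"


def auth_env(region: str, token_secret: str) -> list:
    """Container env entries for the auth middleware (hypothetical values layout)."""
    if region == PRIMARY_REGION:
        return []  # the primary resolves principals itself
    return [
        {"name": "PRIMARY_CPLANE_API_URL", "value": PRIMARY_CPLANE_API_URL},
        {
            "name": "INTER_REGION_SERVICE_TOKEN",
            # Read from a Kubernetes Secret rather than inlining the token.
            "valueFrom": {"secretKeyRef": {"name": token_secret, "key": "token"}},
        },
    ]
```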
Phase 7 — Smoke Test
- Create a test tenant assigned to the new region and verify the full auth flow end-to-end.
- Provision a test workspace and confirm DNS delegation, ACM cert issuance, and cluster agent bootstrap token flow.
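The smoke test can be scripted as an ordered list of checks that stops at the first failure, since a broken auth flow makes the later workspace checks meaningless. A minimal harness sketch; `run_smoke_tests` and the check names are hypothetical, and each real check (auth flow, DNS delegation, ACM certificate issuance, agent bootstrap token) would be implemented separately.

```python
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[], bool]]


def run_smoke_tests(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run checks in order; stop and report at the first failure."""
    log = []
    for name, check in checks:
        try:
            ok = check()
        except Exception as exc:  # a crashing check counts as a failure
            log.append(f"FAIL {name}: {exc}")
            return False, log
        log.append(f"{'PASS' if ok else 'FAIL'} {name}")
        if not ok:
            return False, log
    return True, log
```

For example, with placeholder checks `[("auth flow", lambda: True), ("dns delegation", lambda: False), ("acm cert", lambda: True)]`, the runner returns `(False, ["PASS auth flow", "FAIL dns delegation"])` and never runs the certificate check.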
Environment vs Region
The steps above apply whether you are adding a new region (new geography, new cluster) or a new environment in an existing region (e.g. staging.sgp-1). The only differences:
- New environment: Phase 1 is a subset (no new VPC if the existing one is shared; only new RDS and S3). The Route53 zone is a subdomain of the existing regional zone.
- New region: the full Phase 1, including VPC and EKS. Phase 2 adds a new top-level NS delegation.
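The environment-vs-region difference can be encoded in a small planning helper. A minimal sketch; `deployment_plan` and its resource labels are hypothetical, and it assumes a new environment shares the existing VPC and that the NS delegation always lives in the zone one label above the new one (the regional Route53 zone for an environment, Cloudflare's cogrion.com zone for a region).

```python
def deployment_plan(kind: str, zone: str) -> dict:
    """Phase 1 resources and delegating parent zone (hypothetical helper)."""
    if kind not in ("environment", "region"):
        raise ValueError(f"unknown deployment kind: {kind!r}")
    # The delegating parent is the zone one label up, e.g.
    #   staging.sgp-1.cogrion.com -> sgp-1.cogrion.com (regional Route53 zone)
    #   sgp-1.cogrion.com         -> cogrion.com (Cloudflare)
    parent = zone.split(".", 1)[1]
    if kind == "region":
        resources = ["vpc", "eks", "rds", "s3", "route53_zone", "iam"]
    else:
        # Assumption: the existing VPC and cluster are shared.
        resources = ["rds", "s3", "route53_zone"]
    return {"resources": resources, "delegate_from": parent}
```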
Teardown
To decommission a region, reverse the deployment sequence: remove the Cloudflare KV entry so no new tenants are assigned to the region, drain its workspaces, delete the ArgoCD apps, then destroy the Terraform-managed infrastructure (`terraform destroy`). The shared root CA and the global Cloudflare layer are unaffected.
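A guard like the following can keep the teardown steps in order before anything is destroyed. A minimal sketch; `ready_to_destroy` is hypothetical, and it assumes you can list the regions that KV currently routes new signups to and count the workspaces still active in the region being removed.

```python
def ready_to_destroy(region: str, signup_regions: set, active_workspaces: int) -> list:
    """Return blocking reasons; an empty list means teardown may proceed."""
    reasons = []
    if region in signup_regions:
        reasons.append(f"{region} is still in the KV signup routing")
    if active_workspaces > 0:
        reasons.append(f"{active_workspaces} workspace(s) not yet drained")
    return reasons
```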