
Deploying a Region

This document describes the ordered steps to stand up a new region (or a new environment in an existing region). It assumes the architecture overview is understood. For infrastructure detail see Infrastructure (Terraform); for workload deployment see GitOps (ArgoCD + Helm).


Global Prerequisites

These must exist once, before any region is deployed. They are shared across all environments and regions.

| Prerequisite | Owner | Notes |
| --- | --- | --- |
| cogrion.com root domain registered (currently GoDaddy; where to manage this is undecided, see overview) | Platform | Registrar NS must point to whichever DNS provider manages the zone |
| cogrion.com DNS zone active (currently Cloudflare) | Platform | Parent DNS zone for all services |
| Cloudflare Workers deployed (auth.cogrion.com, cplane.cogrion.com/lookup) | Platform | Global auth proxy and tenant→region routing |
| Cloudflare KV namespace (tenant→region map) | Platform | Written to at workspace provision time |
| Dashboard UI deployed (app.cogrion.com) | Platform | Cloudflare Pages or AWS equivalent |
| Root CA in AWS Secrets Manager (ap-southeast-1) | Platform | Shared PKI root used by all regional OpenBao instances. TODO: link to mTLS / PKI doc |
| Primary region (sgp-1) live | Platform | All other regions call back to sgp-1 for principals resolution and provisioning metadata |

Per-Region Deployment Sequence

Each region goes through these phases in order. Steps within a phase may run in parallel where noted.

Phase 1 — AWS Infrastructure (Terraform)

Provisions the AWS primitives the Kubernetes layer runs on. See Infrastructure (Terraform).

  1. Copy envs/prod-sgp-1/ to envs/prod-{region}/, then update tfvars and backend.tf.
  2. Run terraform apply — provisions:
    • VPC, subnets, NAT, security groups
    • EKS cluster + node groups
    • RDS instance and parameter groups
    • Route53 hosted zone ({env.}{region}.cogrion.com)
    • S3 buckets (cogrion-{env}-{region}-artifacts, cogrion-{env}-{region}-exports)
    • IAM roles (cluster, node, cross-account)
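The naming patterns in the list above can be derived mechanically, which helps when filling in tfvars for a new region. The following is a minimal sketch, not the project's tooling. The function and class names are hypothetical, and it assumes that prod omits the optional env prefix in the zone name (consistent with cplane.sgp-1.cogrion.com) while other environments such as staging include it.

```python
from dataclasses import dataclass

ROOT_DOMAIN = "cogrion.com"


@dataclass(frozen=True)
class RegionNames:
    zone: str              # Route53 hosted zone name
    artifacts_bucket: str  # S3 bucket for artifacts
    exports_bucket: str    # S3 bucket for exports


def region_names(env: str, region: str) -> RegionNames:
    """Build the zone and bucket names for one env/region pair."""
    # Assumption: prod has no env prefix in the zone name; other envs do.
    prefix = "" if env == "prod" else f"{env}."
    return RegionNames(
        zone=f"{prefix}{region}.{ROOT_DOMAIN}",
        artifacts_bucket=f"cogrion-{env}-{region}-artifacts",
        exports_bucket=f"cogrion-{env}-{region}-exports",
    )


if __name__ == "__main__":
    print(region_names("staging", "sgp-1"))
```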

Phase 2 — DNS Delegation

  1. Add NS records in Cloudflare DNS pointing {env.}{region}.cogrion.com → the new Route53 zone nameservers.
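A delegation is only correct when the NS records in the parent zone match the nameservers Route53 assigned to the new zone. As a rough illustration, the comparison could look like the sketch below. The function names are hypothetical, and the two input lists are assumed to have been fetched separately (for example from Terraform outputs and the Cloudflare dashboard); the sketch does no network calls.

```python
from typing import Dict, Iterable, List, Set


def normalize_ns(names: Iterable[str]) -> Set[str]:
    """Lowercase nameserver names and drop trailing dots and blanks."""
    return {n.strip().rstrip(".").lower() for n in names if n.strip()}


def delegation_diff(route53_ns: Iterable[str],
                    cloudflare_ns: Iterable[str]) -> Dict[str, List[str]]:
    """Report nameservers missing from, or extra in, the parent zone."""
    want = normalize_ns(route53_ns)     # what Route53 assigned to the new zone
    have = normalize_ns(cloudflare_ns)  # what the parent zone currently lists
    return {"missing": sorted(want - have), "extra": sorted(have - want)}


if __name__ == "__main__":
    # Hypothetical nameserver values, for illustration only.
    print(delegation_diff(
        ["ns-1.awsdns-01.org.", "NS-2.awsdns-02.co.uk"],
        ["ns-1.awsdns-01.org"],
    ))
```

An empty "missing" and an empty "extra" list would indicate that the parent zone's NS records match the Route53 zone exactly.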

Phase 3 — Cluster Bootstrap

  1. Bootstrap ArgoCD into the new cluster. See GitOps → Bootstrap: New Cluster.
  2. Apply cluster-level secrets (image pull, RDS credentials, inter-region service token).

Phase 4 — PKI and TLS

  1. Deploy OpenBao into the cluster.
  2. Import the shared root CA from AWS Secrets Manager into OpenBao's PKI backend (pki, pki_int). TODO: link to OpenBao setup doc
  3. Deploy cert-manager and configure the issuer to use OpenBao (for Cogrion-cluster subdomain certs).

Phase 5 — Platform Services (ArgoCD)

  1. Copy ArgoCD app manifests and Helm values from an existing region, update all domain and bucket values.
  2. Sync ArgoCD — deploys in dependency order:
    • Keycloak (auth service)
    • Temporal (server + workers)
    • Control plane API
    • Observability stack (Prometheus, Grafana, Loki)
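One way to express a dependency order in ArgoCD is the argocd.argoproj.io/sync-wave annotation, where lower waves sync first. This document does not say that sync waves are the mechanism in use here, so the following is only a sketch of the idea. The application names and the helper function are hypothetical.

```python
from typing import Dict, List

# Order taken from the list above; the app names themselves are assumptions.
DEPLOY_ORDER: List[str] = [
    "keycloak",
    "temporal",
    "control-plane-api",
    "observability",
]


def sync_wave_annotations(order: List[str]) -> Dict[str, Dict[str, str]]:
    """Map each app to a sync-wave annotation matching its position."""
    # Annotation values must be strings in Kubernetes metadata.
    return {
        app: {"argocd.argoproj.io/sync-wave": str(wave)}
        for wave, app in enumerate(order)
    }


if __name__ == "__main__":
    for app, annotations in sync_wave_annotations(DEPLOY_ORDER).items():
        print(app, annotations)
```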

Phase 6 — Region Registration

  1. Update Cloudflare KV to route new tenants to this region at signup.
  2. If this is a secondary region (not sgp-1): set PRIMARY_CPLANE_API_URL=https://cplane.sgp-1.cogrion.com and INTER_REGION_SERVICE_TOKEN in Helm values so the auth middleware resolves principals from sgp-1.
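The primary versus secondary distinction in step 2 can be captured as a small helper that decides which settings a region needs. This is a sketch under assumptions: the function name is hypothetical, the token is supplied by the caller rather than stored in code, and how the result is merged into Helm values is left open.

```python
from typing import Dict

PRIMARY_REGION = "sgp-1"
PRIMARY_CPLANE_API_URL = "https://cplane.sgp-1.cogrion.com"


def inter_region_env(region: str, service_token: str) -> Dict[str, str]:
    """Return the extra settings a region needs to reach the primary."""
    if region == PRIMARY_REGION:
        # The primary resolves principals locally and needs no callback URL.
        return {}
    if not service_token:
        raise ValueError("secondary regions require an inter-region service token")
    return {
        "PRIMARY_CPLANE_API_URL": PRIMARY_CPLANE_API_URL,
        "INTER_REGION_SERVICE_TOKEN": service_token,
    }


if __name__ == "__main__":
    # "fra-1" is a made-up region name used only for illustration.
    print(sorted(inter_region_env("fra-1", "example-token")))
```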

Phase 7 — Smoke Test

  1. Create a test tenant assigned to the new region and verify the full auth flow end-to-end.
  2. Provision a test workspace and confirm DNS delegation, ACM cert issuance, and cluster agent bootstrap token flow.

Environment vs Region

The steps above apply whether you are adding a new region (new geography, new cluster) or a new environment in an existing region (e.g. staging.sgp-1). The differences are:

  • New environment: Phase 1 is a subset (no new VPC if sharing; new RDS and S3 only). Route53 zone is a subdomain of the existing regional zone.
  • New region: Full Phase 1 including VPC and EKS. Phase 2 adds a new top-level NS delegation.
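The difference in where the NS delegation lands can be illustrated with a small helper. This sketch rests on several assumptions that the text does not confirm: that prod is the environment owning the regional zone, that the regional zone lives in Route53 (as provisioned in Phase 1), and that the function name and return shape are purely illustrative.

```python
from typing import Tuple

ROOT_DOMAIN = "cogrion.com"


def delegation_parent(env: str, region: str) -> Tuple[str, str]:
    """Return (parent zone, DNS provider) for the zone being delegated."""
    if env == "prod":
        # New region: top-level delegation from the root zone in Cloudflare.
        return (ROOT_DOMAIN, "cloudflare")
    # New environment: subdomain of the existing regional zone in Route53.
    return (f"{region}.{ROOT_DOMAIN}", "route53")


if __name__ == "__main__":
    print(delegation_parent("staging", "sgp-1"))
```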

Teardown

To decommission a region, reverse the sequence: drain workspaces, remove the Cloudflare KV entry (to stop new tenant assignments), delete the ArgoCD apps, then run terraform destroy to remove the region's AWS infrastructure. The shared root CA and global Cloudflare layer are unaffected.
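Because each teardown step depends on the previous one having finished, a runner should stop at the first failure instead of continuing. The sketch below shows that control flow only. The step names mirror the paragraph above, and the callables are placeholders that a real script would replace with the actual operations.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_in_order(steps: List[Step]) -> List[str]:
    """Run steps sequentially; stop at the first one that reports failure."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            break  # Later steps are skipped so nothing is destroyed early.
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Placeholder actions; each returns True to signal success.
    teardown: List[Step] = [
        ("drain workspaces", lambda: True),
        ("remove Cloudflare KV entry", lambda: True),
        ("delete ArgoCD apps", lambda: True),
        ("terraform destroy", lambda: True),
    ]
    print(run_in_order(teardown))
```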