GitOps (ArgoCD + Helm)
AWS infrastructure is provisioned first via Terraform (see Infrastructure (Terraform)). Everything below runs on top of that layer.
Two repositories:
| Repo | Purpose |
|---|---|
| cogrion-app | Application source code, Dockerfiles, CI pipelines |
| cogrion-infra | Helm charts, ArgoCD applications, environment values |
ArgoCD runs inside each cluster and watches cogrion-infra. It does not manage itself across clusters — each cluster's ArgoCD is bootstrapped once manually (or via Terraform) and then self-manages from the infra repo.
Repository: cogrion-infra
cogrion-infra/
├── argocd/
│   ├── bootstrap/                  # one-time cluster bootstrap
│   │   ├── dev-sgp-1.yaml
│   │   └── prod-sgp-1.yaml
│   │
│   ├── root/                       # App of Apps root, one per cluster
│   │   ├── dev-sgp-1.yaml
│   │   └── prod-sgp-1.yaml
│   │
│   └── apps/                       # Individual ArgoCD Application manifests
│       ├── control-plane.yaml
│       ├── keycloak.yaml
│       ├── temporal.yaml
│       ├── observability.yaml
│       └── ingress.yaml
│
├── charts/                         # Helm charts (source of truth)
│   ├── control-plane/
│   │   ├── Chart.yaml
│   │   ├── templates/
│   │   └── values.yaml             # defaults only, no env-specific values
│   ├── keycloak/
│   ├── temporal/
│   ├── observability/
│   └── ingress/
│
└── values/                         # Environment + region specific overrides
    ├── dev-sgp-1/
    │   ├── control-plane.yaml
    │   ├── keycloak.yaml
    │   ├── temporal.yaml
    │   ├── observability.yaml
    │   └── ingress.yaml
    └── prod-sgp-1/
        ├── control-plane.yaml
        ├── keycloak.yaml
        ├── temporal.yaml
        ├── observability.yaml
        └── ingress.yaml
App of Apps Pattern
How it works
Each cluster has one root ArgoCD Application — the "App of Apps". It points at argocd/apps/ and renders all child Applications. Each child Application points at its Helm chart + the correct values file for that cluster.
ArgoCD (in cluster)
└── root app              [argocd/root/prod-sgp-1.yaml]
    ├── control-plane app [argocd/apps/control-plane.yaml]
    ├── keycloak app      [argocd/apps/keycloak.yaml]
    ├── temporal app      [argocd/apps/temporal.yaml]
    ├── observability app [argocd/apps/observability.yaml]
    └── ingress app       [argocd/apps/ingress.yaml]
Root Application
argocd/root/prod-sgp-1.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/cogrion/cogrion-infra
    targetRevision: main
    path: argocd/apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
Apart from the one-time ArgoCD bootstrap manifest in argocd/bootstrap/, this is the only manifest applied manually when bootstrapping a cluster. Everything else flows from it.
Child Application (example: control-plane)
argocd/apps/control-plane.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: control-plane
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/cogrion/cogrion-infra
    targetRevision: main
    path: charts/control-plane
    helm:
      valueFiles:
        - ../../values/prod-sgp-1/control-plane.yaml   # ← region+env specific
  destination:
    server: https://kubernetes.default.svc
    namespace: control-plane
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
The region+env problem with App of Apps
Child Application manifests in argocd/apps/ reference a hardcoded values path (prod-sgp-1). This means argocd/apps/ cannot be shared as-is across clusters — each cluster needs its own version of the apps directory, or the path needs to be templated.
Recommended approach for a small team: one apps directory per cluster.
argocd/
  apps/
    dev-sgp-1/
      control-plane.yaml      # valueFiles: ../../values/dev-sgp-1/...
      keycloak.yaml
      temporal.yaml
      observability.yaml
      ingress.yaml
    prod-sgp-1/
      control-plane.yaml      # valueFiles: ../../values/prod-sgp-1/...
      keycloak.yaml
      temporal.yaml
      observability.yaml
      ingress.yaml
Root app for each cluster points at its own apps directory:
# argocd/root/prod-sgp-1.yaml
path: argocd/apps/prod-sgp-1
# argocd/root/dev-sgp-1.yaml
path: argocd/apps/dev-sgp-1
The Helm charts themselves (charts/) remain shared and unmodified. Only the ArgoCD Application manifests and values files are per-cluster. Adding prod-fra-1 means:
- Copy argocd/apps/prod-sgp-1/ → argocd/apps/prod-fra-1/, update valueFiles paths
- Copy values/prod-sgp-1/ → values/prod-fra-1/, update domain and bucket values
- Create argocd/root/prod-fra-1.yaml, update path
- Bootstrap the new cluster with argocd/bootstrap/prod-fra-1.yaml
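Copying an apps directory makes it easy to leave a stale valueFiles path behind, for example a prod-fra-1 Application that still loads values/prod-sgp-1/. The sketch below is one possible lint for that, assuming the per-cluster layout described above; the function name `check_value_paths` is hypothetical and not part of the repository.

```shell
#!/bin/sh
# check_value_paths REPO_ROOT  (hypothetical helper, not an ArgoCD feature)
# Every manifest under argocd/apps/<cluster>/ should reference only
# values/<cluster>/. Prints offending lines and returns 1 on any mismatch.
check_value_paths() {
  repo_root=$1
  status=0
  for cluster_dir in "$repo_root"/argocd/apps/*/; do
    [ -d "$cluster_dir" ] || continue
    cluster=$(basename "$cluster_dir")
    # Keep lines that mention values/<x>/ where <x> is not this cluster.
    bad=$(grep -rnE 'values/[^/]+/' "$cluster_dir" | grep -v "values/$cluster/" || true)
    if [ -n "$bad" ]; then
      echo "mismatched values path in $cluster:" >&2
      echo "$bad" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running `check_value_paths .` from the root of cogrion-infra in CI would be one way to catch this before ArgoCD syncs the wrong values into a cluster.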
Values Files
Values files override chart defaults for a specific environment and region. They are the only place environment-specific configuration lives.
Structure
values/prod-sgp-1/control-plane.yaml
global:
  region: sgp-1
  environment: prod
  baseDomain: cplane.sgp-1.cogrion.com
  workspaceDomain: sgp-1.cogrion.com
  authDomain: auth.sgp-1.cogrion.com
  temporalDomain: temporal.sgp-1.cogrion.com
  grafanaDomain: grafana.sgp-1.cogrion.com
  artifactsBucket: cogrion-prod-sgp-1-artifacts
  exportsBucket: cogrion-prod-sgp-1-exports
image:
  repository: ghcr.io/cogrion/control-plane
  tag: ""   # set by CI at deploy time
replicaCount: 2
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    memory: 1Gi
values/dev-sgp-1/control-plane.yaml
global:
  region: sgp-1
  environment: dev
  baseDomain: cplane.dev.sgp-1.cogrion.com
  workspaceDomain: dev.sgp-1.cogrion.com
  authDomain: auth.dev.sgp-1.cogrion.com
  temporalDomain: temporal.dev.sgp-1.cogrion.com
  grafanaDomain: grafana.dev.sgp-1.cogrion.com
  artifactsBucket: cogrion-dev-sgp-1-artifacts
  exportsBucket: cogrion-dev-sgp-1-exports
image:
  repository: ghcr.io/cogrion/control-plane
  tag: ""
replicaCount: 1
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    memory: 512Mi
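Because every child Application loads values/{cluster}/{chart}.yaml, a chart without a matching values file in some cluster directory is likely to break that cluster's sync. The following is a minimal sketch of a coverage check under that assumption; `check_values_coverage` is a hypothetical name, and the check looks only at file presence, not at file contents.

```shell
#!/bin/sh
# check_values_coverage REPO_ROOT  (hypothetical helper)
# For each directory under values/, confirm there is one <chart>.yaml per
# directory under charts/. Prints missing files and returns 1 if any are absent.
check_values_coverage() {
  repo_root=$1
  status=0
  for cluster_dir in "$repo_root"/values/*/; do
    [ -d "$cluster_dir" ] || continue
    for chart_dir in "$repo_root"/charts/*/; do
      [ -d "$chart_dir" ] || continue
      chart=$(basename "$chart_dir")
      if [ ! -f "${cluster_dir}${chart}.yaml" ]; then
        echo "missing: ${cluster_dir}${chart}.yaml" >&2
        status=1
      fi
    done
  done
  return "$status"
}
```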
Secrets
Secrets are never stored in the infra repo. Use one of:
- AWS Secrets Manager + External Secrets Operator (recommended — you're already on AWS)
- Sealed Secrets if you prefer git-native
External Secrets pattern:
# charts/control-plane/templates/externalsecret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: control-plane-secrets
spec:
  refreshInterval: 5m
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: control-plane-secrets
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: cogrion/{{ .Values.global.environment }}/{{ .Values.global.region }}/database-url
    - secretKey: KEYCLOAK_CLIENT_SECRET
      remoteRef:
        key: cogrion/{{ .Values.global.environment }}/{{ .Values.global.region }}/keycloak-client-secret
Secret paths follow the same {env}/{region} convention as everything else.
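To make the convention concrete, the sketch below derives the Secrets Manager key for a given cluster and secret name. It assumes cluster names have the form {env}-{region} with no hyphen in the env part (as in prod-sgp-1); the function name `secret_key_for_cluster` is hypothetical.

```shell
#!/bin/sh
# secret_key_for_cluster CLUSTER NAME  (hypothetical helper)
# Splits CLUSTER at its first hyphen into env and region, then prints the
# key in the cogrion/{env}/{region}/{name} form used by the ExternalSecret.
secret_key_for_cluster() {
  cluster=$1
  name=$2
  env_name=${cluster%%-*}   # text before the first hyphen, e.g. "prod"
  region=${cluster#*-}      # text after the first hyphen, e.g. "sgp-1"
  printf 'cogrion/%s/%s/%s\n' "$env_name" "$region" "$name"
}
```

For example, `secret_key_for_cluster prod-sgp-1 database-url` prints `cogrion/prod/sgp-1/database-url`, which is the key the template above renders for the prod values file.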
Repository: cogrion-app
cogrion-app/
├── services/
│   ├── control-plane/
│   ├── temporal-workers/
│   └── bff/                    # Helm chart for client deployment
│
├── .github/
│   └── workflows/
│       ├── ci.yaml             # test + build + push image
│       └── deploy.yaml         # update image tag in infra repo
│
└── docker/
    └── ...
CI/CD Flow
1. Engineer pushes to main (or PR merges)
2. GitHub Actions: test → build → push image to GHCR
   tag: sha-{git_sha} (immutable, traceable)
3. GitHub Actions: open PR against cogrion-infra
   updates image.tag in values/dev-sgp-1/control-plane.yaml
4. PR auto-merges (dev) or requires approval (prod)
5. ArgoCD detects change in infra repo, syncs cluster
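Step 3 comes down to rewriting one line in a values file. A minimal sketch of that edit is shown here, using only sed; it assumes the file contains exactly one `tag:` key (as in the values files shown earlier) and drops any trailing comment on that line. The function name `set_image_tag` is hypothetical, and the real deploy.yaml workflow may use a different tool.

```shell
#!/bin/sh
# set_image_tag VALUES_FILE TAG  (hypothetical helper)
# Rewrites the `tag:` line of VALUES_FILE to the given tag, keeping its
# indentation. Refuses an empty tag or "latest".
set_image_tag() {
  values_file=$1
  new_tag=$2
  case $new_tag in
    ''|latest)
      echo "refusing to set tag '$new_tag'" >&2
      return 1
      ;;
  esac
  tmp_file=$(mktemp) || return 1
  # \1 keeps the leading whitespace plus "tag:"; the rest of the line is replaced.
  sed -E "s|^([[:space:]]*tag:).*|\\1 \"$new_tag\"|" "$values_file" > "$tmp_file" &&
    cat "$tmp_file" > "$values_file"
  rc=$?
  rm -f "$tmp_file"
  return "$rc"
}
```

A CI job could call `set_image_tag values/dev-sgp-1/control-plane.yaml "sha-a1b2c3d"` on a branch of cogrion-infra and then open the PR from that branch.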
Image tagging
ghcr.io/cogrion/control-plane:sha-a1b2c3d ← immutable, used everywhere
ghcr.io/cogrion/control-plane:latest ← never used in k8s manifests
Never use latest in values files. Every deployment is pinned to a specific SHA so rollback is a one-line revert in the infra repo.
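The rule can be enforced mechanically. The sketch below is one possible check that fails when any file under a values directory sets `tag:` to latest, quoted or unquoted; the name `check_no_latest` is hypothetical, and the pattern assumes the one-key-per-line YAML style used in this document.

```shell
#!/bin/sh
# check_no_latest VALUES_DIR  (hypothetical helper)
# Returns 1 and prints the offending lines if any file under VALUES_DIR
# contains a line such as `tag: latest`, `tag: "latest"` or `tag: 'latest'`.
check_no_latest() {
  values_dir=$1
  if grep -rnE "^[[:space:]]*tag:[[:space:]]*[\"']?latest[\"']?[[:space:]]*(#.*)?\$" "$values_dir"; then
    echo "error: 'latest' image tag found under $values_dir" >&2
    return 1
  fi
  return 0
}
```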
Promoting dev → prod
dev cluster running: sha-a1b2c3d (auto-deployed on merge to main)
prod cluster running: sha-9f8e7d6 (last approved prod deploy)
Promote:
1. Open PR in cogrion-infra
   values/prod-sgp-1/control-plane.yaml
   image.tag: sha-a1b2c3d
2. Engineer reviews + approves
3. Merge → ArgoCD syncs prod cluster
No separate pipeline for prod — it's just a PR changing one line in the infra repo. Full audit trail in git.
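The one-line change in a promotion PR can be prepared with a small script. This is a sketch only, assuming each values file has a single `tag:` key and that tags follow the sha-{git_sha} form; `read_image_tag` and `promote_image` are hypothetical names. It edits the working tree and leaves committing and opening the PR to the engineer.

```shell
#!/bin/sh
# read_image_tag FILE: print the value of the first `tag:` key, without quotes.
read_image_tag() {
  sed -nE 's/^[[:space:]]*tag:[[:space:]]*"?([^"#[:space:]]*)"?.*$/\1/p' "$1" | head -n 1
}

# promote_image REPO_ROOT CHART FROM_CLUSTER TO_CLUSTER  (hypothetical helper)
# Copies the image tag from values/FROM_CLUSTER/CHART.yaml into
# values/TO_CLUSTER/CHART.yaml. Fails if the source tag is not a sha-* tag.
promote_image() {
  repo_root=$1
  chart=$2
  src="$repo_root/values/$3/$chart.yaml"
  dst="$repo_root/values/$4/$chart.yaml"
  tag=$(read_image_tag "$src")
  case $tag in
    sha-?*) ;;
    *)
      echo "source tag '$tag' is not an immutable sha-* tag" >&2
      return 1
      ;;
  esac
  tmp_file=$(mktemp) || return 1
  sed -E "s|^([[:space:]]*tag:).*|\\1 \"$tag\"|" "$dst" > "$tmp_file" &&
    cat "$tmp_file" > "$dst"
  rc=$?
  rm -f "$tmp_file"
  return "$rc"
}
```

For the example above, `promote_image . control-plane dev-sgp-1 prod-sgp-1` would change the prod values file from sha-9f8e7d6 to sha-a1b2c3d, ready for review.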
Bootstrap: New Cluster
When spinning up a new cluster (e.g. prod-fra-1), the AWS infrastructure must already be provisioned via Terraform; see Infrastructure (Terraform) → Adding a New Region. Once the EKS cluster is live, continue here.
1. Infra repo — add cluster files
# Copy and update apps directory
cp -r argocd/apps/prod-sgp-1 argocd/apps/prod-fra-1
# Update valueFiles paths in each app to point at prod-fra-1
# Copy and update values
cp -r values/prod-sgp-1 values/prod-fra-1
# Update all domain and bucket names to fra-1
# Create root app manifest
cp argocd/root/prod-sgp-1.yaml argocd/root/prod-fra-1.yaml
# Update path: argocd/apps/prod-fra-1
# Create bootstrap manifest
cp argocd/bootstrap/prod-sgp-1.yaml argocd/bootstrap/prod-fra-1.yaml
# Update cluster name and server URL
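The copy-and-update steps above can be scripted. The following is a rough sketch that copies the four locations and applies a naive text substitution of the cluster name and region; it assumes cluster names of the form {env}-{region} and the same environment on both sides, and the function name `scaffold_cluster` is hypothetical. The result still needs manual review, in particular the server URL in the bootstrap manifest and any image tags carried over from the source cluster.

```shell
#!/bin/sh
# scaffold_cluster REPO_ROOT SRC_CLUSTER DST_CLUSTER  (hypothetical helper)
# Copies apps, values, root and bootstrap files from SRC to DST, then
# replaces the cluster name (prod-sgp-1) and the bare region (sgp-1).
scaffold_cluster() {
  repo_root=$1
  src=$2
  dst=$3
  src_region=${src#*-}
  dst_region=${dst#*-}
  if [ -e "$repo_root/values/$dst" ] || [ -e "$repo_root/argocd/apps/$dst" ]; then
    echo "cluster $dst already exists in $repo_root" >&2
    return 1
  fi
  cp -R "$repo_root/argocd/apps/$src" "$repo_root/argocd/apps/$dst" || return 1
  cp -R "$repo_root/values/$src" "$repo_root/values/$dst" || return 1
  cp "$repo_root/argocd/root/$src.yaml" "$repo_root/argocd/root/$dst.yaml" || return 1
  cp "$repo_root/argocd/bootstrap/$src.yaml" "$repo_root/argocd/bootstrap/$dst.yaml" || return 1

  for f in "$repo_root/argocd/apps/$dst"/*.yaml "$repo_root/values/$dst"/*.yaml \
           "$repo_root/argocd/root/$dst.yaml" "$repo_root/argocd/bootstrap/$dst.yaml"; do
    [ -f "$f" ] || continue
    tmp_file=$(mktemp) || return 1
    # Full cluster name first, then any remaining bare region (domains).
    sed -e "s/$src/$dst/g" -e "s/$src_region/$dst_region/g" "$f" > "$tmp_file" &&
      cat "$tmp_file" > "$f"
    rm -f "$tmp_file"
  done
}
```

From the root of cogrion-infra, `scaffold_cluster . prod-sgp-1 prod-fra-1` would produce the files listed in this step; a `git diff` afterwards shows exactly what the substitution changed.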
2. Bootstrap ArgoCD into the new cluster
# Point kubectl at the new cluster
kubectl apply -f argocd/bootstrap/prod-fra-1.yaml
# Apply the root app; after the bootstrap manifest above, this is the last manual apply the cluster needs
kubectl apply -f argocd/root/prod-fra-1.yaml -n argocd
# ArgoCD takes over from here — all child apps deploy automatically
3. Verify
argocd app list
# Should show: root, control-plane, keycloak, temporal, observability, ingress
# All should reach Synced / Healthy within ~5 minutes
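The check can also be automated rather than read off by eye. The sketch below takes lines of the form "name sync health" on stdin and succeeds only if every expected app is present, Synced and Healthy. The function name `apps_ready` is hypothetical; the commented kubectl command is one way to produce that input from the Application resources and is not executed here.

```shell
#!/bin/sh
# apps_ready APP...  (hypothetical helper)
# Reads "<name> <sync-status> <health-status>" lines on stdin. Exits 0 only
# if every listed line is Synced/Healthy and every APP argument was seen.
apps_ready() {
  awk -v expected="$*" '
    NF == 0 { next }
    {
      seen[$1] = 1
      if ($2 != "Synced" || $3 != "Healthy") {
        printf "not ready: %s (%s/%s)\n", $1, $2, $3
        bad = 1
      }
    }
    END {
      n = split(expected, names, " ")
      for (i = 1; i <= n; i++) {
        if (!(names[i] in seen)) {
          printf "missing: %s\n", names[i]
          bad = 1
        }
      }
      exit bad
    }'
}

# Example input source (requires cluster access):
# kubectl get applications.argoproj.io -n argocd \
#   -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.sync.status}{" "}{.status.health.status}{"\n"}{end}' \
#   | apps_ready root control-plane keycloak temporal observability ingress
```

Wrapping the pipeline in a retry loop with a timeout would approximate the "within ~5 minutes" expectation above.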
Branch Strategy
| Branch | Deploys to | Auto-sync |
|---|---|---|
| main | dev-sgp-1 | Yes |
| PR against main | nothing (CI only) | No |
| Infra repo PR | prod (on merge, after approval) | Yes |
No long-lived environment branches. Environment differences live entirely in values files, not in code branches.
Directory Reference
cogrion-infra/
├── argocd/
│   ├── bootstrap/                  # kubectl apply once per cluster
│   │   ├── dev-sgp-1.yaml
│   │   └── prod-sgp-1.yaml
│   ├── root/                       # App of Apps root per cluster
│   │   ├── dev-sgp-1.yaml
│   │   └── prod-sgp-1.yaml
│   └── apps/                       # Child Application manifests
│       ├── dev-sgp-1/
│       │   ├── control-plane.yaml
│       │   ├── keycloak.yaml
│       │   ├── temporal.yaml
│       │   ├── observability.yaml
│       │   └── ingress.yaml
│       └── prod-sgp-1/
│           ├── control-plane.yaml
│           ├── keycloak.yaml
│           ├── temporal.yaml
│           ├── observability.yaml
│           └── ingress.yaml
│
├── charts/                         # Helm charts, shared, no env values
│   ├── control-plane/
│   ├── keycloak/
│   ├── temporal/
│   ├── observability/
│   └── ingress/
│
└── values/                         # All env+region specific config
    ├── dev-sgp-1/
    │   ├── control-plane.yaml
    │   ├── keycloak.yaml
    │   ├── temporal.yaml
    │   ├── observability.yaml
    │   └── ingress.yaml
    └── prod-sgp-1/
        ├── control-plane.yaml
        ├── keycloak.yaml
        ├── temporal.yaml
        ├── observability.yaml
        └── ingress.yaml
Summary
- cogrion-infra is the single source of truth for all cluster state
- ArgoCD in each cluster watches its own slice of the infra repo
- One argocd/apps/{cluster}/ directory per cluster: no templating complexity
- One values/{cluster}/ directory per cluster: all env/region config lives here
- Helm charts in charts/ are shared and contain no environment-specific values
- Secrets are never in git: AWS Secrets Manager via External Secrets Operator
- Image tags are immutable SHAs: dev auto-deploys, prod requires PR approval
- Adding a region = copy two directories, update values, bootstrap one cluster