GitOps (ArgoCD + Helm)

AWS infrastructure is provisioned first via Terraform (see Infrastructure (Terraform)). Everything below runs on top of that layer.

Two repositories:

Repo            Purpose
cogrion-app     Application source code, Dockerfiles, CI pipelines
cogrion-infra   Helm charts, ArgoCD applications, environment values

ArgoCD runs inside each cluster and watches cogrion-infra. There is no central ArgoCD managing several clusters: each cluster's ArgoCD is bootstrapped once, manually or via Terraform, and then manages its own cluster from the infra repo.


Repository: cogrion-infra

cogrion-infra/
├── argocd/
│   ├── bootstrap/                  # one-time cluster bootstrap
│   │   ├── dev-sgp-1.yaml
│   │   └── prod-sgp-1.yaml
│   │
│   ├── root/                       # App of Apps root, one per cluster
│   │   ├── dev-sgp-1.yaml
│   │   └── prod-sgp-1.yaml
│   │
│   └── apps/                       # Individual ArgoCD Application manifests
│       ├── control-plane.yaml
│       ├── keycloak.yaml
│       ├── temporal.yaml
│       ├── observability.yaml
│       └── ingress.yaml
│
├── charts/                         # Helm charts (source of truth)
│   ├── control-plane/
│   │   ├── Chart.yaml
│   │   ├── templates/
│   │   └── values.yaml             # defaults only, no env-specific values
│   ├── keycloak/
│   ├── temporal/
│   ├── observability/
│   └── ingress/
│
└── values/                         # Environment + region specific overrides
    ├── dev-sgp-1/
    │   ├── control-plane.yaml
    │   ├── keycloak.yaml
    │   ├── temporal.yaml
    │   ├── observability.yaml
    │   └── ingress.yaml
    └── prod-sgp-1/
        ├── control-plane.yaml
        ├── keycloak.yaml
        ├── temporal.yaml
        ├── observability.yaml
        └── ingress.yaml

App of Apps Pattern

How it works

Each cluster has one root ArgoCD Application — the "App of Apps". It points at argocd/apps/ and renders all child Applications. Each child Application points at its Helm chart + the correct values file for that cluster.

ArgoCD (in cluster)
└── root app                  [argocd/root/prod-sgp-1.yaml]
    ├── control-plane app     [argocd/apps/control-plane.yaml]
    ├── keycloak app          [argocd/apps/keycloak.yaml]
    ├── temporal app          [argocd/apps/temporal.yaml]
    ├── observability app     [argocd/apps/observability.yaml]
    └── ingress app           [argocd/apps/ingress.yaml]

Root Application

argocd/root/prod-sgp-1.yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/cogrion/cogrion-infra
    targetRevision: main
    path: argocd/apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Apart from the one-time ArgoCD bootstrap manifest, this is the only manifest applied manually when bootstrapping a cluster. Everything else flows from it.

Child Application (example: control-plane)

argocd/apps/control-plane.yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: control-plane
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/cogrion/cogrion-infra
    targetRevision: main
    path: charts/control-plane
    helm:
      valueFiles:
        - ../../values/prod-sgp-1/control-plane.yaml   # ← region+env specific
  destination:
    server: https://kubernetes.default.svc
    namespace: control-plane
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

The region+env problem with App of Apps

Child Application manifests in argocd/apps/ reference a hardcoded values path (prod-sgp-1). This means argocd/apps/ cannot be shared as-is across clusters — each cluster needs its own version of the apps directory, or the path needs to be templated.

Recommended approach for a small team: one apps directory per cluster.

argocd/
  apps/
    dev-sgp-1/
      control-plane.yaml    # valueFiles: ../../values/dev-sgp-1/...
      keycloak.yaml
      temporal.yaml
      observability.yaml
      ingress.yaml
    prod-sgp-1/
      control-plane.yaml    # valueFiles: ../../values/prod-sgp-1/...
      keycloak.yaml
      temporal.yaml
      observability.yaml
      ingress.yaml

Root app for each cluster points at its own apps directory:

# argocd/root/prod-sgp-1.yaml
path: argocd/apps/prod-sgp-1

# argocd/root/dev-sgp-1.yaml
path: argocd/apps/dev-sgp-1

The Helm charts themselves (charts/) remain shared and unmodified. Only the ArgoCD Application manifests and values files are per-cluster. Adding prod-fra-1 means:

  1. Copy argocd/apps/prod-sgp-1/ → argocd/apps/prod-fra-1/, update valueFiles paths
  2. Copy values/prod-sgp-1/ → values/prod-fra-1/, update domain and bucket values
  3. Create argocd/root/prod-fra-1.yaml, update path
  4. Bootstrap the new cluster with argocd/bootstrap/prod-fra-1.yaml
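
After working through the steps above, a small check can confirm that nothing was missed. The following is a minimal sketch, not part of the repo: the function name check_cluster_files is hypothetical, and it assumes the per-cluster layout described in this section (one Application manifest and one values file per chart, plus a root and a bootstrap manifest).

```shell
# check_cluster_files CLUSTER [REPO_ROOT]   (hypothetical helper)
# Prints every file the per-cluster layout expects but that is missing.
# Returns 0 when nothing is missing, 1 otherwise.
check_cluster_files() {
  cluster=$1
  root=${2:-.}
  missing=0

  # One bootstrap manifest and one root app per cluster.
  for f in "argocd/bootstrap/$cluster.yaml" "argocd/root/$cluster.yaml"; do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done

  # Every shared chart needs one Application manifest and one values file.
  for chart_dir in "$root"/charts/*/; do
    [ -d "$chart_dir" ] || continue
    chart=$(basename "$chart_dir")
    for f in "argocd/apps/$cluster/$chart.yaml" "values/$cluster/$chart.yaml"; do
      if [ ! -f "$root/$f" ]; then
        echo "missing: $f"
        missing=1
      fi
    done
  done

  return "$missing"
}
```

Run from the root of cogrion-infra, check_cluster_files prod-fra-1 prints nothing and returns 0 once every expected file exists.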

Values Files

Values files override chart defaults for a specific environment and region. They are the only place environment-specific configuration lives.

Structure

values/prod-sgp-1/control-plane.yaml

global:
  region: sgp-1
  environment: prod
  baseDomain: cplane.sgp-1.cogrion.com
  workspaceDomain: sgp-1.cogrion.com
  authDomain: auth.sgp-1.cogrion.com
  temporalDomain: temporal.sgp-1.cogrion.com
  grafanaDomain: grafana.sgp-1.cogrion.com
  artifactsBucket: cogrion-prod-sgp-1-artifacts
  exportsBucket: cogrion-prod-sgp-1-exports

image:
  repository: ghcr.io/cogrion/control-plane
  tag: "" # set by CI at deploy time

replicaCount: 2

resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    memory: 1Gi

values/dev-sgp-1/control-plane.yaml

global:
  region: sgp-1
  environment: dev
  baseDomain: cplane.dev.sgp-1.cogrion.com
  workspaceDomain: dev.sgp-1.cogrion.com
  authDomain: auth.dev.sgp-1.cogrion.com
  temporalDomain: temporal.dev.sgp-1.cogrion.com
  grafanaDomain: grafana.dev.sgp-1.cogrion.com
  artifactsBucket: cogrion-dev-sgp-1-artifacts
  exportsBucket: cogrion-dev-sgp-1-exports

image:
  repository: ghcr.io/cogrion/control-plane
  tag: ""

replicaCount: 1

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    memory: 512Mi
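
To see how a values file overrides the chart defaults before ArgoCD syncs anything, the chart can be rendered locally with helm template. The wrapper below is only a sketch: render_chart is a hypothetical name, and it assumes the charts/ and values/{cluster}/ layout used in this document and a local Helm installation.

```shell
# render_chart CHART CLUSTER [REPO_ROOT]   (hypothetical helper)
# Renders charts/CHART with the overrides from values/CLUSTER/CHART.yaml
# and writes the resulting manifests to stdout.
render_chart() {
  chart=$1
  cluster=$2
  root=${3:-.}
  helm template "$chart" "$root/charts/$chart" \
    --values "$root/values/$cluster/$chart.yaml"
}

# Example: compare what dev and prod would deploy.
#   render_chart control-plane dev-sgp-1  > /tmp/dev.yaml
#   render_chart control-plane prod-sgp-1 > /tmp/prod.yaml
#   diff /tmp/dev.yaml /tmp/prod.yaml
```

The diff should show only environment-level differences such as domains, bucket names, replica counts and resource sizes, since the templates themselves are shared.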

Secrets

Secrets are never stored in the infra repo. Use one of:

  • AWS Secrets Manager + External Secrets Operator (recommended — you're already on AWS)
  • Sealed Secrets if you prefer git-native

External Secrets pattern:

# charts/control-plane/templates/externalsecret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: control-plane-secrets
spec:
  refreshInterval: 5m
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: control-plane-secrets
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: cogrion/{{ .Values.global.environment }}/{{ .Values.global.region }}/database-url
    - secretKey: KEYCLOAK_CLIENT_SECRET
      remoteRef:
        key: cogrion/{{ .Values.global.environment }}/{{ .Values.global.region }}/keycloak-client-secret

Secret paths follow the same {env}/{region} convention as everything else.
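
As an illustration of that convention, the secret key can be derived from a cluster name. This is a sketch with hypothetical helper names (secret_path, secret_path_for_cluster), and it assumes cluster names always have the form {env}-{region}, as in prod-sgp-1.

```shell
# secret_path ENV REGION NAME   (hypothetical helper)
# Builds the secret key in the cogrion/{env}/{region}/{name} form.
secret_path() {
  printf 'cogrion/%s/%s/%s\n' "$1" "$2" "$3"
}

# secret_path_for_cluster CLUSTER NAME   (hypothetical helper)
# Splits a cluster name such as prod-sgp-1 at its first hyphen into
# env (prod) and region (sgp-1), then builds the secret key.
secret_path_for_cluster() {
  cluster=$1
  env_name=${cluster%%-*}   # text before the first hyphen
  region=${cluster#*-}      # text after the first hyphen
  secret_path "$env_name" "$region" "$2"
}

secret_path_for_cluster prod-sgp-1 database-url
# prints: cogrion/prod/sgp-1/database-url
```

The printed key matches what the ExternalSecret template renders when global.environment is prod and global.region is sgp-1.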


Repository: cogrion-app

cogrion-app/
├── services/
│   ├── control-plane/
│   ├── temporal-workers/
│   └── bff/                    # Helm chart for client deployment
│
├── .github/
│   └── workflows/
│       ├── ci.yaml             # test + build + push image
│       └── deploy.yaml         # update image tag in infra repo
│
└── docker/
    └── ...

CI/CD Flow

1. Engineer pushes to main (or PR merges)
2. GitHub Actions: test → build → push image to GHCR
   tag: sha-{git_sha} (immutable, traceable)
3. GitHub Actions: open PR against cogrion-infra
   updates image.tag in values/dev-sgp-1/control-plane.yaml
4. PR auto-merges (dev) or requires approval (prod)
5. ArgoCD detects change in infra repo, syncs cluster
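
The tag computation in step 2 and the values update in step 3 could be sketched as two shell helpers. Both function names are hypothetical. The sketch assumes the tag uses the first 7 characters of the commit SHA (as in the sha-a1b2c3d examples) and that the tag sits under a top-level image: key, as in the values files shown earlier. Opening the PR itself is left out.

```shell
# image_tag GIT_SHA   (hypothetical helper)
# Assumption: the tag is "sha-" plus the first 7 characters of the SHA.
image_tag() {
  printf 'sha-%.7s\n' "$1"
}

# set_image_tag VALUES_FILE TAG   (hypothetical helper)
# Rewrites the indented "tag:" line inside the top-level "image:" block.
# Any trailing comment on that line is dropped.
set_image_tag() {
  file=$1
  tag=$2
  tmp=$(mktemp)
  # The sed range runs from "image:" to the next top-level key, so a
  # "tag:" key in any other block is left untouched.
  sed -E "/^image:/,/^[^[:space:]]/ s|^([[:space:]]+tag:).*|\1 \"$tag\"|" \
    "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example, run from a checkout of cogrion-infra in CI:
#   set_image_tag values/dev-sgp-1/control-plane.yaml "$(image_tag "$GITHUB_SHA")"
```

A YAML-aware tool would be more robust than sed if the values files grow more complex; sed is used here only to keep the sketch dependency-free.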

Image tagging

ghcr.io/cogrion/control-plane:sha-a1b2c3d ← immutable, used everywhere
ghcr.io/cogrion/control-plane:latest ← never used in k8s manifests

Never use latest in values files. Every deployment is pinned to a specific SHA so rollback is a one-line revert in the infra repo.
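
A CI guard on the infra repo could enforce this rule. The following is a minimal sketch with a hypothetical function name (check_no_latest); it only looks for tag: keys whose value is latest and does not validate anything else.

```shell
# check_no_latest VALUES_DIR   (hypothetical helper)
# Prints every "tag: latest" line found under VALUES_DIR, with file name
# and line number. Returns 1 if any is found, 0 otherwise.
check_no_latest() {
  if grep -rnE "^[[:space:]]*tag:[[:space:]]*[\"']?latest[\"']?[[:space:]]*(#.*)?\$" "$1"; then
    return 1
  fi
}

# Example: fail the pipeline when a values file uses latest.
#   check_no_latest values/ || exit 1
```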

Promoting dev → prod

dev cluster running:  sha-a1b2c3d (auto-deployed on merge to main)
prod cluster running: sha-9f8e7d6 (last approved prod deploy)

Promote:
1. Open PR in cogrion-infra
   values/prod-sgp-1/control-plane.yaml
   image.tag: sha-a1b2c3d
2. Engineer reviews + approves
3. Merge → ArgoCD syncs prod cluster

No separate pipeline for prod — it's just a PR changing one line in the infra repo. Full audit trail in git.
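
Before opening the promotion PR, it can help to see which tag each environment is pinned to. This sketch reads image.tag from two values files and reports the difference. The function names are hypothetical, and the tag is assumed to sit under a top-level image: key as in the values files shown earlier.

```shell
# get_image_tag VALUES_FILE   (hypothetical helper)
# Prints the tag set under the top-level "image:" block.
get_image_tag() {
  sed -n -E '/^image:/,/^[^[:space:]]/ s/^[[:space:]]+tag:[[:space:]]*"?([^"#[:space:]]*)"?.*/\1/p' "$1"
}

# pending_promotion DEV_FILE PROD_FILE   (hypothetical helper)
# Reports whether prod is pinned to the same image tag as dev.
pending_promotion() {
  dev_tag=$(get_image_tag "$1")
  prod_tag=$(get_image_tag "$2")
  if [ "$dev_tag" = "$prod_tag" ]; then
    echo "prod is up to date ($prod_tag)"
  else
    echo "promote $dev_tag (prod is on $prod_tag)"
  fi
}

# Example, run from the root of cogrion-infra:
#   pending_promotion values/dev-sgp-1/control-plane.yaml \
#                     values/prod-sgp-1/control-plane.yaml
```

With the tags from the example above, this prints "promote sha-a1b2c3d (prod is on sha-9f8e7d6)".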


Bootstrap: New Cluster

When spinning up a new cluster (e.g. prod-fra-1), the AWS infrastructure must already be provisioned via Terraform; see Infrastructure (Terraform) → Adding a New Region. Once the EKS cluster is live, continue here.

1. Infra repo — add cluster files

# Copy and update apps directory
cp -r argocd/apps/prod-sgp-1 argocd/apps/prod-fra-1
# Update valueFiles paths in each app to point at prod-fra-1

# Copy and update values
cp -r values/prod-sgp-1 values/prod-fra-1
# Update all domain and bucket names to fra-1

# Create root app manifest
cp argocd/root/prod-sgp-1.yaml argocd/root/prod-fra-1.yaml
# Update path: argocd/apps/prod-fra-1

# Create bootstrap manifest
cp argocd/bootstrap/prod-sgp-1.yaml argocd/bootstrap/prod-fra-1.yaml
# Update cluster name and server URL
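
The "update valueFiles paths" part of the first copy can be automated. The helper below is a sketch with a hypothetical name (retarget_apps): it copies the apps directory and rewrites only the values/{cluster}/ segment of each path. Domain and bucket names in the values files are deliberately left for manual review.

```shell
# retarget_apps SRC_CLUSTER DST_CLUSTER [REPO_ROOT]   (hypothetical helper)
# Copies argocd/apps/SRC_CLUSTER to argocd/apps/DST_CLUSTER and points
# every values path in the copies at values/DST_CLUSTER/.
retarget_apps() {
  src=$1
  dst=$2
  root=${3:-.}

  # Refuse to overwrite an existing cluster directory.
  if [ -e "$root/argocd/apps/$dst" ]; then
    echo "argocd/apps/$dst already exists" >&2
    return 1
  fi

  cp -r "$root/argocd/apps/$src" "$root/argocd/apps/$dst" || return 1

  for f in "$root/argocd/apps/$dst"/*.yaml; do
    [ -f "$f" ] || continue
    tmp=$(mktemp)
    sed "s|values/$src/|values/$dst/|g" "$f" > "$tmp" && cat "$tmp" > "$f"
    rm -f "$tmp"
  done
}

# Example, run from the root of cogrion-infra:
#   retarget_apps prod-sgp-1 prod-fra-1
```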

2. Bootstrap ArgoCD into the new cluster

# With kubectl pointed at the new cluster, apply the one-time bootstrap manifest
kubectl apply -f argocd/bootstrap/prod-fra-1.yaml

# Apply the root app; after this, no further manual applies are needed
kubectl apply -f argocd/root/prod-fra-1.yaml -n argocd

# ArgoCD takes over from here: all child apps deploy automatically

3. Verify

argocd app list
# Should show: root, control-plane, keycloak, temporal, observability, ingress
# All should reach Synced / Healthy within ~5 minutes
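
Instead of polling argocd app list by hand, the check can block until each app is ready. This is a sketch with a hypothetical function name (wait_for_apps); it assumes the argocd CLI is installed and already logged in to the new cluster's ArgoCD.

```shell
# wait_for_apps TIMEOUT_SECONDS APP...   (hypothetical helper)
# Waits for each app to become Synced and Healthy, in the order given.
# Stops at the first app that does not get there within the timeout.
wait_for_apps() {
  timeout=$1
  shift
  for app in "$@"; do
    if ! argocd app wait "$app" --sync --health --timeout "$timeout"; then
      echo "not ready: $app" >&2
      return 1
    fi
  done
}

# Example: allow each app up to 5 minutes.
#   wait_for_apps 300 root control-plane keycloak temporal observability ingress
```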

Branch Strategy

Branch            Deploys to                        Auto-sync
main              dev-sgp-1                         Yes
PR against main   nothing (CI only)                 No
Infra repo PR     prod (on merge, after approval)   Yes

No long-lived environment branches. Environment differences live entirely in values files, not in code branches.


Directory Reference

cogrion-infra/
├── argocd/
│   ├── bootstrap/                  # kubectl apply once per cluster
│   │   ├── dev-sgp-1.yaml
│   │   └── prod-sgp-1.yaml
│   ├── root/                       # App of Apps root per cluster
│   │   ├── dev-sgp-1.yaml
│   │   └── prod-sgp-1.yaml
│   └── apps/                       # Child Application manifests
│       ├── dev-sgp-1/
│       │   ├── control-plane.yaml
│       │   ├── keycloak.yaml
│       │   ├── temporal.yaml
│       │   ├── observability.yaml
│       │   └── ingress.yaml
│       └── prod-sgp-1/
│           ├── control-plane.yaml
│           ├── keycloak.yaml
│           ├── temporal.yaml
│           ├── observability.yaml
│           └── ingress.yaml
│
├── charts/                         # Helm charts, shared, no env values
│   ├── control-plane/
│   ├── keycloak/
│   ├── temporal/
│   ├── observability/
│   └── ingress/
│
└── values/                         # All env+region specific config
    ├── dev-sgp-1/
    │   ├── control-plane.yaml
    │   ├── keycloak.yaml
    │   ├── temporal.yaml
    │   ├── observability.yaml
    │   └── ingress.yaml
    └── prod-sgp-1/
        ├── control-plane.yaml
        ├── keycloak.yaml
        ├── temporal.yaml
        ├── observability.yaml
        └── ingress.yaml

Summary

  • cogrion-infra is the single source of truth for all cluster state
  • ArgoCD in each cluster watches its own slice of the infra repo
  • One argocd/apps/{cluster}/ directory per cluster — no templating complexity
  • One values/{cluster}/ directory per cluster — all env/region config here
  • Helm charts in charts/ are shared and contain no environment-specific values
  • Secrets never in git — AWS Secrets Manager via External Secrets Operator
  • Image tags are immutable SHAs — dev auto-deploys, prod requires PR approval
  • Adding a region = copy two directories, update values, bootstrap one cluster