Airflow
Apache Airflow is the workflow orchestration engine. Data engineers author DAGs that define pipelines — ingestion, transformation, and ML training jobs — and Airflow schedules and executes them on Kubernetes.
Executor
Airflow uses the KubernetesExecutor. Each task runs in an isolated Kubernetes pod that is created on demand and destroyed on completion. There are no persistent worker nodes — compute scales to zero between runs.
DAG Delivery
DAGs are loaded via gitSync, a sidecar that continuously polls a configured Git repository and syncs DAG files into the Airflow scheduler. Deploying a DAG is therefore a git push to the tracked branch, with no manual file copying.
| Config | Description |
|---|---|
| dag_git_repo | Git repo URL |
| dag_git_branch | Branch to track |
| dag_git_sub_path | Subpath within the repo (optional) |
| dag_git_ssh_key | Base64-encoded SSH private key for private repos |
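Because dag_git_ssh_key expects a Base64-encoded private key, the value can be produced with a few lines of Python. This is a minimal sketch: the helper name encode_ssh_key and the placeholder key text are illustrative and not part of the platform, and it assumes the encoded value is expected as a single unwrapped line.

```python
import base64


def encode_ssh_key(private_key_text: str) -> str:
    """Return the Base64 encoding of an SSH private key as one ASCII line.

    Hypothetical helper for preparing a dag_git_ssh_key value.
    """
    # b64encode emits no line breaks, so the result is a single line.
    return base64.b64encode(private_key_text.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Placeholder text only; a real key would be read from a local key file
    # and never committed to the repository.
    sample = (
        "-----BEGIN OPENSSH PRIVATE KEY-----\n"
        "placeholder\n"
        "-----END OPENSSH PRIVATE KEY-----\n"
    )
    print(encode_ssh_key(sample))
```

Decoding the output with base64.b64decode should return the original key text byte for byte, which is a quick way to check the value before setting it.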
Authentication
Airflow uses Keycloak OIDC. Realm roles are mapped to Airflow roles:
| Keycloak Realm Role | Airflow Role |
|---|---|
| platform_admin | Admin |
| data_engineer | Op (via workflow_editor) |
| workflow_viewer | Viewer |
| workflow_admin | Admin |
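The mapping in the table can be read as a simple lookup from realm roles to Airflow roles. The sketch below only illustrates that lookup; it is not the platform's actual OIDC configuration. The function name airflow_roles_for, the handling of unmapped roles, and the example role offline_access are assumptions.

```python
# Realm role -> Airflow role, copied from the table above.
# The "via workflow_editor" indirection for data_engineer is not modeled here.
REALM_TO_AIRFLOW_ROLE = {
    "platform_admin": "Admin",
    "data_engineer": "Op",
    "workflow_viewer": "Viewer",
    "workflow_admin": "Admin",
}


def airflow_roles_for(realm_roles):
    """Return the sorted, de-duplicated Airflow roles for a user's realm roles.

    Unmapped realm roles are ignored (an assumption; the source does not say
    how unknown roles are handled).
    """
    return sorted(
        {REALM_TO_AIRFLOW_ROLE[r] for r in realm_roles if r in REALM_TO_AIRFLOW_ROLE}
    )


if __name__ == "__main__":
    # "offline_access" stands in for any realm role without an Airflow mapping.
    print(airflow_roles_for(["data_engineer", "workflow_viewer", "offline_access"]))
```

A user holding both platform_admin and workflow_admin resolves to a single Admin role, since both rows map to the same Airflow role.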
Storage
| Store | Purpose |
|---|---|
| PostgreSQL (KubeBlocks) | Airflow metadata — DAG state, task history, connections, variables |
| S3 log bucket | Remote task logs — stored in S3 so logs persist after a pod is destroyed |
Airflow has an IRSA (IAM Roles for Service Accounts) role that grants read/write access to its S3 log bucket and read access to AWS Secrets Manager, where connections for Datahub and other services are seeded at deploy time.
Datahub Integration
Airflow connects to Datahub via a datahub_rest_default connection seeded into AWS Secrets Manager at Datahub deploy time. When the Datahub Airflow plugin is installed, DAG runs automatically emit table lineage events to Datahub GMS — making lineage visible in the Cogrion Catalog without any manual step.
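To make the seeded connection more concrete, the sketch below builds the kind of secret a deploy step might write for datahub_rest_default. Everything beyond the connection ID is an assumption: the airflow/connections prefix is the default used by Airflow's AWS Secrets Manager backend and may be configured differently here, the JSON field names follow Airflow's connection format, the datahub-rest connection type comes from the Datahub Airflow plugin, and the GMS URL is a placeholder.

```python
import json
from typing import Optional, Tuple

# Assumption: Airflow's AWS Secrets Manager backend with its default
# connections prefix; the platform may use a different prefix.
CONNECTIONS_PREFIX = "airflow/connections"
CONN_ID = "datahub_rest_default"


def datahub_connection_secret(
    gms_url: str, token: Optional[str] = None
) -> Tuple[str, str]:
    """Build the (secret name, secret string) pair for the Datahub connection.

    Hypothetical helper: it only assembles the values and does not call AWS.
    """
    payload = {"conn_type": "datahub-rest", "host": gms_url}
    if token is not None:
        # The access token is carried in the connection's password field.
        payload["password"] = token
    return f"{CONNECTIONS_PREFIX}/{CONN_ID}", json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    # Placeholder GMS endpoint; the real URL is environment specific.
    name, secret = datahub_connection_secret("http://datahub-gms:8080")
    print(name)
    print(secret)
```

Keeping the connection in Secrets Manager rather than in the Airflow metadata database fits the read-only Secrets Manager access granted to Airflow's IRSA role.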
Go Deeper
- Datahub — lineage is emitted to Datahub by the Airflow plugin
- Catalog — Lineage — where Airflow lineage appears in the UI
- Features — Workflow — the user-facing workflow feature