Capability-Centric Platform — Overview
The platform is a collection of data infrastructure stacks (Datahub, Airflow, Trino, JupyterHub, etc.). As the stack count grew, a structural problem emerged: the compose model required operators to think in terms of stack topology — which stacks exist, what they depend on, how to wire them together. This is the wrong abstraction for the intended user.
This document covers the problem, the decision, and the mental model that drives everything else in this ADR set. For what changes in platform-stacks see Bundle Structure. For what changes in the control plane see Control Plane.
The Problem
Integration logic lived inside app bundles
Cross-stack wiring was implemented as post-deploy phases inside the owning app's bundle:
- aws/datahub post-deploy: seeds a PAT, registers Airflow as a lineage source, registers Trino as a data source
- aws/ranger post-deploy: seeds default policies
This created two problems:
- Redeploying the app reruns the seeds. The Datahub bundle had no way to distinguish "first install" from "upgrade". Seeds were protected only by Kubernetes Job semantics (ttlSecondsAfterFinished), which is implicit and fragile.
- The owning app had to know about every app it connects to. The Datahub bundle contained Airflow-specific logic. If a workspace deployed Airflow without Datahub, or Datahub without Airflow, the bundle still carried dead configuration for the missing stack.
The compose was a wiring diagram, not a product surface
The compose file (compose/aws/delta-spark.yaml) was a list of stack slugs and dependency edges. Adding a new integration meant editing the compose YAML — an operator concern, not a user concern. There was no way to represent "this workspace uses data lineage" without knowing which two stacks data lineage connects.
The Decision
Users configure capabilities, not stacks. The platform is responsible for knowing which stacks a capability requires and how to wire them.
A user enabling "Data Lineage" does not need to know that this requires Datahub, Airflow, and a PAT seed job. They see a toggle. The platform handles the rest.
This is the same model VS Code uses for extensions: an extension (capability) has its own settings panel. The user enables it. The editor handles loading, dependency resolution, and lifecycle. The user never edits a wiring file.
Three-Layer Model
Every stack in the platform belongs to one of three layers:
Layer 1 — Infrastructure
Shared cluster-level resources with no user-facing features.
Examples: karpenter, spark-operator, observability, kafka
Layer 2 — Applications
Self-contained services that expose a user-facing feature.
No knowledge of other Layer 2 apps.
Examples: datahub, airflow, jupyterhub, trino, superset
Layer 3 — Capabilities
Cross-stack wiring that delivers a user-visible outcome.
Depends on one or more other stacks being deployed, usually two or more Layer 2 apps.
Contains no Helm releases — only Jobs, SparkApplications, API calls.
Examples: data-lineage, pii-scanning, query-federation
Layer 2 apps are fully independent. A workspace can deploy Airflow without Datahub and vice versa. When both are present and the operator enables "Data Lineage", the Layer 3 capability stack deploys and wires them.
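The layer rules can be checked mechanically. The following is a minimal sketch, assuming a hypothetical in-memory `Stack` record; the field names (`layer`, `depends_on`, `has_helm_release`) and the `layer_violations` function are illustrative and are not taken from the platform's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Stack:
    """Hypothetical description of a stack; not the platform's real schema."""
    name: str
    layer: int  # 1 = infrastructure, 2 = application, 3 = capability
    depends_on: set = field(default_factory=set)
    has_helm_release: bool = False


def layer_violations(stacks):
    """Return human-readable violations of the three-layer rules."""
    by_name = {s.name: s for s in stacks}
    problems = []
    for s in stacks:
        deps = [by_name[d] for d in s.depends_on if d in by_name]
        # Rule: a Layer 2 app has no knowledge of other Layer 2 apps.
        if s.layer == 2 and any(d.layer == 2 for d in deps):
            problems.append(f"{s.name}: Layer 2 app depends on another Layer 2 app")
        # Rule: a Layer 3 capability contains no Helm releases.
        if s.layer == 3 and s.has_helm_release:
            problems.append(f"{s.name}: Layer 3 capability contains a Helm release")
    return problems


stacks = [
    Stack("airflow", layer=2),
    Stack("datahub", layer=2, depends_on={"airflow"}),  # old-style coupling
    Stack("data-lineage", layer=3, depends_on={"datahub", "airflow"}),
]
for problem in layer_violations(stacks):
    print(problem)  # datahub: Layer 2 app depends on another Layer 2 app
```

Moving the Airflow dependency out of `datahub` and into `data-lineage` clears the violation, which is the refactoring this ADR describes.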
Capability Naming
Layer 3 bundles are named by the user-facing outcome, not by the stacks they connect.
| Do | Don't |
|---|---|
| aws/data-lineage | aws/datahub-airflow-integration |
| aws/pii-scanning | aws/datahub-spark-integration |
| aws/query-federation | aws/trino-hive-integration |
| aws/ranger-policies | aws/ranger-post-deploy |
This matters because the same capability may connect different stacks in different compose configurations. data-lineage might wire Datahub to Airflow in one workspace and Datahub to a different orchestrator in another. The name should survive that variation.
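A naming rule like this could be enforced with a small lint. The sketch below is only one possible heuristic: the `KNOWN_STACKS` set and the `MECHANISM_SUFFIXES` tuple are assumptions derived from the examples in this document, and `naming_issues` is a hypothetical helper, not an existing platform tool.

```python
# Assumed stack name tokens, drawn from the examples in this document.
KNOWN_STACKS = {"datahub", "airflow", "trino", "spark", "hive", "jupyterhub", "superset"}
# Assumed suffixes that describe a mechanism rather than an outcome.
MECHANISM_SUFFIXES = ("-integration", "-post-deploy")


def naming_issues(slug):
    """Return a list of reasons why a Layer 3 bundle slug looks stack-centric."""
    name = slug.split("/", 1)[-1]  # drop the provider prefix, e.g. "aws/"
    issues = []
    named_stacks = sorted(set(name.split("-")) & KNOWN_STACKS)
    if len(named_stacks) >= 2:
        issues.append("names the stacks it connects: " + ", ".join(named_stacks))
    for suffix in MECHANISM_SUFFIXES:
        if name.endswith(suffix):
            issues.append(f"describes a mechanism ('{suffix}'), not an outcome")
    return issues


for slug in ("aws/data-lineage", "aws/datahub-airflow-integration", "aws/ranger-post-deploy"):
    print(slug, naming_issues(slug))
```

With these assumptions, every slug in the "Do" column passes and every slug in the "Don't" column is flagged at least once.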
Two Orthogonal Dimensions
A platform configuration is described by two independent axes. Confusing them is the root cause of poorly named compose files and bloated bundles.
Dimension 1 — Architecture Variant (the compute model)
Represented by the compose kind. Defines which Layer 1 and Layer 2 stacks are present. Different variants use fundamentally different compute engines and cannot be derived from each other by toggling optional members.
| Compose kind | Compute model | Key stacks |
|---|---|---|
| aws/spark-platform | Distributed Spark, Delta Lake | Karpenter (large pools), Spark operator, Hive Metastore, Trino, S3 |
| aws/serverless-platform | Serverless, embedded analytics | Karpenter (minimal), DuckDB/MotherDuck, S3, JupyterHub |
You cannot reach a serverless platform by disabling optional members in the Spark platform — the underlying infrastructure is different. These are separate compose kinds.
Dimension 2 — Capabilities (what features are enabled)
Represented by optional members within a compose kind. These members are Layer 3 bundles that wire Layer 2 apps together. Any compose kind that includes the required Layer 2 apps can offer the same capabilities.
| Capability | What it does | Required stacks |
|---|---|---|
| Data Lineage | Airflow pipeline lineage in Datahub | datahub + airflow |
| PII Scanning | Scheduled PII detection across datasets | datahub + spark-operator |
| Catalog Ingestion | Hive and Trino source registration in Datahub | datahub + hive-metastore + trino |
| Ranger Policies | Default authorisation policy bootstrap | ranger |
Why these are orthogonal
The same capability (data-lineage) can exist in any compose kind that includes both Datahub and Airflow. The architecture variant determines which stacks are available; the capability layer determines which cross-stack wiring is active. Neither axis implies the other.
| Architecture variant (compose kind) \ Capabilities (optional members) | none | data-lineage | pii-scanning | full |
|---|---|---|---|---|
| spark-platform | variant A | A + L | A + P | A + all |
| serverless | variant B | B + L | n/a | B + all |
pii-scanning is not available in serverless-platform because that variant has no Spark operator — the dependency is simply absent, and the optional member is omitted from that compose kind entirely.
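The availability matrix follows directly from set containment: a capability is offered when every stack it requires is present in the compose kind. A minimal sketch, assuming hypothetical stack sets per compose kind; the serverless entry assumes Datahub and Airflow are present, which the matrix implies but the "Key stacks" column does not list. `COMPOSE_STACKS`, `CAPABILITY_REQUIRES`, and `available_capabilities` are illustrative names.

```python
# Assumed stack membership per compose kind (illustrative, not authoritative).
COMPOSE_STACKS = {
    "aws/spark-platform": {
        "datahub", "airflow", "trino", "hive-metastore", "spark-operator", "ranger",
    },
    "aws/serverless-platform": {"datahub", "airflow", "jupyterhub"},
}

# Requirements copied from the capability table in this section.
CAPABILITY_REQUIRES = {
    "data-lineage": {"datahub", "airflow"},
    "pii-scanning": {"datahub", "spark-operator"},
    "catalog-ingestion": {"datahub", "hive-metastore", "trino"},
    "ranger-policies": {"ranger"},
}


def available_capabilities(compose_kind):
    """Capabilities whose required stacks are all present in the compose kind."""
    present = COMPOSE_STACKS[compose_kind]
    return sorted(
        capability
        for capability, required in CAPABILITY_REQUIRES.items()
        if required <= present  # subset test: every requirement is met
    )


print(available_capabilities("aws/serverless-platform"))  # ['data-lineage']
print(available_capabilities("aws/spark-platform"))
```

Neither function input mentions the other axis: the compose kind only supplies stacks, and the capability only states requirements.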
Compose as a Capability Declaration
The compose file is a declaration of which capabilities are enabled for an architecture variant, not a wiring diagram.
    # Before: wiring diagram
    members:
      - name: datahub
        stackTemplateSlug: aws/datahub
      - name: airflow
        stackTemplateSlug: aws/airflow
        dependsOn: [storages]
      # (no explicit data lineage; it was buried in datahub's post-deploy)
    # After: capability declaration
    members:
      - name: datahub
        stackTemplateSlug: aws/datahub
      - name: airflow
        stackTemplateSlug: aws/airflow
        dependsOn: [storages]
      - name: pipeline-lineage
        stackTemplateSlug: aws/pipeline-lineage
        optional: true
        enabled: true
        label: "Pipeline Lineage"
        description: "Tracks Airflow DAG runs as lineage events in Datahub."
        group: "Data Lineage"
        dependsOn: [datahub, airflow]
      - name: job-lineage
        stackTemplateSlug: aws/job-lineage
        optional: true
        enabled: true
        label: "Job Lineage"
        description: "Captures Spark job read/write lineage via OpenLineage."
        group: "Data Lineage"
        dependsOn: [datahub, spark-operator]
      - name: query-lineage
        stackTemplateSlug: aws/query-lineage
        optional: true
        enabled: false
        label: "Query Lineage"
        description: "Captures SQL column-level lineage from Trino via OpenLineage."
        group: "Data Lineage"
        dependsOn: [datahub, trino]
A compose kind that does not include datahub simply omits the Data Lineage members. Airflow is unmodified. Datahub is unmodified. The capability is absent because its dependencies are absent, so no conditional logic is required inside either app bundle.
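To illustrate how a control plane might read such a declaration, here is a minimal sketch that takes a subset of the "after" members as plain Python data and returns the optional members that are enabled and whose dependencies are present, grouped for display. The `active_capabilities` function and the choice to filter on missing dependencies at read time are assumptions; the document itself only says such members are omitted from the compose kind.

```python
from collections import defaultdict

# Subset of the "after" compose, written as plain Python data.
# spark-operator and trino are deliberately left out of this subset.
members = [
    {"name": "datahub"},
    {"name": "airflow"},
    {"name": "pipeline-lineage", "optional": True, "enabled": True,
     "group": "Data Lineage", "dependsOn": ["datahub", "airflow"]},
    {"name": "job-lineage", "optional": True, "enabled": True,
     "group": "Data Lineage", "dependsOn": ["datahub", "spark-operator"]},
    {"name": "query-lineage", "optional": True, "enabled": False,
     "group": "Data Lineage", "dependsOn": ["datahub", "trino"]},
]


def active_capabilities(members):
    """Group optional members that are enabled and have all dependencies present."""
    required = {m["name"] for m in members if not m.get("optional", False)}
    groups = defaultdict(list)
    for m in members:
        if not m.get("optional", False) or not m.get("enabled", False):
            continue  # required members and disabled toggles are skipped
        if set(m.get("dependsOn", [])) <= required:
            groups[m["group"]].append(m["name"])
    return dict(groups)


print(active_capabilities(members))  # {'Data Lineage': ['pipeline-lineage']}
```

Here `job-lineage` is enabled but inactive because `spark-operator` is missing from the subset, and `query-lineage` is inactive because its toggle is off. Neither case requires any change to the datahub or airflow members.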
Naming Conventions
Compose kinds — name by compute model, not technology stack
The technology inside the compose is an implementation detail. The name should describe the compute model the user is choosing.
| Do | Don't |
|---|---|
| aws/spark-platform | aws/delta-spark |
| aws/serverless-platform | aws/duckdb-serverless |
Layer 3 bundles — name by user-facing outcome
| Do | Don't |
|---|---|
| aws/data-lineage | aws/datahub-airflow-integration |
| aws/pii-scanning | aws/datahub-spark-integration |
| aws/query-federation | aws/trino-hive-integration |
| aws/ranger-policies | aws/ranger-post-deploy |
Summary
| Before | After |
|---|---|
| Integration logic inside app bundles | Integration logic in dedicated Layer 3 bundles |
| Compose = stack wiring diagram | Compose = architecture variant + enabled capabilities |
| One dimension: which stacks | Two dimensions: compute model × feature set |
| Operator configures connections | User picks a platform, enables features |
| Bundle named by stacks it connects | Bundle named by outcome it delivers |
| Seeds rerun on every app redeploy | Seeds isolated to their own lifecycle |