Capability-Centric Platform — Overview

The platform is a collection of data infrastructure stacks (Datahub, Airflow, Trino, JupyterHub, etc.). As the stack count grew, a structural problem emerged: the compose model required operators to think in terms of stack topology — which stacks exist, what they depend on, how to wire them together. This is the wrong abstraction for the intended user.

This document covers the problem, the decision, and the mental model that drives everything else in this ADR set. For what changes in platform-stacks see Bundle Structure. For what changes in the control plane see Control Plane.

The Problem

Integration logic lived inside app bundles

Cross-stack wiring was implemented as post-deploy phases inside the owning app's bundle:

aws/datahub post-deploy: seeds a PAT, registers Airflow as a lineage source, registers Trino as a data source
aws/ranger post-deploy: seeds default policies

This created two problems:

Redeploying the app reruns the seeds. The Datahub bundle had no way to distinguish "first install" from "upgrade". Seeds were protected only by Kubernetes Job semantics (ttlSecondsAfterFinished), an implicit guard that depends on Job cleanup timing rather than on any explicit record that seeding has already happened.
The owning app had to know about every app it connects to. The Datahub bundle contained Airflow-specific logic. If a workspace deployed Airflow without Datahub, or Datahub without Airflow, the bundle still carried dead configuration for the missing stack.

The compose was a wiring diagram, not a product surface

The compose file (compose/aws/delta-spark.yaml) was a list of stack slugs and dependency edges. Adding a new integration meant editing the compose YAML — an operator concern, not a user concern. There was no way to represent "this workspace uses data lineage" without knowing which two stacks data lineage connects.

The Decision

Users configure capabilities, not stacks. The platform is responsible for knowing which stacks a capability requires and how to wire them.

A user enabling "Data Lineage" does not need to know that this requires Datahub, Airflow, and a PAT seed job. They see a toggle. The platform handles the rest.

This is the same model VS Code uses for extensions: an extension (capability) has its own settings panel. The user enables it. The editor handles loading, dependency resolution, and lifecycle. The user never edits a wiring file.

Three-Layer Model

Every stack in the platform belongs to one of three layers:

Layer 1 — Infrastructure
  Shared cluster-level resources with no user-facing features.
  Examples: karpenter, spark-operator, observability, kafka

Layer 2 — Applications
  Self-contained services that expose a user-facing feature.
  No knowledge of other Layer 2 apps.
  Examples: datahub, airflow, jupyterhub, trino, superset

Layer 3 — Capabilities
  Cross-stack wiring that delivers a user-visible outcome.
  Depends on two or more Layer 2 apps being deployed.
  Contains no Helm releases — only Jobs, SparkApplications, API calls.
  Examples: data-lineage, pii-scanning, query-federation

Layer 2 apps are fully independent. A workspace can deploy Airflow without Datahub and vice versa. When both are present and the user enables "Data Lineage", the Layer 3 capability stack deploys and wires them.
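
As a rough illustration of the three layers, the sketch below annotates a compose fragment with the layer each member belongs to. It reuses the member fields from the compose example in "Compose as a Capability Declaration" (name, stackTemplateSlug, optional, enabled, dependsOn). The `aws/karpenter` slug and the layer comments are assumptions made for illustration, not part of the actual schema.

```yaml
# Hypothetical fragment: the layer comments are annotations only, not schema fields.
members:
  # Layer 1: infrastructure, no user-facing feature
  - name: karpenter
    stackTemplateSlug: aws/karpenter   # assumed slug

  # Layer 2: independent applications, unaware of each other
  - name: datahub
    stackTemplateSlug: aws/datahub
  - name: airflow
    stackTemplateSlug: aws/airflow

  # Layer 3: capability, deployed only when enabled and both apps are present
  - name: data-lineage
    stackTemplateSlug: aws/data-lineage
    optional: true
    enabled: true
    dependsOn: [datahub, airflow]
```

Removing the data-lineage member from this fragment would leave the datahub and airflow members unchanged, which is the independence property the model aims for.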

Capability Naming

Layer 3 bundles are named by the user-facing outcome, not by the stacks they connect.

Do	Don't
`aws/data-lineage`	`aws/datahub-airflow-integration`
`aws/pii-scanning`	`aws/datahub-spark-integration`
`aws/query-federation`	`aws/trino-hive-integration`
`aws/ranger-policies`	`aws/ranger-post-deploy`

This matters because the same capability may connect different stacks in different compose configurations. data-lineage might wire Datahub to Airflow in one workspace and Datahub to a different orchestrator in another. The name should survive that variation.
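
To make this concrete, the sketch below shows the same capability member in two hypothetical compose files, where only `dependsOn` differs. `other-orchestrator` is a placeholder rather than a real stack, and reusing the same `stackTemplateSlug` in both files is an assumption. Field names are borrowed from the compose example in "Compose as a Capability Declaration".

```yaml
# Compose A (hypothetical): lineage wired between Datahub and Airflow
members:
  - name: data-lineage
    stackTemplateSlug: aws/data-lineage
    optional: true
    dependsOn: [datahub, airflow]
---
# Compose B (hypothetical): same capability name, different orchestrator
members:
  - name: data-lineage
    stackTemplateSlug: aws/data-lineage
    optional: true
    dependsOn: [datahub, other-orchestrator]   # placeholder stack name
```

A name such as `aws/datahub-airflow-integration` would be misleading in Compose B, while `aws/data-lineage` describes both.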

Two Orthogonal Dimensions

A platform configuration is described by two independent axes. Confusing them is the root cause of poorly named compose files and bloated bundles.

Dimension 1 — Architecture Variant (the compute model)

Represented by the compose kind. Defines which Layer 1 and Layer 2 stacks are present. Different variants use fundamentally different compute engines and cannot be derived from each other by toggling optional members.

Compose kind	Compute model	Key stacks
`aws/spark-platform`	Distributed Spark, Delta Lake	Karpenter (large pools), Spark operator, Hive Metastore, Trino, S3
`aws/serverless-platform`	Serverless, embedded analytics	Karpenter (minimal), DuckDB/MotherDuck, S3, JupyterHub

You cannot reach a serverless platform by disabling optional members in the Spark platform — the underlying infrastructure is different. These are separate compose kinds.

Dimension 2 — Capabilities (what features are enabled)

Represented by optional members within a compose kind. Each optional member is a Layer 3 bundle that wires Layer 2 apps together. Any compose kind that includes the required Layer 2 apps can offer the same capabilities.

Capability	What it does	Required stacks
Data Lineage	Airflow pipeline lineage in Datahub	datahub + airflow
PII Scanning	Scheduled PII detection across datasets	datahub + spark-operator
Catalog Ingestion	Hive and Trino source registration in Datahub	datahub + hive-metastore + trino
Ranger Policies	Default authorisation policy bootstrap	ranger

Why these are orthogonal

The same capability (data-lineage) can exist in any compose kind that includes both Datahub and Airflow. The architecture variant determines which stacks are available; the capability layer determines which cross-stack wiring is active. Neither axis implies the other.

                                Capabilities (optional members)
                                ────────────────────────────────────────────────────────►
                                none           data-lineage   pii-scanning   full

Architecture    spark-platform  [ variant A ]  [ A + L ]      [ A + P ]      [ A + all ]
Variant         serverless      [ variant B ]  [ B + L ]      n/a            [ B + all ]
(compose kind)

pii-scanning is not available in serverless-platform because that variant has no Spark operator — the dependency is simply absent, and the optional member is omitted from that compose kind entirely.
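
A serverless compose might therefore look something like the sketch below. It assumes this variant includes Datahub and Airflow, as the [ B + L ] cell in the matrix implies. The `aws/karpenter` and `aws/jupyterhub` slugs are assumptions. Field names follow the compose example in "Compose as a Capability Declaration".

```yaml
# Hypothetical aws/serverless-platform compose fragment.
members:
  - name: karpenter
    stackTemplateSlug: aws/karpenter    # assumed slug
  - name: jupyterhub
    stackTemplateSlug: aws/jupyterhub   # assumed slug
  - name: datahub
    stackTemplateSlug: aws/datahub
  - name: airflow
    stackTemplateSlug: aws/airflow

  # data-lineage is offered: both of its dependencies are members.
  - name: data-lineage
    stackTemplateSlug: aws/data-lineage
    optional: true
    enabled: false
    dependsOn: [datahub, airflow]

  # No pii-scanning member: spark-operator is not part of this variant,
  # so the capability is left out rather than listed as disabled.
```

The design choice here is to leave the member out rather than list it as disabled, so a user never sees a toggle that cannot work in this variant.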

Compose as a Capability Declaration

The compose file is a declaration of which capabilities are enabled for an architecture variant, not a wiring diagram.

# Before: wiring diagram
members:
  - name: datahub
    stackTemplateSlug: aws/datahub
  - name: airflow
    stackTemplateSlug: aws/airflow
    dependsOn: [storages]
  # (no explicit data lineage — it was buried in datahub's post-deploy)

# After: capability declaration
members:
  - name: datahub
    stackTemplateSlug: aws/datahub
  - name: airflow
    stackTemplateSlug: aws/airflow
    dependsOn: [storages]
  - name: pipeline-lineage
    stackTemplateSlug: aws/pipeline-lineage
    optional: true
    enabled: true
    label: "Pipeline Lineage"
    description: "Tracks Airflow DAG runs as lineage events in Datahub."
    group: "Data Lineage"
    dependsOn: [datahub, airflow]

  - name: job-lineage
    stackTemplateSlug: aws/job-lineage
    optional: true
    enabled: true
    label: "Job Lineage"
    description: "Captures Spark job read/write lineage via OpenLineage."
    group: "Data Lineage"
    dependsOn: [datahub, spark-operator]

  - name: query-lineage
    stackTemplateSlug: aws/query-lineage
    optional: true
    enabled: false
    label: "Query Lineage"
    description: "Captures SQL column-level lineage from Trino via OpenLineage."
    group: "Data Lineage"
    dependsOn: [datahub, trino]

A compose kind that does not include datahub simply omits the Data Lineage members. Airflow is unmodified. Datahub is unmodified. The capability is absent because its dependencies are absent, so no conditional logic is required inside either app bundle.

Naming Conventions

Compose kinds — name by compute model, not technology stack

The technology inside the compose is an implementation detail. The name should describe the compute model the user is choosing.

Do	Don't
`aws/spark-platform`	`aws/delta-spark`
`aws/serverless-platform`	`aws/duckdb-serverless`

Layer 3 bundles — name by user-facing outcome

Do	Don't
`aws/data-lineage`	`aws/datahub-airflow-integration`
`aws/pii-scanning`	`aws/datahub-spark-integration`
`aws/query-federation`	`aws/trino-hive-integration`
`aws/ranger-policies`	`aws/ranger-post-deploy`

Summary

Before	After
Integration logic inside app bundles	Integration logic in dedicated Layer 3 bundles
Compose = stack wiring diagram	Compose = architecture variant + enabled capabilities
One dimension: which stacks	Two dimensions: compute model × feature set
Operator configures connections	User picks a platform, enables features
Bundle named by stacks it connects	Bundle named by outcome it delivers
Seeds rerun on every app redeploy	Seeds isolated to their own lifecycle
