Capability-Centric Platform — Product SKUs
A SKU is a specific combination of an architecture variant (compute model) and a set of enabled capabilities. Because the model has these two dimensions, SKUs are not hardcoded products: each one emerges from the variant a workspace runs on and the optional members that are toggled on.
Architecture Variants
spark-platform: full distributed analytics platform built on Spark, Delta Lake, Hive Metastore, and Trino. For teams running large-scale batch processing and SQL analytics. Replaces platform-delta-spark-aws.
Lightweight embedded analytics: DuckDB/MotherDuck on S3, with no Spark operator and no Hive Metastore. For teams that need interactive SQL without cluster-scale infrastructure. Status: planned.
Always included
| Member | Stack | Purpose |
|---|---|---|
| storages | aws/storages | S3 buckets |
| karpenter | aws/karpenter | Node autoscaling |
| kafka | aws/kafka | Event streaming |
| compute-profile | aws/compute-profiles | Node pool size presets |
| observability | aws/observability | Metrics, logs |
| spark-operator | aws/spark-operator | Spark execution |
| spark-team | aws/spark-team | Spark namespace + IRSA |
| hive-metastore | aws/hive-metastore | Table metadata |
| airflow | aws/airflow | Orchestration |
| jupyterhub | aws/jupyterhub | Notebooks |
| workspace-file-management | aws/workspace-file-management | File sync |
Optional — Applications (Layer 2)
| Member | Stack | Group | Default |
|---|---|---|---|
| trino | aws/trino | Query Engine | on |
| superset | aws/superset | BI & Dashboards | on |
| dashboard-access-management | aws/dashboard-access-management | BI & Dashboards | on |
| bff | aws/quantdata-bff | Platform API | on |
| ranger | aws/ranger | Access Control | off |
| datahub | aws/datahub | Data Catalog | off |
| mlflow | aws/mlflow | ML Tracking | off |
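
The default column above can be read as a starting state that a workspace then overrides. The following is a minimal sketch of that reading, not the platform's actual provisioning code. The names `LAYER2_DEFAULTS` and `resolve_layer2` are hypothetical; only the member names and their on/off defaults come from the table.

```python
# Default on/off state for Layer 2 optional members, copied from the table above.
LAYER2_DEFAULTS = {
    "trino": True,
    "superset": True,
    "dashboard-access-management": True,
    "bff": True,
    "ranger": False,
    "datahub": False,
    "mlflow": False,
}


def resolve_layer2(overrides=None):
    """Return the sorted names of enabled Layer 2 members.

    `overrides` is a hypothetical per-workspace mapping of member name to bool.
    """
    overrides = overrides or {}
    unknown = set(overrides) - set(LAYER2_DEFAULTS)
    if unknown:
        # Reject typos instead of silently ignoring them.
        raise ValueError(f"unknown Layer 2 members: {sorted(unknown)}")
    state = {**LAYER2_DEFAULTS, **overrides}
    return sorted(name for name, enabled in state.items() if enabled)
```

For example, `resolve_layer2({"mlflow": True, "superset": False})` keeps the other defaults and swaps superset for mlflow.
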
Optional — Capabilities (Layer 3)
| Label | Bundle slug | Requires | Default |
|---|---|---|---|
| Fine-Grained Access Control | aws/ranger-fgac-spark | ranger + trino | off |
| Pipeline Lineage (group: Data Lineage) | aws/pipeline-lineage | datahub + airflow | off |
| Job Lineage (group: Data Lineage) | aws/job-lineage | datahub + spark-operator | off |
| Query Lineage (group: Data Lineage) | aws/query-lineage | datahub + trino | off |
| PII Scanning | aws/pii-scanning | datahub + spark-operator | off |
| Data Catalog Ingestion | aws/data-catalog-ingestion | datahub + hive-metastore + trino | off |
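
The Requires column implies a check before a Layer 3 capability can be turned on: every listed member must already be enabled. A minimal sketch of such a check follows, assuming the requirements are held as plain sets. The names `CAPABILITY_REQUIRES` and `missing_requirements` are hypothetical; the slugs and requirements are copied from the table.

```python
# Bundle slug -> members that must be enabled first (from the Requires column).
CAPABILITY_REQUIRES = {
    "aws/ranger-fgac-spark": {"ranger", "trino"},
    "aws/pipeline-lineage": {"datahub", "airflow"},
    "aws/job-lineage": {"datahub", "spark-operator"},
    "aws/query-lineage": {"datahub", "trino"},
    "aws/pii-scanning": {"datahub", "spark-operator"},
    "aws/data-catalog-ingestion": {"datahub", "hive-metastore", "trino"},
}


def missing_requirements(bundle_slug, enabled_members):
    """Return the sorted list of required members that are not yet enabled."""
    required = CAPABILITY_REQUIRES[bundle_slug]
    return sorted(required - set(enabled_members))
```

An empty result means the capability can be enabled. A non-empty result names the members to toggle on first, for example `["datahub"]` when enabling `aws/query-lineage` on a workspace that has trino but no datahub.
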
Named SKU Groupings
These are not enforced tiers. They are named presets applied when provisioning a new workspace.
| SKU | Composition |
|---|---|
| Spark Core | always-included + trino + superset + bff |
| Spark Analytics | Spark Core + ranger + fine-grained-access-control |
| Spark Intelligence | Spark Analytics + datahub + pipeline-lineage + job-lineage + data-catalog-ingestion + pii-scanning |
| Spark ML | Spark Core + mlflow |
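
Since each preset is defined in terms of another one, a preset can be expanded into a flat member set by following its base. The sketch below shows one way to do that. `ALWAYS_INCLUDED`, `SKU_PRESETS`, and `expand_sku` are hypothetical names; the member lists are copied from the tables in this document.

```python
# Members every spark-platform workspace gets (from the "Always included" table).
ALWAYS_INCLUDED = frozenset({
    "storages", "karpenter", "kafka", "compute-profile", "observability",
    "spark-operator", "spark-team", "hive-metastore", "airflow",
    "jupyterhub", "workspace-file-management",
})

# SKU name -> (base SKU or None, members added on top of the base).
SKU_PRESETS = {
    "Spark Core": (None, {"trino", "superset", "bff"}),
    "Spark Analytics": ("Spark Core", {"ranger", "fine-grained-access-control"}),
    "Spark Intelligence": (
        "Spark Analytics",
        {"datahub", "pipeline-lineage", "job-lineage",
         "data-catalog-ingestion", "pii-scanning"},
    ),
    "Spark ML": ("Spark Core", {"mlflow"}),
}


def expand_sku(name):
    """Return the full set of members a named preset turns on."""
    base, additions = SKU_PRESETS[name]
    # A preset without a base starts from the always-included members.
    members = set(ALWAYS_INCLUDED) if base is None else expand_sku(base)
    return members | additions
```

Note that Spark ML branches from Spark Core, so `expand_sku("Spark ML")` contains mlflow but not ranger or datahub.
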
Lightweight embedded analytics variant (planned)
| Member | Notes |
|---|---|
| storages, karpenter (minimal), observability | Same as spark-platform |
| airflow | Lighter DAGs, no Spark submit |
| jupyterhub | DuckDB kernel |
| duckdb / motherduck | Embedded SQL engine |
| Fine-Grained Access Control | Different implementation than Ranger — TBD |
The FGAC capability label is the same as on spark-platform. Only the bundle slug and implementation differ, so the user sees a single toggle on either variant.
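
One way to picture "one label, different bundle per variant" is a lookup keyed by label and variant. This is only a sketch under stated assumptions: `BUNDLE_BY_VARIANT`, `resolve_bundle`, and the placeholder `LIGHTWEIGHT_VARIANT` are hypothetical, and the lightweight variant's real identifier and FGAC bundle slug are not given in this document, so the slug stays unset.

```python
FGAC_LABEL = "Fine-Grained Access Control"

# Hypothetical placeholder: the lightweight variant's real name is not specified here.
LIGHTWEIGHT_VARIANT = "lightweight-variant"

# (capability label, architecture variant) -> bundle slug.
# None marks an implementation the document lists as TBD.
BUNDLE_BY_VARIANT = {
    (FGAC_LABEL, "aws/spark-platform"): "aws/ranger-fgac-spark",
    (FGAC_LABEL, LIGHTWEIGHT_VARIANT): None,
}


def resolve_bundle(label, variant):
    """Map a user-facing capability label to the bundle slug for a variant."""
    key = (label, variant)
    if key not in BUNDLE_BY_VARIANT:
        raise KeyError(f"{label!r} is not offered on {variant!r}")
    slug = BUNDLE_BY_VARIANT[key]
    if slug is None:
        raise NotImplementedError(f"{label!r} on {variant!r} is still TBD")
    return slug
```

The toggle shown to the user only carries the label. The variant of the workspace decides which bundle is deployed.
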
Naming Reference
| Thing | Named by | Example |
|---|---|---|
| Architecture variant | Compute model | aws/spark-platform |
| Optional member label | User-facing capability | "Fine-Grained Access Control" |
| Layer 3 bundle slug | Implementation + context | aws/ranger-fgac-spark |
| SKU | Variant + capability preset | Spark Intelligence |