
Capability-Centric Platform — Product SKUs

A SKU is a specific combination of an architecture variant (compute model) and a set of enabled capabilities. The two-dimension model means SKUs are not hardcoded products — they emerge from which variant a workspace is on and which optional members are toggled on.
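As a rough illustration of the two-dimension model, the sketch below treats a SKU as nothing more than a (variant, enabled members) pair. The `Workspace` class and `effective_sku` function are hypothetical names introduced for this example and are not part of the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    """Hypothetical model: one architecture variant plus toggled optional members."""
    variant: str                        # e.g. "aws/spark-platform"
    enabled: frozenset = frozenset()    # optional members currently toggled on


def effective_sku(ws: Workspace) -> tuple:
    """The SKU is derived from the workspace state, not stored as a product ID."""
    return (ws.variant, ws.enabled)


# Two workspaces on the same variant differ only in their toggles.
core = Workspace("aws/spark-platform", frozenset({"trino", "superset", "bff"}))
ml = Workspace("aws/spark-platform", frozenset({"trino", "superset", "bff", "mlflow"}))
```

Under this assumption, toggling `mlflow` on is the only thing that separates the two workspaces; no separate product definition is involved.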


Architecture Variants

aws/spark-platform

Full distributed analytics platform. Spark, Delta Lake, Hive Metastore, Trino. For teams running large-scale batch processing and SQL analytics across large datasets. Replaces platform-delta-spark-aws.

aws/serverless-platform

Lightweight embedded analytics. No Spark operator, no Hive Metastore. DuckDB/MotherDuck + S3. For teams that need interactive SQL without cluster-scale infrastructure. Status: planned.


aws/spark-platform — Member Breakdown

Always included

| Member | Stack | Purpose |
| --- | --- | --- |
| storages | aws/storages | S3 buckets |
| karpenter | aws/karpenter | Node autoscaling |
| kafka | aws/kafka | Event streaming |
| compute-profile | aws/compute-profiles | Node pool size presets |
| observability | aws/observability | Metrics, logs |
| spark-operator | aws/spark-operator | Spark execution |
| spark-team | aws/spark-team | Spark namespace + IRSA |
| hive-metastore | aws/hive-metastore | Table metadata |
| airflow | aws/airflow | Orchestration |
| jupyterhub | aws/jupyterhub | Notebooks |
| workspace-file-management | aws/workspace-file-management | File sync |

Optional — Applications (Layer 2)

| Member | Stack | Group | Default |
| --- | --- | --- | --- |
| trino | aws/trino | Query Engine | on |
| superset | aws/superset | BI & Dashboards | on |
| dashboard-access-management | aws/dashboard-access-management | BI & Dashboards | on |
| bff | aws/quantdata-bff | Platform API | on |
| ranger | aws/ranger | Access Control | off |
| datahub | aws/datahub | Data Catalog | off |
| mlflow | aws/mlflow | ML Tracking | off |
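One way to read the Default column is as a starting state that per-workspace toggles override. The following is a minimal sketch under that assumption; the defaults are copied from the table, while `resolve_layer2` and the override format are hypothetical.

```python
from typing import Dict, Optional, Set

# Defaults copied from the Layer 2 table; True means "on".
LAYER2_DEFAULTS: Dict[str, bool] = {
    "trino": True,
    "superset": True,
    "dashboard-access-management": True,
    "bff": True,
    "ranger": False,
    "datahub": False,
    "mlflow": False,
}


def resolve_layer2(overrides: Optional[Dict[str, bool]] = None) -> Set[str]:
    """Return the Layer 2 members enabled after applying workspace overrides."""
    overrides = overrides or {}
    unknown = set(overrides) - set(LAYER2_DEFAULTS)
    if unknown:
        # Reject typos rather than silently ignoring them.
        raise KeyError(f"unknown Layer 2 members: {sorted(unknown)}")
    merged = {**LAYER2_DEFAULTS, **overrides}
    return {name for name, on in merged.items() if on}
```

Calling `resolve_layer2({"mlflow": True})` would add `mlflow` to the four members that are on by default.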

Optional — Capabilities (Layer 3)

| Label | Bundle slug | Requires | Default |
| --- | --- | --- | --- |
| Fine-Grained Access Control | aws/ranger-fgac-spark | ranger + trino | off |
| Pipeline Lineage (group: Data Lineage) | aws/pipeline-lineage | datahub + airflow | off |
| Job Lineage (group: Data Lineage) | aws/job-lineage | datahub + spark-operator | off |
| Query Lineage (group: Data Lineage) | aws/query-lineage | datahub + trino | off |
| PII Scanning | aws/pii-scanning | datahub + spark-operator | off |
| Data Catalog Ingestion | aws/data-catalog-ingestion | datahub + hive-metastore + trino | off |
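The Requires column implies a dependency check before a Layer 3 bundle can be switched on. The sketch below shows one possible form of that check; the requirement sets and always-included members are copied from the tables above, while `missing_requirements` is a hypothetical helper and the platform's actual validation may differ.

```python
# Members that every aws/spark-platform workspace has (from "Always included").
ALWAYS_INCLUDED = {
    "storages", "karpenter", "kafka", "compute-profile", "observability",
    "spark-operator", "spark-team", "hive-metastore", "airflow",
    "jupyterhub", "workspace-file-management",
}

# Bundle slug -> members it requires (from the Layer 3 table).
REQUIRES = {
    "aws/ranger-fgac-spark": {"ranger", "trino"},
    "aws/pipeline-lineage": {"datahub", "airflow"},
    "aws/job-lineage": {"datahub", "spark-operator"},
    "aws/query-lineage": {"datahub", "trino"},
    "aws/pii-scanning": {"datahub", "spark-operator"},
    "aws/data-catalog-ingestion": {"datahub", "hive-metastore", "trino"},
}


def missing_requirements(bundle: str, enabled_optional: set) -> set:
    """Return the required members that are not yet available to the workspace."""
    available = ALWAYS_INCLUDED | set(enabled_optional)
    return REQUIRES[bundle] - available
```

For example, with only `trino` toggled on, `missing_requirements("aws/job-lineage", {"trino"})` reports `datahub` as missing, because `spark-operator` is always included.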

Named SKU Groupings

Not enforced tiers — named presets used when provisioning a new workspace.

| SKU | Adds to previous |
| --- | --- |
| Spark Core | always-included + trino + superset + bff |
| Spark Analytics | Spark Core + ranger + fine-grained-access-control |
| Spark Intelligence | Spark Analytics + datahub + pipeline-lineage + job-lineage + data-catalog-ingestion + pii-scanning |
| Spark ML | Spark Core + mlflow |
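Because each preset is defined in terms of another, the full contents of a SKU can be obtained by expanding the references recursively. This is a minimal sketch of that expansion; the preset contents come from the table, while the `PRESETS` layout and the `expand` function are assumptions made for illustration.

```python
# Each preset lists either member/capability names or another preset name.
PRESETS = {
    "Spark Core": ["always-included", "trino", "superset", "bff"],
    "Spark Analytics": ["Spark Core", "ranger", "fine-grained-access-control"],
    "Spark Intelligence": [
        "Spark Analytics", "datahub", "pipeline-lineage", "job-lineage",
        "data-catalog-ingestion", "pii-scanning",
    ],
    "Spark ML": ["Spark Core", "mlflow"],
}


def expand(sku: str) -> set:
    """Flatten a preset into the full set of entries it enables."""
    items = set()
    for entry in PRESETS[sku]:
        if entry in PRESETS:
            items |= expand(entry)   # entry is another preset: recurse
        else:
            items.add(entry)         # entry is a member or capability
    return items
```

Here `expand("Spark ML")` yields the Spark Core entries plus `mlflow`, and does not pick up anything from Spark Analytics or Spark Intelligence, since Spark ML branches directly from Spark Core.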

aws/serverless-platform — Sketch

| Member | Notes |
| --- | --- |
| storages, karpenter (minimal), observability | Same as spark-platform |
| airflow | Lighter DAGs, no Spark submit |
| jupyterhub | DuckDB kernel |
| duckdb / motherduck | Embedded SQL engine |
| Fine-Grained Access Control | Different implementation than Ranger; TBD |

The FGAC capability label is identical to spark-platform. The bundle slug and implementation differ. The user sees one toggle either way.
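The "one label, different bundle per variant" idea can be pictured as a lookup keyed by variant and label. In the hypothetical sketch below, the serverless entry is left as `None` because the source marks that implementation as TBD; `BUNDLE_SLUGS` and `bundle_for` are illustrative names only.

```python
from typing import Optional

FGAC = "Fine-Grained Access Control"

# (variant, user-facing label) -> Layer 3 bundle slug.
BUNDLE_SLUGS = {
    ("aws/spark-platform", FGAC): "aws/ranger-fgac-spark",
    ("aws/serverless-platform", FGAC): None,  # implementation TBD in the source
}


def bundle_for(variant: str, label: str) -> Optional[str]:
    """Resolve the same user-facing toggle to a variant-specific bundle."""
    return BUNDLE_SLUGS[(variant, label)]
```

The user-facing toggle stays the same string in both cases; only the resolved slug changes with the variant.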


Naming Reference

| Thing | Named by | Example |
| --- | --- | --- |
| Architecture variant | Compute model | aws/spark-platform |
| Optional member label | User-facing capability | "Fine-Grained Access Control" |
| Layer 3 bundle slug | Implementation + context | aws/ranger-fgac-spark |
| SKU | Variant + capability preset | Spark Intelligence |