Storages

The storages bundle provisions the three foundational S3 buckets that are shared across the tenant cluster. All other stacks that need object storage reference these buckets via dependency outputs rather than creating their own.

Buckets

| Bucket | Name pattern | Purpose |
| --- | --- | --- |
| Default warehouse | {platform_id}-default-wh | Primary data warehouse: holds Delta Lake table data written by Trino and Spark jobs |
| Workspace | {platform_id}-workspace | User file storage: JupyterHub notebooks, uploaded files, rclone-synced workspace content |
| Log archive | {platform_id}-log-archive | Long-term log retention: Airflow task logs, audit logs, and other archived output |
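
The name patterns above can be expanded for a given platform ID with a few lines of Python. This is a minimal sketch for illustration only: the function name `bucket_names`, the dictionary keys, and the example ID `acme` are assumptions, not part of the storages bundle.

```python
# Suffixes taken from the name patterns in the table above.
# The keys are illustrative labels, not identifiers used by the bundle.
BUCKET_SUFFIXES = {
    "default_warehouse": "default-wh",
    "workspace": "workspace",
    "log_archive": "log-archive",
}


def bucket_names(platform_id: str) -> dict[str, str]:
    """Expand {platform_id}-<suffix> for each of the three shared buckets."""
    if not platform_id:
        raise ValueError("platform_id must be a non-empty string")
    return {key: f"{platform_id}-{suffix}" for key, suffix in BUCKET_SUFFIXES.items()}


if __name__ == "__main__":
    # "acme" is a hypothetical platform ID.
    for key, name in bucket_names("acme").items():
        print(f"{key}: {name}")
```

Keeping the suffixes in one mapping means a consumer only needs the platform ID to locate all three buckets.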

All three buckets are created with skipDeletion: true, so they are not destroyed when the stack is torn down. This protects their data from accidental deletion during redeployments.
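
The effect of this flag can be modelled roughly as follows. This is a sketch of the behaviour described above, not the deployment tool's actual logic: the `Bucket` class, the `plan_teardown` function, and the `acme-scratch` bucket are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    name: str
    # Mirrors the skipDeletion: true setting described above (assumed model).
    skip_deletion: bool = True


def plan_teardown(buckets: list[Bucket]) -> tuple[list[str], list[str]]:
    """Split bucket names into (to_delete, to_retain) for a stack teardown."""
    to_delete = [b.name for b in buckets if not b.skip_deletion]
    to_retain = [b.name for b in buckets if b.skip_deletion]
    return to_delete, to_retain


if __name__ == "__main__":
    buckets = [
        Bucket("acme-default-wh"),
        Bucket("acme-workspace"),
        Bucket("acme-log-archive"),
        # Hypothetical bucket without protection, for contrast only.
        Bucket("acme-scratch", skip_deletion=False),
    ]
    to_delete, to_retain = plan_teardown(buckets)
    print("delete:", to_delete)
    print("retain:", to_retain)
```

In this model a teardown leaves every protected bucket in place, so a later redeployment finds the existing data intact.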

Who Uses Each Bucket

| Stack | Bucket |
| --- | --- |
| Hive Metastore | Default warehouse (table data storage) |
| Trino | Default warehouse (query results, exchange spill) |
| Spark jobs | Default warehouse (Delta tables), workspace |
| Airflow | Log archive (remote task logs) |
| JupyterHub | Workspace (user file sync via rclone) |
| Observability | Log archive (long-term metrics and log retention) |
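
The same mapping can be inverted to see which stacks are affected by a change to a given bucket. The sketch below transcribes the table into a dictionary; the `CONSUMERS` constant, the bucket keys, and the `stacks_using` helper are illustrative names, not part of the platform.

```python
# Stack -> buckets, transcribed from the table above.
# Bucket keys are illustrative labels for the three shared buckets.
CONSUMERS = {
    "Hive Metastore": ["default_warehouse"],
    "Trino": ["default_warehouse"],
    "Spark jobs": ["default_warehouse", "workspace"],
    "Airflow": ["log_archive"],
    "JupyterHub": ["workspace"],
    "Observability": ["log_archive"],
}


def stacks_using(bucket: str) -> list[str]:
    """Return the stacks that reference the given bucket, sorted by name."""
    return sorted(stack for stack, buckets in CONSUMERS.items() if bucket in buckets)


if __name__ == "__main__":
    for bucket in ("default_warehouse", "workspace", "log_archive"):
        print(bucket, "->", ", ".join(stacks_using(bucket)))
```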

Go Deeper

  • Hive Metastore — uses the warehouse bucket as its table data store
  • Airflow — writes task logs to the log archive bucket
  • JupyterHub — syncs user files to the workspace bucket