Storages
The storages bundle provisions the three foundational S3 buckets that are shared across the tenant cluster. All other stacks that need object storage reference these buckets via dependency outputs rather than creating their own.
Buckets
| Bucket | Name pattern | Purpose |
|---|---|---|
| Default warehouse | {platform_id}-default-wh | Primary data warehouse — holds Delta Lake table data written by Trino and Spark jobs |
| Workspace | {platform_id}-workspace | User file storage — JupyterHub notebooks, uploaded files, rclone-synced workspace content |
| Log archive | {platform_id}-log-archive | Long-term log retention — Airflow task logs, audit logs, and other archived output |
All three buckets are created with skipDeletion: true, so they are not destroyed when the stack is torn down. This protects their data from accidental deletion during redeployments.
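
The name patterns in the table above can be expanded with a small helper. This is a minimal sketch, not the bundle's actual code: the `bucket_names` function, the `StorageBuckets` type, and the dictionary keys are hypothetical names chosen for illustration. Only the suffixes come from the table.

```python
from dataclasses import dataclass

# Suffixes taken from the "Name pattern" column of the table above.
# The dictionary keys are illustrative, not identifiers from the bundle.
BUCKET_SUFFIXES = {
    "default_warehouse": "default-wh",
    "workspace": "workspace",
    "log_archive": "log-archive",
}


@dataclass(frozen=True)
class StorageBuckets:
    """Resolved bucket names for one platform (hypothetical helper type)."""

    default_warehouse: str
    workspace: str
    log_archive: str


def bucket_names(platform_id: str) -> StorageBuckets:
    """Expand the {platform_id}-<suffix> patterns for a given platform ID."""
    if not platform_id:
        raise ValueError("platform_id must be non-empty")
    return StorageBuckets(
        **{key: f"{platform_id}-{suffix}" for key, suffix in BUCKET_SUFFIXES.items()}
    )
```

For a platform ID of `acme` (an example value), `bucket_names("acme").workspace` evaluates to `acme-workspace`.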
Who Uses Each Bucket
| Stack | Bucket |
|---|---|
| Hive Metastore | Default warehouse (table data storage) |
| Trino | Default warehouse (query results, exchange spill) |
| Spark jobs | Default warehouse (Delta tables), workspace |
| Airflow | Log archive (remote task logs) |
| JupyterHub | Workspace (user file sync via rclone) |
| Observability | Log archive (long-term metrics and log retention) |
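
The table above is keyed by stack. When planning a change to a bucket, the reverse view (which stacks depend on it) is often more useful. The sketch below encodes the table as plain data and inverts it. The `STACK_BUCKETS` mapping, the bucket keys, and the `consumers_of` function are hypothetical names used only for illustration.

```python
# Stack-to-bucket usage, transcribed from the table above.
# Bucket keys are illustrative shorthand, not identifiers from the bundle.
STACK_BUCKETS = {
    "Hive Metastore": ["default_warehouse"],
    "Trino": ["default_warehouse"],
    "Spark jobs": ["default_warehouse", "workspace"],
    "Airflow": ["log_archive"],
    "JupyterHub": ["workspace"],
    "Observability": ["log_archive"],
}


def consumers_of(bucket: str) -> list[str]:
    """Return the stacks that use the given bucket, sorted by name."""
    return sorted(
        stack for stack, buckets in STACK_BUCKETS.items() if bucket in buckets
    )
```

For example, `consumers_of("workspace")` returns the JupyterHub and Spark jobs stacks, the two that the table lists against the workspace bucket.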
Go Deeper
- Hive Metastore — uses the warehouse bucket as its table data store
- Airflow — writes task logs to the log archive bucket
- JupyterHub — syncs user files to the workspace bucket