Trino
Trino is the SQL query engine for the tenant cluster. It executes queries against Delta Lake tables stored in S3, using Hive Metastore for table definitions, and enforces access policies through the Ranger plugin at query time.
Two-Bundle Architecture
Trino is split across two bundles in the compose:
| Bundle | What it deploys |
|---|---|
| aws/trino (Layer 2) | Trino Gateway, exchange S3 bucket, Keycloak OAuth client, namespace |
| aws/trino-cluster (Layer 3) | The actual Trino coordinator and worker pods |
This separation lets multiple Trino clusters exist behind a single gateway — the gateway routes SQL traffic to the appropriate cluster based on routing rules.
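To make the idea of rule-based routing concrete, here is a minimal Python sketch of first-match routing over request headers. It is a toy model, not the gateway's implementation: the `RoutingRule` type, the `pick_cluster` function, and the cluster names (`trino-interactive`, `trino-batch`, `trino-default`) are all hypothetical. Only the `X-Trino-Source` header is a real Trino client header.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class RoutingRule:
    """Hypothetical rule shape: a predicate over headers plus a target cluster."""
    name: str
    matches: Callable[[Mapping[str, str]], bool]
    cluster: str


def pick_cluster(headers: Mapping[str, str],
                 rules: Sequence[RoutingRule],
                 default: str) -> str:
    """Return the cluster of the first matching rule, else the default."""
    # HTTP header names are case-insensitive, so normalize before matching.
    normalized = {k.lower(): v for k, v in headers.items()}
    for rule in rules:
        if rule.matches(normalized):
            return rule.cluster
    return default


# Illustrative rules only; cluster names are assumptions.
RULES = [
    RoutingRule("superset",
                lambda h: h.get("x-trino-source", "").startswith("superset"),
                "trino-interactive"),
    RoutingRule("etl",
                lambda h: h.get("x-trino-source") == "etl",
                "trino-batch"),
]
```

Under these assumptions, `pick_cluster({"X-Trino-Source": "superset-sqllab"}, RULES, "trino-default")` selects the interactive cluster, and a request that matches no rule falls through to the default.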
Trino Gateway
The Trino Gateway is the entry point for all SQL connections from Superset, SQL Lab, and other clients. It handles:
- Authentication: OAuth2 via Keycloak. Clients authenticate through the gateway; the exchanged JWT is forwarded to the backend Trino cluster.
- Routing: Traffic is routed to backend clusters based on routing rules. An oauth2-handler routing rule handles OAuth redirect flows.
- TLS: The gateway exposes HTTPS (port 8443) externally through an ALB; internally it also speaks HTTP (port 8080).
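As an illustration of what a client request to the gateway might look like once a token has been obtained, the sketch below builds (but does not send) a statement submission using only the Python standard library. The hostname `trino-gateway.example.com` and the helper name `build_statement_request` are assumptions. The `/v1/statement` path is Trino's statement submission endpoint, and the token is presented as a standard bearer credential.

```python
import urllib.request

# Hypothetical external address; 8443 is the gateway's HTTPS port.
GATEWAY_URL = "https://trino-gateway.example.com:8443"


def build_statement_request(sql: str, access_token: str) -> urllib.request.Request:
    """Build a POST to the statement endpoint carrying an OAuth2 bearer token."""
    return urllib.request.Request(
        url=f"{GATEWAY_URL}/v1/statement",
        data=sql.encode("utf-8"),  # the SQL text is the request body
        method="POST",
        headers={"Authorization": f"Bearer {access_token}"},
    )


# Example: the token value here is a placeholder, not a real JWT.
request = build_statement_request("SELECT 1", "example-token")
```

In practice most clients (Superset, JDBC, the Trino CLI) handle the OAuth flow and request construction themselves; the sketch only shows the shape of the request that reaches the gateway.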
The gateway has its own PostgreSQL database (via KubeBlocks) for storing cluster registration and routing state.
Exchange Storage
An S3 bucket is provisioned for Trino's exchange spill. When a query exceeds in-memory limits, intermediate data is spilled to this bucket. The bucket is scoped to the workspace.
Ranger Integration
Trino loads the Ranger system access control plugin at startup. Every query is checked against Ranger before execution — the user identity is taken from the X-Trino-User header (propagated by Superset) or from the authenticated OAuth principal.
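The identity lookup described above can be sketched as a small function. This is an illustrative model only: the function name `resolve_query_user` is hypothetical, and the precedence shown (header first, then the OAuth principal) is an assumption, since the text does not state which source wins when both are present.

```python
from typing import Mapping, Optional


def resolve_query_user(headers: Mapping[str, str],
                       oauth_principal: Optional[str]) -> str:
    """Pick the user identity to check against policy.

    Assumed order: X-Trino-User header, then the authenticated principal.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    header_user = normalized.get("x-trino-user", "").strip()
    if header_user:
        return header_user
    if oauth_principal:
        return oauth_principal
    # No identity at all: refuse rather than evaluate policy for an unknown user.
    raise PermissionError("no user identity available for policy check")
```

With this sketch, a Superset request carrying `X-Trino-User: alice` resolves to `alice`, while a direct OAuth client with no such header resolves to its authenticated principal.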
See SQL Auth: Superset, Trino & Ranger for the full enforcement flow.
Go Deeper
- Hive Metastore — the catalog backend Trino reads schema definitions from
- Ranger — policy enforcement at query time
- SQL Auth: Superset, Trino & Ranger — end-to-end auth flow