Trino

Trino is the SQL query engine for the tenant cluster. It executes queries against Delta Lake tables stored in S3, using Hive Metastore for table definitions, and enforces access policies through the Ranger plugin at query time.

Two-Bundle Architecture

Trino is split across two bundles in the compose:

| Bundle | What it deploys |
| --- | --- |
| aws/trino (Layer 2) | Trino Gateway, exchange S3 bucket, Keycloak OAuth client, namespace |
| aws/trino-cluster (Layer 3) | The actual Trino coordinator and worker pods |

This separation lets multiple Trino clusters exist behind a single gateway — the gateway routes SQL traffic to the appropriate cluster based on routing rules.
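To make the idea of rule-based routing concrete, here is a minimal conceptual sketch in Python. It is not the gateway's actual rule engine or rule syntax: the `RoutingRule` type, the `pick_cluster` function, the cluster names, and the choice of the `X-Trino-Source` header as the routing signal are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class RoutingRule:
    """Hypothetical rule: if `matches` accepts the request headers, use `cluster`."""
    name: str
    matches: Callable[[Mapping[str, str]], bool]
    cluster: str


def pick_cluster(
    headers: Mapping[str, str],
    rules: Sequence[RoutingRule],
    default: str,
) -> str:
    """Return the cluster of the first matching rule, else the default cluster."""
    for rule in rules:
        if rule.matches(headers):
            return rule.cluster
    return default


# Illustrative rules only; the cluster names are made up for this sketch.
RULES = [
    RoutingRule(
        name="superset-to-interactive",
        matches=lambda h: h.get("X-Trino-Source", "").startswith("superset"),
        cluster="trino-interactive",
    ),
    RoutingRule(
        name="etl-to-batch",
        matches=lambda h: h.get("X-Trino-Source", "") == "etl",
        cluster="trino-batch",
    ),
]
```

With these rules, a request carrying `X-Trino-Source: superset-sqllab` would go to `trino-interactive`, and a request with no matching header would fall through to whatever default cluster is passed in.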

Trino Gateway

The Trino Gateway is the entry point for all SQL connections from Superset, SQL Lab, and other clients. It handles:

  • Authentication — OAuth2 via Keycloak. Clients authenticate through the gateway; the exchanged JWT is forwarded to the backend Trino cluster.
  • Routing — Traffic is routed to backend clusters based on routing rules. An oauth2-handler routing rule handles OAuth redirect flows.
  • TLS — Gateway exposes HTTPS (port 8443) externally with an ALB; internally it also speaks HTTP (port 8080).
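From a client's point of view, the points above amount to sending SQL to the gateway over HTTPS with an OAuth token attached. The sketch below only builds such a request with the Python standard library and does not send it. It assumes the gateway accepts an already obtained Keycloak access token as a bearer token; the host name and the `build_statement_request` helper are hypothetical. The `/v1/statement` path is the endpoint of Trino's client REST protocol.

```python
import urllib.request


def build_statement_request(
    gateway_host: str,
    sql: str,
    access_token: str,
    port: int = 8443,
) -> urllib.request.Request:
    """Build (but do not send) a query submission addressed to the gateway."""
    url = f"https://{gateway_host}:{port}/v1/statement"
    return urllib.request.Request(
        url,
        data=sql.encode("utf-8"),  # the SQL text is the request body
        method="POST",
        headers={
            # Assumption: the token was obtained from Keycloak beforehand.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )


# Hypothetical host name, for illustration only.
request = build_statement_request(
    "trino-gateway.example.internal", "SELECT 1", "example-token"
)
```

In practice a Trino client library or JDBC driver would handle the OAuth flow and the follow-up polling of result pages; the sketch only shows where the port, scheme, and token fit.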

The gateway has its own PostgreSQL database (via KubeBlocks) for storing cluster registration and routing state.

Exchange Storage

An S3 bucket is provisioned for Trino's exchange spill. When a query exceeds in-memory limits, intermediate data is spilled to this bucket. The bucket is scoped to the workspace.
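As a rough illustration of how a cluster could be pointed at this bucket, the sketch below renders an `exchange-manager.properties` file for Trino's filesystem exchange manager. This assumes the bucket backs that exchange manager, which the bundle does not confirm here; the helper name, bucket name, and region are placeholders, and the real bundle may set additional properties.

```python
def render_exchange_manager_properties(bucket: str, region: str) -> str:
    """Render a minimal exchange-manager.properties for an S3-backed exchange."""
    props = {
        "exchange-manager.name": "filesystem",
        "exchange.base-directories": f"s3://{bucket}",
        "exchange.s3.region": region,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Placeholder bucket and region, for illustration only.
print(render_exchange_manager_properties("example-workspace-exchange", "us-east-1"))
```

In upstream Trino, the exchange manager is normally paired with a fault-tolerant execution setting such as `retry-policy=TASK` in `config.properties`; whether this deployment enables that is not stated here.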

Ranger Integration

Trino loads the Ranger system access control plugin at startup. Every query is checked against Ranger before execution — the user identity is taken from the X-Trino-User header (propagated by Superset) or from the authenticated OAuth principal.
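The identity selection described above can be sketched as a small function. This is a conceptual model, not Trino or Ranger code: the `resolve_query_user` name is hypothetical, and the precedence shown (header first, then OAuth principal) is an assumption, since the text names both sources without stating which one wins.

```python
from typing import Mapping, Optional


def resolve_query_user(
    headers: Mapping[str, str],
    oauth_principal: Optional[str],
) -> str:
    """Pick the user name that the access check would be evaluated for."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    header_user = normalized.get("x-trino-user", "").strip()
    if header_user:
        # Assumed precedence: an explicit X-Trino-User (e.g. set by Superset) wins.
        return header_user
    if oauth_principal:
        return oauth_principal
    raise PermissionError("no user identity available for the access check")
```

For example, a Superset request carrying `X-Trino-User: alice` would be checked as `alice`, while a direct OAuth connection without that header would be checked as the authenticated principal.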

See SQL Auth: Superset, Trino & Ranger for the full enforcement flow.

Go Deeper