Metadata Authentication & Authorization: DataHub, Trino, and Ranger

This page covers how a user's identity flows when performing metadata operations (adding tags, updating descriptions, reviewing PII) that require Trino to introspect table schemas, and how Ranger enforces access at query time.

For a higher-level overview of the token exchange model, see Token Exchange and Data Access.


Routes Covered

These BFF API routes trigger a Trino schema introspection query as part of their execution:

Method  Route                                Purpose
POST    /metadata/column/tags/add            Add tag to a column
POST    /metadata/column/tags/remove         Remove tag from a column
POST    /metadata/table/tags/add             Add tag to a table
POST    /metadata/table/tags/remove          Remove tag from a table
POST    /metadata/column/pii/review          Approve or reject a column PII review
POST    /metadata/column/description/update  Update a column description
POST    /metadata/table/description/update   Update a table description

All routes are defined in src/routes/datahubRoute.js and protected by the validateJWT middleware.


End-to-End Flow


Step 1: User Token Extraction — BFF

When the user submits a metadata action, the BFF receives the request with the user's Keycloak JWT in the Authorization header.

The route handler extracts the raw token directly:

const userToken = req.headers['authorization']?.split(' ')[1];

This token is passed to datahubService.ensureDatasetSchema(), which calls trinoOauthClient.executeQuery({ sql, userToken }).

Unlike other flows, there is no separate exchangeTokenForBackendMiddleware applied at the route level. The token exchange to Trino happens inside the OAuth client itself.
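The one-line extraction above returns whatever follows the first space in the header, whatever the auth scheme. A stricter variant is sketched below. The helper name extractBearerToken is hypothetical and does not come from the codebase; it only illustrates rejecting non-Bearer headers before the token reaches ensureDatasetSchema().

```javascript
// Hypothetical helper (not from the codebase): return the bearer token,
// or null when the header is missing or uses a different scheme.
function extractBearerToken(req) {
  const header = req.headers['authorization'];
  if (typeof header !== 'string') return null;

  // Accept "Bearer <token>" with a case-insensitive scheme name.
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? match[1] : null;
}
```

A handler using this sketch could respond with 401 when the result is null instead of passing undefined downstream.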


Step 2: Token Exchange — BFF to Trino

Inside trinoOauthClient.js, the user token is exchanged for a Trino-scoped JWT before the query is submitted:

  • Audience: ${extWorkspaceId}-${config.trino.gatewayAudience} (e.g. w-abc123-trino-gw)
  • The exchanged token is used as the Authorization: Bearer header on Trino REST API calls
  • The username is extracted from preferred_username or sub in the decoded JWT and set as X-Trino-User
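The three points above can be sketched as follows. This is an illustration, not the contents of trinoOauthClient.js: the function names are hypothetical, and it assumes the username is read from the exchanged token, which the text does not state. The payload is decoded without signature verification, so the claims are only as trustworthy as the party that issued the token.

```javascript
// Audience format taken from the text: `${extWorkspaceId}-${gatewayAudience}`.
function buildTrinoAudience(extWorkspaceId, gatewayAudience) {
  return `${extWorkspaceId}-${gatewayAudience}`;
}

// Decode the JWT payload segment. No signature check happens here.
function decodeJwtPayload(jwt) {
  const parts = jwt.split('.');
  if (parts.length !== 3) throw new Error('Malformed JWT');
  return JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf8'));
}

// Build the headers for Trino REST calls from the exchanged token.
// Assumption: the username comes from the exchanged token's claims.
function buildTrinoHeaders(exchangedToken) {
  const claims = decodeJwtPayload(exchangedToken);
  const username = claims.preferred_username || claims.sub;
  if (!username) {
    throw new Error('Token has neither preferred_username nor sub');
  }
  return {
    Authorization: `Bearer ${exchangedToken}`,
    'X-Trino-User': username,
  };
}
```

With extWorkspaceId set to w-abc123 and a gateway audience of trino-gw, buildTrinoAudience yields the w-abc123-trino-gw value used in the example above.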

The user's original session token is never sent to Trino.
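The exchange request itself is not shown on this page. Assuming it follows the OAuth 2.0 Token Exchange format (RFC 8693), the form body might be built roughly as below. The function name, the parameter names, and the use of a client secret in the form body are assumptions for illustration; the real client may authenticate differently.

```javascript
// Sketch of an RFC 8693 token exchange request. Everything about how the
// BFF actually calls its identity provider is an assumption here.
function buildTokenExchangeRequest({ tokenEndpoint, clientId, clientSecret, userToken, audience }) {
  const body = new URLSearchParams({
    grant_type: 'urn:ietf:params:oauth:grant-type:token-exchange',
    subject_token: userToken,
    subject_token_type: 'urn:ietf:params:oauth:token-type:access_token',
    audience,
    client_id: clientId,
    client_secret: clientSecret,
  });

  return {
    url: tokenEndpoint,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: body.toString(),
    },
  };
}
```

The returned object can be passed to fetch(url, init). Only the token returned by the exchange would then be attached to Trino requests, which keeps the original session token on the BFF side.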


Step 3: Policy Enforcement — Trino to Ranger

Trino forwards each query to the Ranger system access control plugin before execution. Ranger evaluates:

What Ranger checks  How it's resolved
User identity       Username from X-Trino-User header
Resource            Catalog → Schema → Table → Column
Access type         select, show (schema introspection)
Policy match        First matching policy wins

If no policy matches, Ranger denies by default. The metadata operation then fails, and the user sees an error from the BFF.
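To make the "first match wins, otherwise deny" behaviour concrete, here is a toy model of the evaluation. It is not Ranger's engine or its policy format: the policy shape, the field names, and the '*' wildcard handling are all invented for illustration.

```javascript
// Toy model only. Field names and wildcard rules are hypothetical.
const LEVELS = ['catalog', 'schema', 'table', 'column'];

// A level that the policy omits, or sets to '*', matches any value.
function matchesResource(policyResource, requested) {
  return LEVELS.every((level) => {
    const pattern = policyResource[level];
    return pattern === undefined || pattern === '*' || pattern === requested[level];
  });
}

// Walk the policies in order; the first one that matches decides.
// When nothing matches, the request is denied by default.
function evaluateAccess(policies, { user, resource, accessType }) {
  for (const policy of policies) {
    if (
      policy.users.includes(user) &&
      policy.accessTypes.includes(accessType) &&
      matchesResource(policy.resource, resource)
    ) {
      return { allowed: policy.effect === 'allow', policy: policy.name };
    }
  }
  return { allowed: false, policy: null };
}
```

In this model a user with no applicable policy gets { allowed: false, policy: null }, which corresponds to the failed metadata operation described above.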

INFO

The Trino query issued here is a schema introspection query (e.g. DESCRIBE catalog.schema.table), not a data query. Ranger still evaluates it, so the user must have at least show access to the target resource for the metadata operation to proceed.
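The exact SQL built by datahubService is not shown here. As a sketch, a DESCRIBE statement could be assembled with quoted identifiers so that unusual catalog, schema, or table names cannot break out of the statement. Both helper names are hypothetical.

```javascript
// Quote an identifier with double quotes, doubling any embedded quote.
function quoteIdentifier(name) {
  return `"${String(name).replace(/"/g, '""')}"`;
}

// Hypothetical helper: build the schema introspection statement.
function buildDescribeSql({ catalog, schema, table }) {
  const qualified = [catalog, schema, table].map(quoteIdentifier).join('.');
  return `DESCRIBE ${qualified}`;
}
```

For catalog hive, schema sales, and table orders this produces DESCRIBE "hive"."sales"."orders".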

TODO: Confirm the exact Ranger access type required for schema introspection queries (show vs select on information_schema). Check with the platform team what the default seed policies grant for metadata operations.


Step 4: Metadata Write — BFF to DataHub

After schema introspection succeeds, the BFF calls DataHub APIs to apply the metadata change (tag, description, PII flag). DataHub authentication is separate from the Trino flow.

TODO: Document how the BFF authenticates to DataHub for metadata writes (service account, user token, or separate exchange). Add a cross-reference once the DataHub auth doc exists.


Client Used

src/clients/trinoOauthClient.js — implements the Trino REST protocol directly (POST /v1/statement, poll nextUri). Used when config.trino.authMode === 'oauth'.
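The submit-and-poll pattern mentioned above can be sketched as follows. This is a simplified illustration, not the code in trinoOauthClient.js: the function name and its parameters are hypothetical, and it omits retries, backoff, and cancellation. It assumes a runtime with a global fetch (Node 18 or later) unless fetchImpl is supplied.

```javascript
// Minimal sketch of the Trino REST flow: POST the SQL to /v1/statement,
// then follow nextUri until the server stops returning one.
async function runTrinoQuery({ baseUrl, sql, headers, fetchImpl = fetch }) {
  const rows = [];
  let columns = null;

  let response = await fetchImpl(`${baseUrl}/v1/statement`, {
    method: 'POST',
    headers,
    body: sql,
  });

  while (true) {
    if (!response.ok) {
      throw new Error(`Trino returned HTTP ${response.status}`);
    }
    const page = await response.json();

    // Query failures (including access denials) arrive in the `error` field.
    if (page.error) throw new Error(page.error.message);

    if (page.columns) columns = page.columns;
    if (page.data) rows.push(...page.data);

    // No nextUri means the query has finished.
    if (!page.nextUri) return { columns, rows };

    response = await fetchImpl(page.nextUri, { method: 'GET', headers });
  }
}
```

Passing fetchImpl explicitly keeps the function testable without a live Trino cluster. The headers argument would carry the Authorization and X-Trino-User values described in Step 2.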

The legacy src/clients/trinoClient.js (BasicAuth) is not used on this path when authMode is oauth.

SECURITY GAP

If TRINO_AUTH_MODE is not set to oauth, the BFF falls back to a shared service account for all Trino queries. Ranger cannot enforce per-user policies and all queries appear under the same identity in audit logs. See Security Gaps.


Go Deeper