Metadata Authentication & Authorization: DataHub, Trino, and Ranger
This page covers how a user's identity flows when performing metadata operations (adding tags, updating descriptions, reviewing PII) that require Trino to introspect table schemas, and how Ranger enforces access at query time.
For a higher-level overview of the token exchange model, see Token Exchange and Data Access.
Routes Covered
These BFF API routes trigger a Trino schema introspection query as part of their execution:
| Method | Route | Purpose |
|---|---|---|
| POST | /metadata/column/tags/add | Add tag to a column |
| POST | /metadata/column/tags/remove | Remove tag from a column |
| POST | /metadata/table/tags/add | Add tag to a table |
| POST | /metadata/table/tags/remove | Remove tag from a table |
| POST | /metadata/column/pii/review | Approve or reject a column PII review |
| POST | /metadata/column/description/update | Update a column description |
| POST | /metadata/table/description/update | Update a table description |
All routes are defined in src/routes/datahubRoute.js and protected by the validateJWT middleware.
End-to-End Flow
Step 1: User Token Extraction — BFF
When the user submits a metadata action, the BFF receives the request with the user's Keycloak JWT in the Authorization header.
The route handler extracts the raw token directly:
const userToken = req.headers['authorization']?.split(' ')[1];
This token is passed to datahubService.ensureDatasetSchema(), which calls trinoOauthClient.executeQuery({ sql, userToken }).
Unlike other flows, there is no separate exchangeTokenForBackendMiddleware applied at the route level. The token exchange to Trino happens inside the OAuth client itself.
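The one-line extraction shown in this step yields undefined when the header is missing and accepts any authorization scheme. As a minimal sketch of a stricter variant, assuming the header has the form Bearer <token>: extractBearerToken is a hypothetical helper name used for illustration, not a function known to exist in the BFF.

```javascript
// Hypothetical helper (not part of the BFF codebase).
// Returns the raw bearer token, or null if the header is missing or malformed.
function extractBearerToken(headers) {
  const header = headers['authorization'];
  if (typeof header !== 'string') return null;

  // Expect exactly two whitespace-separated parts: the scheme and the token.
  const [scheme, token, ...rest] = header.trim().split(/\s+/);
  if (!scheme || scheme.toLowerCase() !== 'bearer' || !token || rest.length > 0) {
    return null;
  }
  return token;
}
```

A route handler could call extractBearerToken(req.headers) and reject the request early when it returns null, rather than passing undefined on to ensureDatasetSchema().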
Step 2: Token Exchange — BFF to Trino
Inside trinoOauthClient.js, the user token is exchanged for a Trino-scoped JWT before the query is submitted:
- Audience: ${extWorkspaceId}-${config.trino.gatewayAudience} (e.g. w-abc123-trino-gw)
- The exchanged token is used as the Authorization: Bearer header on Trino REST API calls
- The username is extracted from preferred_username or sub in the decoded JWT and set as X-Trino-User
The user's original session token is never sent to Trino.
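The pieces of this step can be sketched as a few small functions. This is an illustration under stated assumptions, not the contents of trinoOauthClient.js: the function names are hypothetical, the exchange is assumed to be a standard RFC 8693 token exchange, preferred_username is assumed to take precedence over sub, and the username is assumed to be read from the exchanged token.

```javascript
// Sketch only: function names and request shape are assumptions.

// Audience for the exchanged token, e.g. "w-abc123-trino-gw".
function buildTrinoAudience(extWorkspaceId, gatewayAudience) {
  return `${extWorkspaceId}-${gatewayAudience}`;
}

// Form parameters for an RFC 8693 token exchange (assumed grant type).
// Client authentication parameters are deliberately omitted.
function buildExchangeParams(userToken, audience) {
  return new URLSearchParams({
    grant_type: 'urn:ietf:params:oauth:grant-type:token-exchange',
    subject_token: userToken,
    subject_token_type: 'urn:ietf:params:oauth:token-type:access_token',
    audience,
  });
}

// Decodes the JWT payload WITHOUT verifying the signature.
// Needs a Node.js version whose Buffer supports the 'base64url' encoding.
function decodeJwtPayload(jwt) {
  const parts = jwt.split('.');
  if (parts.length !== 3) throw new Error('Malformed JWT');
  return JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf8'));
}

// Headers for Trino REST calls: the exchanged token plus the resolved username.
function buildTrinoHeaders(exchangedJwt) {
  const claims = decodeJwtPayload(exchangedJwt);
  const user = claims.preferred_username || claims.sub; // assumed precedence
  if (!user) throw new Error('JWT has neither preferred_username nor sub');
  return {
    Authorization: `Bearer ${exchangedJwt}`,
    'X-Trino-User': user,
  };
}
```

Decoding without signature verification is only reasonable here because the token comes straight from the identity provider's token endpoint; a token received from a caller would need to be verified first.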
Step 3: Policy Enforcement — Trino to Ranger
Before executing each query, Trino invokes the Ranger system access control plugin. Ranger evaluates:
| What Ranger checks | How it's resolved |
|---|---|
| User identity | Username from X-Trino-User header |
| Resource | Catalog → Schema → Table → Column |
| Access type | select, show (schema introspection) |
| Policy match | First matching policy wins |
If no policy matches, Ranger denies by default. The metadata operation fails — the user sees an error from the BFF.
The Trino query issued here is a schema introspection query (e.g. DESCRIBE catalog.schema.table), not a data query. Ranger still evaluates it — the user must have at least show access to the target resource for the metadata operation to proceed.
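The first-match and default-deny behaviour described in this step can be modelled with a toy function. This is not Ranger's evaluation engine: the policy shape, the wildcard handling, and the names below are assumptions made purely for illustration, and column-level matching is left out.

```javascript
// Toy model of the evaluation described in this step. NOT Ranger's engine;
// the policy shape and '*' wildcard are assumptions for illustration.
const LEVELS = ['catalog', 'schema', 'table'];

// A policy resource matches when every level is equal or a wildcard.
function resourceMatches(pattern, resource) {
  return LEVELS.every(
    (level) => pattern[level] === '*' || pattern[level] === resource[level]
  );
}

// The first policy matching user and resource decides the outcome.
// With no matching policy, access is denied by default.
function evaluate(policies, request) {
  const policy = policies.find(
    (p) => p.users.includes(request.user) && resourceMatches(p.resource, request.resource)
  );
  if (!policy) return false; // default deny
  return policy.accessTypes.includes(request.accessType);
}

// Hypothetical seed policy: alice may introspect any table in hive.sales.
const examplePolicies = [
  {
    users: ['alice'],
    resource: { catalog: 'hive', schema: 'sales', table: '*' },
    accessTypes: ['show'],
  },
];
```

In this model, a DESCRIBE issued by alice against hive.sales.orders is evaluated as a show request and allowed, while the same request from a user with no matching policy is denied.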
TODO: Confirm the exact Ranger access type required for schema introspection queries (show vs select on information_schema). Check with the platform team what the default seed policies grant for metadata operations.
Step 4: Metadata Write — BFF to DataHub
After schema introspection succeeds, the BFF calls DataHub APIs to apply the metadata change (tag, description, PII flag). DataHub authentication is separate from the Trino flow.
TODO: Document how the BFF authenticates to DataHub for metadata writes (service account, user token, or separate exchange). Add a cross-reference once the DataHub auth doc exists.
Client Used
src/clients/trinoOauthClient.js — implements the Trino REST protocol directly (POST /v1/statement, poll nextUri). Used when config.trino.authMode === 'oauth'.
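The REST protocol mentioned here (submit with POST /v1/statement, then follow nextUri until it is absent) can be sketched as a short polling loop. This is a simplified illustration, not the actual client: runTrinoQuery and its parameters are hypothetical names, the fetch implementation is injected so the loop can be exercised without a cluster, and retry handling for transient HTTP errors is omitted.

```javascript
// Sketch of the Trino client REST protocol: POST the SQL to /v1/statement,
// then GET each nextUri until the response no longer contains one.
// Hypothetical function; retries and backoff are intentionally left out.
async function runTrinoQuery({ baseUrl, sql, headers, fetchImpl = fetch }) {
  let res = await fetchImpl(`${baseUrl}/v1/statement`, {
    method: 'POST',
    headers,
    body: sql,
  });

  const rows = [];
  let columns;

  for (;;) {
    if (!res.ok) throw new Error(`Trino HTTP ${res.status}`);
    const page = await res.json();

    // Query failures (including access denials) arrive in the response body.
    if (page.error) throw new Error(`Trino query failed: ${page.error.message}`);

    if (page.columns) columns = page.columns;
    if (page.data) rows.push(...page.data);

    if (!page.nextUri) break; // no nextUri means the query has finished
    res = await fetchImpl(page.nextUri, { method: 'GET', headers });
  }

  return { columns, rows };
}
```

The headers argument would carry the exchanged token and X-Trino-User from Step 2, so every poll is made under the same per-user identity as the initial submission.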
The legacy src/clients/trinoClient.js (BasicAuth) is not used on this path when authMode is oauth.
If TRINO_AUTH_MODE is not set to oauth, the BFF falls back to a shared service account for all Trino queries. In that mode, Ranger cannot enforce per-user policies, and all queries appear under the same identity in audit logs. See Security Gaps.
Go Deeper
- Token Exchange and Data Access — token exchange model
- SQL Auth — Superset, Trino & Ranger — Ranger policy enforcement detail
- Trino Gateway Auth — gateway audience and token scoping
- Ranger → Keycloak Role Sync — how user roles reach Ranger