
JupyterHub — Cogrion SDK Authentication

When a data scientist opens a Jupyter notebook on the Cogrion platform, they get more than just a Python environment. They get a fully authenticated session: their notebook code can call Cogrion platform APIs, read from the data warehouse, and manage workspace files — all without the user ever typing a password or managing a credential.

This page explains how that works, why it was harder than it sounds, and what the platform does to keep it working for the life of the notebook session.


What happens when you open a notebook

When a user clicks Start Server in the Cogrion UI, several things happen in rapid succession before the notebook is ready:

  1. The UI sends a spawn request to the BFF (Backend for Frontend) API, including the user's current Keycloak access token.
  2. The BFF exchanges that token for one scoped specifically to the JupyterHub client, using Keycloak's RFC 8693 Token Exchange.
  3. The BFF calls the JupyterHub spawn API, passing the exchanged token in the request body.
  4. Before the notebook pod starts, the JupyterHub hub runs a spawn hook — a Python function that reads the token and injects it as environment variables into the pod.
  5. The notebook pod starts with those environment variables already set.

By the time the user sees their notebook, their pod already holds a valid, user-scoped Keycloak access token. The Cogrion SDK reads it automatically.
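Step 3 of the flow above can be sketched as follows. This is a minimal illustration, not the BFF's actual code: the function name build_spawn_request, the hub URL, the hub API token, and the placeholder token values are all hypothetical. It assumes the BFF uses JupyterHub's REST endpoint for starting a user's server, where the JSON body is handed to the spawner as user_options. The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request


def build_spawn_request(hub_url: str, username: str, hub_api_token: str,
                        exchanged_token: str) -> urllib.request.Request:
    """Build (but do not send) a spawn request carrying the exchanged token."""
    # The JSON body becomes the spawner's user_options on the hub side.
    body = json.dumps({"oauth_access_token": exchanged_token}).encode("utf-8")
    user = urllib.parse.quote(username, safe="")
    return urllib.request.Request(
        url=f"{hub_url.rstrip('/')}/hub/api/users/{user}/server",
        data=body,
        method="POST",
        headers={
            # Hypothetical hub API token held by the BFF.
            "Authorization": f"token {hub_api_token}",
            "Content-Type": "application/json",
        },
    )


req = build_spawn_request(
    "https://hub.example.invalid",   # placeholder hub address
    "alice",
    "placeholder-hub-api-token",
    "placeholder-exchanged-token",
)
print(req.get_method(), req.full_url)
```

Keeping the token in the request body rather than in a URL parameter means it does not end up in access logs that record request paths.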


How the SDK uses the token

The CogrionClient SDK reads its authentication configuration from environment variables injected at spawn time:

Variable                      Purpose
COGRION_OAUTH_ACCESS_TOKEN    The user's Keycloak access token, injected at spawn
COGRION_OAUTH_TOKEN_URL       Keycloak token endpoint
COGRION_OAUTH_CLIENT_ID       JupyterHub OAuth client ID
COGRION_OAUTH_CLIENT_SECRET   JupyterHub OAuth client secret
COGRION_OAUTH_GRANT_TYPE      Set to token_exchange
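To make the table concrete, here is one way a client could collect these variables into a settings object. The OAuthSettings class and load_oauth_settings function are hypothetical names used for illustration. The real SDK's internals are not shown on this page and may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OAuthSettings:
    access_token: str
    token_url: str
    client_id: str
    client_secret: str
    grant_type: str


# Environment variable name -> settings field (variable names from the table).
_ENV_TO_FIELD = {
    "COGRION_OAUTH_ACCESS_TOKEN": "access_token",
    "COGRION_OAUTH_TOKEN_URL": "token_url",
    "COGRION_OAUTH_CLIENT_ID": "client_id",
    "COGRION_OAUTH_CLIENT_SECRET": "client_secret",
    "COGRION_OAUTH_GRANT_TYPE": "grant_type",
}


def load_oauth_settings(env: Mapping[str, str] = os.environ) -> OAuthSettings:
    """Read the spawn-time variables, failing loudly if any is missing."""
    missing = [name for name in _ENV_TO_FIELD if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return OAuthSettings(**{field: env[name] for name, field in _ENV_TO_FIELD.items()})


# Demo with placeholder values instead of a real pod environment.
demo_env = {
    "COGRION_OAUTH_ACCESS_TOKEN": "placeholder-token",
    "COGRION_OAUTH_TOKEN_URL": "https://keycloak.example.invalid/token",
    "COGRION_OAUTH_CLIENT_ID": "jupyterhub",
    "COGRION_OAUTH_CLIENT_SECRET": "placeholder-secret",
    "COGRION_OAUTH_GRANT_TYPE": "token_exchange",
}
settings = load_oauth_settings(demo_env)
print(settings.grant_type)
```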

On every API call, the SDK checks whether the cached access token is still valid. If it is, it uses it. If it is about to expire, the SDK silently obtains a fresh one before making the call — without interrupting the user.
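The validity check can be sketched as below, under the assumption that the SDK decides from the token's own exp claim. The helper names (jwt_exp, needs_renewal, fake_jwt) are illustrative and not part of the SDK. The 60-second margin is the figure this page gives for the proactive exchange. The payload is decoded without verifying the signature, which is enough for reading the expiry locally but never for trusting a token.

```python
import base64
import json
import time
from typing import Optional


def jwt_exp(token: str) -> float:
    """Return the exp claim (Unix seconds) from a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return float(claims["exp"])


def needs_renewal(token: str, now: Optional[float] = None,
                  buffer_seconds: float = 60.0) -> bool:
    """True when the token expires within buffer_seconds of now."""
    now = time.time() if now is None else now
    return jwt_exp(token) - now <= buffer_seconds


def fake_jwt(claims: dict) -> str:
    """Build an unsigned, JWT-shaped string for demonstration only."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return f"{b64({'alg': 'none'})}.{b64(claims)}.signature"


demo_token = fake_jwt({"exp": 1000})
print(needs_renewal(demo_token, now=900))  # 100 seconds left
print(needs_renewal(demo_token, now=950))  # 50 seconds left
```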


The token lifetime problem

Keycloak access tokens are deliberately short-lived. The realm is configured with a 5-minute access token lifespan. This is a security control: a stolen access token stops working in minutes.

But a notebook session can run for hours. A data scientist might kick off a long-running computation and come back later. Their notebook code needs to be able to make API calls the whole time — not just in the first five minutes.

The naive solution — just inject the token and let it expire — would cause 401 Unauthorized errors when the user's code calls the SDK after the token expires. This is a bad experience and, depending on the workload, could cause data loss or silent failures mid-computation.


Why refresh tokens don't work here

The standard OAuth2 answer to expiring access tokens is a refresh token: a longer-lived credential that lets the client quietly obtain a new access token without user interaction.

JupyterHub's built-in OAuth flow (when a user logs in through a browser) does obtain a refresh token. But notebook pods spawned via the Cogrion API — which is how all Cogrion notebooks launch — go through a different path. The BFF calls the JupyterHub spawn API directly on behalf of the user. No browser OAuth flow happens for the pod itself, so no refresh token is ever issued to the pod.

Several approaches were explored:

  • Passing the UI's refresh token into the pod — fails because Keycloak refresh tokens are bound to the client that issued them. The UI's token is bound to the frontend client; the pod only has JupyterHub client credentials. Keycloak rejects the mismatch.
  • Token exchange requesting a refresh token (requested_token_type=refresh_token) — Keycloak 26+ standard token exchange returns 400 Bad Request for this parameter.
  • Requesting offline_access scope — Keycloak standard token exchange does not return a refresh token regardless of scope.
  • Service account (client_credentials) grant — works for token refresh, but the token carries service account identity, not the user's identity. API calls that check user permissions (workspace file access, data operations) return 403 Forbidden.

Each of these dead ends points at the same underlying constraint: JupyterHub's API-based spawn has no browser OAuth flow, and Keycloak's standard token exchange (RFC 8693) does not return refresh tokens.


The solution: proactive token exchange in the SDK

The solution shifts the refresh responsibility away from refresh tokens entirely. Instead of waiting for the access token to expire and then trying to renew it, the SDK proactively re-exchanges the current access token for a fresh one before it expires — while the current token is still valid.

This works because the pod already has everything it needs:

  • A valid user-scoped access token (injected at spawn)
  • The JupyterHub OAuth client credentials (CLIENT_ID, CLIENT_SECRET)
  • The Keycloak token endpoint (TOKEN_URL)

When the SDK detects that the cached token will expire within the next 60 seconds, it performs a token exchange with Keycloak:

POST /realms/<realm>/protocol/openid-connect/token

grant_type = urn:ietf:params:oauth:grant-type:token-exchange
client_id = <COGRION_OAUTH_CLIENT_ID>
client_secret = <COGRION_OAUTH_CLIENT_SECRET>
subject_token = <current access token>
subject_token_type = urn:ietf:params:oauth:token-type:access_token
requested_token_type = urn:ietf:params:oauth:token-type:access_token

Keycloak responds with a fresh access token. The SDK caches it and uses it for the next API call. The user's code sees nothing — no errors, no delays, no re-authentication prompts.
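The same request expressed in Python, as a minimal sketch using only the standard library. The function names are hypothetical and the SDK's own implementation may be structured differently. build_exchange_form mirrors the parameters listed above. exchange_token sends the form and reads access_token from the JSON response, and is not called in the demo because it needs a live Keycloak.

```python
import json
import urllib.parse
import urllib.request

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_exchange_form(client_id: str, client_secret: str, current_token: str) -> dict:
    """Form fields for exchanging a still-valid access token for a fresh one."""
    return {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "client_id": client_id,
        "client_secret": client_secret,
        "subject_token": current_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
    }


def exchange_token(token_url: str, form: dict, timeout: float = 10.0) -> str:
    """POST the form to the token endpoint and return the new access token."""
    data = urllib.parse.urlencode(form).encode("ascii")
    request = urllib.request.Request(token_url, data=data, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["access_token"]


# Demo: build the form only; values are placeholders.
form = build_exchange_form("jupyterhub", "placeholder-secret", "placeholder-token")
print(urllib.parse.urlencode({"grant_type": form["grant_type"]}))
```

The exchange has to happen while the current token is still valid, because the current token is the subject_token. This is why the SDK renews ahead of expiry rather than reacting to a 401.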

This cycle repeats for the life of the session. As long as the user's Keycloak SSO session remains active (governed by the realm's SSO session idle timeout, typically 30 minutes of no browser activity), the exchange succeeds and the notebook stays authenticated.


Token lifetimes in practice

The access token injected at spawn comes from the BFF's token exchange. Its lifetime is set by the BFF OAuth client's access token lifespan in Keycloak, not by any setting on the JupyterHub client. This is because Keycloak sets the exchanged token's lifetime based on the authenticating client (the BFF), not the target audience (JupyterHub).

The BFF client has no lifespan of its own configured, so it inherits the realm's default 5-minute access token lifespan. This means:

  • The token injected at spawn is valid for 5 minutes
  • The SDK's 60-second proactive buffer kicks in at the 4-minute mark
  • A fresh exchange gives another 5 minutes
  • A notebook session that runs continuously will re-exchange roughly every 4 minutes, transparently
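The arithmetic behind these figures, as a short worked example. The 8-hour session length is an illustrative assumption, and the count assumes the notebook calls the SDK often enough that every renewal happens as soon as the buffer is reached.

```python
ACCESS_TOKEN_LIFESPAN_S = 5 * 60   # realm default inherited by the BFF client
PROACTIVE_BUFFER_S = 60            # SDK renews when less than this remains

# Seconds after issuance at which the SDK starts renewing.
renew_after_s = ACCESS_TOKEN_LIFESPAN_S - PROACTIVE_BUFFER_S
print(renew_after_s)               # 240, i.e. the 4-minute mark

# Upper bound on exchanges for an illustrative 8-hour session.
session_s = 8 * 60 * 60
max_exchanges = session_s // renew_after_s
print(max_exchanges)               # 120
```

A notebook that sits idle makes fewer exchanges than this, because the check only runs when the SDK is about to make an API call.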

This keeps the short-lifespan security posture intact: a leaked token is dead within minutes regardless of when it was stolen. The re-exchange overhead is negligible: one HTTP call to Keycloak roughly every 4 minutes, each completing in milliseconds.

The token lifespan is not yet tunable per-workspace from the Cogrion control plane. When that knob is added, the BFF client lifespan will be the lever to adjust it.


What the spawn hook does

The spawn hook (inject_keycloak_tokens) runs inside the JupyterHub hub just before each notebook pod is created. It:

  1. Reads the exchanged access token from the spawn request (oauth_access_token in user_options)
  2. Sets COGRION_OAUTH_ACCESS_TOKEN, OQULLUS_OAUTH_ACCESS_TOKEN, and OAUTH_ACCESS_TOKEN in the pod environment
  3. Decodes the JWT to extract the user's sub claim and sets OPENBAO_USER_SUB
  4. Performs a token exchange to get a token scoped to the OpenBao client, then logs into OpenBao to obtain a vault token (OPENBAO_TOKEN)

Steps 3 and 4 handle secret access (OpenBao) separately from SDK authentication. The two are independent: the SDK uses the access token directly; OpenBao uses a separately exchanged vault token. See JupyterHub Spawner — OpenBao Secret Auth for details on that flow.
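Steps 1 to 3 can be sketched as follows. This is an illustration under assumptions, not the deployed hook: it assumes the hook receives a JupyterHub spawner object exposing user_options and environment, it skips step 4 (the OpenBao exchange and login) entirely, and it quietly does nothing when no token is present, which the real hook may handle differently. The jwt_claims helper and the stand-in spawner are hypothetical.

```python
import base64
import json
from types import SimpleNamespace

TOKEN_ENV_NAMES = (
    "COGRION_OAUTH_ACCESS_TOKEN",
    "OQULLUS_OAUTH_ACCESS_TOKEN",
    "OAUTH_ACCESS_TOKEN",
)


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload without verifying the signature."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def inject_keycloak_tokens(spawner) -> None:
    """Copy the exchanged token and the user's sub claim into the pod env."""
    token = spawner.user_options.get("oauth_access_token")   # step 1
    if not token:
        return  # assumption: leave the environment untouched
    for name in TOKEN_ENV_NAMES:                              # step 2
        spawner.environment[name] = token
    spawner.environment["OPENBAO_USER_SUB"] = jwt_claims(token)["sub"]  # step 3


# Demo with a stand-in spawner and an unsigned, JWT-shaped placeholder token.
payload = base64.urlsafe_b64encode(json.dumps({"sub": "user-123"}).encode()).rstrip(b"=").decode()
demo_spawner = SimpleNamespace(
    user_options={"oauth_access_token": f"header.{payload}.signature"},
    environment={},
)
inject_keycloak_tokens(demo_spawner)
print(sorted(demo_spawner.environment))
```

In a JupyterHub configuration file, a function of this shape would typically be registered through the spawner's pre_spawn_hook setting. Whether Cogrion wires it that way is not stated on this page.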


SDK grant type: token_exchange

The COGRION_OAUTH_GRANT_TYPE=token_exchange setting tells the SDK which token renewal strategy to use. This is a grant type added to the Cogrion SDK (cogrion-sdk-py >= 0.1.2) specifically for this use case.

The SDK supports three grant types:

Grant type           When to use
refresh_token        Standard OAuth2 client with a refresh token (e.g. service running with user credentials from a browser login)
client_credentials   Service-to-service calls where user identity is not needed
token_exchange       JupyterHub notebook pods (user-scoped token, no refresh token available)

The token_exchange grant type is purely additive and fully backward compatible. Existing deployments using refresh_token or client_credentials are unaffected.
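One way to picture the three strategies is as three different form bodies sent to the same token endpoint. The renewal_form function below is a hypothetical sketch of that dispatch, not the SDK's code. The grant type names on the left are the SDK setting values from the table, and the form fields are the standard OAuth2 and RFC 8693 parameters.

```python
from typing import Optional


def renewal_form(grant_type: str, *, client_id: str, client_secret: str,
                 access_token: Optional[str] = None,
                 refresh_token: Optional[str] = None) -> dict:
    """Return the token-endpoint form body for the configured grant type."""
    base = {"client_id": client_id, "client_secret": client_secret}
    if grant_type == "refresh_token":
        if not refresh_token:
            raise ValueError("refresh_token grant needs a refresh token")
        return {**base, "grant_type": "refresh_token", "refresh_token": refresh_token}
    if grant_type == "client_credentials":
        # No user token involved: the result carries service account identity.
        return {**base, "grant_type": "client_credentials"}
    if grant_type == "token_exchange":
        if not access_token:
            raise ValueError("token_exchange grant needs a current access token")
        token_type = "urn:ietf:params:oauth:token-type:access_token"
        return {
            **base,
            "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
            "subject_token": access_token,
            "subject_token_type": token_type,
            "requested_token_type": token_type,
        }
    raise ValueError(f"unsupported grant type: {grant_type}")


demo_form = renewal_form("token_exchange", client_id="jupyterhub",
                         client_secret="placeholder-secret",
                         access_token="placeholder-token")
print(demo_form["grant_type"])
```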


Verifying the SDK is working

From inside a running notebook pod, run:

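from cogrion_sdk import CogrionClient
import time

c = CogrionClient()  # reads the COGRION_OAUTH_* environment variables
t = c.oauth.access_token()  # returns the cached token, renewing it first if needed
print(t[:10], "...")  # print only a prefix; never log the full token
# _jwt_exp is a private SDK helper, used here only for diagnostics
print("expires in:", int(c.oauth._jwt_exp(t) - time.time()), "seconds")
print("grant type:", c.oauth.grant_type())  # expected in a notebook pod: token_exchange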

To test that re-exchange works without waiting for the token to naturally expire:

c.oauth._cached_access_token_expiry_ts = 0 # force cache miss
t2 = c.oauth.access_token() # triggers a live Keycloak exchange
print("new token:", t2[:10], "...")
print("changed:", t != t2)

If the last line prints changed: True, the re-exchange mechanism is working correctly.



Appendix: Approaches that didn't work

Getting here required ruling out every other reasonable approach first. This section records what was tried and why each failed — so future engineers don't retread the same ground.

Passing the UI's refresh token into the pod

The Cogrion UI (oqullus) holds a Keycloak refresh token for the logged-in user. The obvious idea: capture that token at spawn time and pass it through to the pod.

It fails for two reasons.

Client binding. Keycloak refresh tokens are bound to the OAuth client that issued them. The UI's refresh token was issued to the frontend client. Inside the pod, the only credentials available are for the JupyterHub client. When the UI's refresh token is presented with JupyterHub client credentials, Keycloak rejects it with REFRESH_TOKEN_ERROR: Token is not active, because the token was not issued to that client.

Rotation staleness. Even setting the client mismatch aside, the oqullus axios interceptor calls keycloak.updateToken() on every HTTP request made by the frontend. Each call rotates the refresh token — the old one is immediately invalidated. By the time the captured token reaches the pod, the frontend has likely rotated it several times. The token is dead before it is ever used.

Requesting a refresh token via standard token exchange

RFC 8693 token exchange supports a requested_token_type parameter. The logical request: exchange the user's access token for a refresh token that the pod can hold long-term.

Keycloak 26+ rejects this with 400 Bad Request. The standard token exchange implementation in Keycloak does not support requested_token_type=urn:ietf:params:oauth:token-type:refresh_token. The "Allow refresh token in Standard Token Exchange" Keycloak setting only controls whether a refresh token is returned alongside an access token within the same SSO session — it does not enable issuing a standalone refresh token to a different client.

Requesting offline_access scope

Offline tokens are a special class of refresh token that survive SSO session expiry. Requesting the offline_access scope during a token exchange should, in theory, return one.

In practice, Keycloak's standard token exchange ignores the offline_access scope parameter and returns only an access token. No refresh token, offline or otherwise, is included in the response.

Using client_credentials grant type

If user identity isn't strictly required, the pod could use the JupyterHub client's own service account credentials (grant_type=client_credentials) to obtain tokens indefinitely. No refresh token needed — just rotate via client credentials on demand.

This works for token renewal but fails at the API layer. Tokens obtained via client_credentials carry service account identity, not the user's identity. API calls that are user-scoped — workspace file management, data access permissions, anything that checks who the caller is — return 403 Forbidden. Notebooks on Cogrion are user-scoped by design, so this approach is not viable.

Setting a long access token lifespan

If the realm's access token lifespan were set to 8 or 12 hours, the injected token would last the whole session and no renewal mechanism would be needed.

This was ruled out because it undermines the security model. A short access token lifespan is a containment control: if a token is stolen or leaked, it stops working in minutes. Extending it to hours means a compromised token is dangerous for the entire session. The proactive exchange approach preserves the short-lifespan security model while still keeping notebooks continuously authenticated.