JupyterHub

JupyterHub provides multi-user notebook servers for data scientists and ML engineers. Each user gets their own isolated notebook pod with configurable compute resources, S3 access, and Spark connectivity.

Authentication

JupyterHub uses Keycloak OAuth. Users log in via the Cogrion UI; JupyterHub provisions a notebook server on first access. Server idle timeout is configurable (default: 30 minutes of inactivity before the pod is culled).
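
The idle timeout can be illustrated with a small configuration helper. This is a minimal sketch, assuming culling is handled by the `jupyterhub-idle-culler` package, which this page does not confirm; the helper name `idle_culler_service` is hypothetical.

```python
# Sketch only: assumes culling is done by the jupyterhub-idle-culler package,
# which the text above does not confirm. idle_culler_service is a made-up name.
import sys


def idle_culler_service(timeout_minutes: int = 30) -> dict:
    """Build a JupyterHub service entry that culls idle notebook servers."""
    if timeout_minutes <= 0:
        raise ValueError("timeout_minutes must be positive")
    return {
        "name": "idle-culler",
        "command": [
            sys.executable,
            "-m",
            "jupyterhub_idle_culler",
            f"--timeout={timeout_minutes * 60}",  # the culler expects seconds
        ],
    }


service = idle_culler_service()  # 30 minutes, matching the documented default
print(service["command"][-1])
```

In a `jupyterhub_config.py`, an entry like this would be appended to `c.JupyterHub.services`, and the service would also need a role that allows it to read user activity and stop servers.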

Compute Profiles

Server sizes are sourced from the compute-profiles stack. A small service running inside the JupyterHub pod exposes the available profiles as JSON. The spawner reads this JSON at notebook launch time and presents the available instance sizes to the user.
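
The mapping from profile JSON to spawner options might look roughly like the following. This is a sketch under two assumptions: that the spawner is KubeSpawner (whose `profile_list` option takes entries of this shape), and that each profile carries `name`, `cpu`, and `memory_gb` fields. The real JSON schema of the compute-profiles service is not documented here, so those field names are hypothetical.

```python
# Sketch only: the JSON shape (name, cpu, memory_gb) is a hypothetical schema,
# not the compute-profiles stack's actual format. KubeSpawner is also assumed.
import json


def to_profile_list(raw_json: str) -> list[dict]:
    """Convert profile JSON into KubeSpawner-style profile_list entries."""
    profiles = json.loads(raw_json)
    result = []
    for index, profile in enumerate(profiles):
        result.append(
            {
                "display_name": profile["name"],
                "description": f'{profile["cpu"]} CPU, {profile["memory_gb"]}G memory',
                "default": index == 0,  # preselect the first profile
                "kubespawner_override": {
                    "cpu_limit": profile["cpu"],
                    "mem_limit": f'{profile["memory_gb"]}G',
                },
            }
        )
    return result


# Hypothetical payload, standing in for what the profile service returns.
sample = (
    '[{"name": "small", "cpu": 2, "memory_gb": 4},'
    ' {"name": "large", "cpu": 8, "memory_gb": 32}]'
)
profile_list = to_profile_list(sample)
```

With KubeSpawner, the resulting list would be assigned to `c.KubeSpawner.profile_list`, which renders the choices on the server start page.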

Storage Access

Each user's notebook pod has IRSA access to the workspace S3 bucket ({platform_id}-workspace), so users can read and write S3 paths directly from notebook code using the standard S3A connector or boto3, without managing credentials.
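
As an illustration of credential-free access from a notebook, here is a minimal boto3 sketch. The helper names and the `acme` platform ID are hypothetical; only the `{platform_id}-workspace` bucket naming comes from this page.

```python
# Sketch only: helper names and the "acme" platform ID are hypothetical.
def workspace_bucket(platform_id: str) -> str:
    """Return the workspace bucket name for a platform."""
    return f"{platform_id}-workspace"


def s3a_path(platform_id: str, key: str) -> str:
    """Build an s3a:// URI, e.g. for use as a Spark read or write path."""
    return f"s3a://{workspace_bucket(platform_id)}/{key.lstrip('/')}"


def write_text(platform_id: str, key: str, text: str) -> None:
    """Upload a small text object to the workspace bucket."""
    import boto3  # imported lazily so the path helpers work without boto3

    # No access keys are passed: boto3 picks up the pod's IRSA role.
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=workspace_bucket(platform_id),
        Key=key,
        Body=text.encode("utf-8"),
    )


def read_text(platform_id: str, key: str) -> str:
    """Download a text object from the workspace bucket."""
    import boto3

    s3 = boto3.client("s3")
    response = s3.get_object(Bucket=workspace_bucket(platform_id), Key=key)
    return response["Body"].read().decode("utf-8")


print(s3a_path("acme", "datasets/events.parquet"))
```

The `write_text` and `read_text` helpers only work inside a pod (or other environment) where AWS credentials resolve automatically; the path helpers run anywhere.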

An rclone sidecar syncs files between S3 and the notebook's local filesystem on pod start and stop, making workspace files available as a local directory.

Spark Connectivity

Notebook pods are bound to spark-cluster-role via RBAC, giving users the ability to submit Spark jobs from notebooks. JupyterHub integrates with the Enterprise Gateway for distributed kernel execution across Spark workers.

PostgreSQL

JupyterHub uses a PostgreSQL database (via KubeBlocks) for its internal state — active servers, user sessions, and spawner configuration.
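
JupyterHub is pointed at its database through a SQLAlchemy-style connection URL. The sketch below shows one way to assemble such a URL safely; the host, user, and database names are placeholders, not values from this deployment.

```python
# Sketch only: host, user, and database names below are placeholders.
from urllib.parse import quote_plus


def hub_db_url(
    user: str, password: str, host: str, database: str, port: int = 5432
) -> str:
    """Build a PostgreSQL connection URL, escaping special characters."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


url = hub_db_url("hub", "p@ss/word", "pg.example.internal", "jupyterhub")
print(url)
```

In `jupyterhub_config.py` the result would be assigned to `c.JupyterHub.db_url`. Escaping the password matters because characters such as `@` or `/` would otherwise be parsed as part of the URL structure.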

Go Deeper

Spark Operator — the operator that runs Spark jobs submitted from notebooks
Features — Workspace — the user-facing workspace feature
