JupyterHub
JupyterHub provides multi-user notebook servers for data scientists and ML engineers. Each user gets their own isolated notebook pod with configurable compute resources, S3 access, and Spark connectivity.
Authentication
JupyterHub authenticates users through Keycloak OAuth. Users log in via the Cogrion UI; JupyterHub provisions a notebook server on first access. The server idle timeout is configurable; by default, a pod is culled after 30 minutes of inactivity.
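As an illustration of the idle timeout, the sketch below builds a JupyterHub service entry for the open-source jupyterhub-idle-culler package. It assumes that package is the culling mechanism, which this page does not confirm; the helper name idle_culler_service is hypothetical.

```python
import sys


def idle_culler_service(timeout_minutes=30):
    """Build a JupyterHub service entry that culls idle notebook servers.

    Assumption: culling is done by the jupyterhub-idle-culler package,
    whose --timeout flag is expressed in seconds.
    """
    if timeout_minutes <= 0:
        raise ValueError("timeout_minutes must be positive")
    return {
        "name": "jupyterhub-idle-culler-service",
        "command": [
            sys.executable,
            "-m",
            "jupyterhub_idle_culler",
            f"--timeout={timeout_minutes * 60}",
        ],
    }


# In jupyterhub_config.py this would be registered roughly as:
#   c.JupyterHub.services = [idle_culler_service(30)]
service = idle_culler_service(30)
print(service["command"][-1])  # --timeout=1800
```

Recent JupyterHub releases also expect the culler service to be granted a role with scopes for listing users and stopping servers; that part is omitted here.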
Compute Profiles
Server sizes are sourced from the compute-profiles stack. A small service running inside the JupyterHub pod exposes the available profiles as JSON; the spawner reads this JSON at notebook launch time and presents the available instance sizes to the user.
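The mapping from profile JSON to spawner options could look roughly like the sketch below. The JSON shape (name, cpu, memory) is invented for illustration, and the output format assumes KubeSpawner's profile_list convention; neither is confirmed by this page.

```python
import json

# Hypothetical payload; the real profile service's schema may differ.
SAMPLE_PROFILES_JSON = """
[
  {"name": "small", "cpu": 2, "memory": "8G"},
  {"name": "large", "cpu": 8, "memory": "32G"}
]
"""


def to_profile_list(raw_json):
    """Convert profile JSON into KubeSpawner-style profile_list entries."""
    profiles = []
    for index, profile in enumerate(json.loads(raw_json)):
        profiles.append({
            "display_name": (
                f"{profile['name']} "
                f"({profile['cpu']} CPU, {profile['memory']} RAM)"
            ),
            "slug": profile["name"],
            # Treat the first profile as the preselected default.
            "default": index == 0,
            "kubespawner_override": {
                "cpu_limit": profile["cpu"],
                "mem_limit": profile["memory"],
            },
        })
    return profiles


profiles = to_profile_list(SAMPLE_PROFILES_JSON)
for entry in profiles:
    print(entry["display_name"])
```

KubeSpawner accepts a callable for profile_list, so a function like this could fetch the JSON on each spawn rather than once at hub startup, which matches the "at notebook launch time" behavior described above.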
Storage Access
Each user's notebook pod has IRSA (IAM Roles for Service Accounts) access to the workspace S3 bucket ({platform_id}-workspace). Users can read and write S3 paths directly from notebook code with standard S3A or boto3 clients, without managing credentials.
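A minimal sketch of credential-free access from notebook code follows. The platform ID "acme" and the helper names are hypothetical; the boto3 call relies on the default credential chain, which picks up the pod's IRSA web identity token without any keys in the notebook.

```python
def workspace_bucket(platform_id):
    """Return the workspace bucket name, following {platform_id}-workspace."""
    return f"{platform_id}-workspace"


def workspace_uri(platform_id, key, scheme="s3a"):
    """Build a URI for a workspace object, e.g. for Spark's S3A connector."""
    return f"{scheme}://{workspace_bucket(platform_id)}/{key.lstrip('/')}"


def write_text(platform_id, key, text):
    """Upload a small text object using boto3's default credential chain."""
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client("s3")  # no access keys passed explicitly
    s3.put_object(
        Bucket=workspace_bucket(platform_id),
        Key=key,
        Body=text.encode("utf-8"),
    )


# "acme" is a placeholder platform ID.
print(workspace_uri("acme", "/data/events.parquet"))
```

In a notebook, write_text("acme", "notes/hello.txt", "hi") would write to the workspace bucket, and the s3a:// URI can be passed to Spark readers and writers.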
An rclone sidecar syncs files between S3 and the notebook's local filesystem on pod start and stop, making workspace files available as a local directory.
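The start and stop sync could be expressed as two rclone invocations, sketched below. The remote name "s3", the local path, and the choice of rclone sync (rather than rclone copy) are assumptions; the sidecar's actual commands are not documented here.

```python
def rclone_sync_command(platform_id, local_dir, direction, remote="s3", prefix=""):
    """Build an rclone command for one direction of the workspace sync.

    direction="pull" copies bucket -> local (pod start);
    direction="push" copies local -> bucket (pod stop).
    The remote name is a hypothetical rclone remote configured for S3.
    """
    bucket_path = f"{remote}:{platform_id}-workspace"
    if prefix:
        bucket_path += "/" + prefix.strip("/")

    if direction == "pull":
        source, destination = bucket_path, local_dir
    elif direction == "push":
        source, destination = local_dir, bucket_path
    else:
        raise ValueError("direction must be 'pull' or 'push'")

    return ["rclone", "sync", source, destination]


# Placeholder platform ID and local directory.
on_start = rclone_sync_command("acme", "/home/jovyan/workspace", "pull")
on_stop = rclone_sync_command("acme", "/home/jovyan/workspace", "push")
print(" ".join(on_start))
print(" ".join(on_stop))
```

Note that rclone sync makes the destination match the source, deleting extra files at the destination, whereas rclone copy never deletes; which of the two suits a workspace depends on whether local deletions should propagate to S3.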
Spark Connectivity
Notebook pods are bound to spark-cluster-role via RBAC, giving users the ability to submit Spark jobs from notebooks. JupyterHub integrates with the Enterprise Gateway for distributed kernel execution across Spark workers.
PostgreSQL
JupyterHub uses a PostgreSQL database (via KubeBlocks) for its internal state: active servers, user sessions, and spawner configuration.
Go Deeper
- Spark Operator — the operator that runs Spark jobs submitted from notebooks
- Features — Workspace — the user-facing workspace feature