Karpenter — Overview
Karpenter is the node autoscaler used across all data platform clusters. It replaces static node groups with dynamic, workload-aware provisioning: nodes are created on demand when workloads need them and removed shortly after they go idle.
Provider-specific deployment guides:
Why Karpenter
Data platform workloads have highly variable and workload-specific compute needs:
- Spark jobs burst to many nodes for minutes or hours, then go to zero
- Trino clusters are often ephemeral — spun up for a query session and torn down
- Airflow workers need consistent, moderate instances with fast boot times
- JupyterHub needs burstable instances for interactive notebooks
Static node groups cannot handle this efficiently — you either over-provision (wasting money at idle) or under-provision (workloads queue). Karpenter solves this by:
| Feature | Benefit |
|---|---|
| Just-in-time provisioning | Nodes join the cluster in ~60s, triggered by a pending pod |
| Workload-aware selection | Each NodePool targets specific instance families and sizes |
| Spot + On-Demand mixing | Spot instances reduce cost; failover to On-Demand when needed |
| Bin-packing consolidation | Underutilized nodes are drained and removed automatically |
| Taint-based isolation | Each workload type runs on its own NodePool — no noisy neighbors |
How It Fits in the Platform
NodePool Design
Each NodePool targets a specific workload type. Pods request their NodePool via a matching nodeSelector + toleration.
| NodePool | Workload | Instance Family | Capacity | Consolidation |
|---|---|---|---|---|
| airflow-worker | Airflow KubernetesExecutor pods, Spark driver/executor | m5.large (compute) | Spot + On-Demand | WhenEmpty after 2m |
| trino-xsmall | Trino coordinator + worker pods | r8g.large (memory) | On-Demand + Spot | WhenEmptyOrUnderutilized after 5m |
| jupyterhub-small | JupyterHub single-user notebook servers | t3.large–2xlarge (burstable) | Spot + On-Demand | WhenEmptyOrUnderutilized after 5m |
Each NodePool name also doubles as a taint key. A pod must declare a matching toleration to be scheduled onto that pool's nodes.
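Putting those pieces together, a NodePool of this shape might look roughly like the sketch below (Karpenter v1 API). The `default` EC2NodeClass reference, the CPU limit, and the `NodePool` label key are illustrative assumptions rather than values taken from a live cluster:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: airflow-worker
spec:
  template:
    metadata:
      labels:
        NodePool: airflow-worker          # label matched by the pod nodeSelector shown below
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                     # assumed EC2NodeClass name
      taints:
        - key: airflow-worker             # the pool name doubles as the taint key
          effect: NoSchedule
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # allow both Spot and On-Demand capacity
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large"]
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 2m                  # drain and remove nodes 2 minutes after they go empty
  limits:
    cpu: "1000"                           # illustrative cap on total CPU the pool may provision
```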
Workload Scheduling Pattern
Every pod that targets a Karpenter NodePool must include:
```yaml
nodeSelector:
  NodePool: <pool-name>
tolerations:
  - key: <pool-name>
    operator: Exists
    effect: NoSchedule
```
This ensures that the pod lands on the correct pool and that Karpenter knows which NodePool to use when it provisions a new node.
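For example, a pod targeting the airflow-worker pool would carry both fields; the pod name, image, and resource requests below are hypothetical placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker-demo        # hypothetical name
spec:
  containers:
    - name: worker
      image: apache/airflow:2.9.2  # illustrative image and tag
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
  # Pin the pod to nodes labeled for the airflow-worker pool...
  nodeSelector:
    NodePool: airflow-worker
  # ...and tolerate that pool's taint so the scheduler can place the pod there.
  tolerations:
    - key: airflow-worker
      operator: Exists
      effect: NoSchedule
```

If no node in the pool has capacity, the pod stays Pending and Karpenter provisions a new node from the airflow-worker NodePool to run it.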