Karpenter — Overview

Karpenter is the node autoscaler used across all data platform clusters. It replaces static node groups with dynamic, workload-aware provisioning: nodes are created on demand when workloads need them and removed shortly after they go idle.

Provider-specific deployment guides:


Why Karpenter

Data platform workloads have highly variable and workload-specific compute needs:

  • Spark jobs burst to many nodes for minutes or hours, then go to zero
  • Trino clusters are often ephemeral — spun up for a query session and torn down
  • Airflow workers need consistent, moderate instances with fast boot times
  • JupyterHub needs burstable instances for interactive notebooks

Static node groups cannot handle this efficiently — you either over-provision (wasting money at idle) or under-provision (workloads queue). Karpenter solves this by:

| Feature | Benefit |
| --- | --- |
| Just-in-time provisioning | Nodes join the cluster in ~60s, triggered by a pending pod |
| Workload-aware selection | Each NodePool targets specific instance families and sizes |
| Spot + On-Demand mixing | Spot instances reduce cost; failover to On-Demand when needed |
| Bin-packing consolidation | Underutilized nodes are drained and removed automatically |
| Taint-based isolation | Each workload type runs on its own NodePool — no noisy neighbors |

How It Fits in the Platform


NodePool Design

Each NodePool targets a specific workload type. Pods request their NodePool via a matching nodeSelector + toleration.

| NodePool | Workload | Instance Family | Capacity | Consolidation |
| --- | --- | --- | --- | --- |
| airflow-worker | Airflow KubernetesExecutor pods, Spark driver/executor | m5.large (compute) | Spot + On-Demand | WhenEmpty after 2m |
| trino-xsmall | Trino coordinator + worker pods | r8g.large (memory) | On-Demand + Spot | WhenEmptyOrUnderutilized after 5m |
| jupyterhub-small | JupyterHub single-user notebook servers | t3.large–2xlarge (burstable) | Spot + On-Demand | WhenEmptyOrUnderutilized after 5m |

Each NodePool name also doubles as a taint key. A pod must declare a matching toleration to be scheduled onto that pool's nodes.
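
As an illustration of this design, here is a minimal NodePool sketch for the `airflow-worker` pool, assuming the Karpenter v1 API on AWS and an `EC2NodeClass` named `default` (the node class name and label key `NodePool` are assumptions based on the scheduling pattern below, not a verbatim copy of the platform's manifests):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: airflow-worker
spec:
  template:
    metadata:
      labels:
        # Label matched by pod nodeSelectors (assumed label key).
        NodePool: airflow-worker
    spec:
      # Taint key doubles as the pool name; pods need a matching toleration.
      taints:
        - key: airflow-worker
          effect: NoSchedule
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large"]
        # Prefer Spot, fall back to On-Demand.
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    # Matches the table above: remove empty nodes after 2 minutes.
    consolidationPolicy: WhenEmpty
    consolidateAfter: 2m
```

The memory- and burst-oriented pools follow the same shape, differing only in instance-type requirements and consolidation policy.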


Workload Scheduling Pattern

Every pod that targets a Karpenter NodePool must include:

```yaml
nodeSelector:
  NodePool: <pool-name>
tolerations:
  - key: <pool-name>
    operator: Exists
    effect: NoSchedule
```

This ensures the pod lands on the correct pool and that Karpenter knows which NodePool to use when provisioning a new node.
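
Putting the pattern together, a pod targeting the `airflow-worker` pool might look like the following sketch (the pod name and image are placeholders, and the `NodePool` label key is assumed from the pattern above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-driver-example   # hypothetical name
spec:
  # Pins the pod to nodes labeled for the airflow-worker pool.
  nodeSelector:
    NodePool: airflow-worker
  # Tolerates the pool's NoSchedule taint so the pod can land there.
  tolerations:
    - key: airflow-worker
      operator: Exists
      effect: NoSchedule
  containers:
    - name: main
      image: example.registry/spark:latest   # placeholder image
```

If the pod is pending and no node in the pool has capacity, Karpenter sees the combined nodeSelector and toleration, matches them to the `airflow-worker` NodePool, and provisions a new node that satisfies both.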


Use Case Sequences

Spark Job via Airflow

Ephemeral Trino Cluster

Airflow KubernetesExecutor Worker