Karpenter — Overview

Karpenter is the node autoscaler used across all data platform clusters. It replaces static node groups with dynamic, workload-aware provisioning: nodes are created on demand when workloads need them and removed shortly after they go idle.

Provider-specific deployment guides:


Why Karpenter

Data platform workloads have highly variable and workload-specific compute needs:

  • Spark jobs burst to many nodes for minutes or hours, then go to zero
  • Trino clusters are often ephemeral — spun up for a query session and torn down
  • Airflow workers need consistent, moderate instances with fast boot times
  • JupyterHub needs burstable instances for interactive notebooks

Static node groups cannot handle this efficiently — you either over-provision (wasting money at idle) or under-provision (workloads queue). Karpenter solves this by:

| Feature | Benefit |
| --- | --- |
| Just-in-time provisioning | Nodes join the cluster in ~60s, triggered by a pending pod |
| Workload-aware selection | Each NodePool targets specific instance families and sizes |
| Spot + On-Demand mixing | Spot instances reduce cost; failover to On-Demand when needed |
| Bin-packing consolidation | Underutilized nodes are drained and removed automatically |
| Taint-based isolation | Each workload type runs on its own NodePool — no noisy neighbors |

How It Fits in the Platform


NodePool Design

Each NodePool targets a specific workload type. Pods request their NodePool via a matching nodeSelector + toleration.

| NodePool | Workload | Instance Family | Capacity | Consolidation |
| --- | --- | --- | --- | --- |
| airflow-worker | Airflow KubernetesExecutor pods, Spark driver/executor | m5.large (compute) | Spot + On-Demand | WhenEmpty after 2m |
| trino-xsmall | Trino coordinator + worker pods | r8g.large (memory) | On-Demand + Spot | WhenEmptyOrUnderutilized after 5m |
| jupyterhub-small | JupyterHub single-user notebook servers | t3.large–2xlarge (burstable) | Spot + On-Demand | WhenEmptyOrUnderutilized after 5m |

Each NodePool name also doubles as a taint key. A pod must declare a matching toleration to be scheduled onto that pool's nodes.
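
As an illustration of this design, here is a minimal NodePool sketch for the `airflow-worker` pool, assuming the Karpenter v1 API on AWS and an `EC2NodeClass` named `default` (the node class name and label key `NodePool` are assumptions based on the scheduling pattern below, not a verbatim copy of the platform's manifests):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: airflow-worker
spec:
  template:
    metadata:
      labels:
        # Label matched by pod nodeSelectors (assumed label key).
        NodePool: airflow-worker
    spec:
      # Taint key doubles as the pool name; pods need a matching toleration.
      taints:
        - key: airflow-worker
          effect: NoSchedule
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large"]
        # Prefer Spot, fall back to On-Demand.
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    # Matches the table above: remove empty nodes after 2 minutes.
    consolidationPolicy: WhenEmpty
    consolidateAfter: 2m
```

The memory- and burst-oriented pools follow the same shape, differing only in instance-type requirements and consolidation policy.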


Workload Scheduling Pattern

Every pod that targets a Karpenter NodePool must include:

```yaml
nodeSelector:
  NodePool: <pool-name>
tolerations:
  - key: <pool-name>
    operator: Exists
    effect: NoSchedule
```

This ensures the pod lands on the correct pool and that Karpenter knows which NodePool to use when provisioning a new node.
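
Putting the pattern together, a pod targeting the `airflow-worker` pool might look like the following sketch (the pod name and image are placeholders, and the `NodePool` label key is assumed from the pattern above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-driver-example   # hypothetical name
spec:
  # Pins the pod to nodes labeled for the airflow-worker pool.
  nodeSelector:
    NodePool: airflow-worker
  # Tolerates the pool's NoSchedule taint so the pod can land there.
  tolerations:
    - key: airflow-worker
      operator: Exists
      effect: NoSchedule
  containers:
    - name: main
      image: example.registry/spark:latest   # placeholder image
```

If the pod is pending and no node in the pool has capacity, Karpenter sees the combined nodeSelector and toleration, matches them to the `airflow-worker` NodePool, and provisions a new node that satisfies both.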


Use Case Sequences

Spark Job via Airflow

Ephemeral Trino Cluster

Airflow KubernetesExecutor Worker