Celeborn

Celeborn is a remote shuffle service for distributed compute frameworks. It offloads Spark's shuffle phase — the expensive inter-stage data exchange — from executor pods to a dedicated, persistent shuffle cluster.

Why It Exists

In a standard Spark job, shuffle data (intermediate results between stages) is written to each executor's local disk and read by executors in the next stage. When executors are Kubernetes pods, local disks are small and ephemeral: if an executor fails, the shuffle data it wrote is lost and the tasks that produced it must be rerun.

Celeborn replaces local shuffle with a centralized service:

Shuffle data is pushed to Celeborn workers during task execution
Downstream tasks pull from Celeborn rather than the original executor
Executor failures do not lose shuffle data — it is already in Celeborn
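
On the Spark side, the switch to Celeborn is a matter of job configuration. The following is a minimal sketch, not this platform's actual job template: it builds the Spark properties that Celeborn's client documentation describes (the shuffle manager class and the master endpoint list) and renders them as `spark-submit` arguments. The helper names `celeborn_spark_conf` and `to_submit_args` and the example hostname are hypothetical, and the Celeborn client JAR is assumed to already be on the Spark classpath.

```python
def celeborn_spark_conf(master_endpoints):
    """Return Spark properties that route shuffle through Celeborn.

    master_endpoints: list of "host:port" strings for the Celeborn masters.
    """
    if not master_endpoints:
        raise ValueError("at least one Celeborn master endpoint is required")
    return {
        # Replace Spark's built-in shuffle manager with the Celeborn client.
        "spark.shuffle.manager": "org.apache.spark.shuffle.celeborn.SparkShuffleManager",
        # Comma-separated host:port list of Celeborn masters.
        "spark.celeborn.master.endpoints": ",".join(master_endpoints),
        # Shuffle files no longer live on executors, so the external
        # shuffle service is disabled (assumption: no other feature needs it).
        "spark.shuffle.service.enabled": "false",
    }


def to_submit_args(conf):
    """Render properties as a flat list of spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # Hypothetical endpoint; substitute the real master address.
    conf = celeborn_spark_conf(["celeborn-master-0.example:9097"])
    print(" ".join(to_submit_args(conf)))
```

The same properties can be set in a SparkApplication manifest or `spark-defaults.conf`; the dictionary form is only a convenient way to show them together.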

Components

Component	Description
Celeborn Master	Coordinates shuffle job registration and worker assignment. Supports HA mode with multiple replicas.
Celeborn Worker	Stores shuffle data on local persistent volumes (PVCs). Horizontally scalable.
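
When the master runs with several replicas, clients are usually given the address of every replica. The sketch below shows one way to derive that list, assuming the masters run as a Kubernetes StatefulSet behind a headless service. The StatefulSet name, service name, namespace, and port 9097 are illustrative assumptions, not values taken from this deployment; only the `<pod>.<service>.<namespace>.svc.cluster.local` DNS pattern is standard Kubernetes behavior.

```python
def master_endpoints(replicas, namespace="celeborn",
                     statefulset="celeborn-master",
                     service="celeborn-master-svc", port=9097):
    """Build host:port endpoints for each master pod of a StatefulSet.

    All default names and the port are hypothetical placeholders.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return [
        # StatefulSet pods get stable ordinal names: <statefulset>-0, -1, ...
        f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local:{port}"
        for i in range(replicas)
    ]


if __name__ == "__main__":
    for endpoint in master_endpoints(3):
        print(endpoint)
```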

Configuration

Parameter	Purpose
`master_replicas`	Number of master replicas (HA requires ≥ 2; an odd count such as 3 is typical for quorum-based HA)
`worker_replicas`	Number of worker replicas
`master_heap_memory`	JVM heap for master pods
`worker_heap_memory`	JVM heap for worker pods
`worker_offheap_memory`	Off-heap memory for shuffle data buffering
`worker_disk_size`	PVC size per worker for shuffle data
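
To see how these parameters combine, here is a rough sizing sketch. It reuses the parameter names from the table but assumes plain GiB numbers for the memory and disk values; the real input format is not specified here. The class `CelebornSizing`, its methods, and the example values are hypothetical. The per-worker memory figure assumes a pod needs at least heap plus off-heap, and the HA check uses the threshold stated in the table.

```python
from dataclasses import dataclass


@dataclass
class CelebornSizing:
    """Illustrative container for the parameters above (units: GiB, assumed)."""
    master_replicas: int
    worker_replicas: int
    worker_heap_memory: int
    worker_offheap_memory: int
    worker_disk_size: int

    def ha_enabled(self):
        # The table states that HA requires at least 2 master replicas.
        return self.master_replicas >= 2

    def worker_memory_floor(self):
        # Lower bound on memory per worker pod: JVM heap plus off-heap buffers.
        return self.worker_heap_memory + self.worker_offheap_memory

    def total_shuffle_capacity(self):
        # Raw PVC capacity across all workers, before any replication overhead.
        return self.worker_replicas * self.worker_disk_size


if __name__ == "__main__":
    # Example values only; not a recommendation.
    sizing = CelebornSizing(
        master_replicas=3,
        worker_replicas=4,
        worker_heap_memory=2,
        worker_offheap_memory=6,
        worker_disk_size=500,
    )
    print("HA enabled:", sizing.ha_enabled())
    print("Memory floor per worker (GiB):", sizing.worker_memory_floor())
    print("Total shuffle capacity (GiB):", sizing.total_shuffle_capacity())
```

Because each worker stores shuffle data on its own PVC, raw capacity grows linearly with `worker_replicas`, which is why the worker tier is the one to scale when shuffle volume increases.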

Supported Frameworks

Celeborn supports Apache Spark, Apache Flink, and Hadoop MapReduce as shuffle clients.

Go Deeper

Spark Team — Spark jobs from team namespaces use Celeborn for shuffle
Spark Operator — manages the Spark jobs that delegate shuffle to Celeborn
