Observability

The observability stack provides metrics collection, log aggregation, and dashboards for the tenant cluster. Most application stacks require it as a prerequisite, because they rely on it to scrape their metrics and ship their logs.

Components

Component    Role
Prometheus   Scrapes metrics from all cluster workloads: application pods, Kafka, Spark, Trino, and more
Grafana      Dashboard UI for metrics. Pre-built dashboards for platform components are included.
Loki         Log aggregation backend that stores logs from all pods for querying via Grafana
Alloy        OpenTelemetry-based log collector that ships pod logs to Loki
Promtail     Secondary log shipper (legacy support)
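
As a rough illustration of how these components can be queried outside Grafana, the sketch below builds request URLs for the Prometheus instant-query endpoint (`/api/v1/query`) and the Loki range-query endpoint (`/loki/api/v1/query_range`). The service addresses and the `namespace` log label are assumptions made for the example, not values taken from this stack; substitute whatever your deployment exposes.

```python
from urllib.parse import urlencode

# Hypothetical in-cluster service addresses (assumptions for illustration only).
PROMETHEUS_URL = "http://prometheus.example.svc:9090"
LOKI_URL = "http://loki.example.svc:3100"


def prometheus_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build a URL for the Prometheus instant-query endpoint."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def loki_query_range_url(logql: str, limit: int = 100, base_url: str = LOKI_URL) -> str:
    """Build a URL for the Loki range-query endpoint."""
    params = {"query": logql, "limit": limit}
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    # `up` is the built-in Prometheus metric reporting scrape-target health.
    print(prometheus_query_url("up"))
    # The `namespace` label is an assumed label on the shipped pod logs.
    print(loki_query_range_url('{namespace="default"}'))
```

The functions only construct URLs, so they can be checked without cluster access; pass the result to any HTTP client (for example `urllib.request.urlopen`) from a pod or a port-forwarded session to run the query.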

The stack is deployed via the kube-prometheus-stack Helm chart (Prometheus, Grafana, and Alertmanager), with Loki and Alloy installed as separate Helm releases.

Storage

Long-term retention is backed by three S3 buckets from the storages stack. The log archive bucket stores exported metrics and logs beyond the in-cluster Loki retention window.

Sizing

The observability stack supports four capacity sizes:

Size     Intended use
dev      Budget / single-node, minimal resource footprint
small    Staging environments
medium   Standard production
large    High-volume production
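
If the size is passed around as a plain string in deployment tooling, a small guard can catch typos early. The following is a minimal sketch under that assumption: the `SIZES` mapping and `validate_size` helper are hypothetical names introduced for illustration, and only the four size names and their intended uses come from the table above.

```python
# Size names and intended uses as listed in the sizing table.
SIZES = {
    "dev": "Budget / single-node, minimal resource footprint",
    "small": "Staging environments",
    "medium": "Standard production",
    "large": "High-volume production",
}


def validate_size(size: str) -> str:
    """Return the normalized size name, or raise ValueError if it is unknown."""
    normalized = size.strip().lower()
    if normalized not in SIZES:
        allowed = ", ".join(SIZES)
        raise ValueError(f"unknown size {size!r}; expected one of: {allowed}")
    return normalized


if __name__ == "__main__":
    chosen = validate_size("medium")
    print(f"{chosen}: {SIZES[chosen]}")
```

How the chosen size is actually supplied to the stack is not covered here; the helper only checks that a value is one of the four documented names.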

Go Deeper

  • Storages — the S3 buckets used for log and metrics archiving
  • Kafka — Kafka JMX metrics are scraped by Prometheus