# cerberus
Three-headed query gateway: translate PromQL, LogQL, and TraceQL into native ClickHouse SQL against OpenTelemetry data.
Features • Quick start • How it works • Docs • Architecture • Deploy • Migrate
Cerberus accepts the canonical Prometheus, Loki, and Tempo query languages and rewrites them into ClickHouse SQL targeting tables created by the OpenTelemetry Collector ClickHouse exporter. Existing Grafana dashboards, alert rules, and operator muscle memory keep working when the storage backend is ClickHouse.
## What you get
- **PromQL v0.2**: vector / matrix selectors, the full aggregation set (`sum`, `avg`, `topk`, `quantile`, `stddev`, `count_values`, …), `rate`/`irate`/`increase`/`delta`, `*_over_time`, `histogram_quantile`, `label_replace`/`label_join`, arithmetic + comparison + set binary operators with scalar broadcasting plus cross-cardinality vector matching (`on(...)`, `ignoring(...)`, `group_left`, `group_right`), the `@` modifier (`@ ts`, `@ start()`, `@ end()`), the `offset` modifier, subqueries (`<func>_over_time(<expr>[D:R])`), and a function library (`abs`, `ceil`, `floor`, `exp`, `ln`, `log2`, `log10`, `sqrt`, `sort`, `sort_desc`, `absent`, `absent_over_time`, `changes`, `deriv`).
- **LogQL alpha**: matchers, line filters with `or`-chains, label filters with regex / numeric / duration / bytes comparisons, JSON / logfmt / regex / pattern parsers, `unwrap`, every range aggregation (`count_over_time`, `rate`, `bytes_over_time`, `bytes_rate`, `sum_over_time`, `avg_over_time`, `min`/`max`/`quantile`/`stddev`/`stdvar`/`first`/`last_over_time`), every vector aggregation (`sum`, `avg`, `min`, `max`, `count`, `stddev`, `stdvar`, `topk`, `bottomk`) with `by(...)` push-down, and binops (`<vec> <op> <lit>` and `<vec> <op> <vec>`).
- **TraceQL v0.2**: spanset filters with the full comparator set including regex, intrinsic mapping (`name`, `duration`, `status`), Map- and JSON-layout-aware attribute scopes (`resource.X`, `span.X`, `.X`), spanset operators (`&&`, `||`, `>>`, `>`, `~`), pipeline aggregates as ScalarFilter (`| count() > N`, `| sum(duration) > 1s`, `| avg/min/max(...)` …) with optional `| by(<attr>)` grouping, and trace-summary projection.
- **OpenTelemetry-native schema**: targets the `otel_metrics_*`, `otel_logs`, and `otel_traces` tables emitted by the collector's ClickHouse exporter, in either the Map (default) or JSON (ClickHouse 25+) attribute layout.
- **Production deployment shapes out of the box**: container image at `ghcr.io/tsouza/cerberus`, a Helm chart with TLS / Ingress / HPA / PDB / NetworkPolicy / ServiceMonitor, sidecar-with-ClickHouse and standalone topologies, and a k3s end-to-end harness as a reference deployment.
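A few illustrative queries that exercise the features above (metric, label, and service names are placeholders, not part of any shipped fixture):

```
# PromQL: range-vector rate + aggregation with grouping
sum by (job) (rate(http_requests_total{env="prod"}[5m]))

# LogQL: line filter + logfmt parser + range and vector aggregation
sum by (level) (count_over_time({job="api"} |= "error" | logfmt [5m]))

# TraceQL: spanset filter + pipeline aggregate as ScalarFilter
{ resource.service.name = "checkout" && duration > 500ms } | count() > 10
```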
## How it works
Cerberus is a single Go binary that imports each query language's canonical upstream parser directly:
- PromQL — `github.com/prometheus/prometheus/promql/parser`
- LogQL — `github.com/grafana/loki/v3/pkg/logql/syntax`
- TraceQL — `github.com/grafana/tempo/pkg/traceql`
Each translator package walks the parser AST in-process and emits a Plan IR: an ordered pipeline of ClickHouse SQL steps with optional Go-side post-processing. The HTTP server runs the plan via the official `clickhouse-go/v2` client and reshapes results into whatever response shape the source language's data source expects.
```
   query string                                             JSON response
        │                                                         ▲
        ▼                                                         │
┌──────────────────┐    ┌──────────────────────┐    ┌──────────────────────────┐
│ upstream parser  │ ─► │ translator (per      │ ─► │ HTTP server              │
│ (in-process)     │    │ language)            │    │ (executes Plan via       │
│ PromQL / LogQL   │    │ AST → Plan IR        │    │ clickhouse-go/v2,        │
│ TraceQL          │    │ (multi-step)         │    │ shapes Prom/Loki/Tempo   │
└──────────────────┘    └──────────────────────┘    │ response envelopes)      │
                                                    └──────────────────────────┘
```
Importing the canonical parsers (rather than reimplementing each grammar) keeps cerberus bug-for-bug compatible with what Prometheus, Loki, and Tempo themselves accept.
See `docs/architecture.md` for a deeper dive into the Plan IR.
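As a rough mental model of the multi-step pipeline, here is a minimal sketch of what a plan of SQL / deferred-SQL / transform steps could look like. The type and field names are hypothetical, not cerberus's actual API; the stub stands in for the `clickhouse-go/v2` client:

```go
package main

import "fmt"

// Row is a single result row keyed by column name.
type Row map[string]any

// Step is one stage of a plan: static SQL, deferred SQL built from the
// previous step's rows, or a pure Go-side transform (no SQL at all).
type Step struct {
	SQL       string                  // static SQL, known at translate time
	Build     func(prev []Row) string // deferred SQL derived from prior rows
	Transform func(prev []Row) []Row  // Go-side post-processing
}

// Plan is an ordered pipeline of steps, executed top to bottom.
type Plan struct{ Steps []Step }

// Run executes the steps in order; query stands in for a real client.
func (p Plan) Run(query func(sql string) []Row) []Row {
	var rows []Row
	for _, s := range p.Steps {
		switch {
		case s.Transform != nil:
			rows = s.Transform(rows)
		case s.Build != nil:
			rows = query(s.Build(rows))
		default:
			rows = query(s.SQL)
		}
	}
	return rows
}

func main() {
	// Stub runner standing in for a ClickHouse connection.
	stub := func(sql string) []Row { return []Row{{"value": 42.0}} }
	p := Plan{Steps: []Step{
		{SQL: "SELECT toFloat64(Value) AS value FROM otel_metrics_gauge"}, // step 1: static SQL
		{Transform: func(prev []Row) []Row { return prev }},               // step 2: reshape in Go
	}}
	fmt.Println(len(p.Run(stub)))
}
```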
## Data source: the OTel ClickHouse exporter
Cerberus does not write data — it only reads. The expected producer is the OpenTelemetry Collector ClickHouse exporter (`opentelemetry-collector-contrib/exporter/clickhouseexporter`). That exporter is the source of truth for the table layout cerberus targets:
| Signal | Tables |
|---|---|
| Metrics | `otel_metrics_gauge`, `otel_metrics_sum`, `otel_metrics_histogram`, `otel_metrics_exponential_histogram`, `otel_metrics_summary` |
| Logs | `otel_logs` |
| Traces | `otel_traces` (+ optional `otel_traces_trace_id_ts` lookup) |
Two attribute storage shapes are supported, mirroring the exporter's `json` config knob: Map (the default — `Map(LowCardinality(String), String)` columns) and JSON (ClickHouse v25+ native JSON type, logs and traces only). Pick one with `--layout map | json` on the cerberus binary.
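To make the difference concrete, an attribute lookup compiles to different ClickHouse expressions in the two layouts. Illustrative SQL only (the `http.method` attribute is a placeholder, and cerberus's generated SQL may differ):

```sql
-- Map layout (default): attributes live in Map(LowCardinality(String), String)
SELECT count() FROM otel_logs WHERE LogAttributes['http.method'] = 'POST';

-- JSON layout (ClickHouse 25+): attributes live in a native JSON column;
-- keys containing dots are addressed as backticked subcolumns
SELECT count() FROM otel_logs WHERE LogAttributes.`http.method` = 'POST';
```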
To run the e2e harness without a live collector pipeline, the project vendors the exporter's DDL verbatim under `e2e/schema/upstream/`, pinned to v0.150.0 of opentelemetry-collector-contrib. The `e2e/schema` package renders those templates with single-node defaults (`MergeTree()`, no cluster, no TTL), and the e2e tests apply them at bootstrap. To bump the pin, change `UpstreamVersion` in `e2e/schema/schema.go` and re-fetch the files from the matching tag.
For production, run the exporter with `create_schema: true` (the default) so it manages the DDL itself, then point cerberus at the same ClickHouse host, database, and layout. The exporter's README.md documents every configuration knob.
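A minimal collector configuration feeding the tables cerberus reads might look like the sketch below. The endpoint, database, and receiver wiring are placeholder assumptions for a single-node setup; consult the exporter's README for the authoritative set of knobs:

```yaml
exporters:
  clickhouse:
    endpoint: tcp://clickhouse:9000   # same host cerberus points at
    database: otel                    # same database cerberus points at
    create_schema: true               # let the exporter own the DDL (default)

service:
  pipelines:
    metrics: { receivers: [otlp], exporters: [clickhouse] }
    logs:    { receivers: [otlp], exporters: [clickhouse] }
    traces:  { receivers: [otlp], exporters: [clickhouse] }
```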
## HTTP feature matrix
Point a stock Prometheus / Loki / Tempo data source at `http://cerberus:9090` and queries are translated and executed against ClickHouse.
| Endpoint | Source compat | Translator status | HTTP status |
|---|---|---|---|
| `GET/POST /api/v1/query` | Prometheus instant query | full alpha+ coverage | 200/4xx (envelope) |
| `GET/POST /api/v1/query_range` | Prometheus range query | translator emits matrix-shaped Plan; HTTP returns matrix envelope | 200/4xx (envelope) |
| `GET/POST /api/v1/series` | Prometheus series metadata | wired against `match[]`; `SELECT DISTINCT` over metrics tables | 200/4xx (envelope) |
| `GET/POST /api/v1/labels` | Prometheus label names | wired; union of label keys across metrics tables | 200/4xx (envelope) |
| `GET /api/v1/label/{name}/values` | Prometheus label values | wired; `SELECT DISTINCT` for the named key | 200/4xx (envelope) |
| `GET/POST /loki/api/v1/query` | Loki instant query | translator covers matchers, pipelines, range + vector aggs, binops; HTTP wired | 200/4xx (envelope) |
| `GET/POST /loki/api/v1/query_range` | Loki range query | translator covers matchers, pipelines, range + vector aggs, binops; HTTP wired | 200/4xx (envelope) |
| `GET /loki/api/v1/labels` | Loki label names | wired; union of label keys across `LogAttributes` + `ResourceAttributes` | 200/4xx (envelope) |
| `GET /loki/api/v1/label/{name}/values` | Loki label values | wired; `SELECT DISTINCT` for the named key | 200/4xx (envelope) |
| `GET /loki/api/v1/tail` | Loki streaming logs (websocket) | deferred to v1.x | not registered |
| `GET /api/search` | Tempo trace search | translator covers spansets, operators, pipeline aggregates; HTTP wired | 200/4xx (envelope) |
| `GET /api/traces/{id}` | Tempo trace lookup | wired; OTLP-encoded batch shape grouped by `ResourceAttributes` | 200/4xx (envelope) |
| `GET /healthz`, `GET /readyz` | liveness / readiness | always 200 | 200 |
The HTTP error envelope matches Prometheus: `{"status":"error","errorType":"bad_data" | "internal","error":"…"}`. Translator parse / unsupported errors map to 400; everything else to 500.

The translator-side feature streams (PromQL alpha+, LogQL alpha, TraceQL v0.2) and the HTTP routing layer are now both wired against `main`. Each route translates the source query, executes the resulting Plan via `clickhouse-go/v2`, and reshapes rows into the data-source-specific response envelope. Loki `/tail` (websocket streaming) remains the only deliberately deferred route.
## Compatibility matrix
| Component | Tested | Notes |
|---|---|---|
| ClickHouse | 24.x, 25.x | 25+ is required for the JSON attribute layout. e2e harness pins `clickhouse/clickhouse-server:24.8`. |
| Prometheus parser | `prometheus/prometheus@v0.55.0` | replaced via `replace` in go.mod to dodge the upstream v1.8.x tag. |
| Loki parser | `grafana/loki/v3@v3.6.10` | dependabot pinned to < 3.7.0 until dskit's memberlist fork lands a tagged release. |
| Tempo parser | `grafana/tempo@v1.5.0` | TraceQL v2 syntax (`&>`, `~>`, structural `{ A } { B }`, bare …) |
| OTel collector exporter DDL | v0.150.0 | vendored in `e2e/schema/upstream/`. Bump via `e2e/schema/schema.go`. |
| Kubernetes | ≥ 1.27 | declared in Chart.yaml. e2e-k8s harness validates against `rancher/k3s:v1.31.4-k3s1`. |
| Go toolchain | 1.25.0 | repo-wide; LogQL bumped the minimum to 1.23. |
## Repository layout
```
cmd/
  cerberus/          CLI + HTTP-server entry point
internal/
  schema/            OTel ClickHouse exporter table/column constants;
                     attribute_lookup helper for both Map and JSON layouts
  translator/        Translator interface + cross-language types
    plan/            Plan IR: multi-step Step.SQL / Step.Build / Step.Transform
    promql/          PromQL AST → Plan
    logql/           LogQL AST → Plan
    traceql/         TraceQL → Plan (with reflection-based AST accessors
                     + compat canary against the pinned tempo build)
  server/            HTTP server with Prometheus / Loki / Tempo APIs
                     plus the plan.Runner abstraction (real impl
                     backed by clickhouse-go/v2; tests use a stub)
deploy/
  cerberus-chart/    Helm chart (sidecar + standalone modes)
docs/                architecture, deployment, migration, tuning, troubleshooting
e2e/                 integration harness against a real ClickHouse
  schema/upstream/   OTel exporter DDL vendored from
                     opentelemetry-collector-contrib v0.150.0
  schema/            Go renderer that fills the upstream templates
                     with single-node defaults
  seed/              deterministic fixture writer (metrics + logs +
                     traces) used by e2e_test.go
  e2e_test.go        builds the cerberus HTTP server in-process and
                     exercises the public API against ClickHouse
                     (build tag e2e)
e2e-k8s/             k3s + OTel collector + Altinity ClickHouse harness
                     (build tag k8se2e); separate Go module
```
## Prerequisites
- Go ≥ 1.25 (`go.mod` directive). Needed for build / test only — the runtime needs none.
- `just` — task runner. Every command in this README maps to a `just` recipe.
- ClickHouse — for end-to-end tests, and at runtime when serving queries.
## Quick start
Three ways to run cerberus, in order of production-readiness:
### 1. Helm chart (recommended for production)
The chart at `deploy/cerberus-chart/` supports two topologies: as a sidecar container alongside ClickHouse (zero network hop, the e2e-k8s reference layout), or as a standalone Deployment fronted by a Service.
```shell
# Standalone Deployment + Service.
helm install cerberus ./deploy/cerberus-chart \
  --set sidecar.enabled=false \
  --set clickhouse.addr=clickhouse:9000 \
  --set clickhouse.database=otel
```
For the sidecar mode and the full TLS / Ingress / HPA / PDB / NetworkPolicy / ServiceMonitor knob set, see `docs/deployment.md` and the chart README.
### 2. Container image
Cerberus publishes a static, distroless nonroot image to `ghcr.io/tsouza/cerberus` for linux/amd64 and linux/arm64 on every `v*.*.*` tag.
```shell
docker run --rm -p 9090:9090 \
  -e CLICKHOUSE_ADDR=clickhouse:9000 \
  -e CLICKHOUSE_DATABASE=otel \
  -e CLICKHOUSE_USER=default \
  ghcr.io/tsouza/cerberus:latest \
  serve --listen 0.0.0.0:9090
```
### 3. Binary (single-node demo / dev)
```shell
# Build into ./bin/cerberus (or just download from GH releases).
just build

./bin/cerberus serve --listen 0.0.0.0:9090 \
  --clickhouse-addr localhost:9000 \
  --clickhouse-database otel \
  --clickhouse-user default
```
Server flags also fall back to env vars: `CERBERUS_LISTEN`, `CERBERUS_LAYOUT`, `CLICKHOUSE_ADDR`, `CLICKHOUSE_DATABASE`, `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`.
## Migrating from an existing stack
If you're already running Prometheus / Loki / Tempo and want to move the storage backend to ClickHouse without rewriting your dashboards or alert rules, see the migration guides under `docs/`.
## Building from source
```shell
# Build the cerberus binary into ./bin/cerberus
just build

# Release build (-ldflags='-s -w' -trimpath)
just release

# Run all tests
just test

# Lint: gofmt --check + go vet + staticcheck (if installed)
just lint

# Auto-format the source
just fmt

# Quick CI gate locally
just ci

# Mutation testing via gremlins (advisory; coverage-aware).
# Whole module (slow), single package, or diff-vs-main:
just mutate
just mutate-pkg internal/translator/promql
just mutate-diff

# Translate a query to ClickHouse SQL
just translate promql 'up{job="api"}'

# Inspect the parsed AST for a query
just parse-ast promql 'rate(http_requests_total{job="api"}[5m])'

# Run the HTTP server (default :9090)
just serve

# Build the container image
just build-image

# Render the Helm chart locally
just helm-template
```
`just` (no args) lists every recipe.
## CLI
```shell
cerberus translate --lang promql 'up{job="api"}'
cerberus translate --lang logql '{job="api"} |= "error"'
cerberus translate --lang traceql '{ duration > 1s } | count() > 10'

cerberus parse-ast --lang promql 'rate(up[5m])'

cerberus serve --listen 0.0.0.0:9090 \
  --clickhouse-addr localhost:9000 \
  --clickhouse-database otel \
  --clickhouse-user default
```
`--layout {map | json}` switches between the OTel ClickHouse exporter's default Map-typed attribute storage and the newer JSON-typed variant. Defaults to `map`.

For multi-step plans, `cerberus translate` prints each step in order, separated by `;`. Deferred or transform-only steps emit a `-- step N: dynamic SQL` or `-- step N: transform-only` marker.
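As a rough sketch, the output for a hypothetical three-step plan would have this shape (illustrative only — the actual SQL and step composition depend on the query):

```
SELECT ... FROM otel_metrics_sum WHERE ... ;
-- step 2: dynamic SQL ;
-- step 3: transform-only
```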
## Documentation

Architecture, deployment, migration, tuning, and troubleshooting guides live under `docs/`.
## License

MIT. See LICENSE.