Published: Feb 26, 2026 License: GPL-3.0

Gandalf


An LLM gateway that sits between your applications and LLM providers, adding authentication, routing, rate limiting, caching, and observability.

Features

Core Gateway
  • Multi-provider support (OpenAI, Anthropic, Gemini, Ollama)
  • Unified OpenAI-compatible API (/v1/chat/completions, /v1/embeddings, /v1/models)
  • Multi-instance providers (e.g. openai-us, openai-eu) with independent credentials
  • Priority failover routing across providers on errors
  • SSE streaming with keep-alive and client disconnect detection
  • Native API passthrough (Anthropic, Gemini, Azure, Ollama) without translation
  • YAML config with ${ENV_VAR} expansion
  • Graceful shutdown with in-flight request draining
Cloud Hosting
  • Azure OpenAI (API key auth)
  • GCP Vertex AI for Gemini and Anthropic (OAuth ADC)
  • AWS Bedrock for Anthropic (SigV4 signing, binary event stream)
  • Azure Entra ID (OAuth2 token auth)
Auth and Access Control
  • API key authentication (gnd_ prefix, SHA-256 hashed)
  • Per-key roles (admin / member / viewer / service_account)
  • RBAC with permission bitmask (no DB lookup on hot path)
  • Per-key model allowlists
  • JWT/OIDC dual-mode auth (JWKS auto-refresh, claim mapping)
  • Multi-tenant org/team hierarchy with limit inheritance
  • SSO/SAML via Dex companion service
Rate Limiting and Quotas
  • Dual token bucket (RPM + TPM) per key
  • Config-level default RPM/TPM fallback
  • Rate limit headers (X-Ratelimit-Limit/Remaining, Retry-After)
  • Quota enforcement with in-memory spend tracking
  • Periodic quota sync from DB
Caching
  • W-TinyLFU in-memory response cache (otter)
  • Deterministic cache keys (SHA-256, normalized messages)
  • Route-configurable cache TTL per model
  • Semantic caching (embedding similarity)
  • Redis cache backend
Observability
  • Prometheus metrics (native histograms, request duration, tokens processed, cache hits/misses, rate limit rejects)
  • OpenTelemetry distributed tracing (OTLP gRPC)
  • Structured logging (log/slog)
  • Per-request tracing spans with provider attribution
Admin API
  • Provider CRUD (/admin/v1/providers)
  • API key management (/admin/v1/keys)
  • Route configuration (/admin/v1/routes)
  • Cache purge (/admin/v1/cache/purge)
  • Usage query and summary (/admin/v1/usage, /admin/v1/usage/summary)
  • Org/team CRUD (/admin/v1/organizations, /admin/v1/teams)
  • Auth configuration endpoint (/admin/v1/auth/configure)
Usage and Billing
  • Async batched usage recording (buffered channel, no hot-path blocking)
  • Hourly usage rollups (background worker)
  • Per-request cost estimation
  • Usage filtering by org, key, model, time range
Resilience
  • Circuit breaker with weighted failure classification (sliding window, per-provider)
  • Exponential backoff with jitter retry strategy
  • Retry budget (cap retries at 20% of base rate)
  • Peak EWMA + P2C load balancing
  • Request coalescing (singleflight for identical non-streaming requests)
Deployment
  • Single static binary (pure Go, no CGO)
  • Docker image
  • SQLite with WAL mode (zero external dependencies)
  • PostgreSQL storage backend
  • mTLS support
Horizontal Scaling (Planned)
  • PostgreSQL backend for shared state (prereq)
  • Redis for centralized rate limits and quota counters
  • Redis response cache backend (replace per-process W-TinyLFU for multi-instance)
  • Stateless hot path (move all mutable state to PG/Redis)
  • Usage recording via shared queue (Redis streams or direct PG writes)
  • Kubernetes-ready: liveness/readiness probes (done), Helm chart, HPA guidance

Quick Start

# Set required env vars
export OPENAI_API_KEY="sk-..."
export GANDALF_ADMIN_KEY="gnd_your_admin_key"

# Build and run
make run

The server starts on :8080. Send requests using the OpenAI-compatible API:

curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $GANDALF_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

API

Universal (OpenAI-format, auth required)

Method  Path                   Description
POST    /v1/chat/completions   Chat completion (streaming supported)
POST    /v1/embeddings         Text embeddings
GET     /v1/models             List available models

Native passthrough (raw forwarding, auth required)

Prefix                  Provider
/v1/messages            Anthropic
/v1beta/models/*        Gemini
/openai/deployments/*   Azure OpenAI
/api/*                  Ollama

Admin API (auth + RBAC)

Path                      Description
/admin/v1/providers       Provider CRUD
/admin/v1/keys            API key management
/admin/v1/routes          Route configuration
/admin/v1/cache/purge     Cache invalidation
/admin/v1/usage           Usage query
/admin/v1/usage/summary   Aggregated usage

System (no auth)

Path        Description
/healthz    Liveness probe
/readyz     Readiness probe
/metrics    Prometheus metrics

Configuration

YAML config with ${ENV_VAR} expansion. See configs/gandalf.yaml for the full example.

./bin/gandalf -config configs/gandalf.yaml

Key sections: server (address, timeouts), database (SQLite DSN), providers (name, type, credentials, models, priority), routes (model alias to provider mapping), rate_limits (RPM/TPM defaults), cache (size, TTL), keys (bootstrap API keys with roles).

The provider name is the instance identifier (registry key, DB primary key, route target reference). The provider type selects the wire format (openai, anthropic, gemini, ollama). When type is omitted, it defaults to name for backward compatibility.

providers:
  - name: openai-us          # instance ID
    type: openai              # wire format (defaults to name if omitted)
    base_url: https://api.openai.com/v1
    api_key: "${OPENAI_US_KEY}"
    models: [gpt-4o]

  - name: openai-eu
    type: openai
    base_url: https://eu.api.openai.com/v1
    api_key: "${OPENAI_EU_KEY}"
    models: [gpt-4o]

Auth

API keys require the gnd_ prefix. Bootstrap via the GANDALF_ADMIN_KEY env var. Per-key roles control access to admin endpoints via an RBAC bitmask. Delete gandalf.db to re-bootstrap after changing keys.

Development

make build      # compile binary (GOEXPERIMENT=jsonv2)
make test       # tests with race detector
make bench      # benchmarks with ns/op, rps, allocs
make lint       # go vet + golangci-lint
make check      # full pipeline: build + fix + vet + test + govulncheck + bench
make coverage   # HTML coverage report

Docker

make docker
docker run -p 8080:8080 \
  -e OPENAI_API_KEY="sk-..." \
  -e GANDALF_ADMIN_KEY="gnd_..." \
  gandalf:dev -config /path/to/config.yaml

Architecture

Hexagonal architecture with domain types at the center and no circular imports.

cmd/gandalf/           entrypoint, dependency wiring, graceful shutdown
internal/
  gateway.go           domain types + interfaces (no project imports)
  server/              HTTP handlers + middleware (chi), SSE streaming, native passthrough
  app/                 ProxyService (failover), RouterService (cached routing), KeyManager
  provider/            Registry (keyed by instance name) + adapters (openai, anthropic, gemini, ollama)
  auth/                API key auth with otter cache, per-key roles
  ratelimit/           dual token bucket (RPM+TPM), Registry, QuotaTracker
  circuitbreaker/      per-provider circuit breaker (sliding window, half-open probe)
  cache/               W-TinyLFU in-memory cache (otter)
  tokencount/          token estimation for TPM rate limiting
  telemetry/           Prometheus metrics, OpenTelemetry tracing
  worker/              async usage recording, quota sync, usage rollups
  storage/sqlite/      SQLite with read/write pools, WAL, goose migrations
  config/              YAML config loading + DB bootstrap
  testutil/            reusable test fakes

See docs/architecture.md for detailed dependency flow, interfaces, streaming design, failover logic, and native passthrough.

License

GPL-3.0

Directories

Path Synopsis

cmd/gandalf
    Gandalf is a high-performance LLM gateway that unifies multiple providers behind an OpenAI-compatible API.
gateway
    Package gateway defines domain types and interfaces for the Gandalf LLM gateway.
app
    Package app implements application-level services for the Gandalf LLM gateway.
auth
    Package auth implements API key authentication for the Gandalf gateway.
cache
    Package cache provides response caching for the gateway.
circuitbreaker
    Package circuitbreaker implements a per-provider circuit breaker with a sliding-window error rate detector.
cloudauth
    Package cloudauth provides http.RoundTripper decorators that inject authentication headers for cloud-hosted LLM providers (direct API keys, GCP OAuth, Azure Entra).
config
    Package config provides configuration loading and database bootstrapping.
provider
    Package provider contains shared utilities for LLM provider adapters.
provider/anthropic
    Package anthropic implements the gateway.Provider adapter for the Anthropic API.
provider/gemini
    Package gemini implements the gateway.Provider adapter for the Google Gemini API.
provider/ollama
    Package ollama implements the gateway.Provider and gateway.NativeProxy adapters for local Ollama instances.
provider/openai
    Package openai implements the gateway.Provider adapter for the OpenAI API.
provider/sseutil
    Package sseutil provides shared SSE line reading utilities for provider adapters.
ratelimit
    Package ratelimit implements per-key RPM and TPM rate limiting with lazy-refill token buckets.
server
    Package server implements the HTTP transport layer for the Gandalf gateway.
storage
    Package storage defines persistence interfaces for the gateway.
storage/sqlite
    Package sqlite implements the storage interfaces using SQLite via modernc.org/sqlite.
telemetry
    Package telemetry provides observability primitives for the Gandalf gateway.
testutil
    Package testutil provides configurable test fakes for gateway interfaces.
tokencount
    Package tokencount provides token estimation for TPM rate limiting and usage recording.
worker
    Package worker provides background task infrastructure for the gateway.
