Published: Feb 26, 2026 License: GPL-3.0

Gandalf


An LLM gateway that sits between your applications and LLM providers, adding authentication, routing, rate limiting, caching, and observability.

Features

Core Gateway
  • Multi-provider support (OpenAI, Anthropic, Gemini, Ollama)
  • Unified OpenAI-compatible API (/v1/chat/completions, /v1/embeddings, /v1/models)
  • Multi-instance providers (e.g. openai-us, openai-eu) with independent credentials
  • Priority failover routing across providers on errors
  • SSE streaming with keep-alive and client disconnect detection
  • Native API passthrough (Anthropic, Gemini, Azure, Ollama) without translation
  • YAML config with ${ENV_VAR} expansion
  • Graceful shutdown with in-flight request draining
Cloud Hosting
  • Azure OpenAI (API key auth)
  • GCP Vertex AI for Gemini and Anthropic (OAuth ADC)
  • AWS Bedrock for Anthropic (SigV4 signing, binary event stream)
  • Azure Entra ID (OAuth2 token auth)
Auth and Access Control
  • API key authentication (gnd_ prefix, SHA-256 hashed)
  • Per-key roles (admin / member / viewer / service_account)
  • RBAC with permission bitmask (no DB lookup on hot path)
  • Per-key model allowlists
  • JWT/OIDC dual-mode auth (JWKS auto-refresh, claim mapping)
  • Multi-tenant org/team hierarchy with limit inheritance
  • SSO/SAML via Dex companion service
Rate Limiting and Quotas
  • Dual token bucket (RPM + TPM) per key
  • Config-level default RPM/TPM fallback
  • Rate limit headers (X-Ratelimit-Limit/Remaining, Retry-After)
  • Quota enforcement with in-memory spend tracking
  • Periodic quota sync from DB
Caching
  • W-TinyLFU in-memory response cache (otter)
  • Deterministic cache keys (SHA-256, normalized messages)
  • Route-configurable cache TTL per model
  • Semantic caching (embedding similarity)
  • Redis cache backend
Observability
  • Prometheus metrics (native histograms, request duration, tokens processed, cache hits/misses, rate limit rejects)
  • OpenTelemetry distributed tracing (OTLP gRPC)
  • Structured logging (log/slog)
  • Per-request tracing spans with provider attribution
Admin API
  • Provider CRUD (/admin/v1/providers)
  • API key management (/admin/v1/keys)
  • Route configuration (/admin/v1/routes)
  • Cache purge (/admin/v1/cache/purge)
  • Usage query and summary (/admin/v1/usage, /admin/v1/usage/summary)
  • Org/team CRUD (/admin/v1/organizations, /admin/v1/teams)
  • Auth configuration endpoint (/admin/v1/auth/configure)
Usage and Billing
  • Async batched usage recording (buffered channel, no hot-path blocking)
  • Hourly usage rollups (background worker)
  • Per-request cost estimation
  • Usage filtering by org, key, model, time range
Resilience
  • Circuit breaker with weighted failure classification (sliding window, per-provider)
  • Exponential backoff with jitter retry strategy
  • Retry budget (cap retries at 20% of base rate)
  • Peak EWMA + P2C load balancing
  • Request coalescing (singleflight for identical non-streaming requests)
Deployment
  • Single static binary (pure Go, no CGO)
  • Docker image
  • SQLite with WAL mode (zero external dependencies)
  • PostgreSQL storage backend
  • mTLS support
Horizontal Scaling (Planned)
  • PostgreSQL backend for shared state (prereq)
  • Redis for centralized rate limits and quota counters
  • Redis response cache backend (replace per-process W-TinyLFU for multi-instance)
  • Stateless hot path (move all mutable state to PG/Redis)
  • Usage recording via shared queue (Redis streams or direct PG writes)
  • Kubernetes-ready: liveness/readiness probes (done), Helm chart, HPA guidance

Quick Start

# Set required env vars
export OPENAI_API_KEY="sk-..."
export GANDALF_ADMIN_KEY="gnd_your_admin_key"

# Build and run
make run

The server starts on :8080. Send requests using the OpenAI-compatible API:

curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $GANDALF_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

API

Universal (OpenAI-format, auth required)

Method  Path                   Description
POST    /v1/chat/completions   Chat completion (streaming supported)
POST    /v1/embeddings         Text embeddings
GET     /v1/models             List available models

Native passthrough (raw forwarding, auth required)

Prefix                  Provider
/v1/messages            Anthropic
/v1beta/models/*        Gemini
/openai/deployments/*   Azure OpenAI
/api/*                  Ollama

Admin API (auth + RBAC)

Path                      Description
/admin/v1/providers       Provider CRUD
/admin/v1/keys            API key management
/admin/v1/routes          Route configuration
/admin/v1/cache/purge     Cache invalidation
/admin/v1/usage           Usage query
/admin/v1/usage/summary   Aggregated usage

System (no auth)

Path        Description
/healthz    Liveness probe
/readyz     Readiness probe
/metrics    Prometheus metrics

Configuration

YAML config with ${ENV_VAR} expansion. See configs/gandalf.yaml for the full example.

./bin/gandalf -config configs/gandalf.yaml

Key sections: server (address, timeouts), database (SQLite DSN), providers (name, type, credentials, models, priority), routes (model alias to provider mapping), rate_limits (RPM/TPM defaults), cache (size, TTL), keys (bootstrap API keys with roles).

The provider name is the instance identifier (registry key, DB primary key, route target reference). The provider type selects the wire format (openai, anthropic, gemini, ollama). When type is omitted, it defaults to name for backward compatibility.

providers:
  - name: openai-us          # instance ID
    type: openai              # wire format (defaults to name if omitted)
    base_url: https://api.openai.com/v1
    api_key: "${OPENAI_US_KEY}"
    models: [gpt-4o]

  - name: openai-eu
    type: openai
    base_url: https://eu.api.openai.com/v1
    api_key: "${OPENAI_EU_KEY}"
    models: [gpt-4o]

Auth

API keys require the gnd_ prefix. Bootstrap via the GANDALF_ADMIN_KEY env var. Per-key roles control access to admin endpoints via an RBAC bitmask. Delete gandalf.db to re-bootstrap after changing keys.

Development

make build      # compile binary (GOEXPERIMENT=jsonv2)
make test       # tests with race detector
make bench      # benchmarks with ns/op, rps, allocs
make lint       # go vet + golangci-lint
make check      # full pipeline: build + fix + vet + test + govulncheck + bench
make coverage   # HTML coverage report

Docker

make docker
docker run -p 8080:8080 \
  -e OPENAI_API_KEY="sk-..." \
  -e GANDALF_ADMIN_KEY="gnd_..." \
  gandalf:dev -config /path/to/config.yaml

Architecture

Hexagonal architecture with domain types at the center and no circular imports.

cmd/gandalf/           entrypoint, dependency wiring, graceful shutdown
internal/
  gateway.go           domain types + interfaces (no project imports)
  server/              HTTP handlers + middleware (chi), SSE streaming, native passthrough
  app/                 ProxyService (failover), RouterService (cached routing), KeyManager
  provider/            Registry (keyed by instance name) + adapters (openai, anthropic, gemini, ollama)
  auth/                API key auth with otter cache, per-key roles
  ratelimit/           dual token bucket (RPM+TPM), Registry, QuotaTracker
  circuitbreaker/      per-provider circuit breaker (sliding window, half-open probe)
  cache/               W-TinyLFU in-memory cache (otter)
  tokencount/          token estimation for TPM rate limiting
  telemetry/           Prometheus metrics, OpenTelemetry tracing
  worker/              async usage recording, quota sync, usage rollups
  storage/sqlite/      SQLite with read/write pools, WAL, goose migrations
  config/              YAML config loading + DB bootstrap
  testutil/            reusable test fakes

See docs/architecture.md for detailed dependency flow, interfaces, streaming design, failover logic, and native passthrough.

License

GPL-3.0

Directories

Path Synopsis

cmd/gandalf
    Gandalf is a high-performance LLM gateway that unifies multiple providers behind an OpenAI-compatible API.
gateway
    Package gateway defines domain types and interfaces for the Gandalf LLM gateway.
app
    Package app implements application-level services for the Gandalf LLM gateway.
auth
    Package auth implements API key authentication for the Gandalf gateway.
cache
    Package cache provides response caching for the gateway.
circuitbreaker
    Package circuitbreaker implements a per-provider circuit breaker with a sliding-window error rate detector.
cloudauth
    Package cloudauth provides http.RoundTripper decorators that inject authentication headers for cloud-hosted LLM providers (direct API keys, GCP OAuth, Azure Entra).
config
    Package config provides configuration loading and database bootstrapping.
provider
    Package provider contains shared utilities for LLM provider adapters.
provider/anthropic
    Package anthropic implements the gateway.Provider adapter for the Anthropic API.
provider/gemini
    Package gemini implements the gateway.Provider adapter for the Google Gemini API.
provider/ollama
    Package ollama implements the gateway.Provider and gateway.NativeProxy adapters for local Ollama instances.
provider/openai
    Package openai implements the gateway.Provider adapter for the OpenAI API.
provider/sseutil
    Package sseutil provides shared SSE line reading utilities for provider adapters.
ratelimit
    Package ratelimit implements per-key RPM and TPM rate limiting with lazy-refill token buckets.
server
    Package server implements the HTTP transport layer for the Gandalf gateway.
storage
    Package storage defines persistence interfaces for the gateway.
storage/sqlite
    Package sqlite implements the storage interfaces using SQLite via modernc.org/sqlite.
telemetry
    Package telemetry provides observability primitives for the Gandalf gateway.
testutil
    Package testutil provides configurable test fakes for gateway interfaces.
tokencount
    Package tokencount provides token estimation for TPM rate limiting and usage recording.
worker
    Package worker provides background task infrastructure for the gateway.
