llmbridge

package module
v1.0.0

Warning: This package is not in the latest version of its module.

Published: May 20, 2026 License: MIT Imports: 22 Imported by: 0

README

llmbridge

A unified Go interface to multiple LLM providers.

Switch between OpenAI, Anthropic, Gemini, Bedrock, Azure, Cohere, Ollama, Groq, or any OpenAI-compatible endpoint by changing the one line that constructs the provider; the rest of your application code stays the same.

Features

  • Unified interface — one API across all providers: chat, streaming, tool use, embeddings, TTS, image generation
  • Router — multi-provider failover with several selection strategies (including weighted routing), a circuit breaker, and typed fallback for context-window and content-policy errors
  • Proxy server — OpenAI-compatible HTTP proxy; drop in front of any backend
  • Caching — in-memory, disk, Redis, and semantic (cosine-similarity) caches
  • Budget & spend tracking — per-key/org/team limits with threshold alerts
  • Guardrails — input/output length limits, PII detection, keyword blocking, prompt injection detection
  • Observability — Langfuse tracing, Prometheus metrics, JSON access logs, webhooks
  • Auth & multi-tenancy — API key management, SSO/OIDC (Google, GitHub, Microsoft), orgs and teams, SQLite persistence
  • Secret management — AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault
  • Deployment — Docker (multi-arch), docker-compose, Helm chart, GitHub Actions

Architecture

llmbridge/
├── llmbridge.go          # Unified interface + top-level helpers
├── router.go             # Multi-provider routing, failover, circuit breaker
├── middleware.go         # Request/response middleware chain
├── cost_calculator.go    # Per-provider cost estimation
├── session.go            # Conversation persistence
├── constants.go          # Model registry & pricing tables
│
├── types/                # Shared types (Request, Response, Message…)
├── exceptions/           # Typed error hierarchy
├── budget/               # Per-key/org spend tracking and alerts
├── caching/              # In-memory, disk, Redis, semantic caches
├── callbacks/            # Langfuse, Prometheus, JSON log, webhook handlers
├── guardrails/           # Input/output safety rules engine
├── tokencount/           # Token counting utilities
├── toolbuilder/          # Fluent builder for tool/function definitions
├── prompttpl/            # Prompt template helpers
│
├── llms/                 # Provider implementations
│   ├── base/             # LLM, Streamer, EmbedProvider, ImageGenerator interfaces
│   ├── openai/           # OpenAI (chat, embeddings, TTS, image gen, batch, files)
│   ├── anthropic/        # Anthropic Claude
│   ├── azure/            # Azure OpenAI Service
│   ├── bedrock/          # AWS Bedrock (Titan, Claude, Llama…)
│   ├── cohere/           # Cohere Command
│   ├── gemini/           # Google Gemini
│   └── compatible/       # Ollama, LM Studio, Groq, Together AI, xAI, any OpenAI-compat
│
└── proxy/                # OpenAI-compatible HTTP proxy server
    ├── auth/             # API key store, rate limiting, SSO/OIDC
    ├── audit/            # Audit logging
    ├── config/           # JSON config loader
    ├── management/       # Key, model, router, alias management endpoints
    ├── metrics/          # Prometheus collector + /metrics handler
    ├── middleware/       # HTTP access log middleware
    ├── persistence/      # SQLite-backed key/org/team store
    ├── prompts/          # Stored prompt management
    ├── secrets/          # AWS / GCP / Vault secret backends
    ├── ui/               # Embedded admin SPA
    └── webhooks/         # Outbound webhook delivery

Installation

go get github.com/Vedanshu7/llmbridge@latest

Requires Go 1.25+. The only external dependency is modernc.org/sqlite (pure-Go, no CGo), used by the proxy server for persistence.

Quick Start

package main

import (
    "context"
    "fmt"
    "log"
    "os"

    "github.com/Vedanshu7/llmbridge"
    "github.com/Vedanshu7/llmbridge/llms/openai"
)

func main() {
    ctx := context.Background()
    p := openai.New("gpt-4o-mini", os.Getenv("OPENAI_API_KEY"))

    resp, err := p.Complete(ctx, llmbridge.Request{
        System:   "You are a helpful assistant.",
        Messages: []llmbridge.Message{
            {Role: "user", Content: "What is the capital of France?"},
        },
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(resp.Content) // e.g. "Paris"
}

Supported Providers

Provider           Package          Constructor
OpenAI             llms/openai      openai.New(model, key)
Anthropic          llms/anthropic   anthropic.New(model, key)
Azure OpenAI       llms/azure       azure.New(model, endpoint, key)
AWS Bedrock        llms/bedrock     bedrock.New(model, region)
Cohere             llms/cohere      cohere.New(model, key)
Google Gemini      llms/gemini      gemini.New(model, key)
Ollama             llms/compatible  compatible.NewOllama(model)
LM Studio          llms/compatible  compatible.NewLMStudio(model)
Groq               llms/compatible  compatible.NewGroq(model, key)
Together AI        llms/compatible  compatible.NewTogetherAI(model, key)
xAI / Grok         llms/compatible  compatible.NewXAI(model, key)
Any OpenAI-compat  llms/compatible  compatible.NewCompatible(name, url, key, model)

Usage

Streaming
// provider must implement base.Streamer.
ch, err := provider.Stream(ctx, llmbridge.Request{
    Messages: []llmbridge.Message{{Role: "user", Content: "Tell me a story."}},
})
if err != nil {
    log.Fatal(err)
}
for delta := range ch {
    if delta.Err != nil {
        log.Fatal(delta.Err)
    }
    if delta.Done {
        break
    }
    fmt.Print(delta.Content)
}
Tool Use (Function Calling)
tools := []llmbridge.Tool{{
    Name:        "get_weather",
    Description: "Get current weather for a city.",
    Parameters: llmbridge.Schema{
        Type: "object",
        Properties: map[string]llmbridge.Property{
            "city": {Type: "string", Description: "City name"},
        },
        Required: []string{"city"},
    },
}}

resp, err := provider.Complete(ctx, llmbridge.Request{
    Messages: []llmbridge.Message{{Role: "user", Content: "Weather in Paris?"}},
    Tools:    tools,
})
Embeddings
import "github.com/Vedanshu7/llmbridge/llms/openai"

p := openai.New("text-embedding-3-small", key)
vecs, err := p.Embed(ctx, []string{"hello world", "foo bar"})
Multi-Provider Router
router := llmbridge.NewRouter(
    []llmbridge.Provider{
        openai.New("gpt-4o", os.Getenv("OPENAI_API_KEY")),
        anthropic.New("claude-sonnet-4-6", os.Getenv("ANTHROPIC_API_KEY")),
    },
    llmbridge.WithStrategy(llmbridge.RoundRobin),
    llmbridge.WithRetryPolicy(llmbridge.DefaultRetryPolicy),
    llmbridge.WithCircuitBreaker(5, 30*time.Second),
    llmbridge.WithContextWindowFallback(true),
)

resp, err := router.Complete(ctx, req)

Routing strategies: PriorityOrder · RoundRobin · LeastLatency · LeastBusy · UsageBased · CostBased · Weighted · TrafficSplit

Middleware
func Logger(log *slog.Logger) llmbridge.Middleware {
    return func(ctx context.Context, req llmbridge.Request, next llmbridge.Handler) (*llmbridge.Response, error) {
        start := time.Now()
        resp, err := next(ctx, req)
        log.Info("llm call", "latency", time.Since(start), "err", err)
        return resp, err
    }
}

p := llmbridge.Chain(openai.New("gpt-4o", key), Logger(slog.Default()))
Caching
import "github.com/Vedanshu7/llmbridge/caching"

// Exact-match cache
cache := caching.NewInMemoryCache()

// Semantic cache — hits on meaning, not exact text
embedder := openai.New("text-embedding-3-small", key)
sc := caching.NewSemanticCache(cache, embedder, 0.95)
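The semantic cache is described as hitting on meaning rather than exact text. The 0.95 argument appears to be a similarity threshold.

Below is a minimal sketch of such a lookup. It assumes cosine similarity over embedding vectors and a linear scan of cached entries. `semanticEntry`, `cosine`, and `lookup` are hypothetical names, and the toy three-dimensional vectors stand in for real embeddings.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of a and b (1 = same direction).
// It returns 0 for mismatched lengths or zero vectors.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// semanticEntry pairs a cached response with the embedding of its prompt.
type semanticEntry struct {
	vec      []float64
	response string
}

// lookup returns the cached response whose prompt embedding is most similar
// to query, provided that similarity is at least threshold.
func lookup(entries []semanticEntry, query []float64, threshold float64) (string, bool) {
	best, bestSim := -1, threshold
	for i, e := range entries {
		if s := cosine(e.vec, query); s >= bestSim {
			best, bestSim = i, s
		}
	}
	if best < 0 {
		return "", false
	}
	return entries[best].response, true
}

func main() {
	cache := []semanticEntry{
		{vec: []float64{1, 0, 0}, response: "Paris"},
		{vec: []float64{0, 1, 0}, response: "Berlin"},
	}
	resp, ok := lookup(cache, []float64{0.99, 0.05, 0}, 0.95)
	fmt.Println(resp, ok) // Paris true
}
```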
Budget & Spend Tracking
import "github.com/Vedanshu7/llmbridge/budget"

tracker := budget.NewTracker()
tracker.SetLimit("my-key", 10.00)               // $10 limit
tracker.OnAlert(func(key string, spend float64) {
    log.Printf("key %s at $%.2f", key, spend)
})

cost, _ := llmbridge.CompletionCost(resp)
if err := tracker.Record("my-key", cost); err != nil {
    // e.g. budget.ErrBudgetExceeded when the key has gone over its limit
}
Guardrails
import "github.com/Vedanshu7/llmbridge/guardrails"

engine, _ := guardrails.NewEngine(
    guardrails.MaxInputLength(50000),
    guardrails.BlockPIIPatterns(),
    guardrails.BlockPromptInjection(),
)

if err := engine.Check(req); err != nil {
    // handle violation
}
Cost Estimation
resp, err := provider.Complete(ctx, req)
if err != nil {
    log.Fatal(err)
}
cost, _ := llmbridge.CompletionCost(resp) // 0 if the model has no pricing entry
fmt.Printf("cost: $%.6f\n", cost)
OpenAI-Compatible Proxy Server

Run llmbridge as a drop-in proxy that any OpenAI SDK client can talk to:

import (
    "github.com/Vedanshu7/llmbridge/proxy"
    "github.com/Vedanshu7/llmbridge/llms/anthropic"
)

backend := anthropic.New("claude-sonnet-4-6", os.Getenv("ANTHROPIC_API_KEY"))
srv, err := proxy.NewServerWithDB(backend, "/data/llmbridge.db")
if err != nil {
    log.Fatal(err)
}

key, _ := srv.KeyStore().GenerateAPIKey([]string{"completion"})
fmt.Println("API key:", key)

srv.Start(ctx, ":8080")

Or via the CLI:

llmbridge server -config config.json -db /data/llmbridge.db

Proxy endpoints:

Method    Path                                          Auth    Description
GET       /health                                       public  Liveness check
GET       /metrics                                      public  Prometheus text metrics
GET       /v1/models                                    key     List registered models
GET       /v1/models/{model}                            key     Get single model
POST      /v1/chat/completions                          key     Chat completion (streaming supported)
POST      /v1/embeddings                                key     Vector embeddings
POST      /v1/audio/speech                              key     Text-to-speech
POST      /v1/moderations                               key     Content moderation
POST      /v1/batches                                   key     Create batch job
GET       /v1/batches/{id}                              key     Batch status
POST      /v1/batches/{id}/cancel                       key     Cancel batch
GET       /auth/login?provider=google|github|microsoft  public  Start SSO flow
GET       /auth/callback                                public  SSO callback
GET       /admin/ui                                     public  Web admin interface
GET       /admin/stats                                  admin   Aggregated metrics
POST      /admin/key/generate                           admin   Create API key
DELETE    /admin/key/delete                             admin   Delete API key
GET       /admin/keys                                   admin   List API keys
GET/POST  /admin/models                                 admin   List / register models
GET/POST  /admin/router                                 admin   List / deploy router configs
GET/POST  /admin/aliases                                admin   Model name aliases
GET/POST  /admin/orgs                                   admin   Organizations
GET/POST  /admin/teams                                  admin   Teams

Config file:

{
  "listen_addr": ":8080",
  "jwt_secret": "change-me",
  "admin_keys": ["llmb-your-admin-key"],
  "log_file": "/var/log/llmbridge.log",
  "cache_ttl_seconds": 300,
  "models": [
    {"name": "gpt-4o",  "provider": "openai",    "model": "gpt-4o"},
    {"name": "sonnet",  "provider": "anthropic",  "model": "claude-sonnet-4-6"}
  ],
  "aliases": {"fast": "gpt-4o"},
  "router": {"strategy": "round_robin", "retries": 2},
  "guardrails": {
    "max_input_length": 100000,
    "block_pii": true,
    "block_prompt_injection": true
  },
  "oidc": {
    "provider": "google",
    "client_id": "...",
    "client_secret": "...",
    "redirect_url": "http://localhost:8080/auth/callback"
  },
  "secrets": {
    "backend": "vault",
    "options": {"vault_addr": "http://vault:8200"},
    "mappings": {"OPENAI_API_KEY": "prod/openai-key"}
  },
  "orgs": [
    {"name": "Acme", "budget": 500, "teams": [{"name": "Engineering", "budget": 200}]}
  ]
}

Docker:

docker compose up
# or
docker run -p 8080:8080 \
  -e OPENAI_API_KEY \
  -v "$(pwd)/config.json:/config.json:ro" \
  ghcr.io/vedanshu7/llmbridge server -config /config.json

Error Handling

All provider errors are typed and unwrappable:

import "github.com/Vedanshu7/llmbridge/exceptions"

resp, err := provider.Complete(ctx, req)
if err != nil {
    var authErr *exceptions.AuthenticationError
    var rlErr *exceptions.RateLimitError
    var cwErr *exceptions.ContextWindowExceededError
    switch {
    case errors.As(err, &authErr):
        log.Fatalf("authentication failed for provider %v", authErr.LLMProvider)
    case errors.As(err, &rlErr):
        time.Sleep(5 * time.Second) // back off, then retry the request
    case errors.As(err, &cwErr):
        // switch to a model with a larger context window
    default:
        log.Printf("completion failed: %v", err)
    }
}

Error types: AuthenticationError · RateLimitError · TimeoutError · ContextWindowExceededError · ContentPolicyViolationError · BudgetExceededError · InternalServerError · and more.

Adding a New Provider

  1. Create llms/yourprovider/yourprovider.go and implement base.LLM:
    type Provider struct { ... }
    func (p *Provider) Name() string { return "yourprovider" }
    func (p *Provider) ValidateEnvironment() error { ... } // e.g. verify the API key is set
    func (p *Provider) Complete(ctx context.Context, req types.Request) (*types.Response, error) { ... }
    
  2. Optionally implement base.Streamer (SSE), base.EmbedProvider (embeddings), or base.ImageGenerator.
  3. Add llms/yourprovider/chat/transformation.go for request/response wire-format mapping.
  4. Add llms/yourprovider/cost_calculation.go and wire it into cost_calculator.go.
  5. Open a PR — see CONTRIBUTING.md.

Contributing

Contributions are welcome! Please read CONTRIBUTING.md before opening a pull request.

Acknowledgements

Inspired by LiteLLM — a Go-native reimplementation of its core concepts (unified provider interface, proxy server, routing, caching, spend tracking, and observability). All code is written from scratch in Go.

License

MIT — © 2025 Vedanshu Joshi

Documentation

Overview

Package llmbridge provides a unified interface to multiple LLM providers.

Every provider implements the Provider interface, so you can swap between OpenAI, Anthropic, Ollama, LM Studio, or any OpenAI-compatible endpoint without changing your application code.

Quick start:

p := openai.New("gpt-4o-mini", os.Getenv("OPENAI_API_KEY")) // github.com/Vedanshu7/llmbridge/llms/openai
resp, err := p.Complete(ctx, llmbridge.Request{
    System:   "You are a helpful assistant.",
    Messages: []llmbridge.Message{{Role: "user", Content: "Hello!"}},
})

Constants

const DefaultHTTPTimeout = 60 // seconds

DefaultHTTPTimeout is the default timeout, in seconds, for provider HTTP requests.

const Version = "0.3.0"

Version is the current module version.

Variables

var DefaultModels = map[string]string{
	"openai":     "gpt-4o-mini",
	"anthropic":  "claude-sonnet-4-6",
	"gemini":     "gemini-2.0-flash",
	"azure":      "gpt-4o",
	"cohere":     "command-r-plus-08-2024",
	"bedrock":    "anthropic.claude-3-5-sonnet-20241022-v2:0",
	"ollama":     "llama3.2",
	"groq":       "llama-3.3-70b-versatile",
	"together":   "meta-llama/Llama-3-8b-chat-hf",
	"lmstudio":   "local-model",
	"deepseek":   "deepseek-chat",
	"perplexity": "llama-3.1-sonar-large-128k-online",
	"fireworks":  "accounts/fireworks/models/llama-v3p1-70b-instruct",
	"cerebras":   "llama3.1-70b",
	"sambanova":  "Meta-Llama-3.1-70B-Instruct",
	"mistral":    "mistral-large-latest",
	"hyperbolic": "meta-llama/Meta-Llama-3.1-70B-Instruct",
	"novita":     "meta-llama/llama-3.1-70b-instruct",
	"xai":        "grok-2-latest",
}

DefaultModels maps each provider name to the model used when a Request does not specify one.

var DefaultRetryPolicy = RetryPolicy{
	MaxAttempts:  2,
	InitialDelay: time.Second,
	Multiplier:   2.0,
	MaxDelay:     8 * time.Second,
}

DefaultRetryPolicy is a sensible starting point: two attempts per provider, meaning the first try plus one retry after a 1-second delay. With a higher MaxAttempts, the delay doubles on each further retry, up to a cap of 8 seconds.

var ModelInfoDB = map[string]ModelInfo{

	"gpt-4o": {
		MaxTokens: 128000, MaxInputTokens: 128000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.0000025, OutputCostPerToken: 0.000010,
	},
	"gpt-4o-mini": {
		MaxTokens: 128000, MaxInputTokens: 128000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000015, OutputCostPerToken: 0.0000006,
	},
	"gpt-4-turbo": {
		MaxTokens: 128000, MaxInputTokens: 128000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.00001, OutputCostPerToken: 0.00003,
	},
	"gpt-3.5-turbo": {
		MaxTokens: 16385, MaxInputTokens: 16385,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.0000005, OutputCostPerToken: 0.0000015,
	},
	"o1": {
		MaxTokens: 200000, MaxInputTokens: 200000,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.000015, OutputCostPerToken: 0.00006,
	},

	"claude-opus-4-7": {
		MaxTokens: 200000, MaxInputTokens: 200000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.000015, OutputCostPerToken: 0.000075,
	},
	"claude-sonnet-4-6": {
		MaxTokens: 200000, MaxInputTokens: 200000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.000003, OutputCostPerToken: 0.000015,
	},
	"claude-haiku-4-5-20251001": {
		MaxTokens: 200000, MaxInputTokens: 200000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.0000008, OutputCostPerToken: 0.000004,
	},
	"claude-3-5-sonnet-20241022": {
		MaxTokens: 200000, MaxInputTokens: 200000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.000003, OutputCostPerToken: 0.000015,
	},
	"claude-3-haiku-20240307": {
		MaxTokens: 200000, MaxInputTokens: 200000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000025, OutputCostPerToken: 0.00000125,
	},

	"gemini-2.0-flash": {
		MaxTokens: 1048576, MaxInputTokens: 1048576,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.0000001, OutputCostPerToken: 0.0000004,
	},
	"gemini-1.5-pro": {
		MaxTokens: 2097152, MaxInputTokens: 2097152,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000125, OutputCostPerToken: 0.000005,
	},
	"gemini-1.5-flash": {
		MaxTokens: 1048576, MaxInputTokens: 1048576,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.000000075, OutputCostPerToken: 0.0000003,
	},

	"command-r-plus-08-2024": {
		MaxTokens: 128000, MaxInputTokens: 128000,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.0000025, OutputCostPerToken: 0.00001,
	},
	"command-r-08-2024": {
		MaxTokens: 128000, MaxInputTokens: 128000,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000015, OutputCostPerToken: 0.0000006,
	},

	"deepseek-chat": {
		MaxTokens: 65536, MaxInputTokens: 65536,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000027, OutputCostPerToken: 0.0000011,
	},
	"deepseek-coder": {
		MaxTokens: 65536, MaxInputTokens: 65536,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000014, OutputCostPerToken: 0.00000028,
	},

	"mistral-large-latest": {
		MaxTokens: 131072, MaxInputTokens: 131072,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.000003, OutputCostPerToken: 0.000009,
	},
	"mistral-small-latest": {
		MaxTokens: 131072, MaxInputTokens: 131072,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.000001, OutputCostPerToken: 0.000003,
	},

	"llama-3.3-70b-versatile": {
		MaxTokens: 128000, MaxInputTokens: 128000,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000059, OutputCostPerToken: 0.00000079,
	},
	"llama-3.1-8b-instant": {
		MaxTokens: 128000, MaxInputTokens: 128000,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000005, OutputCostPerToken: 0.00000008,
	},
}

ModelInfoDB is a static registry of known models and their capabilities. Sourced from public provider documentation; update as new models are released.

var SupportedProviders = []string{
	"openai",
	"anthropic",
	"gemini",
	"azure",
	"cohere",
	"bedrock",
	"ollama",
	"lmstudio",
	"groq",
	"together",
	"deepseek",
	"perplexity",
	"fireworks",
	"cerebras",
	"sambanova",
	"mistral",
	"hyperbolic",
	"novita",
	"xai",
}

SupportedProviders lists the built-in provider names.

Functions

func AComplete

func AComplete(ctx context.Context, p Provider, req Request) <-chan AsyncResult

AComplete sends a completion request asynchronously and returns a channel that will receive exactly one AsyncResult.

func CompletionCost

func CompletionCost(resp *types.Response) (float64, error)

CompletionCost calculates the estimated cost in USD for a completed response. It dispatches to the provider-specific pricing table based on resp.Provider. Returns 0 and an error if the provider or model is not in the pricing tables.
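This sketch assumes CompletionCost multiplies token counts by the per-token prices in ModelInfoDB; the formula is not spelled out here. Under that assumption, a gpt-4o response with 1,000 prompt tokens and 500 completion tokens would cost 1000 × $0.0000025 + 500 × $0.000010 = $0.0075.

The code below reproduces that arithmetic on its own. `modelPrice`, `prices`, and `estimateCost` are hypothetical, and the two price entries are copied from the table above.

```go
package main

import "fmt"

// modelPrice holds per-token USD prices, as in ModelInfoDB entries.
type modelPrice struct {
	InputCostPerToken  float64
	OutputCostPerToken float64
}

// prices copies two entries from the ModelInfoDB table.
var prices = map[string]modelPrice{
	"gpt-4o":            {InputCostPerToken: 0.0000025, OutputCostPerToken: 0.000010},
	"claude-sonnet-4-6": {InputCostPerToken: 0.000003, OutputCostPerToken: 0.000015},
}

// estimateCost returns the USD cost of one call. Unknown models yield 0 and
// an error, mirroring CompletionCost's documented behavior.
func estimateCost(model string, promptTokens, completionTokens int) (float64, error) {
	p, ok := prices[model]
	if !ok {
		return 0, fmt.Errorf("no pricing for model %q", model)
	}
	return float64(promptTokens)*p.InputCostPerToken +
		float64(completionTokens)*p.OutputCostPerToken, nil
}

func main() {
	cost, err := estimateCost("gpt-4o", 1000, 500)
	if err != nil {
		panic(err)
	}
	fmt.Printf("cost: $%.6f\n", cost) // cost: $0.007500
}
```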

func Embed

func Embed(ctx context.Context, p EmbedProvider, texts []string) ([][]float64, error)

Embed generates vector embeddings using the given EmbedProvider.

func EmbeddingCost

func EmbeddingCost(provider, model string, tokens int) (float64, error)

EmbeddingCost calculates the cost for an embedding request. provider is the provider name; tokens is the input token count.

func ResolveModel

func ResolveModel(req types.Request, providerName string) string

ResolveModel returns req.Model if non-empty, otherwise the provider's default.

func SanitizeRequest

func SanitizeRequest(req types.Request) types.Request

SanitizeRequest applies provider-safe defaults to req and trims whitespace. It does not mutate the original; it returns a copy.

func ValidateModel

func ValidateModel(modelName string) bool

ValidateModel returns true if modelName is in the built-in registry.

Types

type AsyncResult

type AsyncResult = types.AsyncResult

AsyncResult wraps a Response and error for async operations.

type BatchResult

type BatchResult = types.BatchResult

BatchResult holds the outcome of one request in a BatchComplete call.

func BatchComplete

func BatchComplete(ctx context.Context, p Provider, reqs []Request) []BatchResult

BatchComplete sends all requests concurrently and returns one BatchResult per request. Results are ordered by their original index regardless of completion order.

type CallType

type CallType = types.CallType

CallType identifies the kind of LLM operation.

type Delta

type Delta = types.Delta

Delta is a single token or structured fragment emitted during streaming.

type EmbedProvider

type EmbedProvider = base.EmbedProvider

EmbedProvider is the optional interface for embedding generation.

type ErrAuth

type ErrAuth = exceptions.AuthenticationError

ErrAuth indicates an authentication or authorization failure.

Deprecated: use exceptions.AuthenticationError directly.

type ErrProvider

type ErrProvider = exceptions.ProviderError

ErrProvider wraps a provider-level failure.

Deprecated: use exceptions.ProviderError directly.

type ErrRateLimit

type ErrRateLimit = exceptions.RateLimitError

ErrRateLimit indicates the provider throttled the request.

Deprecated: use exceptions.RateLimitError directly.

type ErrTimeout

type ErrTimeout = exceptions.TimeoutError

ErrTimeout indicates the request exceeded the HTTP deadline.

Deprecated: use exceptions.TimeoutError directly.

type GeneratedImage

type GeneratedImage = types.GeneratedImage

GeneratedImage is a single image returned by an image generation call.

type Handler

type Handler func(ctx context.Context, req types.Request) (*types.Response, error)

Handler is the inner function type used within a middleware chain.

type HealthStatus

type HealthStatus struct {
	Healthy       bool
	LastCheck     time.Time
	LastError     error
	Failures      int       // consecutive failure count (reset on success)
	CooldownUntil time.Time // skip provider until this time (circuit breaker)
}

HealthStatus records the last known health of a provider.

type ImageGenerator

type ImageGenerator = base.ImageGenerator

ImageGenerator is the optional interface for image generation.

type ImageRequest

type ImageRequest = types.ImageRequest

ImageRequest is the input to an image generation call.

type ImageResponse

type ImageResponse = types.ImageResponse

ImageResponse is the output from an image generation call.

func ImageGenerate

func ImageGenerate(ctx context.Context, p ImageGenerator, req ImageRequest) (*ImageResponse, error)

ImageGenerate generates images from a text prompt using the given ImageGenerator.

type Message

type Message = types.Message

Message is a single turn in a conversation.

type Middleware

type Middleware func(ctx context.Context, req types.Request, next Handler) (*types.Response, error)

Middleware wraps a Handler to add cross-cutting behavior such as logging, metrics, caching, request transformation, or response post-processing.

func Logger(log *slog.Logger) llmbridge.Middleware {
    return func(ctx context.Context, req llmbridge.Request, next llmbridge.Handler) (*llmbridge.Response, error) {
        log.Info("llm request", "model", req.Model)
        resp, err := next(ctx, req)
        if err != nil {
            log.Error("llm response", "err", err) // resp may be nil on error
            return resp, err
        }
        log.Info("llm response", "content_bytes", len(resp.Content))
        return resp, nil
    }
}

type ModelInfo

type ModelInfo = types.ModelInfo

ModelInfo describes the capabilities and pricing of a specific model.

func GetModelInfo

func GetModelInfo(modelName string) (ModelInfo, bool)

GetModelInfo looks up metadata for a known model. Returns (ModelInfo{}, false) for unrecognized model names.

type ModerationRequest

type ModerationRequest = types.ModerationRequest

ModerationRequest is the input to a content moderation call.

type ModerationResponse

type ModerationResponse = types.ModerationResponse

ModerationResponse is the output from a content moderation call.

func Moderate

Moderate classifies content for policy violations using the given Moderator.

type ModerationResult

type ModerationResult = types.ModerationResult

ModerationResult is the moderation verdict for a single input.

type Moderator

type Moderator = base.Moderator

Moderator is the optional interface for content moderation.

type Property

type Property = types.Property

Property is a single parameter in a Schema.

type Provider

type Provider = base.LLM

Provider is the unified interface every LLM backend must satisfy.

func Chain

func Chain(provider Provider, mw ...Middleware) Provider

Chain wraps provider with the given middleware in order: the first middleware in the slice is the outermost (first to run on a request, last on a response). The returned Provider satisfies the Provider interface; it is NOT a Streamer even when the inner provider implements streaming.

type Request

type Request = types.Request

Request is the normalized, provider-agnostic input to any LLM.

type RerankRequest

type RerankRequest = types.RerankRequest

RerankRequest is the input to a document reranking call.

type RerankResponse

type RerankResponse = types.RerankResponse

RerankResponse is the output from a document reranking call.

func Rerank

func Rerank(ctx context.Context, p Reranker, req RerankRequest) (*RerankResponse, error)

Rerank reorders documents by relevance to a query using the given Reranker.

type RerankResult

type RerankResult = types.RerankResult

RerankResult is a single ranked document in a RerankResponse.

type Reranker

type Reranker = base.Reranker

Reranker is the optional interface for document reranking.

type Response

type Response = types.Response

Response is the normalized output from any provider.

func Complete

func Complete(ctx context.Context, p Provider, req Request) (*Response, error)

Complete sends a blocking completion request using the given provider. This is a package-level convenience wrapper around provider.Complete.

type RetryPolicy

type RetryPolicy struct {
	// MaxAttempts is the number of tries per provider. 1 = no retry.
	MaxAttempts int

	// InitialDelay before the first retry.
	InitialDelay time.Duration

	// Multiplier applied to the delay on each subsequent retry.
	Multiplier float64

	// MaxDelay caps the backoff growth.
	MaxDelay time.Duration
}

RetryPolicy controls per-provider retry behavior inside the Router.

type Router

type Router struct {
	// contains filtered or unexported fields
}

Router dispatches requests across multiple Provider instances with automatic failover and load balancing. It implements Provider itself.

func NewRouter

func NewRouter(providers []Provider, opts ...RouterOption) *Router

NewRouter returns a Router that dispatches across the given providers.

func NewTagRouter

func NewTagRouter(providers []TaggedProvider, opts ...RouterOption) *Router

NewTagRouter returns a Router where each provider carries routing tags and optional weights. Use WithRequiredTags to filter providers by tag at request time. Use WithStrategy(Weighted) to route proportionally by Weight.

func (*Router) Complete

func (r *Router) Complete(ctx context.Context, req types.Request) (*types.Response, error)

Complete implements Provider.

func (*Router) Name

func (r *Router) Name() string

Name implements Provider.

func (*Router) Stop

func (r *Router) Stop()

Stop cancels the health check goroutine if one was started.

func (*Router) ValidateEnvironment

func (r *Router) ValidateEnvironment() error

ValidateEnvironment implements Provider.

type RouterOption

type RouterOption func(*Router)

RouterOption configures a Router.

func WithAutoVisionRouting

func WithAutoVisionRouting() RouterOption

WithAutoVisionRouting makes the router prefer providers tagged "vision" when the incoming request contains image_url content parts. Falls back to all eligible providers if none are tagged "vision".

func WithCircuitBreaker

func WithCircuitBreaker(threshold int, cooldown time.Duration) RouterOption

WithCircuitBreaker enables the circuit breaker: after threshold consecutive failures, a provider is placed in cooldown and skipped for the given duration. A threshold of 0 (the default) disables the breaker.

func WithContentPolicyFallback

func WithContentPolicyFallback(enabled bool) RouterOption

WithContentPolicyFallback enables failover when a provider returns a ContentPolicyViolationError, trying the next provider in the order. Useful when providers have different content policies and a stricter provider is listed first.

func WithContextWindowFallback

func WithContextWindowFallback(enabled bool) RouterOption

WithContextWindowFallback enables failover when a provider returns ContextWindowExceededError, trying the next provider in the order.

func WithHealthChecks

func WithHealthChecks(interval time.Duration) RouterOption

WithHealthChecks starts a background goroutine that calls ValidateEnvironment() on each provider every interval. Providers that error are marked unhealthy and skipped in routing until they recover.

func WithMaxCostPerRequest

func WithMaxCostPerRequest(dollars float64) RouterOption

WithMaxCostPerRequest limits each request to the given USD budget. The router estimates input cost from message length and the request model's pricing entry. Requests that are estimated to exceed the budget are rejected with an error before any provider is contacted.

func WithRequiredTags

func WithRequiredTags(tags []string) RouterOption

WithRequiredTags restricts routing to providers whose tag set is a superset of all the given tags. Only meaningful when using NewTagRouter.
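"Superset of all the given tags" means every required tag must appear on the provider; extra tags are allowed.

Below is a sketch of that filter. `taggedProvider` is a hypothetical stand-in with only a name and tags, since the real TaggedProvider type is not shown in this section.

```go
package main

import "fmt"

// taggedProvider is a hypothetical stand-in for TaggedProvider.
type taggedProvider struct {
	Name string
	Tags []string
}

// eligible returns the providers whose tags include every required tag.
// An empty required list matches all providers.
func eligible(providers []taggedProvider, required []string) []taggedProvider {
	var out []taggedProvider
	for _, p := range providers {
		have := make(map[string]bool, len(p.Tags))
		for _, t := range p.Tags {
			have[t] = true
		}
		ok := true
		for _, r := range required {
			if !have[r] {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	ps := []taggedProvider{
		{Name: "gpt-4o", Tags: []string{"vision", "fast"}},
		{Name: "llama", Tags: []string{"fast", "cheap"}},
		{Name: "sonnet", Tags: []string{"vision"}},
	}
	for _, p := range eligible(ps, []string{"vision", "fast"}) {
		fmt.Println(p.Name) // gpt-4o
	}
}
```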

func WithRetryPolicy

func WithRetryPolicy(p RetryPolicy) RouterOption

WithRetryPolicy sets the per-provider retry policy.

func WithRoutingGroups

func WithRoutingGroups(groups []RoutingGroup) RouterOption

WithRoutingGroups registers named routing groups for per-model strategies.

func WithStrategy

func WithStrategy(s Strategy) RouterOption

WithStrategy sets the selection strategy.

func WithTrafficSplit

func WithTrafficSplit(groups []TrafficSplitGroup) RouterOption

WithTrafficSplit configures explicit experiment arms and switches the strategy to TrafficSplit. Each group specifies a provider index and relative weight. On every request one arm is chosen by weighted random selection; the remaining eligible providers serve as ordered fallbacks if that arm fails.

func WithWeightedStrategy

func WithWeightedStrategy() RouterOption

WithWeightedStrategy is a convenience option that sets the Weighted strategy.

type RoutingGroup

type RoutingGroup struct {
	Name      string
	Providers []Provider
	Strategy  Strategy
	Policy    RetryPolicy
}

RoutingGroup defines a named group of providers with a dedicated routing strategy. Useful when different models need different failover behavior.

type Schema

type Schema = types.Schema

Schema is the JSON Schema definition of tool parameters.

type Session

type Session struct {
	ID        string          `json:"id"`
	CreatedAt time.Time       `json:"created_at"`
	UpdatedAt time.Time       `json:"updated_at"`
	Provider  string          `json:"provider"`
	Model     string          `json:"model"`
	Messages  []types.Message `json:"messages"`
}

Session stores a conversation history that can be saved to disk and resumed in future processes, similar to claude --continue.

func ListSessions

func ListSessions() ([]*Session, error)

ListSessions returns all saved sessions sorted by creation time (newest first).

func LoadLatestSession

func LoadLatestSession() (*Session, error)

LoadLatestSession loads the most recently saved session. Returns (nil, nil) if no sessions have been saved yet.

func LoadSession

func LoadSession(id string) (*Session, error)

LoadSession loads a session by its ID from disk.

func NewSession

func NewSession(providerName, model string) *Session

NewSession creates an empty session for the given provider and model.

func (*Session) Add

func (s *Session) Add(msg types.Message)

Add appends a message to the session and updates UpdatedAt.

func (*Session) Save

func (s *Session) Save() error

Save writes the session to disk and updates the "latest" pointer.

type SpeechProvider

type SpeechProvider = base.SpeechProvider

SpeechProvider is the optional interface for text-to-speech.

type SpeechRequest

type SpeechRequest = types.SpeechRequest

SpeechRequest is the input to a text-to-speech call.

type SpeechResponse

type SpeechResponse = types.SpeechResponse

SpeechResponse is the output from a text-to-speech call.

func Speech

Speech converts text to audio using the given SpeechProvider.

type Strategy

type Strategy int

Strategy controls how the Router picks a provider for each request.

const (
	// PriorityOrder tries providers in declaration order, failing over on retryable errors.
	PriorityOrder Strategy = iota

	// RoundRobin distributes requests evenly across all providers.
	RoundRobin

	// LeastLatency routes to the provider with the lowest EMA latency.
	LeastLatency

	// LeastBusy routes to the provider currently handling the fewest requests.
	LeastBusy

	// UsageBased routes based on observed token/request metrics.
	UsageBased

	// CostBased routes to minimize estimated cost per request.
	CostBased

	// Weighted distributes traffic proportionally to each provider's Weight field.
	Weighted

	// TrafficSplit routes by explicit percentage splits across labeled experiment groups.
	// Configure via WithTrafficSplit.
	TrafficSplit
)

type Streamer

type Streamer = base.Streamer

Streamer is the optional interface for token-by-token streaming.

type TaggedProvider

type TaggedProvider struct {
	Provider Provider
	Tags     []string // e.g. ["fast", "cheap", "vision"]
	Weight   int      // relative traffic weight for Weighted strategy; 0 treated as 1
}

TaggedProvider pairs a Provider with routing tags and an optional weight.

type TextCompleter

type TextCompleter = base.TextCompleter

TextCompleter is the optional interface for legacy text completion.

type TextRequest

type TextRequest = types.TextRequest

TextRequest is the input to a legacy text completion call.

type TextResponse

type TextResponse = types.TextResponse

TextResponse is the output from a legacy text completion call.

func TextComplete

func TextComplete(ctx context.Context, p TextCompleter, req TextRequest) (*TextResponse, error)

TextComplete sends a legacy (non-chat) text completion request.

type Tool

type Tool = types.Tool

Tool defines a function the model can invoke.

type ToolCall

type ToolCall = types.ToolCall

ToolCall is a single tool invocation requested by the model.

type TrafficSplitGroup

type TrafficSplitGroup struct {
	Label       string // experiment arm label (for observability)
	ProviderIdx int    // index into the Router's provider slice
	Weight      int    // relative traffic weight; 0 treated as 1
}

TrafficSplitGroup defines one labeled experiment arm for TrafficSplit routing. ProviderIdx is the index into the Router's provider slice; Weight controls how often this arm is selected relative to the others.

type Transcriber

type Transcriber = base.Transcriber

Transcriber is the optional interface for audio transcription.

type TranscriptionRequest

type TranscriptionRequest = types.TranscriptionRequest

TranscriptionRequest is the input to an audio transcription call.

type TranscriptionResponse

type TranscriptionResponse = types.TranscriptionResponse

TranscriptionResponse is the output from an audio transcription call.

func Transcribe

Transcribe converts audio to text using the given Transcriber.

type UsageData

type UsageData = types.UsageData

UsageData holds token consumption metrics.

Directories

Path Synopsis
Package budget provides per-key spend tracking and budget enforcement.
Package caching provides request/response caching for llmbridge providers.
Package callbacks provides an event-driven observability system for llmbridge.
cmd
llmbridge command
Command llmbridge is a CLI for running and managing an llmbridge proxy server.
Package exceptions defines the error hierarchy for llmbridge provider failures.
Package guardrails provides configurable safety rules for LLM requests and responses.
llms
anthropic
Package anthropic provides a base.LLM backed by the Anthropic Messages API (Claude Opus, Sonnet, Haiku families).
anthropic/chat
Package chat implements Anthropic Messages API request/response transformation.
azure
Package azure provides a base.LLM backed by Azure OpenAI Service.
base
Package base defines the core interfaces that all LLM provider implementations must satisfy.
bedrock
Package bedrock provides a base.LLM backed by AWS Bedrock Converse API.
bedrock/chat
Package chat handles AWS Bedrock Converse API wire-format transformations.
cohere
Package cohere provides a base.LLM backed by the Cohere API.
cohere/chat
Package chat handles Cohere API wire-format transformations.
compatible
Package compatible provides llmbridge Providers for endpoints that speak the OpenAI chat completions wire format.
gemini
Package gemini provides a base.LLM backed by the Google Gemini API.
gemini/chat
Package chat handles Google Gemini API wire-format transformations.
openai
Package openai provides a base.LLM backed by the OpenAI chat completions API.
openai/chat
Package chat implements OpenAI chat completions request/response transformation.
Package prompttpl provides simple {{variable}} interpolation for prompt templates.
Package proxy implements an OpenAI-compatible HTTP proxy server that dispatches requests to any llmbridge Provider backend.
audit
Package audit provides a fixed-size ring buffer of request audit entries for the llmbridge proxy.
auth
Package auth provides API key authentication for the llmbridge proxy server.
config
Package config defines the JSON configuration file format for the llmbridge proxy server.
management
Package management provides admin endpoints for the llmbridge proxy server.
metrics
Package metrics provides a minimal Prometheus-compatible /metrics endpoint for the llmbridge proxy server.
middleware
Package middleware provides HTTP middleware for the llmbridge proxy server.
persistence
Package persistence provides a SQLite-backed store for proxy state.
prompts
Package prompts provides server-side prompt template storage with versioning.
secrets
Package secrets provides pluggable secret loading from AWS Secrets Manager, GCP Secret Manager, and HashiCorp Vault — all implemented with stdlib only.
ui
Package ui embeds the admin SPA static assets into the binary.
webhooks
Package webhooks provides configurable outbound webhook delivery for llmbridge proxy events.
Package tokencount provides heuristic token-count estimates for LLM requests and responses without requiring any external tokenizer library.
Package toolbuilder provides a fluent API for constructing types.Tool values without manually assembling nested structs.
Package types defines all core data structures shared across llmbridge packages.
