llmbridge

v1.0.0 · Published: May 20, 2026 · License: MIT · Imports: 22 · Imported by: 0

Repository

github.com/Vedanshu7/llmbridge

README ¶

llmbridge

A unified Go interface to multiple LLM providers.

Switch between OpenAI, Anthropic, Gemini, Bedrock, Azure, Cohere, Ollama, Groq, or any OpenAI-compatible endpoint by changing the one line that constructs the provider. The rest of your application code stays the same.

Features

Unified interface — one API across all providers: chat, streaming, tool use, embeddings, TTS, image generation
Router — multi-provider failover with five strategies, weighted routing, circuit breaker, typed fallback for context-window and content-policy errors
Proxy server — OpenAI-compatible HTTP proxy; drop in front of any backend
Caching — in-memory, disk, Redis, and semantic (cosine-similarity) caches
Budget & spend tracking — per-key/org/team limits with threshold alerts
Guardrails — input/output length limits, PII detection, keyword blocking, prompt injection detection
Observability — Langfuse tracing, Prometheus metrics, JSON access logs, webhooks
Auth & multi-tenancy — API key management, SSO/OIDC (Google, GitHub, Microsoft), orgs and teams, SQLite persistence
Secret management — AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault
Deployment — Docker (multi-arch), docker-compose, Helm chart, GitHub Actions

Architecture

llmbridge/
├── llmbridge.go          # Unified interface + top-level helpers
├── router.go             # Multi-provider routing, failover, circuit breaker
├── middleware.go         # Request/response middleware chain
├── cost_calculator.go    # Per-provider cost estimation
├── session.go            # Conversation persistence
├── constants.go          # Model registry & pricing tables
│
├── types/                # Shared types (Request, Response, Message…)
├── exceptions/           # Typed error hierarchy
├── budget/               # Per-key/org spend tracking and alerts
├── caching/              # In-memory, disk, Redis, semantic caches
├── callbacks/            # Langfuse, Prometheus, JSON log, webhook handlers
├── guardrails/           # Input/output safety rules engine
├── tokencount/           # Token counting utilities
├── toolbuilder/          # Fluent builder for tool/function definitions
├── prompttpl/            # Prompt template helpers
│
├── llms/                 # Provider implementations
│   ├── base/             # LLM, Streamer, EmbedProvider, ImageGenerator interfaces
│   ├── openai/           # OpenAI (chat, embeddings, TTS, image gen, batch, files)
│   ├── anthropic/        # Anthropic Claude
│   ├── azure/            # Azure OpenAI Service
│   ├── bedrock/          # AWS Bedrock (Titan, Claude, Llama…)
│   ├── cohere/           # Cohere Command
│   ├── gemini/           # Google Gemini
│   └── compatible/       # Ollama, LM Studio, Groq, Together AI, xAI, any OpenAI-compat
│
└── proxy/                # OpenAI-compatible HTTP proxy server
    ├── auth/             # API key store, rate limiting, SSO/OIDC
    ├── audit/            # Audit logging
    ├── config/           # JSON config loader
    ├── management/       # Key, model, router, alias management endpoints
    ├── metrics/          # Prometheus collector + /metrics handler
    ├── middleware/       # HTTP access log middleware
    ├── persistence/      # SQLite-backed key/org/team store
    ├── prompts/          # Stored prompt management
    ├── secrets/          # AWS / GCP / Vault secret backends
    ├── ui/               # Embedded admin SPA
    └── webhooks/         # Outbound webhook delivery

Installation

go get github.com/Vedanshu7/llmbridge@latest

Requires Go 1.25+. The only external dependency is modernc.org/sqlite (pure-Go, no CGo), used by the proxy server for persistence.

Quick Start

package main

import (
    "context"
    "fmt"
    "log"
    "os"

    "github.com/Vedanshu7/llmbridge"
    "github.com/Vedanshu7/llmbridge/llms/openai"
)

func main() {
    ctx := context.Background()
    p := openai.New("gpt-4o-mini", os.Getenv("OPENAI_API_KEY"))

    resp, err := p.Complete(ctx, llmbridge.Request{
        System: "You are a helpful assistant.",
        Messages: []llmbridge.Message{
            {Role: "user", Content: "What is the capital of France?"},
        },
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(resp.Content) // e.g. Paris
}

Supported Providers

Provider	Package	Constructor
OpenAI	`llms/openai`	`openai.New(model, key)`
Anthropic	`llms/anthropic`	`anthropic.New(model, key)`
Azure OpenAI	`llms/azure`	`azure.New(model, endpoint, key)`
AWS Bedrock	`llms/bedrock`	`bedrock.New(model, region)`
Cohere	`llms/cohere`	`cohere.New(model, key)`
Google Gemini	`llms/gemini`	`gemini.New(model, key)`
Ollama	`llms/compatible`	`compatible.NewOllama(model)`
LM Studio	`llms/compatible`	`compatible.NewLMStudio(model)`
Groq	`llms/compatible`	`compatible.NewGroq(model, key)`
Together AI	`llms/compatible`	`compatible.NewTogetherAI(model, key)`
xAI / Grok	`llms/compatible`	`compatible.NewXAI(model, key)`
Any OpenAI-compat	`llms/compatible`	`compatible.NewCompatible(name, url, key, model)`

Usage

Streaming

// provider must implement llmbridge.Streamer.
ch, err := provider.Stream(ctx, llmbridge.Request{
    Messages: []llmbridge.Message{{Role: "user", Content: "Tell me a story."}},
})
if err != nil {
    log.Fatal(err)
}
for delta := range ch {
    if delta.Err != nil {
        log.Printf("stream error: %v", delta.Err)
        break
    }
    if delta.Done {
        break
    }
    fmt.Print(delta.Content)
}

Tool Use (Function Calling)

tools := []llmbridge.Tool{{
    Name:        "get_weather",
    Description: "Get current weather for a city.",
    Parameters: llmbridge.Schema{
        Type: "object",
        Properties: map[string]llmbridge.Property{
            "city": {Type: "string", Description: "City name"},
        },
        Required: []string{"city"},
    },
}}

resp, err := provider.Complete(ctx, llmbridge.Request{
    Messages: []llmbridge.Message{{Role: "user", Content: "Weather in Paris?"}},
    Tools:    tools,
})

Embeddings

import "github.com/Vedanshu7/llmbridge/llms/openai"

p := openai.New("text-embedding-3-small", key)
vecs, err := p.Embed(ctx, []string{"hello world", "foo bar"})

Multi-Provider Router

router := llmbridge.NewRouter(
    []llmbridge.Provider{
        openai.New("gpt-4o", os.Getenv("OPENAI_API_KEY")),
        anthropic.New("claude-sonnet-4-6", os.Getenv("ANTHROPIC_API_KEY")),
    },
    llmbridge.WithStrategy(llmbridge.RoundRobin),
    llmbridge.WithRetryPolicy(llmbridge.DefaultRetryPolicy),
    llmbridge.WithCircuitBreaker(5, 30*time.Second),
    llmbridge.WithContextWindowFallback(true),
)

resp, err := router.Complete(ctx, req)

Routing strategies: PriorityOrder · RoundRobin · LeastLatency · LeastBusy · CostBased · Weighted

Middleware

func Logger(log *slog.Logger) llmbridge.Middleware {
    return func(ctx context.Context, req llmbridge.Request, next llmbridge.Handler) (*llmbridge.Response, error) {
        start := time.Now()
        resp, err := next(ctx, req)
        log.Info("llm call", "latency", time.Since(start), "err", err)
        return resp, err
    }
}

p := llmbridge.Chain(openai.New("gpt-4o", key), Logger(slog.Default()))

Caching

import "github.com/Vedanshu7/llmbridge/caching"

// Exact-match cache
cache := caching.NewInMemoryCache()

// Semantic cache — hits on meaning, not exact text
embedder := openai.New("text-embedding-3-small", key)
sc := caching.NewSemanticCache(cache, embedder, 0.95)
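
The third argument above (0.95) reads as a cosine-similarity threshold. As a rough illustration of what such a lookup involves, here is a minimal, self-contained sketch. The `entry`, `cosine`, and `lookup` names are hypothetical and are not part of the caching package; the real implementation may differ.

```go
package main

import (
	"fmt"
	"math"
)

// entry pairs a stored prompt embedding with its cached response.
// Hypothetical type, for illustration only.
type entry struct {
	vec      []float64
	response string
}

// cosine returns the cosine similarity of a and b, or 0 when the
// lengths differ or either vector has zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// lookup returns the cached response whose embedding is most similar
// to q, provided that similarity reaches threshold.
func lookup(entries []entry, q []float64, threshold float64) (string, bool) {
	best, bestSim := -1, threshold
	for i, e := range entries {
		if sim := cosine(e.vec, q); sim >= bestSim {
			best, bestSim = i, sim
		}
	}
	if best < 0 {
		return "", false
	}
	return entries[best].response, true
}

func main() {
	entries := []entry{{vec: []float64{1, 0}, response: "Paris"}}

	_, hit := lookup(entries, []float64{0.99, 0.05}, 0.95)
	fmt.Println("near query hit:", hit)

	_, hit = lookup(entries, []float64{0, 1}, 0.95)
	fmt.Println("orthogonal query hit:", hit)
}
```

A higher threshold trades hit rate for precision: at 0.95 only near-paraphrases should match, while a lower value risks returning an answer to a different question.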

Budget & Spend Tracking

import "github.com/Vedanshu7/llmbridge/budget"

tracker := budget.NewTracker()
tracker.SetLimit("my-key", 10.00)               // $10 limit
tracker.OnAlert(func(key string, spend float64) {
    log.Printf("key %s at $%.2f", key, spend)
})

cost, _ := llmbridge.CompletionCost(resp)
if err := tracker.Record("my-key", cost); err != nil {
    // budget.ErrBudgetExceeded
}

Guardrails

import "github.com/Vedanshu7/llmbridge/guardrails"

engine, _ := guardrails.NewEngine(
    guardrails.MaxInputLength(50000),
    guardrails.BlockPIIPatterns(),
    guardrails.BlockPromptInjection(),
)

if err := engine.Check(req); err != nil {
    // handle violation
}

Cost Estimation

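resp, err := provider.Complete(ctx, req)
if err != nil {
    return err // resp may be nil, so do not pass it on
}
cost, err := llmbridge.CompletionCost(resp) // errors if the model has no pricing entry
fmt.Printf("cost: $%.6f (err: %v)\n", cost, err)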

OpenAI-Compatible Proxy Server

Run llmbridge as a drop-in proxy that any OpenAI SDK client can talk to:

import (
    "github.com/Vedanshu7/llmbridge/proxy"
    "github.com/Vedanshu7/llmbridge/llms/anthropic"
)

backend := anthropic.New("claude-sonnet-4-6", os.Getenv("ANTHROPIC_API_KEY"))
srv, err := proxy.NewServerWithDB(backend, "/data/llmbridge.db")
if err != nil {
    log.Fatal(err)
}

key, _ := srv.KeyStore().GenerateAPIKey([]string{"completion"})
fmt.Println("API key:", key)

srv.Start(ctx, ":8080")

Or via the CLI:

llmbridge server -config config.json -db /data/llmbridge.db

Proxy endpoints:

Method	Path	Auth	Description
`GET`	`/health`	public	Liveness check
`GET`	`/metrics`	public	Prometheus text metrics
`GET`	`/v1/models`	key	List registered models
`GET`	`/v1/models/{model}`	key	Get single model
`POST`	`/v1/chat/completions`	key	Chat completion (streaming supported)
`POST`	`/v1/embeddings`	key	Vector embeddings
`POST`	`/v1/audio/speech`	key	Text-to-speech
`POST`	`/v1/moderations`	key	Content moderation
`POST`	`/v1/batches`	key	Create batch job
`GET`	`/v1/batches/{id}`	key	Batch status
`POST`	`/v1/batches/{id}/cancel`	key	Cancel batch
`GET`	`/auth/login?provider=google|github|microsoft`	public	Start SSO flow
`GET`	`/auth/callback`	public	SSO callback
`GET`	`/admin/ui`	public	Web admin interface
`GET`	`/admin/stats`	admin	Aggregated metrics
`POST`	`/admin/key/generate`	admin	Create API key
`DELETE`	`/admin/key/delete`	admin	Delete API key
`GET`	`/admin/keys`	admin	List API keys
`GET/POST`	`/admin/models`	admin	List / register models
`GET/POST`	`/admin/router`	admin	List / deploy router configs
`GET/POST`	`/admin/aliases`	admin	Model name aliases
`GET/POST`	`/admin/orgs`	admin	Organizations
`GET/POST`	`/admin/teams`	admin	Teams

Config file:

{
  "listen_addr": ":8080",
  "jwt_secret": "change-me",
  "admin_keys": ["llmb-your-admin-key"],
  "log_file": "/var/log/llmbridge.log",
  "cache_ttl_seconds": 300,
  "models": [
    {"name": "gpt-4o",  "provider": "openai",    "model": "gpt-4o"},
    {"name": "sonnet",  "provider": "anthropic",  "model": "claude-sonnet-4-6"}
  ],
  "aliases": {"fast": "gpt-4o"},
  "router": {"strategy": "round_robin", "retries": 2},
  "guardrails": {
    "max_input_length": 100000,
    "block_pii": true,
    "block_prompt_injection": true
  },
  "oidc": {
    "provider": "google",
    "client_id": "...",
    "client_secret": "...",
    "redirect_url": "http://localhost:8080/auth/callback"
  },
  "secrets": {
    "backend": "vault",
    "options": {"vault_addr": "http://vault:8200"},
    "mappings": {"OPENAI_API_KEY": "prod/openai-key"}
  },
  "orgs": [
    {"name": "Acme", "budget": 500, "teams": [{"name": "Engineering", "budget": 200}]}
  ]
}

Docker:

docker compose up
# or
docker run -p 8080:8080 \
  -e OPENAI_API_KEY \
  -v ./config.json:/config.json:ro \
  ghcr.io/vedanshu7/llmbridge server -config /config.json

Error Handling

All provider errors are typed and unwrappable:

import "github.com/Vedanshu7/llmbridge/exceptions"

resp, err := provider.Complete(ctx, req)
if err != nil {
    var authErr *exceptions.AuthenticationError
    var rlErr   *exceptions.RateLimitError
    var cwErr   *exceptions.ContextWindowExceededError
    switch {
    case errors.As(err, &authErr):
        log.Fatalf("bad API key for provider %v", authErr.LLMProvider)
    case errors.As(err, &rlErr):
        time.Sleep(5 * time.Second)
    case errors.As(err, &cwErr):
        // switch to a model with a larger context window
    }
}

Error types: AuthenticationError · RateLimitError · TimeoutError · ContextWindowExceededError · ContentPolicyViolationError · BudgetExceededError · InternalServerError · and more.

Adding a New Provider

Create llms/yourprovider/yourprovider.go — implement base.LLM:

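type Provider struct { ... }
func (p *Provider) Name() string { return "yourprovider" }
func (p *Provider) Complete(ctx context.Context, req types.Request) (*types.Response, error) { ... }
// Router.ValidateEnvironment is documented as implementing Provider,
// so base.LLM appears to require this method as well.
func (p *Provider) ValidateEnvironment() error { ... }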

Optionally implement base.Streamer (SSE), base.EmbedProvider (embeddings), or base.ImageGenerator.
Add llms/yourprovider/chat/transformation.go for request/response wire-format mapping.
Add llms/yourprovider/cost_calculation.go and wire it into cost_calculator.go.
Open a PR — see CONTRIBUTING.md.

Contributing

Contributions are welcome! Please read CONTRIBUTING.md before opening a pull request.

Acknowledgements

Inspired by LiteLLM — a Go-native reimplementation of its core concepts (unified provider interface, proxy server, routing, caching, spend tracking, and observability). All code is written from scratch in Go.

License

Released under the MIT License.

Documentation ¶

Overview ¶

Package llmbridge provides a unified interface to multiple LLM providers.

Every provider implements the Provider interface, so you can swap between OpenAI, Anthropic, Ollama, LM Studio, or any OpenAI-compatible endpoint without changing your application code.

Quick start:

p := openai.New("gpt-4o-mini", os.Getenv("OPENAI_API_KEY")) // package llms/openai
resp, err := p.Complete(ctx, llmbridge.Request{
    System:   "You are a helpful assistant.",
    Messages: []llmbridge.Message{{Role: "user", Content: "Hello!"}},
})

Index ¶

Constants
Variables
func AComplete(ctx context.Context, p Provider, req Request) <-chan AsyncResult
func CompletionCost(resp *types.Response) (float64, error)
func Embed(ctx context.Context, p EmbedProvider, texts []string) ([][]float64, error)
func EmbeddingCost(provider, model string, tokens int) (float64, error)
func ResolveModel(req types.Request, providerName string) string
func SanitizeRequest(req types.Request) types.Request
func ValidateModel(modelName string) bool
type AsyncResult
type BatchResult
- func BatchComplete(ctx context.Context, p Provider, reqs []Request) []BatchResult
type CallType
type Delta
type EmbedProvider
type ErrAuth
type ErrProvider
type ErrRateLimit
type ErrTimeout
type GeneratedImage
type Handler
type HealthStatus
type ImageGenerator
type ImageRequest
type ImageResponse
- func ImageGenerate(ctx context.Context, p ImageGenerator, req ImageRequest) (*ImageResponse, error)
type Message
type Middleware
type ModelInfo
- func GetModelInfo(modelName string) (ModelInfo, bool)
type ModerationRequest
type ModerationResponse
- func Moderate(ctx context.Context, p Moderator, req ModerationRequest) (*ModerationResponse, error)
type ModerationResult
type Moderator
type Property
type Provider
- func Chain(provider Provider, mw ...Middleware) Provider
type Request
type RerankRequest
type RerankResponse
- func Rerank(ctx context.Context, p Reranker, req RerankRequest) (*RerankResponse, error)
type RerankResult
type Reranker
type Response
- func Complete(ctx context.Context, p Provider, req Request) (*Response, error)
type RetryPolicy
type Router
- func NewRouter(providers []Provider, opts ...RouterOption) *Router
- func NewTagRouter(providers []TaggedProvider, opts ...RouterOption) *Router
- func (r *Router) Complete(ctx context.Context, req types.Request) (*types.Response, error)
- func (r *Router) Name() string
- func (r *Router) Stop()
- func (r *Router) ValidateEnvironment() error
type RouterOption
- func WithAutoVisionRouting() RouterOption
- func WithCircuitBreaker(threshold int, cooldown time.Duration) RouterOption
- func WithContentPolicyFallback(enabled bool) RouterOption
- func WithContextWindowFallback(enabled bool) RouterOption
- func WithHealthChecks(interval time.Duration) RouterOption
- func WithMaxCostPerRequest(dollars float64) RouterOption
- func WithRequiredTags(tags []string) RouterOption
- func WithRetryPolicy(p RetryPolicy) RouterOption
- func WithRoutingGroups(groups []RoutingGroup) RouterOption
- func WithStrategy(s Strategy) RouterOption
- func WithTrafficSplit(groups []TrafficSplitGroup) RouterOption
- func WithWeightedStrategy() RouterOption
type RoutingGroup
type Schema
type Session
- func ListSessions() ([]*Session, error)
- func LoadLatestSession() (*Session, error)
- func LoadSession(id string) (*Session, error)
- func NewSession(providerName, model string) *Session
- func (s *Session) Add(msg types.Message)
- func (s *Session) Save() error
type SpeechProvider
type SpeechRequest
type SpeechResponse
- func Speech(ctx context.Context, p SpeechProvider, req SpeechRequest) (*SpeechResponse, error)
type Strategy
type Streamer
type TaggedProvider
type TextCompleter
type TextRequest
type TextResponse
- func TextComplete(ctx context.Context, p TextCompleter, req TextRequest) (*TextResponse, error)
type Tool
type ToolCall
type TrafficSplitGroup
type Transcriber
type TranscriptionRequest
type TranscriptionResponse
- func Transcribe(ctx context.Context, p Transcriber, req TranscriptionRequest) (*TranscriptionResponse, error)
type UsageData

Constants ¶

const DefaultHTTPTimeout = 60 // seconds

Default HTTP timeout for provider requests.

const Version = "0.3.0"

Version is the module version string recorded in the source. Note that it reads 0.3.0 while the published tag is v1.0.0.

Variables ¶

var DefaultModels = map[string]string{
	"openai":     "gpt-4o-mini",
	"anthropic":  "claude-sonnet-4-6",
	"gemini":     "gemini-2.0-flash",
	"azure":      "gpt-4o",
	"cohere":     "command-r-plus-08-2024",
	"bedrock":    "anthropic.claude-3-5-sonnet-20241022-v2:0",
	"ollama":     "llama3.2",
	"groq":       "llama-3.3-70b-versatile",
	"together":   "meta-llama/Llama-3-8b-chat-hf",
	"lmstudio":   "local-model",
	"deepseek":   "deepseek-chat",
	"perplexity": "llama-3.1-sonar-large-128k-online",
	"fireworks":  "accounts/fireworks/models/llama-v3p1-70b-instruct",
	"cerebras":   "llama3.1-70b",
	"sambanova":  "Meta-Llama-3.1-70B-Instruct",
	"mistral":    "mistral-large-latest",
	"hyperbolic": "meta-llama/Meta-Llama-3.1-70B-Instruct",
	"novita":     "meta-llama/llama-3.1-70b-instruct",
	"xai":        "grok-2-latest",
}

DefaultModels maps each provider name to the model used when a Request does not specify one.

var DefaultRetryPolicy = RetryPolicy{
	MaxAttempts:  2,
	InitialDelay: time.Second,
	Multiplier:   2.0,
	MaxDelay:     8 * time.Second,
}

DefaultRetryPolicy is a sensible starting point: two attempts per provider with 1-second initial backoff doubling to at most 8 seconds.
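
To make the backoff arithmetic concrete, the sketch below computes the wait before each retry for a policy with these four fields. It assumes the delay starts at InitialDelay, is multiplied by Multiplier after each retry, and is capped at MaxDelay; the Router's actual scheduling (jitter, for example) is not documented here and may differ. `retryPolicy` and `delays` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// retryPolicy mirrors the fields of DefaultRetryPolicy.
type retryPolicy struct {
	MaxAttempts  int
	InitialDelay time.Duration
	Multiplier   float64
	MaxDelay     time.Duration
}

// delays returns the wait before each retry: MaxAttempts-1 entries,
// starting at InitialDelay, multiplied after each retry and capped
// at MaxDelay.
func delays(p retryPolicy) []time.Duration {
	var out []time.Duration
	d := p.InitialDelay
	for i := 1; i < p.MaxAttempts; i++ {
		if d > p.MaxDelay {
			d = p.MaxDelay
		}
		out = append(out, d)
		d = time.Duration(float64(d) * p.Multiplier)
	}
	return out
}

func main() {
	def := retryPolicy{
		MaxAttempts:  2,
		InitialDelay: time.Second,
		Multiplier:   2.0,
		MaxDelay:     8 * time.Second,
	}
	fmt.Println(delays(def)) // [1s]

	def.MaxAttempts = 6
	fmt.Println(delays(def)) // [1s 2s 4s 8s 8s]
}
```

With the default of two attempts there is only one retry, so only the 1-second delay is ever used. The 8-second cap matters only if MaxAttempts is raised to five or more.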

var ModelInfoDB = map[string]ModelInfo{

	"gpt-4o": {
		MaxTokens: 128000, MaxInputTokens: 128000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.0000025, OutputCostPerToken: 0.000010,
	},
	"gpt-4o-mini": {
		MaxTokens: 128000, MaxInputTokens: 128000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000015, OutputCostPerToken: 0.0000006,
	},
	"gpt-4-turbo": {
		MaxTokens: 128000, MaxInputTokens: 128000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.00001, OutputCostPerToken: 0.00003,
	},
	"gpt-3.5-turbo": {
		MaxTokens: 16385, MaxInputTokens: 16385,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.0000005, OutputCostPerToken: 0.0000015,
	},
	"o1": {
		MaxTokens: 200000, MaxInputTokens: 200000,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.000015, OutputCostPerToken: 0.00006,
	},

	"claude-opus-4-7": {
		MaxTokens: 200000, MaxInputTokens: 200000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.000015, OutputCostPerToken: 0.000075,
	},
	"claude-sonnet-4-6": {
		MaxTokens: 200000, MaxInputTokens: 200000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.000003, OutputCostPerToken: 0.000015,
	},
	"claude-haiku-4-5-20251001": {
		MaxTokens: 200000, MaxInputTokens: 200000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.0000008, OutputCostPerToken: 0.000004,
	},
	"claude-3-5-sonnet-20241022": {
		MaxTokens: 200000, MaxInputTokens: 200000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.000003, OutputCostPerToken: 0.000015,
	},
	"claude-3-haiku-20240307": {
		MaxTokens: 200000, MaxInputTokens: 200000,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000025, OutputCostPerToken: 0.00000125,
	},

	"gemini-2.0-flash": {
		MaxTokens: 1048576, MaxInputTokens: 1048576,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.0000001, OutputCostPerToken: 0.0000004,
	},
	"gemini-1.5-pro": {
		MaxTokens: 2097152, MaxInputTokens: 2097152,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000125, OutputCostPerToken: 0.000005,
	},
	"gemini-1.5-flash": {
		MaxTokens: 1048576, MaxInputTokens: 1048576,
		SupportsFunctionCalling: true, SupportsVision: true, SupportsStreaming: true,
		InputCostPerToken: 0.000000075, OutputCostPerToken: 0.0000003,
	},

	"command-r-plus-08-2024": {
		MaxTokens: 128000, MaxInputTokens: 128000,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.0000025, OutputCostPerToken: 0.00001,
	},
	"command-r-08-2024": {
		MaxTokens: 128000, MaxInputTokens: 128000,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000015, OutputCostPerToken: 0.0000006,
	},

	"deepseek-chat": {
		MaxTokens: 65536, MaxInputTokens: 65536,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000027, OutputCostPerToken: 0.0000011,
	},
	"deepseek-coder": {
		MaxTokens: 65536, MaxInputTokens: 65536,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000014, OutputCostPerToken: 0.00000028,
	},

	"mistral-large-latest": {
		MaxTokens: 131072, MaxInputTokens: 131072,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.000003, OutputCostPerToken: 0.000009,
	},
	"mistral-small-latest": {
		MaxTokens: 131072, MaxInputTokens: 131072,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.000001, OutputCostPerToken: 0.000003,
	},

	"llama-3.3-70b-versatile": {
		MaxTokens: 128000, MaxInputTokens: 128000,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000059, OutputCostPerToken: 0.00000079,
	},
	"llama-3.1-8b-instant": {
		MaxTokens: 128000, MaxInputTokens: 128000,
		SupportsFunctionCalling: true, SupportsStreaming: true,
		InputCostPerToken: 0.00000005, OutputCostPerToken: 0.00000008,
	},
}

ModelInfoDB is a static registry of known models and their capabilities. Sourced from public provider documentation; update as new models are released.

var SupportedProviders = []string{
	"openai",
	"anthropic",
	"gemini",
	"azure",
	"cohere",
	"bedrock",
	"ollama",
	"lmstudio",
	"groq",
	"together",
	"deepseek",
	"perplexity",
	"fireworks",
	"cerebras",
	"sambanova",
	"mistral",
	"hyperbolic",
	"novita",
	"xai",
}

SupportedProviders lists the built-in provider names.

Functions ¶

func AComplete ¶

func AComplete(ctx context.Context, p Provider, req Request) <-chan AsyncResult

AComplete sends a completion request asynchronously and returns a channel that will receive exactly one AsyncResult.
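
The one-result channel pattern described here can be sketched as follows. This is an illustration under assumptions, not the library's code: `result`, `completeFunc`, `async`, and `echo` are hypothetical stand-ins, and whether the real channel is buffered or closed after the send is not stated in the docs.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// result stands in for AsyncResult: a response plus an error.
type result struct {
	Content string
	Err     error
}

// completeFunc stands in for a blocking provider call.
type completeFunc func(ctx context.Context, prompt string) (string, error)

// async runs fn in a goroutine and returns a channel that receives
// exactly one result and is then closed. The buffer of 1 lets the
// goroutine exit even if the caller never reads.
func async(ctx context.Context, fn completeFunc, prompt string) <-chan result {
	ch := make(chan result, 1)
	go func() {
		defer close(ch)
		content, err := fn(ctx, prompt)
		ch <- result{Content: content, Err: err}
	}()
	return ch
}

// echo is a fake provider call used for the demonstration.
func echo(ctx context.Context, prompt string) (string, error) {
	if prompt == "" {
		return "", errors.New("empty prompt")
	}
	return "echo: " + prompt, nil
}

func main() {
	ch := async(context.Background(), echo, "hello")
	// Other work can happen here while the call is in flight.
	r := <-ch
	fmt.Println(r.Content, r.Err)
}
```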

func CompletionCost ¶

func CompletionCost(resp *types.Response) (float64, error)

CompletionCost calculates the estimated cost in USD for a completed response. It dispatches to the provider-specific pricing table based on resp.Provider. Returns 0 and an error if the provider or model is not in the pricing tables.
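
As a worked example of the arithmetic, the sketch below prices a call from per-token rates like those in ModelInfoDB. It assumes the cost is simply input tokens times the input rate plus output tokens times the output rate; how CompletionCost reads token counts from the response is not shown in these docs, so `pricing`, `prices`, and `cost` are hypothetical.

```go
package main

import "fmt"

// pricing holds per-token prices in USD, as in the ModelInfoDB entries.
type pricing struct {
	InputCostPerToken  float64
	OutputCostPerToken float64
}

// prices copies two ModelInfoDB entries for illustration.
var prices = map[string]pricing{
	"gpt-4o":      {InputCostPerToken: 0.0000025, OutputCostPerToken: 0.000010},
	"gpt-4o-mini": {InputCostPerToken: 0.00000015, OutputCostPerToken: 0.0000006},
}

// cost returns the estimated USD cost of a call, or an error when the
// model has no pricing entry.
func cost(model string, inputTokens, outputTokens int) (float64, error) {
	p, ok := prices[model]
	if !ok {
		return 0, fmt.Errorf("no pricing for model %q", model)
	}
	return float64(inputTokens)*p.InputCostPerToken +
		float64(outputTokens)*p.OutputCostPerToken, nil
}

func main() {
	// 1000 input tokens and 500 output tokens on gpt-4o:
	// 1000 * 0.0000025 + 500 * 0.000010 = 0.0025 + 0.005 = 0.0075
	c, err := cost("gpt-4o", 1000, 500)
	fmt.Printf("cost: $%.6f err: %v\n", c, err)

	_, err = cost("unknown-model", 1, 1)
	fmt.Println(err)
}
```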

func Embed ¶

func Embed(ctx context.Context, p EmbedProvider, texts []string) ([][]float64, error)

Embed generates vector embeddings using the given EmbedProvider.

func EmbeddingCost ¶

func EmbeddingCost(provider, model string, tokens int) (float64, error)

EmbeddingCost calculates the cost for an embedding request. provider is the provider name; tokens is the input token count.

func ResolveModel ¶

func ResolveModel(req types.Request, providerName string) string

ResolveModel returns req.Model if non-empty, otherwise the provider's default.

func SanitizeRequest ¶

func SanitizeRequest(req types.Request) types.Request

SanitizeRequest applies provider-safe defaults to req and trims whitespace. It does not mutate the original; it returns a copy.

func ValidateModel ¶

func ValidateModel(modelName string) bool

ValidateModel returns true if modelName is in the built-in registry.

Types ¶

type AsyncResult ¶

type AsyncResult = types.AsyncResult

AsyncResult wraps a Response and error for async operations.

type BatchResult ¶

type BatchResult = types.BatchResult

BatchResult holds the outcome of one request in a BatchComplete call.

func BatchComplete ¶

func BatchComplete(ctx context.Context, p Provider, reqs []Request) []BatchResult

BatchComplete sends all requests concurrently and returns one BatchResult per request. Results are ordered by their original index regardless of completion order.
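
One common way to get index-ordered results from concurrent calls is to have each goroutine write into its own slot of a pre-sized slice. The sketch below shows that approach with hypothetical stand-ins (`batchResult`, `batch`); it is not taken from the library source, and the fields of the real BatchResult are not listed in these docs.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// batchResult stands in for BatchResult.
type batchResult struct {
	Index   int
	Content string
	Err     error
}

// batch runs call once per request, all concurrently, and stores each
// outcome at the index of its request, so completion order does not
// affect result order.
func batch(reqs []string, call func(string) (string, error)) []batchResult {
	results := make([]batchResult, len(reqs))
	var wg sync.WaitGroup
	for i, r := range reqs {
		wg.Add(1)
		go func(i int, r string) {
			defer wg.Done()
			content, err := call(r)
			// Each goroutine writes to a distinct element, so no lock is needed.
			results[i] = batchResult{Index: i, Content: content, Err: err}
		}(i, r)
	}
	wg.Wait()
	return results
}

func main() {
	// Shorter inputs sleep longer, so later requests finish first.
	slowEcho := func(s string) (string, error) {
		time.Sleep(time.Duration(10-len(s)) * time.Millisecond)
		return "re: " + s, nil
	}
	for _, r := range batch([]string{"a", "bbb", "ccccc"}, slowEcho) {
		fmt.Println(r.Index, r.Content)
	}
}
```

This sketch starts one goroutine per request with no upper bound. For large batches a caller would likely want a concurrency limit, which is not shown here.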

type CallType ¶

type CallType = types.CallType

CallType identifies the kind of LLM operation.

type Delta ¶

type Delta = types.Delta

Delta is a single token or structured fragment emitted during streaming.

type EmbedProvider ¶

type EmbedProvider = base.EmbedProvider

EmbedProvider is the optional interface for embedding generation.

type ErrAuth ¶

type ErrAuth = exceptions.AuthenticationError

ErrAuth indicates an authentication or authorization failure. Deprecated: use exceptions.AuthenticationError directly.

type ErrProvider ¶

type ErrProvider = exceptions.ProviderError

ErrProvider wraps a provider-level failure. Deprecated: use exceptions.ProviderError directly.

type ErrRateLimit ¶

type ErrRateLimit = exceptions.RateLimitError

ErrRateLimit indicates the provider throttled the request. Deprecated: use exceptions.RateLimitError directly.

type ErrTimeout ¶

type ErrTimeout = exceptions.TimeoutError

ErrTimeout indicates the request exceeded the HTTP deadline. Deprecated: use exceptions.TimeoutError directly.

type GeneratedImage ¶

type GeneratedImage = types.GeneratedImage

GeneratedImage is a single image returned by an image generation call.

type Handler ¶

type Handler func(ctx context.Context, req types.Request) (*types.Response, error)

Handler is the inner function type used within a middleware chain.

type HealthStatus ¶

type HealthStatus struct {
	Healthy       bool
	LastCheck     time.Time
	LastError     error
	Failures      int       // consecutive failure count (reset on success)
	CooldownUntil time.Time // skip provider until this time (circuit breaker)
}

HealthStatus records the last known health of a provider.

type ImageGenerator ¶

type ImageGenerator = base.ImageGenerator

ImageGenerator is the optional interface for image generation.

type ImageRequest ¶

type ImageRequest = types.ImageRequest

ImageRequest is the input to an image generation call.

type ImageResponse ¶

type ImageResponse = types.ImageResponse

ImageResponse is the output from an image generation call.

func ImageGenerate ¶

func ImageGenerate(ctx context.Context, p ImageGenerator, req ImageRequest) (*ImageResponse, error)

ImageGenerate generates images from a text prompt using the given ImageGenerator.

type Message ¶

type Message = types.Message

Message is a single turn in a conversation.

type Middleware ¶

type Middleware func(ctx context.Context, req types.Request, next Handler) (*types.Response, error)

Middleware wraps a Handler to add cross-cutting behavior such as logging, metrics, caching, request transformation, or response post-processing.

func Logger(log *slog.Logger) llmbridge.Middleware {
    return func(ctx context.Context, req llmbridge.Request, next llmbridge.Handler) (*llmbridge.Response, error) {
        log.Info("llm request", "provider", ctx.Value("provider"))
        resp, err := next(ctx, req)
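        if err != nil {
            log.Error("llm call failed", "err", err)
            return resp, err // resp may be nil here, so do not dereference it
        }
        log.Info("llm response", "content_len", len(resp.Content))
        return resp, err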
    }
}

type ModelInfo ¶

type ModelInfo = types.ModelInfo

ModelInfo describes the capabilities and pricing of a specific model.

func GetModelInfo ¶

func GetModelInfo(modelName string) (ModelInfo, bool)

GetModelInfo looks up metadata for a known model. Returns (ModelInfo{}, false) for unrecognized model names.

type ModerationRequest ¶

type ModerationRequest = types.ModerationRequest

ModerationRequest is the input to a content moderation call.

type ModerationResponse ¶

type ModerationResponse = types.ModerationResponse

ModerationResponse is the output from a content moderation call.

func Moderate ¶

func Moderate(ctx context.Context, p Moderator, req ModerationRequest) (*ModerationResponse, error)

Moderate classifies content for policy violations using the given Moderator.

type ModerationResult ¶

type ModerationResult = types.ModerationResult

ModerationResult is the moderation verdict for a single input.

type Moderator ¶

type Moderator = base.Moderator

Moderator is the optional interface for content moderation.

type Property ¶

type Property = types.Property

Property is a single parameter in a Schema.

type Provider ¶

type Provider = base.LLM

Provider is the unified interface every LLM backend must satisfy.

func Chain ¶

func Chain(provider Provider, mw ...Middleware) Provider

Chain wraps provider with the given middleware in order: the first middleware in the slice is the outermost (first to run on a request, last on a response). The returned Provider satisfies the Provider interface; it is NOT a Streamer even when the inner provider implements streaming.
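
The ordering rule is easiest to see in a trace. The following sketch composes middleware the way the description implies, using strings in place of Request and Response. `handler`, `middleware`, `chain`, and `tag` are simplified, hypothetical stand-ins for the real types.

```go
package main

import (
	"fmt"
	"strings"
)

// handler and middleware are simplified stand-ins for Handler and
// Middleware, using strings in place of Request and Response.
type handler func(req string) (string, error)
type middleware func(req string, next handler) (string, error)

// chain wraps h so that mw[0] is outermost: it sees the request first
// and the response last.
func chain(h handler, mw ...middleware) handler {
	// Wrap from the last middleware inward so the first ends up outermost.
	for i := len(mw) - 1; i >= 0; i-- {
		m, next := mw[i], h
		h = func(req string) (string, error) { return m(req, next) }
	}
	return h
}

// tag returns a middleware that records when it runs.
func tag(name string, trace *[]string) middleware {
	return func(req string, next handler) (string, error) {
		*trace = append(*trace, name+":request")
		resp, err := next(req)
		*trace = append(*trace, name+":response")
		return resp, err
	}
}

func main() {
	var trace []string
	base := func(req string) (string, error) { return strings.ToUpper(req), nil }

	h := chain(base, tag("first", &trace), tag("second", &trace))
	resp, _ := h("hi")

	fmt.Println(resp)
	fmt.Println(strings.Join(trace, " "))
	// first:request second:request second:response first:response
}
```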

type Request ¶

type Request = types.Request

Request is the normalized, provider-agnostic input to any LLM.

type RerankRequest ¶

type RerankRequest = types.RerankRequest

RerankRequest is the input to a document reranking call.

type RerankResponse ¶

type RerankResponse = types.RerankResponse

RerankResponse is the output from a document reranking call.

func Rerank ¶

func Rerank(ctx context.Context, p Reranker, req RerankRequest) (*RerankResponse, error)

Rerank reorders documents by relevance to a query using the given Reranker.

type RerankResult ¶

type RerankResult = types.RerankResult

RerankResult is a single ranked document in a RerankResponse.

type Reranker ¶

type Reranker = base.Reranker

Reranker is the optional interface for document reranking.

type Response ¶

type Response = types.Response

Response is the normalized output from any provider.

func Complete ¶

func Complete(ctx context.Context, p Provider, req Request) (*Response, error)

Complete sends a blocking completion request using the given provider. This is a package-level convenience wrapper around provider.Complete.

type RetryPolicy ¶

type RetryPolicy struct {
	// MaxAttempts is the number of tries per provider. 1 = no retry.
	MaxAttempts int

	// InitialDelay before the first retry.
	InitialDelay time.Duration

	// Multiplier applied to the delay on each subsequent retry.
	Multiplier float64

	// MaxDelay caps the backoff growth.
	MaxDelay time.Duration
}

RetryPolicy controls per-provider retry behavior inside the Router.

type Router ¶

type Router struct {
	// contains filtered or unexported fields
}

Router dispatches requests across multiple Provider instances with automatic failover and load balancing. It implements Provider itself.

func NewRouter ¶

func NewRouter(providers []Provider, opts ...RouterOption) *Router

NewRouter returns a Router that dispatches across the given providers.

func NewTagRouter ¶

func NewTagRouter(providers []TaggedProvider, opts ...RouterOption) *Router

NewTagRouter returns a Router where each provider carries routing tags and optional weights. Use WithRequiredTags to filter providers by tag at request time. Use WithStrategy(Weighted) to route proportionally by Weight.
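
A minimal sketch of tag filtering followed by weight-proportional selection is shown below. It assumes integer weights and a cumulative-weight walk; the `tagged` struct, its field types, and the helper names are hypothetical, and TaggedProvider's actual fields may differ.

```go
package main

import (
	"fmt"
	"math/rand"
)

// tagged is a hypothetical stand-in for TaggedProvider.
type tagged struct {
	Name   string
	Tags   []string
	Weight int
}

// hasAll reports whether tags contains every entry in required.
func hasAll(tags, required []string) bool {
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	for _, r := range required {
		if !set[r] {
			return false
		}
	}
	return true
}

// eligible returns the providers that carry every required tag.
func eligible(ps []tagged, required []string) []tagged {
	var out []tagged
	for _, p := range ps {
		if hasAll(p.Tags, required) {
			out = append(out, p)
		}
	}
	return out
}

// totalWeight sums the weights of ps.
func totalWeight(ps []tagged) int {
	sum := 0
	for _, p := range ps {
		sum += p.Weight
	}
	return sum
}

// pickAt maps a point in [0, totalWeight(ps)) to a provider by walking
// the cumulative weights. ps must be non-empty.
func pickAt(ps []tagged, point int) tagged {
	for _, p := range ps {
		if point < p.Weight {
			return p
		}
		point -= p.Weight
	}
	return ps[len(ps)-1]
}

func main() {
	ps := []tagged{
		{Name: "a", Tags: []string{"fast"}, Weight: 3},
		{Name: "b", Tags: []string{"fast", "vision"}, Weight: 1},
		{Name: "c", Tags: []string{"vision"}, Weight: 2},
	}
	pool := eligible(ps, []string{"fast"}) // keeps "a" and "b"

	counts := map[string]int{}
	for i := 0; i < 4000; i++ {
		counts[pickAt(pool, rand.Intn(totalWeight(pool))).Name]++
	}
	fmt.Println(counts) // roughly 3:1 in favour of "a"
}
```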

func (*Router) Complete ¶

func (r *Router) Complete(ctx context.Context, req types.Request) (*types.Response, error)

Complete implements Provider.

func (*Router) Name ¶

func (r *Router) Name() string

Name implements Provider.

func (*Router) Stop ¶

func (r *Router) Stop()

Stop cancels the health check goroutine if one was started.

func (*Router) ValidateEnvironment ¶

func (r *Router) ValidateEnvironment() error

ValidateEnvironment implements Provider.

type RouterOption ¶

type RouterOption func(*Router)

RouterOption configures a Router.

func WithAutoVisionRouting ¶

func WithAutoVisionRouting() RouterOption

WithAutoVisionRouting makes the router prefer providers tagged "vision" when the incoming request contains image_url content parts. Falls back to all eligible providers if none are tagged "vision".
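
The prefer-then-fall-back rule described above can be sketched as a simple filter. The `candidate` type and `preferVision` function are hypothetical stand-ins for illustration; how the router detects image_url parts is left out and represented by a boolean.

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for a tagged provider.
type candidate struct {
	name string
	tags []string
}

// preferVision returns only the "vision"-tagged candidates when the request
// contains an image, and the full eligible list otherwise or when no
// candidate carries the tag.
func preferVision(eligible []candidate, hasImage bool) []candidate {
	if !hasImage {
		return eligible
	}
	var vision []candidate
	for _, c := range eligible {
		for _, t := range c.tags {
			if t == "vision" {
				vision = append(vision, c)
				break
			}
		}
	}
	if len(vision) == 0 {
		return eligible // fall back to every eligible provider
	}
	return vision
}

func main() {
	eligible := []candidate{
		{name: "text-only", tags: []string{"fast"}},
		{name: "multimodal", tags: []string{"vision"}},
	}
	for _, c := range preferVision(eligible, true) {
		fmt.Println(c.name)
	}
}
```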

func WithCircuitBreaker ¶

func WithCircuitBreaker(threshold int, cooldown time.Duration) RouterOption

WithCircuitBreaker enables the circuit breaker. After threshold consecutive failures on a provider, it is placed in cooldown for the given duration. Set threshold to 0 to disable (default).

func WithContentPolicyFallback ¶

func WithContentPolicyFallback(enabled bool) RouterOption

WithContentPolicyFallback enables failover when a provider returns a ContentPolicyViolationError: the router then tries the next provider in order. Useful when providers have different content policies and a stricter provider is listed first.

func WithContextWindowFallback ¶

func WithContextWindowFallback(enabled bool) RouterOption

WithContextWindowFallback enables failover when a provider returns a ContextWindowExceededError: the router then tries the next provider in order.

func WithHealthChecks ¶

func WithHealthChecks(interval time.Duration) RouterOption

WithHealthChecks starts a background goroutine that calls ValidateEnvironment() on each provider every interval. Providers that error are marked unhealthy and skipped in routing until they recover.

func WithMaxCostPerRequest ¶

func WithMaxCostPerRequest(dollars float64) RouterOption

WithMaxCostPerRequest limits each request to the given USD budget. The router estimates input cost from message length and the request model's pricing entry. Requests that are estimated to exceed the budget are rejected with an error before any provider is contacted.

func WithRequiredTags ¶

func WithRequiredTags(tags []string) RouterOption

WithRequiredTags restricts routing to providers whose tag set is a superset of all the given tags. Only meaningful when using NewTagRouter.
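
The superset rule means a provider must carry every required tag, though it may carry extra ones. A minimal sketch of that check follows; `hasAllTags` is a hypothetical helper name, not a function exported by llmbridge.

```go
package main

import "fmt"

// hasAllTags reports whether have contains every tag in required.
// An empty required list matches any provider.
func hasAllTags(have, required []string) bool {
	set := make(map[string]struct{}, len(have))
	for _, t := range have {
		set[t] = struct{}{}
	}
	for _, t := range required {
		if _, ok := set[t]; !ok {
			return false
		}
	}
	return true
}

func main() {
	required := []string{"fast", "vision"}
	fmt.Println(hasAllTags([]string{"fast", "cheap", "vision"}, required)) // true
	fmt.Println(hasAllTags([]string{"fast"}, required))                    // false
}
```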

func WithRetryPolicy ¶

func WithRetryPolicy(p RetryPolicy) RouterOption

WithRetryPolicy sets the per-provider retry policy.

func WithRoutingGroups ¶

func WithRoutingGroups(groups []RoutingGroup) RouterOption

WithRoutingGroups registers named routing groups for per-model strategies.

func WithStrategy ¶

func WithStrategy(s Strategy) RouterOption

WithStrategy sets the selection strategy.

func WithTrafficSplit ¶

func WithTrafficSplit(groups []TrafficSplitGroup) RouterOption

WithTrafficSplit configures explicit experiment arms and switches the strategy to TrafficSplit. Each group specifies a provider index and relative weight. On every request one arm is chosen by weighted random selection; the remaining eligible providers serve as ordered fallbacks if that arm fails.
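
To make the arm selection concrete, the sketch below picks one arm by weighted random draw and then builds the attempt order with the remaining providers as fallbacks. It is an illustration under assumptions: `TrafficSplitGroup` is redeclared locally, the helpers `pickArm`, `totalWeight`, and `attemptOrder` are hypothetical, and "ordered fallbacks" is taken to mean declaration order.

```go
package main

import (
	"fmt"
	"math/rand"
)

// TrafficSplitGroup mirrors the documented struct so the sketch compiles alone.
type TrafficSplitGroup struct {
	Label       string
	ProviderIdx int
	Weight      int
}

// effectiveWeight applies the documented "0 treated as 1" rule. Negative
// weights are handled the same way here, which is an assumption.
func effectiveWeight(w int) int {
	if w <= 0 {
		return 1
	}
	return w
}

// totalWeight sums the effective weights of all arms.
func totalWeight(groups []TrafficSplitGroup) int {
	sum := 0
	for _, g := range groups {
		sum += effectiveWeight(g.Weight)
	}
	return sum
}

// pickArm maps a draw in [0, totalWeight) to the index of the chosen arm.
func pickArm(groups []TrafficSplitGroup, draw int) int {
	for i, g := range groups {
		w := effectiveWeight(g.Weight)
		if draw < w {
			return i
		}
		draw -= w
	}
	return len(groups) - 1
}

// attemptOrder puts the chosen provider first, followed by the others in
// declaration order as fallbacks.
func attemptOrder(numProviders, chosen int) []int {
	order := []int{chosen}
	for i := 0; i < numProviders; i++ {
		if i != chosen {
			order = append(order, i)
		}
	}
	return order
}

func main() {
	groups := []TrafficSplitGroup{
		{Label: "control", ProviderIdx: 0, Weight: 3},
		{Label: "candidate", ProviderIdx: 1}, // Weight 0 counts as 1
	}
	arm := groups[pickArm(groups, rand.Intn(totalWeight(groups)))]
	fmt.Println(arm.Label, attemptOrder(3, arm.ProviderIdx))
}
```

With these weights the "control" arm would be selected about three times out of four.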

func WithWeightedStrategy ¶

func WithWeightedStrategy() RouterOption

WithWeightedStrategy is a convenience option that sets the Weighted strategy.

type RoutingGroup ¶

type RoutingGroup struct {
	Name      string
	Providers []Provider
	Strategy  Strategy
	Policy    RetryPolicy
}

RoutingGroup defines a named group of providers with a dedicated routing strategy. Useful when different models need different failover behavior.

type Schema ¶

type Schema = types.Schema

Schema is the JSON Schema definition of tool parameters.

type Session ¶

type Session struct {
	ID        string          `json:"id"`
	CreatedAt time.Time       `json:"created_at"`
	UpdatedAt time.Time       `json:"updated_at"`
	Provider  string          `json:"provider"`
	Model     string          `json:"model"`
	Messages  []types.Message `json:"messages"`
}

Session stores a conversation history that can be saved to disk and resumed in future processes, similar to claude --continue.

func ListSessions ¶

func ListSessions() ([]*Session, error)

ListSessions returns all saved sessions sorted by creation time (newest first).

func LoadLatestSession ¶

func LoadLatestSession() (*Session, error)

LoadLatestSession loads the most recently saved session. Returns (nil, nil) if no sessions have been saved yet.

func LoadSession ¶

func LoadSession(id string) (*Session, error)

LoadSession loads a session by its ID from disk.

func NewSession ¶

func NewSession(providerName, model string) *Session

NewSession creates an empty session for the given provider and model.

func (*Session) Add ¶

func (s *Session) Add(msg types.Message)

Add appends a message to the session and updates UpdatedAt.

func (*Session) Save ¶

func (s *Session) Save() error

Save writes the session to disk and updates the "latest" pointer.

type SpeechProvider ¶

type SpeechProvider = base.SpeechProvider

SpeechProvider is the optional interface for text-to-speech.

type SpeechRequest ¶

type SpeechRequest = types.SpeechRequest

SpeechRequest is the input to a text-to-speech call.

type SpeechResponse ¶

type SpeechResponse = types.SpeechResponse

SpeechResponse is the output from a text-to-speech call.

func Speech ¶

func Speech(ctx context.Context, p SpeechProvider, req SpeechRequest) (*SpeechResponse, error)

Speech converts text to audio using the given SpeechProvider.

type Strategy ¶

type Strategy int

Strategy controls how the Router picks a provider for each request.

const (
	// PriorityOrder tries providers in declaration order, failing over on retryable errors.
	PriorityOrder Strategy = iota

	// RoundRobin distributes requests evenly across all providers.
	RoundRobin

	// LeastLatency routes to the provider with the lowest EMA latency.
	LeastLatency

	// LeastBusy routes to the provider currently handling the fewest requests.
	LeastBusy

	// UsageBased routes based on observed token/request metrics.
	UsageBased

	// CostBased routes to minimize estimated cost per request.
	CostBased

	// Weighted distributes traffic proportionally to each provider's Weight field.
	Weighted

	// TrafficSplit routes by explicit percentage splits across labeled experiment groups.
	// Configure via WithTrafficSplit.
	TrafficSplit
)

type Streamer ¶

type Streamer = base.Streamer

Streamer is the optional interface for token-by-token streaming.

type TaggedProvider ¶

type TaggedProvider struct {
	Provider Provider
	Tags     []string // e.g. ["fast", "cheap", "vision"]
	Weight   int      // relative traffic weight for Weighted strategy; 0 treated as 1
}

TaggedProvider pairs a Provider with routing tags and an optional weight.

type TextCompleter ¶

type TextCompleter = base.TextCompleter

TextCompleter is the optional interface for legacy text completion.

type TextRequest ¶

type TextRequest = types.TextRequest

TextRequest is the input to a legacy text completion call.

type TextResponse ¶

type TextResponse = types.TextResponse

TextResponse is the output from a legacy text completion call.

func TextComplete ¶

func TextComplete(ctx context.Context, p TextCompleter, req TextRequest) (*TextResponse, error)

TextComplete sends a legacy (non-chat) text completion request.

type Tool ¶

type Tool = types.Tool

Tool defines a function the model can invoke.

type ToolCall ¶

type ToolCall = types.ToolCall

ToolCall is a single tool invocation requested by the model.

type TrafficSplitGroup ¶

type TrafficSplitGroup struct {
	Label       string // experiment arm label (for observability)
	ProviderIdx int    // index into the Router's provider slice
	Weight      int    // relative traffic weight; 0 treated as 1
}

TrafficSplitGroup defines one labeled experiment arm for TrafficSplit routing. ProviderIdx is the index into the Router's provider slice; Weight controls how often this arm is selected relative to the others.

type Transcriber ¶

type Transcriber = base.Transcriber

Transcriber is the optional interface for audio transcription.

type TranscriptionRequest ¶

type TranscriptionRequest = types.TranscriptionRequest

TranscriptionRequest is the input to an audio transcription call.

type TranscriptionResponse ¶

type TranscriptionResponse = types.TranscriptionResponse

TranscriptionResponse is the output from an audio transcription call.

func Transcribe ¶

func Transcribe(ctx context.Context, p Transcriber, req TranscriptionRequest) (*TranscriptionResponse, error)

Transcribe converts audio to text using the given Transcriber.

type UsageData ¶

type UsageData = types.UsageData

UsageData holds token consumption metrics.

Directories ¶

Path	Synopsis
budget	Package budget provides per-key spend tracking and budget enforcement.
caching	Package caching provides request/response caching for llmbridge providers.
callbacks	Package callbacks provides an event-driven observability system for llmbridge.
cmd
llmbridge command	Command llmbridge is a CLI for running and managing an llmbridge proxy server.
exceptions	Package exceptions defines the error hierarchy for llmbridge provider failures.
guardrails	Package guardrails provides configurable safety rules for LLM requests and responses.
llms
anthropic	Package anthropic provides a base.LLM backed by the Anthropic Messages API (Claude Opus, Sonnet, Haiku families).
anthropic/chat	Package chat implements Anthropic Messages API request/response transformation.
azure	Package azure provides a base.LLM backed by Azure OpenAI Service.
base	Package base defines the core interfaces that all LLM provider implementations must satisfy.
bedrock	Package bedrock provides a base.LLM backed by AWS Bedrock Converse API.
bedrock/chat	Package chat handles AWS Bedrock Converse API wire-format transformations.
cohere	Package cohere provides a base.LLM backed by the Cohere API.
cohere/chat	Package chat handles Cohere API wire-format transformations.
compatible	Package compatible provides llmbridge Providers for endpoints that speak the OpenAI chat completions wire format.
gemini	Package gemini provides a base.LLM backed by the Google Gemini API.
gemini/chat	Package chat handles Google Gemini API wire-format transformations.
openai	Package openai provides a base.LLM backed by the OpenAI chat completions API.
openai/chat	Package chat implements OpenAI chat completions request/response transformation.
prompttpl	Package prompttpl provides simple {{variable}} interpolation for prompt templates.
proxy	Package proxy implements an OpenAI-compatible HTTP proxy server that dispatches requests to any llmbridge Provider backend.
audit	Package audit provides a fixed-size ring buffer of request audit entries for the llmbridge proxy.
auth	Package auth provides API key authentication for the llmbridge proxy server.
config	Package config defines the JSON configuration file format for the llmbridge proxy server.
management	Package management provides admin endpoints for the llmbridge proxy server.
metrics	Package metrics provides a minimal Prometheus-compatible /metrics endpoint for the llmbridge proxy server.
middleware	Package middleware provides HTTP middleware for the llmbridge proxy server.
persistence	Package persistence provides a SQLite-backed store for proxy state.
prompts	Package prompts provides server-side prompt template storage with versioning.
secrets	Package secrets provides pluggable secret loading from AWS Secrets Manager, GCP Secret Manager, and HashiCorp Vault, all implemented with stdlib only.
ui	Package ui embeds the admin SPA static assets into the binary.
webhooks	Package webhooks provides configurable outbound webhook delivery for llmbridge proxy events.
tokencount	Package tokencount provides heuristic token-count estimates for LLM requests and responses without requiring any external tokenizer library.
toolbuilder	Package toolbuilder provides a fluent API for constructing types.Tool values without manually assembling nested structs.
types	Package types defines all core data structures shared across llmbridge packages.