Documentation ¶
Overview ¶
Package llmrouter provides a unified interface for routing LLM requests across multiple AI providers. Write once against a single API and deploy across OpenAI, Anthropic Claude, Google Gemini, or any OpenAI-compatible service — DeepSeek, Groq, Together AI, Ollama, Sarvam, and more.
Installation ¶
go get github.com/bluefunda/llmrouter
Quick start ¶
import (
	"context"
	"fmt"
	"log"
	"time"

	llmrouter "github.com/bluefunda/llmrouter"
	"github.com/bluefunda/llmrouter/middleware"
	"github.com/bluefunda/llmrouter/providers/anthropic"
	"github.com/bluefunda/llmrouter/providers/openai"
)

router := llmrouter.New(
	llmrouter.WithProvider("openai", openai.NewFromEnv("openai", "OPENAI_API_KEY")),
	llmrouter.WithProvider("anthropic", anthropic.NewFromEnv()),
	llmrouter.WithMiddleware(
		middleware.Retry(3, time.Second),
		middleware.Timeout(60*time.Second),
	),
)
defer router.Close()

ctx := context.Background()
resp, err := router.Complete(ctx, &llmrouter.Request{
	Model:    "gpt-4o-mini",
	Messages: []llmrouter.Message{{Role: llmrouter.RoleUser, Content: "Hello!"}},
})
if err != nil {
	log.Fatal(err)
}
fmt.Println(resp.Choices[0].Message.Content)
Providers ¶
Three native provider packages are included:
- github.com/bluefunda/llmrouter/providers/openai — OpenAI (gpt-4o, gpt-4o-mini, o1, ...)
- github.com/bluefunda/llmrouter/providers/anthropic — Anthropic Claude (claude-sonnet-4, claude-haiku-4, ...)
- github.com/bluefunda/llmrouter/providers/gemini — Google Gemini (gemini-2.0-flash, gemini-2.5-pro, ...)
The openai package also covers any OpenAI-compatible API via built-in presets:
openai.NewFromEnv("deepseek", "DEEPSEEK_API_KEY") // DeepSeek
openai.NewFromEnv("groq", "GROQ_API_KEY") // Groq
openai.NewFromEnv("together", "TOGETHER_API_KEY") // Together AI
openai.NewFromEnv("ollama", "") // Ollama (local)
openai.NewFromEnv("sarvam", "SARVAM_API_KEY") // Sarvam
Streaming ¶
Use Router.Stream to receive tokens as they arrive:
stream, err := router.Stream(ctx, &llmrouter.Request{
	Model:    "claude-sonnet-4-20250514",
	Messages: []llmrouter.Message{{Role: llmrouter.RoleUser, Content: "Write a haiku."}},
})
if err != nil {
	log.Fatal(err)
}
defer stream.Close()

for stream.Next() {
	event := stream.Event()
	switch event.Type {
	case llmrouter.EventContentDelta:
		fmt.Print(event.Content)
	case llmrouter.EventDone:
		fmt.Println()
	}
}
if err := stream.Err(); err != nil {
	log.Fatal(err)
}
Fallback routing ¶
Register multiple providers and declare a fallback order. If the primary provider fails, the router tries each fallback in sequence and returns the first successful response:
router := llmrouter.New(
	llmrouter.WithProvider("openai", openai.NewFromEnv("openai", "OPENAI_API_KEY")),
	llmrouter.WithProvider("anthropic", anthropic.NewFromEnv()),
	llmrouter.WithModelMapping("gpt-4o", "openai"),
	llmrouter.WithFallback("anthropic"), // tried if openai fails
)
Prompt caching ¶
Mark static content blocks for provider-level caching. Anthropic requires explicit cache_control annotations; OpenAI and Gemini cache automatically. Observe the effect via [Usage.CachedPromptTokens] and [Usage.CacheCreationTokens]:
req := &llmrouter.Request{
	Model: "claude-sonnet-4-20250514",
	Messages: []llmrouter.Message{
		{
			Role:    llmrouter.RoleSystem,
			Content: longSystemPrompt, // written to cache on the first call; later calls within the cache lifetime read it at a reduced rate
			CacheControl: &llmrouter.CacheControl{Type: "ephemeral"},
		},
		{Role: llmrouter.RoleUser, Content: userQuery},
	},
}
resp, err := router.Complete(ctx, req)
if err != nil {
	log.Fatal(err)
}
if u := resp.Usage; u != nil { // Usage may be nil
	fmt.Printf("cached=%d creation=%d\n", u.CachedPromptTokens, u.CacheCreationTokens)
}
Tool calling ¶
Pass tool definitions in the request; the model returns tool calls which your code executes and returns as RoleTool messages:
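req := &llmrouter.Request{
	Model: "gpt-4o-mini",
	Messages: []llmrouter.Message{
		{Role: llmrouter.RoleUser, Content: "What's the weather in Tokyo?"},
	},
	Tools: []llmrouter.Tool{weatherTool},
}
resp, err := router.Complete(ctx, req)
if err != nil {
	log.Fatal(err)
}
choice := resp.Choices[0]
if choice.FinishReason == "tool_calls" {
	tc := choice.Message.ToolCalls[0] // only the first call is handled here
	result := callWeatherAPI(tc.Function.Arguments) // your code runs the tool
	// Echo the assistant turn, then return the result as a RoleTool message
	// linked to the call by ToolCallID, and ask the model again.
	req.Messages = append(req.Messages,
		*choice.Message,
		llmrouter.Message{Role: llmrouter.RoleTool, ToolCallID: tc.ID, Content: result},
	)
	resp, err = router.Complete(ctx, req) // check err; resp holds the final answer
}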
Middleware ¶
Middleware is applied in declaration order; each wraps the next. The github.com/bluefunda/llmrouter/middleware package provides three built-ins:
- github.com/bluefunda/llmrouter/middleware.Retry — exponential backoff on retryable errors (429, 5xx)
- github.com/bluefunda/llmrouter/middleware.Timeout — per-request context deadline
- github.com/bluefunda/llmrouter/middleware.NewCircuitBreaker — open circuit after N consecutive failures
Custom middleware is a MiddlewareFunc — a function that wraps a Provider:
func Logging(next llmrouter.Provider) llmrouter.Provider {
	// loggingProvider is your own type: it embeds Provider and overrides Complete/Stream.
	return &loggingProvider{Provider: next}
}

router := llmrouter.New(
	llmrouter.WithMiddleware(Logging),
)
Model resolution ¶
The router resolves a model name to a provider in this order:
- Explicit mapping via WithModelMapping
- Provider name match (model name equals a registered provider name)
- Provider model list scan via Provider.Models()
Error handling ¶
Errors are classified so that callers and middleware can decide whether a retry is worthwhile. Use IsRetryable and IsRateLimited for programmatic checks, or match the sentinel errors directly with errors.Is:
_, err := router.Complete(ctx, req)
if errors.Is(err, llmrouter.ErrRateLimited) {
	// back off and retry later
}
if errors.Is(err, llmrouter.ErrCircuitOpen) {
	// provider is temporarily unavailable
}
Other sentinels: ErrUnknownModel, ErrUnknownProvider, ErrNoProviders, ErrInvalidRequest, ErrAuthFailed, ErrProviderError, ErrMaxRetriesExceeded.
Packages ¶
- github.com/bluefunda/llmrouter/middleware — retry, timeout, and circuit breaker middleware
- github.com/bluefunda/llmrouter/providers/openai — OpenAI and OpenAI-compatible providers (DeepSeek, Groq, Together AI, Ollama, Sarvam)
- github.com/bluefunda/llmrouter/providers/anthropic — Anthropic Claude
- github.com/bluefunda/llmrouter/providers/gemini — Google Gemini
Index ¶
- Variables
- func CalculateCost(model string, usage *Usage, prices map[string]ModelPrice) float64
- func IsRateLimited(err error) bool
- func IsRetryable(err error) bool
- type APIError
- type CacheControl
- type Choice
- type ContentPart
- type Delta
- type Document
- type Event
- type EventType
- type FuncCall
- type FuncRef
- type Function
- type ImageURL
- type Message
- type MiddlewareFunc
- type ModelPrice
- type Option
- type Provider
- type ProviderConfig
- type Request
- type Response
- type Role
- type Router
- func (r *Router) AddMiddleware(m MiddlewareFunc)
- func (r *Router) Close() error
- func (r *Router) Complete(ctx context.Context, req *Request) (*Response, error)
- func (r *Router) GetProvider(name string) (Provider, bool)
- func (r *Router) MapModel(model, provider string)
- func (r *Router) Providers() []string
- func (r *Router) RegisterProvider(name string, p Provider)
- func (r *Router) SetFallbacks(providers ...string)
- func (r *Router) Stream(ctx context.Context, req *Request) (*StreamResult, error)
- type StreamResult
- type Tool
- type ToolCall
- type ToolChoice
- type Usage
Constants ¶
This section is empty.
Variables ¶
var (
	ErrUnknownModel       = errors.New("unknown model")
	ErrUnknownProvider    = errors.New("unknown provider")
	ErrNoProviders        = errors.New("no providers registered")
	ErrRateLimited        = errors.New("rate limited")
	ErrInvalidRequest     = errors.New("invalid request")
	ErrAuthFailed         = errors.New("authentication failed")
	ErrProviderError      = errors.New("provider error")
	ErrCircuitOpen        = errors.New("circuit breaker is open")
	ErrMaxRetriesExceeded = errors.New("max retries exceeded")
)
Sentinel errors
var DefaultPrices = map[string]ModelPrice{
"gpt-4.1": {InputPerMillion: 2.00, OutputPerMillion: 8.00, CacheReadPerMillion: 0.50},
"gpt-4.1-mini": {InputPerMillion: 0.40, OutputPerMillion: 1.60, CacheReadPerMillion: 0.10},
"gpt-4.1-nano": {InputPerMillion: 0.10, OutputPerMillion: 0.40, CacheReadPerMillion: 0.025},
"gpt-4o": {InputPerMillion: 2.50, OutputPerMillion: 10.00, CacheReadPerMillion: 1.25},
"gpt-4o-mini": {InputPerMillion: 0.15, OutputPerMillion: 0.60, CacheReadPerMillion: 0.075},
"o4-mini": {InputPerMillion: 1.10, OutputPerMillion: 4.40, CacheReadPerMillion: 0.275},
"claude-opus-4-20250514": {InputPerMillion: 15.00, OutputPerMillion: 75.00, CacheReadPerMillion: 1.50},
"claude-sonnet-4-20250514": {InputPerMillion: 3.00, OutputPerMillion: 15.00, CacheReadPerMillion: 0.30},
"claude-3-5-haiku-20241022": {InputPerMillion: 0.80, OutputPerMillion: 4.00, CacheReadPerMillion: 0.08},
"claude-3-5-sonnet-20241022": {InputPerMillion: 3.00, OutputPerMillion: 15.00, CacheReadPerMillion: 0.30},
"claude-3-opus-20240229": {InputPerMillion: 15.00, OutputPerMillion: 75.00, CacheReadPerMillion: 1.50},
"claude-3-sonnet-20240229": {InputPerMillion: 3.00, OutputPerMillion: 15.00, CacheReadPerMillion: 0.30},
"claude-3-haiku-20240307": {InputPerMillion: 0.25, OutputPerMillion: 1.25, CacheReadPerMillion: 0.03},
"deepseek-chat": {InputPerMillion: 0.07, OutputPerMillion: 1.10},
"deepseek-coder": {InputPerMillion: 0.07, OutputPerMillion: 1.10},
"gemini-2.5-pro": {InputPerMillion: 1.25, OutputPerMillion: 10.00},
"gemini-2.5-flash": {InputPerMillion: 0.15, OutputPerMillion: 0.60},
"gemini-2.0-flash": {InputPerMillion: 0.10, OutputPerMillion: 0.40},
}
DefaultPrices is the built-in price table for known models. Cost is 0 for models not present in the map. Prices reflect standard API rates as of mid-2025; override with WithPriceTable if needed.
Functions ¶
func CalculateCost ¶ added in v0.4.1
func CalculateCost(model string, usage *Usage, prices map[string]ModelPrice) float64
CalculateCost returns the estimated USD cost for a request given token usage and a price table. Cached tokens are billed at CacheReadPerMillion; uncached prompt tokens at InputPerMillion. Returns 0 if the model is not in the price table or usage is nil.
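The billing rule works out as in the sketch below. It mirrors Usage and ModelPrice locally and follows the stated rule literally, under two assumptions not spelled out above: CachedPromptTokens is a subset of PromptTokens, and CacheCreationTokens are not priced separately (ModelPrice has no cache-write rate). `calculateCost` is a re-implementation for illustration, not the package's code.

```go
package main

import "fmt"

// Usage and ModelPrice mirror the documented fields this calculation needs.
type Usage struct {
	PromptTokens        int
	CompletionTokens    int
	CachedPromptTokens  int
	CacheCreationTokens int // not priced in this sketch
}

type ModelPrice struct {
	InputPerMillion     float64
	OutputPerMillion    float64
	CacheReadPerMillion float64
}

// calculateCost bills uncached prompt tokens at InputPerMillion, cached prompt
// tokens at CacheReadPerMillion and completion tokens at OutputPerMillion.
func calculateCost(model string, usage *Usage, prices map[string]ModelPrice) float64 {
	if usage == nil {
		return 0
	}
	p, ok := prices[model]
	if !ok {
		return 0
	}
	uncached := usage.PromptTokens - usage.CachedPromptTokens
	if uncached < 0 {
		uncached = 0 // guard against inconsistent provider counts
	}
	const perMillion = 1_000_000.0
	return float64(uncached)*p.InputPerMillion/perMillion +
		float64(usage.CachedPromptTokens)*p.CacheReadPerMillion/perMillion +
		float64(usage.CompletionTokens)*p.OutputPerMillion/perMillion
}

func main() {
	prices := map[string]ModelPrice{
		"gpt-4o-mini": {InputPerMillion: 0.15, OutputPerMillion: 0.60, CacheReadPerMillion: 0.075},
	}
	u := &Usage{PromptTokens: 1_000_000, CompletionTokens: 100_000, CachedPromptTokens: 400_000}
	fmt.Printf("$%.4f\n", calculateCost("gpt-4o-mini", u, prices))
}
```

With the gpt-4o-mini rates from DefaultPrices, 600,000 uncached prompt tokens cost $0.09, 400,000 cached tokens $0.03 and 100,000 completion tokens $0.06, for $0.18 in total.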
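func IsRateLimited ¶
func IsRateLimited(err error) bool
IsRateLimited reports whether err indicates rate limiting (for example, HTTP 429).
func IsRetryable ¶
func IsRetryable(err error) bool
IsRetryable reports whether err is worth retrying, such as a rate limit (429) or a server error (5xx).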
Types ¶
type CacheControl ¶
type CacheControl struct {
Type string `json:"type"` // "ephemeral"
}
CacheControl marks a content block for provider-level prompt caching. Only "ephemeral" is currently supported. OpenAI and Gemini cache automatically and ignore this field; set it only when targeting Anthropic.
type Choice ¶
type Choice struct {
Index int `json:"index"`
Message *Message `json:"message,omitempty"`
Delta *Delta `json:"delta,omitempty"`
FinishReason string `json:"finish_reason,omitempty"`
}
Choice represents a completion choice
type ContentPart ¶
type ContentPart struct {
Type string `json:"type"` // "text", "image_url", or "document"
Text string `json:"text,omitempty"`
ImageURL *ImageURL `json:"image_url,omitempty"`
Document *Document `json:"document,omitempty"`
CacheControl *CacheControl `json:"cache_control,omitempty"`
}
ContentPart represents a part of a multimodal message
type Delta ¶
type Delta struct {
Role Role `json:"role,omitempty"`
Content string `json:"content,omitempty"`
ToolCalls []ToolCall `json:"tool_calls,omitempty"`
}
Delta represents streaming content delta
type Document ¶
type Document struct {
Base64 string `json:"base64"`
MediaType string `json:"media_type"` // e.g. "application/pdf"
}
Document represents a document (PDF, etc.) for providers that support it natively
type FuncRef ¶
type FuncRef struct {
Name string `json:"name"`
}
FuncRef references a specific function
type Function ¶
type Function struct {
Name string `json:"name"`
Description string `json:"description,omitempty"`
Parameters json.RawMessage `json:"parameters,omitempty"`
}
Function represents a function definition
type ImageURL ¶
type ImageURL struct {
URL string `json:"url"`
Detail string `json:"detail,omitempty"`
Base64 string `json:"base64,omitempty"`
MediaType string `json:"media_type,omitempty"`
}
ImageURL represents an image reference; it can carry a URL, inline base64 data (with MediaType), or both.
type Message ¶
type Message struct {
Role Role `json:"role"`
Content string `json:"content"`
ContentParts []ContentPart `json:"content_parts,omitempty"`
Name string `json:"name,omitempty"`
ToolCalls []ToolCall `json:"tool_calls,omitempty"`
ToolCallID string `json:"tool_call_id,omitempty"`
// CacheControl marks this message's content for prompt caching (Anthropic only).
// For user messages with ContentParts, set CacheControl on individual parts instead.
CacheControl *CacheControl `json:"cache_control,omitempty"`
}
Message represents a chat message
type MiddlewareFunc ¶ added in v0.4.0
type MiddlewareFunc func(Provider) Provider
MiddlewareFunc wraps a Provider with additional functionality. It is a plain function type; any func(Provider) Provider satisfies it directly.
type ModelPrice ¶ added in v0.4.1
type ModelPrice struct {
InputPerMillion float64 // USD per million input (prompt) tokens
OutputPerMillion float64 // USD per million output (completion) tokens
CacheReadPerMillion float64 // USD per million cache-read tokens; 0 if not applicable
}
ModelPrice holds the per-token USD pricing for a model.
type Option ¶
type Option func(*Router)
Option configures the Router
func WithFallback ¶
WithFallback sets fallback providers in priority order
func WithMiddleware ¶
func WithMiddleware(m ...MiddlewareFunc) Option
WithMiddleware adds middleware to the processing chain. Use it with the built-ins from the middleware package:
import (
	"time"

	"github.com/bluefunda/llmrouter/middleware"
)

router := llmrouter.New(
	llmrouter.WithMiddleware(
		middleware.Retry(3, time.Second),
		middleware.Timeout(60*time.Second),
	),
)
func WithModelMapping ¶
WithModelMapping maps a model to a specific provider
func WithPriceTable ¶ added in v0.4.1
func WithPriceTable(prices map[string]ModelPrice) Option
WithPriceTable replaces the default price table used for cost calculation. Callers can start from DefaultPrices and extend it, or supply a fully custom map.
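Because DefaultPrices is an exported package-level map, adding entries to it in place changes prices for every router in the process. A safer pattern is to copy it first and pass the copy to WithPriceTable. The sketch uses local stand-ins for ModelPrice and DefaultPrices (two entries only); `withOverrides` and the `my-finetuned-model` entry are hypothetical.

```go
package main

import "fmt"

type ModelPrice struct {
	InputPerMillion     float64
	OutputPerMillion    float64
	CacheReadPerMillion float64
}

// defaultPrices stands in for llmrouter.DefaultPrices.
var defaultPrices = map[string]ModelPrice{
	"gpt-4o":      {InputPerMillion: 2.50, OutputPerMillion: 10.00, CacheReadPerMillion: 1.25},
	"gpt-4o-mini": {InputPerMillion: 0.15, OutputPerMillion: 0.60, CacheReadPerMillion: 0.075},
}

// withOverrides returns a new map holding base plus overrides; base is not modified.
func withOverrides(base, overrides map[string]ModelPrice) map[string]ModelPrice {
	out := make(map[string]ModelPrice, len(base)+len(overrides))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range overrides {
		out[k] = v // overrides win, so existing prices can be replaced too
	}
	return out
}

func main() {
	prices := withOverrides(defaultPrices, map[string]ModelPrice{
		"my-finetuned-model": {InputPerMillion: 0.30, OutputPerMillion: 1.20}, // hypothetical rates
	})
	// In real code: llmrouter.New(..., llmrouter.WithPriceTable(prices))
	fmt.Println(len(defaultPrices), len(prices))
}
```

On Go 1.21 or later, maps.Clone followed by maps.Copy does the same job.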
func WithProvider ¶
WithProvider registers a provider with the router
type Provider ¶
type Provider interface {
// Name returns the provider identifier (e.g., "openai", "anthropic")
Name() string
// Models returns the list of supported model IDs
Models() []string
// Complete performs a non-streaming completion
Complete(ctx context.Context, req *Request) (*Response, error)
// Stream performs a streaming completion
Stream(ctx context.Context, req *Request) (*StreamResult, error)
}
Provider is the core interface that all LLM providers must implement.
type ProviderConfig ¶
type ProviderConfig struct {
Name string
APIKey string
BaseURL string
Model string
Models []string
Timeout time.Duration
CustomHeaders map[string]string // custom HTTP headers (e.g. api-subscription-key)
// StringContentOnly forces message content to be sent as plain strings
// instead of structured arrays. Required for some OpenAI-compatible APIs
// (e.g. Sarvam) that don't support the array content format.
StringContentOnly bool
}
ProviderConfig holds common configuration for providers
type Request ¶
type Request struct {
Messages []Message `json:"messages"`
Model string `json:"model,omitempty"`
Tools []Tool `json:"tools,omitempty"`
ToolChoice *ToolChoice `json:"tool_choice,omitempty"`
Temperature *float64 `json:"temperature,omitempty"`
MaxTokens *int `json:"max_tokens,omitempty"`
TopP *float64 `json:"top_p,omitempty"`
Stop []string `json:"stop,omitempty"`
Metadata map[string]any `json:"metadata,omitempty"`
}
Request represents a unified LLM request
type Response ¶
type Response struct {
ID string `json:"id"`
Object string `json:"object"`
Created int64 `json:"created"`
Model string `json:"model"`
Choices []Choice `json:"choices"`
Usage *Usage `json:"usage,omitempty"`
Provider string `json:"provider"`
}
Response represents a unified LLM response (OpenAI-compatible)
type Router ¶
type Router struct {
// contains filtered or unexported fields
}
Router manages multiple LLM providers and routes requests
func (*Router) AddMiddleware ¶
func (r *Router) AddMiddleware(m MiddlewareFunc)
AddMiddleware adds middleware to the router
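func (*Router) Close ¶ added in v0.4.0
func (r *Router) Close() error
Close releases resources held by registered providers that implement io.Closer. Call it when the router is no longer needed (e.g. on application shutdown).
func (*Router) GetProvider ¶
func (r *Router) GetProvider(name string) (Provider, bool)
GetProvider returns the provider registered under name and reports whether it was found.
func (*Router) RegisterProvider ¶
func (r *Router) RegisterProvider(name string, p Provider)
RegisterProvider adds a provider to the router under the given name.
func (*Router) SetFallbacks ¶
func (r *Router) SetFallbacks(providers ...string)
SetFallbacks sets the fallback provider order.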
type StreamResult ¶ added in v0.3.1
type StreamResult struct {
// contains filtered or unexported fields
}
StreamResult is the iterator type for streaming LLM responses. Usage:
stream, err := router.Stream(ctx, req)
if err != nil {
	return err
}
defer stream.Close()
for stream.Next() {
	event := stream.Event()
	// handle event
}
if err := stream.Err(); err != nil {
	return err
}
func NewStreamResult ¶ added in v0.3.1
func NewStreamResult(ch <-chan Event) *StreamResult
NewStreamResult creates a StreamResult from an event channel. Providers and middleware use this to construct a StreamResult.
func (*StreamResult) Close ¶ added in v0.3.1
func (s *StreamResult) Close() error
Close stops the stream and releases resources. Safe to call multiple times.
func (*StreamResult) Err ¶ added in v0.3.1
func (s *StreamResult) Err() error
Err returns the streaming error, if any. Check after Next returns false.
func (*StreamResult) Event ¶ added in v0.3.1
func (s *StreamResult) Event() Event
Event returns the current event (valid after Next returns true).
func (*StreamResult) Next ¶ added in v0.3.1
func (s *StreamResult) Next() bool
Next advances to the next event. Returns false when the stream ends or an error occurs.
func (*StreamResult) OnClose ¶ added in v0.3.1
func (s *StreamResult) OnClose(fn func() error)
OnClose registers fn to be called when Close is invoked, for example to cancel the context of the underlying request.
type ToolCall ¶
type ToolCall struct {
ID string `json:"id"`
Type string `json:"type"`
Function FuncCall `json:"function"`
Index *int `json:"index,omitempty"`
}
ToolCall represents a tool invocation
type ToolChoice ¶
type ToolChoice struct {
Type string `json:"type,omitempty"`
Function *FuncRef `json:"function,omitempty"`
}
ToolChoice controls tool selection
type Usage ¶
type Usage struct {
PromptTokens int `json:"prompt_tokens"`
CompletionTokens int `json:"completion_tokens"`
TotalTokens int `json:"total_tokens"`
CachedPromptTokens int `json:"cached_prompt_tokens,omitempty"` // tokens served from cache (all providers)
CacheCreationTokens int `json:"cache_creation_tokens,omitempty"` // tokens written to cache (Anthropic only)
Cost float64 `json:"cost_usd,omitempty"` // estimated USD cost; 0 if model not in price table
}
Usage represents token usage
func (*Usage) CacheHitRate ¶ added in v0.4.1
CacheHitRate returns the fraction of prompt tokens served from cache (0–1). Returns 0 if no prompt tokens were recorded.
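The ratio is CachedPromptTokens divided by PromptTokens, as sketched below with a local two-field Usage. The method is lowercased to mark it as a stand-in, and its nil-receiver guard is an addition of this sketch rather than documented behavior.

```go
package main

import "fmt"

// Usage mirrors the two documented fields the ratio needs.
type Usage struct {
	PromptTokens       int
	CachedPromptTokens int
}

// cacheHitRate returns CachedPromptTokens / PromptTokens, or 0 when no prompt
// tokens were recorded, which avoids dividing by zero.
func (u *Usage) cacheHitRate() float64 {
	if u == nil || u.PromptTokens == 0 {
		return 0
	}
	return float64(u.CachedPromptTokens) / float64(u.PromptTokens)
}

func main() {
	u := &Usage{PromptTokens: 2000, CachedPromptTokens: 1500}
	fmt.Printf("cache hit rate: %.0f%%\n", u.cacheHitRate()*100)
}
```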
Directories ¶
| Path | Synopsis |
|---|---|
| examples/fallback | command |
| examples/simple | command |
| examples/streaming | command |
| examples/tools | command |
| middleware | Package middleware provides composable cross-cutting concerns for LLM provider calls: retry with exponential backoff, per-request timeouts, and a circuit breaker to prevent cascading failures. |
| providers/anthropic | Package anthropic implements the llmrouter.Provider interface for Anthropic Claude models using the official Anthropic Go SDK. |
| providers/gemini | Package gemini implements the llmrouter.Provider interface for Google Gemini models using the official Google Generative AI Go SDK. |
| providers/openai | Package openai implements the llmrouter.Provider interface for OpenAI and any OpenAI-compatible API. |