llm

package module
v0.21.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 21, 2026 License: MIT Imports: 9 Imported by: 0

README

LLM Provider Abstraction Library

A unified Go library for interacting with multiple LLM providers through a consistent interface. Supports streaming responses, tool calling, and automatic provider registration.

Features

  • Unified Provider Interface - Single API for multiple LLM providers
  • Streaming Support - Channel-based streaming for all providers
  • Tool Calling - Consistent tool/function calling across providers
  • Context Cancellation - Proper cancellation support for long-running streams
  • Registry Pattern - Automatic provider discovery with provider/model format
  • Production Ready - Race-free, tested with comprehensive integration tests

Supported Providers

Provider Name Description
Anthropic API anthropic Direct Anthropic API with API key
Claude OAuth claude OAuth-based Claude access (auto-detects local credentials)
OpenAI openai OpenAI GPT models (GPT-4, GPT-4o, etc.)
AWS Bedrock bedrock AWS Bedrock models (Claude, Llama, etc.)
Ollama ollama Local Ollama models (11 curated defaults)
OpenRouter openrouter 229 tool-enabled models via OpenRouter proxy
Aggregate aggregate Combines multiple providers with failover and aliases

Installation

go get github.com/codewandler/llm

Quick Start

Using the Default Registry

The simplest way to use the library is with the default registry:

package main

import (
    "context"
    "fmt"
    "os"

    "github.com/codewandler/llm"
    "github.com/codewandler/llm/provider"
)

func main() {
    // Set environment variables for providers
    os.Setenv("OPENAI_KEY", "your-api-key")
    os.Setenv("OPENROUTER_API_KEY", "your-api-key")
    
    ctx := context.Background()
    
    // Create a stream using provider/model format
    events, err := provider.CreateStream(ctx, llm.StreamOptions{
        Model: "ollama/glm-4.7-flash",
        Messages: llm.Messages{
            &llm.UserMsg{Content: "What is the capital of France?"},
        },
    })
    if err != nil {
        panic(err)
    }
    
    // Process streaming response
    for event := range events {
        switch event.Type {
        case llm.StreamEventDelta:
            fmt.Print(event.Delta)
        case llm.StreamEventDone:
            fmt.Println("\nDone!")
            if event.Usage != nil {
                fmt.Printf("Tokens: %d input, %d output\n", 
                    event.Usage.InputTokens, event.Usage.OutputTokens)
            }
        case llm.StreamEventError:
            fmt.Printf("Error: %v\n", event.Error)
        }
    }
}
Creating a Custom Registry

For more control, create your own registry:

import (
    "github.com/codewandler/llm/provider"
    "github.com/codewandler/llm/provider/ollama"
    "github.com/codewandler/llm/provider/openai"
    "github.com/codewandler/llm/provider/openrouter"
)

// Create empty registry
reg := llm.NewRegistry()

// Register specific providers
reg.Register(ollama.New("http://localhost:11434"))
reg.Register(openai.New("your-api-key"))
reg.Register(openrouter.New("your-api-key"))

// Use the registry
events, err := reg.CreateStream(ctx, llm.StreamOptions{
    Model: "openrouter/anthropic/claude-sonnet-4.5",
    Messages: llm.Messages{
        &llm.UserMsg{Content: "Hello!"},
    },
})

Provider-Specific Usage

Anthropic API (Direct)

Direct API access with API key:

import "github.com/codewandler/llm/provider/anthropic"

provider := anthropic.New(llm.WithAPIKey("your-api-key"))

events, err := provider.CreateStream(ctx, llm.StreamOptions{
    Model: "claude-sonnet-4-6",
    Messages: llm.Messages{
        &llm.UserMsg{Content: "Hello!"},
    },
})
Claude OAuth Provider

OAuth-based access with automatic token refresh. By default, auto-detects credentials from your local Claude installation (~/.claude/.credentials.json):

import "github.com/codewandler/llm/provider/anthropic/claude"

// Auto-detect local Claude credentials (default)
provider := claude.New()

// Or with explicit token provider
provider := claude.New(
    claude.WithManagedTokenProvider("my-key", tokenStore, nil),
)

events, err := provider.CreateStream(ctx, llm.StreamOptions{
    Model: "claude-sonnet-4-6",
    Messages: llm.Messages{
        &llm.UserMsg{Content: "Hello!"},
    },
})

Token management interfaces:

  • TokenStore - Stores and retrieves tokens (implement for your storage backend)
  • LocalTokenStore - Reads from ~/.claude/.credentials.json
  • ManagedTokenProvider - Wraps a TokenStore with automatic refresh
OpenAI

Access OpenAI models including GPT-5, GPT-4o, and reasoning models:

import "github.com/codewandler/llm/provider/openai"

provider := openai.New("your-api-key")

events, err := provider.CreateStream(ctx, llm.StreamOptions{
    Model: "gpt-4o-mini",
    Messages: llm.Messages{
        &llm.UserMsg{Content: "Hello!"},
    },
})

Popular OpenAI models:

  • GPT-5 series - gpt-5, gpt-5.2, gpt-5.2-pro, gpt-5-mini, gpt-5-nano
  • GPT-4.1 series - gpt-4.1, gpt-4.1-mini, gpt-4.1-nano
  • GPT-4o series - gpt-4o, gpt-4o-mini (default), gpt-4-turbo
  • Reasoning models - o3, o3-mini, o3-pro, o1, o1-pro
  • Specialized - gpt-5.1-codex, gpt-5.2-codex (code generation)
Ollama (Local Models)
import "github.com/codewandler/llm/provider/ollama"

provider := ollama.New("http://localhost:11434")

// Download a model if needed
if err := provider.Download(ctx, "llama3.2:1b"); err != nil {
    // Handle error
}

// Use the model
events, err := provider.CreateStream(ctx, llm.StreamOptions{
    Model: "llama3.2:1b",
    Messages: llm.Messages{
        &llm.UserMsg{Content: "Hello!"},
    },
})

Curated Ollama Models (all tested with tool calling):

  • glm-4.7-flash (default)
  • ministral-3:8b
  • rnj-1
  • functiongemma
  • devstral-small-2
  • nemotron-3-nano:30b
  • llama3.2:1b, qwen3:1.7b, qwen3:0.6b, granite3.1-moe:1b, qwen2.5:0.5b
AWS Bedrock

Access AWS Bedrock models with AWS credentials:

import "github.com/codewandler/llm/provider/bedrock"

// Uses default AWS credential chain (env vars, ~/.aws/credentials, IAM role)
provider := bedrock.New()

// Or with explicit region
provider := bedrock.New(bedrock.WithRegion("us-east-1"))

events, err := provider.CreateStream(ctx, llm.StreamOptions{
    Model: "anthropic.claude-3-5-sonnet-20241022-v2:0",
    Messages: llm.Messages{
        &llm.UserMsg{Content: "Hello!"},
    },
})

Supported Bedrock models include Claude, Llama, Mistral, and other models available in your AWS region.

OpenRouter (Multi-Provider Proxy)

Access 229 tool-enabled models:

import "github.com/codewandler/llm/provider/openrouter"

provider := openrouter.New("your-api-key")

events, err := provider.CreateStream(ctx, llm.StreamOptions{
    Model: "anthropic/claude-sonnet-4.5",
    Messages: llm.Messages{
        &llm.UserMsg{Content: "Hello!"},
    },
})

Popular OpenRouter models:

  • anthropic/claude-sonnet-4.5
  • google/gemini-2.0-flash-001
  • openai/gpt-4-turbo
  • meta-llama/llama-3.1-70b-instruct

See provider/openrouter/README.md for full model list.

Aggregate Provider

Combine multiple providers with failover routing and model aliases:

import "github.com/codewandler/llm/provider/aggregate"

cfg := aggregate.Config{
    Name: "my-aggregate",
    Providers: []aggregate.ProviderInstanceConfig{
        {Name: "primary", Type: "anthropic"},
        {Name: "fallback", Type: "openai"},
    },
    Aliases: map[string][]aggregate.AliasTarget{
        "fast":     {{Provider: "primary", Model: "claude-haiku-4-5"}},
        "default":  {{Provider: "primary", Model: "claude-sonnet-4-6"}},
        "powerful": {{Provider: "primary", Model: "claude-opus-4-6"}},
    },
}

provider, _ := aggregate.New(cfg, factories)

// Use aliases instead of full model names
events, _ := provider.CreateStream(ctx, llm.StreamOptions{
    Model: "default",  // Resolves to claude-sonnet-4-6
    Messages: messages,
})

Standard aliases:

  • fast - Fastest/cheapest model (e.g., Haiku)
  • default - Balanced performance (e.g., Sonnet)
  • powerful - Most capable model (e.g., Opus)

The aggregate provider tries each target in order until one succeeds, providing automatic failover across accounts or providers.

Tool Calling

All providers support tool/function calling with automatic tool call ID tracking.

The best way to work with tools is using ToolSpec and ToolSet, which provide:

  • Automatic JSON Schema generation from Go structs
  • Runtime validation of tool arguments
  • Type-safe parameter access via generics
  • Clean type-switch dispatch
// 1. Define parameter structs
type GetWeatherParams struct {
    Location string `json:"location" jsonschema:"description=City name,required"`
    Unit     string `json:"unit" jsonschema:"description=Temperature unit,enum=celsius,enum=fahrenheit"`
}

type SearchParams struct {
    Query string `json:"query" jsonschema:"description=Search query,required"`
    Limit int    `json:"limit" jsonschema:"description=Max results,minimum=1,maximum=100"`
}

// 2. Create ToolSet
tools := llm.NewToolSet(
    llm.NewToolSpec[GetWeatherParams]("get_weather", "Get weather for a location"),
    llm.NewToolSpec[SearchParams]("search", "Search the web"),
)

// 3. Send to LLM
stream, _ := provider.CreateStream(ctx, llm.StreamOptions{
    Model:    "openrouter/moonshotai/kimi-k2-0905",
    Messages: messages,
    Tools:    tools.Definitions(),  // Returns []ToolDefinition
})

// 4. Collect tool calls from stream
var rawCalls []llm.ToolCall
for event := range stream {
    if event.Type == llm.StreamEventToolCall {
        rawCalls = append(rawCalls, *event.ToolCall)
    }
}

// 5. Parse with validation
calls, err := tools.Parse(rawCalls)
if err != nil {
    log.Printf("parse warnings: %v", err)  // Non-fatal: you still get valid calls
}

// 6. Type-safe dispatch
for _, call := range calls {
    switch c := call.(type) {
    case *llm.TypedToolCall[GetWeatherParams]:
        // c.Params is strongly typed!
        fmt.Printf("Weather for: %s (unit: %s)\n", c.Params.Location, c.Params.Unit)
        result := getWeather(c.Params.Location, c.Params.Unit)
        
        // Send result back
        messages = append(messages,
            &llm.AssistantMsg{ToolCalls: []llm.ToolCall{{
                ID: c.ID, Name: c.Name, Arguments: map[string]any{
                    "location": c.Params.Location,
                    "unit": c.Params.Unit,
                },
            }}},
            &llm.ToolCallResult{ToolCallID: c.ID, Output: result},
        )
        
    case *llm.TypedToolCall[SearchParams]:
        fmt.Printf("Search: %s (limit: %d)\n", c.Params.Query, c.Params.Limit)
        // ... handle search
    }
}

Benefits:

  • Arguments are validated against JSON Schema (required fields, types, enums, ranges)
  • Type-safe access: c.Params.Location instead of c.Arguments["location"].(string)
  • Compile-time checking of parameter struct fields
  • Parse errors are non-fatal - you get all successfully parsed calls
Quick Example (Type-Safe with Generics)

The recommended way is using ToolDefinitionFor[T]() which generates JSON Schema from Go structs:

// Define parameter struct with struct tags
type GetWeatherParams struct {
    Location string `json:"location" jsonschema:"description=City name or coordinates,required"`
    Unit     string `json:"unit" jsonschema:"description=Temperature unit,enum=celsius,enum=fahrenheit"`
}

// Create tool definition from struct
tools := []llm.ToolDefinition{
    llm.ToolDefinitionFor[GetWeatherParams]("get_weather", "Get current weather for a location"),
}

// Step 1: Send initial request with tools
events, err := provider.CreateStream(ctx, llm.StreamOptions{
    Model:    "ollama/glm-4.7-flash",
    Messages: llm.Messages{
        &llm.UserMsg{Content: "What's the weather in Paris?"},
    },
    Tools: tools,
})

// Step 2: Process tool calls
var toolCall *llm.ToolCall
for event := range events {
    if event.Type == llm.StreamEventToolCall {
        toolCall = event.ToolCall
        // Arguments are automatically parsed into map[string]any
        fmt.Printf("Tool: %s\n", toolCall.Name)
        fmt.Printf("Location: %s\n", toolCall.Arguments["location"])
    }
}

// Step 3: Execute the tool
result := fmt.Sprintf(`{"temp": 22, "conditions": "sunny"}`)

// Step 4: Send tool result back
events2, _ := provider.CreateStream(ctx, llm.StreamOptions{
    Model: "ollama/glm-4.7-flash",
    Messages: llm.Messages{
        &llm.UserMsg{Content: "What's the weather in Paris?"},
        &llm.AssistantMsg{ToolCalls: []llm.ToolCall{*toolCall}},
        &llm.ToolCallResult{ToolCallID: toolCall.ID, Output: result},
    },
    Tools: tools,
})

// Step 5: Get final response
for event := range events2 {
    if event.Type == llm.StreamEventDelta {
        fmt.Print(event.Delta)
    }
}
Struct Tag Reference

The ToolDefinitionFor[T]() function uses these tags:

  • json:"fieldName" - Parameter name (required)
  • jsonschema:"description=..." - Parameter description
  • jsonschema:"required" - Mark parameter as required
  • jsonschema:"enum=val1,enum=val2" - Restrict to specific values
  • jsonschema:"minimum=1,maximum=10" - Numeric constraints
  • jsonschema:"pattern=^[a-z]+$" - String pattern (regex)
Manual Tool Definition

You can also define tools manually:

tools := []llm.ToolDefinition{
    {
        Name:        "get_weather",
        Description: "Get current weather for a location",
        Parameters: map[string]any{
            "type": "object",
            "properties": map[string]any{
                "location": map[string]any{
                    "type":        "string",
                    "description": "City name",
                },
            },
            "required": []string{"location"},
        },
    },
}

Important: Tool result messages must include ToolCallID to link them to the original tool call.

Tool Choice

Control whether and which tools the model should call using ToolChoice:

// Let the model decide (default behavior)
stream, _ := provider.CreateStream(ctx, llm.StreamOptions{
    Model:      "openai/gpt-4o",
    Messages:   messages,
    Tools:      tools,
    ToolChoice: llm.ToolChoiceAuto{},  // or nil for the same behavior
})

// Force the model to call at least one tool
stream, _ := provider.CreateStream(ctx, llm.StreamOptions{
    Model:      "openai/gpt-4o",
    Messages:   messages,
    Tools:      tools,
    ToolChoice: llm.ToolChoiceRequired{},
})

// Force the model to call a specific tool
stream, _ := provider.CreateStream(ctx, llm.StreamOptions{
    Model:      "openai/gpt-4o",
    Messages:   messages,
    Tools:      tools,
    ToolChoice: llm.ToolChoiceTool{Name: "get_weather"},
})

// Prevent the model from calling any tools
stream, _ := provider.CreateStream(ctx, llm.StreamOptions{
    Model:      "openai/gpt-4o",
    Messages:   messages,
    Tools:      tools,
    ToolChoice: llm.ToolChoiceNone{},
})
ToolChoice Types
Type Description OpenAI Anthropic Ollama
nil / ToolChoiceAuto{} Model decides "auto" {"type":"auto"} (ignored)
ToolChoiceRequired{} Must call ≥1 tool "required" {"type":"any"} (ignored)
ToolChoiceNone{} Cannot call tools "none" (omitted) (ignored)
ToolChoiceTool{Name:"X"} Must call tool "X" {"type":"function",...} {"type":"tool","name":"X"} (ignored)

Note: Ollama does not support tool_choice. All ToolChoice settings are silently ignored and treated as auto behavior.

Validation

The library validates ToolChoice at request time:

  • ToolChoice cannot be set without Tools
  • ToolChoiceTool{Name: "X"} must reference an existing tool in Tools
opts := llm.StreamOptions{
    Model:      "gpt-4o",
    Messages:   messages,
    Tools:      tools,
    ToolChoice: llm.ToolChoiceTool{Name: "unknown_tool"},
}
err := opts.Validate()  // Error: ToolChoiceTool references unknown tool "unknown_tool"

Reasoning Effort

Control how many reasoning tokens OpenAI models generate before producing a response. Lower reasoning effort means faster responses and fewer tokens used.

// Use low reasoning for faster responses
stream, _ := provider.CreateStream(ctx, llm.StreamOptions{
    Model:           "openai/gpt-5",
    Messages:        messages,
    ReasoningEffort: llm.ReasoningEffortLow,
})

// Use high reasoning for complex tasks
stream, _ := provider.CreateStream(ctx, llm.StreamOptions{
    Model:           "openai/o3",
    Messages:        messages,
    ReasoningEffort: llm.ReasoningEffortHigh,
})

// Disable reasoning entirely (GPT-5.1+ only)
stream, _ := provider.CreateStream(ctx, llm.StreamOptions{
    Model:           "openai/gpt-5.1",
    Messages:        messages,
    ReasoningEffort: llm.ReasoningEffortNone,
})
ReasoningEffort Values
Value Constant Description
"none" ReasoningEffortNone No reasoning (GPT-5.1+ only)
"minimal" ReasoningEffortMinimal Minimal reasoning (pre-5.1 models only)
"low" ReasoningEffortLow Low reasoning
"medium" ReasoningEffortMedium Medium reasoning (OpenAI API default for pre-5.1)
"high" ReasoningEffortHigh High reasoning
"xhigh" ReasoningEffortXHigh Maximum reasoning (codex-max+ only)
Model-Specific Support

The OpenAI provider maps ReasoningEffort values to valid API values per model:

Model Category Supported Values Default Notes
Non-reasoning (gpt-4o, gpt-4, gpt-3.5) N/A N/A Parameter ignored
Pre-5.1 reasoning (gpt-5, o1, o3) minimal, low, medium, high medium none not supported
gpt-5.1 none, low, medium, high none minimal mapped to low
Pro models (gpt-5-pro, o3-pro) high only high Other values error
Codex models (gpt-5.1-codex+) none, low, medium, high, xhigh varies minimal mapped to low
Provider Support
Provider Behavior
OpenAI Model-specific mapping with validation (see above)
OpenRouter Passed through if specified, no default
Anthropic Ignored (uses different thinking.budget_tokens approach)
Ollama Ignored

Note: If not specified, the parameter is omitted and the OpenAI API uses its default for the model.

Prompt Caching

Prompt caching reduces cost and latency on repeated requests by reusing previously processed input tokens. Behaviour varies by provider — for most use cases, set a CacheHint on StreamOptions or on individual messages and the library handles the rest.

Quick Start
import "github.com/codewandler/llm"

// Enable automatic caching for the entire conversation prefix.
// Works for Anthropic, Bedrock (Claude), and OpenAI (always automatic).
events, err := provider.CreateStream(ctx, llm.StreamOptions{
    Model: "anthropic/claude-sonnet-4-6",
    Messages: llm.Messages{
        &llm.SystemMsg{Content: largeSystemPrompt},
        &llm.UserMsg{Content: "Hello!"},
    },
    CacheHint: &llm.CacheHint{Enabled: true},
})

On the first call, the provider writes the prompt prefix to cache (CacheWriteTokens > 0). On subsequent calls within the TTL window with the same prefix, the provider reads from cache (CachedTokens > 0, cost and latency drop significantly).

Inspect cache usage via the StreamEventDone event:

for event := range events {
    if event.Type == llm.StreamEventDone && event.Usage != nil {
        fmt.Printf("cached read:  %d tokens\n", event.Usage.CachedTokens)
        fmt.Printf("cached write: %d tokens\n", event.Usage.CacheWriteTokens)
        fmt.Printf("cost:         $%.6f\n",     event.Usage.Cost)
    }
}
Controlling Cache TTL

The default TTL is 5 minutes (refreshed on each cache hit, at no extra cost). For workloads with longer processing times, request a 1-hour TTL:

events, err := provider.CreateStream(ctx, llm.StreamOptions{
    Model:     "anthropic/claude-sonnet-4-6",
    Messages:  messages,
    CacheHint: &llm.CacheHint{Enabled: true, TTL: "1h"},
})

⚠️ 1-hour TTL is only available on Claude Haiku 4.5, Sonnet 4.5, and Opus 4.5 (Anthropic direct and Bedrock). For other models the TTL: "1h" hint silently falls back to the default 5-minute TTL.

Fine-Grained Cache Breakpoints (Advanced)

For requests with multiple sections that change at different rates — e.g. static tool definitions and a growing conversation — attach a CacheHint directly to individual messages. The provider caches everything up to each marked block (up to 4 breakpoints per request on Anthropic and Bedrock).

events, err := provider.CreateStream(ctx, llm.StreamOptions{
    Model: "anthropic/claude-sonnet-4-6",
    Messages: llm.Messages{
        // Cache the large static system prompt at this breakpoint
        &llm.SystemMsg{
            Content:   largeSystemPrompt,
            CacheHint: &llm.CacheHint{Enabled: true},
        },
        &llm.UserMsg{Content: "Turn 1"},
        &llm.AssistantMsg{Content: "Response 1"},
        // Also cache up to the last user turn
        &llm.UserMsg{
            Content:   "Turn 2",
            CacheHint: &llm.CacheHint{Enabled: true},
        },
    },
})

Per-message hints and StreamOptions.CacheHint are mutually exclusive: if any message carries a CacheHint, the top-level field is ignored.

Provider Support Summary
Provider Mode Annotation required TTL options
Anthropic (direct) Explicit breakpoints CacheHint on messages or StreamOptions "5m" (default), "1h" (selected models)
Bedrock (Claude) Explicit breakpoints CacheHint on messages or StreamOptions "5m" (default), "1h" (selected models)
OpenAI Fully automatic None (always active) "in_memory" default, "1h" via CacheHint{TTL: "1h"}
Claude OAuth Same as Anthropic Same as Anthropic Same as Anthropic
Ollama / OpenRouter Not supported Ignored
Minimum Token Threshold

Providers only cache prompts above a minimum token count:

Provider Minimum
Anthropic direct 1,024 tokens
Bedrock (Claude) 2,048 tokens (varies by model)
OpenAI 1,024 tokens

Cache hints on smaller prompts are silently ignored — no error is returned. The CacheWriteTokens and CachedTokens fields in Usage will be 0.

Pricing

Cache reads are significantly cheaper than regular input tokens:

Provider Cache write (relative) Cache read (relative)
Anthropic 1.25× input price 0.1× input price
Bedrock (Claude) 1.25× input price 0.1× input price
OpenAI 1× input price (first call) 0.5× input price

Usage.Cost in the StreamEventDone event accounts for cache read and write pricing automatically.

Multi-Turn Conversations

Build conversations by appending messages:

messages := llm.Messages{
    &llm.UserMsg{Content: "Hello!"},
}

// First turn
events, _ := provider.CreateStream(ctx, llm.StreamOptions{
    Model:    "ollama/glm-4.7-flash",
    Messages: messages,
})

var response string
for event := range events {
    if event.Type == llm.StreamEventDelta {
        response += event.Delta
    }
}

// Add assistant response to history
messages = append(messages, &llm.AssistantMsg{Content: response})

// Second turn
messages = append(messages, &llm.UserMsg{Content: "Tell me more about that"})

events, _ = provider.CreateStream(ctx, llm.StreamOptions{
    Model:    "ollama/glm-4.7-flash",
    Messages: messages,
})

Context Cancellation

All stream parsers support context cancellation:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

events, err := provider.CreateStream(ctx, llm.StreamOptions{
    Model: "ollama/glm-4.7-flash",
    Messages: llm.Messages{
        &llm.UserMsg{Content: "Write a very long essay"},
    },
})

for event := range events {
    if event.Type == llm.StreamEventError {
        if errors.Is(event.Error, context.DeadlineExceeded) {
            fmt.Println("Request timed out")
        }
    }
}

Environment Variables

Configure providers via environment variables:

# Anthropic (API key)
export ANTHROPIC_API_KEY="your-api-key"

# OpenAI
export OPENAI_KEY="your-api-key"

# OpenRouter
export OPENROUTER_API_KEY="your-api-key"

# Ollama (optional, defaults to http://localhost:11434)
export OLLAMA_BASE_URL="http://localhost:11434"

# AWS Bedrock (uses standard AWS credential chain)
export AWS_REGION="us-east-1"
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"

Note: The Claude OAuth provider auto-detects credentials from ~/.claude/.credentials.json (created by Claude Code CLI).

Model Reference Format

Use the provider/model format with the registry:

anthropic/claude-sonnet-4-6           # Direct Anthropic API
claude/claude-sonnet-4-6              # Claude OAuth provider
openai/gpt-4o                         # OpenAI
openai/gpt-4o-mini                    # OpenAI
bedrock/anthropic.claude-3-5-sonnet   # AWS Bedrock
ollama/glm-4.7-flash                  # Local Ollama
ollama/llama3.2:1b                    # Local Ollama
openrouter/anthropic/claude-sonnet-4.5  # OpenRouter proxy
openrouter/google/gemini-2.0-flash-001  # OpenRouter proxy

Stream Event Types

type StreamEvent struct {
    Type     StreamEventType
    Start    *StreamStart     // For StreamEventStart
    Delta    string           // For StreamEventDelta
    ToolCall *ToolCall        // For StreamEventToolCall
    Usage    *Usage           // For StreamEventDone
    Error    error            // For StreamEventError
}

// Event types
const (
    StreamEventStart    // Stream metadata (first event)
    StreamEventDelta    // Text delta from model
    StreamEventToolCall // Tool call request
    StreamEventDone     // Stream complete (includes usage)
    StreamEventError    // Error occurred
)
Stream Start Metadata

The StreamEventStart event is emitted first and contains request metadata:

type StreamStart struct {
    RequestID        string        // Provider request ID (e.g., "msg_01XFDUDYJgAACzvnptvVoYEL")
    RequestedModel   string        // Model requested by caller
    ResolvedModel    string        // Model after alias resolution
    ProviderModel    string        // Actual model from API response
    TimeToFirstToken time.Duration // Time until first content token
}

Usage:

for event := range stream {
    switch event.Type {
    case llm.StreamEventStart:
        fmt.Printf("Request ID: %s\n", event.Start.RequestID)
        fmt.Printf("Model: %s -> %s\n", event.Start.RequestedModel, event.Start.ProviderModel)
    case llm.StreamEventDelta:
        fmt.Print(event.Delta)
    }
}
Usage Information

The Usage struct provides token counts and detailed breakdown:

type Usage struct {
    InputTokens     int     // Prompt tokens
    OutputTokens    int     // Completion tokens
    TotalTokens     int     // Total tokens
    Cost            float64 // Cost in USD (Anthropic, OpenRouter)

    // Detailed breakdown (provider-specific, may be zero)
    CachedTokens    int // Prompt tokens served from cache
    ReasoningTokens int // Tokens used for model reasoning
}

Usage in streaming:

for event := range stream {
    if event.Type == llm.StreamEventDone && event.Usage != nil {
        fmt.Printf("Tokens: %d in, %d out\n",
            event.Usage.InputTokens, event.Usage.OutputTokens)

        if event.Usage.CachedTokens > 0 {
            fmt.Printf("Cache hit: %d tokens\n", event.Usage.CachedTokens)
        }
        if event.Usage.ReasoningTokens > 0 {
            fmt.Printf("Reasoning: %d tokens\n", event.Usage.ReasoningTokens)
        }
    }
}
Field Description Providers
InputTokens Prompt tokens All
OutputTokens Completion tokens All
TotalTokens Total tokens All
Cost Cost in USD Anthropic (calculated), OpenRouter
CachedTokens Tokens served from prompt cache OpenAI, OpenRouter
ReasoningTokens Tokens used for reasoning OpenAI, OpenRouter (reasoning models)

Error Handling

events, err := provider.CreateStream(ctx, opts)
if err != nil {
    // Initial request failed (invalid params, auth error, etc.)
    return fmt.Errorf("create stream: %w", err)
}

for event := range events {
    if event.Type == llm.StreamEventError {
        // Stream error (network issue, parse error, etc.)
        return fmt.Errorf("stream error: %w", event.Error)
    }
}

Testing

The library includes comprehensive tests:

# Run all tests
go test ./...

# Run with race detector
go test -race ./...

# Run integration tests (requires providers)
go test -v ./... -run TestProviders

# Run Ollama compatibility test
go test -v ./... -run TestOllamaModels

Architecture

llm/
├── api.go              # Core types: Message, Model, Role, ToolCall
├── provider.go         # Provider interface, StreamEvent, StreamOptions
├── registry.go         # Provider registry, model resolution
├── tool.go             # ToolDefinition
│
├── provider/
│   ├── register.go     # Default registry with env-based config
│   ├── aggregate/      # Multi-provider aggregation with failover
│   ├── anthropic/      # Direct Anthropic API
│   │   └── claude/     # OAuth-based Claude provider
│   ├── bedrock/        # AWS Bedrock
│   ├── openai/         # OpenAI API
│   ├── ollama/         # Local Ollama integration
│   ├── openrouter/     # OpenRouter proxy (229 models)
│   └── fake/           # Test provider
│
└── cmd/llmcli/         # CLI tool for testing and OAuth management

CLI Tool

The llmcli tool provides quick testing and OAuth credential management:

# Check auth status (uses ~/.claude/.credentials.json)
go run ./cmd/llmcli auth status

# Quick inference
go run ./cmd/llmcli infer "Hello, how are you?"

# Verbose output with model info, tokens, cost, and timing
go run ./cmd/llmcli infer -v -m default "Explain Go channels"

# Model aliases: fast (haiku), default (sonnet), powerful (opus)
go run ./cmd/llmcli infer -m powerful "Complex analysis task"

Contributing

Contributions welcome! Please ensure:

  • All tests pass: go test ./...
  • No race conditions: go test -race ./...
  • Code is formatted: go fmt ./...
  • Follow existing patterns (see AGENTS.md)

License

MIT License - see LICENSE file for details

See Also

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	ErrNotFound   = errors.New("not found")
	ErrBadRequest = errors.New("bad request")
)

Common errors

Functions

This section is empty.

Types

type AssistantMsg added in v0.5.0

type AssistantMsg struct {
	Content   string
	ToolCalls []ToolCall
	CacheHint *CacheHint
}

AssistantMsg contains an assistant response, optionally with tool calls.

func (*AssistantMsg) MarshalJSON added in v0.5.0

func (m *AssistantMsg) MarshalJSON() ([]byte, error)

func (*AssistantMsg) Role added in v0.5.0

func (m *AssistantMsg) Role() Role

func (*AssistantMsg) Validate added in v0.5.0

func (m *AssistantMsg) Validate() error

type CacheHint added in v0.20.0

type CacheHint struct {
	// Enabled marks this content as a cache breakpoint candidate.
	// For Anthropic/Bedrock: emits cache_control / cachePoint at this position.
	// For OpenAI: no-op (caching is automatic).
	Enabled bool

	// TTL requests a specific cache duration.
	// Valid values: "" (provider default, typically 5m), "5m", "1h".
	// The "1h" option requires a supporting model (Claude Haiku/Sonnet/Opus 4.5+).
	TTL string
}

CacheHint requests provider-side prompt caching for a message or request. It is a provider-neutral instruction: Anthropic and Bedrock translate it to explicit cache breakpoints on content blocks; OpenAI caching is always automatic and ignores per-message hints, but honours TTL on StreamOptions.CacheHint.

type Message

type Message interface {
	Role() Role
	Validate() error
	json.Marshaler
	// contains filtered or unexported methods
}

Message is the interface all message types implement.

type Messages added in v0.5.0

type Messages []Message

Messages is a slice of Message with JSON unmarshal support.

func (*Messages) UnmarshalJSON added in v0.5.0

func (m *Messages) UnmarshalJSON(data []byte) error

type Model

type Model struct {
	ID       string   `json:"id"`
	Name     string   `json:"name"`
	Provider string   `json:"provider"`
	Aliases  []string `json:"aliases,omitempty"`
}

Model represents an LLM model.

type ModelFetcher

type ModelFetcher interface {
	FetchModels(ctx context.Context) ([]Model, error)
}

ModelFetcher is an optional interface providers can implement to list models dynamically from their API instead of returning a static list.

type Option added in v0.12.0

type Option func(*Options)

Option configures provider options.

func APIKeyFromEnv added in v0.12.0

func APIKeyFromEnv(candidates ...string) Option

APIKeyFromEnv returns an Option that reads the API key from environment variables. It tries each candidate in order, returning the first non-empty value. Returns an error at call time if none of the candidates are set.

func WithAPIKey added in v0.12.0

func WithAPIKey(key string) Option

WithAPIKey sets a static API key.

func WithAPIKeyFunc added in v0.12.0

func WithAPIKeyFunc(f func(ctx context.Context) (string, error)) Option

WithAPIKeyFunc sets a dynamic API key resolver. The function is called on each CreateStream() call, enabling:

  • Lazy key resolution (fetch from secret manager on first use)
  • Key rotation (fetch fresh key each time)
  • Context-aware resolution (respect timeouts/cancellation)

func WithBaseURL added in v0.12.0

func WithBaseURL(url string) Option

WithBaseURL sets a custom base URL for the provider.

type Options added in v0.12.0

type Options struct {
	// BaseURL is the base URL for the provider's API.
	BaseURL string

	// APIKeyFunc returns the API key for authentication.
	// It is called on each CreateStream() call, allowing for lazy/dynamic resolution.
	APIKeyFunc func(ctx context.Context) (string, error)
}

Options holds configuration shared across providers.

func Apply added in v0.12.0

func Apply(opts ...Option) *Options

Apply applies all options to a new Options struct and returns it.

func (*Options) ResolveAPIKey added in v0.12.0

func (o *Options) ResolveAPIKey(ctx context.Context) (string, error)

ResolveAPIKey calls the APIKeyFunc to get the API key. Returns an empty string (no error) if no APIKeyFunc was configured.

type ParsedToolCall

type ParsedToolCall interface {
	ToolName() string
	ToolCallID() string
}

ParsedToolCall is the interface for parsed tool call results. Use a type switch on the concrete *TypedToolCall[T] to access typed params.

Example:

switch c := call.(type) {
case *TypedToolCall[GetWeatherParams]:
    fmt.Println(c.Params.Location)  // strongly typed
case *TypedToolCall[SearchParams]:
    fmt.Println(c.Params.Query)
}

type Provider

type Provider interface {
	Name() string
	Models() []Model
	Streamer
}

Provider is the interface each LLM backend must implement.

type ReasoningEffort added in v0.7.0

type ReasoningEffort string

ReasoningEffort controls the amount of reasoning for reasoning models. Lower values result in faster responses with fewer reasoning tokens.

const (
	// ReasoningEffortNone disables reasoning (GPT-5.1+ only).
	ReasoningEffortNone ReasoningEffort = "none"
	// ReasoningEffortMinimal uses minimal reasoning effort.
	ReasoningEffortMinimal ReasoningEffort = "minimal"
	// ReasoningEffortLow uses low reasoning effort.
	ReasoningEffortLow ReasoningEffort = "low"
	// ReasoningEffortMedium uses medium reasoning effort (default for most models before GPT-5.1).
	ReasoningEffortMedium ReasoningEffort = "medium"
	// ReasoningEffortHigh uses high reasoning effort.
	ReasoningEffortHigh ReasoningEffort = "high"
	// ReasoningEffortXHigh uses extra high reasoning effort (codex-max+ only).
	ReasoningEffortXHigh ReasoningEffort = "xhigh"
)

func (ReasoningEffort) Valid added in v0.8.0

func (r ReasoningEffort) Valid() bool

Valid returns true if the ReasoningEffort is a known valid value or empty.

type RegisterFunc added in v0.13.0

type RegisterFunc func(*Registry)

RegisterFunc is a function that conditionally registers a provider with a registry. Each provider package exports a MaybeRegister function of this type.

type Registry

type Registry struct {
	// contains filtered or unexported fields
}

Registry holds all registered providers and resolves model references.

func NewRegistry

func NewRegistry() *Registry

NewRegistry creates an empty provider registry.

func (*Registry) AllModels

func (r *Registry) AllModels() []Model

AllModels returns all models from all registered providers.

func (*Registry) CreateStream

func (r *Registry) CreateStream(ctx context.Context, opts StreamOptions) (<-chan StreamEvent, error)

CreateStream is a convenience that resolves a model ref and delegates to the provider.

func (*Registry) FetchModels

func (r *Registry) FetchModels(ctx context.Context, name string) ([]Model, error)

FetchModels returns models for a specific provider. If the provider implements ModelFetcher, it fetches models dynamically from the API. Otherwise it falls back to the static Models() list.

func (*Registry) Provider

func (r *Registry) Provider(name string) (Provider, error)

Provider returns the provider with the given name, or an error.

func (*Registry) Register

func (r *Registry) Register(p Provider)

Register adds a provider to the registry.

func (*Registry) RegisterAll added in v0.13.0

func (r *Registry) RegisterAll(fns ...RegisterFunc)

RegisterAll calls all provided registration functions.

func (*Registry) ResolveModel

func (r *Registry) ResolveModel(ref string) (Provider, string, error)

ResolveModel parses a "provider/model" string and returns the provider and model ID.

type Resolver added in v0.16.0

type Resolver interface {
	// Resolve returns the Model for the given model ID or alias.
	// Returns ErrNotFound if the model is not recognized.
	Resolve(modelID string) (Model, error)
}

Resolver resolves a model alias or ID to its full Model representation.

type Role

type Role string

Role represents the role of a message in a conversation.

const (
	RoleSystem    Role = "system"
	RoleUser      Role = "user"
	RoleAssistant Role = "assistant"
	RoleTool      Role = "tool"
)

type StreamEvent

type StreamEvent struct {
	Type      StreamEventType
	Delta     string
	Reasoning string
	ToolCall  *ToolCall
	Error     error
	Usage     *Usage
	Start     *StreamStart // Populated for StreamEventStart
}

StreamEvent is a single event emitted by a provider during streaming.

type StreamEventType

type StreamEventType string

StreamEventType identifies the kind of streaming event from a provider.

const (
	StreamEventStart     StreamEventType = "start"
	StreamEventDelta     StreamEventType = "delta"
	StreamEventReasoning StreamEventType = "reasoning"
	StreamEventToolCall  StreamEventType = "tool_call"
	StreamEventDone      StreamEventType = "done"
	StreamEventError     StreamEventType = "error"
)

type StreamOptions

type StreamOptions struct {
	Model           string
	Messages        Messages
	Tools           []ToolDefinition
	ToolChoice      ToolChoice      // nil defaults to Auto when Tools provided
	ReasoningEffort ReasoningEffort // Controls reasoning for reasoning models (OpenAI)
	CacheHint       *CacheHint      // Top-level prompt caching hint (Anthropic auto mode, Bedrock trailing cachePoint, OpenAI extended retention)
}

StreamOptions configures a provider CreateStream call.

func (StreamOptions) Validate added in v0.6.0

func (o StreamOptions) Validate() error

Validate checks that the options are valid.

type StreamStart added in v0.16.0

type StreamStart struct {
	// RequestedModel is what the caller passed in StreamOptions.Model.
	// e.g., "fast", "sonnet", "work/claude/sonnet"
	RequestedModel string

	// ResolvedModel is the fully qualified model path after resolution.
	// For aggregate: "instance/type/model" e.g., "work/claude/claude-haiku-4-5-20251001"
	// For simple providers: same as what was sent to the API.
	ResolvedModel string

	// ProviderModel is what the underlying API returned in its response.
	// e.g., "claude-haiku-4-5-20251001". May be empty if API doesn't provide it.
	ProviderModel string

	// RequestID is the unique identifier returned by the API for this request.
	// Useful for debugging and support tickets. May be empty.
	RequestID string

	// TimeToFirstToken is the duration from request start until first response data.
	TimeToFirstToken time.Duration
}

StreamStart contains metadata about the stream, emitted with StreamEventStart.

type Streamer added in v0.19.0

type Streamer interface {
	CreateStream(ctx context.Context, opts StreamOptions) (<-chan StreamEvent, error)
}

type SystemMsg added in v0.5.0

type SystemMsg struct {
	Content   string
	CacheHint *CacheHint
}

SystemMsg contains a system prompt.

func (*SystemMsg) MarshalJSON added in v0.5.0

func (m *SystemMsg) MarshalJSON() ([]byte, error)

func (*SystemMsg) Role added in v0.5.0

func (m *SystemMsg) Role() Role

func (*SystemMsg) Validate added in v0.5.0

func (m *SystemMsg) Validate() error

type ToolCall

type ToolCall struct {
	ID        string
	Name      string
	Arguments map[string]any
}

ToolCall represents a request from the LLM to invoke a tool.

func (ToolCall) MarshalJSON added in v0.5.0

func (tc ToolCall) MarshalJSON() ([]byte, error)

func (*ToolCall) UnmarshalJSON added in v0.5.0

func (tc *ToolCall) UnmarshalJSON(data []byte) error

func (ToolCall) Validate added in v0.5.0

func (tc ToolCall) Validate() error

type ToolCallResult

type ToolCallResult struct {
	ToolCallID string
	Output     string
	IsError    bool
	CacheHint  *CacheHint
}

ToolCallResult contains the result of executing a tool call.

func (*ToolCallResult) MarshalJSON added in v0.5.0

func (m *ToolCallResult) MarshalJSON() ([]byte, error)

func (*ToolCallResult) Role added in v0.5.0

func (m *ToolCallResult) Role() Role

func (*ToolCallResult) Validate added in v0.5.0

func (m *ToolCallResult) Validate() error

type ToolChoice added in v0.6.0

type ToolChoice interface {
	// contains filtered or unexported methods
}

ToolChoice controls whether and which tools the model should call.

type ToolChoiceAuto added in v0.6.0

type ToolChoiceAuto struct{}

ToolChoiceAuto lets the model decide whether to call tools. This is the default behavior when ToolChoice is nil.

type ToolChoiceNone added in v0.6.0

type ToolChoiceNone struct{}

ToolChoiceNone prevents the model from calling any tools.

type ToolChoiceRequired added in v0.6.0

type ToolChoiceRequired struct{}

ToolChoiceRequired forces the model to call at least one tool.

type ToolChoiceTool added in v0.6.0

type ToolChoiceTool struct {
	Name string
}

ToolChoiceTool forces the model to call a specific tool by name.

type ToolDefinition

type ToolDefinition struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	Parameters  map[string]any `json:"parameters"`
}

ToolDefinition describes a tool that the model can invoke. This is used when sending tools to a provider's API.

func ToolDefinitionFor

func ToolDefinitionFor[T any](name, description string) ToolDefinition

ToolDefinitionFor creates a ToolDefinition from a Go struct type using reflection. The struct's fields are converted to a JSON Schema that describes the tool's parameters.

Field tags:

  • `json:"fieldName"` - Sets the parameter name (required)
  • `jsonschema:"description=..."` - Describes the parameter
  • `jsonschema:"required"` - Marks the parameter as required
  • `jsonschema:"enum=val1,enum=val2"` - Restricts to specific values

Example:

type GetWeatherParams struct {
    Location string `json:"location" jsonschema:"description=City name,required"`
    Unit     string `json:"unit" jsonschema:"description=Temperature unit,enum=celsius,enum=fahrenheit"`
}

tool := ToolDefinitionFor[GetWeatherParams]("get_weather", "Get current weather")

func (ToolDefinition) Validate added in v0.8.0

func (t ToolDefinition) Validate() error

Validate checks that the tool definition is valid.

type ToolSet

type ToolSet struct {
	// contains filtered or unexported fields
}

ToolSet manages a collection of tool specifications. It provides tool definitions for sending to providers and parses raw tool calls into strongly-typed results with validation.

func NewToolSet

func NewToolSet(tools ...toolRegistration) *ToolSet

NewToolSet creates a ToolSet from one or more tool specs.

Example:

tools := NewToolSet(
    NewToolSpec[GetWeatherParams]("get_weather", "Get weather"),
    NewToolSpec[SearchParams]("search", "Search the web"),
)

func (*ToolSet) Definitions

func (ts *ToolSet) Definitions() []ToolDefinition

Definitions returns all tool definitions for sending to providers.

func (*ToolSet) Parse

func (ts *ToolSet) Parse(calls []ToolCall) ([]ParsedToolCall, error)

Parse converts raw ToolCalls (from stream events) into typed ParsedToolCalls. Each tool call's arguments are validated against its JSON Schema before parsing.

Successfully parsed calls are always returned. Errors from unknown tool names or validation/parse failures are collected and returned as a joined error. The error is non-fatal - you get all successfully parsed calls.

Example:

calls, err := tools.Parse(rawToolCalls)
if err != nil {
    log.Printf("parse warnings: %v", err)
}
for _, call := range calls {
    switch c := call.(type) {
    case *TypedToolCall[GetWeatherParams]:
        fmt.Println(c.Params.Location)
    }
}

type ToolSpec

type ToolSpec[T any] struct {
	// contains filtered or unexported fields
}

ToolSpec[T] is a type-safe tool specification that pairs a tool name/description with a Go struct that defines the parameter schema. It includes a compiled JSON Schema for runtime validation.

func NewToolSpec

func NewToolSpec[T any](name, description string) *ToolSpec[T]

NewToolSpec creates a typed tool specification from a parameter struct. The struct's fields define the JSON Schema for the tool's parameters. Field tags are the same as ToolDefinitionFor: json, jsonschema.

Example:

type GetWeatherParams struct {
    Location string `json:"location" jsonschema:"description=City name,required"`
}
spec := NewToolSpec[GetWeatherParams]("get_weather", "Get current weather")

func (*ToolSpec[T]) Definition

func (s *ToolSpec[T]) Definition() ToolDefinition

Definition returns the ToolDefinition for sending to providers.

type TypedToolCall

type TypedToolCall[T any] struct {
	ID     string // Original tool call ID (for sending results back)
	Name   string // Tool name
	Params T      // Parsed, validated parameters
}

TypedToolCall[T] holds a parsed tool call with strongly-typed parameters.

func (*TypedToolCall[T]) ToolCallID

func (c *TypedToolCall[T]) ToolCallID() string

ToolCallID returns the tool call ID.

func (*TypedToolCall[T]) ToolName

func (c *TypedToolCall[T]) ToolName() string

ToolName returns the tool name.

type Usage

type Usage struct {
	InputTokens  int
	OutputTokens int
	TotalTokens  int

	// Cost is the total request cost in USD.
	// For Anthropic, Bedrock, and OpenAI this is locally calculated from
	// provider pricing tables and equals the sum of the breakdown fields below.
	// For OpenRouter this is API-reported by the proxy (already includes cache pricing).
	Cost float64

	// Detailed token breakdown (provider-specific, may be zero)
	CachedTokens     int // Prompt tokens served from cache (all providers)
	CacheWriteTokens int // Prompt tokens written to cache (Anthropic, Bedrock)
	ReasoningTokens  int // Tokens used for model reasoning

	// Granular cost breakdown in USD (zero if provider/model pricing is unknown).
	// Sum of InputCost + CachedCost + CacheWriteCost + OutputCost == Cost.
	// Not populated for OpenRouter (API-reported cost is used instead).
	InputCost      float64 // Cost of regular (non-cached) input tokens
	CachedCost     float64 // Cost of cache-read tokens
	CacheWriteCost float64 // Cost of cache-write tokens
	OutputCost     float64 // Cost of output tokens
}

Usage holds token counts and cost from a provider response.

type UserMsg added in v0.5.0

type UserMsg struct {
	Content   string
	CacheHint *CacheHint
}

UserMsg contains user input.

func (*UserMsg) MarshalJSON added in v0.5.0

func (m *UserMsg) MarshalJSON() ([]byte, error)

func (*UserMsg) Role added in v0.5.0

func (m *UserMsg) Role() Role

func (*UserMsg) Validate added in v0.5.0

func (m *UserMsg) Validate() error

Directories

Path Synopsis
cmd
llmcli command
llmcli is a command-line tool for testing LLM providers.
llmcli is a command-line tool for testing LLM providers.
llmcli/cmds
Package cmds provides CLI commands for llmcli.
Package cmds provides CLI commands for llmcli.
llmcli/store
Package store provides token storage implementations.
Package store provides token storage implementations.
Package modeldb provides access to the models.dev model database.
Package modeldb provides access to the models.dev model database.
anthropic/claude
Package claude provides an Anthropic provider using Claude OAuth tokens.
Package claude provides an Anthropic provider using Claude OAuth tokens.
auto
Package auto provides zero-config multi-provider setup for LLM providers.
Package auto provides zero-config multi-provider setup for LLM providers.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL