llm

package
v1.10.0 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 17, 2026 License: MIT Imports: 11 Imported by: 0

Documentation

Overview

Package llm provides an OpenAI-compatible HTTP client using only stdlib.



Functions

func ApplyCacheMarkers

func ApplyCacheMarkers(messages []Message) ([]Message, []SystemBlock)

ApplyCacheMarkers annotates messages with Anthropic-style cache_control markers to enable prompt caching. It:

  1. Marks the first system message (if present) with cache_control: ephemeral
  2. Marks the first user message with cache_control: ephemeral

Returns the updated messages and the system blocks for the request's System field. The system blocks are populated only if the system message was moved out of the messages array for Anthropic compatibility. Providers that don't support prompt caching silently ignore these fields.
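
The two marking steps can be sketched as follows. This is an illustration under assumptions, not the package's implementation: the types are trimmed local copies of the documented ones, and markFirst is a hypothetical helper. It does not model moving the system message into the System field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented types, trimmed to the fields used here.
type CacheControl struct {
	Type string `json:"type"`
}

type Message struct {
	Role         string        `json:"role"`
	Content      string        `json:"content"`
	CacheControl *CacheControl `json:"cache_control,omitempty"`
}

// markFirst is a hypothetical helper (not part of the package). It returns a
// copy of messages in which the first message with the given role carries an
// ephemeral cache marker. The input slice is left unchanged.
func markFirst(messages []Message, role string) []Message {
	out := make([]Message, len(messages))
	copy(out, messages)
	for i := range out {
		if out[i].Role == role {
			out[i].CacheControl = &CacheControl{Type: "ephemeral"}
			break
		}
	}
	return out
}

func main() {
	msgs := []Message{
		{Role: "system", Content: "You are terse."},
		{Role: "user", Content: "Hello"},
		{Role: "user", Content: "Second question"},
	}
	// Step 1 marks the first system message, step 2 the first user message.
	marked := markFirst(markFirst(msgs, "system"), "user")
	for _, m := range marked {
		b, _ := json.Marshal(m)
		fmt.Println(string(b))
	}
}
```

Only the first system and first user messages gain a cache_control object; the second user message is serialized without one because the field is tagged omitempty.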

func DiscoverModelContext

func DiscoverModelContext(baseURL, apiKey, model string) int

DiscoverModelContext queries the /models endpoint of the configured base URL to discover the context window for the given model. Returns 0 if the endpoint doesn't support model attribute discovery or the model isn't found.

Results are cached per (baseURL, apiKey) so multiple agents using the same provider share a single API call. The full model list is cached so different model lookups from the same endpoint don't re-query.

Call this at startup before creating the engine. The HTTP call uses a 5s timeout, so it delays startup by at most that long.
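
The caching behaviour described above might look roughly like this. The sketch is an assumption about the shape of such a cache, not the package's code: modelCache, endpointKey, and the injected fetch function (standing in for the HTTP call to /models) are all hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// endpointKey identifies one provider endpoint: results are cached per
// (baseURL, apiKey) pair.
type endpointKey struct {
	baseURL, apiKey string
}

// modelCache is a hypothetical cache. fetch stands in for the HTTP request
// to the /models endpoint and returns model name -> context window.
type modelCache struct {
	mu    sync.Mutex
	lists map[endpointKey]map[string]int
	fetch func(baseURL, apiKey string) map[string]int
}

// contextFor returns the context window for model, or 0 if it is unknown.
// The full model list is stored, so a later lookup for a different model on
// the same endpoint does not call fetch again.
func (c *modelCache) contextFor(baseURL, apiKey, model string) int {
	c.mu.Lock()
	defer c.mu.Unlock()

	key := endpointKey{baseURL: baseURL, apiKey: apiKey}
	list, ok := c.lists[key]
	if !ok {
		list = c.fetch(baseURL, apiKey)
		if c.lists == nil {
			c.lists = make(map[endpointKey]map[string]int)
		}
		c.lists[key] = list
	}
	return list[model] // missing entries (and nil maps) yield 0
}

func main() {
	calls := 0
	cache := &modelCache{fetch: func(baseURL, apiKey string) map[string]int {
		calls++
		return map[string]int{"model-a": 128000, "model-b": 32000}
	}}
	const url, key = "https://api.example.com/v1", "example-key"
	fmt.Println(cache.contextFor(url, key, "model-a"))
	fmt.Println(cache.contextFor(url, key, "model-b"))
	fmt.Println(cache.contextFor(url, key, "missing"))
	fmt.Println("fetches:", calls)
}
```

Because a return value of 0 means "unknown", callers would typically substitute their own default context size in that case.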

func ResetModelCache

func ResetModelCache()

ResetModelCache clears the model discovery cache. Used in tests.

Types

type CacheControl

type CacheControl struct {
	Type string `json:"type"` // "ephemeral"
}

CacheControl marks a message or system block as cacheable by Anthropic. Providers that don't support it (OpenAI, DeepSeek) silently ignore the field.

type CallParams

type CallParams struct {
	Model           string          `json:"model"`
	Messages        []Message       `json:"messages"`
	System          []SystemBlock   `json:"system,omitempty"` // Anthropic-style system blocks
	Tools           []ToolDef       `json:"tools,omitempty"`
	Stream          bool            `json:"stream"`
	MaxTokens       int             `json:"max_tokens,omitempty"`  // max output tokens (0 = omit/provider default)
	Temperature     *float64        `json:"temperature,omitempty"` // 0–2, nil = provider default
	Thinking        *ThinkingConfig `json:"thinking,omitempty"`
	ReasoningEffort string          `json:"reasoning_effort,omitempty"`
}

CallParams is the request body for /chat/completions.
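
To see how the struct tags shape the request body, the sketch below marshals a trimmed local copy of CallParams (System, Tools, and Thinking are left out for brevity; "example-model" is a placeholder). It shows that stream is always sent, that a zero MaxTokens is omitted, and that a non-nil Temperature pointer is sent even when it points at 0.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// CallParams is a trimmed local copy of the documented struct.
type CallParams struct {
	Model           string    `json:"model"`
	Messages        []Message `json:"messages"`
	Stream          bool      `json:"stream"`
	MaxTokens       int       `json:"max_tokens,omitempty"`
	Temperature     *float64  `json:"temperature,omitempty"`
	ReasoningEffort string    `json:"reasoning_effort,omitempty"`
}

// encode marshals p and returns the JSON text, or "" on error.
func encode(p CallParams) string {
	b, err := json.Marshal(p)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	msgs := []Message{{Role: "user", Content: "hi"}}

	// Zero values: max_tokens and temperature are omitted, stream is not.
	fmt.Println(encode(CallParams{Model: "example-model", Messages: msgs}))

	// A pointer to 0.0 is non-nil, so temperature 0 is sent explicitly.
	zero := 0.0
	fmt.Println(encode(CallParams{
		Model:       "example-model",
		Messages:    msgs,
		MaxTokens:   256,
		Temperature: &zero,
	}))
}
```

Using a pointer for Temperature is what lets a caller distinguish "send 0" from "leave it to the provider", which a plain float64 with omitempty could not express.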

type CallResult

type CallResult struct {
	Content          string     // assistant text
	ReasoningContent string     // DeepSeek reasoning/thinking tokens
	ToolCalls        []ToolCall // tool calls requested by the model
	InputTokens      int        // prompt_tokens from API usage (0 = not reported)
	OutputTokens     int        // completion_tokens from API usage (0 = not reported)

	// Cache metrics. Only populated when the provider returns them.
	// Anthropic: cache_creation_input_tokens, cache_read_input_tokens
	// OpenAI: prompt_tokens_details.cached_tokens
	CacheCreationTokens int // Anthropic — tokens written to cache
	CacheReadTokens     int // Anthropic — tokens read from cache hit
	CachedTokens        int // OpenAI — cached tokens in prompt
}

CallResult is the parsed response from /chat/completions.
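
Since the cache metrics arrive in different fields depending on the provider, a caller may want one number for "prompt tokens served from cache". The helper below is a hypothetical convenience, not part of the package, working on a trimmed local copy of CallResult. It assumes that at most one of the two provider styles is populated in a given result.

```go
package main

import "fmt"

// CallResult is a trimmed local copy holding only the token counters.
type CallResult struct {
	InputTokens         int
	OutputTokens        int
	CacheCreationTokens int
	CacheReadTokens     int
	CachedTokens        int
}

// cacheHitTokens is a hypothetical helper. It returns the number of prompt
// tokens read from cache, preferring the Anthropic-style counter and falling
// back to the OpenAI-style one. It returns 0 when neither was reported.
func cacheHitTokens(r CallResult) int {
	if r.CacheReadTokens > 0 {
		return r.CacheReadTokens
	}
	return r.CachedTokens
}

func main() {
	results := []CallResult{
		{InputTokens: 40, OutputTokens: 12, CacheReadTokens: 900}, // Anthropic-style
		{InputTokens: 1000, OutputTokens: 12, CachedTokens: 768},  // OpenAI-style
		{InputTokens: 1000, OutputTokens: 12},                     // no cache metrics
	}
	for _, r := range results {
		fmt.Println(cacheHitTokens(r))
	}
}
```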

type Client

type Client struct {
	BaseURL        string
	APIKey         string
	Model          string
	Thinking       string  // "enabled", "disabled", "low", "medium", "high", or empty
	ThinkingBudget int     // max thinking tokens for Anthropic extended thinking (0 = use default 5000)
	MaxTokens      int     // max output tokens (0 = provider default)
	Temperature    float64 // 0 = use provider default, <0 = omit from request
	// contains filtered or unexported fields
}

Client sends chat completion requests to any OpenAI-compatible endpoint.

func New

func New(baseURL, apiKey, model, thinking string, thinkingBudget int, timeout time.Duration) *Client

New creates a Client with the given timeout. Pass 0 to use the default (120s). The timeout applies to each HTTP request, not to the agent loop as a whole, which may issue several requests; set a generous timeout for deep-reasoning models.

func NewWithMaxTokens

func NewWithMaxTokens(baseURL, apiKey, model, thinking string, thinkingBudget int, maxTokens int, timeout time.Duration) *Client

NewWithMaxTokens creates a Client with a specific max_tokens setting. maxTokens=0 omits max_tokens from the request, so the provider's default limit applies.

func (*Client) Call

func (c *Client) Call(ctx context.Context, messages []Message, systemBlocks []SystemBlock, tools []ToolDef) (*CallResult, error)

Call sends a chat completion request and returns the result. systemBlocks is optional: pass nil for providers that don't support the separate System field (OpenAI, DeepSeek). When systemBlocks is non-nil, the system prompt is sent in the "system" field instead of as a system message in the messages array (the Anthropic format for prompt caching).

func (*Client) SimpleCall

func (c *Client) SimpleCall(ctx context.Context, systemPrompt, userPrompt string) (string, error)

SimpleCall sends a single-turn chat completion request and returns the text response. No tools, no streaming, no thinking config. Used for lightweight LLM calls like skill risk assessment.

type FunctionDef

type FunctionDef struct {
	Name        string `json:"name"`
	Description string `json:"description"`
	Parameters  any    `json:"parameters"`
}

FunctionDef defines a single tool's function signature.

type Message

type Message struct {
	Role             string        `json:"role"`           // "system", "user", "assistant", "tool"
	Content          string        `json:"content"`        // text content
	Name             string        `json:"name,omitempty"` // tool name (for tool role)
	ToolCallID       string        `json:"tool_call_id,omitempty"`
	ToolCalls        []ToolCall    `json:"tool_calls,omitempty"`        // required for assistant role with tool calls
	ReasoningContent string        `json:"reasoning_content,omitempty"` // DeepSeek reasoning tokens, must be echoed back
	CacheControl     *CacheControl `json:"cache_control,omitempty"`     // Anthropic prompt caching marker
}

Message represents a chat message.

type SystemBlock

type SystemBlock struct {
	Type         string        `json:"type"` // "text"
	Text         string        `json:"text"`
	CacheControl *CacheControl `json:"cache_control,omitempty"`
}

SystemBlock represents an Anthropic-style system prompt block with optional cache control. OpenAI-compatible endpoints that don't support this format silently ignore the field.

type ThinkingConfig

type ThinkingConfig struct {
	Type         string `json:"type"`                    // "enabled" or "disabled"
	BudgetTokens int    `json:"budget_tokens,omitempty"` // Anthropic: max thinking tokens
}

ThinkingConfig controls extended thinking for DeepSeek and Anthropic models. Anthropic requires budget_tokens when type is "enabled"; DeepSeek ignores it.
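
How the Client's Thinking string turns into request fields is not spelled out here. One plausible mapping, offered purely as an assumption, sends "enabled" and "disabled" as a ThinkingConfig and sends "low", "medium", and "high" as reasoning_effort. The 5000-token fallback comes from the ThinkingBudget field comment; thinkingFields is a hypothetical function.

```go
package main

import "fmt"

type ThinkingConfig struct {
	Type         string `json:"type"`
	BudgetTokens int    `json:"budget_tokens,omitempty"`
}

// thinkingFields is a hypothetical mapping from the Client.Thinking setting
// to the two request fields. This is an assumed design, not the package's
// documented behaviour.
func thinkingFields(mode string, budget int) (*ThinkingConfig, string) {
	switch mode {
	case "enabled":
		if budget == 0 {
			budget = 5000 // default named in the ThinkingBudget comment
		}
		return &ThinkingConfig{Type: "enabled", BudgetTokens: budget}, ""
	case "disabled":
		return &ThinkingConfig{Type: "disabled"}, ""
	case "low", "medium", "high":
		return nil, mode // sent as reasoning_effort instead
	default:
		return nil, "" // empty or unknown: send neither field
	}
}

func main() {
	for _, mode := range []string{"enabled", "disabled", "high", ""} {
		cfg, effort := thinkingFields(mode, 0)
		if cfg != nil {
			fmt.Printf("%q -> thinking=%+v effort=%q\n", mode, *cfg, effort)
		} else {
			fmt.Printf("%q -> thinking=<nil> effort=%q\n", mode, effort)
		}
	}
}
```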

type ToolCall

type ToolCall struct {
	ID       string `json:"id"`
	Type     string `json:"type"` // always "function"
	Function struct {
		Name      string `json:"name"`
		Arguments string `json:"arguments"`
	} `json:"function"`
}

ToolCall represents a single tool invocation requested by the model. Matches the OpenAI API format exactly.
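
In the OpenAI format, Arguments is a string containing JSON rather than a nested object, so the caller decodes it separately. A minimal sketch, with decodeArguments and weatherArgs as hypothetical names:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ToolCall struct {
	ID       string `json:"id"`
	Type     string `json:"type"`
	Function struct {
		Name      string `json:"name"`
		Arguments string `json:"arguments"`
	} `json:"function"`
}

// weatherArgs is an invented argument shape for an example tool.
type weatherArgs struct {
	City string `json:"city"`
}

// decodeArguments is a hypothetical helper that parses the JSON-encoded
// Arguments string of a tool call into dst.
func decodeArguments(call ToolCall, dst any) error {
	if err := json.Unmarshal([]byte(call.Function.Arguments), dst); err != nil {
		return fmt.Errorf("tool %q: bad arguments: %w", call.Function.Name, err)
	}
	return nil
}

func main() {
	var call ToolCall
	call.ID = "call_1"
	call.Type = "function"
	call.Function.Name = "get_weather"
	call.Function.Arguments = `{"city":"Oslo"}`

	var args weatherArgs
	if err := decodeArguments(call, &args); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(args.City)
}
```

Models occasionally emit malformed argument strings, so treating the decode error as a normal, recoverable outcome is usually worthwhile.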

type ToolDef

type ToolDef struct {
	Type     string      `json:"type"`
	Function FunctionDef `json:"function"`
}

ToolDef describes one tool offered to the model: a Type plus a FunctionDef whose Parameters field holds the JSON Schema for the tool's arguments.
