Documentation ¶
Overview ¶
Package model provides model integration for Orla Agent Mode (RFC 4) and cache control for LLM backends (RFC 5).
Index ¶
- func ParseModelIdentifier(modelID string) (provider, modelName string, err error)
- type CacheController
- type CacheState
- type ContentEvent
- type Message
- type MessageRole
- type OllamaProvider
- func (p *OllamaProvider) Chat(ctx context.Context, messages []Message, tools []*mcp.Tool, stream bool, ...) (*Response, <-chan StreamEvent, error)
- func (p *OllamaProvider) EnsureReady(ctx context.Context) error
- func (p *OllamaProvider) Name() string
- func (p *OllamaProvider) SetTimeout(timeout time.Duration)
- type OpenAIProvider
- type Provider
- type Response
- type ResponseMetrics
- type SGLangCacheController
- type StreamEvent
- type StreamEventType
- type StreamWriter
- type ThinkingEvent
- type ToolCallEvent
- type ToolCallWithID
- type ToolResultWithID
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ParseModelIdentifier ¶
ParseModelIdentifier parses a model identifier string (e.g., "ollama:llama3") and returns the provider name and model name
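A minimal usage sketch (the "ollama:llama3" identifier mirrors the example above; the model import path, fmt, and log are assumed to be imported):

    provider, modelName, err := model.ParseModelIdentifier("ollama:llama3")
    if err != nil {
        // e.g., a malformed identifier
        log.Fatalf("parse model identifier: %v", err)
    }
    fmt.Println(provider, modelName) // presumably prints: ollama llama3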
Types ¶
type CacheController ¶ added in v1.0.0
type CacheController interface {
    // FlushCache flushes the KV cache for the backend
    FlushCache(ctx context.Context) error
    // GetCacheState returns the current cache state
    GetCacheState() CacheState
    // GetMemoryPressure returns the current memory pressure (0.0-1.0)
    // Returns 0.0 if memory pressure cannot be determined
    GetMemoryPressure(ctx context.Context) (float64, error)
}
CacheController is the interface for backend-specific cache control
func NewCacheController ¶ added in v1.0.0
func NewCacheController(serverConfig *config.LLMServerConfig) (CacheController, error)
NewCacheController creates a cache controller based on the LLM server configuration
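A hedged sketch of driving the controller from memory pressure. Here serverConfig is assumed to come from the application's configuration loading, and maybeFlush and the 0.9 threshold are illustrative names, not part of the package:

    func maybeFlush(ctx context.Context, serverConfig *config.LLMServerConfig) error {
        ctrl, err := model.NewCacheController(serverConfig)
        if err != nil {
            return err
        }
        pressure, err := ctrl.GetMemoryPressure(ctx)
        if err != nil {
            return err
        }
        if pressure > 0.9 { // illustrative threshold
            if err := ctrl.FlushCache(ctx); err != nil {
                return err
            }
        }
        state := ctrl.GetCacheState()
        log.Printf("cache flushed=%v lastFlush=%d", state.IsFlushed, state.LastFlushTime)
        return nil
    }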
type CacheState ¶ added in v1.0.0
type CacheState struct {
    // IsFlushed indicates whether the cache has been flushed
    IsFlushed bool
    // LastFlushTime is the timestamp of the last flush
    LastFlushTime int64
}
CacheState represents the state of a cache
type ContentEvent ¶
type ContentEvent struct {
    Content string
}
ContentEvent represents a content chunk in the stream
func (*ContentEvent) Type ¶
func (e *ContentEvent) Type() StreamEventType
type Message ¶
type Message struct {
    Role       MessageRole `json:"role"`                   // "user", "assistant", "system", or "tool"
    Content    string      `json:"content"`                // Message content
    ToolName   string      `json:"tool_name,omitempty"`    // Tool name; required when role is "tool" for Ollama
    ToolCallID string      `json:"tool_call_id,omitempty"` // Tool call ID; required when role is "tool" for the OpenAI API
}
Message represents a chat message in a conversation
type MessageRole ¶
type MessageRole string
const (
    MessageRoleUser      MessageRole = "user"
    MessageRoleAssistant MessageRole = "assistant"
    MessageRoleSystem    MessageRole = "system"
    MessageRoleTool      MessageRole = "tool"
)
func (MessageRole) String ¶
func (r MessageRole) String() string
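A sketch of building a conversation history for Chat; the tool name, call ID, and content values are illustrative:

    messages := []model.Message{
        {Role: model.MessageRoleSystem, Content: "You are a helpful assistant."},
        {Role: model.MessageRoleUser, Content: "List the files in the workspace."},
        {
            Role:       model.MessageRoleTool,
            Content:    `{"files": ["main.go", "go.mod"]}`,
            ToolName:   "list_files", // used by the Ollama provider
            ToolCallID: "call_1",     // used by the OpenAI-compatible provider
        },
    }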
type OllamaProvider ¶
type OllamaProvider struct {
    // contains filtered or unexported fields
}
OllamaProvider implements the Provider interface for Ollama
func NewOllamaProvider ¶
func NewOllamaProvider(modelName string, cfg *config.OrlaConfig) (*OllamaProvider, error)
NewOllamaProvider creates a new Ollama provider
func (*OllamaProvider) Chat ¶
func (p *OllamaProvider) Chat(ctx context.Context, messages []Message, tools []*mcp.Tool, stream bool, maxTokens int) (*Response, <-chan StreamEvent, error)
Chat sends a chat request to Ollama
func (*OllamaProvider) EnsureReady ¶
func (p *OllamaProvider) EnsureReady(ctx context.Context) error
EnsureReady ensures Ollama is running and ready. It checks whether Ollama is accessible via an HTTP health check.
func (*OllamaProvider) SetTimeout ¶
func (p *OllamaProvider) SetTimeout(timeout time.Duration)
SetTimeout sets the timeout for the Ollama provider
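A sketch of constructing the Ollama provider directly; newOllama is a hypothetical helper, cfg is assumed to be the application's loaded config, and the model name and timeout are illustrative:

    func newOllama(ctx context.Context, cfg *config.OrlaConfig) (*model.OllamaProvider, error) {
        p, err := model.NewOllamaProvider("llama3", cfg)
        if err != nil {
            return nil, err
        }
        p.SetTimeout(5 * time.Minute) // illustrative timeout
        if err := p.EnsureReady(ctx); err != nil {
            return nil, err // Ollama not reachable via its HTTP health check
        }
        return p, nil
    }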
type OpenAIProvider ¶ added in v1.0.0
type OpenAIProvider struct {
    // contains filtered or unexported fields
}
OpenAIProvider implements the Provider interface for OpenAI-compatible APIs. This provider is intended to work with any server that implements the OpenAI Chat Completions API format, such as LM Studio, vLLM, and even Ollama (even though we have a separate Ollama provider). For Ollama, this goes through Ollama's OpenAI-compatible API [1].
[1] https://docs.ollama.com/api/openai-compatibility
func NewOpenAIProvider ¶ added in v1.0.0
func NewOpenAIProvider(modelName string, cfg *config.OrlaConfig) (*OpenAIProvider, error)
NewOpenAIProvider creates a new OpenAI-compatible provider. This works with any server that implements the OpenAI Chat Completions API format.
func (*OpenAIProvider) Chat ¶ added in v1.0.0
func (p *OpenAIProvider) Chat(ctx context.Context, messages []Message, tools []*mcp.Tool, stream bool, maxTokens int) (*Response, <-chan StreamEvent, error)
Chat sends a chat request to the OpenAI-compatible API. This works with any server implementing the OpenAI Chat Completions API format.
func (*OpenAIProvider) EnsureReady ¶ added in v1.0.0
func (p *OpenAIProvider) EnsureReady(ctx context.Context) error
EnsureReady is a no-op for the OpenAI-compatible provider.
func (*OpenAIProvider) Name ¶ added in v1.0.0
func (p *OpenAIProvider) Name() string
Name returns the provider name
type Provider ¶
type Provider interface {
    // Name returns the provider name (e.g., "ollama", "openai", "anthropic")
    Name() string
    // Chat sends a chat request to the model and returns the response
    // messages: conversation history
    // tools: available tools (for tool calling) - uses mcp.Tool for MCP compatibility
    // stream: if true, stream responses via the returned channel
    // maxTokens: maximum number of tokens to generate; 0 means no limit (use provider default)
    Chat(ctx context.Context, messages []Message, tools []*mcp.Tool, stream bool, maxTokens int) (*Response, <-chan StreamEvent, error)
    // EnsureReady ensures the model provider is ready (e.g., starts Ollama if needed)
    // Returns an error if the provider cannot be made ready
    EnsureReady(ctx context.Context) error
}
Provider is the interface that all model providers must implement
func NewProvider ¶
func NewProvider(cfg *config.OrlaConfig) (Provider, error)
NewProvider creates a new model provider based on the configuration
func NewProviderFromLLMServerConfig ¶ added in v1.0.0
func NewProviderFromLLMServerConfig(serverConfig *config.LLMServerConfig) (Provider, error)
NewProviderFromLLMServerConfig creates a new model provider from an LLM server configuration (RFC 5)
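A hedged, non-streaming sketch tying the pieces together. runOnce is a hypothetical helper, cfg is assumed to be loaded elsewhere, and the sketch assumes the returned channel can be ignored when stream is false:

    func runOnce(ctx context.Context, cfg *config.OrlaConfig, messages []model.Message, tools []*mcp.Tool) (string, error) {
        p, err := model.NewProvider(cfg)
        if err != nil {
            return "", err
        }
        if err := p.EnsureReady(ctx); err != nil {
            return "", err
        }
        // stream=false, maxTokens=0 (no limit / provider default per the interface docs)
        resp, _, err := p.Chat(ctx, messages, tools, false, 0)
        if err != nil {
            return "", err
        }
        return resp.Content, nil
    }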
type Response ¶
type Response struct {
    Content     string             `json:"content"`      // Text content from the model
    Thinking    string             `json:"thinking"`     // Thinking trace from the model (if supported)
    ToolCalls   []ToolCallWithID   `json:"tool_calls"`   // Tool calls requested by the model
    ToolResults []ToolResultWithID `json:"tool_results"` // Tool results returned by the model
    Metrics     *ResponseMetrics   `json:"metrics"`      // Response metrics
}
Response represents a model response
type ResponseMetrics ¶ added in v1.1.0
type ResponseMetrics struct {
    // TTFTMs is time to first token in milliseconds. Only set when task was executed with streaming.
    TTFTMs int64 `json:"ttft_ms,omitempty"`
    // TPOTMs is time per output token in milliseconds. Only set when task was executed with streaming.
    TPOTMs int64 `json:"tpot_ms,omitempty"`
}
type SGLangCacheController ¶ added in v1.0.0
type SGLangCacheController struct {
    // contains filtered or unexported fields
}
SGLangCacheController implements cache control for SGLang backends
func NewSGLangCacheController ¶ added in v1.0.0
func NewSGLangCacheController(baseURL string, client *http.Client) *SGLangCacheController
NewSGLangCacheController creates a new SGLang cache controller
func (*SGLangCacheController) FlushCache ¶ added in v1.0.0
func (c *SGLangCacheController) FlushCache(ctx context.Context) error
FlushCache flushes the KV cache by calling SGLang's /flush_cache endpoint
func (*SGLangCacheController) GetCacheState ¶ added in v1.0.0
func (c *SGLangCacheController) GetCacheState() CacheState
GetCacheState returns the current cache state
func (*SGLangCacheController) GetMemoryPressure ¶ added in v1.0.2
func (c *SGLangCacheController) GetMemoryPressure(ctx context.Context) (float64, error)
GetMemoryPressure queries SGLang for the current KV cache memory pressure. Returns the KV cache utilization as a fraction (0.0-1.0).
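A short sketch of polling SGLang directly; the base URL is illustrative, and ctx is assumed to exist in the surrounding code:

    ctrl := model.NewSGLangCacheController("http://localhost:30000", http.DefaultClient)
    pressure, err := ctrl.GetMemoryPressure(ctx)
    if err != nil {
        log.Printf("memory pressure unavailable: %v", err)
    } else {
        log.Printf("KV cache utilization: %.2f", pressure)
    }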
type StreamEvent ¶
type StreamEvent interface {
    // Type returns the type of stream event
    Type() StreamEventType
}
StreamEvent represents a single event in the streaming response
type StreamEventType ¶
type StreamEventType string
StreamEventType represents the type of stream event
const (
    StreamEventTypeContent  StreamEventType = "content"  // Text content chunk
    StreamEventTypeToolCall StreamEventType = "toolcall" // Tool call notification
    StreamEventTypeThinking StreamEventType = "thinking" // Thinking trace chunk
)
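A sketch of consuming the channel returned by Chat with stream=true, inside an error-returning function. It assumes events are delivered as pointers (the Type methods have pointer receivers) and only drains the channel, since this doc does not specify whether *Response is populated for streaming calls:

    _, events, err := p.Chat(ctx, messages, tools, true, 0)
    if err != nil {
        return err
    }
    for ev := range events {
        switch e := ev.(type) {
        case *model.ContentEvent:
            fmt.Print(e.Content) // text chunk
        case *model.ThinkingEvent:
            _ = e.Content // thinking trace chunk, if the model emits one
        case *model.ToolCallEvent:
            fmt.Println("[tool call requested]") // fields not shown in this doc
        }
    }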
type StreamWriter ¶
StreamWriter is an interface for writing streaming responses
type ThinkingEvent ¶
type ThinkingEvent struct {
    Content string
}
ThinkingEvent represents a thinking trace chunk in the stream
func (*ThinkingEvent) Type ¶
func (e *ThinkingEvent) Type() StreamEventType
type ToolCallEvent ¶
ToolCallEvent represents a tool call notification in the stream
func (*ToolCallEvent) Type ¶
func (e *ToolCallEvent) Type() StreamEventType
type ToolCallWithID ¶
type ToolCallWithID struct {
    ID                string `json:"id"` // Unique identifier for this tool call
    McpCallToolParams mcp.CallToolParams
}
ToolCallWithID represents a tool invocation request from the model. It embeds mcp.CallToolParams for MCP compatibility, and adds an ID for tracking in the agent loop (to match results back to calls).
type ToolResultWithID ¶
type ToolResultWithID struct {
    ID                string `json:"id"` // Tool call ID this result corresponds to
    McpCallToolResult mcp.CallToolResult
}
ToolResultWithID represents the result of a tool execution. It embeds mcp.CallToolResult for MCP compatibility, and adds an ID to match back to the original ToolCall.
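A sketch of pairing results back to their originating calls by ID, which is what the ID fields are intended for. resp is assumed to be a *Response from a previous Chat call, and the access to call.McpCallToolParams.Name assumes the mcp package exposes a Name field on CallToolParams:

    resultsByID := make(map[string]model.ToolResultWithID)
    for _, r := range resp.ToolResults {
        resultsByID[r.ID] = r
    }
    for _, call := range resp.ToolCalls {
        if res, ok := resultsByID[call.ID]; ok {
            // Name field access is an assumption about mcp.CallToolParams.
            log.Printf("tool %s (call %s) produced result %s", call.McpCallToolParams.Name, call.ID, res.ID)
        }
    }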