Documentation ¶
Overview ¶
Package router defines HTTP route registration and middleware chaining, as well as model selection based on request scenarios.
Index ¶
- func GetFallbackChain(primary config.ModelConfig, fallbacks map[string][]config.ModelConfig) []config.ModelConfig
- func IsRetryableError(err error) bool
- func IsUsageLimitError(err error) bool
- type CapacityDecision
- type CircuitBreaker
- type CircuitState
- type EvaluationContext
- type FallbackHandler
- type FallbackResult
- type MessageContent
- type ModelOverridePolicy
- type ModelRouter
- func (r *ModelRouter) IsStreamingScenarioRoutingEnabled() bool
- func (r *ModelRouter) Route(messages []MessageContent, tokenCount int, requestedModel string) (RouteResult, error)
- func (r *ModelRouter) RouteForStreaming(messages []MessageContent, tokenCount int, requestedModel string) (RouteResult, error)
- func (r *ModelRouter) RouteWithOverride(requestedModel string) (RouteResult, bool)
- type Policy
- type PolicyEngine
- type RequestFacts
- type RouteDecision
- type RouteResult
- type Scenario
- type ScenarioPolicy
- type ScenarioResult
- type SkippedModel
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func GetFallbackChain ¶
func GetFallbackChain(primary config.ModelConfig, fallbacks map[string][]config.ModelConfig) []config.ModelConfig
GetFallbackChain returns the fallback chain for a given primary model.
func IsRetryableError ¶
func IsRetryableError(err error) bool
IsRetryableError determines if an error is worth retrying with a fallback.
func IsUsageLimitError ¶ added in v0.4.4
func IsUsageLimitError(err error) bool
IsUsageLimitError reports whether the error is a GoUsageLimitError. Usage limit errors should be passed directly to the client instead of triggering a fallback, because fallback attempts would hit the same usage limit within a short period.
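The check described above can be sketched with errors.As. This is only an illustration: usageLimitError is a hypothetical stand-in for GoUsageLimitError (whose definition is not shown on this page), and wrap, errTimeout, and shouldFallback are assumed helpers, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// usageLimitError is a hypothetical stand-in for GoUsageLimitError; the
// real type's fields are not shown in this documentation.
type usageLimitError struct{ msg string }

func (e *usageLimitError) Error() string { return e.msg }

// errTimeout is an example of an error that is not a usage limit error.
var errTimeout = errors.New("timeout")

// isUsageLimit reports whether err is, or wraps, a usageLimitError.
func isUsageLimit(err error) bool {
	var ule *usageLimitError
	return errors.As(err, &ule)
}

// wrap adds context while keeping the original error reachable via errors.As.
func wrap(err error) error { return fmt.Errorf("upstream call failed: %w", err) }

// shouldFallback is an assumed decision helper: usage limit errors go
// straight to the client, any other non-nil error may trigger a fallback.
func shouldFallback(err error) bool {
	return err != nil && !isUsageLimit(err)
}

func main() {
	err := wrap(&usageLimitError{msg: "quota exhausted"})
	fmt.Println(isUsageLimit(err), shouldFallback(err)) // true false
}
```

Using errors.As rather than a direct type assertion keeps the check working when the error has been wrapped with additional context.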
Types ¶
type CapacityDecision ¶ added in v0.3.5
type CapacityDecision struct {
Models []config.ModelConfig
Skipped []SkippedModel
InputTokens int
RequestedMaxTokens int
SelectedMaxTokens int
ContextWindow int
ContextMargin int
NeedsVision bool
NeedsTools bool
}
func FilterByCapacity ¶ added in v0.3.5
func FilterByCapacity(chain []config.ModelConfig, inputTokens int, requestedMaxTokens int, needsVision bool, needsTools bool) (CapacityDecision, error)
type CircuitBreaker ¶
type CircuitBreaker struct {
// contains filtered or unexported fields
}
CircuitBreaker tracks failure rates and prevents calls to failing models.
func NewCircuitBreaker ¶
func NewCircuitBreaker(threshold int, recoveryTimeout time.Duration) *CircuitBreaker
NewCircuitBreaker creates a circuit breaker with the given failure threshold and recovery timeout.
func (*CircuitBreaker) AllowRequest ¶
func (cb *CircuitBreaker) AllowRequest() bool
AllowRequest returns true if the circuit allows a request.
func (*CircuitBreaker) RecordFailure ¶
func (cb *CircuitBreaker) RecordFailure()
RecordFailure records a failed call.
func (*CircuitBreaker) RecordSuccess ¶
func (cb *CircuitBreaker) RecordSuccess()
RecordSuccess records a successful call.
func (*CircuitBreaker) State ¶
func (cb *CircuitBreaker) State() CircuitState
State returns the current circuit state.
type CircuitState ¶
type CircuitState int
CircuitState represents the state of a circuit breaker.
const (
CircuitClosed CircuitState = iota // Normal operation
CircuitHalfOpen // Testing if service recovered
CircuitOpen // Failing fast, not attempting calls
)
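To make the three states concrete, here is a minimal sketch of a threshold-based circuit breaker. It is not the package's implementation: the breaker type, its fields, and the exact transition rules (open after threshold failures, half-open once the recovery timeout elapses, closed again on success) are assumptions chosen to match the method names documented above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// state mirrors the three CircuitState values; the names are local stand-ins.
type state int

const (
	closed state = iota
	halfOpen
	open
)

// breaker is a hypothetical stand-in for CircuitBreaker.
type breaker struct {
	mu        sync.Mutex
	st        state
	failures  int
	threshold int
	timeout   time.Duration
	openedAt  time.Time
}

func newBreaker(threshold int, timeout time.Duration) *breaker {
	return &breaker{threshold: threshold, timeout: timeout}
}

// allow reports whether a call may proceed. An open circuit moves to
// half-open once the recovery timeout has elapsed.
func (b *breaker) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.st == open {
		if time.Since(b.openedAt) < b.timeout {
			return false
		}
		b.st = halfOpen
	}
	return true
}

// recordFailure counts a failure and opens the circuit when the threshold
// is reached, or immediately if a half-open probe failed.
func (b *breaker) recordFailure() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.failures++
	if b.st == halfOpen || b.failures >= b.threshold {
		b.st = open
		b.openedAt = time.Now()
	}
}

// recordSuccess resets the failure count and closes the circuit.
func (b *breaker) recordSuccess() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.failures = 0
	b.st = closed
}

func main() {
	b := newBreaker(2, time.Minute)
	b.recordFailure()
	b.recordFailure()
	fmt.Println(b.allow()) // false: the circuit is open
}
```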
type EvaluationContext ¶
type EvaluationContext struct {
Request *core.NormalizedRequest
TokenCount int
AvailableModels []config.ModelConfig
History []RouteDecision
}
EvaluationContext carries all information needed to evaluate routing policies.
type FallbackHandler ¶
type FallbackHandler struct {
// contains filtered or unexported fields
}
FallbackHandler manages model fallback with circuit breaker protection.
func NewFallbackHandler ¶
func NewFallbackHandler(logger *slog.Logger, cbThreshold int, cbTimeout time.Duration) *FallbackHandler
NewFallbackHandler creates a new fallback handler with circuit breakers.
func (*FallbackHandler) ExecuteWithFallback ¶
func (h *FallbackHandler) ExecuteWithFallback(
ctx context.Context,
models []config.ModelConfig,
executor func(context.Context, config.ModelConfig) ([]byte, error),
) (*FallbackResult, []byte, error)
ExecuteWithFallback tries models in sequence until one succeeds. Respects circuit breaker state to skip models that are failing repeatedly.
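The try-in-sequence behavior can be sketched as follows. This is a simplified illustration under stated assumptions: model stands in for config.ModelConfig, the skip callback stands in for the circuit breaker check, and tryInOrder and errUnavailable are hypothetical names that do not exist in the package.

```go
package main

import (
	"errors"
	"fmt"
)

// model is a hypothetical stand-in for config.ModelConfig.
type model struct{ ID string }

// errUnavailable is an example failure returned by an executor.
var errUnavailable = errors.New("model unavailable")

// tryInOrder calls exec for each model that skip does not exclude and
// returns the ID and response of the first model that succeeds. If every
// model fails or is skipped, it returns the last error seen.
func tryInOrder(
	models []model,
	skip func(model) bool,
	exec func(model) ([]byte, error),
) (string, []byte, error) {
	lastErr := errors.New("no model available")
	for _, m := range models {
		if skip(m) {
			continue // e.g. the model's circuit is open
		}
		out, err := exec(m)
		if err == nil {
			return m.ID, out, nil
		}
		lastErr = fmt.Errorf("%s: %w", m.ID, err)
	}
	return "", nil, lastErr
}

func main() {
	exec := func(m model) ([]byte, error) {
		if m.ID == "primary" {
			return nil, errUnavailable
		}
		return []byte("response from " + m.ID), nil
	}
	never := func(model) bool { return false }

	id, out, err := tryInOrder([]model{{ID: "primary"}, {ID: "backup"}}, never, exec)
	fmt.Println(id, string(out), err) // backup response from backup <nil>
}
```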
func (*FallbackHandler) GetCircuitStates ¶
func (h *FallbackHandler) GetCircuitStates() map[string]string
GetCircuitStates returns the state of all circuit breakers.
type FallbackResult ¶
type FallbackResult struct {
ModelID string
Success bool
Error error
Attempted int
TotalModels int
}
FallbackResult contains the result of a fallback attempt.
type MessageContent ¶
MessageContent represents a single message in a conversation.
type ModelOverridePolicy ¶
type ModelOverridePolicy struct {
// contains filtered or unexported fields
}
ModelOverridePolicy checks whether the requested model has an entry in model_overrides. If so, it uses that override as the primary and appends the default fallback chain.
func NewModelOverridePolicy ¶
func NewModelOverridePolicy(router *ModelRouter) *ModelOverridePolicy
NewModelOverridePolicy creates a model override policy.
func (*ModelOverridePolicy) Evaluate ¶
func (p *ModelOverridePolicy) Evaluate(ctx *EvaluationContext) ([]config.ModelConfig, RouteDecision, error)
Evaluate checks model_overrides for the requested model.
func (*ModelOverridePolicy) Name ¶
func (p *ModelOverridePolicy) Name() string
Name returns the policy identifier.
type ModelRouter ¶
type ModelRouter struct {
// contains filtered or unexported fields
}
ModelRouter handles model selection based on scenarios.
func NewModelRouter ¶
func NewModelRouter(atomic *config.AtomicConfig) *ModelRouter
NewModelRouter creates a new model router.
func (*ModelRouter) IsStreamingScenarioRoutingEnabled ¶
func (r *ModelRouter) IsStreamingScenarioRoutingEnabled() bool
IsStreamingScenarioRoutingEnabled returns whether streaming requests should use scenario-based routing instead of always routing to the fast model.
func (*ModelRouter) Route ¶
func (r *ModelRouter) Route(messages []MessageContent, tokenCount int, requestedModel string) (RouteResult, error)
Route determines which model to use for a request. If respect_requested_model is enabled and requestedModel is provided, it overrides scenario-based routing.
func (*ModelRouter) RouteForStreaming ¶
func (r *ModelRouter) RouteForStreaming(messages []MessageContent, tokenCount int, requestedModel string) (RouteResult, error)
RouteForStreaming determines which model to use for streaming requests. Prioritizes fast TTFT (time-to-first-token) over capability. If respect_requested_model is enabled and requestedModel is provided, it overrides scenario-based routing.
func (*ModelRouter) RouteWithOverride ¶
func (r *ModelRouter) RouteWithOverride(requestedModel string) (RouteResult, bool)
RouteWithOverride checks if the requested model matches a model_overrides entry.
When matched, the returned RouteResult uses the override ModelConfig as the primary. The fallback chain is fallbacks[<requestedModel>], falling back to fallbacks["default"] when the override key has no entry (matching the behavior of Route and RouteForStreaming). The caller (MessagesHandler) is expected to merge a scenario-derived safety-net chain on top.
Returns the override RouteResult and true if matched, or a zero value and false if the requested model has no entry in model_overrides.
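The fallback lookup described above (the entry for the requested model, otherwise the "default" entry) can be sketched like this. The sketch uses plain model ID strings instead of config.ModelConfig, fallbacksFor is a hypothetical helper name, and it assumes "no entry" means the key is absent from the map.

```go
package main

import "fmt"

// fallbacksFor returns fallbacks[requestedModel] when that key exists and
// fallbacks["default"] otherwise.
func fallbacksFor(fallbacks map[string][]string, requestedModel string) []string {
	if chain, ok := fallbacks[requestedModel]; ok {
		return chain
	}
	return fallbacks["default"]
}

func main() {
	fallbacks := map[string][]string{
		"default": {"general-a", "general-b"},
		"model-x": {"model-x-backup"},
	}
	fmt.Println(fallbacksFor(fallbacks, "model-x")) // [model-x-backup]
	fmt.Println(fallbacksFor(fallbacks, "model-y")) // [general-a general-b]
}
```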
type Policy ¶
type Policy interface {
// Name returns the policy identifier.
Name() string
// Evaluate examines the context and returns the model chain to try, the
// decision explanation, or an error if no model matches this policy.
Evaluate(ctx *EvaluationContext) ([]config.ModelConfig, RouteDecision, error)
}
Policy evaluates a routing strategy and selects a model chain.
type PolicyEngine ¶
type PolicyEngine struct {
// contains filtered or unexported fields
}
PolicyEngine composes multiple policies with ordered evaluation. Policies are evaluated in registration order; the first policy that returns a non-empty chain wins.
func NewPolicyEngine ¶
func NewPolicyEngine() *PolicyEngine
NewPolicyEngine creates a policy engine with the default set of policies:
- ModelOverridePolicy — check model_overrides config entries
- RespectRequestModelPolicy — check respect_requested_model config
- ScenarioPolicy — scenario-based routing (existing DetectScenario logic)
func (*PolicyEngine) AddPolicy ¶
func (eng *PolicyEngine) AddPolicy(p Policy)
AddPolicy appends a policy to the evaluation chain.
func (*PolicyEngine) Evaluate ¶
func (eng *PolicyEngine) Evaluate(ctx *EvaluationContext) ([]config.ModelConfig, RouteDecision, error)
Evaluate runs each policy in order and returns the first successful result.
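As an illustration of ordered evaluation where the first non-empty chain wins, the sketch below uses a reduced policy interface that returns model IDs rather than []config.ModelConfig and a RouteDecision. The overridePolicy and defaultPolicy types and the evaluate function are hypothetical and exist only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// policy is a reduced, hypothetical version of the Policy interface.
type policy interface {
	Name() string
	Evaluate(requested string) ([]string, error)
}

// overridePolicy matches only models that have an explicit override entry.
type overridePolicy struct{ overrides map[string][]string }

func (overridePolicy) Name() string { return "override" }

func (p overridePolicy) Evaluate(requested string) ([]string, error) {
	chain, ok := p.overrides[requested]
	if !ok {
		return nil, errors.New("no override for " + requested)
	}
	return chain, nil
}

// defaultPolicy always returns the same chain.
type defaultPolicy struct{ chain []string }

func (defaultPolicy) Name() string { return "default" }

func (p defaultPolicy) Evaluate(string) ([]string, error) { return p.chain, nil }

// evaluate walks the policies in registration order and returns the name
// and chain of the first policy that yields a non-empty chain without error.
func evaluate(policies []policy, requested string) (string, []string, error) {
	for _, p := range policies {
		chain, err := p.Evaluate(requested)
		if err == nil && len(chain) > 0 {
			return p.Name(), chain, nil
		}
	}
	return "", nil, errors.New("no policy matched")
}

func main() {
	policies := []policy{
		overridePolicy{overrides: map[string][]string{"model-x": {"model-x-override"}}},
		defaultPolicy{chain: []string{"general-a"}},
	}
	name, chain, _ := evaluate(policies, "model-y")
	fmt.Println(name, chain) // default [general-a]
}
```

Because evaluation stops at the first match, registration order doubles as priority order: a policy registered earlier shadows every later one for the requests it matches.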
func (*PolicyEngine) EvaluateDryRun ¶
func (eng *PolicyEngine) EvaluateDryRun(ctx *EvaluationContext) []RouteDecision
EvaluateDryRun returns all policy decisions without executing. Useful for debugging and the dry-run endpoint.
type RequestFacts ¶ added in v0.3.5
type RequestFacts struct {
LatestUserText string
LatestUserHasImage bool
LatestTextComplexIntent bool
NeedsVision bool
}
func AnalyzeRequestFacts ¶ added in v0.3.5
func AnalyzeRequestFacts(messages []MessageContent) RequestFacts
type RouteDecision ¶
type RouteDecision struct {
PolicyName string
ModelID string
Provider string
Reason string
Weight int
}
RouteDecision records a routing decision for observability.
type RouteResult ¶
type RouteResult struct {
Primary config.ModelConfig
Fallbacks []config.ModelConfig
Scenario Scenario
}
RouteResult contains the selected model and fallback chain.
func (*RouteResult) GetModelChain ¶
func (rr *RouteResult) GetModelChain() []config.ModelConfig
GetModelChain returns the full chain of models to try (primary + fallbacks).
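A chain of this shape can be built as shown in the sketch below. The model and routeResult types are hypothetical stand-ins for config.ModelConfig and RouteResult, and copying into a fresh slice is a design choice of this example, not something the documentation states about GetModelChain.

```go
package main

import "fmt"

// model and routeResult are hypothetical stand-ins for config.ModelConfig
// and RouteResult.
type model struct{ ID string }

type routeResult struct {
	Primary   model
	Fallbacks []model
}

// modelChain returns the primary followed by the fallbacks in a fresh
// slice, so appending to the result cannot modify rr.Fallbacks.
func (rr *routeResult) modelChain() []model {
	chain := make([]model, 0, 1+len(rr.Fallbacks))
	chain = append(chain, rr.Primary)
	return append(chain, rr.Fallbacks...)
}

func main() {
	rr := &routeResult{
		Primary:   model{ID: "primary"},
		Fallbacks: []model{{ID: "fb-1"}, {ID: "fb-2"}},
	}
	for _, m := range rr.modelChain() {
		fmt.Println(m.ID)
	}
}
```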
type Scenario ¶
type Scenario string
Scenario represents the routing scenario for model selection.
const (
ScenarioDefault Scenario = "default"
ScenarioBackground Scenario = "background"
ScenarioThink Scenario = "think"
ScenarioComplex Scenario = "complex"
ScenarioLongContext Scenario = "long_context"
ScenarioFast Scenario = "fast"
ScenarioOverride Scenario = "override"
ScenarioVision Scenario = "vision"
ScenarioVisionComplex Scenario = "vision_complex"
ScenarioVisionLongContext Scenario = "vision_long_context"
)
type ScenarioPolicy ¶
type ScenarioPolicy struct {
// contains filtered or unexported fields
}
ScenarioPolicy runs scenario-based routing using the existing DetectScenario logic. It handles both streaming and non-streaming paths.
func NewScenarioPolicy ¶
func NewScenarioPolicy(router *ModelRouter) *ScenarioPolicy
NewScenarioPolicy creates a scenario policy.
func (*ScenarioPolicy) Evaluate ¶
func (p *ScenarioPolicy) Evaluate(ctx *EvaluationContext) ([]config.ModelConfig, RouteDecision, error)
Evaluate runs scenario detection and returns the model chain.
func (*ScenarioPolicy) Name ¶
func (p *ScenarioPolicy) Name() string
Name returns the policy identifier.
type ScenarioResult ¶
ScenarioResult contains the detected scenario and token count.
func DetectScenario ¶
func DetectScenario(messages []MessageContent, tokenCount int, cfg *config.Config) ScenarioResult
DetectScenario analyzes a request to determine which model to use. Routing priority:
- Long Context (> threshold)
- Complex (architectural patterns or tool-heavy operations)
- Think (reasoning patterns)
- Background (simple operations with NO tools)
- Default
For streaming requests, consider using RouteForStreaming() to prefer faster models.
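The priority order listed above can be read as a cascade in which the first matching condition decides the scenario. The sketch below shows only that ordering: the facts struct and its boolean fields are hypothetical, and how the real DetectScenario recognizes complex, reasoning, or background requests is not described on this page.

```go
package main

import "fmt"

// facts is a hypothetical summary of a request; the real detection works
// on []MessageContent and *config.Config.
type facts struct {
	tokenCount    int
	complexIntent bool // architectural patterns or tool-heavy operations
	think         bool // reasoning patterns
	background    bool // simple operation with no tools
}

// detect applies the documented priority order and returns the scenario
// name, using the string values of the Scenario constants.
func detect(f facts, longContextThreshold int) string {
	switch {
	case f.tokenCount > longContextThreshold:
		return "long_context"
	case f.complexIntent:
		return "complex"
	case f.think:
		return "think"
	case f.background:
		return "background"
	default:
		return "default"
	}
}

func main() {
	fmt.Println(detect(facts{tokenCount: 200000, think: true}, 100000))                // long_context
	fmt.Println(detect(facts{tokenCount: 500, think: true, background: true}, 100000)) // think
}
```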
func RouteForStreaming ¶
func RouteForStreaming(messages []MessageContent, tokenCount int, cfg *config.Config) ScenarioResult
RouteForStreaming selects a model optimized for streaming latency. For streaming, we prioritize fast TTFT (time-to-first-token) over capability. This may return a less capable model but one that streams faster.