evaluator

package
v0.411.0 Latest
Published: Jun 1, 2026 License: MIT Imports: 20 Imported by: 0

Documentation

Constants

const ContextProfileKey contextKey = "context_profile"

ContextProfileKey is used to store a ContextProfile in the request context.

const SelectedTemplateKey contextKey = "selected_template_id"

SelectedTemplateKey is used to store the selected template ID in context.

Variables

This section is empty.

Functions

func UCBScore

func UCBScore(avgReward float64, totalSessions, timesUsed int, explorationC float64) float64

UCBScore computes the UCB1 value for a template:

	UCB_i = avg_reward_i + c * sqrt(ln(N) / n_i)

where N is the total number of sessions evaluated and n_i is the number of times template i was used.

Types

type ContextProfile added in v0.407.0

type ContextProfile struct {
	// TaskType is the classified task type (e.g. "code", "debug", "refactor").
	TaskType string
	// RelevantToolNames lists the tool names to keep for this task.
	// An empty slice means keep all tools (no filtering).
	RelevantToolNames []string
	// SkipSections lists prompt section names that are not needed for this task.
	SkipSections []string
	// Confidence is a 0.0–1.0 measure of the trimmer's certainty.
	// Profiles with Confidence < 0.5 should be ignored and defaults used.
	Confidence float64
}

ContextProfile is produced by the ContextTrimmer at session start. It guides the PromptBuilder and agent tool assembly to include only what is relevant for the current task.
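A hedged sketch of how a consumer might apply a profile to a tool list, following the field comments above: a low-confidence profile is ignored, and an empty RelevantToolNames means no filtering. The filterTools helper and the tool names are hypothetical; the struct is redeclared locally so the example is self-contained.

```go
package main

import "fmt"

// ContextProfile mirrors the documented struct fields.
type ContextProfile struct {
	TaskType          string
	RelevantToolNames []string
	SkipSections      []string
	Confidence        float64
}

// filterTools (hypothetical helper) returns the tool names to keep.
// It keeps everything when the profile is nil, has Confidence < 0.5,
// or lists no relevant tools.
func filterTools(all []string, p *ContextProfile) []string {
	if p == nil || p.Confidence < 0.5 || len(p.RelevantToolNames) == 0 {
		return all
	}
	keep := make(map[string]bool, len(p.RelevantToolNames))
	for _, name := range p.RelevantToolNames {
		keep[name] = true
	}
	var out []string
	for _, name := range all {
		if keep[name] {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	tools := []string{"read_file", "grep", "web_search"} // illustrative names
	p := &ContextProfile{
		TaskType:          "debug",
		RelevantToolNames: []string{"grep", "read_file"},
		Confidence:        0.9,
	}
	fmt.Println(filterTools(tools, p))

	// Below the 0.5 threshold the profile is ignored and all tools are kept.
	p.Confidence = 0.3
	fmt.Println(filterTools(tools, p))
}
```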

type ContextTrimmer added in v0.407.0

type ContextTrimmer struct {
	// contains filtered or unexported fields
}

ContextTrimmer uses a cheap LLM to produce a ContextProfile at session start. It complements the post-session Judge: where the Judge asks "what went wrong?", the trimmer asks "what is needed for this task?"

func NewContextTrimmer added in v0.407.0

func NewContextTrimmer(cfg config.EvaluatorConfig, j *Judge) *ContextTrimmer

NewContextTrimmer creates a ContextTrimmer that reuses the evaluator's judge infrastructure. Returns nil if the evaluator has no judge configured.

func (*ContextTrimmer) ProfileTask added in v0.407.0

func (ct *ContextTrimmer) ProfileTask(
	ctx context.Context,
	firstMessage string,
	availableTools []ToolInfo,
) (*ContextProfile, error)

ProfileTask analyzes the user's first message and returns a ContextProfile describing which tools and prompt sections are relevant for the task.

The LLM call is bounded by a 3-second timeout; on timeout or error the method returns nil so callers can fall back to defaults. Results are cached by a hash of (firstMessage + toolNames) to avoid redundant LLM calls.

type EvaluatorService

type EvaluatorService struct {
	// contains filtered or unexported fields
}

EvaluatorService is the concrete implementation of Service.

func New

New creates a new EvaluatorService. Returns nil if disabled.

func (*EvaluatorService) ClassifyTask added in v0.407.0

func (s *EvaluatorService) ClassifyTask(text string) string

ClassifyTask returns a task type label from the user's first message. It uses compiled patterns from config, evaluated in order; first match wins. Returns "general" if no pattern matches or the text is empty.
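An ordered, first-match-wins classifier of this kind can be sketched with the standard regexp package. The rule list below is invented for illustration (the real patterns come from config and are not shown in this documentation), and classify is a local stand-in for the method.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a compiled pattern with the label it yields.
type rule struct {
	label   string
	pattern *regexp.Regexp
}

// rules is a hypothetical ordered pattern list; order determines priority.
var rules = []rule{
	{"debug", regexp.MustCompile(`(?i)\b(bug|error|crash|failing)\b`)},
	{"refactor", regexp.MustCompile(`(?i)\b(refactor|rename|clean ?up)\b`)},
	{"code", regexp.MustCompile(`(?i)\b(implement|write|add)\b`)},
}

// classify returns the label of the first matching rule,
// or "general" when the text is empty or nothing matches.
func classify(text string) string {
	if text == "" {
		return "general"
	}
	for _, r := range rules {
		if r.pattern.MatchString(text) {
			return r.label
		}
	}
	return "general"
}

func main() {
	// Matches both the "code" and "debug" rules; "debug" is listed first.
	fmt.Println(classify("Add a fix for the crash"))
	fmt.Println(classify("Rename this package"))
	fmt.Println(classify("hello there"))
}
```

Compiling the patterns once, up front, keeps each classification to a linear scan over the rule list.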

func (*EvaluatorService) EvaluateSession

func (s *EvaluatorService) EvaluateSession(ctx context.Context, sessionID string) error

EvaluateSession triggers evaluation of a completed session.

func (*EvaluatorService) GetActiveSkills

func (s *EvaluatorService) GetActiveSkills(ctx context.Context, taskType string) ([]Skill, error)

GetActiveSkills returns active skills for a task type.

func (*EvaluatorService) GetStats

func (s *EvaluatorService) GetStats(ctx context.Context) (*Stats, error)

GetStats returns system statistics for TUI display.

func (*EvaluatorService) IsEnabled

func (s *EvaluatorService) IsEnabled() bool

IsEnabled returns whether the evaluator is active.

func (*EvaluatorService) NewContextTrimmer added in v0.407.0

func (s *EvaluatorService) NewContextTrimmer() *ContextTrimmer

NewContextTrimmer creates a ContextTrimmer backed by this service's judge infrastructure. Returns nil if the evaluator has no judge configured or if the evaluator itself is nil.

func (*EvaluatorService) RecordTemplateSelection

func (s *EvaluatorService) RecordTemplateSelection(_ context.Context, sessionID, templateID string)

RecordTemplateSelection stores the template used in this session for later evaluation.

func (*EvaluatorService) SelectTemplate

func (s *EvaluatorService) SelectTemplate(ctx context.Context, sectionName string) (*PromptTemplate, error)

SelectTemplate returns the best template for a section using UCB.
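Selection by UCB amounts to scoring each candidate and taking the maximum. The sketch below assumes a hypothetical candidate record and treats never-used templates as having an infinite score; how the package loads candidates or breaks ties is not documented here.

```go
package main

import (
	"fmt"
	"math"
)

// candidate is a hypothetical per-template record.
type candidate struct {
	ID        string
	AvgReward float64
	TimesUsed int
}

// selectByUCB returns the candidate with the highest UCB1 score,
// or nil when there are no candidates. Ties go to the earliest entry.
func selectByUCB(cands []candidate, totalSessions int, c float64) *candidate {
	var best *candidate
	bestScore := math.Inf(-1)
	for i := range cands {
		score := math.Inf(1) // assumption: unused templates are tried first
		if cands[i].TimesUsed > 0 && totalSessions > 0 {
			bonus := c * math.Sqrt(math.Log(float64(totalSessions))/float64(cands[i].TimesUsed))
			score = cands[i].AvgReward + bonus
		}
		if score > bestScore {
			best, bestScore = &cands[i], score
		}
	}
	return best
}

func main() {
	cands := []candidate{
		{ID: "concise", AvgReward: 0.9, TimesUsed: 40},
		{ID: "verbose", AvgReward: 0.5, TimesUsed: 40},
	}
	if best := selectByUCB(cands, 80, 1.0); best != nil {
		fmt.Println(best.ID)
	}
}
```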

type Judge added in v0.244.0

type Judge struct {
	// contains filtered or unexported fields
}

Judge calls an LLM to evaluate session quality.

func (*Judge) Evaluate added in v0.244.0

func (j *Judge) Evaluate(ctx context.Context, meta JudgeMeta, customPromptTemplate string) (*JudgeOutput, error)

Evaluate calls the judge model with the session transcript and returns structured output.

type JudgeMeta

type JudgeMeta struct {
	TemplateName    string
	TemplateVersion int
	Corrections     int
	Tokens          int64
	Transcript      string
}

JudgeMeta holds metadata passed to the judge prompt.

type JudgeOutput

type JudgeOutput struct {
	Reasoning  string   `json:"reasoning"`
	KeyPoints  []string `json:"key_points"`
	NewSkill   string   `json:"new_skill"`
	TaskType   string   `json:"task_type"`
	Confidence float64  `json:"confidence"`
}

JudgeOutput is the structured response from the LLM judge model.
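Given the JSON tags above, a raw judge response can be decoded with encoding/json. This is a sketch only: parseJudgeOutput is a hypothetical helper, the sample payload is made up, and the package may validate or post-process the response in ways not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JudgeOutput mirrors the documented struct and its JSON tags.
type JudgeOutput struct {
	Reasoning  string   `json:"reasoning"`
	KeyPoints  []string `json:"key_points"`
	NewSkill   string   `json:"new_skill"`
	TaskType   string   `json:"task_type"`
	Confidence float64  `json:"confidence"`
}

// parseJudgeOutput (hypothetical helper) decodes a raw JSON response.
func parseJudgeOutput(raw []byte) (*JudgeOutput, error) {
	var out JudgeOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, fmt.Errorf("decode judge output: %w", err)
	}
	return &out, nil
}

func main() {
	// Illustrative payload, not a real model response.
	raw := []byte(`{
		"reasoning": "The user corrected the agent twice.",
		"key_points": ["ran tests late", "missed an import"],
		"new_skill": "Run the test suite before proposing a fix.",
		"task_type": "debug",
		"confidence": 0.8
	}`)
	out, err := parseJudgeOutput(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out.TaskType, out.Confidence, len(out.KeyPoints))
}
```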

type PromptTemplate

type PromptTemplate struct {
	ID        string
	Name      string
	Section   string
	Content   string
	Version   int
	IsDefault bool
}

PromptTemplate represents a versioned prompt template variant.

type RewardResult

type RewardResult struct {
	Total            float64
	SuccessScore     float64
	EfficiencyScore  float64
	PromptTokens     int64
	CompletionTokens int64
	MessageCount     int64
	UserCorrections  int
}

RewardResult holds the decomposed reward calculation for a session.

type Service

type Service interface {
	// EvaluateSession triggers evaluation of a completed session (async if configured).
	EvaluateSession(ctx context.Context, sessionID string) error

	// SelectTemplate returns the best prompt template for a section using UCB.
	// Returns nil if insufficient history or evaluator disabled.
	SelectTemplate(ctx context.Context, sectionName string) (*PromptTemplate, error)

	// GetActiveSkills returns skills to inject into prompts for a given task type.
	GetActiveSkills(ctx context.Context, taskType string) ([]Skill, error)

	// GetStats returns current UCB rankings and skill library summary.
	GetStats(ctx context.Context) (*Stats, error)

	// IsEnabled returns whether the evaluator is active.
	IsEnabled() bool

	// RecordTemplateSelection records which template was selected for a session.
	RecordTemplateSelection(ctx context.Context, sessionID, templateID string)

	// ClassifyTask returns a task type label from the user's first message.
	// Returns "general" if no pattern matches.
	ClassifyTask(text string) string
}

Service defines the evaluator interface used by other packages.

type Skill

type Skill struct {
	ID          string
	Title       string
	Content     string
	TaskType    string
	SuccessRate float64
	UsageCount  int
}

Skill represents a learned optimization rule from the Skill Library.

type Stats

type Stats struct {
	TotalEvaluations int
	Templates        []TemplateStats
	SkillCount       int
	TopSkills        []Skill
	AvgReward        float64
	LastEvaluation   time.Time
	IsEnabled        bool
}

Stats holds overall statistics for the self-improvement system.

type TemplateStats

type TemplateStats struct {
	Template  PromptTemplate
	TimesUsed int
	AvgReward float64
	UCBScore  float64
	Rank      int
}

TemplateStats holds UCB statistics for a template (for TUI display).

type ToolInfo added in v0.407.0

type ToolInfo struct {
	Name        string
	Description string
}

ToolInfo carries the minimal information the trimmer needs to describe a tool.
