evaluator

package
Version: v0.60.0
Warning

This package is not in the latest version of its module.

Published: Mar 24, 2026 License: MIT Imports: 13 Imported by: 0

Documentation

Constants

const SelectedTemplateKey contextKey = "selected_template_id"

SelectedTemplateKey is used to store the selected template ID in context.

Variables

This section is empty.

Functions

func UCBScore

func UCBScore(avgReward float64, totalSessions, timesUsed int, explorationC float64) float64

UCBScore computes the UCB1 value for a template:

	UCB_i = avg_reward_i + c * sqrt(ln(N) / n_i)

where N is the total number of sessions evaluated and n_i is the number of times template i was used.

Types

type EvaluatorService

type EvaluatorService struct {
	// contains filtered or unexported fields
}

EvaluatorService is the concrete implementation of Service.

func New

New creates a new EvaluatorService. Returns nil if disabled.
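A constructor that returns a nil pointer calls for some care when the result is stored in an interface value: the interface is then non-nil even though the pointer inside it is nil. The sketch below illustrates this general Go behavior with hypothetical stand-ins (enabledChecker, evaluatorService, newService). New's real parameters are not shown in this documentation, and whether the real methods are safe to call on a nil receiver is not stated.

```go
package main

import "fmt"

// enabledChecker is a hypothetical one-method stand-in for Service.
type enabledChecker interface {
	IsEnabled() bool
}

// evaluatorService is a hypothetical stand-in for EvaluatorService.
type evaluatorService struct{}

// IsEnabled is written to be safe on a nil receiver. Assumption: the real
// method's behavior on a nil receiver is not documented.
func (s *evaluatorService) IsEnabled() bool { return s != nil }

// newService is a hypothetical constructor that returns nil when disabled.
func newService(enabled bool) *evaluatorService {
	if !enabled {
		return nil
	}
	return &evaluatorService{}
}

func main() {
	svc := newService(false)
	fmt.Println(svc == nil) // true: the pointer check works as expected

	var iface enabledChecker = svc
	fmt.Println(iface == nil)      // false: the interface holds a typed nil pointer
	fmt.Println(iface.IsEnabled()) // false
}
```

Checking the pointer for nil before assigning it to an interface variable, or relying on IsEnabled, avoids the misleading non-nil comparison.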

func (*EvaluatorService) EvaluateSession

func (s *EvaluatorService) EvaluateSession(ctx context.Context, sessionID string) error

EvaluateSession triggers evaluation of a completed session.

func (*EvaluatorService) GetActiveSkills

func (s *EvaluatorService) GetActiveSkills(ctx context.Context, taskType string) ([]Skill, error)

GetActiveSkills returns active skills for a task type.

func (*EvaluatorService) GetStats

func (s *EvaluatorService) GetStats(ctx context.Context) (*Stats, error)

GetStats returns system statistics for TUI display.

func (*EvaluatorService) IsEnabled

func (s *EvaluatorService) IsEnabled() bool

IsEnabled returns whether the evaluator is active.

func (*EvaluatorService) RecordTemplateSelection

func (s *EvaluatorService) RecordTemplateSelection(_ context.Context, sessionID, templateID string)

RecordTemplateSelection stores the template used in this session for later evaluation.

func (*EvaluatorService) SelectTemplate

func (s *EvaluatorService) SelectTemplate(ctx context.Context, sectionName string) (*PromptTemplate, error)

SelectTemplate returns the best template for a section using UCB.
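Selection by UCB amounts to scoring each candidate and keeping the maximum. The sketch below shows that idea under stated assumptions: candidate and pickByUCB are hypothetical names, unused candidates are given a score of +Inf, and ties keep the earlier candidate. How SelectTemplate loads its candidates and resolves ties is not documented here.

```go
package main

import (
	"fmt"
	"math"
)

// candidate is a hypothetical, trimmed-down view of a template's history.
type candidate struct {
	ID        string
	AvgReward float64
	TimesUsed int
}

// pickByUCB returns the candidate with the highest UCB1 score, or nil when
// there are no candidates. Assumption: unused candidates score +Inf so each
// one is tried at least once.
func pickByUCB(cands []candidate, totalSessions int, c float64) *candidate {
	var best *candidate
	bestScore := math.Inf(-1)
	for i := range cands {
		score := math.Inf(1)
		if cands[i].TimesUsed > 0 && totalSessions > 0 {
			n := float64(cands[i].TimesUsed)
			score = cands[i].AvgReward + c*math.Sqrt(math.Log(float64(totalSessions))/n)
		}
		if best == nil || score > bestScore {
			best, bestScore = &cands[i], score
		}
	}
	return best
}

func main() {
	cands := []candidate{
		{ID: "a", AvgReward: 0.8, TimesUsed: 40},
		{ID: "b", AvgReward: 0.6, TimesUsed: 5},
	}
	// Without exploration the higher average wins.
	fmt.Println(pickByUCB(cands, 45, 0).ID) // a
	// With c = 1 the rarely used template gets the larger bonus.
	fmt.Println(pickByUCB(cands, 45, 1).ID) // b
	// A never-used template is picked first.
	cands = append(cands, candidate{ID: "c"})
	fmt.Println(pickByUCB(cands, 45, 1).ID) // c
}
```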

type JudgeMeta

type JudgeMeta struct {
	TemplateName    string
	TemplateVersion int
	Corrections     int
	Tokens          int64
	Transcript      string
}

JudgeMeta holds metadata passed to the judge prompt.

type JudgeOutput

type JudgeOutput struct {
	Reasoning  string   `json:"reasoning"`
	KeyPoints  []string `json:"key_points"`
	NewSkill   string   `json:"new_skill"`
	TaskType   string   `json:"task_type"`
	Confidence float64  `json:"confidence"`
}

JudgeOutput is the structured response from the LLM judge model.
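Given the JSON tags above, a judge response can be decoded with encoding/json. The sketch below redeclares the struct locally so it compiles on its own; parseJudgeOutput is a hypothetical helper, and the sample payload is invented for illustration rather than taken from a real judge response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JudgeOutput is copied from the documented type so the sketch is self-contained.
type JudgeOutput struct {
	Reasoning  string   `json:"reasoning"`
	KeyPoints  []string `json:"key_points"`
	NewSkill   string   `json:"new_skill"`
	TaskType   string   `json:"task_type"`
	Confidence float64  `json:"confidence"`
}

// parseJudgeOutput is a hypothetical helper that decodes a raw judge reply.
func parseJudgeOutput(raw []byte) (JudgeOutput, error) {
	var out JudgeOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return JudgeOutput{}, fmt.Errorf("decode judge output: %w", err)
	}
	return out, nil
}

func main() {
	// Invented sample payload; field values are illustrative only.
	raw := []byte(`{
		"reasoning": "The session finished without user corrections.",
		"key_points": ["ran tests first", "kept the diff small"],
		"new_skill": "",
		"task_type": "refactor",
		"confidence": 0.8
	}`)

	out, err := parseJudgeOutput(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out.TaskType, len(out.KeyPoints), out.Confidence) // refactor 2 0.8
}
```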

type PromptTemplate

type PromptTemplate struct {
	ID        string
	Name      string
	Section   string
	Content   string
	Version   int
	IsDefault bool
}

PromptTemplate represents a versioned prompt template variant.

type RewardResult

type RewardResult struct {
	Total            float64
	SuccessScore     float64
	EfficiencyScore  float64
	PromptTokens     int64
	CompletionTokens int64
	MessageCount     int64
	UserCorrections  int
}

RewardResult holds the decomposed reward calculation for a session.

type Service

type Service interface {
	// EvaluateSession triggers evaluation of a completed session (async if configured).
	EvaluateSession(ctx context.Context, sessionID string) error

	// SelectTemplate returns the best prompt template for a section using UCB.
	// Returns nil if insufficient history or evaluator disabled.
	SelectTemplate(ctx context.Context, sectionName string) (*PromptTemplate, error)

	// GetActiveSkills returns skills to inject into prompts for a given task type.
	GetActiveSkills(ctx context.Context, taskType string) ([]Skill, error)

	// GetStats returns current UCB rankings and skill library summary.
	GetStats(ctx context.Context) (*Stats, error)

	// IsEnabled returns whether the evaluator is active.
	IsEnabled() bool

	// RecordTemplateSelection records which template was selected for a session.
	RecordTemplateSelection(ctx context.Context, sessionID, templateID string)
}

Service defines the evaluator interface used by other packages.

type Skill

type Skill struct {
	ID          string
	Title       string
	Content     string
	TaskType    string
	SuccessRate float64
	UsageCount  int
}

Skill represents a learned optimization rule from the Skill Library.

type Stats

type Stats struct {
	TotalEvaluations int
	Templates        []TemplateStats
	SkillCount       int
	TopSkills        []Skill
	AvgReward        float64
	LastEvaluation   time.Time
	IsEnabled        bool
}

Stats holds overall statistics for the self-improvement system.

type TemplateStats

type TemplateStats struct {
	Template  PromptTemplate
	TimesUsed int
	AvgReward float64
	UCBScore  float64
	Rank      int
}

TemplateStats holds UCB statistics for a template (for TUI display).
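For display, entries like these are naturally ordered by UCB score with Rank filled in afterwards. The sketch below shows one way to do that with a trimmed, hypothetical templateStats type and an invented rankByUCB helper. That Rank is 1-based with 1 as the highest score is an assumption; the package does not document the convention.

```go
package main

import (
	"fmt"
	"sort"
)

// templateStats is a trimmed, hypothetical version of TemplateStats.
type templateStats struct {
	Name     string
	UCBScore float64
	Rank     int
}

// rankByUCB sorts stats by descending UCB score and assigns 1-based ranks.
// Assumption: rank 1 means the highest score; equal scores keep input order.
func rankByUCB(stats []templateStats) {
	sort.SliceStable(stats, func(i, j int) bool {
		return stats[i].UCBScore > stats[j].UCBScore
	})
	for i := range stats {
		stats[i].Rank = i + 1
	}
}

func main() {
	stats := []templateStats{
		{Name: "terse", UCBScore: 0.92},
		{Name: "default", UCBScore: 1.47},
		{Name: "verbose", UCBScore: 1.11},
	}
	rankByUCB(stats)
	for _, s := range stats {
		fmt.Printf("%d %s %.2f\n", s.Rank, s.Name, s.UCBScore)
	}
	// Prints:
	// 1 default 1.47
	// 2 verbose 1.11
	// 3 terse 0.92
}
```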
