package evaluator

v0.407.0 (latest) | Published: May 30, 2026 | License: MIT | Imports: 20 | Imported by: 0

Repository

github.com/digiogithub/pando

Documentation ¶

Index ¶

Constants
func UCBScore(avgReward float64, totalSessions, timesUsed int, explorationC float64) float64
type ContextProfile
type ContextTrimmer
- func NewContextTrimmer(cfg config.EvaluatorConfig, j *Judge) *ContextTrimmer
- func (ct *ContextTrimmer) ProfileTask(ctx context.Context, firstMessage string, availableTools []ToolInfo) (*ContextProfile, error)
type EvaluatorService
- func New(cfg config.EvaluatorConfig, q db.Querier, msgs message.Service) (*EvaluatorService, error)
- func (s *EvaluatorService) ClassifyTask(text string) string
- func (s *EvaluatorService) EvaluateSession(ctx context.Context, sessionID string) error
- func (s *EvaluatorService) GetActiveSkills(ctx context.Context, taskType string) ([]Skill, error)
- func (s *EvaluatorService) GetStats(ctx context.Context) (*Stats, error)
- func (s *EvaluatorService) IsEnabled() bool
- func (s *EvaluatorService) NewContextTrimmer() *ContextTrimmer
- func (s *EvaluatorService) RecordTemplateSelection(_ context.Context, sessionID, templateID string)
- func (s *EvaluatorService) SelectTemplate(ctx context.Context, sectionName string) (*PromptTemplate, error)
type Judge
- func (j *Judge) Evaluate(ctx context.Context, meta JudgeMeta, customPromptTemplate string) (*JudgeOutput, error)
type JudgeMeta
type JudgeOutput
type PromptTemplate
type RewardResult
type Service
type Skill
type Stats
type TemplateStats
type ToolInfo

Constants ¶

const ContextProfileKey contextKey = "context_profile"

ContextProfileKey is used to store a ContextProfile in the request context.

const SelectedTemplateKey contextKey = "selected_template_id"

SelectedTemplateKey is used to store the selected template ID in context.

Variables ¶

This section is empty.

Functions ¶

func UCBScore ¶

func UCBScore(avgReward float64, totalSessions, timesUsed int, explorationC float64) float64

UCBScore computes the UCB1 value for a template:

	UCB_i = avg_reward_i + c * sqrt(ln(N) / n_i)

where N is the total number of sessions evaluated, n_i is the number of times template i was used, and c is the exploration constant (explorationC).

Types ¶

type ContextProfile ¶ added in v0.407.0

type ContextProfile struct {
	// TaskType is the classified task type (e.g. "code", "debug", "refactor").
	TaskType string
	// RelevantToolNames lists the tool names to keep for this task.
	// An empty slice means keep all tools (no filtering).
	RelevantToolNames []string
	// SkipSections lists prompt section names that are not needed for this task.
	SkipSections []string
	// Confidence is a 0.0–1.0 measure of the trimmer's certainty.
	// Profiles with Confidence < 0.5 should be ignored and defaults used.
	Confidence float64
}

ContextProfile is produced by the ContextTrimmer at session start. It guides the PromptBuilder and agent tool assembly to include only what is relevant for the current task.
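One way a caller might apply the field rules above is sketched below. The types are local mirrors of the documented structs so the example compiles on its own, and `filterTools` is a hypothetical helper rather than part of this package.

```go
package main

import "fmt"

// Local mirrors of the documented types.
type ContextProfile struct {
	TaskType          string
	RelevantToolNames []string
	SkipSections      []string
	Confidence        float64
}

type ToolInfo struct {
	Name        string
	Description string
}

// filterTools keeps only the tools named in the profile. It returns the
// full list when the profile is nil, has Confidence < 0.5, or names no
// tools, matching the documented "ignore" and "keep all" rules.
func filterTools(p *ContextProfile, tools []ToolInfo) []ToolInfo {
	if p == nil || p.Confidence < 0.5 || len(p.RelevantToolNames) == 0 {
		return tools
	}
	keep := make(map[string]bool, len(p.RelevantToolNames))
	for _, name := range p.RelevantToolNames {
		keep[name] = true
	}
	var out []ToolInfo
	for _, t := range tools {
		if keep[t.Name] {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	tools := []ToolInfo{{Name: "grep"}, {Name: "edit"}, {Name: "fetch"}}
	p := &ContextProfile{TaskType: "debug", RelevantToolNames: []string{"grep", "edit"}, Confidence: 0.9}
	fmt.Println(len(filterTools(p, tools)))
}
```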

type ContextTrimmer ¶ added in v0.407.0

type ContextTrimmer struct {
	// contains filtered or unexported fields
}

ContextTrimmer uses a cheap LLM to produce a ContextProfile at session start. It complements the post-session Judge: instead of "what went wrong?", it asks "what is needed for this task?".

func NewContextTrimmer ¶ added in v0.407.0

func NewContextTrimmer(cfg config.EvaluatorConfig, j *Judge) *ContextTrimmer

NewContextTrimmer creates a ContextTrimmer that reuses the evaluator's judge infrastructure. Returns nil if the evaluator has no judge configured.

func (*ContextTrimmer) ProfileTask ¶ added in v0.407.0

func (ct *ContextTrimmer) ProfileTask(
	ctx context.Context,
	firstMessage string,
	availableTools []ToolInfo,
) (*ContextProfile, error)

ProfileTask analyzes the user's first message and returns a ContextProfile describing which tools and prompt sections are relevant for the task.

The LLM call is bounded by a 3-second timeout; on timeout or error the method returns a nil profile so callers can fall back to defaults. Results are cached by a hash of (firstMessage + toolNames) to avoid redundant LLM calls.

type EvaluatorService ¶

type EvaluatorService struct {
	// contains filtered or unexported fields
}

EvaluatorService is the concrete implementation of Service.

func New ¶

func New(cfg config.EvaluatorConfig, q db.Querier, msgs message.Service) (*EvaluatorService, error)

New creates an EvaluatorService. It returns nil if the evaluator is disabled.

func (*EvaluatorService) ClassifyTask ¶ added in v0.407.0

func (s *EvaluatorService) ClassifyTask(text string) string

ClassifyTask returns a task type label from the user's first message. It uses compiled patterns from config, evaluated in order; first match wins. Returns "general" if no pattern matches or the text is empty.
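The ordered, first-match-wins behavior can be sketched as follows. The `rule` type, the `classify` function, and the example patterns are hypothetical; the real patterns come from the evaluator configuration, which is not shown on this page.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule pairs a task-type label with a compiled pattern (hypothetical type).
type rule struct {
	label string
	re    *regexp.Regexp
}

// demoRules are illustrative only; order matters because the first match wins.
var demoRules = []rule{
	{"debug", regexp.MustCompile(`(?i)\b(bug|error|crash|panic)\b`)},
	{"refactor", regexp.MustCompile(`(?i)\brefactor`)},
	{"code", regexp.MustCompile(`(?i)\b(implement|write|add)\b`)},
}

// classify returns the label of the first matching rule, or "general"
// when the text is empty or nothing matches.
func classify(rules []rule, text string) string {
	if strings.TrimSpace(text) == "" {
		return "general"
	}
	for _, r := range rules {
		if r.re.MatchString(text) {
			return r.label
		}
	}
	return "general"
}

func main() {
	// Both "crash" and "refactor" appear, but the debug rule is listed first.
	fmt.Println(classify(demoRules, "Fix the crash and refactor the parser"))
	fmt.Println(classify(demoRules, "What does this project do?"))
}
```

Because evaluation stops at the first match, more specific patterns would need to be listed before broader ones.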

func (*EvaluatorService) EvaluateSession ¶

func (s *EvaluatorService) EvaluateSession(ctx context.Context, sessionID string) error

EvaluateSession triggers evaluation of a completed session.

func (*EvaluatorService) GetActiveSkills ¶

func (s *EvaluatorService) GetActiveSkills(ctx context.Context, taskType string) ([]Skill, error)

GetActiveSkills returns active skills for a task type.

func (*EvaluatorService) GetStats ¶

func (s *EvaluatorService) GetStats(ctx context.Context) (*Stats, error)

GetStats returns system statistics for TUI display.

func (*EvaluatorService) IsEnabled ¶

func (s *EvaluatorService) IsEnabled() bool

IsEnabled returns whether the evaluator is active.

func (*EvaluatorService) NewContextTrimmer ¶ added in v0.407.0

func (s *EvaluatorService) NewContextTrimmer() *ContextTrimmer

NewContextTrimmer creates a ContextTrimmer backed by this service's judge infrastructure. Returns nil if the evaluator has no judge configured or if the evaluator itself is nil.

func (*EvaluatorService) RecordTemplateSelection ¶

func (s *EvaluatorService) RecordTemplateSelection(_ context.Context, sessionID, templateID string)

RecordTemplateSelection stores the template used in this session for later evaluation.

func (*EvaluatorService) SelectTemplate ¶

func (s *EvaluatorService) SelectTemplate(ctx context.Context, sectionName string) (*PromptTemplate, error)

SelectTemplate returns the best template for a section using UCB.
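A minimal sketch of UCB-based selection over a set of candidates is shown below. The `arm` type and `selectByUCB` function are hypothetical, N is approximated as the sum of all usage counts, and untried candidates are given priority; none of these details are confirmed by the package documentation.

```go
package main

import (
	"fmt"
	"math"
)

// arm holds the per-template statistics the selection needs (hypothetical type).
type arm struct {
	ID        string
	AvgReward float64
	TimesUsed int
}

// selectByUCB returns the ID with the highest UCB1 score.
// Assumptions: N is the sum of all TimesUsed, untried arms score +Inf,
// and ties go to the earlier arm.
func selectByUCB(arms []arm, c float64) (string, bool) {
	if len(arms) == 0 {
		return "", false
	}
	total := 0
	for _, a := range arms {
		total += a.TimesUsed
	}
	bestID, bestScore := "", math.Inf(-1)
	for _, a := range arms {
		score := math.Inf(1)
		if a.TimesUsed > 0 {
			score = a.AvgReward + c*math.Sqrt(math.Log(float64(total))/float64(a.TimesUsed))
		}
		if score > bestScore {
			bestID, bestScore = a.ID, score
		}
	}
	return bestID, true
}

func main() {
	arms := []arm{
		{ID: "a", AvgReward: 0.9, TimesUsed: 50},
		{ID: "b", AvgReward: 0.5, TimesUsed: 2},
	}
	exploit, _ := selectByUCB(arms, 0) // c = 0 ignores the exploration bonus
	explore, _ := selectByUCB(arms, 1) // c = 1 favors the rarely used arm
	fmt.Println(exploit, explore)
}
```

The exploration constant controls the trade-off: with c = 0 the choice is purely the highest average reward, while larger values push selection toward templates with little history.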

type Judge ¶ added in v0.244.0

type Judge struct {
	// contains filtered or unexported fields
}

Judge calls an LLM to evaluate session quality.

func (*Judge) Evaluate ¶ added in v0.244.0

func (j *Judge) Evaluate(ctx context.Context, meta JudgeMeta, customPromptTemplate string) (*JudgeOutput, error)

Evaluate calls the judge model with the session transcript and returns structured output.

type JudgeMeta ¶

type JudgeMeta struct {
	TemplateName    string
	TemplateVersion int
	Corrections     int
	Tokens          int64
	Transcript      string
}

JudgeMeta holds metadata passed to the judge prompt.

type JudgeOutput ¶

type JudgeOutput struct {
	Reasoning  string   `json:"reasoning"`
	KeyPoints  []string `json:"key_points"`
	NewSkill   string   `json:"new_skill"`
	TaskType   string   `json:"task_type"`
	Confidence float64  `json:"confidence"`
}

JudgeOutput is the structured response from the LLM judge model.
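The JSON tags above determine how a judge response maps onto the struct. The sketch below decodes a made-up response with the standard library; `parseJudgeOutput` is a hypothetical helper, and the sample payload is invented for illustration rather than taken from a real judge run.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JudgeOutput is a local copy of the documented struct.
type JudgeOutput struct {
	Reasoning  string   `json:"reasoning"`
	KeyPoints  []string `json:"key_points"`
	NewSkill   string   `json:"new_skill"`
	TaskType   string   `json:"task_type"`
	Confidence float64  `json:"confidence"`
}

// parseJudgeOutput decodes a raw JSON response into a JudgeOutput.
func parseJudgeOutput(raw string) (*JudgeOutput, error) {
	var out JudgeOutput
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return nil, fmt.Errorf("decode judge output: %w", err)
	}
	return &out, nil
}

func main() {
	// Invented sample payload using the documented field names.
	raw := `{
		"reasoning": "The agent needed two corrections.",
		"key_points": ["read the file before editing"],
		"new_skill": "Inspect existing tests first.",
		"task_type": "debug",
		"confidence": 0.75
	}`
	out, err := parseJudgeOutput(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out.TaskType, out.Confidence, len(out.KeyPoints))
}
```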

type PromptTemplate ¶

type PromptTemplate struct {
	ID        string
	Name      string
	Section   string
	Content   string
	Version   int
	IsDefault bool
}

PromptTemplate represents a versioned prompt template variant.

type RewardResult ¶

type RewardResult struct {
	Total            float64
	SuccessScore     float64
	EfficiencyScore  float64
	PromptTokens     int64
	CompletionTokens int64
	MessageCount     int64
	UserCorrections  int
}

RewardResult holds the decomposed reward calculation for a session.

type Service ¶

type Service interface {
	// EvaluateSession triggers evaluation of a completed session (async if configured).
	EvaluateSession(ctx context.Context, sessionID string) error

	// SelectTemplate returns the best prompt template for a section using UCB.
	// Returns nil if insufficient history or evaluator disabled.
	SelectTemplate(ctx context.Context, sectionName string) (*PromptTemplate, error)

	// GetActiveSkills returns skills to inject into prompts for a given task type.
	GetActiveSkills(ctx context.Context, taskType string) ([]Skill, error)

	// GetStats returns current UCB rankings and skill library summary.
	GetStats(ctx context.Context) (*Stats, error)

	// IsEnabled returns whether the evaluator is active.
	IsEnabled() bool

	// RecordTemplateSelection records which template was selected for a session.
	RecordTemplateSelection(ctx context.Context, sessionID, templateID string)

	// ClassifyTask returns a task type label from the user's first message.
	// Returns "general" if no pattern matches.
	ClassifyTask(text string) string
}

Service defines the evaluator interface used by other packages.

type Skill ¶

type Skill struct {
	ID          string
	Title       string
	Content     string
	TaskType    string
	SuccessRate float64
	UsageCount  int
}

Skill represents a learned optimization rule from the Skill Library.

type Stats ¶

type Stats struct {
	TotalEvaluations int
	Templates        []TemplateStats
	SkillCount       int
	TopSkills        []Skill
	AvgReward        float64
	LastEvaluation   time.Time
	IsEnabled        bool
}

Stats holds the overall statistics for the self-improvement system.

type TemplateStats ¶

type TemplateStats struct {
	Template  PromptTemplate
	TimesUsed int
	AvgReward float64
	UCBScore  float64
	Rank      int
}

TemplateStats holds UCB statistics for a template (for TUI display).

type ToolInfo ¶ added in v0.407.0

type ToolInfo struct {
	Name        string
	Description string
}

ToolInfo carries the minimal information the trimmer needs to describe a tool.
