review

package
v0.3.0 Latest
Warning

This package is not in the latest version of its module.

Published: May 26, 2026 License: MIT Imports: 8 Imported by: 0

Documentation

Overview

Package review is the Stage-1 namespace for the self-review, critique, and quality-scoring types in package engine. See ../REFACTOR_PLAN.md.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CompareApproaches

func CompareApproaches(solutions []Solution) string

CompareApproaches analyzes how different attempts approached the problem and returns a human-readable comparison.

func DefaultScoreFn

func DefaultScoreFn(solution string) float64

DefaultScoreFn combines length and completeness scoring.

func FormatConsensus

func FormatConsensus(result *ConsensusResult) string

FormatConsensus produces a human-readable summary of the consensus result.

func FormatInline

func FormatInline(comments []ReviewComment) string

FormatInline produces GitHub-style inline review comments.

func FormatReport

func FormatReport(report *ReviewReport) string

FormatReport produces a human-readable summary of a review report.

func FormatReview

func FormatReview(result *ReviewResult) string

FormatReview produces a formatted summary of the review result.

func FormatSelfAssessment

func FormatSelfAssessment(a *Assessment) string

FormatSelfAssessment produces a human-readable summary of an assessment.

func PairwiseSimilarity

func PairwiseSimilarity(a, b string) float64

PairwiseSimilarity computes the Jaccard similarity between two strings based on their word sets.
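A minimal sketch of word-set Jaccard similarity, |A ∩ B| / |A ∪ B|, is shown below. It assumes words are split with strings.Fields, compared case-sensitively, and that two empty strings count as identical; the package's real tokenization is not documented here, and `wordSet` and `jaccard` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// wordSet splits s on whitespace and returns the set of distinct words.
// Assumption: case-sensitive, punctuation is not stripped.
func wordSet(s string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, w := range strings.Fields(s) {
		set[w] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| over the word sets of a and b.
// Assumption: two empty inputs are treated as identical (1.0).
func jaccard(a, b string) float64 {
	setA, setB := wordSet(a), wordSet(b)
	if len(setA) == 0 && len(setB) == 0 {
		return 1.0
	}
	inter := 0
	for w := range setA {
		if _, ok := setB[w]; ok {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	return float64(inter) / float64(union)
}

func main() {
	// 3 shared words ("the", "nil", "check") out of 5 distinct words.
	fmt.Println(jaccard("fix the nil check", "add the nil check")) // 0.6
}
```

Because only sets are compared, word order and repetition are ignored: "a a b" and "b a" score 1.0.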

func ScoreByCompleteness

func ScoreByCompleteness(content string) float64

ScoreByCompleteness scores content based on structural indicators: code blocks, file mentions, and numbered steps.

func ScoreByLength

func ScoreByLength(content string) float64

ScoreByLength scores content by length. Content that is too short or too long is penalized.

func ShouldRetry

func ShouldRetry(solutions []Solution) bool

ShouldRetry determines whether additional attempts should be made. Returns true if the best score so far is below 0.7.
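The documented retry rule reduces to a max-then-compare. The sketch below uses a trimmed local stand-in for Solution and assumes that an empty slice has a best score of 0, so it asks for a retry; `shouldRetry` is a hypothetical name, not the package function.

```go
package main

import "fmt"

// Solution is a trimmed stand-in for review.Solution.
type Solution struct {
	ID    int
	Score float64
}

// retryThreshold mirrors the documented 0.7 cutoff.
const retryThreshold = 0.7

// shouldRetry reports whether the best score so far is below retryThreshold.
// Assumption: with no solutions yet, the best score is treated as 0.
func shouldRetry(solutions []Solution) bool {
	best := 0.0
	for _, s := range solutions {
		if s.Score > best {
			best = s.Score
		}
	}
	return best < retryThreshold
}

func main() {
	fmt.Println(shouldRetry([]Solution{{ID: 1, Score: 0.4}, {ID: 2, Score: 0.65}})) // true
	fmt.Println(shouldRetry([]Solution{{ID: 1, Score: 0.4}, {ID: 2, Score: 0.8}}))  // false
}
```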

Types

type Assessment

type Assessment struct {
	Score        float64            // overall score 0.0 - 1.0
	Dimensions   map[string]float64 // per-dimension scores
	Strengths    []string           // things that went well
	Weaknesses   []string           // things that went poorly
	Improvements []string           // actionable suggestions for next time
	TaskType     string             // classification of the task
	Timestamp    time.Time          // when the assessment was made
}

Assessment captures the result of a self-evaluation after completing a task.

type Bot

type Bot = ReviewBot

Bot is the rule-driven review bot for diffs.

func NewBot

func NewBot() *Bot

NewBot returns a fresh review bot with the default rule set.

type Comment

type Comment = ReviewComment

Comment is one finding emitted by a Bot.

type ConsensusResult

type ConsensusResult struct {
	Winner     *Sample
	AllSamples []Sample
	Agreement  float64
	Method     string
}

ConsensusResult holds the outcome of multi-sample consensus.

type ConsensusSampler

type ConsensusSampler struct {
	NumSamples int
	Strategy   string // "majority", "best_score", "synthesize"
	ScoreFn    func(solution string) float64
	// contains filtered or unexported fields
}

ConsensusSampler implements the multi-sample consensus pattern inspired by SWE-agent's "Ask Colleagues" approach: generate N solutions in parallel, then select the best one using a configurable strategy.

func NewConsensusSampler

func NewConsensusSampler(numSamples int) *ConsensusSampler

NewConsensusSampler creates a ConsensusSampler with the given number of samples. If numSamples is <= 0, it defaults to 3.

func (*ConsensusSampler) SampleSolutions

func (cs *ConsensusSampler) SampleSolutions(ctx context.Context, prompt string, generateFn func(context.Context, string) (string, error)) (*ConsensusResult, error)

SampleSolutions generates N solutions in parallel, scores each, and selects a winner based on the configured strategy.

type Critic

type Critic struct {
	// contains filtered or unexported fields
}

Critic provides fast pre-validation of patches using a cheap model before expensive execution. It generates a prompt for the cheap model and parses the response into a structured verdict.

func NewCritic

func NewCritic(model string) *Critic

NewCritic creates a new critic that will use the given model for screening.

func (*Critic) BuildPrompt

func (c *Critic) BuildPrompt(original, patched, intent string) string

BuildPrompt constructs a prompt for the cheap model to evaluate a patch.

func (*Critic) Model

func (c *Critic) Model() string

Model returns the model name used for pre-screening.

func (*Critic) ParseVerdict

func (c *Critic) ParseVerdict(response string) *PatchVerdict

ParseVerdict parses a model response into a structured PatchVerdict.

func (*Critic) PreScreenPatch

func (c *Critic) PreScreenPatch(originalContent, patchedContent, intent string) *PatchVerdict

PreScreenPatch asks the cheap model whether a patch looks correct given the stated intent. It builds a prompt and returns a verdict. In this implementation, the caller is expected to send the prompt to the model and pass the response to ParseVerdict; when no model call is available, this method constructs a PatchVerdict from a simple heuristic comparison.

func (*Critic) ShouldBlock

func (c *Critic) ShouldBlock(verdict *PatchVerdict) bool

ShouldBlock returns true if the verdict indicates the patch should be blocked (verdict is "incorrect" with confidence > 0.8).
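The blocking rule is a single boolean expression. The sketch below restates it against a local copy of the documented PatchVerdict fields; note that the confidence threshold is strict, so exactly 0.8 does not block. Treating a nil verdict as non-blocking is an assumption, and `shouldBlock` is a hypothetical name.

```go
package main

import "fmt"

// PatchVerdict mirrors the documented fields of review.PatchVerdict.
type PatchVerdict struct {
	Likely     string   // "correct", "incorrect", "uncertain"
	Issues     []string // specific issues found
	Confidence float64  // 0-1
}

// shouldBlock blocks only when the critic is confident (> 0.8) that the
// patch is incorrect. Assumption: a nil verdict never blocks.
func shouldBlock(v *PatchVerdict) bool {
	return v != nil && v.Likely == "incorrect" && v.Confidence > 0.8
}

func main() {
	fmt.Println(shouldBlock(&PatchVerdict{Likely: "incorrect", Confidence: 0.9}))  // true
	fmt.Println(shouldBlock(&PatchVerdict{Likely: "incorrect", Confidence: 0.8}))  // false: strict threshold
	fmt.Println(shouldBlock(&PatchVerdict{Likely: "uncertain", Confidence: 0.95})) // false
}
```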

type PatchVerdict

type PatchVerdict struct {
	Likely     string   // "correct", "incorrect", "uncertain"
	Issues     []string // specific issues found
	Confidence float64  // 0-1
}

PatchVerdict is the result of a critic's pre-screening of a patch.

type QualityScorer

type QualityScorer struct {
	Weights ScoreWeights
	History []ScoredResponse
	// contains filtered or unexported fields
}

QualityScorer evaluates LLM response quality across multiple dimensions and provides feedback for the self-improvement loop.

func NewQualityScorer

func NewQualityScorer() *QualityScorer

NewQualityScorer creates a QualityScorer with default weights.

func (*QualityScorer) AverageScore

func (qs *QualityScorer) AverageScore(n int) float64

AverageScore computes the average composite score over the last n responses.

func (*QualityScorer) FormatReport

func (qs *QualityScorer) FormatReport(last int) string

FormatReport generates a formatted quality report for the last n responses.

func (*QualityScorer) GenerateFeedback

func (qs *QualityScorer) GenerateFeedback(scored *ScoredResponse) []string

GenerateFeedback produces human-readable suggestions based on the scored response.

func (*QualityScorer) Score

Score evaluates a response across all quality dimensions and returns a composite result.

func (*QualityScorer) TrendAnalysis

func (qs *QualityScorer) TrendAnalysis() string

TrendAnalysis returns a human-readable description of quality trends.

type Report

type Report = ReviewReport

Report aggregates Comments for a single review run.

type ResponseContext

type ResponseContext struct {
	UserPrompt        string
	AssistantResponse string
	ToolCallCount     int
	ToolErrors        int
	FilesModified     []string
	TestsPassed       bool
	LintPassed        bool
	TokensUsed        int
	Duration          time.Duration
}

ResponseContext provides the context needed to evaluate a response's quality.

type ReviewBot

type ReviewBot struct {
	Rules    []ReviewRule
	Severity string // minimum severity to report: "error", "warning", "info"
	// contains filtered or unexported fields
}

ReviewBot is a rule-based code review engine that produces structured feedback without requiring an LLM call.

func NewReviewBot

func NewReviewBot() *ReviewBot

NewReviewBot creates a ReviewBot pre-loaded with 20+ built-in rules.

func (*ReviewBot) ReviewDiff

func (rb *ReviewBot) ReviewDiff(diffInput string) (*ReviewReport, error)

ReviewDiff parses a unified diff and reviews only changed lines.
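Reviewing "only changed lines" requires mapping each added line of a unified diff back to its line number in the new file. The sketch below shows one way to do that from the `@@ -a,b +c,d @@` hunk headers: context lines advance the new-file counter, removed lines do not. `AddedLine` and `addedLines` are hypothetical; the package's own diff.DiffLine type is not documented on this page, and this sketch ignores removed lines entirely.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// AddedLine is a hypothetical record for one "+" line of a diff.
type AddedLine struct {
	File string
	Line int // line number in the new version of the file
	Text string
}

// hunkHeader captures the new-file start line from "@@ -a[,b] +c[,d] @@".
var hunkHeader = regexp.MustCompile(`^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@`)

// addedLines returns the added lines of a unified diff with new-file line numbers.
func addedLines(diff string) []AddedLine {
	var out []AddedLine
	file, newLine := "", 0
	sc := bufio.NewScanner(strings.NewReader(diff))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "+++ "):
			file = strings.TrimPrefix(strings.TrimPrefix(line, "+++ "), "b/")
		case strings.HasPrefix(line, "--- "):
			// old-file header: nothing to record
		case strings.HasPrefix(line, "@@"):
			if m := hunkHeader.FindStringSubmatch(line); m != nil {
				newLine, _ = strconv.Atoi(m[1])
			}
		case strings.HasPrefix(line, "+"):
			out = append(out, AddedLine{File: file, Line: newLine, Text: line[1:]})
			newLine++
		case strings.HasPrefix(line, " "):
			newLine++ // context line exists in the new file too
		}
		// "-" lines and "\ No newline" markers do not advance newLine.
	}
	return out
}

func main() {
	d := `--- a/main.go
+++ b/main.go
@@ -1,3 +1,4 @@
 package main
-import "fmt"
+import "log"
+// TODO: remove
 func main() {}
`
	for _, l := range addedLines(d) {
		fmt.Printf("%s:%d %s\n", l.File, l.Line, l.Text)
	}
	// main.go:2 import "log"
	// main.go:3 // TODO: remove
}
```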

func (*ReviewBot) ReviewFile

func (rb *ReviewBot) ReviewFile(path, content string) (*ReviewReport, error)

ReviewFile reviews a full file's content.

type ReviewComment

type ReviewComment struct {
	File       string
	Line       int
	Severity   string // "error", "warning", "info"
	Category   string
	Message    string
	Suggestion string
	RuleID     string
}

ReviewComment represents a single piece of review feedback.

func FilterBySeverity

func FilterBySeverity(comments []ReviewComment, minSeverity string) []ReviewComment

FilterBySeverity returns only comments at or above the specified minimum severity.
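"At or above" implies an ordering of the documented severities: error > warning > info. A minimal sketch encodes that ordering as integer ranks. It assumes unrecognized severities rank 0, so an unrecognized minSeverity lets every comment through; `filterBySeverity` and `severityRank` are hypothetical names.

```go
package main

import "fmt"

// ReviewComment is trimmed to the fields this sketch uses.
type ReviewComment struct {
	File     string
	Line     int
	Severity string // "error", "warning", "info"
	Message  string
}

// severityRank orders the documented severities; higher is more severe.
var severityRank = map[string]int{"info": 1, "warning": 2, "error": 3}

// filterBySeverity keeps comments whose severity ranks at or above minSeverity.
func filterBySeverity(comments []ReviewComment, minSeverity string) []ReviewComment {
	threshold := severityRank[minSeverity] // missing keys rank 0
	var out []ReviewComment
	for _, c := range comments {
		if severityRank[c.Severity] >= threshold {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	cs := []ReviewComment{
		{File: "a.go", Line: 3, Severity: "info", Message: "long line"},
		{File: "a.go", Line: 9, Severity: "warning", Message: "unchecked error"},
		{File: "b.go", Line: 1, Severity: "error", Message: "hard-coded secret"},
	}
	for _, c := range filterBySeverity(cs, "warning") {
		fmt.Printf("%s:%d %s\n", c.File, c.Line, c.Message)
	}
}
```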

type ReviewReport

type ReviewReport struct {
	Comments      []ReviewComment
	FilesReviewed int
	IssuesFound   int
	BySeverity    map[string]int
	Duration      time.Duration
}

ReviewReport summarizes the results of a code review.

type ReviewResult

type ReviewResult struct {
	Best          *Solution
	All           []Solution
	Attempts      int
	TotalDuration time.Duration
	TotalTokens   int
	Agreement     float64
}

ReviewResult holds the outcome of the multi-attempt review process.

type ReviewRule

type ReviewRule struct {
	ID       string
	Name     string
	Category string // "security", "performance", "correctness", "style", "testing"
	Language string
	Check    func(file string, lines []string, diffLines []diff.DiffLine) []ReviewComment
}

ReviewRule defines a single review check that can be applied to code.
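A custom rule is a value whose Check function returns zero or more comments. The sketch below defines a hypothetical rule that flags TODO markers. Because diff.DiffLine is not documented on this page, it uses a local stand-in with assumed Line and Content fields, and it assumes that a non-empty diffLines means "review only changed lines"; the rule ID "style-todo" is invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// DiffLine is a hypothetical stand-in for diff.DiffLine.
type DiffLine struct {
	Line    int
	Content string
}

// ReviewComment and ReviewRule copy the documented field sets.
type ReviewComment struct {
	File       string
	Line       int
	Severity   string
	Category   string
	Message    string
	Suggestion string
	RuleID     string
}

type ReviewRule struct {
	ID       string
	Name     string
	Category string
	Language string
	Check    func(file string, lines []string, diffLines []DiffLine) []ReviewComment
}

// todoRule flags TODO markers in changed lines, or in the whole file when no
// diff lines are supplied (an assumption about the calling convention).
var todoRule = ReviewRule{
	ID:       "style-todo",
	Name:     "Leftover TODO",
	Category: "style",
	Language: "go",
	Check: func(file string, lines []string, diffLines []DiffLine) []ReviewComment {
		var out []ReviewComment
		flag := func(n int, text string) {
			if strings.Contains(text, "TODO") {
				out = append(out, ReviewComment{
					File: file, Line: n, Severity: "info", Category: "style",
					Message: "TODO left in code", Suggestion: "track it in an issue instead",
					RuleID: "style-todo",
				})
			}
		}
		if len(diffLines) > 0 {
			for _, d := range diffLines {
				flag(d.Line, d.Content)
			}
			return out
		}
		for i, l := range lines {
			flag(i+1, l) // file lines are 1-based
		}
		return out
	},
}

func main() {
	lines := []string{"package a", "// TODO: handle nil", "func f() {}"}
	for _, c := range todoRule.Check("a.go", lines, nil) {
		fmt.Printf("%s:%d [%s] %s\n", c.File, c.Line, c.RuleID, c.Message)
	}
}
```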

type Rule

type Rule = ReviewRule

Rule is a single check in a Bot's rule set.

type Sample

type Sample struct {
	ID         int
	Content    string
	Score      float64
	Duration   time.Duration
	TokensUsed int
}

Sample represents a single generated solution with metadata.

func BestScore

func BestScore(samples []Sample) *Sample

BestScore picks the highest-scored solution.

func MajorityVote

func MajorityVote(samples []Sample) *Sample

MajorityVote finds the most similar/common solution using pairwise similarity. The sample with the highest average similarity to all others wins.
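The "highest average similarity to all others" rule can be sketched as an O(n²) pairwise loop. This sketch reuses a word-set Jaccard similarity (whitespace-split, case-sensitive, by assumption), keeps the earlier sample on ties, and returns the only sample when there is just one; `majorityVote` is a hypothetical name with a local Sample stand-in.

```go
package main

import (
	"fmt"
	"strings"
)

// Sample is a trimmed stand-in for review.Sample.
type Sample struct {
	ID      int
	Content string
	Score   float64
}

// jaccard is word-set Jaccard similarity; two empty strings count as identical.
func jaccard(a, b string) float64 {
	setA, setB := map[string]bool{}, map[string]bool{}
	for _, w := range strings.Fields(a) {
		setA[w] = true
	}
	for _, w := range strings.Fields(b) {
		setB[w] = true
	}
	inter := 0
	for w := range setA {
		if setB[w] {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	if union == 0 {
		return 1
	}
	return float64(inter) / float64(union)
}

// majorityVote returns the sample with the highest mean similarity to the others.
func majorityVote(samples []Sample) *Sample {
	if len(samples) == 0 {
		return nil
	}
	if len(samples) == 1 {
		return &samples[0]
	}
	bestIdx, bestAvg := 0, -1.0
	for i := range samples {
		sum := 0.0
		for j := range samples {
			if i != j {
				sum += jaccard(samples[i].Content, samples[j].Content)
			}
		}
		if avg := sum / float64(len(samples)-1); avg > bestAvg {
			bestIdx, bestAvg = i, avg // strict > keeps the earlier sample on ties
		}
	}
	return &samples[bestIdx]
}

func main() {
	samples := []Sample{
		{ID: 1, Content: "use a mutex"},      // mean similarity 0.625
		{ID: 2, Content: "use a mutex here"}, // mean similarity 0.575
		{ID: 3, Content: "use a channel"},    // mean similarity 0.45
	}
	fmt.Println(majorityVote(samples).ID) // 1
}
```

Unlike BestScore, this ignores the Score field entirely: the winner is the most "central" answer, not the highest-rated one.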

func Synthesize

func Synthesize(samples []Sample) *Sample

Synthesize combines elements from all solutions, weighted by score. It selects unique paragraphs from higher-scoring samples first.
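One way to read "unique paragraphs from higher-scoring samples first" is sketched below: sort samples by descending score, split each on blank lines, and keep the first occurrence of each paragraph. In this sketch, score weighting acts only through that ordering; "unique" means an exact match after trimming whitespace, and giving the merged sample the top score is an assumption. `synthesize` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is a trimmed stand-in for review.Sample.
type Sample struct {
	ID      int
	Content string
	Score   float64
}

// synthesize merges unique paragraphs, visiting higher-scoring samples first.
func synthesize(samples []Sample) *Sample {
	if len(samples) == 0 {
		return nil
	}
	ordered := append([]Sample(nil), samples...) // copy so the caller's slice is not reordered
	sort.SliceStable(ordered, func(i, j int) bool { return ordered[i].Score > ordered[j].Score })

	seen := make(map[string]bool)
	var parts []string
	for _, s := range ordered {
		for _, p := range strings.Split(s.Content, "\n\n") {
			p = strings.TrimSpace(p)
			if p != "" && !seen[p] {
				seen[p] = true
				parts = append(parts, p)
			}
		}
	}
	// Assumption: the merged sample has no ID of its own (0) and carries the top score.
	return &Sample{Content: strings.Join(parts, "\n\n"), Score: ordered[0].Score}
}

func main() {
	merged := synthesize([]Sample{
		{ID: 1, Content: "Add a nil check.\n\nRun go test.", Score: 0.6},
		{ID: 2, Content: "Guard the map access.\n\nAdd a nil check.", Score: 0.9},
	})
	fmt.Println(merged.Content) // sample 2's paragraphs first, then "Run go test."
}
```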

type ScoreWeights

type ScoreWeights struct {
	Completeness float64 // did it address the full request?
	Correctness  float64 // is the code syntactically valid?
	Conciseness  float64 // not overly verbose?
	ToolUsage    float64 // efficient use of tools?
	Safety       float64 // no dangerous operations?
}

ScoreWeights defines the relative importance of each quality dimension. All values should be in [0,1] and sum to 1.
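With weights in [0,1] that sum to 1, a weighted sum of per-dimension scores in [0,1] also stays in [0,1], which is what makes a composite score meaningful. The sketch below checks that invariant with a small floating-point tolerance and computes such a weighted sum. It assumes the composite is a plain weighted sum; the map keys ("completeness", "tool_usage", and so on) and the example weights are hypothetical, not the values DefaultWeights returns.

```go
package main

import (
	"fmt"
	"math"
)

// ScoreWeights mirrors the documented dimensions.
type ScoreWeights struct {
	Completeness, Correctness, Conciseness, ToolUsage, Safety float64
}

// validate checks every weight is in [0,1] and the total is 1 (within 1e-9).
func (w ScoreWeights) validate() error {
	vals := []float64{w.Completeness, w.Correctness, w.Conciseness, w.ToolUsage, w.Safety}
	sum := 0.0
	for _, v := range vals {
		if v < 0 || v > 1 {
			return fmt.Errorf("weight %v outside [0,1]", v)
		}
		sum += v
	}
	if math.Abs(sum-1) > 1e-9 {
		return fmt.Errorf("weights sum to %v, want 1", sum)
	}
	return nil
}

// composite is a weighted sum of per-dimension scores; missing keys count as 0.
func (w ScoreWeights) composite(dim map[string]float64) float64 {
	return w.Completeness*dim["completeness"] + w.Correctness*dim["correctness"] +
		w.Conciseness*dim["conciseness"] + w.ToolUsage*dim["tool_usage"] + w.Safety*dim["safety"]
}

func main() {
	w := ScoreWeights{Completeness: 0.3, Correctness: 0.3, Conciseness: 0.1, ToolUsage: 0.1, Safety: 0.2}
	fmt.Println(w.validate()) // <nil>
	dims := map[string]float64{"completeness": 1, "correctness": 0.5, "conciseness": 1, "tool_usage": 1, "safety": 1}
	fmt.Printf("%.2f\n", w.composite(dims)) // 0.85
}
```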

func DefaultWeights

func DefaultWeights() ScoreWeights

DefaultWeights returns a balanced set of scoring weights.

type ScoredResponse

type ScoredResponse struct {
	Score     float64            // 0-1 overall composite score
	Breakdown map[string]float64 // per-dimension scores
	Feedback  []string           // human-readable improvement suggestions
	Timestamp time.Time
	Model     string
	TaskType  string
}

ScoredResponse holds the quality evaluation of a single LLM response.

type SelfAssessor

type SelfAssessor struct {
	History []Assessment
	// contains filtered or unexported fields
}

SelfAssessor evaluates agent performance after each task and tracks trends.

func NewSelfAssessor

func NewSelfAssessor() *SelfAssessor

NewSelfAssessor creates a new SelfAssessor with an empty history.

func (*SelfAssessor) Assess

func (sa *SelfAssessor) Assess(ctx TaskContext) *Assessment

Assess evaluates the agent's performance on a completed task across multiple dimensions and records the assessment in history.

func (*SelfAssessor) AverageScore

func (sa *SelfAssessor) AverageScore(n int) float64

AverageScore computes the average overall score of the last n assessments. If n is 0 or exceeds history length, all assessments are averaged.
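The windowing rule can be sketched as a slice of the history tail. This assumes History is appended in chronological order (so the last n entries are the most recent), that a negative n behaves like 0, and that an empty history averages to 0; `averageScore` is a hypothetical name.

```go
package main

import "fmt"

// Assessment is trimmed to the one field averaged here.
type Assessment struct {
	Score float64
}

// averageScore averages the last n scores; n == 0 or n > len(history) means all.
func averageScore(history []Assessment, n int) float64 {
	if len(history) == 0 {
		return 0
	}
	if n <= 0 || n > len(history) {
		n = len(history)
	}
	sum := 0.0
	for _, a := range history[len(history)-n:] {
		sum += a.Score
	}
	return sum / float64(n)
}

func main() {
	h := []Assessment{{Score: 1.0}, {Score: 0.5}, {Score: 0.0}}
	fmt.Println(averageScore(h, 2))  // 0.25: mean of the two most recent
	fmt.Println(averageScore(h, 0))  // 0.5: n == 0 averages all
	fmt.Println(averageScore(h, 10)) // 0.5: n > len(history) averages all
}
```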

func (*SelfAssessor) GetTrend

func (sa *SelfAssessor) GetTrend(dimension string) string

GetTrend analyzes the trend of a given dimension over the last 10 assessments. Returns "improving", "stable", or "declining".
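The page does not say how the trend is computed, so the following is a hypothetical sketch of one reasonable approach: take the last (up to) 10 values of the dimension, compare the mean of the newer half against the older half, and call a difference within ±0.05 "stable". The half-split method, the 0.05 tolerance, and treating fewer than two points as "stable" are all assumptions; the caller would first extract the dimension's scores from each Assessment's Dimensions map.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs (callers ensure len(xs) > 0).
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// trend classifies the last (up to) 10 values by comparing half-window means.
func trend(values []float64) string {
	if len(values) > 10 {
		values = values[len(values)-10:]
	}
	if len(values) < 2 {
		return "stable"
	}
	mid := len(values) / 2
	older, newer := mean(values[:mid]), mean(values[mid:])
	switch diff := newer - older; {
	case diff > 0.05:
		return "improving"
	case diff < -0.05:
		return "declining"
	default:
		return "stable"
	}
}

func main() {
	fmt.Println(trend([]float64{0.4, 0.5, 0.7, 0.8})) // improving
	fmt.Println(trend([]float64{0.8, 0.7, 0.5, 0.4})) // declining
	fmt.Println(trend([]float64{0.6, 0.62, 0.61}))    // stable
}
```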

func (*SelfAssessor) IdentifyStrengths

func (sa *SelfAssessor) IdentifyStrengths(ctx TaskContext) []string

IdentifyStrengths returns a list of things that went well.

func (*SelfAssessor) IdentifyWeaknesses

func (sa *SelfAssessor) IdentifyWeaknesses(ctx TaskContext) []string

IdentifyWeaknesses returns a list of things that went poorly.

func (*SelfAssessor) SuggestImprovements

func (sa *SelfAssessor) SuggestImprovements(ctx TaskContext) []string

SuggestImprovements returns actionable suggestions for future tasks.

type Solution

type Solution struct {
	ID            int
	Content       string
	Score         float64
	Duration      time.Duration
	TokensUsed    int
	Errors        []string
	FilesModified []string
}

Solution represents a single attempted solution with metadata.

type SolutionReviewer

type SolutionReviewer struct {
	MaxAttempts int
	ScoreFn     func(solution string) float64
	// contains filtered or unexported fields
}

SolutionReviewer implements a multi-attempt solution review pattern inspired by SWE-agent's reviewer: run the agent N times, score each solution, and select the best one. This improves reliability by sampling multiple approaches.

func NewSolutionReviewer

func NewSolutionReviewer(maxAttempts int) *SolutionReviewer

NewSolutionReviewer creates a SolutionReviewer with the given max attempts. If maxAttempts is <= 0, it defaults to 3.

func (*SolutionReviewer) ReviewAndSelect

func (sr *SolutionReviewer) ReviewAndSelect(ctx context.Context, task string, solveFn func(context.Context, string) (*Solution, error)) (*ReviewResult, error)

ReviewAndSelect runs solveFn up to MaxAttempts times, scores each solution, selects the best one, and calculates agreement across attempts.

func (*SolutionReviewer) ScoreSolution

func (sr *SolutionReviewer) ScoreSolution(solution *Solution) float64

ScoreSolution evaluates a solution using default scoring criteria:

  • Has code changes (+0.3)
  • No errors (+0.3)
  • Reasonable length (+0.2)
  • Files modified (+0.2)

type TaskContext

type TaskContext struct {
	Goal          string
	ToolCalls     int
	Errors        int
	Retries       int
	FilesModified int
	TestsPassed   bool
	Duration      time.Duration
	TokensUsed    int
	UserFeedback  string
}

TaskContext captures all relevant metrics from a completed task for assessment.
