package review

Version: v0.3.0 (latest)
Published: May 26, 2026
License: MIT
Imports: 8
Imported by: 0

Repository

github.com/GrayCodeAI/hawk

Documentation ¶

Overview ¶

Package review is the Stage-1 namespace for self-review / critique / quality scoring types in package engine. See ../REFACTOR_PLAN.md.

Index ¶

func CompareApproaches(solutions []Solution) string
func DefaultScoreFn(solution string) float64
func FormatConsensus(result *ConsensusResult) string
func FormatInline(comments []ReviewComment) string
func FormatReport(report *ReviewReport) string
func FormatReview(result *ReviewResult) string
func FormatSelfAssessment(a *Assessment) string
func PairwiseSimilarity(a, b string) float64
func ScoreByCompleteness(content string) float64
func ScoreByLength(content string) float64
func ShouldRetry(solutions []Solution) bool
type Assessment
type Bot
- func NewBot() *Bot
type Comment
type ConsensusResult
type ConsensusSampler
- func NewConsensusSampler(numSamples int) *ConsensusSampler
- func (cs *ConsensusSampler) SampleSolutions(ctx context.Context, prompt string, ...) (*ConsensusResult, error)
type Critic
- func NewCritic(model string) *Critic
- func (c *Critic) BuildPrompt(original, patched, intent string) string
- func (c *Critic) Model() string
- func (c *Critic) ParseVerdict(response string) *PatchVerdict
- func (c *Critic) PreScreenPatch(originalContent, patchedContent, intent string) *PatchVerdict
- func (c *Critic) ShouldBlock(verdict *PatchVerdict) bool
type PatchVerdict
type QualityScorer
- func NewQualityScorer() *QualityScorer
- func (qs *QualityScorer) AverageScore(n int) float64
- func (qs *QualityScorer) FormatReport(last int) string
- func (qs *QualityScorer) GenerateFeedback(scored *ScoredResponse) []string
- func (qs *QualityScorer) Score(ctx ResponseContext) *ScoredResponse
- func (qs *QualityScorer) TrendAnalysis() string
type Report
type ResponseContext
type ReviewBot
- func NewReviewBot() *ReviewBot
- func (rb *ReviewBot) ReviewDiff(diffInput string) (*ReviewReport, error)
- func (rb *ReviewBot) ReviewFile(path, content string) (*ReviewReport, error)
type ReviewComment
- func FilterBySeverity(comments []ReviewComment, minSeverity string) []ReviewComment
type ReviewReport
type ReviewResult
type ReviewRule
type Rule
type Sample
- func BestScore(samples []Sample) *Sample
- func MajorityVote(samples []Sample) *Sample
- func Synthesize(samples []Sample) *Sample
type ScoreWeights
- func DefaultWeights() ScoreWeights
type ScoredResponse
type SelfAssessor
- func NewSelfAssessor() *SelfAssessor
- func (sa *SelfAssessor) Assess(ctx TaskContext) *Assessment
- func (sa *SelfAssessor) AverageScore(n int) float64
- func (sa *SelfAssessor) GetTrend(dimension string) string
- func (sa *SelfAssessor) IdentifyStrengths(ctx TaskContext) []string
- func (sa *SelfAssessor) IdentifyWeaknesses(ctx TaskContext) []string
- func (sa *SelfAssessor) SuggestImprovements(ctx TaskContext) []string
type Solution
type SolutionReviewer
- func NewSolutionReviewer(maxAttempts int) *SolutionReviewer
- func (sr *SolutionReviewer) ReviewAndSelect(ctx context.Context, task string, ...) (*ReviewResult, error)
- func (sr *SolutionReviewer) ScoreSolution(solution *Solution) float64
type TaskContext

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func CompareApproaches ¶

func CompareApproaches(solutions []Solution) string

CompareApproaches analyzes how different attempts approached the problem and returns a human-readable comparison.

func DefaultScoreFn ¶

func DefaultScoreFn(solution string) float64

DefaultScoreFn combines length and completeness scoring.

func FormatConsensus ¶

func FormatConsensus(result *ConsensusResult) string

FormatConsensus produces a human-readable summary of the consensus result.

func FormatInline ¶

func FormatInline(comments []ReviewComment) string

FormatInline produces GitHub-style inline review comments.

func FormatReport ¶

func FormatReport(report *ReviewReport) string

FormatReport produces a human-readable summary of a review report.

func FormatReview ¶

func FormatReview(result *ReviewResult) string

FormatReview produces a formatted summary of the review result.

func FormatSelfAssessment ¶

func FormatSelfAssessment(a *Assessment) string

FormatSelfAssessment produces a human-readable summary of an assessment.

func PairwiseSimilarity ¶

func PairwiseSimilarity(a, b string) float64

PairwiseSimilarity computes the Jaccard similarity between two strings based on their word sets.
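As an illustration of the metric named above, the following is a minimal sketch of Jaccard similarity over word sets. It is not the package's source: the helper names `wordSet` and `jaccardWords`, whitespace tokenization, and the value returned for two empty inputs are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// wordSet splits s on whitespace and returns its distinct words.
// Whitespace tokenization is an assumption of this sketch.
func wordSet(s string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, w := range strings.Fields(s) {
		set[w] = struct{}{}
	}
	return set
}

// jaccardWords returns |A ∩ B| / |A ∪ B| for the word sets of a and b.
// Returning 1 when both inputs are empty is an assumption.
func jaccardWords(a, b string) float64 {
	sa, sb := wordSet(a), wordSet(b)
	if len(sa) == 0 && len(sb) == 0 {
		return 1
	}
	inter := 0
	for w := range sa {
		if _, ok := sb[w]; ok {
			inter++
		}
	}
	union := len(sa) + len(sb) - inter
	return float64(inter) / float64(union)
}

func main() {
	// Three shared words out of five distinct words.
	fmt.Println(jaccardWords("fix the parser bug", "fix the lexer bug")) // prints 0.6
}
```

A value of 0 means the two strings share no words, and 1 means their word sets are identical regardless of order.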

func ScoreByCompleteness ¶

func ScoreByCompleteness(content string) float64

ScoreByCompleteness scores based on structural indicators: code blocks, file mentions, numbered steps.

func ScoreByLength ¶

func ScoreByLength(content string) float64

ScoreByLength scores content based on reasonable length. Too short or too long content gets penalized.

func ShouldRetry ¶

func ShouldRetry(solutions []Solution) bool

ShouldRetry determines whether additional attempts should be made. Returns true if the best score so far is below 0.7.
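The retry rule described above can be sketched as follows. The `attempt` type is a hypothetical stand-in for Solution, and treating an empty slice as "retry" is an assumption; only the 0.7 threshold comes from the documentation.

```go
package main

import "fmt"

// attempt is a hypothetical stand-in for the package's Solution type;
// only the score matters for this rule.
type attempt struct {
	ID    int
	Score float64
}

// retryThreshold is the 0.7 cutoff stated in the documentation.
const retryThreshold = 0.7

// shouldRetry reports whether the best score so far is below the threshold.
// With no attempts the best score is taken as 0, so the result is true
// (an assumption of this sketch).
func shouldRetry(attempts []attempt) bool {
	best := 0.0
	for _, a := range attempts {
		if a.Score > best {
			best = a.Score
		}
	}
	return best < retryThreshold
}

func main() {
	fmt.Println(shouldRetry([]attempt{{ID: 1, Score: 0.5}, {ID: 2, Score: 0.65}})) // prints true
	fmt.Println(shouldRetry([]attempt{{ID: 1, Score: 0.5}, {ID: 2, Score: 0.9}}))  // prints false
}
```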

Types ¶

type Assessment ¶

type Assessment struct {
	Score        float64            // overall score 0.0 - 1.0
	Dimensions   map[string]float64 // per-dimension scores
	Strengths    []string           // things that went well
	Weaknesses   []string           // things that went poorly
	Improvements []string           // actionable suggestions for next time
	TaskType     string             // classification of the task
	Timestamp    time.Time          // when the assessment was made
}

Assessment captures the result of a self-evaluation after completing a task.

type Bot ¶

type Bot = ReviewBot

Bot is the rule-driven review bot for diffs.

func NewBot ¶

func NewBot() *Bot

NewBot returns a fresh review bot with the default rule set.

type ConsensusResult ¶

type ConsensusResult struct {
	Winner     *Sample
	AllSamples []Sample
	Agreement  float64
	Method     string
}

ConsensusResult holds the outcome of multi-sample consensus.

type ConsensusSampler ¶

type ConsensusSampler struct {
	NumSamples int
	Strategy   string // "majority", "best_score", "synthesize"
	ScoreFn    func(solution string) float64
	// contains filtered or unexported fields
}

ConsensusSampler implements the multi-sample consensus pattern inspired by SWE-agent's "Ask Colleagues" approach: generate N solutions in parallel, then select the best one using a configurable strategy.

func NewConsensusSampler ¶

func NewConsensusSampler(numSamples int) *ConsensusSampler

NewConsensusSampler creates a ConsensusSampler with the given number of samples. If numSamples is <= 0, it defaults to 3.

func (*ConsensusSampler) SampleSolutions ¶

func (cs *ConsensusSampler) SampleSolutions(ctx context.Context, prompt string, generateFn func(context.Context, string) (string, error)) (*ConsensusResult, error)

SampleSolutions generates N solutions in parallel, scores each, and selects a winner based on the configured strategy.
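The generate-in-parallel, score, then select flow might look roughly like the sketch below. It does not use the package's types: `sample`, `sampleParallel`, `best`, and `demo` are hypothetical names, and skipping failed generations rather than aborting is an assumption. Only the default of 3 samples and the `generateFn` signature come from the documentation.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// sample is a hypothetical, trimmed-down stand-in for the package's Sample type.
type sample struct {
	ID      int
	Content string
	Score   float64
	Err     error
}

// sampleParallel calls generate n times concurrently and scores each
// successful result. Results are returned in ID order.
func sampleParallel(ctx context.Context, prompt string, n int,
	generate func(context.Context, string) (string, error),
	score func(string) float64,
) []sample {
	if n <= 0 {
		n = 3 // the default documented for NewConsensusSampler
	}
	out := make([]sample, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			content, err := generate(ctx, prompt)
			s := sample{ID: i, Content: content, Err: err}
			if err == nil {
				s.Score = score(content)
			}
			out[i] = s // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait()
	return out
}

// best returns the highest-scoring sample that did not fail, or nil.
func best(samples []sample) *sample {
	var winner *sample
	for i := range samples {
		s := &samples[i]
		if s.Err != nil {
			continue
		}
		if winner == nil || s.Score > winner.Score {
			winner = s
		}
	}
	return winner
}

// demo runs the sketch with a stub generator and a length-based score.
func demo() []sample {
	generate := func(_ context.Context, prompt string) (string, error) {
		return "answer to: " + prompt, nil
	}
	score := func(s string) float64 { return float64(len(s)) }
	return sampleParallel(context.Background(), "rename foo to bar", 0, generate, score)
}

func main() {
	samples := demo()
	if w := best(samples); w != nil {
		fmt.Printf("%d samples, winner score %.0f\n", len(samples), w.Score)
	}
}
```

Because every goroutine calls `generate` at the same time, the function passed in has to be safe for concurrent use.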

type Critic ¶

type Critic struct {
	// contains filtered or unexported fields
}

Critic provides fast pre-validation of patches using a cheap model before expensive execution. It generates a prompt for the cheap model and parses the response into a structured verdict.

func NewCritic ¶

func NewCritic(model string) *Critic

NewCritic creates a new critic that will use the given model for screening.

func (*Critic) BuildPrompt ¶

func (c *Critic) BuildPrompt(original, patched, intent string) string

BuildPrompt constructs a prompt for the cheap model to evaluate a patch.

func (*Critic) Model ¶

func (c *Critic) Model() string

Model returns the model name used for pre-screening.

func (*Critic) ParseVerdict ¶

func (c *Critic) ParseVerdict(response string) *PatchVerdict

ParseVerdict parses a model response into a structured PatchVerdict.

func (*Critic) PreScreenPatch ¶

func (c *Critic) PreScreenPatch(originalContent, patchedContent, intent string) *PatchVerdict

PreScreenPatch pre-screens a patch against the stated intent and returns a verdict. In this implementation it does not call the model itself: the caller is expected to send the prompt to the cheap model and pass the response to ParseVerdict. When no model call is available, PreScreenPatch constructs a PatchVerdict based on a simple heuristic comparison.

func (*Critic) ShouldBlock ¶

func (c *Critic) ShouldBlock(verdict *PatchVerdict) bool

ShouldBlock returns true if the verdict indicates the patch should be blocked (verdict is "incorrect" with confidence > 0.8).
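The blocking rule can be written out as a small predicate. This is a sketch under assumptions: the `verdict` type mirrors the documented PatchVerdict fields, and treating a nil verdict as non-blocking is a guess rather than documented behavior.

```go
package main

import "fmt"

// verdict mirrors the documented fields of PatchVerdict.
type verdict struct {
	Likely     string   // "correct", "incorrect", "uncertain"
	Issues     []string // specific issues found
	Confidence float64  // 0-1
}

// shouldBlock reports whether a patch should be blocked: the verdict must be
// "incorrect" and the confidence strictly greater than 0.8.
// A nil verdict does not block (an assumption of this sketch).
func shouldBlock(v *verdict) bool {
	if v == nil {
		return false
	}
	return v.Likely == "incorrect" && v.Confidence > 0.8
}

func main() {
	fmt.Println(shouldBlock(&verdict{Likely: "incorrect", Confidence: 0.9}))  // prints true
	fmt.Println(shouldBlock(&verdict{Likely: "uncertain", Confidence: 0.95})) // prints false
}
```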

type PatchVerdict ¶

type PatchVerdict struct {
	Likely     string   // "correct", "incorrect", "uncertain"
	Issues     []string // specific issues found
	Confidence float64  // 0-1
}

PatchVerdict is the result of a critic's pre-screening of a patch.

type QualityScorer ¶

type QualityScorer struct {
	Weights ScoreWeights
	History []ScoredResponse
	// contains filtered or unexported fields
}

QualityScorer evaluates LLM response quality across multiple dimensions and provides feedback for the self-improvement loop.

func NewQualityScorer ¶

func NewQualityScorer() *QualityScorer

NewQualityScorer creates a QualityScorer with default weights.

func (*QualityScorer) AverageScore ¶

func (qs *QualityScorer) AverageScore(n int) float64

AverageScore computes the average composite score over the last n responses.

func (*QualityScorer) FormatReport ¶

func (qs *QualityScorer) FormatReport(last int) string

FormatReport generates a formatted quality report for the most recent responses; last sets how many to include.

func (*QualityScorer) GenerateFeedback ¶

func (qs *QualityScorer) GenerateFeedback(scored *ScoredResponse) []string

GenerateFeedback produces human-readable suggestions based on the scored response.

func (*QualityScorer) Score ¶

func (qs *QualityScorer) Score(ctx ResponseContext) *ScoredResponse

Score evaluates a response across all quality dimensions and returns a composite result.

func (*QualityScorer) TrendAnalysis ¶

func (qs *QualityScorer) TrendAnalysis() string

TrendAnalysis returns a human-readable description of quality trends.

type Report ¶

type Report = ReviewReport

Report aggregates Comments for a single review run.

type ResponseContext ¶

type ResponseContext struct {
	UserPrompt        string
	AssistantResponse string
	ToolCallCount     int
	ToolErrors        int
	FilesModified     []string
	TestsPassed       bool
	LintPassed        bool
	TokensUsed        int
	Duration          time.Duration
}

ResponseContext provides the context needed to evaluate a response's quality.

type ReviewBot ¶

type ReviewBot struct {
	Rules    []ReviewRule
	Severity string // minimum severity to report: "error", "warning", "info"
	// contains filtered or unexported fields
}

ReviewBot is a rule-based code review engine that produces structured feedback without requiring an LLM call.

func NewReviewBot ¶

func NewReviewBot() *ReviewBot

NewReviewBot creates a ReviewBot pre-loaded with 20+ built-in rules.

func (*ReviewBot) ReviewDiff ¶

func (rb *ReviewBot) ReviewDiff(diffInput string) (*ReviewReport, error)

ReviewDiff parses a unified diff and reviews only changed lines.
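To illustrate what "only changed lines" can mean in practice, the sketch below extracts added lines and their new-file line numbers from a unified diff. It is not the package's parser: `addedLine`, `parseRange`, `addedLines`, and `sampleDiff` are hypothetical names, and treating only "+" lines as changed is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// addedLine is one "+" line from a unified diff with its new-file position.
type addedLine struct {
	File string
	Line int
	Text string
}

// parseRange reads a hunk range such as "-12,3" or "+7"; a missing count means 1.
func parseRange(field string) (start, count int, err error) {
	startStr, countStr, hasCount := strings.Cut(field[1:], ",")
	if start, err = strconv.Atoi(startStr); err != nil {
		return 0, 0, err
	}
	count = 1
	if hasCount {
		count, err = strconv.Atoi(countStr)
	}
	return start, count, err
}

// addedLines walks the diff, using the hunk header counts to know when a hunk ends.
func addedLines(diff string) ([]addedLine, error) {
	var out []addedLine
	file := ""
	newLine, oldLeft, newLeft := 0, 0, 0
	for _, raw := range strings.Split(diff, "\n") {
		if oldLeft > 0 || newLeft > 0 { // inside a hunk body
			switch {
			case strings.HasPrefix(raw, "+"):
				out = append(out, addedLine{File: file, Line: newLine, Text: raw[1:]})
				newLine++
				newLeft--
			case strings.HasPrefix(raw, "-"):
				oldLeft--
			case strings.HasPrefix(raw, `\`):
				// "\ No newline at end of file": belongs to neither side.
			default: // context line, present in both old and new file
				newLine++
				oldLeft--
				newLeft--
			}
			continue
		}
		switch {
		case strings.HasPrefix(raw, "+++ "):
			file = strings.TrimPrefix(raw[4:], "b/")
		case strings.HasPrefix(raw, "@@ "):
			fields := strings.Fields(raw) // "@@", "-a,b", "+c,d", "@@", ...
			if len(fields) < 3 {
				return nil, fmt.Errorf("malformed hunk header %q", raw)
			}
			var err error
			if _, oldLeft, err = parseRange(fields[1]); err != nil {
				return nil, err
			}
			if newLine, newLeft, err = parseRange(fields[2]); err != nil {
				return nil, err
			}
		}
	}
	return out, nil
}

// sampleDiff is a small hand-written diff used for the demonstration.
const sampleDiff = "--- a/main.go\n+++ b/main.go\n@@ -1,3 +1,4 @@\n" +
	" package main\n-var x = 1\n+var x = 2\n+var y = 3\n func main() {}\n"

func main() {
	lines, err := addedLines(sampleDiff)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Printf("%s:%d: %s\n", l.File, l.Line, l.Text)
	}
}
```

A rule engine built on such a list would run its checks against these lines only, leaving unchanged context out of the report.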

func (*ReviewBot) ReviewFile ¶

func (rb *ReviewBot) ReviewFile(path, content string) (*ReviewReport, error)

ReviewFile reviews a full file's content.

type ReviewComment ¶

type ReviewComment struct {
	File       string
	Line       int
	Severity   string // "error", "warning", "info"
	Category   string
	Message    string
	Suggestion string
	RuleID     string
}

ReviewComment represents a single piece of review feedback.

func FilterBySeverity ¶

func FilterBySeverity(comments []ReviewComment, minSeverity string) []ReviewComment

FilterBySeverity returns only comments at or above the specified minimum severity.
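A threshold filter of this kind needs an ordering of severities. The sketch below assumes info < warning < error, which the documentation implies but does not state, and uses a hypothetical `comment` type in place of ReviewComment. Dropping comments with an unrecognized severity is also an assumption.

```go
package main

import "fmt"

// comment is a hypothetical, trimmed-down stand-in for ReviewComment.
type comment struct {
	Severity string // "error", "warning", "info"
	Message  string
}

// severityRank orders the documented severity strings.
// The ordering info < warning < error is an assumption of this sketch.
var severityRank = map[string]int{"info": 0, "warning": 1, "error": 2}

// filterBySeverity keeps comments whose severity is at or above min.
// An unknown min keeps every comment with a known severity; comments with an
// unknown severity are dropped (both are assumptions).
func filterBySeverity(comments []comment, min string) []comment {
	minRank := severityRank[min] // zero value 0 for unknown min
	var out []comment
	for _, c := range comments {
		if rank, ok := severityRank[c.Severity]; ok && rank >= minRank {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	all := []comment{
		{Severity: "info", Message: "consider renaming"},
		{Severity: "warning", Message: "unchecked error"},
		{Severity: "error", Message: "hard-coded credential"},
	}
	for _, c := range filterBySeverity(all, "warning") {
		fmt.Println(c.Severity+":", c.Message)
	}
}
```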

type ReviewReport ¶

type ReviewReport struct {
	Comments      []ReviewComment
	FilesReviewed int
	IssuesFound   int
	BySeverity    map[string]int
	Duration      time.Duration
}

ReviewReport summarizes the results of a code review.

type ReviewResult ¶

type ReviewResult struct {
	Best          *Solution
	All           []Solution
	Attempts      int
	TotalDuration time.Duration
	TotalTokens   int
	Agreement     float64
}

ReviewResult holds the outcome of the multi-attempt review process.

type ReviewRule ¶

type ReviewRule struct {
	ID       string
	Name     string
	Category string // "security", "performance", "correctness", "style", "testing"
	Language string
	Check    func(file string, lines []string, diffLines []diff.DiffLine) []ReviewComment
}

ReviewRule defines a single review check that can be applied to code.

type Rule ¶

type Rule = ReviewRule

Rule is a single check in a Bot's rule set.

type Sample ¶

type Sample struct {
	ID         int
	Content    string
	Score      float64
	Duration   time.Duration
	TokensUsed int
}

Sample represents a single generated solution with metadata.

func BestScore ¶

func BestScore(samples []Sample) *Sample

BestScore picks the highest-scored solution.

func MajorityVote ¶

func MajorityVote(samples []Sample) *Sample

MajorityVote finds the most similar/common solution using pairwise similarity. The sample with the highest average similarity to all others wins.
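The selection rule, highest average similarity to all other samples, can be sketched as below. The names `similarity`, `majorityVote`, and `candidates` are hypothetical, the word-set Jaccard measure follows the PairwiseSimilarity description, and breaking ties in favor of the earliest sample is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// similarity is a word-set Jaccard measure, following the description of
// PairwiseSimilarity. Two empty strings count as identical (an assumption).
func similarity(a, b string) float64 {
	sa := map[string]bool{}
	for _, w := range strings.Fields(a) {
		sa[w] = true
	}
	sb := map[string]bool{}
	for _, w := range strings.Fields(b) {
		sb[w] = true
	}
	inter := 0
	for w := range sa {
		if sb[w] {
			inter++
		}
	}
	union := len(sa) + len(sb) - inter
	if union == 0 {
		return 1
	}
	return float64(inter) / float64(union)
}

// majorityVote returns the index of the candidate with the highest average
// similarity to all other candidates, or -1 for an empty input.
// Ties go to the earliest candidate (an assumption).
func majorityVote(contents []string) int {
	if len(contents) == 0 {
		return -1
	}
	if len(contents) == 1 {
		return 0
	}
	winner, bestAvg := 0, -1.0
	for i := range contents {
		sum := 0.0
		for j := range contents {
			if i != j {
				sum += similarity(contents[i], contents[j])
			}
		}
		if avg := sum / float64(len(contents)-1); avg > bestAvg {
			winner, bestAvg = i, avg
		}
	}
	return winner
}

// candidates is illustrative data: three related answers and one outlier.
var candidates = []string{
	"rewrite everything in rust",
	"guard the map with a mutex",
	"use a mutex to guard the map",
	"use a mutex",
}

func main() {
	fmt.Println(candidates[majorityVote(candidates)]) // prints "use a mutex to guard the map"
}
```

The outlier shares no words with the others, so its average similarity is 0 and it cannot win.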

func Synthesize ¶

func Synthesize(samples []Sample) *Sample

Synthesize combines elements from all solutions, weighted by score. It selects unique paragraphs from higher-scoring samples first.
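One way to read "unique paragraphs from higher-scoring samples first" is sketched below. It assumes paragraphs are separated by blank lines and that uniqueness means an exact match after trimming whitespace; the package may define both differently. The `sample` type and `synthesize` function are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a hypothetical, trimmed-down stand-in for the package's Sample type.
type sample struct {
	Content string
	Score   float64
}

// synthesize visits samples from highest to lowest score and keeps each
// paragraph the first time it appears.
func synthesize(samples []sample) string {
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]sample(nil), samples...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Score > sorted[j].Score
	})

	seen := make(map[string]bool)
	var parts []string
	for _, s := range sorted {
		// Assumption: a blank line separates paragraphs.
		for _, p := range strings.Split(s.Content, "\n\n") {
			p = strings.TrimSpace(p)
			if p == "" || seen[p] {
				continue
			}
			seen[p] = true
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "\n\n")
}

func main() {
	merged := synthesize([]sample{
		{Content: "Step one.\n\nStep two.", Score: 0.6},
		{Content: "Step zero.\n\nStep one.", Score: 0.9},
	})
	fmt.Println(merged)
}
```

In this example the higher-scoring sample contributes "Step zero." and "Step one." first, and the lower-scoring sample adds only "Step two.".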

type ScoreWeights ¶

type ScoreWeights struct {
	Completeness float64 // did it address the full request?
	Correctness  float64 // is the code syntactically valid?
	Conciseness  float64 // not overly verbose?
	ToolUsage    float64 // efficient use of tools?
	Safety       float64 // no dangerous operations?
}

ScoreWeights defines the relative importance of each quality dimension. All values should be in [0,1] and sum to 1.
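The constraint on the weights and the way a composite score could be derived from them are sketched below. The `weights` type mirrors the documented fields, but the example values, the lowercase dimension keys, and the weighted-sum formula are assumptions and are not taken from DefaultWeights or Score.

```go
package main

import (
	"fmt"
	"math"
)

// weights mirrors the documented fields of ScoreWeights.
type weights struct {
	Completeness, Correctness, Conciseness, ToolUsage, Safety float64
}

// validate checks the documented constraint: each value in [0,1], sum equal to 1.
func (w weights) validate() error {
	sum := 0.0
	for _, v := range []float64{w.Completeness, w.Correctness, w.Conciseness, w.ToolUsage, w.Safety} {
		if v < 0 || v > 1 {
			return fmt.Errorf("weight %g is outside [0,1]", v)
		}
		sum += v
	}
	if math.Abs(sum-1) > 1e-9 { // tolerate floating-point rounding
		return fmt.Errorf("weights sum to %g, want 1", sum)
	}
	return nil
}

// composite is a weighted sum of per-dimension scores.
// The map keys are hypothetical dimension names.
func composite(w weights, scores map[string]float64) float64 {
	return w.Completeness*scores["completeness"] +
		w.Correctness*scores["correctness"] +
		w.Conciseness*scores["conciseness"] +
		w.ToolUsage*scores["tool_usage"] +
		w.Safety*scores["safety"]
}

// exampleWeights is illustrative only; it is not the package's default.
var exampleWeights = weights{
	Completeness: 0.3, Correctness: 0.3, Conciseness: 0.1, ToolUsage: 0.1, Safety: 0.2,
}

func main() {
	if err := exampleWeights.validate(); err != nil {
		panic(err)
	}
	scores := map[string]float64{
		"completeness": 1, "correctness": 0.5, "conciseness": 0, "tool_usage": 1, "safety": 1,
	}
	fmt.Printf("%.2f\n", composite(exampleWeights, scores)) // prints 0.75
}
```

Because the weights sum to 1 and each dimension score lies in [0,1], the weighted sum also stays in [0,1].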

func DefaultWeights ¶

func DefaultWeights() ScoreWeights

DefaultWeights returns a balanced set of scoring weights.

type ScoredResponse ¶

type ScoredResponse struct {
	Score     float64            // 0-1 overall composite score
	Breakdown map[string]float64 // per-dimension scores
	Feedback  []string           // human-readable improvement suggestions
	Timestamp time.Time
	Model     string
	TaskType  string
}

ScoredResponse holds the quality evaluation of a single LLM response.

type SelfAssessor ¶

type SelfAssessor struct {
	History []Assessment
	// contains filtered or unexported fields
}

SelfAssessor evaluates agent performance after each task and tracks trends.

func NewSelfAssessor ¶

func NewSelfAssessor() *SelfAssessor

NewSelfAssessor creates a new SelfAssessor with an empty history.

func (*SelfAssessor) Assess ¶

func (sa *SelfAssessor) Assess(ctx TaskContext) *Assessment

Assess evaluates the agent's performance on a completed task across multiple dimensions and records the assessment in history.

func (*SelfAssessor) AverageScore ¶

func (sa *SelfAssessor) AverageScore(n int) float64

AverageScore computes the average overall score of the last n assessments. If n is 0 or exceeds history length, all assessments are averaged.
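The windowing rule can be sketched over a plain slice of scores. `averageLast` is a hypothetical helper; returning 0 for an empty history and treating a negative n like 0 are assumptions not stated in the documentation.

```go
package main

import "fmt"

// averageLast averages the last n scores. If n is 0 or exceeds the history
// length, every score is averaged, as documented. Negative n is treated the
// same way and an empty history yields 0 (both assumptions).
func averageLast(scores []float64, n int) float64 {
	if len(scores) == 0 {
		return 0
	}
	if n <= 0 || n > len(scores) {
		n = len(scores)
	}
	sum := 0.0
	for _, s := range scores[len(scores)-n:] {
		sum += s
	}
	return sum / float64(n)
}

func main() {
	history := []float64{0.5, 1.0, 0.25, 0.75}
	fmt.Println(averageLast(history, 2))  // prints 0.5
	fmt.Println(averageLast(history, 0))  // prints 0.625
	fmt.Println(averageLast(history, 10)) // prints 0.625
}
```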

func (*SelfAssessor) GetTrend ¶

func (sa *SelfAssessor) GetTrend(dimension string) string

GetTrend analyzes the trend of a given dimension over the last 10 assessments. Returns "improving", "stable", or "declining".

func (*SelfAssessor) IdentifyStrengths ¶

func (sa *SelfAssessor) IdentifyStrengths(ctx TaskContext) []string

IdentifyStrengths returns a list of things that went well.

func (*SelfAssessor) IdentifyWeaknesses ¶

func (sa *SelfAssessor) IdentifyWeaknesses(ctx TaskContext) []string

IdentifyWeaknesses returns a list of things that went poorly.

func (*SelfAssessor) SuggestImprovements ¶

func (sa *SelfAssessor) SuggestImprovements(ctx TaskContext) []string

SuggestImprovements returns actionable suggestions for future tasks.

type Solution ¶

type Solution struct {
	ID            int
	Content       string
	Score         float64
	Duration      time.Duration
	TokensUsed    int
	Errors        []string
	FilesModified []string
}

Solution represents a single attempted solution with metadata.

type SolutionReviewer ¶

type SolutionReviewer struct {
	MaxAttempts int
	ScoreFn     func(solution string) float64
	// contains filtered or unexported fields
}

SolutionReviewer implements a multi-attempt solution review pattern inspired by SWE-agent's reviewer: run the agent N times, score each solution, and select the best one. This improves reliability by sampling multiple approaches.

func NewSolutionReviewer ¶

func NewSolutionReviewer(maxAttempts int) *SolutionReviewer

NewSolutionReviewer creates a SolutionReviewer with the given max attempts. If maxAttempts is <= 0, it defaults to 3.

func (*SolutionReviewer) ReviewAndSelect ¶

func (sr *SolutionReviewer) ReviewAndSelect(ctx context.Context, task string, solveFn func(context.Context, string) (*Solution, error)) (*ReviewResult, error)

ReviewAndSelect runs solveFn up to MaxAttempts times, scores each solution, selects the best one, and calculates agreement across attempts.
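The run, score, and select loop might be structured as in the sketch below. It is written against hypothetical types (`solution`, `reviewAndSelect`, `demo`) and leaves out the agreement calculation, whose formula the documentation does not give. Skipping failed attempts instead of aborting is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// solution is a hypothetical, trimmed-down stand-in for the package's Solution type.
type solution struct {
	ID            int
	Content       string
	Score         float64
	FilesModified []string
}

// reviewAndSelect calls solve up to maxAttempts times, scores each result,
// and returns the best one together with every successful attempt.
func reviewAndSelect(ctx context.Context, task string, maxAttempts int,
	solve func(context.Context, string) (*solution, error),
	score func(*solution) float64,
) (*solution, []solution, error) {
	if maxAttempts <= 0 {
		maxAttempts = 3 // the default documented for NewSolutionReviewer
	}
	var all []solution
	for i := 0; i < maxAttempts; i++ {
		if err := ctx.Err(); err != nil {
			return nil, all, err // stop early once the context is done
		}
		s, err := solve(ctx, task)
		if err != nil || s == nil {
			continue // assumption: a failed attempt is skipped, not fatal
		}
		s.Score = score(s)
		all = append(all, *s)
	}
	if len(all) == 0 {
		return nil, nil, errors.New("no attempt produced a solution")
	}
	best := &all[0]
	for i := range all {
		if all[i].Score > best.Score {
			best = &all[i]
		}
	}
	return best, all, nil
}

// demo uses a stub solver whose second attempt fails.
func demo() (*solution, int, error) {
	calls := 0
	solve := func(_ context.Context, task string) (*solution, error) {
		calls++
		if calls == 2 {
			return nil, errors.New("attempt failed")
		}
		return &solution{
			ID:            calls,
			Content:       fmt.Sprintf("attempt %d for %s", calls, task),
			FilesModified: make([]string, calls),
		}, nil
	}
	score := func(s *solution) float64 { return float64(len(s.FilesModified)) }
	best, all, err := reviewAndSelect(context.Background(), "fix flaky test", 3, solve, score)
	return best, len(all), err
}

func main() {
	best, n, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("kept %d attempts, best is #%d\n", n, best.ID) // prints "kept 2 attempts, best is #3"
}
```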

func (*SolutionReviewer) ScoreSolution ¶

func (sr *SolutionReviewer) ScoreSolution(solution *Solution) float64

ScoreSolution evaluates a solution using default scoring criteria:

- Has code changes (+0.3)
- No errors (+0.3)
- Reasonable length (+0.2)
- Files modified (+0.2)
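The additive criteria listed above can be sketched as follows. The four increments come from the documentation; how "has code changes" and "reasonable length" are detected is not documented, so the diff-marker check and the 50 to 10000 character range below are placeholders chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// solution mirrors the documented fields of Solution that the criteria use.
type solution struct {
	Content       string
	Errors        []string
	FilesModified []string
}

// scoreSolution adds up the four documented increments.
func scoreSolution(s solution) float64 {
	score := 0.0
	// Assumption: a "diff --git" marker stands in for "has code changes".
	if strings.Contains(s.Content, "diff --git") {
		score += 0.3
	}
	if len(s.Errors) == 0 {
		score += 0.3
	}
	// Assumption: 50 to 10000 characters counts as a reasonable length.
	if n := len(s.Content); n >= 50 && n <= 10000 {
		score += 0.2
	}
	if len(s.FilesModified) > 0 {
		score += 0.2
	}
	return score
}

func main() {
	good := solution{
		Content:       "diff --git a/main.go b/main.go\n" + strings.Repeat("+ line\n", 10),
		FilesModified: []string{"main.go"},
	}
	bad := solution{Content: "sorry", Errors: []string{"build failed"}}
	fmt.Printf("%.1f %.1f\n", scoreSolution(good), scoreSolution(bad)) // prints "1.0 0.0"
}
```

The increments sum to 1.0, so a solution meeting all four criteria gets the maximum score and each missing criterion subtracts its share.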

type TaskContext ¶

type TaskContext struct {
	Goal          string
	ToolCalls     int
	Errors        int
	Retries       int
	FilesModified int
	TestsPassed   bool
	Duration      time.Duration
	TokensUsed    int
	UserFeedback  string
}

TaskContext captures all relevant metrics from a completed task for assessment.
