judge

package
v1.2.1
Warning

This package is not in the latest version of its module.

Published: Mar 11, 2026 License: MIT Imports: 15 Imported by: 0

Documentation

Overview

Package judge provides LLM-based quality scoring for skill files. It sends SKILL.md and reference file content to an LLM judge that rates them on dimensions like clarity, actionability, token efficiency, and novelty. Results are cached per provider/model/file to avoid redundant API calls.

Stability

This package is EXPERIMENTAL. Its API may change in minor releases without a major version bump. See the project README for the full stability policy.

Constants

const DefaultMaxContentLen = 8000

DefaultMaxContentLen is the default maximum length, in characters, of content sent to the judge. Pass 0 as maxLen to disable truncation.

Variables

This section is empty.

Functions

func CacheDir

func CacheDir(skillDir string) string

CacheDir returns the cache directory path for a given skill directory.

func CacheKey

func CacheKey(provider, model, scoreType, skillContext, filePath string) string

CacheKey generates a deterministic cache key from provider, model, score type, skill context, and file path. Returns the first 16 hex characters of a SHA-256 hash. Using file path (not content) means editing a file and re-running overwrites the same cache entry rather than creating an orphan.

func ContentHash

func ContentHash(content string) string

ContentHash returns a SHA-256 hash of the content for invalidation checks.
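One way such a hash supports invalidation: store it with the result (as CachedResult.ContentHash) and treat the entry as stale when the file's current hash differs. This sketch assumes hex encoding; `contentHash` and `isStale` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns the hex-encoded SHA-256 of content
// (hex encoding is an assumption for this sketch).
func contentHash(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// isStale reports whether a cached entry's stored hash no longer
// matches the file's current content.
func isStale(storedHash, currentContent string) bool {
	return storedHash != contentHash(currentContent)
}

func main() {
	stored := contentHash("# My Skill\n\nv1")
	fmt.Println(isStale(stored, "# My Skill\n\nv1")) // prints false
	fmt.Println(isStale(stored, "# My Skill\n\nv2")) // prints true
}
```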

func DeserializeScored

func DeserializeScored(r *CachedResult) (types.Scored, error)

DeserializeScored unmarshals a CachedResult's Scores into the appropriate concrete type and returns it as a Scored interface. It uses the Type field to determine whether the result is a skill or reference score, falling back to checking File == "SKILL.md" for compatibility with older cache entries.

func LatestByFile

func LatestByFile(results []*CachedResult) map[string]*CachedResult

LatestByFile returns the most recent cached result for each unique file, across all models. It takes no model argument; to restrict the results to a single model, pass them through FilterByModel first.

func RefDimensions

func RefDimensions() []string

RefDimensions returns the dimension names for reference file scoring.

func SaveCache

func SaveCache(cacheDir, key string, result *CachedResult) error

SaveCache writes a result to the cache directory.

func SkillDimensions

func SkillDimensions() []string

SkillDimensions returns the dimension names for SKILL.md scoring.

Types

type CachedResult

type CachedResult struct {
	Provider    string          `json:"provider"`
	Model       string          `json:"model"`
	File        string          `json:"file"`
	Type        string          `json:"type"`
	ContentHash string          `json:"content_hash"`
	ScoredAt    time.Time       `json:"scored_at"`
	Scores      json.RawMessage `json:"scores"`
}

CachedResult holds a scoring result with metadata for cache storage.

func FilterByModel

func FilterByModel(results []*CachedResult, model string) []*CachedResult

FilterByModel returns only results matching the given model name.

func GetCached

func GetCached(cacheDir, key string) (*CachedResult, bool)

GetCached reads a cached result by key. Returns nil, false if not found.

func ListCached

func ListCached(cacheDir string) ([]*CachedResult, error)

ListCached reads all cached results from the cache directory. Results are sorted by ScoredAt (most recent first).

type ClientOptions

type ClientOptions struct {
	Provider          string // "anthropic" or "openai"
	APIKey            string // Required
	BaseURL           string // Optional; defaults per provider
	Model             string // Optional; defaults per provider
	MaxTokensStyle    string // "auto", "max_tokens", or "max_completion_tokens"
	MaxResponseTokens int    // Maximum tokens in the LLM response; 0 defaults to 500
}

ClientOptions holds configuration for creating an LLM client.
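A sketch of how the defaults documented for ClientOptions and NewClient might be filled in: MaxResponseTokens 0 becomes 500, an empty BaseURL for "openai" becomes "https://api.openai.com/v1", and an empty Model for "anthropic" becomes the model shown in the NewClient example output. The hypothetical `withDefaults` helper, its validation errors, and leaving other fields untouched are assumptions; defaults the documentation does not state (such as the openai model) are not guessed.

```go
package main

import "fmt"

// clientOptions mirrors judge.ClientOptions for this sketch.
type clientOptions struct {
	Provider          string
	APIKey            string
	BaseURL           string
	Model             string
	MaxTokensStyle    string
	MaxResponseTokens int
}

// withDefaults validates opts and fills only the documented defaults.
func withDefaults(opts clientOptions) (clientOptions, error) {
	if opts.APIKey == "" {
		return opts, fmt.Errorf("APIKey is required")
	}
	if opts.Provider != "anthropic" && opts.Provider != "openai" {
		return opts, fmt.Errorf("unsupported provider %q", opts.Provider)
	}
	if opts.MaxResponseTokens == 0 {
		opts.MaxResponseTokens = 500 // documented default
	}
	if opts.Provider == "openai" && opts.BaseURL == "" {
		opts.BaseURL = "https://api.openai.com/v1" // documented default
	}
	if opts.Provider == "anthropic" && opts.Model == "" {
		opts.Model = "claude-sonnet-4-5-20250929" // default shown in the NewClient example
	}
	return opts, nil
}

func main() {
	opts, err := withDefaults(clientOptions{Provider: "openai", APIKey: "your-api-key", Model: "gpt-5.2"})
	if err != nil {
		panic(err)
	}
	fmt.Println(opts.BaseURL, opts.MaxResponseTokens) // prints "https://api.openai.com/v1 500"
}
```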

type LLMClient

type LLMClient interface {
	// Complete sends a system prompt and user content to the LLM and returns the text response.
	Complete(ctx context.Context, systemPrompt, userContent string) (string, error)
	// Provider returns the provider name (e.g. "anthropic", "openai").
	Provider() string
	// ModelName returns the model identifier.
	ModelName() string
}

LLMClient is the interface for making LLM API calls.

func NewClient

func NewClient(opts ClientOptions) (LLMClient, error)

NewClient creates an LLMClient for the given options. If Model is empty, a default is chosen per provider. For the openai provider, BaseURL defaults to "https://api.openai.com/v1" if empty.

Example
package main

import (
	"fmt"

	"github.com/agent-ecosystem/skill-validator/judge"
)

func main() {
	client, err := judge.NewClient(judge.ClientOptions{
		Provider: "anthropic",
		APIKey:   "your-api-key",
		// Model defaults to claude-sonnet-4-5-20250929
	})
	if err != nil {
		panic(err)
	}

	fmt.Printf("Provider: %s, Model: %s\n", client.Provider(), client.ModelName())
}
Output:
Provider: anthropic, Model: claude-sonnet-4-5-20250929
Example (Openai)
package main

import (
	"fmt"

	"github.com/agent-ecosystem/skill-validator/judge"
)

func main() {
	client, err := judge.NewClient(judge.ClientOptions{
		Provider: "openai",
		APIKey:   "your-api-key",
		Model:    "gpt-5.2",
	})
	if err != nil {
		panic(err)
	}

	fmt.Printf("Provider: %s, Model: %s\n", client.Provider(), client.ModelName())
}
Output:
Provider: openai, Model: gpt-5.2

type RefScores

type RefScores struct {
	Clarity            int     `json:"clarity"`
	InstructionalValue int     `json:"instructional_value"`
	TokenEfficiency    int     `json:"token_efficiency"`
	Novelty            int     `json:"novelty"`
	SkillRelevance     int     `json:"skill_relevance"`
	Overall            float64 `json:"overall"`
	BriefAssessment    string  `json:"brief_assessment"`
	NovelInfo          string  `json:"novel_info,omitempty"`
}

RefScores holds the LLM judge scores for a reference file.

func AggregateRefScores

func AggregateRefScores(results []*RefScores) *RefScores

AggregateRefScores computes mean scores across multiple reference file results.
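A sketch of per-dimension averaging across reference results. RefScores stores dimensions as int, and the documentation does not say how fractional means are rounded into those fields, so this hypothetical `dimensionMeans` returns float64 means keyed by the JSON field names instead; returning nil for empty input is also an assumption.

```go
package main

import "fmt"

// refScores mirrors the numeric fields of judge.RefScores.
type refScores struct {
	Clarity            int
	InstructionalValue int
	TokenEfficiency    int
	Novelty            int
	SkillRelevance     int
	Overall            float64
}

// dimensionMeans returns the mean of each dimension and of Overall.
// It returns nil for empty input to avoid dividing by zero.
func dimensionMeans(results []*refScores) map[string]float64 {
	if len(results) == 0 {
		return nil
	}
	sums := map[string]float64{}
	for _, r := range results {
		sums["clarity"] += float64(r.Clarity)
		sums["instructional_value"] += float64(r.InstructionalValue)
		sums["token_efficiency"] += float64(r.TokenEfficiency)
		sums["novelty"] += float64(r.Novelty)
		sums["skill_relevance"] += float64(r.SkillRelevance)
		sums["overall"] += r.Overall
	}
	n := float64(len(results))
	for k := range sums {
		sums[k] /= n
	}
	return sums
}

func main() {
	m := dimensionMeans([]*refScores{
		{Clarity: 4, InstructionalValue: 3, TokenEfficiency: 5, Novelty: 2, SkillRelevance: 4, Overall: 3.6},
		{Clarity: 5, InstructionalValue: 4, TokenEfficiency: 4, Novelty: 3, SkillRelevance: 5, Overall: 4.2},
	})
	fmt.Printf("clarity %.1f, overall %.1f\n", m["clarity"], m["overall"])
}
```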

func ScoreReference

func ScoreReference(ctx context.Context, content, skillName, skillDesc string, client LLMClient, maxLen int) (*RefScores, error)

ScoreReference sends a reference file's content to the LLM judge and returns parsed scores. maxLen controls content truncation (0 = no truncation).

Example

This example shows how to score a reference file against its parent skill.

package main

import (
	"context"
	"fmt"
	"os"

	"github.com/agent-ecosystem/skill-validator/judge"
)

func main() {
	client, err := judge.NewClient(judge.ClientOptions{
		Provider: "anthropic",
		APIKey:   os.Getenv("ANTHROPIC_API_KEY"),
	})
	if err != nil {
		panic(err)
	}

	refContent := "# API Reference\n\nDetailed API documentation..."

	scores, err := judge.ScoreReference(
		context.Background(),
		refContent,
		"my-skill",                 // parent skill name
		"A skill for doing things", // parent skill description
		client,
		judge.DefaultMaxContentLen,
	)
	if err != nil {
		panic(err)
	}

	fmt.Printf("Overall: %.2f/5\n", scores.Overall)
	for _, d := range scores.DimensionScores() {
		fmt.Printf("  %s: %d/5\n", d.Label, d.Value)
	}
}

func (*RefScores) Assessment

func (s *RefScores) Assessment() string

Assessment returns the brief assessment text.

func (*RefScores) DimensionScores

func (s *RefScores) DimensionScores() []types.DimensionScore

DimensionScores returns the ordered dimension scores for reference file scoring.

func (*RefScores) NovelDetails

func (s *RefScores) NovelDetails() string

NovelDetails returns novel information details, if any.

func (*RefScores) OverallScore

func (s *RefScores) OverallScore() float64

OverallScore returns the computed overall score.

type SkillScores

type SkillScores struct {
	Clarity            int     `json:"clarity"`
	Actionability      int     `json:"actionability"`
	TokenEfficiency    int     `json:"token_efficiency"`
	ScopeDiscipline    int     `json:"scope_discipline"`
	DirectivePrecision int     `json:"directive_precision"`
	Novelty            int     `json:"novelty"`
	Overall            float64 `json:"overall"`
	BriefAssessment    string  `json:"brief_assessment"`
	NovelInfo          string  `json:"novel_info,omitempty"`
}

SkillScores holds the LLM judge scores for a SKILL.md file.

func ScoreSkill

func ScoreSkill(ctx context.Context, content string, client LLMClient, maxLen int) (*SkillScores, error)

ScoreSkill sends a SKILL.md's content to the LLM judge and returns parsed scores. maxLen controls content truncation (0 = no truncation).

Example

This example shows how to score a SKILL.md file. It requires a valid API key, so it is not executed as a test.

package main

import (
	"context"
	"fmt"
	"os"

	"github.com/agent-ecosystem/skill-validator/judge"
)

func main() {
	client, err := judge.NewClient(judge.ClientOptions{
		Provider: "anthropic",
		APIKey:   os.Getenv("ANTHROPIC_API_KEY"),
	})
	if err != nil {
		panic(err)
	}

	skillContent := "# My Skill\n\nInstructions for the agent..."

	scores, err := judge.ScoreSkill(context.Background(), skillContent, client, judge.DefaultMaxContentLen)
	if err != nil {
		panic(err)
	}

	fmt.Printf("Overall: %.2f/5\n", scores.Overall)
	fmt.Printf("Assessment: %s\n", scores.BriefAssessment)
	for _, d := range scores.DimensionScores() {
		fmt.Printf("  %s: %d/5\n", d.Label, d.Value)
	}
}

func (*SkillScores) Assessment

func (s *SkillScores) Assessment() string

Assessment returns the brief assessment text.

func (*SkillScores) DimensionScores

func (s *SkillScores) DimensionScores() []types.DimensionScore

DimensionScores returns the ordered dimension scores for SKILL.md scoring.

func (*SkillScores) NovelDetails

func (s *SkillScores) NovelDetails() string

NovelDetails returns novel information details, if any.

func (*SkillScores) OverallScore

func (s *SkillScores) OverallScore() float64

OverallScore returns the computed overall score.
