package judge

v1.2.1 (latest) | Published: Mar 11, 2026 | License: MIT | Imports: 15 | Imported by: 0

Repository: github.com/agent-ecosystem/skill-validator

Documentation

Overview

Package judge provides LLM-based quality scoring for skill files. It sends SKILL.md and reference file content to an LLM judge that rates them on dimensions like clarity, actionability, token efficiency, and novelty. Results are cached per provider/model/file to avoid redundant API calls.

Stability

This package is EXPERIMENTAL. Its API may change in minor releases without a major version bump. See the project README for the full stability policy.

Index

Constants
func CacheDir(skillDir string) string
func CacheKey(provider, model, scoreType, skillContext, filePath string) string
func ContentHash(content string) string
func DeserializeScored(r *CachedResult) (types.Scored, error)
func LatestByFile(results []*CachedResult) map[string]*CachedResult
func RefDimensions() []string
func SaveCache(cacheDir, key string, result *CachedResult) error
func SkillDimensions() []string
type CachedResult
- func FilterByModel(results []*CachedResult, model string) []*CachedResult
- func GetCached(cacheDir, key string) (*CachedResult, bool)
- func ListCached(cacheDir string) ([]*CachedResult, error)
type ClientOptions
type LLMClient
- func NewClient(opts ClientOptions) (LLMClient, error)
type RefScores
- func AggregateRefScores(results []*RefScores) *RefScores
- func ScoreReference(ctx context.Context, content, skillName, skillDesc string, client LLMClient, ...) (*RefScores, error)
- func (s *RefScores) Assessment() string
- func (s *RefScores) DimensionScores() []types.DimensionScore
- func (s *RefScores) NovelDetails() string
- func (s *RefScores) OverallScore() float64
type SkillScores
- func ScoreSkill(ctx context.Context, content string, client LLMClient, maxLen int) (*SkillScores, error)
- func (s *SkillScores) Assessment() string
- func (s *SkillScores) DimensionScores() []types.DimensionScore
- func (s *SkillScores) NovelDetails() string
- func (s *SkillScores) OverallScore() float64

Constants

const DefaultMaxContentLen = 8000

DefaultMaxContentLen is the default maximum length, in characters, of the content sent to the judge. Passing 0 as maxLen to ScoreSkill or ScoreReference disables truncation.

Variables

This section is empty.

Functions

func CacheDir

func CacheDir(skillDir string) string

CacheDir returns the cache directory path for a given skill directory.

func CacheKey

func CacheKey(provider, model, scoreType, skillContext, filePath string) string

CacheKey generates a deterministic cache key from provider, model, score type, skill context, and file path. Returns the first 16 hex characters of a SHA-256 hash. Using file path (not content) means editing a file and re-running overwrites the same cache entry rather than creating an orphan.
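Here is a minimal sketch of a key derived this way. It assumes the five inputs are joined with a NUL separator before hashing. The real concatenation is not documented, so these keys will not match the package's cache files. The names cacheKeySketch and the score-type values "skill" and "ref" are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// cacheKeySketch is a hypothetical stand-in for judge.CacheKey: SHA-256 over the
// inputs, truncated to the first 16 hex characters.
// Assumption: inputs are joined with "\x00"; the package may combine them differently.
func cacheKeySketch(provider, model, scoreType, skillContext, filePath string) string {
	joined := strings.Join([]string{provider, model, scoreType, skillContext, filePath}, "\x00")
	sum := sha256.Sum256([]byte(joined))
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	a := cacheKeySketch("anthropic", "claude-sonnet-4-5-20250929", "skill", "my-skill", "SKILL.md")
	b := cacheKeySketch("anthropic", "claude-sonnet-4-5-20250929", "skill", "my-skill", "SKILL.md")
	c := cacheKeySketch("anthropic", "claude-sonnet-4-5-20250929", "ref", "my-skill", "references/api.md")
	fmt.Println(len(a), a == b, a == c) // 16 true false
}
```

File content is not an input, so editing SKILL.md and re-running yields the same key. Detecting that the cached score is out of date is the job of ContentHash.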

func ContentHash

func ContentHash(content string) string

ContentHash returns a SHA-256 hash of the content for invalidation checks.
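One way to sketch the invalidation check is to store the hash alongside the result (CachedResult has a ContentHash field) and compare it with a fresh hash of the file. The helpers contentHashSketch and isStale are hypothetical. The sketch also assumes the digest is hex-encoded in full.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHashSketch stands in for judge.ContentHash.
// Assumption: the full SHA-256 digest is hex-encoded (64 characters).
func contentHashSketch(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// isStale reports whether a cached entry recorded with storedHash no longer
// matches the current file content and should be re-scored.
func isStale(storedHash, currentContent string) bool {
	return storedHash != contentHashSketch(currentContent)
}

func main() {
	stored := contentHashSketch("# My Skill\n\nv1")
	fmt.Println(isStale(stored, "# My Skill\n\nv1")) // false
	fmt.Println(isStale(stored, "# My Skill\n\nv2")) // true
}
```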

func DeserializeScored

func DeserializeScored(r *CachedResult) (types.Scored, error)

DeserializeScored unmarshals a CachedResult's Scores into the appropriate concrete type and returns it as a Scored interface. It uses the Type field to determine whether the result is a skill or reference score, falling back to checking File == "SKILL.md" for compatibility with older cache entries.

func LatestByFile

func LatestByFile(results []*CachedResult) map[string]*CachedResult

LatestByFile returns the most recent cached result for each unique file, across all models. It takes no model argument; to restrict the results to a single model, filter them with FilterByModel first.

func RefDimensions

func RefDimensions() []string

RefDimensions returns the dimension names for reference file scoring.

func SaveCache

func SaveCache(cacheDir, key string, result *CachedResult) error

SaveCache writes a result to the cache directory.

func SkillDimensions

func SkillDimensions() []string

SkillDimensions returns the dimension names for SKILL.md scoring.

Types

type CachedResult

type CachedResult struct {
	Provider    string          `json:"provider"`
	Model       string          `json:"model"`
	File        string          `json:"file"`
	Type        string          `json:"type"`
	ContentHash string          `json:"content_hash"`
	ScoredAt    time.Time       `json:"scored_at"`
	Scores      json.RawMessage `json:"scores"`
}

CachedResult holds a scoring result with metadata for cache storage.

func FilterByModel

func FilterByModel(results []*CachedResult, model string) []*CachedResult

FilterByModel returns only results matching the given model name.

func GetCached

func GetCached(cacheDir, key string) (*CachedResult, bool)

GetCached reads a cached result by key. Returns nil, false if not found.

func ListCached

func ListCached(cacheDir string) ([]*CachedResult, error)

ListCached reads all cached results from the cache directory. Results are sorted by ScoredAt (most recent first).

type ClientOptions

type ClientOptions struct {
	Provider          string // "anthropic" or "openai"
	APIKey            string // Required
	BaseURL           string // Optional; defaults per provider
	Model             string // Optional; defaults per provider
	MaxTokensStyle    string // "auto", "max_tokens", or "max_completion_tokens"
	MaxResponseTokens int    // Maximum tokens in the LLM response; 0 defaults to 500
}

ClientOptions holds configuration for creating an LLM client.

type LLMClient

type LLMClient interface {
	// Complete sends a system prompt and user content to the LLM and returns the text response.
	Complete(ctx context.Context, systemPrompt, userContent string) (string, error)
	// Provider returns the provider name (e.g. "anthropic", "openai").
	Provider() string
	// ModelName returns the model identifier.
	ModelName() string
}

LLMClient is the interface for making LLM API calls.

func NewClient

func NewClient(opts ClientOptions) (LLMClient, error)

NewClient creates an LLMClient for the given options. If Model is empty, a default is chosen per provider. For the openai provider, BaseURL defaults to "https://api.openai.com/v1" if empty.
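The documented defaults can be summarized in a sketch. The function applyDefaults is hypothetical and fills in only what the docs state: the OpenAI BaseURL, the Anthropic model shown in the Example, and a MaxResponseTokens value of 500. Returning an error for a missing APIKey is an assumption based on the "Required" comment. The OpenAI default model is not shown in the docs, so the sketch leaves it empty.

```go
package main

import (
	"errors"
	"fmt"
)

// clientOptions is a local copy of the ClientOptions fields used here.
type clientOptions struct {
	Provider          string
	APIKey            string
	BaseURL           string
	Model             string
	MaxResponseTokens int
}

// applyDefaults fills only the defaults stated in the documentation.
// The missing-key and unknown-provider errors are assumptions.
func applyDefaults(opts clientOptions) (clientOptions, error) {
	if opts.APIKey == "" {
		return opts, errors.New("APIKey is required")
	}
	switch opts.Provider {
	case "anthropic":
		if opts.Model == "" {
			opts.Model = "claude-sonnet-4-5-20250929" // default shown in the Example
		}
	case "openai":
		if opts.BaseURL == "" {
			opts.BaseURL = "https://api.openai.com/v1"
		}
	default:
		return opts, fmt.Errorf("unsupported provider %q", opts.Provider)
	}
	if opts.MaxResponseTokens == 0 {
		opts.MaxResponseTokens = 500
	}
	return opts, nil
}

func main() {
	o, err := applyDefaults(clientOptions{Provider: "openai", APIKey: "k", Model: "gpt-5.2"})
	fmt.Println(o.BaseURL, o.MaxResponseTokens, err) // https://api.openai.com/v1 500 <nil>
}
```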

Example

package main

import (
	"fmt"

	"github.com/agent-ecosystem/skill-validator/judge"
)

func main() {
	client, err := judge.NewClient(judge.ClientOptions{
		Provider: "anthropic",
		APIKey:   "your-api-key",
		// Model defaults to claude-sonnet-4-5-20250929
	})
	if err != nil {
		panic(err)
	}

	fmt.Printf("Provider: %s, Model: %s\n", client.Provider(), client.ModelName())
}

Output:
Provider: anthropic, Model: claude-sonnet-4-5-20250929

Example (Openai)

package main

import (
	"fmt"

	"github.com/agent-ecosystem/skill-validator/judge"
)

func main() {
	client, err := judge.NewClient(judge.ClientOptions{
		Provider: "openai",
		APIKey:   "your-api-key",
		Model:    "gpt-5.2",
	})
	if err != nil {
		panic(err)
	}

	fmt.Printf("Provider: %s, Model: %s\n", client.Provider(), client.ModelName())
}

Output:
Provider: openai, Model: gpt-5.2

type RefScores

type RefScores struct {
	Clarity            int     `json:"clarity"`
	InstructionalValue int     `json:"instructional_value"`
	TokenEfficiency    int     `json:"token_efficiency"`
	Novelty            int     `json:"novelty"`
	SkillRelevance     int     `json:"skill_relevance"`
	Overall            float64 `json:"overall"`
	BriefAssessment    string  `json:"brief_assessment"`
	NovelInfo          string  `json:"novel_info,omitempty"`
}

RefScores holds the LLM judge scores for a reference file.

func AggregateRefScores

func AggregateRefScores(results []*RefScores) *RefScores

AggregateRefScores computes mean scores across multiple reference file results.
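The averaging can be sketched under stated assumptions: nil entries are skipped, an empty slice yields nil, and integer dimensions are rounded to the nearest whole number. The docs say only "mean scores", so aggregateSketch is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// RefScores keeps the numeric fields of judge.RefScores.
type RefScores struct {
	Clarity            int
	InstructionalValue int
	TokenEfficiency    int
	Novelty            int
	SkillRelevance     int
	Overall            float64
}

// aggregateSketch averages each field across results.
// Assumptions (not confirmed by the docs): nil entries are skipped, empty input
// yields nil, and integer dimensions are rounded to the nearest whole number.
func aggregateSketch(results []*RefScores) *RefScores {
	var n, overall float64
	var sum [5]float64
	for _, r := range results {
		if r == nil {
			continue
		}
		n++
		sum[0] += float64(r.Clarity)
		sum[1] += float64(r.InstructionalValue)
		sum[2] += float64(r.TokenEfficiency)
		sum[3] += float64(r.Novelty)
		sum[4] += float64(r.SkillRelevance)
		overall += r.Overall
	}
	if n == 0 {
		return nil
	}
	mean := func(s float64) int { return int(math.Round(s / n)) }
	return &RefScores{
		Clarity:            mean(sum[0]),
		InstructionalValue: mean(sum[1]),
		TokenEfficiency:    mean(sum[2]),
		Novelty:            mean(sum[3]),
		SkillRelevance:     mean(sum[4]),
		Overall:            overall / n,
	}
}

func main() {
	agg := aggregateSketch([]*RefScores{
		{Clarity: 4, InstructionalValue: 5, TokenEfficiency: 3, Novelty: 2, SkillRelevance: 5, Overall: 4.0},
		{Clarity: 3, InstructionalValue: 4, TokenEfficiency: 3, Novelty: 4, SkillRelevance: 4, Overall: 3.0},
	})
	fmt.Printf("%.2f %d\n", agg.Overall, agg.Novelty) // 3.50 3
}
```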

func ScoreReference

func ScoreReference(ctx context.Context, content, skillName, skillDesc string, client LLMClient, maxLen int) (*RefScores, error)

ScoreReference sends a reference file's content to the LLM judge and returns parsed scores. maxLen controls content truncation (0 = no truncation).

Example

This example shows how to score a reference file against its parent skill. It requires a valid API key in the ANTHROPIC_API_KEY environment variable.

package main

import (
	"context"
	"fmt"
	"os"

	"github.com/agent-ecosystem/skill-validator/judge"
)

func main() {
	client, err := judge.NewClient(judge.ClientOptions{
		Provider: "anthropic",
		APIKey:   os.Getenv("ANTHROPIC_API_KEY"),
	})
	if err != nil {
		panic(err)
	}

	refContent := "# API Reference\n\nDetailed API documentation..."

	scores, err := judge.ScoreReference(
		context.Background(),
		refContent,
		"my-skill",                 // parent skill name
		"A skill for doing things", // parent skill description
		client,
		judge.DefaultMaxContentLen,
	)
	if err != nil {
		panic(err)
	}

	fmt.Printf("Overall: %.2f/5\n", scores.Overall)
	for _, d := range scores.DimensionScores() {
		fmt.Printf("  %s: %d/5\n", d.Label, d.Value)
	}
}

func (*RefScores) Assessment

func (s *RefScores) Assessment() string

Assessment returns the brief assessment text.

func (*RefScores) DimensionScores

func (s *RefScores) DimensionScores() []types.DimensionScore

DimensionScores returns the ordered dimension scores for reference file scoring.

func (*RefScores) NovelDetails

func (s *RefScores) NovelDetails() string

NovelDetails returns novel information details, if any.

func (*RefScores) OverallScore

func (s *RefScores) OverallScore() float64

OverallScore returns the computed overall score.

type SkillScores

type SkillScores struct {
	Clarity            int     `json:"clarity"`
	Actionability      int     `json:"actionability"`
	TokenEfficiency    int     `json:"token_efficiency"`
	ScopeDiscipline    int     `json:"scope_discipline"`
	DirectivePrecision int     `json:"directive_precision"`
	Novelty            int     `json:"novelty"`
	Overall            float64 `json:"overall"`
	BriefAssessment    string  `json:"brief_assessment"`
	NovelInfo          string  `json:"novel_info,omitempty"`
}

SkillScores holds the LLM judge scores for a SKILL.md file.

func ScoreSkill

func ScoreSkill(ctx context.Context, content string, client LLMClient, maxLen int) (*SkillScores, error)

ScoreSkill sends a SKILL.md's content to the LLM judge and returns parsed scores. maxLen controls content truncation (0 = no truncation).

Example

This example shows how to score a SKILL.md file. It requires a valid API key, so it is not executed as a test.

package main

import (
	"context"
	"fmt"
	"os"

	"github.com/agent-ecosystem/skill-validator/judge"
)

func main() {
	client, err := judge.NewClient(judge.ClientOptions{
		Provider: "anthropic",
		APIKey:   os.Getenv("ANTHROPIC_API_KEY"),
	})
	if err != nil {
		panic(err)
	}

	skillContent := "# My Skill\n\nInstructions for the agent..."

	scores, err := judge.ScoreSkill(context.Background(), skillContent, client, judge.DefaultMaxContentLen)
	if err != nil {
		panic(err)
	}

	fmt.Printf("Overall: %.2f/5\n", scores.Overall)
	fmt.Printf("Assessment: %s\n", scores.BriefAssessment)
	for _, d := range scores.DimensionScores() {
		fmt.Printf("  %s: %d/5\n", d.Label, d.Value)
	}
}

func (*SkillScores) Assessment

func (s *SkillScores) Assessment() string

Assessment returns the brief assessment text.

func (*SkillScores) DimensionScores

func (s *SkillScores) DimensionScores() []types.DimensionScore

DimensionScores returns the ordered dimension scores for SKILL.md scoring.

func (*SkillScores) NovelDetails

func (s *SkillScores) NovelDetails() string

NovelDetails returns novel information details, if any.

func (*SkillScores) OverallScore

func (s *SkillScores) OverallScore() float64

OverallScore returns the computed overall score.
