Documentation ¶
Overview ¶
Package judge provides LLM-based quality scoring for skill files. It sends SKILL.md and reference file content to an LLM judge that rates them on dimensions like clarity, actionability, token efficiency, and novelty. Results are cached per provider/model/file to avoid redundant API calls.
Stability ¶
This package is EXPERIMENTAL. Its API may change in minor releases without a major version bump. See the project README for the full stability policy.
Index ¶
- Constants
- func CacheDir(skillDir string) string
- func CacheKey(provider, model, scoreType, skillContext, filePath string) string
- func ContentHash(content string) string
- func DeserializeScored(r *CachedResult) (types.Scored, error)
- func LatestByFile(results []*CachedResult) map[string]*CachedResult
- func RefDimensions() []string
- func SaveCache(cacheDir, key string, result *CachedResult) error
- func SkillDimensions() []string
- type CachedResult
- type ClientOptions
- type LLMClient
- type RefScores
- type SkillScores
Constants ¶
const DefaultMaxContentLen = 8000
DefaultMaxContentLen is the default maximum content length sent to the judge (characters). Use 0 to disable truncation.
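The truncation rule described here can be sketched roughly as follows. The helper name truncate is hypothetical and not part of this package. Counting runes rather than bytes, and treating negative limits like 0, are assumptions of this sketch.

```go
package main

import "fmt"

// truncate is a hypothetical helper, not part of the judge package.
// It caps content at maxLen runes. A limit of 0 disables truncation;
// treating negative limits the same way is an assumption of this sketch.
func truncate(content string, maxLen int) string {
	if maxLen <= 0 {
		return content
	}
	runes := []rune(content)
	if len(runes) <= maxLen {
		return content
	}
	return string(runes[:maxLen])
}

func main() {
	fmt.Println(truncate("héllo wörld", 5)) // prints "héllo"
	fmt.Println(truncate("short", 0))       // prints "short"
}
```

Slicing by rune avoids cutting a multi-byte character in half, at the cost of one extra allocation.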
Variables ¶
This section is empty.
Functions ¶
func CacheKey ¶
func CacheKey(provider, model, scoreType, skillContext, filePath string) string
CacheKey generates a deterministic cache key from provider, model, score type, skill context, and file path. Returns the first 16 hex characters of a SHA-256 hash. Using file path (not content) means editing a file and re-running overwrites the same cache entry rather than creating an orphan.
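A minimal sketch of how such a key could be derived, assuming the five inputs are joined with a NUL separator before hashing. The separator and the function name sketchCacheKey are assumptions; the package's actual input encoding is not documented here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sketchCacheKey is a hypothetical stand-in for CacheKey.
// Assumption: fields are joined with "\x00" so that adjacent values
// cannot run together and collide.
func sketchCacheKey(provider, model, scoreType, skillContext, filePath string) string {
	joined := strings.Join([]string{provider, model, scoreType, skillContext, filePath}, "\x00")
	sum := sha256.Sum256([]byte(joined))
	// Keep only the first 16 hex characters (8 bytes) of the digest.
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	a := sketchCacheKey("anthropic", "some-model", "skill", "my-skill", "SKILL.md")
	b := sketchCacheKey("anthropic", "some-model", "skill", "my-skill", "SKILL.md")
	c := sketchCacheKey("openai", "some-model", "skill", "my-skill", "SKILL.md")
	fmt.Println(len(a), a == b, a != c) // prints "16 true true"
}
```

Because file content is not part of the input, the key stays stable across edits to the file, which is what lets a re-run overwrite the existing entry.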
func ContentHash ¶
func ContentHash(content string) string
ContentHash returns a SHA-256 hash of the content for invalidation checks.
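An invalidation check built on a content hash might look like the sketch below. The names sketchContentHash and isStale are hypothetical, and hex encoding of the digest is an assumption; the actual output format of ContentHash is not stated here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sketchContentHash is a hypothetical stand-in for ContentHash.
// Assumption: the digest is returned as a lowercase hex string.
func sketchContentHash(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// isStale reports whether a stored hash no longer matches the current content.
func isStale(storedHash, currentContent string) bool {
	return storedHash != sketchContentHash(currentContent)
}

func main() {
	stored := sketchContentHash("# My Skill\n")
	fmt.Println(isStale(stored, "# My Skill\n"))          // prints "false"
	fmt.Println(isStale(stored, "# My Skill (edited)\n")) // prints "true"
}
```

A caller would compare the ContentHash field of a cached entry against the hash of the file as it exists now, and re-score only when the two differ.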
func DeserializeScored ¶
func DeserializeScored(r *CachedResult) (types.Scored, error)
DeserializeScored unmarshals a CachedResult's Scores into the appropriate concrete type and returns it as a Scored interface. It uses the Type field to determine whether the result is a skill or reference score, falling back to checking File == "SKILL.md" for compatibility with older cache entries.
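The dispatch with its fallback can be sketched as follows, using local stand-in types so the block compiles on its own. The Type values "skill" and "reference" are assumptions, as are the names cachedResult, skillScores, refScores, and decodeScores.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cachedResult is a trimmed stand-in for CachedResult.
type cachedResult struct {
	File   string
	Type   string
	Scores json.RawMessage
}

// skillScores and refScores are trimmed stand-ins for SkillScores and RefScores.
type skillScores struct {
	Actionability int `json:"actionability"`
}

type refScores struct {
	SkillRelevance int `json:"skill_relevance"`
}

// decodeScores picks the concrete type from Type, falling back to the file
// name when Type is empty (older entries). The literal "skill" is assumed.
func decodeScores(r cachedResult) (any, error) {
	isSkill := r.Type == "skill" || (r.Type == "" && r.File == "SKILL.md")
	if isSkill {
		s := &skillScores{}
		if err := json.Unmarshal(r.Scores, s); err != nil {
			return nil, err
		}
		return s, nil
	}
	s := &refScores{}
	if err := json.Unmarshal(r.Scores, s); err != nil {
		return nil, err
	}
	return s, nil
}

func main() {
	// An older entry with no Type set is still recognized by its file name.
	old := cachedResult{File: "SKILL.md", Scores: json.RawMessage(`{"actionability":4}`)}
	v, err := decodeScores(old)
	fmt.Printf("%T %v\n", v, err) // prints "*main.skillScores <nil>"
}
```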
func LatestByFile ¶
func LatestByFile(results []*CachedResult) map[string]*CachedResult
LatestByFile returns the most recent cached result for each unique file, across all models. The function takes no model argument; to restrict the selection to a single model, pass the results through FilterByModel first.
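Selecting the newest entry per file can be done in a single pass, as in this sketch. The result type and the latestByFile and at helpers are hypothetical stand-ins, and comparing on ScoredAt is an assumption about how "most recent" is decided.

```go
package main

import (
	"fmt"
	"time"
)

// result is a trimmed stand-in for CachedResult.
type result struct {
	File     string
	Model    string
	ScoredAt time.Time
}

// at builds a timestamp at the given hour for use in this sketch.
func at(hour int) time.Time {
	return time.Date(2025, time.January, 1, hour, 0, 0, 0, time.UTC)
}

// latestByFile keeps, for each file, the entry with the newest ScoredAt.
func latestByFile(results []result) map[string]result {
	latest := make(map[string]result)
	for _, r := range results {
		if cur, ok := latest[r.File]; !ok || r.ScoredAt.After(cur.ScoredAt) {
			latest[r.File] = r
		}
	}
	return latest
}

func main() {
	results := []result{
		{File: "SKILL.md", Model: "model-a", ScoredAt: at(1)},
		{File: "SKILL.md", Model: "model-b", ScoredAt: at(5)},
		{File: "ref.md", Model: "model-a", ScoredAt: at(2)},
	}
	latest := latestByFile(results)
	fmt.Println(len(latest), latest["SKILL.md"].Model) // prints "2 model-b"
}
```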
func RefDimensions ¶
func RefDimensions() []string
RefDimensions returns the dimension names for reference file scoring.
func SaveCache ¶
func SaveCache(cacheDir, key string, result *CachedResult) error
SaveCache writes a result to the cache directory.
func SkillDimensions ¶
func SkillDimensions() []string
SkillDimensions returns the dimension names for SKILL.md scoring.
Types ¶
type CachedResult ¶
type CachedResult struct {
	Provider    string          `json:"provider"`
	Model       string          `json:"model"`
	File        string          `json:"file"`
	Type        string          `json:"type"`
	ContentHash string          `json:"content_hash"`
	ScoredAt    time.Time       `json:"scored_at"`
	Scores      json.RawMessage `json:"scores"`
}
CachedResult holds a scoring result with metadata for cache storage.
func FilterByModel ¶
func FilterByModel(results []*CachedResult, model string) []*CachedResult
FilterByModel returns only results matching the given model name.
func GetCached ¶
func GetCached(cacheDir, key string) (*CachedResult, bool)
GetCached reads a cached result by key. Returns nil, false if not found.
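A write-then-read round trip over a key-addressed cache might look like the sketch below. The on-disk layout (one <key>.json file per entry) is an assumption, as are the names entry, save, get, and roundTrip; the package's real storage format is not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// entry is a trimmed stand-in for CachedResult.
type entry struct {
	File        string `json:"file"`
	ContentHash string `json:"content_hash"`
}

// save writes one entry as <cacheDir>/<key>.json (layout assumed).
func save(cacheDir, key string, e *entry) error {
	if err := os.MkdirAll(cacheDir, 0o755); err != nil {
		return err
	}
	data, err := json.MarshalIndent(e, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(cacheDir, key+".json"), data, 0o644)
}

// get reads an entry by key and returns nil, false on any failure.
func get(cacheDir, key string) (*entry, bool) {
	data, err := os.ReadFile(filepath.Join(cacheDir, key+".json"))
	if err != nil {
		return nil, false
	}
	var e entry
	if err := json.Unmarshal(data, &e); err != nil {
		return nil, false
	}
	return &e, true
}

// roundTrip saves one entry in a temp dir, then looks up a present and an absent key.
func roundTrip() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "judge-cache-sketch")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	if err := save(dir, "abc123", &entry{File: "SKILL.md", ContentHash: "deadbeef"}); err != nil {
		return false, false, err
	}
	_, hit := get(dir, "abc123")
	_, found := get(dir, "missing")
	return hit, !found, nil
}

func main() {
	hit, miss, err := roundTrip()
	fmt.Println(hit, miss, err) // prints "true true <nil>"
}
```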
func ListCached ¶
func ListCached(cacheDir string) ([]*CachedResult, error)
ListCached reads all cached results from the cache directory. Results are sorted by ScoredAt (most recent first).
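The ordering described here, most recent first, can be sketched with sort.Slice. The result type and the day and sortNewestFirst helpers are hypothetical stand-ins for illustration only.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// result is a trimmed stand-in for CachedResult.
type result struct {
	File     string
	ScoredAt time.Time
}

// day builds a timestamp on the given day of January 2025 for this sketch.
func day(d int) time.Time {
	return time.Date(2025, time.January, d, 0, 0, 0, 0, time.UTC)
}

// sortNewestFirst orders results in place by ScoredAt, most recent first.
func sortNewestFirst(results []result) {
	sort.Slice(results, func(i, j int) bool {
		return results[i].ScoredAt.After(results[j].ScoredAt)
	})
}

func main() {
	results := []result{
		{File: "a.md", ScoredAt: day(1)},
		{File: "b.md", ScoredAt: day(3)},
		{File: "c.md", ScoredAt: day(2)},
	}
	sortNewestFirst(results)
	for _, r := range results {
		fmt.Println(r.File) // prints b.md, c.md, a.md on separate lines
	}
}
```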
type ClientOptions ¶
type ClientOptions struct {
	Provider          string // "anthropic", "openai", or "claude-cli"
	APIKey            string // Required for anthropic and openai; unused for claude-cli
	BaseURL           string // Optional; defaults per provider
	Model             string // Optional; defaults per provider
	MaxTokensStyle    string // "auto", "max_tokens", or "max_completion_tokens"
	MaxResponseTokens int    // Maximum tokens in the LLM response; 0 defaults to 500
	OrgID             string // Optional OpenAI organization ID; sent as OpenAI-Organization header
	ProjectID         string // Optional OpenAI project ID; sent as OpenAI-Project header
}
ClientOptions holds configuration for creating an LLM client.
type LLMClient ¶
type LLMClient interface {
	// Complete sends a system prompt and user content to the LLM and returns the text response.
	Complete(ctx context.Context, systemPrompt, userContent string) (string, error)
	// Provider returns the provider name (e.g. "anthropic", "openai").
	Provider() string
	// ModelName returns the model identifier.
	ModelName() string
}
LLMClient is the interface for making LLM API calls.
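Because LLMClient is an interface, callers can substitute a test double that never touches the network. The sketch below redeclares the interface locally so it compiles on its own; fakeClient and run are hypothetical names, not part of this package.

```go
package main

import (
	"context"
	"fmt"
)

// LLMClient mirrors the documented interface so this sketch is self-contained.
type LLMClient interface {
	Complete(ctx context.Context, systemPrompt, userContent string) (string, error)
	Provider() string
	ModelName() string
}

// fakeClient is a hypothetical test double that returns a canned response.
type fakeClient struct {
	response string
}

// Complete honors context cancellation, then returns the canned response.
func (f fakeClient) Complete(ctx context.Context, systemPrompt, userContent string) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	return f.response, nil
}

func (f fakeClient) Provider() string  { return "fake" }
func (f fakeClient) ModelName() string { return "fake-model" }

// Compile-time check that fakeClient satisfies the interface.
var _ LLMClient = fakeClient{}

// run sends a fixed prompt through any client with a background context.
func run(c LLMClient) (string, error) {
	return c.Complete(context.Background(), "system prompt", "user content")
}

func main() {
	out, err := run(fakeClient{response: `{"clarity": 4}`})
	fmt.Println(out, err)
}
```

A double like this could be passed wherever an LLMClient is accepted, which keeps tests of scoring logic independent of API keys.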
func NewClient ¶
func NewClient(opts ClientOptions) (LLMClient, error)
NewClient creates an LLMClient for the given options. If Model is empty, a default is chosen per provider. For the openai provider, BaseURL defaults to "https://api.openai.com/v1" if empty. The claude-cli provider shells out to the "claude" CLI and does not require an API key.
Example ¶
package main

import (
	"fmt"

	"github.com/agent-ecosystem/skill-validator/judge"
)

func main() {
	client, err := judge.NewClient(judge.ClientOptions{
		Provider: "anthropic",
		APIKey:   "your-api-key",
		// Model defaults to claude-sonnet-4-5-20250929
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("Provider: %s, Model: %s\n", client.Provider(), client.ModelName())
}
Output: Provider: anthropic, Model: claude-sonnet-4-5-20250929
Example (ClaudeCLI) ¶
ExampleNewClient_claudeCLI demonstrates creating a claude-cli client. This example is not executed as a test because it requires the claude binary.
package main

import (
	"fmt"

	"github.com/agent-ecosystem/skill-validator/judge"
)

func main() {
	client, err := judge.NewClient(judge.ClientOptions{
		Provider: "claude-cli",
		// Model defaults to "sonnet"; no API key needed
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("Provider: %s, Model: %s\n", client.Provider(), client.ModelName())
}
Example (Openai) ¶
package main

import (
	"fmt"

	"github.com/agent-ecosystem/skill-validator/judge"
)

func main() {
	client, err := judge.NewClient(judge.ClientOptions{
		Provider: "openai",
		APIKey:   "your-api-key",
		Model:    "gpt-5.2",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("Provider: %s, Model: %s\n", client.Provider(), client.ModelName())
}
Output: Provider: openai, Model: gpt-5.2
type RefScores ¶
type RefScores struct {
	Clarity            int     `json:"clarity"`
	InstructionalValue int     `json:"instructional_value"`
	TokenEfficiency    int     `json:"token_efficiency"`
	Novelty            int     `json:"novelty"`
	SkillRelevance     int     `json:"skill_relevance"`
	Overall            float64 `json:"overall"`
	BriefAssessment    string  `json:"brief_assessment"`
	NovelInfo          string  `json:"novel_info,omitempty"`
}
RefScores holds the LLM judge scores for a reference file.
func AggregateRefScores ¶
AggregateRefScores computes mean scores across multiple reference file results.
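Averaging across reference results could look like the sketch below. The signature of AggregateRefScores is not shown on this page, so the types and the function name meanScores are hypothetical, and returning float means (rather than rounded integers) is an assumption.

```go
package main

import "fmt"

// refScores is a trimmed stand-in for RefScores with two dimensions.
type refScores struct {
	Clarity int
	Novelty int
	Overall float64
}

// means holds per-dimension averages. Using floats here is an assumption.
type means struct {
	Clarity float64
	Novelty float64
	Overall float64
}

// meanScores averages each field across results and returns zeros for empty input.
func meanScores(results []refScores) means {
	if len(results) == 0 {
		return means{}
	}
	var m means
	for _, r := range results {
		m.Clarity += float64(r.Clarity)
		m.Novelty += float64(r.Novelty)
		m.Overall += r.Overall
	}
	n := float64(len(results))
	m.Clarity /= n
	m.Novelty /= n
	m.Overall /= n
	return m
}

func main() {
	m := meanScores([]refScores{
		{Clarity: 4, Novelty: 2, Overall: 3.0},
		{Clarity: 5, Novelty: 3, Overall: 4.0},
	})
	fmt.Println(m.Clarity, m.Novelty, m.Overall) // prints "4.5 2.5 3.5"
}
```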
func ScoreReference ¶
func ScoreReference(ctx context.Context, content, skillName, skillDesc string, client LLMClient, maxLen int) (*RefScores, error)
ScoreReference sends a reference file's content to the LLM judge and returns parsed scores. maxLen controls content truncation (0 = no truncation).
Example ¶
This example shows how to score a reference file against its parent skill.
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/agent-ecosystem/skill-validator/judge"
)

func main() {
	client, err := judge.NewClient(judge.ClientOptions{
		Provider: "anthropic",
		APIKey:   os.Getenv("ANTHROPIC_API_KEY"),
	})
	if err != nil {
		panic(err)
	}
	refContent := "# API Reference\n\nDetailed API documentation..."
	scores, err := judge.ScoreReference(
		context.Background(),
		refContent,
		"my-skill",                 // parent skill name
		"A skill for doing things", // parent skill description
		client,
		judge.DefaultMaxContentLen,
	)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Overall: %.2f/5\n", scores.Overall)
	for _, d := range scores.DimensionScores() {
		fmt.Printf(" %s: %d/5\n", d.Label, d.Value)
	}
}
func (*RefScores) Assessment ¶
Assessment returns the brief assessment text.
func (*RefScores) DimensionScores ¶
func (s *RefScores) DimensionScores() []types.DimensionScore
DimensionScores returns the ordered dimension scores for reference file scoring.
func (*RefScores) NovelDetails ¶
NovelDetails returns novel information details, if any.
func (*RefScores) OverallScore ¶
OverallScore returns the computed overall score.
type SkillScores ¶
type SkillScores struct {
	Clarity            int     `json:"clarity"`
	Actionability      int     `json:"actionability"`
	TokenEfficiency    int     `json:"token_efficiency"`
	ScopeDiscipline    int     `json:"scope_discipline"`
	DirectivePrecision int     `json:"directive_precision"`
	Novelty            int     `json:"novelty"`
	Overall            float64 `json:"overall"`
	BriefAssessment    string  `json:"brief_assessment"`
	NovelInfo          string  `json:"novel_info,omitempty"`
}
SkillScores holds the LLM judge scores for a SKILL.md file.
func ScoreSkill ¶
func ScoreSkill(ctx context.Context, content string, client LLMClient, maxLen int) (*SkillScores, error)
ScoreSkill sends a SKILL.md's content to the LLM judge and returns parsed scores. maxLen controls content truncation (0 = no truncation).
Example ¶
This example shows how to score a SKILL.md file. It requires a valid API key, so it is not executed as a test.
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/agent-ecosystem/skill-validator/judge"
)

func main() {
	client, err := judge.NewClient(judge.ClientOptions{
		Provider: "anthropic",
		APIKey:   os.Getenv("ANTHROPIC_API_KEY"),
	})
	if err != nil {
		panic(err)
	}
	skillContent := "# My Skill\n\nInstructions for the agent..."
	scores, err := judge.ScoreSkill(context.Background(), skillContent, client, judge.DefaultMaxContentLen)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Overall: %.2f/5\n", scores.Overall)
	fmt.Printf("Assessment: %s\n", scores.BriefAssessment)
	for _, d := range scores.DimensionScores() {
		fmt.Printf(" %s: %d/5\n", d.Label, d.Value)
	}
}
func (*SkillScores) Assessment ¶
func (s *SkillScores) Assessment() string
Assessment returns the brief assessment text.
func (*SkillScores) DimensionScores ¶
func (s *SkillScores) DimensionScores() []types.DimensionScore
DimensionScores returns the ordered dimension scores for SKILL.md scoring.
func (*SkillScores) NovelDetails ¶
func (s *SkillScores) NovelDetails() string
NovelDetails returns novel information details, if any.
func (*SkillScores) OverallScore ¶
func (s *SkillScores) OverallScore() float64
OverallScore returns the computed overall score.