metric

package
v1.1.1 Latest
Published: May 16, 2026 License: MIT Imports: 33 Imported by: 0

Documentation

Overview

Package metric defines the Metric interface used by gaugo to evaluate one case at a time, plus the built-in deterministic and LLM-judge metrics.

Users may implement Metric directly to plug custom evaluation logic into gaugo.Suite.Assert / gaugo.Runner.Run. The root gaugo package re-exports every public identifier in this package as a type alias or one-line wrapper, so callers can use either gaugo.Faithfulness or metric.Faithfulness interchangeably.

Constants

const (
	NameJSONValidity     = "JSONValidity"
	NameSchemaCompliance = "SchemaCompliance"
	NameExpectedJSON     = "ExpectedJSON"
	NameAnswerSimilarity = "AnswerSimilarity"
	NameLatency          = "Latency"
	NameAnswerLength     = "AnswerLength"
	NameExpectedRegex    = "ExpectedRegex"
)

Canonical names for deterministic (non-judge) metrics.

const (
	DetailKeyValid              = "valid"
	DetailKeyError              = "error"
	DetailKeyFields             = "fields"
	DetailKeyElapsedMS          = "elapsed_ms"
	DetailKeyMaxMS              = "max_ms"
	DetailKeyLength             = "length"
	DetailKeyPattern            = "pattern"
	DetailKeyMatch              = "match"
	DetailKeyIntersectionTokens = "intersection_tokens"
	DetailKeyUnionTokens        = "union_tokens"
)

Standard detail keys exposed on Result.Details for built-in metrics. They are part of the observable contract and remain stable.
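
As a rough illustration of how a caller might read these keys, the sketch below decodes a Details payload and looks up one integer value. It assumes Result.Details holds a JSON object keyed by the constants above, which this page does not state explicitly. The readInt helper is hypothetical and not part of the package, and the two constants are copied locally so the example compiles on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of two detail keys documented above.
const (
	DetailKeyElapsedMS = "elapsed_ms"
	DetailKeyMaxMS     = "max_ms"
)

// readInt is a hypothetical helper. It assumes details is a JSON object
// and returns the integer stored under key, if present and well-formed.
func readInt(details []byte, key string) (int64, bool) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(details, &obj); err != nil {
		return 0, false
	}
	raw, ok := obj[key]
	if !ok {
		return 0, false
	}
	var n int64
	if err := json.Unmarshal(raw, &n); err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	// A made-up payload in the assumed shape.
	details := []byte(`{"elapsed_ms":120,"max_ms":200}`)
	elapsed, _ := readInt(details, DetailKeyElapsedMS)
	limit, _ := readInt(details, DetailKeyMaxMS)
	fmt.Println(elapsed, limit)
}
```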

const (
	NameFaithfulness         = "Faithfulness"
	NameAnswerRelevancy      = "AnswerRelevancy"
	NameContextRelevancy     = "ContextRelevancy"
	NameContextPrecision     = "ContextPrecision"
	NameContextRecall        = "ContextRecall"
	NameAnswerCorrectness    = "AnswerCorrectness"
	NameHallucination        = "Hallucination"
	NameToxicity             = "Toxicity"
	NameBias                 = "Bias"
	NameCoherence            = "Coherence"
	NameConciseness          = "Conciseness"
	NameCompleteness         = "Completeness"
	NameInstructionAdherence = "InstructionAdherence"
	NameGEval                = "GEval"
	NameCitationAccuracy     = "CitationAccuracy"
	NameSummarizationQuality = "SummarizationQuality"
)

Canonical metric names (the strings returned by Metric.Name and stored in Result.Name). Each metric's human-readable label is derived from its name via labelOf at runtime.

Variables

This section is empty.

Functions

This section is empty.

Types

type Document

type Document struct {
	ID   string
	Text string
}

Document is one retrieved context record for a case.

func Doc

func Doc(id, text string) Document

Doc is a convenience constructor for Document.

type EvalInput

type EvalInput struct {
	CaseName string
	Input    Input
	Output   Output
	Expected Expected
	Elapsed  time.Duration
}

EvalInput is provided to a Metric after a case run completes.

type Expected

type Expected struct {
	Contains     []string
	Answer       string
	Instructions string
}

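Expected holds the reference data for a case. Answer is required by AnswerCorrectness and ContextRecall and is the comparison target for AnswerSimilarity; Instructions is required by InstructionAdherence.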

type Input

type Input struct {
	Question string
	Context  []Document
}

Input is the test input passed to a target system under evaluation.

type Judge

type Judge interface {
	EvaluateJSON(ctx context.Context, req JudgeRequest) (JudgeResponse, error)
}

Judge evaluates metric prompts and must return strictly-structured JSON.
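
For tests that should not call a real model, a canned Judge can be useful. The sketch below is one possible stub, not something the package provides: the request and response types are local copies of the declarations on this page, staticJudge and ask are hypothetical names, and the JSON payload shape is invented for illustration because the schema a real judge must return is not documented here.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// Local stand-ins mirroring the types documented on this page.
type Document struct{ ID, Text string }

type JudgeRequest struct {
	Metric               string
	Question             string
	Answer               string
	ExpectedAnswer       string
	ExpectedInstructions string
	ContextDocs          []Document
	Instructions         string
	Schema               json.RawMessage
}

type JudgeResponse struct {
	RawJSON   []byte
	Provider  string
	Model     string
	RequestID string
	Latency   time.Duration
}

type Judge interface {
	EvaluateJSON(ctx context.Context, req JudgeRequest) (JudgeResponse, error)
}

// staticJudge is a hypothetical stub that always returns the same payload.
type staticJudge struct{ payload string }

func (j staticJudge) EvaluateJSON(ctx context.Context, req JudgeRequest) (JudgeResponse, error) {
	if err := ctx.Err(); err != nil {
		return JudgeResponse{}, err // honour cancellation like a real client would
	}
	if !json.Valid([]byte(j.payload)) {
		return JudgeResponse{}, errors.New("staticJudge: payload is not valid JSON")
	}
	return JudgeResponse{RawJSON: []byte(j.payload), Provider: "static", Model: "none"}, nil
}

// ask sends a minimal request for the named metric and returns the raw JSON.
func ask(j Judge, metric string) (string, error) {
	resp, err := j.EvaluateJSON(context.Background(), JudgeRequest{Metric: metric})
	return string(resp.RawJSON), err
}

func main() {
	// The payload shape is illustrative only.
	out, err := ask(staticJudge{payload: `{"score":0.9,"reason":"stub"}`}, "Faithfulness")
	fmt.Println(out, err)
}
```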

type JudgeRequest

type JudgeRequest struct {
	Metric               string
	Question             string
	Answer               string
	ExpectedAnswer       string
	ExpectedInstructions string
	ContextDocs          []Document
	Instructions         string
	Schema               json.RawMessage
}

JudgeRequest describes a metric evaluation request sent to a Judge.

type JudgeResponse

type JudgeResponse struct {
	RawJSON   []byte
	Provider  string
	Model     string
	RequestID string
	Latency   time.Duration
}

JudgeResponse contains the raw structured output from a Judge.

type Metric

type Metric interface {
	Name() string
	Evaluate(ctx context.Context, in EvalInput, j Judge) (Result, error)
}

Metric evaluates a completed case and returns a Result. Built-in metrics implement this interface; users may also implement it for custom logic.
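
A custom metric only needs the two methods above. The following sketch shows one way such a metric could look: a deterministic check that scores the fraction of Expected.Contains substrings found in the answer. The types are trimmed local copies so the example compiles without the real package, Judge is reduced to a placeholder, and the containsAll metric and run helper are hypothetical names that do not exist in gaugo.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
	"time"
)

// Trimmed local stand-ins for the types documented on this page.
type Output struct{ Answer string }

type Expected struct {
	Contains     []string
	Answer       string
	Instructions string
}

type EvalInput struct {
	CaseName string
	Output   Output
	Expected Expected
	Elapsed  time.Duration
}

type Result struct {
	Name   string
	Score  float64
	Pass   bool
	Reason string
}

// Judge is a placeholder here; the documented interface has EvaluateJSON.
type Judge interface{}

type Metric interface {
	Name() string
	Evaluate(ctx context.Context, in EvalInput, j Judge) (Result, error)
}

// containsAll is a hypothetical custom metric that ignores the judge.
type containsAll struct{}

var _ Metric = containsAll{} // compile-time interface check

func (containsAll) Name() string { return "ContainsAll" }

func (m containsAll) Evaluate(ctx context.Context, in EvalInput, _ Judge) (Result, error) {
	want := in.Expected.Contains
	if len(want) == 0 {
		return Result{}, errors.New("ContainsAll: Expected.Contains is empty")
	}
	hits := 0
	for _, s := range want {
		if strings.Contains(in.Output.Answer, s) {
			hits++
		}
	}
	return Result{
		Name:   m.Name(),
		Score:  float64(hits) / float64(len(want)),
		Pass:   hits == len(want),
		Reason: fmt.Sprintf("%d of %d substrings found", hits, len(want)),
	}, nil
}

// run evaluates one answer against the wanted substrings.
func run(answer string, want []string) (Result, error) {
	in := EvalInput{Output: Output{Answer: answer}, Expected: Expected{Contains: want}}
	return containsAll{}.Evaluate(context.Background(), in, nil)
}

func main() {
	res, err := run("Paris is the capital of France", []string{"Paris", "Berlin"})
	fmt.Println(res.Score, res.Pass, res.Reason, err)
}
```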

func AnswerCorrectness

func AnswerCorrectness(opts ...Option) Metric

AnswerCorrectness scores how well the answer matches Expected.Answer. Requires Expected.Answer.

func AnswerLength

func AnswerLength(opts ...Option) Metric

AnswerLength scores whether the answer length, measured in runes, lies within [min, max]. Requires at least one of WithMinLength or WithMaxLength.
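
Because the bounds are expressed in runes rather than bytes, multi-byte characters count once. A minimal sketch of such a range check follows; inRange is a hypothetical helper, and treating a zero maximum as "no upper bound" is an assumption made for the example, not documented behavior.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// inRange reports the rune count of answer and whether it lies within
// [minRunes, maxRunes]. A maxRunes of 0 means "no upper bound" in this
// sketch; that convention is an assumption.
func inRange(answer string, minRunes, maxRunes int) (length int, ok bool) {
	length = utf8.RuneCountInString(answer)
	if length < minRunes {
		return length, false
	}
	if maxRunes > 0 && length > maxRunes {
		return length, false
	}
	return length, true
}

func main() {
	s := "h\u00e9llo" // "héllo": 6 bytes in UTF-8, 5 runes
	n, ok := inRange(s, 1, 5)
	fmt.Println(len(s), n, ok)
}
```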

func AnswerRelevancy

func AnswerRelevancy(opts ...Option) Metric

AnswerRelevancy scores how well the answer addresses the question.

func AnswerSimilarity

func AnswerSimilarity(opts ...Option) Metric

AnswerSimilarity scores Jaccard token overlap against Expected.Answer.
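
Jaccard overlap is the size of the intersection of two token sets divided by the size of their union. The sketch below computes it under a simple tokenization (lowercase, split on whitespace), which is an assumption; the package's actual tokenizer is not described on this page. The jaccard and tokenSet helpers are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenSet lowercases s and splits it on whitespace.
// This tokenization is an assumption made for the sketch.
func tokenSet(s string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, tok := range strings.Fields(strings.ToLower(s)) {
		set[tok] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| together with both counts.
// Two empty inputs score 0 here, which is a choice made for this sketch.
func jaccard(a, b string) (score float64, inter, union int) {
	sa, sb := tokenSet(a), tokenSet(b)
	for tok := range sa {
		if _, ok := sb[tok]; ok {
			inter++
		}
	}
	union = len(sa) + len(sb) - inter
	if union == 0 {
		return 0, 0, 0
	}
	return float64(inter) / float64(union), inter, union
}

func main() {
	score, inter, union := jaccard("the cat sat", "The cat slept")
	// prints: score=0.50 intersection_tokens=2 union_tokens=4
	fmt.Printf("score=%.2f intersection_tokens=%d union_tokens=%d\n", score, inter, union)
}
```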

func Bias

func Bias(opts ...Option) Metric

Bias scores how unbiased the answer is (higher is fairer).

func CitationAccuracy

func CitationAccuracy(opts ...Option) Metric

CitationAccuracy scores how accurate inline citations are against context.

func Coherence

func Coherence(opts ...Option) Metric

Coherence scores how internally consistent the answer is.

func Completeness

func Completeness(opts ...Option) Metric

Completeness scores how completely the answer addresses the question.

func Conciseness

func Conciseness(opts ...Option) Metric

Conciseness scores how succinct the answer is without losing meaning.

func ContextPrecision

func ContextPrecision(opts ...Option) Metric

ContextPrecision scores the fraction of context documents that are useful.

func ContextRecall

func ContextRecall(opts ...Option) Metric

ContextRecall scores how much of the expected answer is supported by the context. Requires Expected.Answer.

func ContextRelevancy

func ContextRelevancy(opts ...Option) Metric

ContextRelevancy scores how relevant each context document is to the question.

func ExpectedJSON

func ExpectedJSON(opts ...Option) Metric

ExpectedJSON scores how many expected JSON fields match the answer at the configured dotted paths. Requires WithExpectedFields.
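
To make the dotted-path idea concrete, the sketch below walks paths such as "user.name" through a decoded JSON object and returns the fraction of expected fields that match. It is an illustration only: lookup and matchFraction are hypothetical helpers, array indexing is not handled, and equality uses reflect.DeepEqual on values decoded by encoding/json, so expected numbers must be float64.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"reflect"
	"strings"
)

// lookup walks a dotted path through nested JSON objects.
func lookup(doc any, path string) (any, bool) {
	cur := doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return nil, false // path descends into a non-object
		}
		cur, ok = obj[key]
		if !ok {
			return nil, false // key is missing
		}
	}
	return cur, true
}

// matchFraction returns the share of expected fields whose value equals
// the value found at the same path in the answer.
func matchFraction(answer string, expected map[string]any) (float64, error) {
	if len(expected) == 0 {
		return 0, errors.New("no expected fields configured")
	}
	var doc any
	if err := json.Unmarshal([]byte(answer), &doc); err != nil {
		return 0, err
	}
	matched := 0
	for path, want := range expected {
		if got, ok := lookup(doc, path); ok && reflect.DeepEqual(got, want) {
			matched++
		}
	}
	return float64(matched) / float64(len(expected)), nil
}

func main() {
	answer := `{"user":{"name":"Ada","age":36},"ok":true}`
	score, err := matchFraction(answer, map[string]any{
		"user.name": "Ada",
		"user.age":  float64(36), // JSON numbers decode as float64
		"ok":        false,       // deliberately wrong
	})
	fmt.Printf("%.2f %v\n", score, err)
}
```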

func ExpectedRegex

func ExpectedRegex(pattern string, opts ...Option) Metric

ExpectedRegex scores whether the answer matches the given Go regular expression.
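
One detail worth keeping in mind with Go regular expressions is that a match may occur anywhere in the input unless the pattern is anchored. The sketch below shows this with the standard regexp package; the matches helper is a hypothetical name, and whether ExpectedRegex itself applies any anchoring is not stated on this page.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches compiles pattern and reports whether it matches anywhere in answer.
// A malformed pattern is returned as an error instead of panicking.
func matches(pattern, answer string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(answer), nil
}

func main() {
	answer := "order 12345"
	unanchored, _ := matches(`\d{4}`, answer) // finds "1234" inside the answer
	anchored, _ := matches(`^\d{4}$`, answer) // requires the whole answer to be 4 digits
	_, err := matches(`(`, answer)            // unbalanced parenthesis
	fmt.Println(unanchored, anchored, err != nil)
}
```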

func Faithfulness

func Faithfulness(opts ...Option) Metric

Faithfulness scores whether the answer is supported by the provided context.

func GEval

func GEval(criteria string, opts ...Option) Metric

GEval scores the answer against a free-form criteria string, using the answer-relevancy schema. The criteria string must be non-empty.

func Hallucination

func Hallucination(opts ...Option) Metric

Hallucination scores the fraction of claims that are not hallucinated.

func InstructionAdherence

func InstructionAdherence(opts ...Option) Metric

InstructionAdherence scores how closely the answer followed Expected.Instructions. Requires Expected.Instructions.

func JSONValidity

func JSONValidity(opts ...Option) Metric

JSONValidity scores whether the answer parses as valid JSON.
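
A validity check of this kind can be approximated with json.Valid from the standard library, as in the sketch below. The validityScore helper and the binary 1/0 scoring are assumptions for illustration. Note that json.Valid accepts top-level scalars such as 42; whether JSONValidity does the same is not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validityScore returns 1 when answer is syntactically valid JSON and 0
// otherwise. The binary scoring is an assumption made for this sketch.
func validityScore(answer string) float64 {
	if json.Valid([]byte(answer)) {
		return 1
	}
	return 0
}

func main() {
	for _, answer := range []string{`{"a":1}`, `{"a":1,}`, `42`, ``} {
		fmt.Printf("%q -> %v\n", answer, validityScore(answer))
	}
}
```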

func Latency

func Latency(opts ...Option) Metric

Latency scores whether the run elapsed within the configured maximum. Requires WithMaxLatency.
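
The check itself amounts to comparing EvalInput.Elapsed with the configured maximum. The sketch below shows one plausible shape, reporting both durations in milliseconds under the elapsed_ms and max_ms detail keys listed earlier. The latencyCheck helper, the binary scoring, and the inclusive comparison are all assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// latencyCheck returns 1 when elapsed is within limit (inclusive) and 0
// otherwise, plus both durations in whole milliseconds.
func latencyCheck(elapsed, limit time.Duration) (float64, map[string]int64) {
	score := 0.0
	if elapsed <= limit {
		score = 1
	}
	details := map[string]int64{
		"elapsed_ms": elapsed.Milliseconds(),
		"max_ms":     limit.Milliseconds(),
	}
	return score, details
}

func main() {
	score, details := latencyCheck(150*time.Millisecond, 200*time.Millisecond)
	fmt.Println(score, details["elapsed_ms"], details["max_ms"])
}
```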

func SchemaCompliance

func SchemaCompliance(opts ...Option) Metric

SchemaCompliance scores whether the answer matches the configured JSON schema. Requires WithSchema.

func SummarizationQuality

func SummarizationQuality(opts ...Option) Metric

SummarizationQuality scores summary coverage, fidelity and conciseness as one score.

func Toxicity

func Toxicity(opts ...Option) Metric

Toxicity scores how non-toxic the answer is (higher is safer).

type Option

type Option func(*config)

Option configures a built-in metric. Options are validated eagerly, when they are applied; an invalid value or combination is reported as a metric-level error when Evaluate is called.
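
The Option type follows Go's functional-options pattern. The sketch below illustrates how eager validation with a deferred error could work: each option checks its argument when applied and records the first failure, which a metric would then return from Evaluate. The config fields, the fail method, and newConfig are hypothetical, since the real config type is unexported and its layout is not shown on this page; the 0.7 default threshold comes from the WithThreshold documentation.

```go
package main

import (
	"fmt"
	"time"
)

// config is a hypothetical stand-in for the package's unexported config.
type config struct {
	threshold  float64
	maxLatency time.Duration
	err        error // first validation failure, reported later
}

type Option func(*config)

// fail records only the first error so later options cannot mask it.
func (c *config) fail(err error) {
	if c.err == nil {
		c.err = err
	}
}

func WithThreshold(v float64) Option {
	return func(c *config) {
		if v < 0 || v > 1 {
			c.fail(fmt.Errorf("threshold %v is outside [0,1]", v))
			return
		}
		c.threshold = v
	}
}

func WithMaxLatency(d time.Duration) Option {
	return func(c *config) {
		if d <= 0 {
			c.fail(fmt.Errorf("max latency %v must be positive", d))
			return
		}
		c.maxLatency = d
	}
}

// newConfig applies opts on top of the documented default threshold of 0.7.
func newConfig(opts ...Option) config {
	c := config{threshold: 0.7}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	ok := newConfig(WithThreshold(0.9), WithMaxLatency(2*time.Second))
	bad := newConfig(WithThreshold(1.5))
	fmt.Println(ok.threshold, ok.maxLatency, ok.err)
	fmt.Println(bad.threshold, bad.err)
}
```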

func WithExpectedFields

func WithExpectedFields(fields map[string]any) Option

WithExpectedFields sets the dotted JSON paths and expected values used by ExpectedJSON. Keys are trimmed; an empty key is rejected.

func WithMaxLatency

func WithMaxLatency(d time.Duration) Option

WithMaxLatency sets the maximum allowed run latency for Latency.

func WithMaxLength

func WithMaxLength(n int) Option

WithMaxLength sets the maximum allowed answer length in runes.

func WithMinLength

func WithMinLength(n int) Option

WithMinLength sets the minimum allowed answer length in runes.

func WithSchema

func WithSchema(schema json.RawMessage) Option

WithSchema sets the JSON Schema used by SchemaCompliance.

func WithThreshold

func WithThreshold(v float64) Option

WithThreshold sets the pass/fail threshold in [0,1]. Default is 0.7.

type Output

type Output struct {
	Answer string
}

Output is the target system's answer under evaluation.

type Result

type Result struct {
	Name         string
	Score        float64
	Pass         bool
	Reason       string
	Details      []byte
	Provider     string
	Model        string
	RequestID    string
	JudgeLatency time.Duration
}

Result represents a score and pass/fail outcome for one metric. Re-exported as gaugo.MetricResult.
