Documentation ¶
Overview ¶
Package metric defines the Metric interface used by gaugo to evaluate one case at a time, plus the built-in deterministic and LLM-judge metrics.
Users may implement Metric directly to plug custom evaluation logic into gaugo.Suite.Assert / gaugo.Runner.Run. The root gaugo package re-exports every public identifier in this package as a type alias or one-line wrapper, so gaugo.Faithfulness and metric.Faithfulness are interchangeable.
Index ¶
- Constants
- type Document
- type EvalInput
- type Expected
- type Input
- type Judge
- type JudgeRequest
- type JudgeResponse
- type Metric
- func AnswerCorrectness(opts ...Option) Metric
- func AnswerLength(opts ...Option) Metric
- func AnswerRelevancy(opts ...Option) Metric
- func AnswerSimilarity(opts ...Option) Metric
- func Bias(opts ...Option) Metric
- func CitationAccuracy(opts ...Option) Metric
- func Coherence(opts ...Option) Metric
- func Completeness(opts ...Option) Metric
- func Conciseness(opts ...Option) Metric
- func ContextPrecision(opts ...Option) Metric
- func ContextRecall(opts ...Option) Metric
- func ContextRelevancy(opts ...Option) Metric
- func ExpectedJSON(opts ...Option) Metric
- func ExpectedRegex(pattern string, opts ...Option) Metric
- func Faithfulness(opts ...Option) Metric
- func GEval(criteria string, opts ...Option) Metric
- func Hallucination(opts ...Option) Metric
- func InstructionAdherence(opts ...Option) Metric
- func JSONValidity(opts ...Option) Metric
- func Latency(opts ...Option) Metric
- func SchemaCompliance(opts ...Option) Metric
- func SummarizationQuality(opts ...Option) Metric
- func Toxicity(opts ...Option) Metric
- type Option
- type Output
- type Result
Constants ¶
const (
	NameJSONValidity     = "JSONValidity"
	NameSchemaCompliance = "SchemaCompliance"
	NameExpectedJSON     = "ExpectedJSON"
	NameAnswerSimilarity = "AnswerSimilarity"
	NameLatency          = "Latency"
	NameAnswerLength     = "AnswerLength"
	NameExpectedRegex    = "ExpectedRegex"
)
Canonical names for deterministic (non-judge) metrics.
const (
	DetailKeyValid              = "valid"
	DetailKeyError              = "error"
	DetailKeyFields             = "fields"
	DetailKeyElapsedMS          = "elapsed_ms"
	DetailKeyMaxMS              = "max_ms"
	DetailKeyLength             = "length"
	DetailKeyPattern            = "pattern"
	DetailKeyMatch              = "match"
	DetailKeyIntersectionTokens = "intersection_tokens"
	DetailKeyUnionTokens        = "union_tokens"
)
Standard detail keys exposed on Result.Details for built-in metrics. They are part of the observable contract and remain stable.
const (
	NameFaithfulness         = "Faithfulness"
	NameAnswerRelevancy      = "AnswerRelevancy"
	NameContextRelevancy     = "ContextRelevancy"
	NameContextPrecision     = "ContextPrecision"
	NameContextRecall        = "ContextRecall"
	NameAnswerCorrectness    = "AnswerCorrectness"
	NameHallucination        = "Hallucination"
	NameToxicity             = "Toxicity"
	NameBias                 = "Bias"
	NameCoherence            = "Coherence"
	NameConciseness          = "Conciseness"
	NameCompleteness         = "Completeness"
	NameInstructionAdherence = "InstructionAdherence"
	NameGEval                = "GEval"
	NameCitationAccuracy     = "CitationAccuracy"
	NameSummarizationQuality = "SummarizationQuality"
)
Canonical metric names (the strings returned by Metric.Name and stored in Result.Name). Each metric's human-readable label is derived from its name via labelOf at runtime.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type EvalInput ¶
type EvalInput struct {
	CaseName string
	Input    Input
	Output   Output
	Expected Expected
	Elapsed  time.Duration
}
EvalInput is provided to a Metric after a case run completes.
type Judge ¶
type Judge interface {
	EvaluateJSON(ctx context.Context, req JudgeRequest) (JudgeResponse, error)
}
Judge evaluates metric prompts and must return strictly-structured JSON.
type JudgeRequest ¶
type JudgeRequest struct {
	Metric               string
	Question             string
	Answer               string
	ExpectedAnswer       string
	ExpectedInstructions string
	ContextDocs          []Document
	Instructions         string
	Schema               json.RawMessage
}
JudgeRequest describes a metric evaluation request sent to a Judge.
type JudgeResponse ¶
type JudgeResponse struct {
	RawJSON   []byte
	Provider  string
	Model     string
	RequestID string
	Latency   time.Duration
}
JudgeResponse contains the raw structured output from a Judge.
type Metric ¶
type Metric interface {
	Name() string
	Evaluate(ctx context.Context, in EvalInput, j Judge) (Result, error)
}
Metric evaluates a completed case and returns a Result. Built-in metrics implement this interface; users may also implement it for custom logic.
func AnswerCorrectness ¶
AnswerCorrectness scores how well the answer matches Expected.Answer. Requires Expected.Answer.
func AnswerLength ¶
AnswerLength scores whether the answer length lies in [min,max] runes. Requires WithMinLength or WithMaxLength.
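Because the bounds are counted in runes rather than bytes, multi-byte characters count once each. A minimal sketch of such a check follows; lengthInRange is a hypothetical helper, and treating a non-positive maximum as "no upper bound" is an assumption, not documented behavior.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lengthInRange (hypothetical) reports the rune count of answer and whether it
// lies in [minRunes, maxRunes]. Assumption: maxRunes <= 0 means no upper bound.
func lengthInRange(answer string, minRunes, maxRunes int) (int, bool) {
	n := utf8.RuneCountInString(answer)
	if n < minRunes {
		return n, false
	}
	if maxRunes > 0 && n > maxRunes {
		return n, false
	}
	return n, true
}

func main() {
	n, ok := lengthInRange("héllo", 1, 5)
	// "héllo" is 5 runes but 6 bytes, so a byte-based check would reject it.
	fmt.Println(n, ok, len("héllo")) // prints "5 true 6"
}
```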
func AnswerRelevancy ¶
AnswerRelevancy scores how well the answer addresses the question.
func AnswerSimilarity ¶
AnswerSimilarity scores Jaccard token overlap against Expected.Answer.
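Jaccard overlap is the size of the intersection of two token sets divided by the size of their union. The sketch below illustrates the arithmetic only; the tokenizer (lowercasing, then splitting on whitespace) and the score of 0 for two empty inputs are assumptions, since the package's actual tokenization is not described here.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenSet lowercases s and splits it on whitespace (assumed tokenizer).
func tokenSet(s string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, tok := range strings.Fields(strings.ToLower(s)) {
		set[tok] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| together with both counts.
func jaccard(a, b string) (score float64, intersection, union int) {
	sa, sb := tokenSet(a), tokenSet(b)
	for tok := range sa {
		if _, ok := sb[tok]; ok {
			intersection++
		}
	}
	union = len(sa) + len(sb) - intersection
	if union == 0 {
		return 0, 0, 0 // assumption: two empty inputs score 0
	}
	return float64(intersection) / float64(union), intersection, union
}

func main() {
	score, inter, union := jaccard("the cat sat", "the cat ran")
	fmt.Printf("%.2f %d %d\n", score, inter, union) // prints "0.50 2 4"
}
```

The two counts correspond naturally to the intersection_tokens and union_tokens detail keys listed under Constants.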
func CitationAccuracy ¶
CitationAccuracy scores how accurate inline citations are against context.
func Completeness ¶
Completeness scores how completely the answer addresses the question.
func Conciseness ¶
Conciseness scores how succinct the answer is without losing meaning.
func ContextPrecision ¶
ContextPrecision scores the fraction of context documents that are useful.
func ContextRecall ¶
ContextRecall scores how much of the expected answer is supported by the context. Requires Expected.Answer.
func ContextRelevancy ¶
ContextRelevancy scores how relevant each context document is to the question.
func ExpectedJSON ¶
ExpectedJSON scores how many expected JSON fields match the answer at the configured dotted paths. Requires WithExpectedFields.
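To illustrate what matching at dotted paths involves, here is a rough sketch that walks nested JSON objects and reports the fraction of expected fields that match. The helpers lookup and matchFraction are hypothetical, array indexing is not handled, and comparing values with reflect.DeepEqual is an assumption about equality semantics.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// lookup (hypothetical) follows a dotted path such as "user.name" through
// nested JSON objects. Arrays are not supported in this sketch.
func lookup(doc any, path string) (any, bool) {
	cur := doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = obj[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// matchFraction (hypothetical) returns matched fields / expected fields.
func matchFraction(answer string, expected map[string]any) (float64, error) {
	if len(expected) == 0 {
		return 0, fmt.Errorf("no expected fields configured")
	}
	var doc any
	if err := json.Unmarshal([]byte(answer), &doc); err != nil {
		return 0, fmt.Errorf("answer is not valid JSON: %w", err)
	}
	matched := 0
	for path, want := range expected {
		got, ok := lookup(doc, strings.TrimSpace(path))
		if ok && reflect.DeepEqual(got, want) {
			matched++
		}
	}
	return float64(matched) / float64(len(expected)), nil
}

func main() {
	score, err := matchFraction(`{"user":{"name":"Ada","age":36}}`, map[string]any{
		"user.name": "Ada",
		"user.age":  float64(36), // encoding/json decodes JSON numbers as float64
		"user.role": "admin",     // absent from the answer, so it does not match
	})
	fmt.Printf("%.2f %v\n", score, err) // prints "0.67 <nil>"
}
```

Note the float64(36): when decoding into an untyped value, encoding/json represents every JSON number as float64, so an expected value of int 36 would not compare equal under DeepEqual.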
func ExpectedRegex ¶
ExpectedRegex scores whether the answer matches the given Go regular expression.
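Go's regexp package uses RE2 syntax, which has no backreferences or lookaround, so patterns ported from PCRE-style engines may fail to compile. A minimal sketch of a binary regex check follows; regexScore is a hypothetical helper, and the 1/0 scoring is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// regexScore (hypothetical) returns 1 if answer matches pattern, 0 otherwise,
// and an error if pattern is not valid RE2 syntax.
func regexScore(pattern, answer string) (float64, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return 0, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	if re.MatchString(answer) {
		return 1, nil
	}
	return 0, nil
}

func main() {
	score, err := regexScore(`^ORD-\d{6}$`, "ORD-123456")
	fmt.Println(score, err) // prints "1 <nil>"

	_, err = regexScore(`(unclosed`, "anything")
	fmt.Println(err != nil) // prints "true"
}
```

Anchoring with ^ and $ matters here: MatchString reports a match anywhere in the input unless the pattern is anchored.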
func Faithfulness ¶
Faithfulness scores whether the answer is supported by the provided context.
func GEval ¶
GEval scores along a free-form criteria string using the answer-relevancy schema. Criteria must be non-empty.
func Hallucination ¶
Hallucination scores the fraction of claims that are not hallucinated.
func InstructionAdherence ¶
InstructionAdherence scores how closely the answer followed Expected.Instructions. Requires Expected.Instructions.
func JSONValidity ¶
JSONValidity scores whether the answer parses as valid JSON.
func Latency ¶
Latency scores whether the run elapsed within the configured maximum. Requires WithMaxLatency.
func SchemaCompliance ¶
SchemaCompliance scores whether the answer matches the configured JSON schema. Requires WithSchema.
func SummarizationQuality ¶
SummarizationQuality scores summary coverage, fidelity and conciseness as one score.
type Option ¶
type Option func(*config)
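Option configures a built-in metric. Options are validated eagerly when the metric is constructed, but because the constructors return only a Metric, an invalid value or combination is reported later, as a metric-level error returned from Evaluate.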
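The type Option func(*config) is the functional-options pattern. One way it could record a validation failure at construction time and surface it only during evaluation is sketched below. Everything except the Option type and the 0.7 default threshold is hypothetical: the config fields, withThreshold, newConfig and evaluate are illustrative names, and passing when score >= threshold is an assumption.

```go
package main

import "fmt"

// config is unexported in the real package; these fields are assumed.
type config struct {
	threshold float64
	err       error // first validation error, surfaced later by evaluate
}

// Option mirrors the documented type.
type Option func(*config)

// withThreshold (hypothetical) validates eagerly but only records the error.
func withThreshold(t float64) Option {
	return func(c *config) {
		if t < 0 || t > 1 {
			if c.err == nil {
				c.err = fmt.Errorf("threshold %v outside [0,1]", t)
			}
			return
		}
		c.threshold = t
	}
}

// newConfig applies opts on top of the documented default threshold of 0.7.
func newConfig(opts ...Option) config {
	c := config{threshold: 0.7}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

// evaluate (hypothetical) reports the stored error, or whether score passes.
func evaluate(score float64, c config) (bool, error) {
	if c.err != nil {
		return false, c.err
	}
	return score >= c.threshold, nil
}

func main() {
	fmt.Println(evaluate(0.8, newConfig()))                   // default threshold
	fmt.Println(evaluate(0.8, newConfig(withThreshold(0.9)))) // stricter threshold
	fmt.Println(evaluate(0.8, newConfig(withThreshold(1.5)))) // invalid option
}
```

Deferring the error keeps constructor signatures free of an error return, at the cost of discovering a misconfiguration only when the first case is evaluated.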
func WithExpectedFields ¶
WithExpectedFields sets the dotted JSON paths and expected values used by ExpectedJSON. Keys are trimmed; an empty key is rejected.
func WithMaxLatency ¶
WithMaxLatency sets the maximum allowed run latency for Latency.
func WithMaxLength ¶
WithMaxLength sets the maximum allowed answer length in runes.
func WithMinLength ¶
WithMinLength sets the minimum allowed answer length in runes.
func WithSchema ¶
func WithSchema(schema json.RawMessage) Option
WithSchema sets the JSON Schema used by SchemaCompliance.
func WithThreshold ¶
WithThreshold sets the pass/fail threshold in [0,1]. Default is 0.7.