package metric

v1.1.1 Published: May 16, 2026 License: MIT Imports: 33 Imported by: 0

Repository

github.com/nnull13/gaugo

Documentation ¶

Overview ¶

Package metric defines the Metric interface used by gaugo to evaluate one case at a time, plus the built-in deterministic and LLM-judge metrics.

Users may implement Metric directly to plug custom evaluation logic into gaugo.Suite.Assert / gaugo.Runner.Run. The root gaugo package re-exports every public identifier in this package as a type alias or one-line wrapper, so callers can use either gaugo.Faithfulness or metric.Faithfulness interchangeably.
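As a rough illustration of a custom metric, the sketch below scores the fraction of Expected.Contains substrings found in the answer. It is not taken from gaugo: the types are local mirrors of the declarations on this page (Result is trimmed, Judge is reduced to a placeholder), and containsAll with its threshold field is a hypothetical name. Real code would import the metric package instead of redeclaring these types.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"time"
)

// Local mirrors of the documented types so the sketch compiles on its own.
type Document struct{ ID, Text string }
type Input struct {
	Question string
	Context  []Document
}
type Output struct{ Answer string }
type Expected struct {
	Contains     []string
	Answer       string
	Instructions string
}
type EvalInput struct {
	CaseName string
	Input    Input
	Output   Output
	Expected Expected
	Elapsed  time.Duration
}
type Result struct { // trimmed to the fields this sketch sets
	Name   string
	Score  float64
	Pass   bool
	Reason string
}
type Judge interface{} // placeholder: this metric never calls the judge
type Metric interface {
	Name() string
	Evaluate(ctx context.Context, in EvalInput, j Judge) (Result, error)
}

// containsAll is a hypothetical deterministic metric. Its score is the
// fraction of Expected.Contains substrings that appear in the answer.
type containsAll struct{ threshold float64 }

func (containsAll) Name() string { return "ContainsAll" }

func (m containsAll) Evaluate(_ context.Context, in EvalInput, _ Judge) (Result, error) {
	want := in.Expected.Contains
	if len(want) == 0 {
		return Result{}, fmt.Errorf("%s: Expected.Contains is empty", m.Name())
	}
	hits := 0
	for _, s := range want {
		if strings.Contains(in.Output.Answer, s) {
			hits++
		}
	}
	score := float64(hits) / float64(len(want))
	return Result{
		Name:   m.Name(),
		Score:  score,
		Pass:   score >= m.threshold, // ">=" is an assumption about pass semantics
		Reason: fmt.Sprintf("%d of %d expected substrings found", hits, len(want)),
	}, nil
}

func main() {
	var m Metric = containsAll{threshold: 0.7}
	in := EvalInput{
		Output:   Output{Answer: "Paris is the capital of France."},
		Expected: Expected{Contains: []string{"Paris", "France", "Europe"}},
	}
	res, err := m.Evaluate(context.Background(), in, nil)
	fmt.Println(res.Score, res.Pass, res.Reason, err)
}
```

A deterministic metric like this one can ignore the Judge argument entirely, which is why the sketch passes nil.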

Constants ¶

const (
	NameJSONValidity     = "JSONValidity"
	NameSchemaCompliance = "SchemaCompliance"
	NameExpectedJSON     = "ExpectedJSON"
	NameAnswerSimilarity = "AnswerSimilarity"
	NameLatency          = "Latency"
	NameAnswerLength     = "AnswerLength"
	NameExpectedRegex    = "ExpectedRegex"
)

Canonical names for deterministic (non-judge) metrics.

const (
	DetailKeyValid              = "valid"
	DetailKeyError              = "error"
	DetailKeyFields             = "fields"
	DetailKeyElapsedMS          = "elapsed_ms"
	DetailKeyMaxMS              = "max_ms"
	DetailKeyLength             = "length"
	DetailKeyPattern            = "pattern"
	DetailKeyMatch              = "match"
	DetailKeyIntersectionTokens = "intersection_tokens"
	DetailKeyUnionTokens        = "union_tokens"
)

Standard detail keys exposed on Result.Details for built-in metrics. They are part of the observable contract and remain stable.

const (
	NameFaithfulness         = "Faithfulness"
	NameAnswerRelevancy      = "AnswerRelevancy"
	NameContextRelevancy     = "ContextRelevancy"
	NameContextPrecision     = "ContextPrecision"
	NameContextRecall        = "ContextRecall"
	NameAnswerCorrectness    = "AnswerCorrectness"
	NameHallucination        = "Hallucination"
	NameToxicity             = "Toxicity"
	NameBias                 = "Bias"
	NameCoherence            = "Coherence"
	NameConciseness          = "Conciseness"
	NameCompleteness         = "Completeness"
	NameInstructionAdherence = "InstructionAdherence"
	NameGEval                = "GEval"
	NameCitationAccuracy     = "CitationAccuracy"
	NameSummarizationQuality = "SummarizationQuality"
)

Canonical metric names (the strings returned by Metric.Name and stored in Result.Name). Each metric's human-readable label is derived from its name via labelOf at runtime.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Document ¶

type Document struct {
	ID   string
	Text string
}

Document is one retrieved context record for a case.

func Doc ¶

func Doc(id, text string) Document

Doc is a convenience constructor for Document.

type EvalInput ¶

type EvalInput struct {
	CaseName string
	Input    Input
	Output   Output
	Expected Expected
	Elapsed  time.Duration
}

EvalInput is provided to a Metric after a case run completes.

type Expected ¶

type Expected struct {
	Contains     []string
	Answer       string
	Instructions string
}

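Expected holds the reference values for a case: substrings the answer should contain, a reference answer, and the instructions the answer should follow. Metrics such as AnswerCorrectness, ContextRecall and InstructionAdherence require the corresponding field to be set.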

type Input ¶

type Input struct {
	Question string
	Context  []Document
}

Input is the test input passed to a target system under evaluation.

type Judge ¶

type Judge interface {
	EvaluateJSON(ctx context.Context, req JudgeRequest) (JudgeResponse, error)
}

Judge evaluates metric prompts and must return strictly-structured JSON.
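For tests that should not call a real model, a canned Judge can stand in. The sketch below is one way to write such a stub. The types mirror the declarations on this page (JudgeRequest is trimmed), stubJudge and ask are hypothetical helpers, and the JSON shape in the reply is invented for illustration because the schema each metric expects is not documented here.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"time"
)

// Local mirrors of the documented types (JudgeRequest trimmed to the fields used).
type JudgeRequest struct {
	Metric   string
	Question string
	Answer   string
	Schema   json.RawMessage
}
type JudgeResponse struct {
	RawJSON   []byte
	Provider  string
	Model     string
	RequestID string
	Latency   time.Duration
}
type Judge interface {
	EvaluateJSON(ctx context.Context, req JudgeRequest) (JudgeResponse, error)
}

// stubJudge is a hypothetical test double that always replies with the same JSON.
type stubJudge struct{ reply string }

func (s stubJudge) EvaluateJSON(ctx context.Context, req JudgeRequest) (JudgeResponse, error) {
	if err := ctx.Err(); err != nil {
		return JudgeResponse{}, err // honor cancellation like a real client would
	}
	if !json.Valid([]byte(s.reply)) {
		return JudgeResponse{}, fmt.Errorf("stub judge: reply for %q is not valid JSON", req.Metric)
	}
	return JudgeResponse{RawJSON: []byte(s.reply), Provider: "stub", Model: "canned"}, nil
}

// ask sends one request for the named metric using a background context.
func ask(j Judge, metricName string) (JudgeResponse, error) {
	return j.EvaluateJSON(context.Background(), JudgeRequest{Metric: metricName})
}

func main() {
	// The "score"/"reason" keys are an assumed shape, not gaugo's schema.
	j := stubJudge{reply: `{"score": 0.9, "reason": "supported by context"}`}
	resp, err := ask(j, "Faithfulness")
	fmt.Println(string(resp.RawJSON), resp.Provider, err)
}
```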

type JudgeRequest ¶

type JudgeRequest struct {
	Metric               string
	Question             string
	Answer               string
	ExpectedAnswer       string
	ExpectedInstructions string
	ContextDocs          []Document
	Instructions         string
	Schema               json.RawMessage
}

JudgeRequest describes a metric evaluation request sent to a Judge.

type JudgeResponse ¶

type JudgeResponse struct {
	RawJSON   []byte
	Provider  string
	Model     string
	RequestID string
	Latency   time.Duration
}

JudgeResponse contains the raw structured output from a Judge.

type Metric ¶

type Metric interface {
	Name() string
	Evaluate(ctx context.Context, in EvalInput, j Judge) (Result, error)
}

Metric evaluates a completed case and returns a Result. Built-in metrics implement this interface; users may also implement it for custom logic.

func AnswerCorrectness ¶

func AnswerCorrectness(opts ...Option) Metric

AnswerCorrectness scores how well the answer matches Expected.Answer. Requires Expected.Answer.

func AnswerLength ¶

func AnswerLength(opts ...Option) Metric

AnswerLength scores whether the answer length, counted in runes, lies within [min, max]. Requires at least one of WithMinLength or WithMaxLength.
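Because the bounds are in runes rather than bytes, multi-byte characters count once each. A minimal sketch of such a check follows. withinLength is a hypothetical helper, and treating a non-positive bound as "not set" is an assumption, not documented behavior.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// withinLength returns the rune count of s and whether it lies in [min, max].
// A bound <= 0 is treated as unset (an assumption made for this sketch).
func withinLength(s string, min, max int) (int, bool) {
	n := utf8.RuneCountInString(s)
	if min > 0 && n < min {
		return n, false
	}
	if max > 0 && n > max {
		return n, false
	}
	return n, true
}

func main() {
	s := "na\u00efve caf\u00e9" // two of the characters take two bytes each in UTF-8
	n, ok := withinLength(s, 5, 10)
	fmt.Println("runes:", n, "bytes:", len(s), "in range:", ok)
}
```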

func AnswerRelevancy ¶

func AnswerRelevancy(opts ...Option) Metric

AnswerRelevancy scores how well the answer addresses the question.

func AnswerSimilarity ¶

func AnswerSimilarity(opts ...Option) Metric

AnswerSimilarity scores Jaccard token overlap against Expected.Answer.
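Jaccard overlap is the size of the intersection of two token sets divided by the size of their union, which is presumably what the intersection_tokens and union_tokens detail keys report. The sketch below shows the computation. The tokenizer (lowercase, split on whitespace) and the score of 0 for two empty inputs are assumptions, since the page does not document either.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenSet lowercases s and splits it on whitespace (assumed tokenizer).
func tokenSet(s string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, tok := range strings.Fields(strings.ToLower(s)) {
		set[tok] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| together with both counts.
func jaccard(a, b string) (score float64, inter, union int) {
	sa, sb := tokenSet(a), tokenSet(b)
	for tok := range sa {
		if _, ok := sb[tok]; ok {
			inter++
		}
	}
	union = len(sa) + len(sb) - inter
	if union == 0 {
		return 0, 0, 0 // both inputs empty: defined as 0 in this sketch
	}
	return float64(inter) / float64(union), inter, union
}

func main() {
	score, inter, union := jaccard("the cat sat", "the cat ran")
	fmt.Printf("%.2f %d %d\n", score, inter, union) // prints "0.50 2 4"
}
```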

func Bias ¶

func Bias(opts ...Option) Metric

Bias scores how unbiased the answer is (higher is fairer).

func CitationAccuracy ¶

func CitationAccuracy(opts ...Option) Metric

CitationAccuracy scores how accurate inline citations are against context.

func Coherence ¶

func Coherence(opts ...Option) Metric

Coherence scores how internally consistent the answer is.

func Completeness ¶

func Completeness(opts ...Option) Metric

Completeness scores how completely the answer addresses the question.

func Conciseness ¶

func Conciseness(opts ...Option) Metric

Conciseness scores how succinct the answer is without losing meaning.

func ContextPrecision ¶

func ContextPrecision(opts ...Option) Metric

ContextPrecision scores the fraction of context documents that are useful.

func ContextRecall ¶

func ContextRecall(opts ...Option) Metric

ContextRecall scores how much of the expected answer is supported by the context. Requires Expected.Answer.

func ContextRelevancy ¶

func ContextRelevancy(opts ...Option) Metric

ContextRelevancy scores how relevant each context document is to the question.

func ExpectedJSON ¶

func ExpectedJSON(opts ...Option) Metric

ExpectedJSON scores how many expected JSON fields match the answer at the configured dotted paths. Requires WithExpectedFields.
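A dotted path such as user.name addresses a field nested inside JSON objects. The sketch below is one plausible way to resolve such paths and compute the matched fraction; lookup and matchFraction are hypothetical helpers, array indexing is left out, and the equality rule (reflect.DeepEqual on decoded values) is an assumption about how values are compared.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// lookup walks a decoded JSON value along a dotted path of object keys.
func lookup(v any, path string) (any, bool) {
	for _, key := range strings.Split(path, ".") {
		obj, ok := v.(map[string]any)
		if !ok {
			return nil, false
		}
		v, ok = obj[key]
		if !ok {
			return nil, false
		}
	}
	return v, true
}

// matchFraction returns the share of expected fields whose value equals the
// value found at the same path in the answer.
func matchFraction(answer string, expected map[string]any) (float64, error) {
	if len(expected) == 0 {
		return 0, fmt.Errorf("no expected fields configured")
	}
	var doc any
	if err := json.Unmarshal([]byte(answer), &doc); err != nil {
		return 0, fmt.Errorf("answer is not valid JSON: %w", err)
	}
	matched := 0
	for path, want := range expected {
		if got, ok := lookup(doc, strings.TrimSpace(path)); ok && reflect.DeepEqual(got, want) {
			matched++
		}
	}
	return float64(matched) / float64(len(expected)), nil
}

func main() {
	answer := `{"user": {"name": "Ada", "age": 36}, "active": true}`
	score, err := matchFraction(answer, map[string]any{
		"user.name": "Ada",
		"user.age":  float64(36), // encoding/json decodes numbers into float64
		"active":    false,       // deliberately wrong, so two of three match
	})
	fmt.Println(score, err)
}
```

With encoding/json, numbers decoded into an any value arrive as float64, so an expected value written as the int 36 would not compare equal under this sketch's rule.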

func ExpectedRegex ¶

func ExpectedRegex(pattern string, opts ...Option) Metric

ExpectedRegex scores whether the answer matches the given Go regular expression.
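Go regular expressions use RE2 syntax, so backreferences and lookaround are unavailable, and a match is unanchored unless the pattern includes ^ and $. The sketch below shows the underlying standard-library check; matches is a hypothetical helper, and whether ExpectedRegex anchors the pattern itself is not stated on this page.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches compiles pattern and reports whether it matches anywhere in answer.
func matches(pattern, answer string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return re.MatchString(answer), nil
}

func main() {
	ok, err := matches(`^\d{4}-\d{2}-\d{2}$`, "2026-05-16")
	fmt.Println(ok, err) // prints "true <nil>"

	// Without anchors, a match anywhere in the answer is enough.
	ok, _ = matches(`\d{4}`, "released in 2026")
	fmt.Println(ok) // prints "true"

	_, err = matches(`(`, "anything")
	fmt.Println(err != nil) // prints "true"
}
```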

func Faithfulness ¶

func Faithfulness(opts ...Option) Metric

Faithfulness scores whether the answer is supported by the provided context.

func GEval ¶

func GEval(criteria string, opts ...Option) Metric

GEval scores the answer against a free-form criteria string, using the answer-relevancy schema. The criteria string must be non-empty.

func Hallucination ¶

func Hallucination(opts ...Option) Metric

Hallucination scores the fraction of claims that are not hallucinated.

func InstructionAdherence ¶

func InstructionAdherence(opts ...Option) Metric

InstructionAdherence scores how closely the answer follows Expected.Instructions. Requires Expected.Instructions.

func JSONValidity ¶

func JSONValidity(opts ...Option) Metric

JSONValidity scores whether the answer parses as valid JSON.
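A validity check of this kind can be sketched with encoding/json alone. checkJSON below is a hypothetical helper that returns a flag and the parser's message, roughly the information the valid and error detail keys suggest. How gaugo treats edge cases such as surrounding prose or code fences in the answer is not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkJSON reports whether s is exactly one valid JSON value and, if not,
// returns the decoder's error message.
func checkJSON(s string) (bool, string) {
	var v any
	if err := json.Unmarshal([]byte(s), &v); err != nil {
		return false, err.Error()
	}
	return true, ""
}

func main() {
	for _, answer := range []string{`{"ok": true}`, `{"ok": `, `{"ok": true} trailing`} {
		valid, msg := checkJSON(answer)
		fmt.Printf("%q valid=%v error=%q\n", answer, valid, msg)
	}
}
```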

func Latency ¶

func Latency(opts ...Option) Metric

Latency scores whether the run elapsed within the configured maximum. Requires WithMaxLatency.

func SchemaCompliance ¶

func SchemaCompliance(opts ...Option) Metric

SchemaCompliance scores whether the answer matches the configured JSON schema. Requires WithSchema.

func SummarizationQuality ¶

func SummarizationQuality(opts ...Option) Metric

SummarizationQuality scores summary coverage, fidelity and conciseness as one score.

func Toxicity ¶

func Toxicity(opts ...Option) Metric

Toxicity scores how non-toxic the answer is (higher is safer).

type Option ¶

type Option func(*config)

Option configures a built-in metric. Options are validated when the metric is constructed; because the constructors return only a Metric, an invalid combination is reported as a metric-level error from Evaluate.
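The functional-options shape with a deferred error can be sketched as follows. Everything inside config is hypothetical, since the real type is unexported, and latencyMetric is a simplified stand-in whose Evaluate takes only the elapsed time. The point is the flow: options record the first validation failure, the constructor checks required settings, and Evaluate returns the stored error.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// config is a hypothetical stand-in for the package's unexported config.
type config struct {
	threshold  float64
	maxLatency time.Duration
	err        error // first validation failure, surfaced by Evaluate
}

func (c *config) fail(err error) {
	if c.err == nil {
		c.err = err
	}
}

type Option func(*config)

func WithThreshold(v float64) Option {
	return func(c *config) {
		if v < 0 || v > 1 {
			c.fail(fmt.Errorf("threshold %v is outside [0,1]", v))
			return
		}
		c.threshold = v
	}
}

func WithMaxLatency(d time.Duration) Option {
	return func(c *config) {
		if d <= 0 {
			c.fail(fmt.Errorf("max latency %v must be positive", d))
			return
		}
		c.maxLatency = d
	}
}

// latencyMetric is a simplified stand-in for the Latency metric.
type latencyMetric struct{ cfg config }

func newLatency(opts ...Option) latencyMetric {
	cfg := config{threshold: 0.7} // documented default threshold
	for _, opt := range opts {
		opt(&cfg)
	}
	if cfg.err == nil && cfg.maxLatency == 0 {
		cfg.err = errors.New("Latency requires WithMaxLatency")
	}
	return latencyMetric{cfg: cfg}
}

// Evaluate reports whether elapsed is within the limit, or the stored error.
func (m latencyMetric) Evaluate(elapsed time.Duration) (bool, error) {
	if m.cfg.err != nil {
		return false, m.cfg.err
	}
	return elapsed <= m.cfg.maxLatency, nil
}

func main() {
	ok, err := newLatency(WithMaxLatency(2 * time.Second)).Evaluate(1500 * time.Millisecond)
	fmt.Println(ok, err) // prints "true <nil>"

	_, err = newLatency().Evaluate(time.Second)
	fmt.Println(err) // prints "Latency requires WithMaxLatency"
}
```

Keeping constructors free of an error return lets metrics be listed inline in a suite definition, at the cost of seeing configuration mistakes only when the metric runs.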

func WithExpectedFields ¶

func WithExpectedFields(fields map[string]any) Option

WithExpectedFields sets the dotted JSON paths and expected values used by ExpectedJSON. Keys are trimmed; an empty key is rejected.

func WithMaxLatency ¶

func WithMaxLatency(d time.Duration) Option

WithMaxLatency sets the maximum allowed run latency for Latency.

func WithMaxLength ¶

func WithMaxLength(n int) Option

WithMaxLength sets the maximum allowed answer length in runes.

func WithMinLength ¶

func WithMinLength(n int) Option

WithMinLength sets the minimum allowed answer length in runes.

func WithSchema ¶

func WithSchema(schema json.RawMessage) Option

WithSchema sets the JSON Schema used by SchemaCompliance.

func WithThreshold ¶

func WithThreshold(v float64) Option

WithThreshold sets the pass/fail threshold in [0,1]. Default is 0.7.

type Output ¶

type Output struct {
	Answer string
}

Output is the target system answer under evaluation.

type Result ¶

type Result struct {
	Name         string
	Score        float64
	Pass         bool
	Reason       string
	Details      []byte
	Provider     string
	Model        string
	RequestID    string
	JudgeLatency time.Duration
}

Result represents a score and pass/fail outcome for one metric. Re-exported as gaugo.MetricResult.
