Documentation ¶
Overview ¶
Package gaugo provides an idiomatic Go testing harness for AI application evaluation.
Gaugo evaluates RAG pipelines and AI systems directly within Go's testing workflow, producing deterministic, concurrent, CI-friendly results without external orchestration.
Quick start ¶
Use Suite inside a standard Go test to register cases and assert metrics:
func TestRAG(t *testing.T) {
	suite := gaugo.New(t, gaugo.WithJudge(judge))
	suite.Case("basic",
		gaugo.Question("What is Go?"),
		gaugo.ContextDocs(gaugo.Doc("d1", "Go is a programming language.")),
	)
	suite.Assert(ctx, myRAG, gaugo.Faithfulness(), gaugo.AnswerRelevancy())
}
Programmatic usage ¶
Use Runner when you need structured results outside the testing framework:
runner, _ := gaugo.NewRunner(gaugo.WithJudge(judge))
runner.Case("example", gaugo.Question("Q?"), gaugo.ExpectedContains("answer"))
result, _ := runner.Run(ctx, myFunc)
fmt.Println(result.Summary())
Built-in metrics ¶
Built-in metrics cover RAG, safety, generation quality, structured output, instruction following, domain-specific checks, and deterministic contracts.
RAG and answer quality:
Safety and generation quality:
Structured output and deterministic checks:
Instruction and domain-specific metrics:
All metrics accept WithThreshold to set a custom pass/fail score in [0,1]. Metric interfaces, shared input/output types, and built-in constructors also live in the github.com/nnull13/gaugo/metric sub-package. The root package re-exports that public surface so callers can use either Faithfulness or metric.Faithfulness interchangeably.
Provider judges ¶
Metrics that require LLM evaluation use a Judge interface. Built-in adapters are provided for OpenAI, Anthropic, Gemini, xAI, and local models (Ollama). See the provider sub-packages for configuration details.
Index ¶
- Constants
- Variables
- func Assert(t testing.TB, result RunResult)
- type Case
- type CaseOption
- type CaseResult
- type Document
- type Error
- type ErrorCode
- type ErrorInfo
- type ErrorKind
- type EvalInput
- type Expected
- type Input
- type Judge
- type JudgeRequest
- type JudgeResponse
- type Metric
- func AnswerCorrectness(opts ...MetricOption) Metric
- func AnswerLength(opts ...MetricOption) Metric
- func AnswerRelevancy(opts ...MetricOption) Metric
- func AnswerSimilarity(opts ...MetricOption) Metric
- func Bias(opts ...MetricOption) Metric
- func CitationAccuracy(opts ...MetricOption) Metric
- func Coherence(opts ...MetricOption) Metric
- func Completeness(opts ...MetricOption) Metric
- func Conciseness(opts ...MetricOption) Metric
- func ContextPrecision(opts ...MetricOption) Metric
- func ContextRecall(opts ...MetricOption) Metric
- func ContextRelevancy(opts ...MetricOption) Metric
- func ExpectedJSON(opts ...MetricOption) Metric
- func ExpectedRegex(pattern string, opts ...MetricOption) Metric
- func Faithfulness(opts ...MetricOption) Metric
- func GEval(criteria string, opts ...MetricOption) Metric
- func Hallucination(opts ...MetricOption) Metric
- func InstructionAdherence(opts ...MetricOption) Metric
- func JSONValidity(opts ...MetricOption) Metric
- func Latency(opts ...MetricOption) Metric
- func SchemaCompliance(opts ...MetricOption) Metric
- func SummarizationQuality(opts ...MetricOption) Metric
- func Toxicity(opts ...MetricOption) Metric
- type MetricOption
- type MetricResult
- type Option
- type Output
- type Reporter
- type RetryConfig
- type RunFunc
- type RunResult
- type Runner
- type Suite
Constants ¶
const (
	ErrorKindUnknown           = failure.KindUnknown
	ErrorKindConfig            = failure.KindConfig
	ErrorKindValidation        = failure.KindValidation
	ErrorKindContextCanceled   = failure.KindContextCanceled
	ErrorKindContextDeadline   = failure.KindContextDeadline
	ErrorKindPanic             = failure.KindPanic
	ErrorKindMetric            = failure.KindMetric
	ErrorKindMetricParse       = failure.KindMetricParse
	ErrorKindProviderRequest   = failure.KindProviderRequest
	ErrorKindProviderAuth      = failure.KindProviderAuth
	ErrorKindProviderRateLimit = failure.KindProviderRateLimit
	ErrorKindProviderResponse  = failure.KindProviderResponse
	ErrorKindProviderRefusal   = failure.KindProviderRefusal
	ErrorKindProviderTruncated = failure.KindProviderTruncated
)
Variables ¶
var (
	ErrConfig            = failure.ErrConfig
	ErrValidation        = failure.ErrValidation
	ErrMetric            = failure.ErrMetric
	ErrMetricParse       = failure.ErrMetricParse
	ErrProviderRequest   = failure.ErrProviderRequest
	ErrProviderAuth      = failure.ErrProviderAuth
	ErrProviderRateLimit = failure.ErrProviderRateLimit
	ErrProviderResponse  = failure.ErrProviderResponse
	ErrProviderRefusal   = failure.ErrProviderRefusal
	ErrProviderTruncated = failure.ErrProviderTruncated
	ErrPanic             = failure.ErrPanic
)
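Exported Err* variables like these are typically matched with errors.Is rather than compared directly, because callers usually see them wrapped with extra context. The following is a minimal sketch of that pattern using only the standard library. It assumes the variables are sentinel error values, and errRateLimit, callJudge, and isRateLimited are hypothetical local names, not part of gaugo.

```go
package main

import (
	"errors"
	"fmt"
)

// errRateLimit is a local stand-in for a sentinel such as ErrProviderRateLimit.
var errRateLimit = errors.New("provider rate limit")

// callJudge simulates a provider call that wraps the sentinel with context.
func callJudge() error {
	return fmt.Errorf("judge request failed: %w", errRateLimit)
}

// isRateLimited matches the sentinel anywhere in the wrap chain.
func isRateLimited(err error) bool {
	return errors.Is(err, errRateLimit)
}

func main() {
	err := callJudge()
	fmt.Println(isRateLimited(err)) // true
	fmt.Println(err)                // judge request failed: provider rate limit
}
```

With the real package, the same check would presumably read errors.Is(err, gaugo.ErrProviderRateLimit), assuming the library wraps its sentinels this way.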
Functions ¶
Types ¶
type CaseOption ¶
type CaseOption func(*Case)
CaseOption mutates one Case definition.
func ContextDocs ¶
func ContextDocs(docs ...Document) CaseOption
ContextDocs sets retrieved context documents for a case.
func ExpectedAnswer ¶ added in v1.1.0
func ExpectedAnswer(answer string) CaseOption
ExpectedAnswer sets the reference answer for metrics that need ground truth.
func ExpectedContains ¶
func ExpectedContains(substr string) CaseOption
ExpectedContains requires the output answer to contain the given substring.
func ExpectedInstructions ¶ added in v1.1.0
func ExpectedInstructions(instructions string) CaseOption
ExpectedInstructions sets the reference instructions for instruction-following metrics.
func Question ¶
func Question(question string) CaseOption
Question sets the user question for a case.
type CaseResult ¶
type CaseResult struct {
	Name    string
	Metrics []MetricResult
	RunError error
	Elapsed time.Duration
}
CaseResult contains execution and metric results for a single case.
func (CaseResult) Failed ¶ added in v1.1.0
func (c CaseResult) Failed() bool
Failed reports whether this case has a run error or any failing metric.
func (CaseResult) FailedMetrics ¶ added in v1.1.0
func (c CaseResult) FailedMetrics() []MetricResult
FailedMetrics returns only the metrics that did not pass.
func (CaseResult) MetricsByName ¶ added in v1.1.0
func (c CaseResult) MetricsByName(name string) []MetricResult
MetricsByName returns metrics matching the given name.
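The relationship between Failed and FailedMetrics can be sketched with simplified local types. This is an illustration under stated assumptions, not the library's implementation: metricResult, caseResult, and in particular the Passed field are hypothetical, since the fields of MetricResult are not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// metricResult and caseResult are simplified local stand-ins.
// The Passed field is hypothetical; the real MetricResult fields are not documented above.
type metricResult struct {
	Name   string
	Passed bool
}

type caseResult struct {
	Name     string
	Metrics  []metricResult
	RunError error
}

// failedMetrics returns only the metrics that did not pass.
func (c caseResult) failedMetrics() []metricResult {
	var out []metricResult
	for _, m := range c.Metrics {
		if !m.Passed {
			out = append(out, m)
		}
	}
	return out
}

// failed is true on a run error or any failing metric.
func (c caseResult) failed() bool {
	return c.RunError != nil || len(c.failedMetrics()) > 0
}

func main() {
	ok := caseResult{Name: "ok", Metrics: []metricResult{{"faithfulness", true}}}
	bad := caseResult{Name: "bad", Metrics: []metricResult{{"faithfulness", true}, {"toxicity", false}}}
	crashed := caseResult{Name: "crashed", RunError: errors.New("timeout")}
	fmt.Println(ok.failed(), bad.failed(), crashed.failed()) // false true true
	fmt.Println(len(bad.failedMetrics()))                    // 1
}
```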
type Error ¶ added in v1.1.1
Error is Gaugo's typed operational error. It carries a human-readable message, structured integration metadata, and an optional wrapped cause.
type ErrorInfo ¶ added in v1.1.0
ErrorInfo is a redacted, structured description of an operational failure.
func ClassifyError ¶ added in v1.1.0
ClassifyError returns redacted operational metadata for err.
func MetricErrorInfo ¶ added in v1.1.0
func MetricErrorInfo(m MetricResult) (ErrorInfo, bool)
MetricErrorInfo extracts ErrorInfo stored in MetricResult details.
type ErrorKind ¶ added in v1.1.0
ErrorKind classifies operational failures separately from low quality scores.
type JudgeRequest ¶
type JudgeRequest = metric.JudgeRequest
type JudgeResponse ¶
type JudgeResponse = metric.JudgeResponse
type Metric ¶
func AnswerCorrectness ¶ added in v1.1.0
func AnswerCorrectness(opts ...MetricOption) Metric
AnswerCorrectness scores how well the answer matches Expected.Answer.
func AnswerLength ¶ added in v1.1.0
func AnswerLength(opts ...MetricOption) Metric
AnswerLength scores whether the answer length lies in [min,max] runes.
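Counting runes instead of bytes matters for non-ASCII answers. The sketch below shows the difference with the standard library. lengthInRange is a hypothetical helper, the inclusive bounds follow the [min,max] notation, and returning a bool rather than a score is a simplification.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lengthInRange reports whether s has between lo and hi runes, inclusive.
func lengthInRange(s string, lo, hi int) bool {
	n := utf8.RuneCountInString(s)
	return n >= lo && n <= hi
}

func main() {
	s := "h\u00e9llo" // "héllo": 5 runes, 6 bytes in UTF-8
	fmt.Println(len(s), utf8.RuneCountInString(s)) // 6 5
	fmt.Println(lengthInRange(s, 1, 5))            // true
}
```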
func AnswerRelevancy ¶
func AnswerRelevancy(opts ...MetricOption) Metric
AnswerRelevancy scores how well the answer addresses the question.
func AnswerSimilarity ¶ added in v1.1.0
func AnswerSimilarity(opts ...MetricOption) Metric
AnswerSimilarity scores Jaccard token overlap against Expected.Answer.
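Jaccard overlap is the size of the intersection of two token sets divided by the size of their union. A minimal sketch follows. The tokenizer (lowercase, split on whitespace) and the handling of two empty inputs are assumptions, and tokenSet and jaccard are hypothetical names, so the library's actual scores may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenSet lowercases s and splits on whitespace; this tokenizer is an assumption.
func tokenSet(s string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, tok := range strings.Fields(strings.ToLower(s)) {
		set[tok] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| over the token sets of a and b.
// Two empty inputs score 1 here, a convention chosen for this sketch.
func jaccard(a, b string) float64 {
	sa, sb := tokenSet(a), tokenSet(b)
	if len(sa) == 0 && len(sb) == 0 {
		return 1
	}
	inter := 0
	for tok := range sa {
		if _, ok := sb[tok]; ok {
			inter++
		}
	}
	union := len(sa) + len(sb) - inter
	return float64(inter) / float64(union)
}

func main() {
	// Shared tokens: {go, is}; union: {go, is, fast, fun}.
	fmt.Println(jaccard("Go is fast", "go is fun")) // 0.5
}
```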
func Bias ¶ added in v1.1.0
func Bias(opts ...MetricOption) Metric
Bias scores how unbiased the answer is.
func CitationAccuracy ¶ added in v1.1.0
func CitationAccuracy(opts ...MetricOption) Metric
CitationAccuracy scores how accurate inline citations are against context.
func Coherence ¶ added in v1.1.0
func Coherence(opts ...MetricOption) Metric
Coherence scores how internally consistent the answer is.
func Completeness ¶ added in v1.1.0
func Completeness(opts ...MetricOption) Metric
Completeness scores how completely the answer addresses the question.
func Conciseness ¶ added in v1.1.0
func Conciseness(opts ...MetricOption) Metric
Conciseness scores how succinct the answer is without losing meaning.
func ContextPrecision ¶ added in v1.1.0
func ContextPrecision(opts ...MetricOption) Metric
ContextPrecision scores the fraction of context documents that are useful.
func ContextRecall ¶ added in v1.1.0
func ContextRecall(opts ...MetricOption) Metric
ContextRecall scores how much of the expected answer is supported by the context.
func ContextRelevancy ¶ added in v1.1.0
func ContextRelevancy(opts ...MetricOption) Metric
ContextRelevancy scores how relevant each context document is to the question.
func ExpectedJSON ¶ added in v1.1.0
func ExpectedJSON(opts ...MetricOption) Metric
ExpectedJSON scores how many expected JSON fields match the answer.
func ExpectedRegex ¶ added in v1.1.0
func ExpectedRegex(pattern string, opts ...MetricOption) Metric
ExpectedRegex scores whether the answer matches a Go regular expression.
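A regular-expression check of this kind can be sketched with the standard regexp package. regexScore is a hypothetical helper. Mapping a match to 1 and a non-match to 0, and surfacing an invalid pattern as an error, are assumptions about how such a metric might behave.

```go
package main

import (
	"fmt"
	"regexp"
)

// regexScore returns 1 when answer matches pattern and 0 otherwise.
// An invalid pattern is reported as an error instead of a score.
func regexScore(pattern, answer string) (float64, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return 0, fmt.Errorf("compile pattern: %w", err)
	}
	if re.MatchString(answer) {
		return 1, nil
	}
	return 0, nil
}

func main() {
	score, err := regexScore(`^\d{4}-\d{2}-\d{2}$`, "2024-05-01")
	fmt.Println(score, err) // 1 <nil>
}
```

Go's regexp package uses RE2 syntax, so features such as backreferences and lookahead are not available in patterns.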
func Faithfulness ¶
func Faithfulness(opts ...MetricOption) Metric
Faithfulness scores whether the answer is supported by the provided context.
func GEval ¶ added in v1.1.0
func GEval(criteria string, opts ...MetricOption) Metric
GEval scores the answer against a free-form criteria string. Criteria must be non-empty.
func Hallucination ¶ added in v1.1.0
func Hallucination(opts ...MetricOption) Metric
Hallucination scores the fraction of claims that are not hallucinated.
func InstructionAdherence ¶ added in v1.1.0
func InstructionAdherence(opts ...MetricOption) Metric
InstructionAdherence scores how closely the answer followed Expected.Instructions.
func JSONValidity ¶ added in v1.1.0
func JSONValidity(opts ...MetricOption) Metric
JSONValidity scores whether the answer parses as valid JSON.
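A syntactic validity check can be written with json.Valid from the standard library. jsonValidityScore is a hypothetical helper, and the binary 1/0 scoring is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonValidityScore returns 1 for syntactically valid JSON and 0 otherwise.
func jsonValidityScore(answer string) float64 {
	if json.Valid([]byte(answer)) {
		return 1
	}
	return 0
}

func main() {
	fmt.Println(jsonValidityScore(`{"ok":true}`)) // 1
	fmt.Println(jsonValidityScore(`{"ok":`))      // 0
}
```

Note that json.Valid accepts any JSON value, including a bare number or string, so a check that requires an object needs an extra step.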
func Latency ¶ added in v1.1.0
func Latency(opts ...MetricOption) Metric
Latency scores whether the run elapsed within the configured maximum.
func SchemaCompliance ¶ added in v1.1.0
func SchemaCompliance(opts ...MetricOption) Metric
SchemaCompliance scores whether the answer matches the configured JSON schema.
func SummarizationQuality ¶ added in v1.1.0
func SummarizationQuality(opts ...MetricOption) Metric
SummarizationQuality scores summary coverage, fidelity and conciseness.
func Toxicity ¶ added in v1.1.0
func Toxicity(opts ...MetricOption) Metric
Toxicity scores how non-toxic the answer is.
type MetricOption ¶
func WithExpectedFields ¶ added in v1.1.0
func WithExpectedFields(fields map[string]any) MetricOption
WithExpectedFields sets the dotted JSON paths and expected values used by ExpectedJSON.
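Dotted-path matching can be illustrated with a small lookup over decoded JSON. This is a sketch under assumptions: lookup and fieldMatchScore are hypothetical helpers, paths address nested objects only (no array indexing), values are compared with reflect.DeepEqual, and the score is the fraction of expected fields that match.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// lookup walks a dotted path such as "user.name" through nested JSON objects.
func lookup(doc map[string]any, path string) (any, bool) {
	var cur any = doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = obj[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// fieldMatchScore returns the fraction of expected fields found with equal values.
func fieldMatchScore(answer string, expected map[string]any) (float64, error) {
	var doc map[string]any
	if err := json.Unmarshal([]byte(answer), &doc); err != nil {
		return 0, err
	}
	if len(expected) == 0 {
		return 1, nil
	}
	matched := 0
	for path, want := range expected {
		if got, ok := lookup(doc, path); ok && reflect.DeepEqual(got, want) {
			matched++
		}
	}
	return float64(matched) / float64(len(expected)), nil
}

func main() {
	// JSON numbers decode to float64, so expected numbers use float64 too.
	expected := map[string]any{"user.name": "Ada", "user.age": float64(36)}
	score, err := fieldMatchScore(`{"user":{"name":"Ada","age":37}}`, expected)
	fmt.Println(score, err) // 0.5 <nil>
}
```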
func WithMaxLatency ¶ added in v1.1.0
func WithMaxLatency(d time.Duration) MetricOption
WithMaxLatency sets the maximum allowed run latency for Latency.
func WithMaxLength ¶ added in v1.1.0
func WithMaxLength(n int) MetricOption
WithMaxLength sets the maximum allowed answer length in runes.
func WithMinLength ¶ added in v1.1.0
func WithMinLength(n int) MetricOption
WithMinLength sets the minimum allowed answer length in runes.
func WithSchema ¶ added in v1.1.0
func WithSchema(schema json.RawMessage) MetricOption
WithSchema sets the JSON Schema used by SchemaCompliance.
func WithThreshold ¶
func WithThreshold(v float64) MetricOption
WithThreshold sets the pass/fail threshold in [0,1]. Default is 0.7.
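The way an option overrides the 0.7 default can be sketched with the functional-options pattern. The types and functions below (metricConfig, metricOption, withThreshold, passes) are hypothetical local stand-ins, and treating a score equal to the threshold as a pass is an assumption.

```go
package main

import "fmt"

// metricConfig is a hypothetical stand-in for a metric's settings.
type metricConfig struct {
	threshold float64
}

// metricOption mirrors the functional-option shape; hypothetical.
type metricOption func(*metricConfig)

// withThreshold overrides the pass/fail threshold.
func withThreshold(v float64) metricOption {
	return func(c *metricConfig) { c.threshold = v }
}

// passes applies options over the documented 0.7 default, then compares.
// Treating score == threshold as a pass is an assumption.
func passes(score float64, opts ...metricOption) bool {
	cfg := metricConfig{threshold: 0.7}
	for _, opt := range opts {
		opt(&cfg)
	}
	return score >= cfg.threshold
}

func main() {
	fmt.Println(passes(0.65))                     // false
	fmt.Println(passes(0.65, withThreshold(0.6))) // true
}
```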
type MetricResult ¶
type Option ¶
type Option func(*config) error
Option configures a Runner or Suite.
func WithCaseTimeout ¶
WithCaseTimeout applies a per-case timeout that covers both the run and metric evaluation.
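A per-case timeout is commonly built on context.WithTimeout. The sketch below shows the general mechanism with the standard library. runCase is a hypothetical helper that simulates work, and how gaugo wires the timeout internally is not stated here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runCase simulates a case that needs `work` time under a per-case `timeout`.
// It returns the context error when the deadline fires first.
func runCase(timeout, work time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	err := runCase(20*time.Millisecond, 200*time.Millisecond)
	fmt.Println(errors.Is(err, context.DeadlineExceeded))        // true
	fmt.Println(runCase(200*time.Millisecond, time.Millisecond)) // <nil>
}
```

A deadline error of this kind is presumably what ErrorKindContextDeadline classifies, although that mapping is an assumption.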
func WithMetricDetailsLimit ¶
WithMetricDetailsLimit caps stored metric detail bytes per metric result. Use 0 to disable details entirely.
func WithParallelism ¶
WithParallelism configures the maximum number of concurrent case executions.
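Bounded concurrency with a stable result order can be sketched with a buffered channel used as a semaphore. runAll is a hypothetical helper. Writing results by index keeps output in registration order regardless of which goroutine finishes first. Whether gaugo uses this exact mechanism is not stated here.

```go
package main

import (
	"fmt"
	"sync"
)

// runAll executes fn for each name with at most limit goroutines in flight.
// Results are written by index, so output order matches input order.
func runAll(names []string, limit int, fn func(string) string) []string {
	if limit < 1 {
		limit = 1
	}
	results := make([]string, len(names))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit workers are busy
		go func(i int, name string) {
			defer wg.Done()
			defer func() { <-sem }()
			results[i] = fn(name)
		}(i, name)
	}
	wg.Wait()
	return results
}

func main() {
	out := runAll([]string{"a", "b", "c"}, 2, func(s string) string { return s + "!" })
	fmt.Println(out) // [a! b! c!]
}
```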
func WithReporter ¶
WithReporter overrides the default testing reporter.
type RetryConfig ¶
type RetryConfig struct {
	// MaxAttempts is the total number of attempts, including the first request.
	// Zero uses the package default.
	MaxAttempts int
	// BaseDelay is the fallback delay before the second attempt.
	// Zero uses the package default.
	BaseDelay time.Duration
	// MaxDelay caps exponential backoff and Retry-After delays.
	// Zero uses the package default.
	MaxDelay time.Duration
}
RetryConfig controls retries for transient provider HTTP failures.
func DefaultRetryConfig ¶
func DefaultRetryConfig() RetryConfig
DefaultRetryConfig returns the retry defaults used by bundled providers.
func (RetryConfig) Validate ¶
func (cfg RetryConfig) Validate() error
Validate reports whether cfg is internally consistent.
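The interaction of the three RetryConfig fields can be illustrated with a capped exponential backoff schedule. This is a sketch under assumptions: retryConfig and delayBefore are hypothetical, the delay is assumed to double on each retry, and jitter and Retry-After handling are left out.

```go
package main

import (
	"fmt"
	"time"
)

// retryConfig mirrors the documented fields; the doubling schedule is an assumption.
type retryConfig struct {
	MaxAttempts int
	BaseDelay   time.Duration
	MaxDelay    time.Duration
}

// delayBefore returns the wait before the given attempt (1-based).
// Attempt 1 is the first request and has no delay.
func delayBefore(cfg retryConfig, attempt int) time.Duration {
	if attempt <= 1 {
		return 0
	}
	d := cfg.BaseDelay
	for i := 2; i < attempt; i++ {
		d *= 2
		if d >= cfg.MaxDelay {
			return cfg.MaxDelay
		}
	}
	if d > cfg.MaxDelay {
		return cfg.MaxDelay
	}
	return d
}

func main() {
	cfg := retryConfig{MaxAttempts: 5, BaseDelay: 100 * time.Millisecond, MaxDelay: 500 * time.Millisecond}
	// Prints delays of 0s, 100ms, 200ms, 400ms, 500ms for attempts 1 through 5.
	for attempt := 1; attempt <= cfg.MaxAttempts; attempt++ {
		fmt.Println(attempt, delayBefore(cfg, attempt))
	}
}
```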
type RunResult ¶
type RunResult struct {
	Cases []CaseResult
}
RunResult is the deterministic output of a suite execution.
func (RunResult) Failed ¶ added in v1.1.0
Failed reports whether any case has a run error or a failing metric.
type Runner ¶
type Runner struct {
	// contains filtered or unexported fields
}
Runner executes registered cases and returns structured results without depending on the testing package.
type Suite ¶
type Suite struct {
	// contains filtered or unexported fields
}
Suite is a testing wrapper around Runner.
func (*Suite) Case ¶
func (s *Suite) Case(name string, opts ...CaseOption)
Case registers one evaluation case and fails the test if it is invalid.
Directories ¶
| Path | Synopsis |
|---|---|
| internal | |
| metrics | Package metrics provides validation helpers shared by every built-in judge-metric sub-package under internal/metrics/*. |
| strictjson | Package strictjson decodes JSON into Go values while rejecting unknown fields and trailing tokens. |
| metric | Package metric defines the Metric interface used by gaugo to evaluate one case at a time, plus the built-in deterministic and LLM-judge metrics. |
| provider | |
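The strict decoding described for strictjson (rejecting unknown fields and trailing tokens) can be approximated with the standard encoding/json decoder. The sketch below is not the package's implementation. decodeStrict and the verdict type are hypothetical names used only for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// decodeStrict decodes data into v, rejecting unknown fields and trailing tokens.
func decodeStrict(data []byte, v any) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	if err := dec.Decode(v); err != nil {
		return err
	}
	// A second Decode must hit EOF; anything else means extra input.
	var extra json.RawMessage
	if err := dec.Decode(&extra); !errors.Is(err, io.EOF) {
		return errors.New("unexpected trailing data after JSON value")
	}
	return nil
}

// verdict is a hypothetical judge response shape.
type verdict struct {
	Score float64 `json:"score"`
}

func main() {
	var v verdict
	fmt.Println(decodeStrict([]byte(`{"score":0.9}`), &v))              // <nil>
	fmt.Println(decodeStrict([]byte(`{"score":0.9,"x":1}`), &v) != nil) // true
	fmt.Println(decodeStrict([]byte(`{"score":0.9} {}`), &v) != nil)    // true
}
```

Strictness of this kind is useful when parsing LLM judge output, where an extra field or trailing text often signals that the model did not follow the requested format.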