Documentation
¶
Overview ¶
Package gaugo provides an idiomatic Go testing harness for AI application evaluation.
The package is designed to integrate directly with Go's testing workflow:
go test ./...
Gaugo focuses on deterministic, concurrent, CI-friendly evaluations without external orchestration systems.
Index ¶
- func Assert(t testing.TB, result RunResult)
- type Case
- type CaseOption
- type CaseResult
- type Document
- type EvalInput
- type Expected
- type Input
- type Judge
- type JudgeRequest
- type JudgeResponse
- type Metric
- type MetricOption
- type MetricResult
- type Option
- type Output
- type Reporter
- type RetryConfig
- type RunFunc
- type RunResult
- type Runner
- type Suite
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type CaseOption ¶
type CaseOption func(*Case)
CaseOption mutates one Case definition.
func ContextDocs ¶
func ContextDocs(docs ...Document) CaseOption
ContextDocs sets retrieved context documents for a case.
func ExpectedContains ¶
func ExpectedContains(substr string) CaseOption
ExpectedContains requires the output answer to contain the given substring.
func Question ¶
func Question(question string) CaseOption
Question sets the user question for a case.
type CaseResult ¶
type CaseResult struct {
Name string
Metrics []MetricResult
RunError error
Elapsed time.Duration
}
CaseResult contains execution and metric results for a single case.
type Expected ¶
type Expected struct {
Contains []string
}
Expected holds simple non-LLM assertions for a case.
type Judge ¶
type Judge interface {
EvaluateJSON(ctx context.Context, req JudgeRequest) (JudgeResponse, error)
}
Judge evaluates metric prompts and must return strictly-structured JSON.
type JudgeRequest ¶
type JudgeRequest struct {
Metric string
Question string
Answer string
ContextDocs []Document
Instructions string
Schema json.RawMessage
}
JudgeRequest describes a metric evaluation request sent to a Judge.
type JudgeResponse ¶
JudgeResponse contains the raw structured output from a Judge.
type Metric ¶
type Metric interface {
Name() string
Evaluate(ctx context.Context, in EvalInput, j Judge) (MetricResult, error)
}
Metric evaluates a completed case and returns a score plus pass/fail result.
func AnswerRelevancy ¶
func AnswerRelevancy(opts ...MetricOption) Metric
AnswerRelevancy returns a metric that scores whether the answer addresses the input.
func Faithfulness ¶
func Faithfulness(opts ...MetricOption) Metric
Faithfulness returns a metric that scores whether the answer is supported by context.
type MetricOption ¶
type MetricOption func(*metricConfig)
MetricOption configures a built-in metric.
func WithThreshold ¶
func WithThreshold(v float64) MetricOption
WithThreshold sets pass/fail threshold in [0,1].
type MetricResult ¶
MetricResult represents a score and pass/fail outcome for one metric.
type Option ¶
type Option func(*config) error
Option configures a Runner or Suite.
func WithCaseTimeout ¶
WithCaseTimeout applies a per-case timeout to run and metric evaluation.
func WithMetricDetailsLimit ¶
WithMetricDetailsLimit caps stored metric detail bytes per metric result. Use 0 to disable details entirely.
func WithParallelism ¶
WithParallelism configures the maximum number of concurrent case executions.
func WithReporter ¶
WithReporter overrides the default testing reporter.
type Output ¶
type Output struct {
Answer string
}
Output is the target system answer under evaluation.
type RetryConfig ¶
type RetryConfig struct {
// MaxAttempts is the total number of attempts, including the first request.
// Zero uses the package default.
MaxAttempts int
// BaseDelay is the fallback delay before the second attempt.
// Zero uses the package default.
BaseDelay time.Duration
// MaxDelay caps exponential backoff and Retry-After delays.
// Zero uses the package default.
MaxDelay time.Duration
}
RetryConfig controls retries for transient provider HTTP failures.
func DefaultRetryConfig ¶
func DefaultRetryConfig() RetryConfig
DefaultRetryConfig returns the retry defaults used by bundled providers.
func (RetryConfig) Validate ¶
func (cfg RetryConfig) Validate() error
Validate reports whether cfg is internally consistent.
type RunResult ¶
type RunResult struct {
Cases []CaseResult
}
RunResult is the deterministic output of a suite execution.
type Runner ¶
type Runner struct {
// contains filtered or unexported fields
}
Runner executes registered cases and returns structured results without depending on the testing package.
type Suite ¶
type Suite struct {
// contains filtered or unexported fields
}
Suite is a testing wrapper around Runner.
func (*Suite) Case ¶
func (s *Suite) Case(name string, opts ...CaseOption)
Case registers one evaluation case and fails the test if it is invalid.