gaugo

Version: v1.0.0 (latest) | Published: May 11, 2026 | License: MIT | Imports: 14 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/nnull13/gaugo

README ¶

Gaugo

Go-native evaluations for AI applications, runnable through go test.

Gaugo lets Go teams evaluate RAG systems, agents, chatbots, and other AI-backed services with deterministic test cases, optional LLM judges, concurrent execution, and structured results that fit naturally into CI.

go get github.com/nnull13/gaugo

Why Gaugo

Capability	What it gives you
Native Go tests	Write AI evaluations as normal `testing` tests.
Deterministic reporting	Run cases concurrently while preserving registration order.
No-LLM checks	Catch required behavior with cheap `ExpectedContains` assertions.
LLM-judged metrics	Use structured-output judges for faithfulness and answer relevancy.
Programmatic runs	Use `Runner` to feed dashboards, CLIs, and internal pipelines.
Provider adapters	Start with OpenAI, Anthropic, Gemini, xAI, or a local model service.
Extension points	Bring your own judge, metric, or reporter.

Quickstart (No Provider Needed)

This is the smallest end-to-end evaluation you can run with go test.

package rag_test

import (
	"context"
	"testing"

	"github.com/nnull13/gaugo"
)

func TestEnterprisePricing(t *testing.T) {
	suite := gaugo.New(t)

	suite.Case("answer mentions sales",
		gaugo.Question("What is enterprise pricing?"),
		gaugo.ContextDocs(
			gaugo.Document{
				ID:   "pricing.md",
				Text: "Enterprise plans are custom and sold via sales.",
			},
		),
		gaugo.ExpectedContains("sales"),
	)

	suite.Assert(context.Background(), func(ctx context.Context, in gaugo.Input) (gaugo.Output, error) {
		return gaugo.Output{Answer: "Contact sales for enterprise pricing."}, nil
	})
}

Run it like any other Go test:

go test ./...

Add an LLM Judge

package rag_test

import (
	"context"
	"os"
	"testing"
	"time"

	"github.com/nnull13/gaugo"
	"github.com/nnull13/gaugo/provider/openai"
)

func TestRAGQuality(t *testing.T) {
	apiKey := os.Getenv("OPENAI_API_KEY")
	if apiKey == "" {
		t.Skip("OPENAI_API_KEY is not set")
	}

	judge, err := openai.New(openai.Config{
		APIKey: apiKey,
		Model:  "gpt-4.1-mini",
	})
	if err != nil {
		t.Fatal(err)
	}

	suite := gaugo.New(t,
		gaugo.WithJudge(judge),
		gaugo.WithParallelism(8),
		gaugo.WithCaseTimeout(15*time.Second),
	)

	suite.Case("pricing answer",
		gaugo.Question("What is enterprise pricing?"),
		gaugo.ContextDocs(gaugo.Document{
			ID:   "pricing.md",
			Text: "Enterprise pricing is custom and handled by sales.",
		}),
		gaugo.ExpectedContains("sales"),
	)

	suite.Assert(context.Background(), yourGaugoEvaluation,
		gaugo.Faithfulness(gaugo.WithThreshold(0.8)),
		gaugo.AnswerRelevancy(gaugo.WithThreshold(0.7)),
	)
}

By default, hosted providers validate URLs in strict mode: the URL must use https and point at the provider's official host. Set AllowUnsafeURL: true only for trusted local stubs or custom gateways.

yourGaugoEvaluation is your adapter, a RunFunc that forwards each case to your application (myApp below stands in for your own client):

func yourGaugoEvaluation(ctx context.Context, in gaugo.Input) (gaugo.Output, error) {
	answer, err := myApp.Answer(ctx, in.Question, in.Context)
	if err != nil {
		return gaugo.Output{}, err
	}
	return gaugo.Output{Answer: answer}, nil
}

Docs & Next Steps

I want to...	Go to
Browse all docs from one place	Documentation index
Write my first evaluation	Getting started
Understand the evaluation model	Concepts
Use Gaugo inside `go test`	Testing with Suite
Run evaluations from a CLI or pipeline	Programmatic Runner
Configure metrics and thresholds	Metrics reference
Choose and configure an LLM provider	Provider index (OpenAI, Anthropic, Gemini, xAI, Local)
Add a custom judge, metric, or reporter	Extending Gaugo
Debug a failure	Troubleshooting

License

See LICENSE.

If Gaugo helps your team ship safer AI, consider giving it a star.

Crafted by NoName13.

Documentation ¶

Overview ¶

Package gaugo provides an idiomatic Go testing harness for AI application evaluation.

The package is designed to integrate directly with Go's testing workflow:

go test ./...

Gaugo focuses on deterministic, concurrent, CI-friendly evaluations without external orchestration systems.

Index ¶

func Assert(t testing.TB, result RunResult)
type Case
type CaseOption
- func ContextDocs(docs ...Document) CaseOption
- func ExpectedContains(substr string) CaseOption
- func Question(question string) CaseOption
type CaseResult
type Document
type EvalInput
type Expected
type Input
type Judge
type JudgeRequest
type JudgeResponse
type Metric
- func AnswerRelevancy(opts ...MetricOption) Metric
- func Faithfulness(opts ...MetricOption) Metric
type MetricOption
- func WithThreshold(v float64) MetricOption
type MetricResult
type Option
- func WithCaseTimeout(d time.Duration) Option
- func WithJudge(j Judge) Option
- func WithMetricDetailsLimit(bytes int) Option
- func WithParallelism(n int) Option
- func WithReporter(r Reporter) Option
type Output
type Reporter
type RetryConfig
- func DefaultRetryConfig() RetryConfig
- func (cfg RetryConfig) Validate() error
type RunFunc
type RunResult
type Runner
- func NewRunner(opts ...Option) (*Runner, error)
- func (r *Runner) Case(name string, opts ...CaseOption) error
- func (r *Runner) Run(ctx context.Context, run RunFunc, metrics ...Metric) (RunResult, error)
type Suite
- func New(t testing.TB, opts ...Option) *Suite
- func (s *Suite) Assert(ctx context.Context, run RunFunc, metrics ...Metric)
- func (s *Suite) Case(name string, opts ...CaseOption)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func Assert ¶

func Assert(t testing.TB, result RunResult)

Assert reports result failures through the Go testing package.

Types ¶

type Case ¶

type Case struct {
	Name     string
	Input    Input
	Expected Expected
}

Case defines one evaluation scenario.

type CaseOption ¶

type CaseOption func(*Case)

CaseOption mutates one Case definition.

func ContextDocs ¶

func ContextDocs(docs ...Document) CaseOption

ContextDocs sets retrieved context documents for a case.

func ExpectedContains ¶

func ExpectedContains(substr string) CaseOption

ExpectedContains requires the output answer to contain the given substring.
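The check behind ExpectedContains can be pictured as a plain substring test over the answer. The sketch below is an illustration only, not Gaugo's source: missingSubstrings is a hypothetical helper, and it assumes case-sensitive matching in the style of strings.Contains, which the documentation above does not confirm.

```go
package main

import (
	"fmt"
	"strings"
)

// missingSubstrings returns every wanted substring that answer lacks.
// Assumption: matching is case-sensitive, as with strings.Contains.
func missingSubstrings(answer string, want []string) []string {
	var missing []string
	for _, w := range want {
		if !strings.Contains(answer, w) {
			missing = append(missing, w)
		}
	}
	return missing
}

func main() {
	answer := "Contact sales for enterprise pricing."
	fmt.Println(missingSubstrings(answer, []string{"sales"}))          // prints []
	fmt.Println(missingSubstrings(answer, []string{"sales", "Sales"})) // prints [Sales]
}
```

If matching is case-sensitive as assumed here, pick substrings whose casing your system produces reliably.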

func Question ¶

func Question(question string) CaseOption

Question sets the user question for a case.

type CaseResult ¶

type CaseResult struct {
	Name     string
	Metrics  []MetricResult
	RunError error
	Elapsed  time.Duration
}

CaseResult contains execution and metric results for a single case.

type Document ¶

type Document struct {
	ID   string
	Text string
}

Document is one retrieved context record for a case.

type EvalInput ¶

type EvalInput struct {
	CaseName string
	Input    Input
	Output   Output
	Expected Expected
}

EvalInput is provided to metrics after a case run completes.

type Expected ¶

type Expected struct {
	Contains []string
}

Expected holds simple non-LLM assertions for a case.

type Input ¶

type Input struct {
	Question string
	Context  []Document
}

Input is the test input passed to a target system under evaluation.

type Judge ¶

type Judge interface {
	EvaluateJSON(ctx context.Context, req JudgeRequest) (JudgeResponse, error)
}

Judge evaluates metric prompts and must return strictly structured JSON.

type JudgeRequest ¶

type JudgeRequest struct {
	Metric       string
	Question     string
	Answer       string
	ContextDocs  []Document
	Instructions string
	Schema       json.RawMessage
}

JudgeRequest describes a metric evaluation request sent to a Judge.

type JudgeResponse ¶

type JudgeResponse struct {
	RawJSON  []byte
	Provider string
	Model    string
	Latency  time.Duration
}

JudgeResponse contains the raw structured output from a Judge.

type Metric ¶

type Metric interface {
	Name() string
	Evaluate(ctx context.Context, in EvalInput, j Judge) (MetricResult, error)
}

Metric evaluates a completed case and returns a score plus pass/fail result.

func AnswerRelevancy ¶

func AnswerRelevancy(opts ...MetricOption) Metric

AnswerRelevancy returns a metric that scores whether the answer addresses the input.

func Faithfulness ¶

func Faithfulness(opts ...MetricOption) Metric

Faithfulness returns a metric that scores whether the answer is supported by context.

type MetricOption ¶

type MetricOption func(*metricConfig)

MetricOption configures a built-in metric.

func WithThreshold ¶

func WithThreshold(v float64) MetricOption

WithThreshold sets the pass/fail threshold, a value in [0,1].
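MetricOption follows Go's functional options pattern. The sketch below shows how such an option might be applied and how a threshold could turn a score into a pass/fail result. It is not Gaugo's source: newMetricConfig, passes, the 0.5 placeholder default, and the inclusive comparison (score >= threshold) are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// metricConfig mirrors the unexported type named in the MetricOption signature.
// Its single field is an assumption.
type metricConfig struct{ threshold float64 }

type MetricOption func(*metricConfig)

// WithThreshold records the desired pass/fail threshold.
func WithThreshold(v float64) MetricOption {
	return func(c *metricConfig) { c.threshold = v }
}

// newMetricConfig applies options over a placeholder default and validates the range.
func newMetricConfig(opts ...MetricOption) (metricConfig, error) {
	cfg := metricConfig{threshold: 0.5} // placeholder default, not Gaugo's
	for _, opt := range opts {
		opt(&cfg)
	}
	// The negated form also rejects NaN.
	if !(cfg.threshold >= 0 && cfg.threshold <= 1) {
		return metricConfig{}, errors.New("threshold must be in [0,1]")
	}
	return cfg, nil
}

// passes assumes an inclusive comparison: a score equal to the threshold passes.
func passes(score float64, cfg metricConfig) bool {
	return score >= cfg.threshold
}

func main() {
	cfg, err := newMetricConfig(WithThreshold(0.8))
	fmt.Println(cfg.threshold, err)
	fmt.Println(passes(0.85, cfg), passes(0.75, cfg)) // prints true false
}
```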

type MetricResult ¶

type MetricResult struct {
	Name    string
	Score   float64
	Pass    bool
	Reason  string
	Details []byte
}

MetricResult represents a score and pass/fail outcome for one metric.

type Option ¶

type Option func(*config) error

Option configures a Runner or Suite.

func WithCaseTimeout ¶

func WithCaseTimeout(d time.Duration) Option

WithCaseTimeout applies a per-case timeout that covers both the run and metric evaluation.

func WithJudge ¶

func WithJudge(j Judge) Option

WithJudge configures the LLM judge used by metrics.

func WithMetricDetailsLimit ¶

func WithMetricDetailsLimit(bytes int) Option

WithMetricDetailsLimit caps stored metric detail bytes per metric result. Use 0 to disable details entirely.

func WithParallelism ¶

func WithParallelism(n int) Option

WithParallelism configures the maximum number of concurrent case executions.

func WithReporter ¶

func WithReporter(r Reporter) Option

WithReporter overrides the default testing reporter.

type Output ¶

type Output struct {
	Answer string
}

Output is the target system answer under evaluation.

type Reporter ¶

type Reporter interface {
	Report(ctx context.Context, result RunResult)
}

Reporter receives completed suite results without depending on testing.T.

type RetryConfig ¶

type RetryConfig struct {
	// MaxAttempts is the total number of attempts, including the first request.
	// Zero uses the package default.
	MaxAttempts int
	// BaseDelay is the fallback delay before the second attempt.
	// Zero uses the package default.
	BaseDelay time.Duration
	// MaxDelay caps exponential backoff and Retry-After delays.
	// Zero uses the package default.
	MaxDelay time.Duration
}

RetryConfig controls retries for transient provider HTTP failures.

func DefaultRetryConfig ¶

func DefaultRetryConfig() RetryConfig

DefaultRetryConfig returns the retry defaults used by bundled providers.

func (RetryConfig) Validate ¶

func (cfg RetryConfig) Validate() error

Validate reports whether cfg is internally consistent.

type RunFunc ¶

type RunFunc func(ctx context.Context, in Input) (Output, error)

RunFunc executes the system under test for one case.

type RunResult ¶

type RunResult struct {
	Cases []CaseResult
}

RunResult is the deterministic output of a suite execution.

type Runner ¶

type Runner struct {
	// contains filtered or unexported fields
}

Runner executes registered cases and returns structured results without depending on the testing package.

func NewRunner ¶

func NewRunner(opts ...Option) (*Runner, error)

NewRunner creates a programmatic evaluation runner.

func (*Runner) Case ¶

func (r *Runner) Case(name string, opts ...CaseOption) error

Case registers one evaluation case.

func (*Runner) Run ¶

func (r *Runner) Run(ctx context.Context, run RunFunc, metrics ...Metric) (RunResult, error)

Run executes all registered cases.

type Suite ¶

type Suite struct {
	// contains filtered or unexported fields
}

Suite is a testing wrapper around Runner.

func New ¶

func New(t testing.TB, opts ...Option) *Suite

New creates a testing Suite. Configuration errors fail the test immediately.

func (*Suite) Assert ¶

func (s *Suite) Assert(ctx context.Context, run RunFunc, metrics ...Metric)

Assert executes all registered cases and reports failures through testing.

func (*Suite) Case ¶

func (s *Suite) Case(name string, opts ...CaseOption)

Case registers one evaluation case and fails the test if it is invalid.

Directories ¶

Path	Synopsis
internal
internal/jsonx
internal/metrics/answerrelevancy
internal/metrics/faithfulness
internal/prompt
internal/provider
internal/provider/request
internal/provider/validate
internal/provider/wire
internal/provider/wire/anthropic/messages
internal/provider/wire/gemini/generatecontent
internal/provider/wire/ollama/nativechat
internal/provider/wire/openai/chat
internal/provider/wire/openai/responses
internal/runner
provider
provider/anthropic
provider/gemini
provider/local
provider/openai
provider/xai
