gaugo

package module
v1.0.0 Latest
This package is not in the latest version of its module.
Published: May 11, 2026 | License: MIT | Imports: 14 | Imported by: 0

README

Gaugo

Go-native evaluations for AI applications, runnable through go test.

Gaugo lets Go teams evaluate RAG systems, agents, chatbots, and other AI-backed services with deterministic test cases, optional LLM judges, concurrent execution, and structured results that fit naturally into CI.

go get github.com/nnull13/gaugo

Why Gaugo

Capability               What it gives you
Native Go tests          Write AI evaluations as ordinary Go tests using the testing package.
Deterministic reporting  Run cases concurrently while preserving registration order.
No-LLM checks            Catch required behavior with cheap ExpectedContains assertions.
LLM-judged metrics       Use structured-output judges for faithfulness and answer relevancy.
Programmatic runs        Use Runner to feed dashboards, CLIs, and internal pipelines.
Provider adapters        Start with OpenAI, Anthropic, Gemini, xAI, or a local model service.
Extension points         Bring your own judge, metric, or reporter.

Quickstart (No Provider Needed)

This is the smallest end-to-end evaluation you can run with go test.

package rag_test

import (
	"context"
	"testing"

	"github.com/nnull13/gaugo"
)

func TestEnterprisePricing(t *testing.T) {
	suite := gaugo.New(t)

	suite.Case("answer mentions sales",
		gaugo.Question("What is enterprise pricing?"),
		gaugo.ContextDocs(
			gaugo.Document{
				ID:   "pricing.md",
				Text: "Enterprise plans are custom and sold via sales.",
			},
		),
		gaugo.ExpectedContains("sales"),
	)

	suite.Assert(context.Background(), func(ctx context.Context, in gaugo.Input) (gaugo.Output, error) {
		return gaugo.Output{Answer: "Contact sales for enterprise pricing."}, nil
	})
}

Run it like any other Go test:

go test ./...

Add an LLM judge

package rag_test

import (
	"context"
	"os"
	"testing"
	"time"

	"github.com/nnull13/gaugo"
	"github.com/nnull13/gaugo/provider/openai"
)

func TestRAGQuality(t *testing.T) {
	apiKey := os.Getenv("OPENAI_API_KEY")
	if apiKey == "" {
		t.Skip("OPENAI_API_KEY is not set")
	}

	judge, err := openai.New(openai.Config{
		APIKey: apiKey,
		Model:  "gpt-4.1-mini",
	})
	if err != nil {
		t.Fatal(err)
	}

	suite := gaugo.New(t,
		gaugo.WithJudge(judge),
		gaugo.WithParallelism(8),
		gaugo.WithCaseTimeout(15*time.Second),
	)

	suite.Case("pricing answer",
		gaugo.Question("What is enterprise pricing?"),
		gaugo.ContextDocs(gaugo.Document{
			ID:   "pricing.md",
			Text: "Enterprise pricing is custom and handled by sales.",
		}),
		gaugo.ExpectedContains("sales"),
	)

	suite.Assert(context.Background(), yourGaugoEvaluation,
		gaugo.Faithfulness(gaugo.WithThreshold(0.8)),
		gaugo.AnswerRelevancy(gaugo.WithThreshold(0.7)),
	)
}

By default, hosted providers validate URLs in strict mode: the URL must use https and point at the provider's official host. Set AllowUnsafeURL: true only for trusted local stubs or custom gateways.
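
To make the strict-mode rule concrete, here is a minimal sketch of what an https-plus-allow-list check could look like. It is not Gaugo's implementation: validateBaseURL, the allowedHosts map, and the allowUnsafe parameter are hypothetical names chosen for illustration, and the allow-list holds a single example host.

```go
package main

import (
	"fmt"
	"net/url"
)

// allowedHosts is a hypothetical allow-list; the hosts Gaugo actually
// accepts are not listed in this document.
var allowedHosts = map[string]bool{
	"api.openai.com": true,
}

// validateBaseURL rejects non-https URLs and hosts outside the allow-list.
// allowUnsafe skips both checks, mirroring the idea behind AllowUnsafeURL.
func validateBaseURL(raw string, allowUnsafe bool) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse url: %w", err)
	}
	if allowUnsafe {
		return nil
	}
	if u.Scheme != "https" {
		return fmt.Errorf("scheme %q is not https", u.Scheme)
	}
	if !allowedHosts[u.Hostname()] {
		return fmt.Errorf("host %q is not an allowed provider host", u.Hostname())
	}
	return nil
}

func main() {
	fmt.Println(validateBaseURL("https://api.openai.com/v1", false))
	fmt.Println(validateBaseURL("http://localhost:8080/v1", false))
	fmt.Println(validateBaseURL("http://localhost:8080/v1", true))
}
```

The point of the override is that a local stub on http://localhost fails the strict check, so opting out has to be an explicit decision.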

yourGaugoEvaluation is your adapter, a function with the RunFunc signature that calls your application:

func yourGaugoEvaluation(ctx context.Context, in gaugo.Input) (gaugo.Output, error) {
	// myApp.Answer is a placeholder for your application's own entry point.
	answer, err := myApp.Answer(ctx, in.Question, in.Context)
	if err != nil {
		return gaugo.Output{}, err
	}
	return gaugo.Output{Answer: answer}, nil
}

Docs & Next Steps

I want to...                             Go to
Browse all docs from one place           Documentation index
Write my first evaluation                Getting started
Understand the evaluation model          Concepts
Use Gaugo inside go test                 Testing with Suite
Run evaluations from a CLI or pipeline   Programmatic Runner
Configure metrics and thresholds         Metrics reference
Choose and configure an LLM provider     Provider index (OpenAI, Anthropic, Gemini, xAI, Local)
Add a custom judge, metric, or reporter  Extending Gaugo
Debug a failure                          Troubleshooting

License

See LICENSE.

If Gaugo helps your team ship safer AI, consider giving it a star.

Crafted by NoName13.

Documentation

Overview

Package gaugo provides an idiomatic Go testing harness for AI application evaluation.

The package is designed to integrate directly with Go's testing workflow:

go test ./...

Gaugo focuses on deterministic, concurrent, CI-friendly evaluations without external orchestration systems.

Constants

This section is empty.

Variables

This section is empty.

Functions

func Assert

func Assert(t testing.TB, result RunResult)

Assert reports result failures through the Go testing package.

Types

type Case

type Case struct {
	Name     string
	Input    Input
	Expected Expected
}

Case defines one evaluation scenario.

type CaseOption

type CaseOption func(*Case)

CaseOption mutates one Case definition.

func ContextDocs

func ContextDocs(docs ...Document) CaseOption

ContextDocs sets retrieved context documents for a case.

func ExpectedContains

func ExpectedContains(substr string) CaseOption

ExpectedContains requires the output answer to contain the given substring.

func Question

func Question(question string) CaseOption

Question sets the user question for a case.

type CaseResult

type CaseResult struct {
	Name     string
	Metrics  []MetricResult
	RunError error
	Elapsed  time.Duration
}

CaseResult contains execution and metric results for a single case.

type Document

type Document struct {
	ID   string
	Text string
}

Document is one retrieved context record for a case.

type EvalInput

type EvalInput struct {
	CaseName string
	Input    Input
	Output   Output
	Expected Expected
}

EvalInput is provided to metrics after a case run completes.

type Expected

type Expected struct {
	Contains []string
}

Expected holds simple non-LLM assertions for a case.
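
As an illustration of how a Contains check could be evaluated without an LLM, the sketch below reports which expected substrings are missing from an answer. Expected is a local copy of the documented struct so the file compiles on its own. missingSubstrings is a hypothetical helper, and case-sensitive matching is an assumption because the documentation does not state it.

```go
package main

import (
	"fmt"
	"strings"
)

// Expected is a local copy of the documented struct.
type Expected struct {
	Contains []string
}

// missingSubstrings returns every expected substring that does not occur
// in answer. Matching is case-sensitive here, which is an assumption.
func missingSubstrings(answer string, exp Expected) []string {
	var missing []string
	for _, s := range exp.Contains {
		if !strings.Contains(answer, s) {
			missing = append(missing, s)
		}
	}
	return missing
}

func main() {
	exp := Expected{Contains: []string{"sales", "discount"}}
	fmt.Println(missingSubstrings("Contact sales for enterprise pricing.", exp)) // [discount]
}
```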

type Input

type Input struct {
	Question string
	Context  []Document
}

Input is the test input passed to a target system under evaluation.

type Judge

type Judge interface {
	EvaluateJSON(ctx context.Context, req JudgeRequest) (JudgeResponse, error)
}

Judge evaluates metric prompts and must return strictly structured JSON.

type JudgeRequest

type JudgeRequest struct {
	Metric       string
	Question     string
	Answer       string
	ContextDocs  []Document
	Instructions string
	Schema       json.RawMessage
}

JudgeRequest describes a metric evaluation request sent to a Judge.

type JudgeResponse

type JudgeResponse struct {
	RawJSON  []byte
	Provider string
	Model    string
	Latency  time.Duration
}

JudgeResponse contains the raw structured output from a Judge.
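
Because RawJSON is raw bytes, a custom metric has to decode it itself. The sketch below shows one strict way to do that with encoding/json. The verdict struct with score and reason fields is a hypothetical payload, and the [0,1] score range is assumed from the threshold range documented for WithThreshold.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// verdict is a hypothetical payload shape; the real shape is whatever
// JudgeRequest.Schema asks the judge to produce.
type verdict struct {
	Score  float64 `json:"score"`
	Reason string  `json:"reason"`
}

// decodeVerdict parses raw judge output strictly: unknown fields and
// out-of-range scores are treated as errors.
func decodeVerdict(raw []byte) (verdict, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	var v verdict
	if err := dec.Decode(&v); err != nil {
		return verdict{}, fmt.Errorf("decode judge output: %w", err)
	}
	if v.Score < 0 || v.Score > 1 {
		return verdict{}, fmt.Errorf("score %v outside [0,1]", v.Score)
	}
	return v, nil
}

func main() {
	v, err := decodeVerdict([]byte(`{"score":0.9,"reason":"supported by pricing.md"}`))
	fmt.Println(v.Score, v.Reason, err)

	_, err = decodeVerdict([]byte(`{"score":0.9,"confidence":1}`))
	fmt.Println(err != nil) // true: "confidence" is not part of the assumed shape
}
```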

type Metric

type Metric interface {
	Name() string
	Evaluate(ctx context.Context, in EvalInput, j Judge) (MetricResult, error)
}

Metric evaluates a completed case and returns a score plus pass/fail result.

func AnswerRelevancy

func AnswerRelevancy(opts ...MetricOption) Metric

AnswerRelevancy returns a metric that scores whether the answer addresses the input.

func Faithfulness

func Faithfulness(opts ...MetricOption) Metric

Faithfulness returns a metric that scores whether the answer is supported by context.

type MetricOption

type MetricOption func(*metricConfig)

MetricOption configures a built-in metric.

func WithThreshold

func WithThreshold(v float64) MetricOption

WithThreshold sets the pass/fail threshold, a value in [0,1].
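
The option follows Go's functional-options pattern. The sketch below shows how such an option could feed a pass/fail decision. It uses lowercase local names (metricConfig, metricOption, withThreshold, passes) to make clear it is not the package's code. The 0.5 default and the >= comparison are assumptions, since the documentation specifies neither.

```go
package main

import "fmt"

// metricConfig and metricOption are local stand-ins for the unexported
// config and the documented MetricOption type.
type metricConfig struct {
	threshold float64
}

type metricOption func(*metricConfig)

// withThreshold mirrors the documented WithThreshold signature.
func withThreshold(v float64) metricOption {
	return func(c *metricConfig) { c.threshold = v }
}

// passes applies the options over a hypothetical default of 0.5 and
// compares the score with >=; both choices are assumptions.
func passes(score float64, opts ...metricOption) bool {
	cfg := metricConfig{threshold: 0.5}
	for _, opt := range opts {
		opt(&cfg)
	}
	return score >= cfg.threshold
}

func main() {
	fmt.Println(passes(0.82, withThreshold(0.8))) // true
	fmt.Println(passes(0.75, withThreshold(0.8))) // false
}
```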

type MetricResult

type MetricResult struct {
	Name    string
	Score   float64
	Pass    bool
	Reason  string
	Details []byte
}

MetricResult represents a score and pass/fail outcome for one metric.

type Option

type Option func(*config) error

Option configures a Runner or Suite.

func WithCaseTimeout

func WithCaseTimeout(d time.Duration) Option

WithCaseTimeout applies a per-case timeout to the run and to metric evaluation.

func WithJudge

func WithJudge(j Judge) Option

WithJudge configures the LLM judge used by metrics.

func WithMetricDetailsLimit

func WithMetricDetailsLimit(bytes int) Option

WithMetricDetailsLimit caps stored metric detail bytes per metric result. Use 0 to disable details entirely.

func WithParallelism

func WithParallelism(n int) Option

WithParallelism configures the maximum number of concurrent case executions.

func WithReporter

func WithReporter(r Reporter) Option

WithReporter overrides the default testing reporter.

type Output

type Output struct {
	Answer string
}

Output is the target system answer under evaluation.

type Reporter

type Reporter interface {
	Report(ctx context.Context, result RunResult)
}

Reporter receives completed suite results without depending on testing.T.

type RetryConfig

type RetryConfig struct {
	// MaxAttempts is the total number of attempts, including the first request.
	// Zero uses the package default.
	MaxAttempts int
	// BaseDelay is the fallback delay before the second attempt.
	// Zero uses the package default.
	BaseDelay time.Duration
	// MaxDelay caps exponential backoff and Retry-After delays.
	// Zero uses the package default.
	MaxDelay time.Duration
}

RetryConfig controls retries for transient provider HTTP failures.

func DefaultRetryConfig

func DefaultRetryConfig() RetryConfig

DefaultRetryConfig returns the retry defaults used by bundled providers.

func (RetryConfig) Validate

func (cfg RetryConfig) Validate() error

Validate reports whether cfg is internally consistent.

type RunFunc

type RunFunc func(ctx context.Context, in Input) (Output, error)

RunFunc executes the system under test for one case.

type RunResult

type RunResult struct {
	Cases []CaseResult
}

RunResult is the deterministic output of a suite execution.

type Runner

type Runner struct {
	// contains filtered or unexported fields
}

Runner executes registered cases and returns structured results without depending on the testing package.

func NewRunner

func NewRunner(opts ...Option) (*Runner, error)

NewRunner creates a programmatic evaluation runner.

func (*Runner) Case

func (r *Runner) Case(name string, opts ...CaseOption) error

Case registers one evaluation case.

func (*Runner) Run

func (r *Runner) Run(ctx context.Context, run RunFunc, metrics ...Metric) (RunResult, error)

Run executes all registered cases.

type Suite

type Suite struct {
	// contains filtered or unexported fields
}

Suite is a testing wrapper around Runner.

func New

func New(t testing.TB, opts ...Option) *Suite

New creates a testing Suite. Configuration errors fail the test immediately.

func (*Suite) Assert

func (s *Suite) Assert(ctx context.Context, run RunFunc, metrics ...Metric)

Assert executes all registered cases and reports failures through testing.

func (*Suite) Case

func (s *Suite) Case(name string, opts ...CaseOption)

Case registers one evaluation case and fails the test if it is invalid.
