package report
v0.5.0 Latest
Warning: This package is not in the latest version of its module.

Published: Apr 18, 2026 License: Apache-2.0 Imports: 10 Imported by: 0

Documentation

Overview

Package report provides report generation functionality for evaluation results.

The report package offers report generators that walk NamespacedObserver trees of ResultCollector data and produce markdown-formatted evaluation reports.

Generator Types

All generators implement the Generator function type:

type Generator func(obs *evals.NamespacedObserver[*evals.ResultCollector], threshold float64) (string, bool)

Available generators:

  • Simple: Hierarchical report following namespace structure, showing pass rates, grades, and failures
  • ByEval: Report organized by evaluation type, then by model, then by test case (requires /{model}/{test case}/{eval} path structure)
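Because both generators share one function type, a program can select one at run time, for example from a command-line flag. The sketch below shows that pattern with hypothetical stand-ins so it runs on its own: `tree` replaces the observer type, and `simple` and `byEval` are toy functions that only imitate the real generators' signature.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tree is a hypothetical stand-in for *evals.NamespacedObserver[*evals.ResultCollector]:
// it maps each result path to a pass rate.
type tree map[string]float64

// generator has the same shape as report.Generator, but over the stand-in type.
type generator func(t tree, threshold float64) (string, bool)

// belowAny reports whether any pass rate falls under the threshold.
func belowAny(t tree, threshold float64) bool {
	for _, rate := range t {
		if rate < threshold {
			return true
		}
	}
	return false
}

// simple and byEval are toy generators that only imitate the real signatures.
func simple(t tree, threshold float64) (string, bool) {
	return fmt.Sprintf("simple: %d path(s)", len(t)), belowAny(t, threshold)
}

func byEval(t tree, threshold float64) (string, bool) {
	return fmt.Sprintf("by-eval: %d path(s)", len(t)), belowAny(t, threshold)
}

// generators maps a user-facing name (e.g. a flag value) to a generator.
var generators = map[string]generator{
	"simple": simple,
	"byeval": byEval,
}

// pick looks up a generator case-insensitively and lists valid names on failure.
func pick(name string) (generator, error) {
	if g, ok := generators[strings.ToLower(name)]; ok {
		return g, nil
	}
	names := make([]string, 0, len(generators))
	for n := range generators {
		names = append(names, n)
	}
	sort.Strings(names)
	return nil, fmt.Errorf("unknown report %q (want one of: %s)", name, strings.Join(names, ", "))
}

func main() {
	gen, err := pick("ByEval")
	if err != nil {
		fmt.Println(err)
		return
	}
	out, failed := gen(tree{"/claude/security-check/no-vulnerabilities": 0.5}, 0.8)
	fmt.Println(out, failed)
}
```

With the real package, the map values would be report.Simple and report.ByEval themselves, since both have the report.Generator signature (ExampleGenerator assigns report.Simple to a report.Generator variable).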

Usage

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// Create some evaluation data. customObserver is a placeholder for a function
// returning your own evals.Observer implementation (see exampleObserver below).
obs := evals.NewNamespacedObserver(func(name string) *evals.ResultCollector {
	return evals.NewResultCollector(customObserver(name))
})

// Generate a simple hierarchical report
reportStr, hasFailures := report.Simple(obs, 0.8)
if hasFailures {
	fmt.Printf("Report:\n%s", reportStr)
}

// Generate a report organized by evaluation type
reportStr, hasFailures = report.ByEval(obs, 0.8)
if hasFailures {
	fmt.Printf("Report:\n%s", reportStr)
}

Report Format

Reports are generated as markdown text containing:

  • A hierarchy that follows namespace depth
  • Pass rates and average grades
  • Failure message lists
  • Grades that fall below the threshold, with their reasoning
  • A summary table of evaluations by model (ByEval only)
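The summary attached to each node in the example outputs follows a recognizable pattern: a pass rate when failures were recorded (or when there are no grades), an average grade when grades were recorded, a ❌ marker when either is below the threshold, and a count. The sketch below reproduces that pattern for the cases shown in the examples. It is inferred from those outputs, not taken from the package source; `nodeStats` and `label` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeStats is a hypothetical stand-in for the data one namespace's
// ResultCollector holds: failure messages, grade scores, and a result count.
type nodeStats struct {
	failures []string
	grades   []float64
	total    int
}

// label approximates the summary shown next to each node in the example
// outputs, e.g. "[❌ 50.0% pass, 0.70 avg] (1/2)". Inferred, not the real code.
func label(s nodeStats, threshold float64) string {
	passed := s.total - len(s.failures)
	rate := 1.0
	if s.total > 0 {
		rate = float64(passed) / float64(s.total)
	}

	var metrics []string
	below := false

	// A pass rate is shown when there are failures or no grades at all.
	showRate := len(s.failures) > 0 || len(s.grades) == 0
	if showRate {
		m := fmt.Sprintf("%.1f%%", rate*100)
		if len(s.grades) > 0 {
			m += " pass"
		}
		metrics = append(metrics, m)
		below = below || rate < threshold
	}

	// An average grade is shown whenever grades were recorded.
	if len(s.grades) > 0 {
		sum := 0.0
		for _, g := range s.grades {
			sum += g
		}
		avg := sum / float64(len(s.grades))
		metrics = append(metrics, fmt.Sprintf("%.2f avg", avg))
		below = below || avg < threshold
	}

	body := strings.Join(metrics, ", ")
	if below {
		body = "❌ " + body
	}

	// Count as passed/total, or as "N result(s)" for grade-only nodes.
	count := fmt.Sprintf("(%d/%d)", passed, s.total)
	if !showRate {
		noun := "results"
		if len(s.grades) == 1 {
			noun = "result"
		}
		count = fmt.Sprintf("(%d %s)", len(s.grades), noun)
	}
	return fmt.Sprintf("[%s] %s", body, count)
}

func main() {
	fmt.Println(label(nodeStats{failures: []string{"Buffer overflow detected"}, grades: []float64{0.7}, total: 2}, 0.8))
	fmt.Println(label(nodeStats{grades: []float64{0.85}, total: 1}, 0.8))
	fmt.Println(label(nodeStats{total: 1}, 0.8))
}
```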

Thread Safety

All generators are safe for concurrent use as they are pure functions that do not modify their input parameters. Multiple goroutines can safely call any generator function simultaneously with the same or different observers.
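The safety argument rests on the generators only reading their input. The sketch below illustrates the resulting pattern with a hypothetical pure function, `summarize`, over a read-only map: several goroutines call it at once, and each writes only its own slot of the result slice. As with any shared data in Go, this assumes nothing writes to the input while the calls are running.

```go
package main

import (
	"fmt"
	"sync"
)

// results is a hypothetical, read-only stand-in for an observer tree.
type results map[string]float64

// summarize is a pure function: it only reads its inputs, so concurrent
// calls on the same map are safe as long as nothing writes to the map.
func summarize(r results, threshold float64) (string, bool) {
	below := 0
	for _, v := range r {
		if v < threshold {
			below++
		}
	}
	return fmt.Sprintf("%d of %d below %.2f", below, len(r), threshold), below > 0
}

// runConcurrently calls summarize from n goroutines and collects the reports.
func runConcurrently(r results, threshold float64, n int) []string {
	out := make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out[i], _ = summarize(r, threshold) // each goroutine owns out[i]
		}(i)
	}
	wg.Wait()
	return out
}

func main() {
	r := results{"/claude/a/x": 0.5, "/gemini/a/x": 1.0}
	for _, s := range runConcurrently(r, 0.8, 4) {
		fmt.Println(s)
	}
}
```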


Constants

This section is empty.

Variables

This section is empty.

Functions

func ByEval

func ByEval(obs *evals.NamespacedObserver[*evals.ResultCollector], threshold float64) (string, bool)

ByEval generates a report organized by evaluation, then by model, then by test case. It assumes that every path holding results follows the pattern /{model}/{test case}/{eval}. It returns the report string and a boolean indicating whether any evaluation fell below the threshold.
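The regrouping ByEval describes can be pictured as a pivot over result paths. This is a hypothetical sketch, not the package's implementation: `pivot` splits each path into its model, test case, and eval segments and nests them by eval, then model, then test case, skipping paths of any other depth; `render` prints the grouping with sorted keys.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// pivot regroups paths of the form /{model}/{test case}/{eval} into
// eval -> model -> test cases. Paths with a different depth are skipped.
func pivot(paths []string) map[string]map[string][]string {
	out := map[string]map[string][]string{}
	for _, p := range paths {
		parts := strings.Split(strings.Trim(p, "/"), "/")
		if len(parts) != 3 {
			continue
		}
		model, testCase, eval := parts[0], parts[1], parts[2]
		if out[eval] == nil {
			out[eval] = map[string][]string{}
		}
		out[eval][model] = append(out[eval][model], testCase)
	}
	return out
}

// render prints the grouping with sorted keys so the output is deterministic.
func render(g map[string]map[string][]string) string {
	var b strings.Builder
	for _, eval := range sortedKeys(g) {
		fmt.Fprintln(&b, eval)
		models := g[eval]
		for _, model := range sortedKeys(models) {
			cases := append([]string(nil), models[model]...)
			sort.Strings(cases)
			fmt.Fprintf(&b, "  %s: %s\n", model, strings.Join(cases, ", "))
		}
	}
	return b.String()
}

// sortedKeys returns the keys of m in ascending order.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	fmt.Print(render(pivot([]string{
		"/claude/load-test/performance-check",
		"/claude/stress-test/performance-check",
		"/claude/auth-test/security-check",
		"/gemini/auth-test/security-check",
	})))
}
```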

Example

ExampleByEval demonstrates basic usage of the ByEval report generator.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer
	obs := evals.NewNamespacedObserver(factory)

	// Add evaluation data following /{model}/{test case}/{eval} pattern
	// Model: claude, Test case: security-check, Eval: no-vulnerabilities
	evalObs1 := obs.Child("claude").Child("security-check").Child("no-vulnerabilities")
	evalObs1.Fail("Buffer overflow detected")
	evalObs1.Increment()
	evalObs1.Increment()

	// Model: gemini, Test case: security-check, Eval: no-vulnerabilities
	evalObs2 := obs.Child("gemini").Child("security-check").Child("no-vulnerabilities")
	evalObs2.Increment()

	// Generate ByEval report with 80% threshold
	reportStr, hasFailures := report.ByEval(obs, 0.8)

	fmt.Printf("Has failures: %t\n", hasFailures)
	fmt.Printf("Report:\n%s", reportStr)

}
Output:
Has failures: true
Report:
## Summary Table

| Evaluation Metric    | claude        | gemini      | Average  |
|----------------------|---------------|-------------|----------|
| no-vulnerabilities   | ❌ 50.0%      | 100.0%      | ❌ 75.0% |
|    └─ security-check | ❌ 0.50 (50%) | 1.00 (100%) | ❌ 75.0% |

no-vulnerabilities [❌ 66.7%] (2/3)
└ claude [❌ 50.0%] (1/2)
  └ security-check [❌ 50.0%] (1/2)
    └ 1 [FAIL] Buffer overflow detected

Example (MultipleEvaluations)

ExampleByEval_multipleEvaluations demonstrates multiple evaluations organized by eval type.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer
	obs := evals.NewNamespacedObserver(factory)

	// Add data for multiple evaluations
	// Security evaluation - passing
	securityObs := obs.Child("claude").Child("auth-test").Child("security-check")
	securityObs.Increment()

	// Performance evaluation - passing
	perfObs := obs.Child("claude").Child("load-test").Child("performance-check")
	perfObs.Grade(0.85, "Good performance")
	perfObs.Increment()

	// Performance evaluation - failing (below 80% threshold)
	perfFailObs := obs.Child("claude").Child("stress-test").Child("performance-check")
	perfFailObs.Grade(0.65, "Performance issues under load")
	perfFailObs.Increment()

	// Generate report
	reportStr, hasFailures := report.ByEval(obs, 0.8)

	fmt.Printf("Has failures: %t\n", hasFailures)
	fmt.Printf("Report:\n%s", reportStr)

}
Output:
Has failures: true
Report:
## Summary Table

| Evaluation Metric | claude         | Average  |
|-------------------|----------------|----------|
| performance-check | ❌ 2/2 (75.0%) | ❌ 75.0% |
|    └─ stress-test | ❌ 0.65 (65%)  | ❌ 65.0% |
| security-check    | 100.0%         | 100.0%   |

performance-check [❌ 0.75 avg] (2 results)
└ claude [❌ 0.75 avg] (2 results)
  └ stress-test [❌ 0.65 avg] (1 result)
    └ 1 [0.65] Performance issues under load
security-check [100.0%] (1/1)

Example (WithGrades)

ExampleByEval_withGrades demonstrates ByEval with graded evaluations.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer
	obs := evals.NewNamespacedObserver(factory)

	// Add evaluation data with grades
	// Model: claude, Test case: code-quality, Eval: readability-score
	evalObs1 := obs.Child("claude").Child("code-quality").Child("readability-score")
	evalObs1.Grade(0.75, "Some improvements needed")
	evalObs1.Increment()

	// Model: gemini, Test case: code-quality, Eval: readability-score
	evalObs2 := obs.Child("gemini").Child("code-quality").Child("readability-score")
	evalObs2.Grade(0.90, "Very readable code")
	evalObs2.Increment()

	// Generate report with 80% threshold
	reportStr, hasFailures := report.ByEval(obs, 0.8)

	fmt.Printf("Has failures: %t\n", hasFailures)
	fmt.Printf("Report:\n%s", reportStr)

}
Output:
Has failures: true
Report:
## Summary Table

| Evaluation Metric  | claude        | gemini     | Average |
|--------------------|---------------|------------|---------|
| readability-score  | ❌ 75.0%      | 90.0%      | 82.5%   |
|    └─ code-quality | ❌ 0.75 (75%) | 0.90 (90%) | 82.5%   |

readability-score [0.82 avg] (2 results)
└ claude [❌ 0.75 avg] (1 result)
  └ code-quality [❌ 0.75 avg] (1 result)
    └ 1 [0.75] Some improvements needed

func Simple

func Simple(obs *evals.NamespacedObserver[*evals.ResultCollector], threshold float64) (string, bool)

Simple walks a NamespacedObserver tree and generates a tree-based report showing pass rates, average grades, failures, and below-threshold grades. It returns the report string and a boolean indicating whether any evaluation fell below the threshold.
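The tree layout seen in the Simple outputs (roots flush left, children drawn with `├` or `└`, deeper levels indented) can be produced by a short recursive walk. This sketch renders labels that are already formatted; the `entry` type is hypothetical, and the `│ ` continuation prefix under a non-last branch is an assumption, because none of the examples shows a non-last branch with children of its own.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical report node: a formatted label plus children.
type entry struct {
	label    string
	children []entry
}

// render writes top-level entries flush left, then their subtrees.
func render(roots []entry) string {
	var b strings.Builder
	for _, r := range roots {
		b.WriteString(r.label + "\n")
		writeChildren(&b, r.children, "")
	}
	return b.String()
}

// writeChildren draws each child with "├ " or "└ " and indents its subtree
// by "│ " (branch continues) or two spaces (last child).
func writeChildren(b *strings.Builder, children []entry, prefix string) {
	for i, c := range children {
		connector, next := "├ ", "│ "
		if i == len(children)-1 {
			connector, next = "└ ", "  "
		}
		b.WriteString(prefix + connector + c.label + "\n")
		writeChildren(b, c.children, prefix+next)
	}
}

func main() {
	fmt.Print(render([]entry{
		{label: "backend", children: []entry{
			{label: "security [❌ 0.0%] (0/1)", children: []entry{
				{label: "1 [FAIL] SQL injection vulnerability"},
			}},
		}},
		{label: "frontend", children: []entry{
			{label: "security [0.85 avg] (1 result)"},
		}},
	}))
}
```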

Example

ExampleSimple demonstrates basic usage of the Simple report generator.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer
	obs := evals.NewNamespacedObserver(factory)

	// Add some evaluation data
	testObs := obs.Child("security-tests")
	testObs.Fail("Buffer overflow detected")
	testObs.Grade(0.7, "Some security issues found")
	testObs.Increment()
	testObs.Increment()

	// Generate simple report with 80% threshold
	reportStr, hasFailures := report.Simple(obs, 0.8)

	fmt.Printf("Has failures: %t\n", hasFailures)
	fmt.Printf("Report:\n%s", reportStr)

}
Output:
Has failures: true
Report:
security-tests [❌ 50.0% pass, 0.70 avg] (1/2)
├ 1 [FAIL] Buffer overflow detected
└ 2 [0.70] Some security issues found

Example (MultipleMetrics)

ExampleSimple_multipleMetrics demonstrates both pass rates and grades in the same report.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer
	obs := evals.NewNamespacedObserver(factory)

	// Add test with both failures and grades
	testObs := obs.Child("integration-tests")
	testObs.Fail("Timeout in payment service")
	testObs.Grade(0.85, "Most functionality works")
	testObs.Grade(0.92, "Good error handling")
	testObs.Increment()
	testObs.Increment()
	testObs.Increment()

	// Generate report
	reportStr, hasFailures := report.Simple(obs, 0.8)

	fmt.Printf("Has failures: %t\n", hasFailures)
	fmt.Printf("Report:\n%s", reportStr)

}
Output:
Has failures: true
Report:
integration-tests [❌ 66.7% pass, 0.89 avg] (2/3)
└ 1 [FAIL] Timeout in payment service

Example (NestedNamespaces)

ExampleSimple_nestedNamespaces demonstrates hierarchical namespace reporting.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer
	obs := evals.NewNamespacedObserver(factory)

	// Create nested evaluation structure
	frontend := obs.Child("frontend")
	frontendSecurity := frontend.Child("security")
	frontendSecurity.Grade(0.85, "Good security practices")
	frontendSecurity.Increment()

	backend := obs.Child("backend")
	backendSecurity := backend.Child("security")
	backendSecurity.Fail("SQL injection vulnerability")
	backendSecurity.Increment()

	// Generate report
	reportStr, _ := report.Simple(obs, 0.8)

	fmt.Printf("Report:\n%s", reportStr)

}
Output:
Report:
backend
└ security [❌ 0.0%] (0/1)
  └ 1 [FAIL] SQL injection vulnerability
frontend
└ security [0.85 avg] (1 result)

Example (SuccessfulEvaluation)

ExampleSimple_successfulEvaluation demonstrates a successful evaluation with no failures.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer
	obs := evals.NewNamespacedObserver(factory)

	// Add successful evaluation data
	testObs := obs.Child("quality-check")
	testObs.Grade(0.95, "Excellent code quality")
	testObs.Grade(0.88, "Good test coverage")
	testObs.Increment()
	testObs.Increment()

	// Generate report with 80% threshold
	reportStr, hasFailures := report.Simple(obs, 0.8)

	fmt.Printf("Has failures: %t\n", hasFailures)
	fmt.Printf("Report:\n%s", reportStr)

}
Output:
Has failures: false
Report:
quality-check [0.92 avg] (2 results)

Types

type Generator

type Generator func(obs *evals.NamespacedObserver[*evals.ResultCollector], threshold float64) (string, bool)

Generator is a function type that generates reports from a NamespacedObserver tree. It takes an observer tree and a threshold, returning a report string and a boolean indicating if any evaluations fell below the threshold.
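Because Generator is an ordinary function type, generators can be composed like any other Go function values. The sketch below uses a hypothetical stand-in input type and invented names (`observations`, `combine`, `countBelow`, `total`); it runs several generators over the same input, joins their reports, and reports failure if any of them does.

```go
package main

import (
	"fmt"
	"strings"
)

// observations is a hypothetical stand-in for the observer tree: path -> score.
type observations map[string]float64

// generator mirrors report.Generator's shape using the stand-in input.
type generator func(obs observations, threshold float64) (string, bool)

// combine runs each generator in order, joins their reports with a blank
// line, and reports failure if any generator does.
func combine(gens ...generator) generator {
	return func(obs observations, threshold float64) (string, bool) {
		parts := make([]string, 0, len(gens))
		failed := false
		for _, g := range gens {
			s, f := g(obs, threshold)
			parts = append(parts, s)
			failed = failed || f
		}
		return strings.Join(parts, "\n\n"), failed
	}
}

// countBelow is a toy generator that fails when any score is under the threshold.
func countBelow(obs observations, threshold float64) (string, bool) {
	n := 0
	for _, v := range obs {
		if v < threshold {
			n++
		}
	}
	return fmt.Sprintf("below threshold: %d", n), n > 0
}

// total is a toy generator that never reports failure.
func total(obs observations, _ float64) (string, bool) {
	return fmt.Sprintf("total: %d", len(obs)), false
}

func main() {
	both := combine(total, countBelow)
	out, failed := both(observations{"/claude/a/x": 0.5, "/gemini/a/x": 0.9}, 0.8)
	fmt.Println(out)
	fmt.Println("has failures:", failed)
}
```

Against the real package, a helper like this would take and return report.Generator, so `combine(report.Simple, report.ByEval)` would be the analogous call; such a helper is not part of the package.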

Example

ExampleGenerator demonstrates using the Generator function type.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer with test data
	obs := evals.NewNamespacedObserver(factory)
	testObs := obs.Child("api-tests")
	testObs.Grade(0.9, "All endpoints working")
	testObs.Increment()

	// Use the Generator function type
	var generator report.Generator = report.Simple

	// Generate report using the function type
	reportStr, hasFailures := generator(obs, 0.8)

	fmt.Printf("Using Generator function type\n")
	fmt.Printf("Has failures: %t\n", hasFailures)
	fmt.Printf("Report contains: %t\n", len(reportStr) > 0)

}
Output:
Using Generator function type
Has failures: false
Report contains: true
