package report

Version: v0.5.0 (latest) | Published: Apr 18, 2026 | License: Apache-2.0 | Imports: 10 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/driftlessaf/go-driftlessaf

Documentation

Overview

Package report provides report generation functionality for evaluation results.

The report package offers different report generators that can process NamespacedObserver trees containing ResultCollector data to produce markdown-formatted evaluation reports.
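To make the tree idea concrete, here is a self-contained sketch of a namespaced tree. The names node, newTree, walk, and collect are hypothetical stand-ins, not the evals package's implementation. The sketch assumes children are looked up by name, which is consistent with the examples below calling obs.Child("claude") several times yet reporting a single claude column. It also visits siblings in sorted order, matching backend appearing before frontend in the nested-namespace example output.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical stand-in for evals.NamespacedObserver[T]: each node
// holds a value built by a factory plus a set of named children.
type node[T any] struct {
	value    T
	factory  func(name string) T
	children map[string]*node[T]
}

// newTree creates a root node; factory builds the value for every node.
func newTree[T any](factory func(name string) T) *node[T] {
	return &node[T]{value: factory(""), factory: factory, children: map[string]*node[T]{}}
}

// Child returns the named child, creating it on first use, so repeated
// calls with the same name reach the same node.
func (n *node[T]) Child(name string) *node[T] {
	if c, ok := n.children[name]; ok {
		return c
	}
	c := &node[T]{value: n.factory(name), factory: n.factory, children: map[string]*node[T]{}}
	n.children[name] = c
	return c
}

// walk visits every descendant depth-first, siblings in sorted name order,
// passing each node's slash-separated path.
func (n *node[T]) walk(prefix string, visit func(path string, value T)) {
	names := make([]string, 0, len(n.children))
	for name := range n.children {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		child := n.children[name]
		path := prefix + "/" + name
		visit(path, child.value)
		child.walk(path, visit)
	}
}

// collect renders each node of a counter tree as "path=count".
func collect(root *node[*int]) []string {
	var out []string
	root.walk("", func(path string, v *int) {
		out = append(out, fmt.Sprintf("%s=%d", path, *v))
	})
	return out
}

func main() {
	// Each value is a counter standing in for a *evals.ResultCollector.
	root := newTree(func(string) *int { return new(int) })
	frontend := root.Child("frontend").Child("security")
	*frontend.value++
	backend := root.Child("backend").Child("security")
	*backend.value++
	*root.Child("backend").Child("security").value++ // same node as backend

	fmt.Println(collect(root))
	// prints: [/backend=0 /backend/security=2 /frontend=0 /frontend/security=1]
}
```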

Generator Types

All generators match the Generator function type:

type Generator func(obs *evals.NamespacedObserver[*evals.ResultCollector], threshold float64) (string, bool)

Available generators:

Simple: Hierarchical report following namespace structure, showing pass rates, grades, and failures
ByEval: Report organized by evaluation type, then by model, then by test case (requires /{model}/{test case}/{eval} path structure)

Usage

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// Create the root observer. customObserver is a placeholder for your own
// function that returns an evals.Observer implementation.
obs := evals.NewNamespacedObserver(func(name string) *evals.ResultCollector {
	return evals.NewResultCollector(customObserver(name))
})

// ... record results through obs.Child(...) before generating reports ...

// Generate a simple hierarchical report
reportStr, hasFailures := report.Simple(obs, 0.8)
if hasFailures {
	fmt.Printf("Report:\n%s", reportStr)
}

// Generate a report organized by evaluation type
reportStr, hasFailures = report.ByEval(obs, 0.8)
if hasFailures {
	fmt.Printf("Report:\n%s", reportStr)
}

Report Format

Reports are text intended for markdown rendering and include:

Hierarchical, tree-style layout following namespace depth
Pass rates and average grades
Failure message lists
Below-threshold grade details
A markdown summary table (ByEval only)

Thread Safety

All generators are safe for concurrent use as they are pure functions that do not modify their input parameters. Multiple goroutines can safely call any generator function simultaneously with the same or different observers.
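The thread-safety guarantee can be illustrated with a self-contained sketch: a pure function with a Generator-like shape, called from several goroutines against one shared, read-only input. The names results, belowThreshold, and runConcurrently are hypothetical. The sketch assumes nothing writes to the shared input while the goroutines run, which is a sensible precaution with a real observer tree as well.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// results is a hypothetical read-only input: namespace -> pass rate.
type results map[string]float64

// belowThreshold has a Generator-like shape and is pure: it reads r but
// never writes to it, so concurrent calls sharing r do not race.
func belowThreshold(r results, threshold float64) (string, bool) {
	var names []string
	for name, rate := range r {
		if rate < threshold {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return strings.Join(names, ","), len(names) > 0
}

// runConcurrently calls belowThreshold once per threshold, each in its own
// goroutine, writing to a distinct slice slot so no lock is needed.
func runConcurrently(r results, thresholds []float64) []string {
	out := make([]string, len(thresholds))
	var wg sync.WaitGroup
	for i, t := range thresholds {
		wg.Add(1)
		go func(i int, t float64) {
			defer wg.Done()
			out[i], _ = belowThreshold(r, t)
		}(i, t)
	}
	wg.Wait()
	return out
}

func main() {
	r := results{"backend": 0.0, "frontend": 0.85}
	fmt.Println(runConcurrently(r, []float64{0.5, 0.9}))
	// prints: [backend backend,frontend]
}
```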

Index

func ByEval(obs *evals.NamespacedObserver[*evals.ResultCollector], threshold float64) (string, bool)
func Simple(obs *evals.NamespacedObserver[*evals.ResultCollector], threshold float64) (string, bool)
type Generator

Constants

This section is empty.

Variables

This section is empty.

Functions

func ByEval

func ByEval(obs *evals.NamespacedObserver[*evals.ResultCollector], threshold float64) (string, bool)

ByEval generates a report organized by evaluation, then by model, then by test case. It assumes that every path holding results follows the pattern /{model}/{test case}/{eval}. It returns the report string and a boolean reporting whether any evaluation fell below the threshold.
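The regrouping ByEval performs can be pictured with a small, self-contained sketch that re-indexes /{model}/{test case}/{eval} paths as eval, then model, then test case. The helpers regroup and sortedKeys are hypothetical and operate on plain strings rather than an observer tree. How ByEval treats paths of other depths is not documented here; this sketch simply skips them.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// regroup re-indexes paths shaped like /{model}/{test case}/{eval} as
// eval -> model -> test cases. Paths without exactly three segments are skipped.
func regroup(paths []string) map[string]map[string][]string {
	out := map[string]map[string][]string{}
	for _, p := range paths {
		parts := strings.Split(strings.Trim(p, "/"), "/")
		if len(parts) != 3 {
			continue
		}
		model, testCase, eval := parts[0], parts[1], parts[2]
		if out[eval] == nil {
			out[eval] = map[string][]string{}
		}
		out[eval][model] = append(out[eval][model], testCase)
	}
	return out
}

// sortedKeys returns the keys of m in ascending order for stable output.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	byEval := regroup([]string{
		"/claude/security-check/no-vulnerabilities",
		"/gemini/security-check/no-vulnerabilities",
		"/claude/load-test/performance-check",
	})
	for _, eval := range sortedKeys(byEval) {
		fmt.Println(eval)
		for _, model := range sortedKeys(byEval[eval]) {
			fmt.Printf("  %s: %v\n", model, byEval[eval][model])
		}
	}
}
```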

Example

ExampleByEval demonstrates basic usage of the ByEval report generator.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer
	obs := evals.NewNamespacedObserver(factory)

	// Add evaluation data following /{model}/{test case}/{eval} pattern
	// Model: claude, Test case: security-check, Eval: no-vulnerabilities
	evalObs1 := obs.Child("claude").Child("security-check").Child("no-vulnerabilities")
	evalObs1.Fail("Buffer overflow detected")
	evalObs1.Increment()
	evalObs1.Increment()

	// Model: gemini, Test case: security-check, Eval: no-vulnerabilities
	evalObs2 := obs.Child("gemini").Child("security-check").Child("no-vulnerabilities")
	evalObs2.Increment()

	// Generate ByEval report with 80% threshold
	reportStr, hasFailures := report.ByEval(obs, 0.8)

	fmt.Printf("Has failures: %t\n", hasFailures)
	fmt.Printf("Report:\n%s", reportStr)

}

Output:
Has failures: true
Report:
## Summary Table

| Evaluation Metric    | claude        | gemini      | Average  |
|----------------------|---------------|-------------|----------|
| no-vulnerabilities   | ❌ 50.0%      | 100.0%      | ❌ 75.0% |
|    └─ security-check | ❌ 0.50 (50%) | 1.00 (100%) | ❌ 75.0% |

no-vulnerabilities [❌ 66.7%] (2/3)
└ claude [❌ 50.0%] (1/2)
  └ security-check [❌ 50.0%] (1/2)
    └ 1 [FAIL] Buffer overflow detected
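The output above shows two different figures for no-vulnerabilities: 75.0% in the summary table and 66.7% in the tree. The numbers are consistent with the table averaging each model's pass rate while the tree pools all runs. That reading is inferred from this output, not documented behavior. The arithmetic can be checked with a short sketch; tally, meanOfRates, and pooledRate are hypothetical helpers.

```go
package main

import "fmt"

// tally is a hypothetical pass/total count for one model on one eval.
type tally struct{ passed, total int }

// meanOfRates averages each model's pass rate, weighting models equally.
func meanOfRates(ts []tally) float64 {
	sum := 0.0
	for _, t := range ts {
		sum += float64(t.passed) / float64(t.total)
	}
	return sum / float64(len(ts))
}

// pooledRate divides all passes by all runs, weighting runs equally.
func pooledRate(ts []tally) float64 {
	passed, total := 0, 0
	for _, t := range ts {
		passed += t.passed
		total += t.total
	}
	return float64(passed) / float64(total)
}

func main() {
	// no-vulnerabilities in ExampleByEval: claude passed 1 of 2 runs, gemini 1 of 1.
	ts := []tally{{passed: 1, total: 2}, {passed: 1, total: 1}}
	fmt.Printf("mean of per-model rates: %.1f%%\n", 100*meanOfRates(ts))
	fmt.Printf("pooled rate: %.1f%%\n", 100*pooledRate(ts))
	// prints:
	// mean of per-model rates: 75.0%
	// pooled rate: 66.7%
}
```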

Example (MultipleEvaluations)

ExampleByEval_multipleEvaluations demonstrates multiple evaluations organized by eval type.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer
	obs := evals.NewNamespacedObserver(factory)

	// Add data for multiple evaluations
	// Security evaluation - passing
	securityObs := obs.Child("claude").Child("auth-test").Child("security-check")
	securityObs.Increment()

	// Performance evaluation - passing
	perfObs := obs.Child("claude").Child("load-test").Child("performance-check")
	perfObs.Grade(0.85, "Good performance")
	perfObs.Increment()

	// Performance evaluation - failing (below 80% threshold)
	perfFailObs := obs.Child("claude").Child("stress-test").Child("performance-check")
	perfFailObs.Grade(0.65, "Performance issues under load")
	perfFailObs.Increment()

	// Generate report
	reportStr, hasFailures := report.ByEval(obs, 0.8)

	fmt.Printf("Has failures: %t\n", hasFailures)
	fmt.Printf("Report:\n%s", reportStr)

}

Output:
Has failures: true
Report:
## Summary Table

| Evaluation Metric | claude         | Average  |
|-------------------|----------------|----------|
| performance-check | ❌ 2/2 (75.0%) | ❌ 75.0% |
|    └─ stress-test | ❌ 0.65 (65%)  | ❌ 65.0% |
| security-check    | 100.0%         | 100.0%   |

performance-check [❌ 0.75 avg] (2 results)
└ claude [❌ 0.75 avg] (2 results)
  └ stress-test [❌ 0.65 avg] (1 result)
    └ 1 [0.65] Performance issues under load
security-check [100.0%] (1/1)

Example (WithGrades)

ExampleByEval_withGrades demonstrates ByEval with graded evaluations.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer
	obs := evals.NewNamespacedObserver(factory)

	// Add evaluation data with grades
	// Model: claude, Test case: code-quality, Eval: readability-score
	evalObs1 := obs.Child("claude").Child("code-quality").Child("readability-score")
	evalObs1.Grade(0.75, "Some improvements needed")
	evalObs1.Increment()

	// Model: gemini, Test case: code-quality, Eval: readability-score
	evalObs2 := obs.Child("gemini").Child("code-quality").Child("readability-score")
	evalObs2.Grade(0.90, "Very readable code")
	evalObs2.Increment()

	// Generate report with 80% threshold
	reportStr, hasFailures := report.ByEval(obs, 0.8)

	fmt.Printf("Has failures: %t\n", hasFailures)
	fmt.Printf("Report:\n%s", reportStr)

}

Output:
Has failures: true
Report:
## Summary Table

| Evaluation Metric  | claude        | gemini     | Average |
|--------------------|---------------|------------|---------|
| readability-score  | ❌ 75.0%      | 90.0%      | 82.5%   |
|    └─ code-quality | ❌ 0.75 (75%) | 0.90 (90%) | 82.5%   |

readability-score [0.82 avg] (2 results)
└ claude [❌ 0.75 avg] (1 result)
  └ code-quality [❌ 0.75 avg] (1 result)
    └ 1 [0.75] Some improvements needed

func Simple

func Simple(obs *evals.NamespacedObserver[*evals.ResultCollector], threshold float64) (string, bool)

Simple walks a NamespacedObserver tree and generates a tree-based report showing pass rates, average grades, failures, and below-threshold grades. Returns the report string and a boolean indicating if any evaluations fell below the threshold.
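The numbers in a Simple summary line such as security-tests [❌ 50.0% pass, 0.70 avg] (1/2) can be reproduced by hand. This sketch assumes, based on the example outputs rather than on documented behavior, that a run counts as passed unless Fail was recorded for it, that grades do not change the pass count, and that a namespace is flagged when either its pass rate or its average grade is below the threshold. The stats type and its methods are hypothetical.

```go
package main

import "fmt"

// stats is a hypothetical per-namespace summary: runs counted via Increment,
// failures recorded via Fail, and scores recorded via Grade.
type stats struct {
	total    int
	failures int
	grades   []float64
}

// passRate treats every run without a recorded failure as a pass.
// With no runs it returns 1 (an arbitrary choice for this sketch).
func (s stats) passRate() float64 {
	if s.total == 0 {
		return 1
	}
	return float64(s.total-s.failures) / float64(s.total)
}

// avgGrade returns the mean grade and whether any grades were recorded.
func (s stats) avgGrade() (float64, bool) {
	if len(s.grades) == 0 {
		return 0, false
	}
	sum := 0.0
	for _, g := range s.grades {
		sum += g
	}
	return sum / float64(len(s.grades)), true
}

// below reports whether the pass rate or the average grade is under threshold.
func (s stats) below(threshold float64) bool {
	if s.passRate() < threshold {
		return true
	}
	avg, ok := s.avgGrade()
	return ok && avg < threshold
}

func main() {
	// Numbers from ExampleSimple: one Fail, one Grade(0.7), two Increments.
	s := stats{total: 2, failures: 1, grades: []float64{0.7}}
	avg, _ := s.avgGrade()
	fmt.Printf("%.1f%% pass, %.2f avg (%d/%d), below=%t\n",
		100*s.passRate(), avg, s.total-s.failures, s.total, s.below(0.8))
	// prints: 50.0% pass, 0.70 avg (1/2), below=true
}
```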

Example

ExampleSimple demonstrates basic usage of the Simple report generator.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer
	obs := evals.NewNamespacedObserver(factory)

	// Add some evaluation data
	testObs := obs.Child("security-tests")
	testObs.Fail("Buffer overflow detected")
	testObs.Grade(0.7, "Some security issues found")
	testObs.Increment()
	testObs.Increment()

	// Generate simple report with 80% threshold
	reportStr, hasFailures := report.Simple(obs, 0.8)

	fmt.Printf("Has failures: %t\n", hasFailures)
	fmt.Printf("Report:\n%s", reportStr)

}

Output:
Has failures: true
Report:
security-tests [❌ 50.0% pass, 0.70 avg] (1/2)
├ 1 [FAIL] Buffer overflow detected
└ 2 [0.70] Some security issues found

Example (MultipleMetrics)

ExampleSimple_multipleMetrics demonstrates both pass rates and grades in the same report.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer
	obs := evals.NewNamespacedObserver(factory)

	// Add test with both failures and grades
	testObs := obs.Child("integration-tests")
	testObs.Fail("Timeout in payment service")
	testObs.Grade(0.85, "Most functionality works")
	testObs.Grade(0.92, "Good error handling")
	testObs.Increment()
	testObs.Increment()
	testObs.Increment()

	// Generate report
	reportStr, hasFailures := report.Simple(obs, 0.8)

	fmt.Printf("Has failures: %t\n", hasFailures)
	fmt.Printf("Report:\n%s", reportStr)

}

Output:
Has failures: true
Report:
integration-tests [❌ 66.7% pass, 0.89 avg] (2/3)
└ 1 [FAIL] Timeout in payment service

Example (NestedNamespaces)

ExampleSimple_nestedNamespaces demonstrates hierarchical namespace reporting.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer
	obs := evals.NewNamespacedObserver(factory)

	// Create nested evaluation structure
	frontend := obs.Child("frontend")
	frontendSecurity := frontend.Child("security")
	frontendSecurity.Grade(0.85, "Good security practices")
	frontendSecurity.Increment()

	backend := obs.Child("backend")
	backendSecurity := backend.Child("security")
	backendSecurity.Fail("SQL injection vulnerability")
	backendSecurity.Increment()

	// Generate report
	reportStr, _ := report.Simple(obs, 0.8)

	fmt.Printf("Report:\n%s", reportStr)

}

Output:
Report:
backend
└ security [❌ 0.0%] (0/1)
  └ 1 [FAIL] SQL injection vulnerability
frontend
└ security [0.85 avg] (1 result)

Example (SuccessfulEvaluation)

ExampleSimple_successfulEvaluation demonstrates a successful evaluation with no failures.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer
	obs := evals.NewNamespacedObserver(factory)

	// Add successful evaluation data
	testObs := obs.Child("quality-check")
	testObs.Grade(0.95, "Excellent code quality")
	testObs.Grade(0.88, "Good test coverage")
	testObs.Increment()
	testObs.Increment()

	// Generate report with 80% threshold
	reportStr, hasFailures := report.Simple(obs, 0.8)

	fmt.Printf("Has failures: %t\n", hasFailures)
	fmt.Printf("Report:\n%s", reportStr)

}

Output:
Has failures: false
Report:
quality-check [0.92 avg] (2 results)

Types

type Generator

type Generator func(obs *evals.NamespacedObserver[*evals.ResultCollector], threshold float64) (string, bool)

Generator is a function type that generates reports from a NamespacedObserver tree. It takes an observer tree and a threshold, returning a report string and a boolean indicating if any evaluations fell below the threshold.
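Because Generator is a plain function type, generators can be stored in a map and chosen at runtime, for example from a command-line flag. The self-contained sketch below shows that pattern with a hypothetical scores input and two hypothetical generators, listAll and failuresOnly, in place of the real observer tree. With the real package the same map could hold report.Simple and report.ByEval directly, since both have the Generator signature, as ExampleGenerator shows by assigning report.Simple to a report.Generator variable.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// scores is a hypothetical stand-in for the observer tree: namespace -> average grade.
type scores map[string]float64

// Generator mirrors the shape of report.Generator over the stand-in input.
type Generator func(s scores, threshold float64) (string, bool)

// sortedNames returns the namespaces in s in ascending order.
func sortedNames(s scores) []string {
	names := make([]string, 0, len(s))
	for n := range s {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// listAll reports every namespace and marks those under threshold with ❌.
func listAll(s scores, threshold float64) (string, bool) {
	var b strings.Builder
	below := false
	for _, n := range sortedNames(s) {
		mark := ""
		if s[n] < threshold {
			mark = "❌ "
			below = true
		}
		fmt.Fprintf(&b, "%s [%s%.2f avg]\n", n, mark, s[n])
	}
	return b.String(), below
}

// failuresOnly reports just the namespaces under threshold.
func failuresOnly(s scores, threshold float64) (string, bool) {
	var b strings.Builder
	below := false
	for _, n := range sortedNames(s) {
		if s[n] < threshold {
			b.WriteString(n + "\n")
			below = true
		}
	}
	return b.String(), below
}

// generators lets a caller choose a report style by name, e.g. from a flag.
var generators = map[string]Generator{
	"all":      listAll,
	"failures": failuresOnly,
}

func main() {
	s := scores{"api-tests": 0.9, "ui-tests": 0.6}
	for _, name := range []string{"all", "failures"} {
		out, below := generators[name](s, 0.8)
		fmt.Printf("%s (below threshold: %t)\n%s", name, below, out)
	}
}
```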

Example

ExampleGenerator demonstrates using the Generator function type.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/evals"
	"chainguard.dev/driftlessaf/agents/evals/report"
)

// exampleObserver implements evals.Observer for examples
type exampleObserver struct {
	name  string
	count int64
}

func (e *exampleObserver) Fail(msg string) {

}

func (e *exampleObserver) Log(msg string) {

}

func (e *exampleObserver) Grade(score float64, reasoning string) {

}

func (e *exampleObserver) Increment() {
	e.count++
}

func (e *exampleObserver) Total() int64 {
	return e.count
}

func main() {
	// Create a factory for result collectors
	factory := func(name string) *evals.ResultCollector {
		return evals.NewResultCollector(&exampleObserver{name: name})
	}

	// Create root observer with test data
	obs := evals.NewNamespacedObserver(factory)
	testObs := obs.Child("api-tests")
	testObs.Grade(0.9, "All endpoints working")
	testObs.Increment()

	// Use the Generator function type
	var generator report.Generator = report.Simple

	// Generate report using the function type
	reportStr, hasFailures := generator(obs, 0.8)

	fmt.Printf("Using Generator function type\n")
	fmt.Printf("Has failures: %t\n", hasFailures)
	fmt.Printf("Report contains: %t\n", len(reportStr) > 0)

}

Output:
Using Generator function type
Has failures: false
Report contains: true
