eval

package
v0.7.2 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 8, 2026 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

Package eval provides small, deterministic local evaluations that gate regressions in the three properties that matter most for anchored: retrieval recall, sync privacy safety, and project-identity resolution. The evals run offline (BM25-only, no embeddings, no network) so they fit in CI and on a dev laptop.

Functions

func DefaultFixture

func DefaultFixture(name string) ([]byte, error)

DefaultFixture returns the embedded fixture bytes for a base name like "recall_basic.yaml", so the installed binary runs evals without the repo checked out. Callers may override with a file via --fixture.

func FixtureBytes

func FixtureBytes(path, defaultName string) ([]byte, error)

FixtureBytes resolves fixture data: the file at path when non-empty, otherwise the embedded default named by defaultName.
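The resolution rule above can be sketched as follows. This is a minimal illustration, not the package's implementation: resolveFixture and the defaults map are hypothetical stand-ins (the real package embeds its fixture files), and the placeholder fixture content is invented for the example.

```go
package main

import (
	"fmt"
	"os"
)

// defaults stands in for the package's embedded fixtures. The map and its
// placeholder content are illustrative assumptions only.
var defaults = map[string][]byte{
	"recall_basic.yaml": []byte("description: placeholder\n"),
}

// resolveFixture mirrors the documented rule: a non-empty path wins,
// otherwise the default named by defaultName is returned.
func resolveFixture(path, defaultName string) ([]byte, error) {
	if path != "" {
		return os.ReadFile(path)
	}
	data, ok := defaults[defaultName]
	if !ok {
		return nil, fmt.Errorf("no default fixture named %q", defaultName)
	}
	return data, nil
}

func main() {
	data, err := resolveFixture("", "recall_basic.yaml")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("resolved %d bytes from the default fixture\n", len(data))
}
```

Because the path takes precedence whenever it is non-empty, a missing file in this sketch surfaces as an error rather than silently falling back to the default.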

Types

type CaseResult

type CaseResult struct {
	Name   string  `json:"name"`
	Passed bool    `json:"passed"`
	Score  float64 `json:"score,omitempty"`
	Detail string  `json:"detail,omitempty"`
}

type IdentityCase

type IdentityCase struct {
	Name   string `yaml:"name"`
	Origin string `yaml:"origin"`
	// ExpectCanonical / ExpectLegacy are the keys the derivation must produce.
	// Empty means "don't assert this field".
	ExpectCanonical string `yaml:"expect_canonical"`
	ExpectLegacy    string `yaml:"expect_legacy"`
	// SameAs names another case whose canonical key this one must equal (e.g.
	// scp-syntax and https forms of the same repo resolve identically).
	SameAs string `yaml:"same_as"`
	// MustDiffer names another case whose canonical key this one must NOT equal.
	MustDiffer string `yaml:"must_differ"`
}

type IdentityFixture

type IdentityFixture struct {
	Description string         `yaml:"description"`
	Cases       []IdentityCase `yaml:"cases"`
}

IdentityFixture asserts remote-key derivation invariants from git origins.

type PrivacyFixture

type PrivacyFixture struct {
	Description string        `yaml:"description"`
	Items       []PrivacyItem `yaml:"items"`
}

PrivacyFixture lists content that must never reach a remote. Each item carries the expectation that the safety filter and sanitizer will either block it or rewrite it.

type PrivacyItem

type PrivacyItem struct {
	Name     string         `yaml:"name"`
	Content  string         `yaml:"content"`
	Metadata map[string]any `yaml:"metadata"`
	// MustBlock requires the safety filter to block the item outright.
	// MustRedact requires the sanitizer/filter to rewrite the content (secret
	// or local path removed) even if not blocked.
	MustBlock  bool `yaml:"must_block"`
	MustRedact bool `yaml:"must_redact"`
}

type RecallFixture

type RecallFixture struct {
	K           int            `yaml:"k"`
	MinRecall   float64        `yaml:"min_recall"`
	Memories    []RecallMemory `yaml:"memories"`
	Queries     []RecallQuery  `yaml:"queries"`
	Description string         `yaml:"description"`
}

RecallFixture seeds a corpus and asserts Recall@K for a set of queries.

type RecallMemory

type RecallMemory struct {
	Key      string   `yaml:"key"`
	Category string   `yaml:"category"`
	Content  string   `yaml:"content"`
	Keywords []string `yaml:"keywords"`
}

type RecallQuery

type RecallQuery struct {
	Query     string   `yaml:"query"`
	Expect    []string `yaml:"expect"`
	MinRecall float64  `yaml:"min_recall"` // overrides fixture MinRecall when > 0
}

type RecallStore

type RecallStore interface {
	Save(ctx context.Context, m memory.Memory) error
	Search(ctx context.Context, query string, opts memory.SearchOptions) ([]memory.SearchResult, error)
}

RecallStore is the slice of a memory store the recall eval needs: seed a corpus and search it. *memory.SQLiteStore satisfies this.

type Report

type Report struct {
	Name    string       `json:"name"`
	Passed  bool         `json:"passed"`
	Score   float64      `json:"score"`
	Summary string       `json:"summary"`
	Cases   []CaseResult `json:"cases"`
}

Report is the common result of an eval run.
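As an illustration of how the JSON tags shape a serialized report, the sketch below copies the Report and CaseResult definitions from this page into a standalone program and encodes a sample value. The encode helper and the sample data are hypothetical; only the struct definitions come from the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CaseResult and Report are copied from the package documentation so the
// example compiles on its own.
type CaseResult struct {
	Name   string  `json:"name"`
	Passed bool    `json:"passed"`
	Score  float64 `json:"score,omitempty"`
	Detail string  `json:"detail,omitempty"`
}

type Report struct {
	Name    string       `json:"name"`
	Passed  bool         `json:"passed"`
	Score   float64      `json:"score"`
	Summary string       `json:"summary"`
	Cases   []CaseResult `json:"cases"`
}

// encode is a hypothetical helper that renders a report as compact JSON.
func encode(r Report) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Sample data invented for the example.
	r := Report{
		Name:    "recall",
		Passed:  false,
		Score:   0.5,
		Summary: "1 of 2 queries passed",
		Cases: []CaseResult{
			{Name: "q1", Passed: true, Score: 1},
			{Name: "q2", Passed: false, Detail: "missing key b"},
		},
	}
	out, err := encode(r)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Note that a case with a zero Score or an empty Detail omits those keys, while Report always emits score, so a consumer reading case entries should treat an absent score as zero.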

func RunIdentity

func RunIdentity(fixture []byte) (Report, error)

RunIdentity asserts the project-identity invariants: a git origin derives a stable canonical (and legacy) remote key, equivalent URL forms (scp-syntax vs https) of the same repo resolve to the same canonical key, and distinct repos never collide. This is the pure-function core of "no silent project fallback": if two repos collided on a key, sync could land memories in the wrong project.
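The same_as and must_differ invariants can be illustrated with a small normalizer. This is a sketch under loud assumptions: canonicalKey is hypothetical, and its "host/owner/repo" key format is invented for the example. The package's actual derivation and key format are not documented on this page.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// canonicalKey is a hypothetical normalizer that maps a git origin to a
// "host/path" key. The key format is an assumption for illustration.
func canonicalKey(origin string) (string, error) {
	origin = strings.TrimSpace(origin)
	var host, path string
	if strings.Contains(origin, "://") {
		// URL form, e.g. https://github.com/acme/app
		u, err := url.Parse(origin)
		if err != nil {
			return "", err
		}
		host, path = u.Hostname(), u.Path
	} else if i := strings.Index(origin, ":"); i > 0 {
		// scp-syntax form, e.g. git@github.com:acme/app.git
		host, path = origin[:i], origin[i+1:]
		if at := strings.LastIndex(host, "@"); at >= 0 {
			host = host[at+1:]
		}
	} else {
		return "", fmt.Errorf("unrecognized origin %q", origin)
	}
	// Drop surrounding slashes and a trailing ".git" so both forms agree.
	path = strings.TrimSuffix(strings.Trim(path, "/"), ".git")
	if host == "" || path == "" {
		return "", fmt.Errorf("unrecognized origin %q", origin)
	}
	return strings.ToLower(host) + "/" + path, nil
}

func main() {
	for _, origin := range []string{
		"git@github.com:acme/app.git",
		"https://github.com/acme/app",
		"https://github.com/acme/other.git",
	} {
		key, err := canonicalKey(origin)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", origin, key)
	}
}
```

In this sketch the first two origins produce the same key (the same_as case) and the third produces a different one (the must_differ case).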

func RunRecall

func RunRecall(ctx context.Context, store RecallStore, fixture []byte) (Report, error)

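RunRecall seeds the fixture's corpus into store, runs each query, and scores Recall@K against the expected memories. Recall@K = |retrieved∩expected| / |expected|, where retrieved is the top K search results. The run passes when every query meets its minimum recall: the query's own min_recall when it is greater than 0, otherwise the fixture's min_recall.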
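The Recall@K formula and the per-query threshold override can be worked through in a few lines. The helper names recallAtK and effectiveMinRecall are hypothetical, and treating an empty expectation list as full recall is an assumption of this sketch, not documented behavior.

```go
package main

import "fmt"

// recallAtK returns |top-K retrieved ∩ expected| / |expected|.
func recallAtK(retrieved, expected []string, k int) float64 {
	if len(expected) == 0 {
		return 1 // assumption: nothing expected counts as full recall
	}
	if k < 0 {
		k = 0
	}
	if k > len(retrieved) {
		k = len(retrieved)
	}
	want := make(map[string]bool, len(expected))
	for _, key := range expected {
		want[key] = true
	}
	// Count each expected key at most once, even if it is retrieved twice.
	seen := make(map[string]bool)
	hits := 0
	for _, key := range retrieved[:k] {
		if want[key] && !seen[key] {
			seen[key] = true
			hits++
		}
	}
	return float64(hits) / float64(len(want))
}

// effectiveMinRecall applies the override rule from RecallQuery.MinRecall:
// the query threshold wins when it is greater than 0.
func effectiveMinRecall(fixtureMin, queryMin float64) float64 {
	if queryMin > 0 {
		return queryMin
	}
	return fixtureMin
}

func main() {
	retrieved := []string{"a", "x", "b", "c"}
	expected := []string{"a", "b"}
	for _, k := range []int{2, 3} {
		score := recallAtK(retrieved, expected, k)
		passed := score >= effectiveMinRecall(0.8, 0)
		fmt.Printf("k=%d recall=%.2f passed=%v\n", k, score, passed)
	}
}
```

With K = 2 only "a" is among the top results, so recall is 1/2 and the query fails a 0.8 threshold. With K = 3 both expected keys are retrieved and recall is 1.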

func RunSyncSafety

func RunSyncSafety(fixture []byte) (Report, error)

RunSyncSafety verifies that every item in the privacy fixture is stopped by the remote safety filter and sanitizer before it could sync: blocked when must_block, and rewritten/redacted when must_redact. A single leak fails the run. This is the gate that protects against pushing secrets, local paths, or personal-scope content to a shared server.
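The pass/fail rule for a single item can be sketched as below. Everything here is hypothetical: the expectation and outcome types, the leaked helper, and the sample content are invented for illustration, and the sketch assumes that a blocked item cannot leak regardless of must_redact. The package's real filter and sanitizer are not shown on this page.

```go
package main

import "fmt"

// expectation mirrors the must_block and must_redact flags of a PrivacyItem.
type expectation struct {
	MustBlock  bool
	MustRedact bool
}

// outcome is a hypothetical summary of what the filter and sanitizer did.
type outcome struct {
	Blocked bool
	Output  string // content that would sync when the item is not blocked
}

// leaked reports whether an item got past its expectation.
func leaked(content string, want expectation, got outcome) bool {
	if want.MustBlock && !got.Blocked {
		return true
	}
	// Assumption: a blocked item never reaches the remote, so must_redact
	// only needs checking for items that were allowed through.
	if want.MustRedact && !got.Blocked && got.Output == content {
		return true
	}
	return false
}

func main() {
	content := "token=abc123 in /home/dev/notes" // invented sample content
	cases := []struct {
		name string
		want expectation
		got  outcome
	}{
		{"blocked secret", expectation{MustBlock: true}, outcome{Blocked: true}},
		{"redacted path", expectation{MustRedact: true}, outcome{Output: "token=[REDACTED] in [PATH]"}},
		{"unchanged content", expectation{MustRedact: true}, outcome{Output: content}},
	}
	passed := true
	for _, c := range cases {
		leak := leaked(content, c.want, c.got)
		if leak {
			passed = false // a single leak fails the whole run
		}
		fmt.Printf("%s: leaked=%v\n", c.name, leak)
	}
	fmt.Println("run passed:", passed)
}
```

The run-level flag is the conjunction over all items, which matches the documented rule that one leak is enough to fail the gate.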
