Documentation
Overview
Package eval provides small, deterministic, local evaluations that gate regressions in the three properties that matter most for anchored: retrieval recall, sync privacy safety, and project-identity resolution. The evals run offline (BM25-only retrieval, no embeddings, no network), so they are cheap enough to run in CI and on a developer laptop.
Constants
This section is empty.
Variables
This section is empty.
Functions
func DefaultFixture
DefaultFixture returns the embedded fixture bytes for a base name such as "recall_basic.yaml", so the installed binary can run evals without a checkout of the repository. Callers can override the embedded fixture with a file via --fixture.
func FixtureBytes
FixtureBytes resolves fixture data: it reads the file at path when path is non-empty, and otherwise returns the embedded default named by defaultName.
Types
type CaseResult
type IdentityCase
type IdentityCase struct {
	Name   string `yaml:"name"`
	Origin string `yaml:"origin"`
	// ExpectCanonical / ExpectLegacy are the keys the derivation must produce.
	// Empty means "don't assert this field".
	ExpectCanonical string `yaml:"expect_canonical"`
	ExpectLegacy    string `yaml:"expect_legacy"`
	// SameAs names another case whose canonical key this one must equal (e.g.
	// scp-syntax and https forms of the same repo resolve identically).
	SameAs string `yaml:"same_as"`
	// MustDiffer names another case whose canonical key this one must NOT equal.
	MustDiffer string `yaml:"must_differ"`
}
type IdentityFixture
type IdentityFixture struct {
	Description string         `yaml:"description"`
	Cases       []IdentityCase `yaml:"cases"`
}
IdentityFixture asserts remote-key derivation invariants from git origins.
type PrivacyFixture
type PrivacyFixture struct {
	Description string        `yaml:"description"`
	Items       []PrivacyItem `yaml:"items"`
}
PrivacyFixture lists content that must never reach a remote; each item declares whether the safety filter and sanitizer must block it outright or rewrite it.
type PrivacyItem
type PrivacyItem struct {
	Name     string         `yaml:"name"`
	Content  string         `yaml:"content"`
	Metadata map[string]any `yaml:"metadata"`
	// MustBlock requires the safety filter to block the item outright.
	// MustRedact requires the sanitizer/filter to rewrite the content (secret
	// or local path removed) even if not blocked.
	MustBlock  bool `yaml:"must_block"`
	MustRedact bool `yaml:"must_redact"`
}
type RecallFixture
type RecallFixture struct {
	K           int            `yaml:"k"`
	MinRecall   float64        `yaml:"min_recall"`
	Memories    []RecallMemory `yaml:"memories"`
	Queries     []RecallQuery  `yaml:"queries"`
	Description string         `yaml:"description"`
}
RecallFixture seeds a corpus and asserts Recall@K for a set of queries.
type RecallMemory
type RecallQuery
type RecallStore
type RecallStore interface {
	Save(ctx context.Context, m memory.Memory) error
	Search(ctx context.Context, query string, opts memory.SearchOptions) ([]memory.SearchResult, error)
}
RecallStore is the subset of the memory store API that the recall eval needs: saving a corpus and searching it. *memory.SQLiteStore satisfies this interface.
type Report
type Report struct {
	Name    string       `json:"name"`
	Passed  bool         `json:"passed"`
	Score   float64      `json:"score"`
	Summary string       `json:"summary"`
	Cases   []CaseResult `json:"cases"`
}
Report is the common result of an eval run.
func RunIdentity
RunIdentity asserts the project-identity invariants: a git origin derives a stable canonical (and legacy) remote key, equivalent URL forms of the same repo (scp-syntax vs https) resolve to the same canonical key, and distinct repos never collide. This is the pure-function core of the "no silent project fallback" guarantee: if two repos collided on a key, sync could land memories in the wrong project.
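The SameAs and MustDiffer invariants can be illustrated with a small checker. This is a sketch, not anchored's derivation: canonicalKey is a hypothetical normalizer that rewrites scp-syntax origins (git@host:owner/repo.git) into URL form and reduces both forms to a lowercased host plus the path without a .git suffix; the real canonical and legacy key formats are not documented here. The identityCase type is also hypothetical and mirrors only the Name, Origin, SameAs, and MustDiffer fields of IdentityCase.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// identityCase mirrors a subset of IdentityCase (hypothetical type).
type identityCase struct {
	Name, Origin, SameAs, MustDiffer string
}

// canonicalKey is a HYPOTHETICAL normalizer: scp-syntax ([user@]host:path)
// is rewritten to an ssh:// URL, then the key is lowercased host + path
// with any trailing ".git" removed. The real key format may differ.
func canonicalKey(origin string) (string, error) {
	o := strings.TrimSpace(origin)
	if !strings.Contains(o, "://") {
		if i := strings.Index(o, ":"); i > 0 {
			host := o[:i]
			if at := strings.LastIndex(host, "@"); at >= 0 {
				host = host[at+1:] // drop the user part
			}
			o = "ssh://" + host + "/" + o[i+1:]
		}
	}
	u, err := url.Parse(o)
	if err != nil {
		return "", err
	}
	p := strings.TrimSuffix(strings.Trim(u.Path, "/"), ".git")
	return strings.ToLower(u.Hostname()) + "/" + p, nil
}

// checkIdentity returns one message per failed derivation or violated invariant.
func checkIdentity(cases []identityCase) []string {
	keys := make(map[string]string, len(cases))
	var failures []string
	for _, c := range cases {
		k, err := canonicalKey(c.Origin)
		if err != nil {
			failures = append(failures, fmt.Sprintf("%s: %v", c.Name, err))
			continue
		}
		keys[c.Name] = k
	}
	for _, c := range cases {
		k, ok := keys[c.Name]
		if !ok {
			continue // derivation already failed above
		}
		if c.SameAs != "" {
			if other, found := keys[c.SameAs]; !found || other != k {
				failures = append(failures, c.Name+": key does not match "+c.SameAs)
			}
		}
		if c.MustDiffer != "" {
			if other, found := keys[c.MustDiffer]; !found || other == k {
				failures = append(failures, c.Name+": key collides with "+c.MustDiffer)
			}
		}
	}
	return failures
}

func main() {
	cases := []identityCase{
		{Name: "widgets-https", Origin: "https://github.com/acme/widgets.git"},
		{Name: "widgets-scp", Origin: "git@github.com:acme/widgets.git", SameAs: "widgets-https"},
		{Name: "gadgets", Origin: "https://github.com/acme/gadgets", MustDiffer: "widgets-https"},
	}
	fmt.Println(checkIdentity(cases)) // []
}
```

Normalizing both URL forms before comparing is what lets a single equality check express "same repo"; a missing or mistyped case name is reported as a failure rather than silently passing.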
func RunRecall
RunRecall seeds the fixture's corpus into store, runs each query, and scores Recall@K against the expected memories, where Recall@K = |retrieved ∩ expected| / |expected| and retrieved is the top K results. The run passes when every query meets its minimum recall.
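As a worked instance of the formula: with K = 3, a ranked result list [a, b, c, d], and expected memories {b, d, x}, the top three results contain only b, so Recall@K = 1/3. The sketch below computes the same quantity over plain string IDs. recallAtK and allPass are hypothetical helpers (the fields of RecallQuery and CaseResult are not shown here), and treating an empty expected set as full recall is an assumption.

```go
package main

import "fmt"

// recallAtK returns |top-K(retrieved) ∩ expected| / |expected|.
// Assumption: an empty expected set counts as full recall.
func recallAtK(retrieved, expected []string, k int) float64 {
	want := make(map[string]bool, len(expected))
	for _, id := range expected {
		want[id] = true
	}
	total := len(want)
	if total == 0 {
		return 1
	}
	if k < 0 {
		k = 0
	}
	if k < len(retrieved) {
		retrieved = retrieved[:k] // only the top K results count
	}
	hits := 0
	for _, id := range retrieved {
		if want[id] {
			hits++
			delete(want, id) // a duplicate result cannot be counted twice
		}
	}
	return float64(hits) / float64(total)
}

// allPass applies the pass rule: every query must reach minRecall.
func allPass(scores []float64, minRecall float64) bool {
	for _, s := range scores {
		if s < minRecall {
			return false
		}
	}
	return true
}

func main() {
	retrieved := []string{"a", "b", "c", "d"}
	expected := []string{"b", "d", "x"}
	r3 := recallAtK(retrieved, expected, 3) // top 3 = a, b, c: 1 hit
	r4 := recallAtK(retrieved, expected, 4) // top 4 adds d: 2 hits
	fmt.Printf("R@3=%.3f R@4=%.3f pass@0.5=%v\n", r3, r4, allPass([]float64{r3, r4}, 0.5))
	// prints: R@3=0.333 R@4=0.667 pass@0.5=false
}
```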
func RunSyncSafety
RunSyncSafety verifies that every item in the privacy fixture is stopped by the remote safety filter (plus the sanitizer) before it could sync: each must_block item must be blocked, and each must_redact item must be rewritten (redacted). A single leak fails the run. This is the gate that protects against pushing secrets, local paths, or personal-scope content to a shared server.