Documentation ¶
Overview ¶
Package codeanalysis is the core typed surface for Gemba's agentic code-analysis capability (gm-l1i, design docs/design/code-analysis.md).
A Provider is a pluggable knowledge-graph adapter, READ-ONLY with respect to the working tree, that answers structural questions about one or more workspace repos: "what calls this symbol?", "what's the blast radius of changing this file?", "which modules are untested?". GitNexus is the reference backend; Sourcegraph, CodeQL, and tree-sitter / LSP indexers are anticipated adapters.
This package owns the boundary between configuration on disk and runtime backends:
- Provider is the interface every backend implements.
- Manifest, RepoRef, Module, HealthReport, ImpactReport, and ReindexFlags are the shared value types passed across the interface.
- LoadConfig / Config parse `.gemba/code_analysis.toml` into a typed bundle of repos and backend settings.
- Register / Resolve are the backend-name-keyed registry callers reach for at startup. The `"gitnexus"` slot is wired here with a placeholder factory; the real adapter ships in gm-56z.
What this package does NOT own:
- Any concrete backend implementation. The `gitnexus` factory installed by this package returns ErrNotImplemented until gm-56z replaces it.
- The four context providers (`code_analysis_summary` / `symbol_context` / `impact_analysis` / `health_report`) — those land in gm-bro and live under `internal/core/promptctx`.
- Reindex policy execution. Config accepts the four policy strings (`post_merge`, `scheduled`, `manual`, `on_demand`) but does not run them; that's gm-x7n.
- The HTTP / MCP API surface. Wiring lives elsewhere.
Stability: the Provider interface, Config schema, and registered backend name keys are part of Gemba's public typed surface. Adding methods to Provider is a breaking change for every backend; add capability flags to Manifest instead and gate optional methods behind those.
Index ¶
- Constants
- Variables
- func IsRegistered(name string) bool
- func Register(name string, factory Factory) error
- func RegisteredBackends() []string
- func Replace(name string, factory Factory) error
- type BackendConfig
- type CodeAnalysisSection
- type Config
- type Factory
- type HealthReport
- type ImpactDirection
- type ImpactReport
- type Manifest
- type Module
- type Provider
- type ReindexFlags
- type ReindexPolicy
- type RepoConfig
- type RepoRef
- type RiskLevel
- type SymbolRef
Constants ¶
const BackendGitNexus = "gitnexus"
BackendGitNexus is the canonical registered name for the GitNexus reference backend. Other adapters claim their own names ("sourcegraph", "codeql", …) when they ship.
Variables ¶
var ErrNotImplemented = errors.New("codeanalysis: backend not implemented")
ErrNotImplemented is returned by registered factory placeholders when a backend has been reserved by name but not yet shipped. In particular the `"gitnexus"` slot returns this until gm-56z installs the real adapter.
Callers MUST treat this as a clean "backend present but not usable yet" signal — distinct from "unknown backend" (see ErrUnknownBackend) which means the registry has no slot for the requested name at all.
var ErrUnknownBackend = errors.New("codeanalysis: unknown backend")
ErrUnknownBackend is returned by Resolve when the requested name was never registered. Callers usually surface this as a configuration error: a `.gemba/code_analysis.toml` referencing a backend the binary doesn't know about.
Functions ¶
func IsRegistered ¶
IsRegistered reports whether name has a factory installed. Cheap; safe for hot paths.
func Register ¶
Register installs a factory under name. Returns an error when name is empty, factory is nil, or another factory is already registered under name. Re-registering the SAME factory under the same name is a no-op so init paths can be defensive — we compare the underlying function pointer via reflect.
Callers needing to swap a placeholder for a real backend (the canonical gm-56z replacement of the gitnexus stub) MUST call Replace instead.
func RegisteredBackends ¶
func RegisteredBackends() []string
RegisteredBackends returns the list of registered backend names in ascending string order. Useful for diagnostics ("which backends does this binary know about?") and for validating a config's `default_backend` against the live registry.
func Replace ¶
Replace installs factory under name, overwriting any prior entry. Used by gm-56z to swap the gitnexus placeholder for the real adapter at binary-init time. Returns an error when name is empty or factory is nil.
Replace is intentionally narrow — it does NOT compare against the prior factory. Callers SHOULD only use it when they know they're replacing a placeholder.
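The Register / Replace contract described above can be sketched with a small stand-alone registry. This is an illustration under stated assumptions, not the package's implementation: the `Factory` shape, the `registry` type, and the error messages are all hypothetical, since the real Factory signature is not shown on this page.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
	"sort"
	"sync"
)

// Factory is a hypothetical shape used only for this sketch.
type Factory func() (any, error)

type registry struct {
	mu        sync.RWMutex
	factories map[string]Factory
}

func newRegistry() *registry {
	return &registry{factories: make(map[string]Factory)}
}

// sameFunc compares the underlying code pointers of two factories.
func sameFunc(a, b Factory) bool {
	return reflect.ValueOf(a).Pointer() == reflect.ValueOf(b).Pointer()
}

// Register rejects empty names, nil factories, and conflicting entries.
// Registering the same factory twice under one name is a no-op.
func (r *registry) Register(name string, f Factory) error {
	if name == "" || f == nil {
		return errors.New("registry: empty name or nil factory")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if prev, ok := r.factories[name]; ok {
		if sameFunc(prev, f) {
			return nil
		}
		return fmt.Errorf("registry: %q already registered", name)
	}
	r.factories[name] = f
	return nil
}

// Replace overwrites unconditionally; it does not inspect the prior entry.
func (r *registry) Replace(name string, f Factory) error {
	if name == "" || f == nil {
		return errors.New("registry: empty name or nil factory")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.factories[name] = f
	return nil
}

// Names returns the registered names in ascending string order.
func (r *registry) Names() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	names := make([]string, 0, len(r.factories))
	for n := range r.factories {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func placeholder() (any, error) { return nil, errors.New("not implemented") }
func realAdapter() (any, error) { return "adapter", nil }

func main() {
	r := newRegistry()
	fmt.Println(r.Register("gitnexus", placeholder))        // first install
	fmt.Println(r.Register("gitnexus", placeholder))        // same factory again
	fmt.Println(r.Register("gitnexus", realAdapter) != nil) // conflict is rejected
	fmt.Println(r.Replace("gitnexus", realAdapter))         // explicit swap
	fmt.Println(r.Names())
}
```

One caveat with pointer comparison: reflect documents that a func's code pointer does not necessarily identify a single function uniquely (closures created from the same literal share one), so this check is best suited to top-level factory functions.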
Types ¶
type BackendConfig ¶
BackendConfig is the open key/value bag for a single backend's tuning ("embeddings = false", "max_repo_size_mb = 500"). The loader does NOT validate the contents; each backend's factory inspects what it cares about.
type CodeAnalysisSection ¶
type CodeAnalysisSection struct {
DefaultBackend string `toml:"default_backend" json:"default_backend"`
}
CodeAnalysisSection holds top-level cross-cutting settings. DefaultBackend is the backend used by any repo that does not declare its own; it MUST match a registered backend name.
type Config ¶
type Config struct {
CodeAnalysis CodeAnalysisSection `toml:"code_analysis" json:"code_analysis"`
Repos []RepoConfig `toml:"repos" json:"repos"`
Backend map[string]BackendConfig `toml:"backend" json:"backend,omitempty"`
}
Config is the parsed bundle of `code_analysis.toml`. Top-level CodeAnalysis carries cross-repo defaults; Repos is the per-repo list. Backend-specific knobs live in BackendConfig (a free-form map keyed by backend name) so adapters can extract their own settings without coupling to this package.
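For orientation, a `.gemba/code_analysis.toml` matching the struct tags above might look like the fragment below. The key names come from the `toml` tags and the backend tuning keys from the BackendConfig description; the repo names and paths are hypothetical.

```toml
[code_analysis]
default_backend = "gitnexus"

# First repo inherits default_backend.
[[repos]]
name = "gemba"
path = "."
reindex_policy = "post_merge"

# Second repo declares its backend explicitly.
[[repos]]
name = "web"
path = "../web"
backend = "gitnexus"
reindex_policy = "manual"

# Free-form bag; only the gitnexus factory interprets these keys.
[backend.gitnexus]
embeddings = false
max_repo_size_mb = 500
```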
func DecodeConfig ¶
DecodeConfig parses raw TOML bytes into a Config, applies defaults, and validates. Split out from LoadConfig so tests can exercise the parsing path without touching disk.
func LoadConfig ¶
LoadConfig reads and parses path. Returns an empty (but valid) Config when path does not exist — workspaces opt in to code-analysis by authoring the file, and a missing file is not an error. Other I/O errors propagate.
func (*Config) RepoByName ¶
func (c *Config) RepoByName(name string) (RepoConfig, bool)
RepoByName returns the configured repo with the matching name. The bool is false when no such repo is configured. Linear scan; the typical config has a handful of repos so no map index is warranted.
func (*Config) Validate ¶
Validate checks the parsed config for the four classes of error gm-1ak's loader is responsible for catching:
- default_backend names a backend the binary knows about.
- every repo declares a non-empty path.
- repo names are unique within the file.
- each repo's resolved backend is registered, and its reindex_policy (when set) is one of the four canonical values.
Returns the FIRST validation error encountered with enough detail to point the operator at the offending line.
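The checks listed above can be sketched as a first-error-wins pass over the config. This is a simplified illustration with trimmed local types, not the package's Validate; `isRegistered` stands in for the registry lookup, and treating an empty default_backend as acceptable until a repo actually needs it is an assumption.

```go
package main

import "fmt"

// Trimmed local stand-ins for the documented config types.
type RepoConfig struct {
	Name, Path, Backend, ReindexPolicy string
}

type Config struct {
	DefaultBackend string
	Repos          []RepoConfig
}

var canonicalPolicies = map[string]bool{
	"post_merge": true, "scheduled": true, "manual": true, "on_demand": true,
}

// validate returns the first problem it finds, or nil.
func validate(c Config, isRegistered func(string) bool) error {
	if c.DefaultBackend != "" && !isRegistered(c.DefaultBackend) {
		return fmt.Errorf("default_backend %q is not registered", c.DefaultBackend)
	}
	seen := make(map[string]bool)
	for i, r := range c.Repos {
		if r.Path == "" {
			return fmt.Errorf("repos[%d] (%q): path is empty", i, r.Name)
		}
		if seen[r.Name] {
			return fmt.Errorf("repos[%d]: duplicate name %q", i, r.Name)
		}
		seen[r.Name] = true

		// Resolve the backend: the repo's own, else the default.
		backend := r.Backend
		if backend == "" {
			backend = c.DefaultBackend
		}
		if backend == "" {
			return fmt.Errorf("repos[%d] (%q): no backend and no default_backend", i, r.Name)
		}
		if !isRegistered(backend) {
			return fmt.Errorf("repos[%d] (%q): backend %q is not registered", i, r.Name, backend)
		}
		if r.ReindexPolicy != "" && !canonicalPolicies[r.ReindexPolicy] {
			return fmt.Errorf("repos[%d] (%q): invalid reindex_policy %q", i, r.Name, r.ReindexPolicy)
		}
	}
	return nil
}

func main() {
	known := func(name string) bool { return name == "gitnexus" }
	cfg := Config{
		DefaultBackend: "gitnexus",
		Repos: []RepoConfig{
			{Name: "gemba", Path: "."},
			{Name: "gemba", Path: "../other"},
		},
	}
	fmt.Println(validate(cfg, known))
}
```

Including the repo index and name in each message is one way to give the operator enough detail to find the offending entry.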
type Factory ¶
Factory builds a Provider. It returns the constructed provider (or nil) and an error. A factory that returns (nil, ErrNotImplemented) acts as a placeholder — the registry slot is reserved but the backend isn't wired yet.
Factories are invoked lazily at Resolve time, so a binary that never resolves a backend never pays its construction cost. Factories SHOULD be cheap and idempotent — callers resolving the same name multiple times will invoke the factory each time unless they cache the result themselves.
type HealthReport ¶
type HealthReport struct {
Repo RepoRef `json:"repo"`
FetchedAt time.Time `json:"fetched_at"`
StaleAfter time.Duration `json:"stale_after,omitempty"`
CycleCount int `json:"cycle_count,omitempty"`
UntestedModules []string `json:"untested_modules,omitempty"`
GodClasses []SymbolRef `json:"god_classes,omitempty"`
UndocumentedAPIs []SymbolRef `json:"undocumented_apis,omitempty"`
StaleCode []SymbolRef `json:"stale_code,omitempty"`
// Notes carries human-readable backend warnings ("index
// older than expected", "embeddings disabled — health gaps
// estimated"). UI surfaces these alongside the table.
Notes []string `json:"notes,omitempty"`
}
HealthReport is the per-repo health surface Provider.Health returns. Fields are sliced by concern: cycles vs untested modules vs god-classes vs documentation gaps. Empty slices mean "the backend looked and found nothing"; nil slices mean "the backend doesn't surface this dimension". Callers that need to distinguish should check Manifest capability flags.
FetchedAt + StaleAfter together drive the "analysis from 12h ago; reindex recommended" warning the design calls for.
type ImpactDirection ¶
type ImpactDirection string
ImpactDirection picks which side of the dependency graph a Provider.Impact call walks. "upstream" returns callers (what would BREAK if I change this); "downstream" returns callees (what does this depend on); "both" returns the union.
const (
ImpactUpstream ImpactDirection = "upstream"
ImpactDownstream ImpactDirection = "downstream"
ImpactBoth ImpactDirection = "both"
)
func (ImpactDirection) IsValid ¶
func (d ImpactDirection) IsValid() bool
IsValid reports whether d is one of the three canonical directions.
type ImpactReport ¶
type ImpactReport struct {
Target string `json:"target"`
Direction ImpactDirection `json:"direction"`
// Depth is the maximum hop count the report covers. Mirrors
// the gitnexus depth model (1 = direct, 2 = transitive,
// 3+ = far transitive).
Depth int `json:"depth,omitempty"`
Affected []SymbolRef `json:"affected,omitempty"`
Risk RiskLevel `json:"risk,omitempty"`
// Notes carries human-readable provenance ("4 callers in
// `internal/walk`, 1 in `web/src/api`"). UI joins them with
// newlines.
Notes []string `json:"notes,omitempty"`
}
ImpactReport is the blast-radius output of Provider.Impact. Target is echoed back so a caller batching multiple queries can correlate. Direction records which side of the graph was walked. Affected lists the symbols within the requested depth; Risk is a coarse "low/medium/high/critical" classifier the backend computes from fan-in / fan-out / cluster crossings.
type Manifest ¶
type Manifest struct {
// Backend names the registered adapter ("gitnexus",
// "sourcegraph", "codeql", custom string). MUST match the
// key the backend was registered under.
Backend string `json:"backend"`
// Version is the backend implementation's version string.
// Free-form — backends choose their own scheme.
Version string `json:"version,omitempty"`
// IndexedRepos lists every repo the backend currently has
// an index for. Subset of the configured repos when an
// initial index hasn't run yet.
IndexedRepos []RepoRef `json:"indexed_repos,omitempty"`
// Capability flags. Default false; backends opt in as they
// implement each surface. See the per-method docstrings on
// [Provider] for what each flag gates.
SupportsSymbolContext bool `json:"supports_symbol_context,omitempty"`
SupportsImpact bool `json:"supports_impact,omitempty"`
SupportsRouteMap bool `json:"supports_route_map,omitempty"`
SupportsEmbeddings bool `json:"supports_embeddings,omitempty"`
// MaxRepoSize is an advisory cap (bytes) above which the
// backend declines to index. 0 means "no cap declared".
MaxRepoSize int64 `json:"max_repo_size,omitempty"`
}
Manifest is a provider's self-description. Backends fill it in once and return it from Provider.Manifest; consumers inspect the capability flags before issuing optional queries.
Capability flags are advisory: a backend that returns SupportsImpact=false MUST still implement Provider.Impact, but MAY return an empty ImpactReport with a descriptive note. This keeps the interface flat while letting consumers short-circuit when they know a backend can't help.
type Module ¶
type Module struct {
Name string `json:"name"`
Path string `json:"path"`
Summary string `json:"summary,omitempty"`
LOC int `json:"loc,omitempty"`
FileCount int `json:"file_count,omitempty"`
TestCoverage float64 `json:"test_coverage"`
Dependencies []string `json:"dependencies,omitempty"`
}
Module is a code module's typed projection — the unit Provider.Modules returns. Name and Path are required; everything else is best-effort. LOC and FileCount are 0 when the backend hasn't computed them. TestCoverage is in [0, 1] when known and -1 when not measured (callers MUST treat negative values as "unknown", not "0% covered").
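The negative-coverage rule is easy to get wrong in display and filtering code, so here is a small sketch. `coverageLabel` and `knownUntested` are hypothetical helpers, and Module is trimmed to the two fields involved.

```go
package main

import "fmt"

// Trimmed local copy of Module.
type Module struct {
	Name         string
	TestCoverage float64
}

// coverageLabel renders coverage for display, keeping "unknown" distinct
// from "0%".
func coverageLabel(c float64) string {
	if c < 0 {
		return "unknown"
	}
	return fmt.Sprintf("%.0f%%", c*100)
}

// knownUntested returns modules measured at exactly zero coverage. Modules
// with negative (unmeasured) coverage are deliberately excluded.
func knownUntested(mods []Module) []string {
	var out []string
	for _, m := range mods {
		if m.TestCoverage == 0 {
			out = append(out, m.Name)
		}
	}
	return out
}

func main() {
	mods := []Module{
		{Name: "core", TestCoverage: 0.5},
		{Name: "legacy", TestCoverage: 0},
		{Name: "vendor", TestCoverage: -1},
	}
	for _, m := range mods {
		fmt.Printf("%s: %s\n", m.Name, coverageLabel(m.TestCoverage))
	}
	fmt.Println("untested:", knownUntested(mods))
}
```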
type Provider ¶
type Provider interface {
// Manifest returns the backend's self-description. SHOULD
// be cheap (cached) — callers may invoke it on every
// request to gate optional queries.
Manifest(ctx context.Context) (Manifest, error)
// ListRepos returns the configured repos this provider
// will answer queries for. Order is implementation-defined
// but stable across calls within a session.
ListRepos(ctx context.Context) ([]RepoRef, error)
// Reindex (re)builds the backend's index for repo. See
// [ReindexFlags] for the knobs. Returns nil on success;
// returns a descriptive error otherwise. Long-running —
// callers SHOULD pass a context with a generous deadline.
Reindex(ctx context.Context, repo RepoRef, flags ReindexFlags) error
// Modules returns the module inventory for repo. Empty
// slice when the index is empty or the backend does not
// classify code into modules.
Modules(ctx context.Context, repo RepoRef) ([]Module, error)
// Health returns a fresh health report for repo. Backends
// MAY cache internally; callers SHOULD treat the response
// as a snapshot keyed off the backend's last index commit.
Health(ctx context.Context, repo RepoRef) (HealthReport, error)
// Impact returns the blast-radius for target inside repo.
// target is the qualified symbol or file path the backend
// understands; direction picks the side of the graph to
// walk. Backends that report Manifest.SupportsImpact=false
// MUST still implement this; they MAY return an empty
// report with a Notes entry instead of an error.
Impact(ctx context.Context, repo RepoRef, target string, direction ImpactDirection) (ImpactReport, error)
}
Provider is the typed surface every code-analysis backend implements. Methods are READ-MOSTLY; only Provider.Reindex mutates the backend's state, and even that only updates the backend's own index store — never the working tree.
All methods are context-aware so callers can honour deadlines + cancellation when a backend's underlying RPC is slow or hung. Backends MUST respect ctx.Done().
Implementations SHOULD be safe for concurrent reads; they are not required to be safe for concurrent writes — callers MUST serialise Provider.Reindex against itself per RepoRef.
func Resolve ¶
Resolve returns a freshly-constructed Provider for the named backend. Returns ErrUnknownBackend when no factory is registered under name. Returns whatever error the factory returns when name is registered but the factory fails — in particular, the pre-registered gitnexus slot returns ErrNotImplemented until gm-56z replaces it.
type ReindexFlags ¶
type ReindexFlags struct {
Full bool `json:"full,omitempty"`
DryRun bool `json:"dry_run,omitempty"`
WithEmbeddings bool `json:"with_embeddings,omitempty"`
}
ReindexFlags controls a Provider.Reindex call. Full requests a from-scratch rebuild; otherwise the backend MAY run an incremental update against its existing index. DryRun asks the backend to report what it WOULD do without writing anything to its index store. WithEmbeddings opts the rebuild in to embedding generation when the backend supports it.
type ReindexPolicy ¶
type ReindexPolicy string
ReindexPolicy names how a repo's index is refreshed. The four canonical values mirror the design's table; unrecognized values are rejected at config-load time.
const (
// PolicyPostMerge re-indexes after every merge to main. Matches
// the existing GitNexus PostToolUse hook pattern.
PolicyPostMerge ReindexPolicy = "post_merge"
// PolicyScheduled re-indexes on a fixed cadence (e.g. nightly).
PolicyScheduled ReindexPolicy = "scheduled"
// PolicyManual re-indexes only when an operator runs an
// explicit `gemba code-analysis reindex` command.
PolicyManual ReindexPolicy = "manual"
// PolicyOnDemand re-indexes lazily: the consuming persona
// triggers a reindex right before relying on a query.
PolicyOnDemand ReindexPolicy = "on_demand"
)
func (ReindexPolicy) IsValid ¶
func (p ReindexPolicy) IsValid() bool
IsValid reports whether p is one of the four canonical policies. Empty string returns false; callers wanting "use default" semantics should resolve the empty value before calling IsValid.
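A sketch of the "resolve the empty value first" pattern. The IsValid body below is a plausible reconstruction of the documented behaviour, not the package's source, and `effectivePolicy` with its caller-supplied fallback is hypothetical: this page does not document a default policy.

```go
package main

import "fmt"

type ReindexPolicy string

const (
	PolicyPostMerge ReindexPolicy = "post_merge"
	PolicyScheduled ReindexPolicy = "scheduled"
	PolicyManual    ReindexPolicy = "manual"
	PolicyOnDemand  ReindexPolicy = "on_demand"
)

// IsValid reports whether p is one of the four canonical policies.
// The empty string is not valid.
func (p ReindexPolicy) IsValid() bool {
	switch p {
	case PolicyPostMerge, PolicyScheduled, PolicyManual, PolicyOnDemand:
		return true
	}
	return false
}

// effectivePolicy substitutes fallback for the empty value, then validates.
// The fallback is chosen by the caller (hypothetical helper).
func effectivePolicy(p, fallback ReindexPolicy) (ReindexPolicy, error) {
	if p == "" {
		p = fallback
	}
	if !p.IsValid() {
		return "", fmt.Errorf("invalid reindex_policy %q", string(p))
	}
	return p, nil
}

func main() {
	fmt.Println(effectivePolicy("", PolicyManual))
	fmt.Println(effectivePolicy("hourly", PolicyManual))
}
```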
type RepoConfig ¶
type RepoConfig struct {
Name string `toml:"name" json:"name"`
Path string `toml:"path" json:"path"`
Remote string `toml:"remote,omitempty" json:"remote,omitempty"`
Backend string `toml:"backend,omitempty" json:"backend,omitempty"`
ReindexPolicy ReindexPolicy `toml:"reindex_policy,omitempty" json:"reindex_policy,omitempty"`
}
RepoConfig is one row of the `[[repos]]` array. Name is the operator-chosen handle (must be unique within the file). Path is the working-tree path (absolute or workspace-relative — the loader leaves it as written; the caller is responsible for resolving). Backend is optional and falls back to CodeAnalysis.DefaultBackend.
func (RepoConfig) ResolvedBackend ¶
func (r RepoConfig) ResolvedBackend(defaultBackend string) string
ResolvedBackend returns the backend name to use for this repo — its own Backend if set, otherwise the cross-cutting default. Empty string means "no default and none declared", which Validate flags.
type RepoRef ¶
type RepoRef struct {
Name string `json:"name"`
Path string `json:"path"`
Remote string `json:"remote,omitempty"`
}
RepoRef is the repository identity passed across the Provider interface. Name is the operator-chosen handle that matches the `[[repos]]` entry in `code_analysis.toml`. Path is the local working-tree path (absolute or workspace-relative — the loader resolves to absolute). Remote is the canonical upstream URL when known; backends that can attach to remote indexes use it.
type RiskLevel ¶
type RiskLevel string
RiskLevel classifies an ImpactReport's severity. Empty string is "unclassified" — callers MAY treat that as low.
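The RiskLevel constants are not listed on this page, so the sketch below works from the string values named in the ImpactReport description ("low", "medium", "high", "critical") and treats the empty value as low, as permitted above. `riskRank` and `atLeast` are hypothetical helpers for threshold checks.

```go
package main

import "fmt"

type RiskLevel string

// riskRank orders risk levels for comparison. The empty (unclassified)
// value ranks with "low"; unrecognised strings rank below everything.
func riskRank(r RiskLevel) int {
	switch r {
	case "", "low":
		return 0
	case "medium":
		return 1
	case "high":
		return 2
	case "critical":
		return 3
	}
	return -1
}

// atLeast reports whether r is a recognised level at or above threshold.
func atLeast(r, threshold RiskLevel) bool {
	rr, tr := riskRank(r), riskRank(threshold)
	return rr >= 0 && tr >= 0 && rr >= tr
}

func main() {
	fmt.Println(atLeast("critical", "high"))
	fmt.Println(atLeast("", "medium"))
}
```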
type SymbolRef ¶
type SymbolRef struct {
Name string `json:"name"`
File string `json:"file,omitempty"`
Line int `json:"line,omitempty"`
}
SymbolRef is a lightweight pointer to a symbol the backend has surfaced (e.g. a god-class, an undocumented API). The fields are deliberately narrow: Name is the qualified symbol identifier; File / Line locate it in the working tree when known.