codeanalysis

package
v0.0.0-...-13f862e
Warning

This package is not in the latest version of its module.

Published: May 22, 2026 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

Package codeanalysis is the core typed surface for Gemba's agentic code-analysis capability (gm-l1i, design docs/design/code-analysis.md).

A Provider is a READ-ONLY, pluggable knowledge-graph adapter that answers structural questions about one or more workspace repos: "what calls this symbol?", "what's the blast radius of changing this file?", "which modules are untested?". GitNexus is the reference backend; Sourcegraph, CodeQL, and tree-sitter / LSP indexers are anticipated adapters.

This package owns the boundary between configuration on disk and runtime backends.

What this package does NOT own:

  • Any concrete backend implementation. The `gitnexus` factory installed by this package returns ErrNotImplemented until gm-56z replaces it.
  • The four context providers (`code_analysis_summary` / `symbol_context` / `impact_analysis` / `health_report`) — those land in gm-bro and live under `internal/core/promptctx`.
  • Reindex policy execution. Config accepts the four policy strings (`post_merge`, `scheduled`, `manual`, `on_demand`) but does not run them; that's gm-x7n.
  • The HTTP / MCP API surface. Wiring lives elsewhere.

Stability: the Provider interface, Config schema, and registered backend name keys are part of Gemba's public typed surface. Adding methods to Provider is a breaking change for every backend; add capability flags to Manifest instead and gate optional methods behind those.


Constants

const BackendGitNexus = "gitnexus"

BackendGitNexus is the canonical registered name for the GitNexus reference backend. Other adapters claim their own names ("sourcegraph", "codeql", …) when they ship.

Variables

var ErrNotImplemented = errors.New("codeanalysis: backend not implemented")

ErrNotImplemented is returned by registered factory placeholders when a backend has been reserved by name but not yet shipped. In particular the `"gitnexus"` slot returns this until gm-56z installs the real adapter.

Callers MUST treat this as a clean "backend present but not usable yet" signal — distinct from "unknown backend" (see ErrUnknownBackend) which means the registry has no slot for the requested name at all.

var ErrUnknownBackend = errors.New("codeanalysis: unknown backend")

ErrUnknownBackend is returned by Resolve when the requested name was never registered. Callers usually surface this as a configuration error: a `.gemba/code_analysis.toml` referencing a backend the binary doesn't know about.

Functions

func IsRegistered

func IsRegistered(name string) bool

IsRegistered reports whether name has a factory installed. Cheap; safe for hot paths.

func Register

func Register(name string, factory Factory) error

Register installs a factory under name. Returns an error when name is empty, factory is nil, or another factory is already registered under name. Re-registering the SAME factory under the same name is a no-op so init paths can be defensive — we compare the underlying function pointer via reflect.

Callers needing to swap a placeholder for a real backend (the canonical gm-56z replacement of the gitnexus stub) MUST call Replace instead.

func RegisteredBackends

func RegisteredBackends() []string

RegisteredBackends returns the list of registered backend names in ascending string order. Useful for diagnostics ("which backends does this binary know about?") and for validating a config's `default_backend` against the live registry.

func Replace

func Replace(name string, factory Factory) error

Replace installs factory under name, overwriting any prior entry. Used by gm-56z to swap the gitnexus placeholder for the real adapter at binary-init time. Returns an error when name is empty or factory is nil.

Replace is intentionally narrow — it does NOT compare against the prior factory. Callers SHOULD only use it when they know they're replacing a placeholder.
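
The Register / Replace contract described above can be sketched with a small in-memory registry. Everything here is an assumption made for illustration: the registry struct, its mutex, the error messages, and the trimmed Provider type are hypothetical stand-ins, and the real implementation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
	"sort"
	"sync"
)

// Provider and Factory are trimmed stand-ins for the documented types.
type Provider interface{}
type Factory func() (Provider, error)

// registry is a hypothetical in-memory store guarded by a mutex.
type registry struct {
	mu        sync.RWMutex
	factories map[string]Factory
}

func newRegistry() *registry {
	return &registry{factories: make(map[string]Factory)}
}

// sameFunc compares code pointers via reflect. Closures created from
// one func literal share a code pointer, so the check is coarse.
func sameFunc(a, b Factory) bool {
	return reflect.ValueOf(a).Pointer() == reflect.ValueOf(b).Pointer()
}

// Register refuses to overwrite, except for an identical factory.
func (r *registry) Register(name string, f Factory) error {
	if name == "" {
		return errors.New("register: empty name")
	}
	if f == nil {
		return errors.New("register: nil factory")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if prev, ok := r.factories[name]; ok {
		if sameFunc(prev, f) {
			return nil // defensive re-registration is a no-op
		}
		return fmt.Errorf("register: %q already registered", name)
	}
	r.factories[name] = f
	return nil
}

// Replace overwrites unconditionally; it never inspects the old entry.
func (r *registry) Replace(name string, f Factory) error {
	if name == "" {
		return errors.New("replace: empty name")
	}
	if f == nil {
		return errors.New("replace: nil factory")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.factories[name] = f
	return nil
}

// Names returns the registered names in ascending string order.
func (r *registry) Names() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	names := make([]string, 0, len(r.factories))
	for n := range r.factories {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func placeholder() (Provider, error) { return nil, errors.New("not implemented") }
func realBackend() (Provider, error) { return "real backend", nil }

func main() {
	r := newRegistry()
	fmt.Println("first register:", r.Register("gitnexus", placeholder))
	fmt.Println("same factory again:", r.Register("gitnexus", placeholder))
	fmt.Println("different factory:", r.Register("gitnexus", realBackend))
	fmt.Println("replace:", r.Replace("gitnexus", realBackend))
	fmt.Println("names:", r.Names())
}
```

Keeping the overwrite path in a separate function means an accidental double registration fails loudly, while a deliberate placeholder swap stays a one-line call.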

Types

type BackendConfig

type BackendConfig map[string]any

BackendConfig is the open key/value bag for a single backend's tuning ("embeddings = false", "max_repo_size_mb = 500"). The loader does NOT validate the contents; each backend's factory inspects what it cares about.
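
Because the loader leaves the bag unvalidated, a factory has to type-check whatever it reads. A minimal sketch of that follows; boolOption and intOption are hypothetical helpers, and the assumption that integers arrive as int64 (or int) depends on the TOML decoder in use.

```go
package main

import "fmt"

// BackendConfig mirrors the documented type.
type BackendConfig map[string]any

// boolOption returns cfg[key] as a bool, or def when the key is absent.
// A present key of the wrong type is reported as an error.
func boolOption(cfg BackendConfig, key string, def bool) (bool, error) {
	v, ok := cfg[key]
	if !ok {
		return def, nil
	}
	b, ok := v.(bool)
	if !ok {
		return def, fmt.Errorf("%s: want bool, got %T", key, v)
	}
	return b, nil
}

// intOption accepts the integer types a decoder is likely to produce
// (an assumption; check the decoder actually in use).
func intOption(cfg BackendConfig, key string, def int64) (int64, error) {
	v, ok := cfg[key]
	if !ok {
		return def, nil
	}
	switch n := v.(type) {
	case int64:
		return n, nil
	case int:
		return int64(n), nil
	default:
		return def, fmt.Errorf("%s: want integer, got %T", key, v)
	}
}

func main() {
	cfg := BackendConfig{"embeddings": false, "max_repo_size_mb": int64(500)}

	emb, err := boolOption(cfg, "embeddings", true)
	fmt.Println(emb, err)

	size, err := intOption(cfg, "max_repo_size_mb", 0)
	fmt.Println(size, err)

	// A mistyped value surfaces as an error instead of a silent default.
	_, err = boolOption(BackendConfig{"embeddings": "no"}, "embeddings", true)
	fmt.Println(err)
}
```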

type CodeAnalysisSection

type CodeAnalysisSection struct {
	DefaultBackend string `toml:"default_backend" json:"default_backend"`
}

CodeAnalysisSection holds top-level cross-cutting settings. DefaultBackend is the backend used by any repo that does not declare its own; it MUST match a registered backend name.

type Config

type Config struct {
	CodeAnalysis CodeAnalysisSection      `toml:"code_analysis" json:"code_analysis"`
	Repos        []RepoConfig             `toml:"repos" json:"repos"`
	Backend      map[string]BackendConfig `toml:"backend" json:"backend,omitempty"`
}

Config is the parsed form of `code_analysis.toml`. The top-level CodeAnalysis section carries cross-repo defaults; Repos is the per-repo list. Backend-specific knobs live in Backend, a map keyed by backend name whose values are free-form BackendConfig bags, so adapters can extract their own settings without coupling to this package.

func DecodeConfig

func DecodeConfig(raw []byte) (*Config, error)

DecodeConfig parses raw TOML bytes into a Config, applies defaults, and validates. Split out from LoadConfig so tests can exercise the parsing path without touching disk.

func LoadConfig

func LoadConfig(path string) (*Config, error)

LoadConfig reads and parses path. Returns an empty (but valid) Config when path does not exist — workspaces opt in to code-analysis by authoring the file, and a missing file is not an error. Other I/O errors propagate.

func (*Config) RepoByName

func (c *Config) RepoByName(name string) (RepoConfig, bool)

RepoByName returns the configured repo with the matching name. The bool is false when no such repo is configured. Linear scan; the typical config has a handful of repos so no map index is warranted.

func (*Config) Validate

func (c *Config) Validate() error

Validate checks the parsed config for the four classes of error gm-1ak's loader is responsible for catching:

  1. default_backend names a backend the binary knows about.
  2. every repo declares a non-empty path.
  3. repo names are unique within the file.
  4. each repo's resolved backend is registered, and its reindex_policy (when set) is one of the four canonical values.

Returns the FIRST validation error encountered with enough detail to point the operator at the offending line.
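
The four checks can be sketched as a single pass over the config. This is an illustration under stated assumptions, not the package's Validate: the types are flattened stand-ins, the registry lookup is injected as a function, the checks are interleaved per repo rather than grouped by class, and the repo names and paths in main are made up.

```go
package main

import "fmt"

// RepoConfig and Config are flattened stand-ins for the documented types.
type RepoConfig struct {
	Name, Path, Backend, ReindexPolicy string
}

type Config struct {
	DefaultBackend string
	Repos          []RepoConfig
}

var validPolicies = map[string]bool{
	"post_merge": true, "scheduled": true, "manual": true, "on_demand": true,
}

// validate returns the first problem it finds. isRegistered stands in
// for a lookup against the live backend registry.
func validate(c Config, isRegistered func(string) bool) error {
	// Assumption: an empty default is tolerated as long as every repo
	// names its own backend.
	if c.DefaultBackend != "" && !isRegistered(c.DefaultBackend) {
		return fmt.Errorf("default_backend %q is not registered", c.DefaultBackend)
	}
	seen := make(map[string]bool)
	for i, r := range c.Repos {
		if r.Path == "" {
			return fmt.Errorf("repos[%d] (%q): path is empty", i, r.Name)
		}
		if seen[r.Name] {
			return fmt.Errorf("repos[%d]: duplicate name %q", i, r.Name)
		}
		seen[r.Name] = true

		backend := r.Backend
		if backend == "" {
			backend = c.DefaultBackend
		}
		if backend == "" {
			return fmt.Errorf("repos[%d] (%q): no backend and no default", i, r.Name)
		}
		if !isRegistered(backend) {
			return fmt.Errorf("repos[%d] (%q): backend %q is not registered", i, r.Name, backend)
		}
		if r.ReindexPolicy != "" && !validPolicies[r.ReindexPolicy] {
			return fmt.Errorf("repos[%d] (%q): invalid reindex_policy %q", i, r.Name, r.ReindexPolicy)
		}
	}
	return nil
}

func main() {
	registered := func(name string) bool { return name == "gitnexus" }
	cfg := Config{
		DefaultBackend: "gitnexus",
		Repos: []RepoConfig{
			{Name: "api", Path: "services/api", ReindexPolicy: "post_merge"},
			{Name: "web", Path: "web", Backend: "sourcegraph"},
		},
	}
	fmt.Println(validate(cfg, registered))
}
```

Including the repo index and name in each message is one way to give the operator enough detail to find the offending entry.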

type Factory

type Factory func() (Provider, error)

Factory builds a Provider. It returns the constructed provider (or nil) and an error. A factory that returns (nil, ErrNotImplemented) acts as a placeholder — the registry slot is reserved but the backend isn't wired yet.

Factories are invoked lazily at Resolve time, so a binary that never resolves a backend never pays its construction cost. Factories SHOULD be cheap and idempotent — callers resolving the same name multiple times will invoke the factory each time unless they cache the result themselves.
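
Since each resolution invokes the factory again, a caller that wants one provider per backend has to cache it. One possible shape is sketched below; cachedResolver and fakeProvider are hypothetical, and the resolve function is injected so the example does not depend on the real registry.

```go
package main

import (
	"fmt"
	"sync"
)

// Provider is a trimmed stand-in for the documented interface.
type Provider interface{ Backend() string }

type fakeProvider string

func (f fakeProvider) Backend() string { return string(f) }

// cachedResolver is a hypothetical wrapper that remembers the first
// successful result per backend name.
type cachedResolver struct {
	mu      sync.Mutex
	resolve func(name string) (Provider, error)
	cache   map[string]Provider
}

func newCachedResolver(resolve func(string) (Provider, error)) *cachedResolver {
	return &cachedResolver{resolve: resolve, cache: make(map[string]Provider)}
}

// Get returns the cached provider or builds one. Failures are not
// cached, so a placeholder slot is retried on the next call.
func (c *cachedResolver) Get(name string) (Provider, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if p, ok := c.cache[name]; ok {
		return p, nil
	}
	p, err := c.resolve(name)
	if err != nil {
		return nil, err
	}
	c.cache[name] = p
	return p, nil
}

func main() {
	calls := 0
	c := newCachedResolver(func(name string) (Provider, error) {
		calls++
		return fakeProvider(name), nil
	})
	for i := 0; i < 3; i++ {
		if _, err := c.Get("gitnexus"); err != nil {
			fmt.Println("resolve failed:", err)
			return
		}
	}
	fmt.Println("factory invocations:", calls)
}
```

Holding the mutex across the resolve call serialises construction of all backends; that is acceptable when factories are cheap, as the text above recommends.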

type HealthReport

type HealthReport struct {
	Repo             RepoRef       `json:"repo"`
	FetchedAt        time.Time     `json:"fetched_at"`
	StaleAfter       time.Duration `json:"stale_after,omitempty"`
	CycleCount       int           `json:"cycle_count,omitempty"`
	UntestedModules  []string      `json:"untested_modules,omitempty"`
	GodClasses       []SymbolRef   `json:"god_classes,omitempty"`
	UndocumentedAPIs []SymbolRef   `json:"undocumented_apis,omitempty"`
	StaleCode        []SymbolRef   `json:"stale_code,omitempty"`
	// Notes carries human-readable backend warnings ("index
	// older than expected", "embeddings disabled — health gaps
	// estimated"). UI surfaces these alongside the table.
	Notes []string `json:"notes,omitempty"`
}

HealthReport is the per-repo health surface Provider.Health returns. Fields are sliced by concern: cycles vs untested modules vs god-classes vs documentation gaps. Empty slices mean "the backend looked and found nothing"; nil slices mean "the backend doesn't surface this dimension". Callers that need to distinguish should check Manifest capability flags.

FetchedAt + StaleAfter together drive the "analysis from 12h ago; reindex recommended" warning the design calls for.

func (HealthReport) IsStale

func (h HealthReport) IsStale(now time.Time) bool

IsStale reports whether the report's age has exceeded its declared StaleAfter. Returns false when StaleAfter is zero (the backend declined to express a freshness window).

type ImpactDirection

type ImpactDirection string

ImpactDirection picks which side of the dependency graph a Provider.Impact call walks. "upstream" returns callers (what would BREAK if I change this); "downstream" returns callees (what does this depend on); "both" returns the union.

const (
	ImpactUpstream   ImpactDirection = "upstream"
	ImpactDownstream ImpactDirection = "downstream"
	ImpactBoth       ImpactDirection = "both"
)

func (ImpactDirection) IsValid

func (d ImpactDirection) IsValid() bool

IsValid reports whether d is one of the three canonical directions.

type ImpactReport

type ImpactReport struct {
	Target    string          `json:"target"`
	Direction ImpactDirection `json:"direction"`
	// Depth is the maximum hop count the report covers. Mirrors
	// the gitnexus depth model (1 = direct, 2 = transitive,
	// 3+ = far transitive).
	Depth    int         `json:"depth,omitempty"`
	Affected []SymbolRef `json:"affected,omitempty"`
	Risk     RiskLevel   `json:"risk,omitempty"`
	// Notes carries human-readable provenance ("4 callers in
	// `internal/walk`, 1 in `web/src/api`"). UI joins them with
	// newlines.
	Notes []string `json:"notes,omitempty"`
}

ImpactReport is the blast-radius output of Provider.Impact. Target is echoed back so a caller batching multiple queries can correlate. Direction records which side of the graph was walked. Affected lists the symbols within the requested depth; Risk is a coarse "low/medium/high/critical" classifier the backend computes from fan-in / fan-out / cluster crossings.

type Manifest

type Manifest struct {
	// Backend names the registered adapter ("gitnexus",
	// "sourcegraph", "codeql", custom string). MUST match the
	// key the backend was registered under.
	Backend string `json:"backend"`
	// Version is the backend implementation's version string.
	// Free-form — backends choose their own scheme.
	Version string `json:"version,omitempty"`
	// IndexedRepos lists every repo the backend currently has
	// an index for. Subset of the configured repos when an
	// initial index hasn't run yet.
	IndexedRepos []RepoRef `json:"indexed_repos,omitempty"`
	// Capability flags. Default false; backends opt in as they
	// implement each surface. See the per-method docstrings on
	// [Provider] for what each flag gates.
	SupportsSymbolContext bool `json:"supports_symbol_context,omitempty"`
	SupportsImpact        bool `json:"supports_impact,omitempty"`
	SupportsRouteMap      bool `json:"supports_route_map,omitempty"`
	SupportsEmbeddings    bool `json:"supports_embeddings,omitempty"`
	// MaxRepoSize is an advisory cap (bytes) above which the
	// backend declines to index. 0 means "no cap declared".
	MaxRepoSize int64 `json:"max_repo_size,omitempty"`
}

Manifest is a provider's self-description. Backends fill it in once and return it from Provider.Manifest; consumers inspect the capability flags before issuing optional queries.

Capability flags are advisory: a backend that returns SupportsImpact=false MUST still implement Provider.Impact, but MAY return an empty ImpactReport with a descriptive note. This keeps the interface flat while letting consumers short-circuit when they know a backend can't help.

type Module

type Module struct {
	Name         string   `json:"name"`
	Path         string   `json:"path"`
	Summary      string   `json:"summary,omitempty"`
	LOC          int      `json:"loc,omitempty"`
	FileCount    int      `json:"file_count,omitempty"`
	TestCoverage float64  `json:"test_coverage"`
	Dependencies []string `json:"dependencies,omitempty"`
}

Module is a code module's typed projection — the unit Provider.Modules returns. Name and Path are required; everything else is best-effort. LOC and FileCount are 0 when the backend hasn't computed them. TestCoverage is in [0, 1] when known and -1 when not measured (callers MUST treat negative values as "unknown", not "0% covered").
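
The sentinel handling for TestCoverage is easy to get wrong in display code, so here is a small sketch. coverageLabel is a hypothetical helper; the only rule taken from the text above is that negative values mean "unknown".

```go
package main

import "fmt"

// coverageLabel renders a Module.TestCoverage value for display.
// Negative values are the "not measured" sentinel, not 0% coverage.
func coverageLabel(c float64) string {
	if c < 0 {
		return "unknown"
	}
	return fmt.Sprintf("%.0f%%", c*100)
}

func main() {
	for _, c := range []float64{-1, 0, 0.5, 1} {
		fmt.Printf("%5.2f -> %s\n", c, coverageLabel(c))
	}
}
```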

type Provider

type Provider interface {
	// Manifest returns the backend's self-description. SHOULD
	// be cheap (cached) — callers may invoke it on every
	// request to gate optional queries.
	Manifest(ctx context.Context) (Manifest, error)

	// ListRepos returns the configured repos this provider
	// will answer queries for. Order is implementation-defined
	// but stable across calls within a session.
	ListRepos(ctx context.Context) ([]RepoRef, error)

	// Reindex (re)builds the backend's index for repo. See
	// [ReindexFlags] for the knobs. Returns nil on success;
	// returns a descriptive error otherwise. Long-running —
	// callers SHOULD pass a context with a generous deadline.
	Reindex(ctx context.Context, repo RepoRef, flags ReindexFlags) error

	// Modules returns the module inventory for repo. Empty
	// slice when the index is empty or the backend does not
	// classify code into modules.
	Modules(ctx context.Context, repo RepoRef) ([]Module, error)

	// Health returns a fresh health report for repo. Backends
	// MAY cache internally; callers SHOULD treat the response
	// as a snapshot keyed off the backend's last index commit.
	Health(ctx context.Context, repo RepoRef) (HealthReport, error)

	// Impact returns the blast-radius for target inside repo.
	// target is the qualified symbol or file path the backend
	// understands; direction picks the side of the graph to
	// walk. Backends gated on Manifest.SupportsImpact=false
	// MUST still implement this — they MAY return an empty
	// report with a Notes entry instead of an error.
	Impact(ctx context.Context, repo RepoRef, target string, direction ImpactDirection) (ImpactReport, error)
}

Provider is the typed surface every code-analysis backend implements. Methods are READ-MOSTLY; only Provider.Reindex mutates the backend's state, and even that only updates the backend's own index store — never the working tree.

All methods are context-aware so callers can honour deadlines + cancellation when a backend's underlying RPC is slow or hung. Backends MUST respect ctx.Done().

Implementations SHOULD be safe for concurrent reads; they are not required to be safe for concurrent writes — callers MUST serialise Provider.Reindex against itself per RepoRef.
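
One way a caller could serialise reindex calls per repo is a keyed mutex, sketched below. reindexGate is hypothetical, keying by the repo's Name is an assumption, and the function passed to Do stands in for a real Provider.Reindex call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// reindexGate hands out one mutex per repo name so that work for the
// same repo never overlaps while different repos proceed in parallel.
type reindexGate struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newReindexGate() *reindexGate {
	return &reindexGate{locks: make(map[string]*sync.Mutex)}
}

func (g *reindexGate) lockFor(repo string) *sync.Mutex {
	g.mu.Lock()
	defer g.mu.Unlock()
	l, ok := g.locks[repo]
	if !ok {
		l = &sync.Mutex{}
		g.locks[repo] = l
	}
	return l
}

// Do runs fn while holding the lock for repo.
func (g *reindexGate) Do(repo string, fn func() error) error {
	l := g.lockFor(repo)
	l.Lock()
	defer l.Unlock()
	return fn()
}

// maxOverlap launches n concurrent Do calls for repo and reports the
// peak number that were inside fn at the same time.
func maxOverlap(g *reindexGate, repo string, n int) int32 {
	var active, peak int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = g.Do(repo, func() error {
				cur := atomic.AddInt32(&active, 1)
				if cur > atomic.LoadInt32(&peak) {
					atomic.StoreInt32(&peak, cur)
				}
				time.Sleep(time.Millisecond) // stand-in for reindex work
				atomic.AddInt32(&active, -1)
				return nil
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	g := newReindexGate()
	fmt.Println("peak concurrent reindexes for one repo:", maxOverlap(g, "api", 8))
}
```

The map of locks only grows in this sketch; a long-lived process with many short-lived repos would want to prune it.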

func Resolve

func Resolve(name string) (Provider, error)

Resolve returns a freshly-constructed Provider for the named backend. Returns ErrUnknownBackend when no factory is registered under name. Returns whatever error the factory returns when name is registered but the factory fails — in particular, the pre-registered gitnexus slot returns ErrNotImplemented until gm-56z replaces it.

type ReindexFlags

type ReindexFlags struct {
	Full           bool `json:"full,omitempty"`
	DryRun         bool `json:"dry_run,omitempty"`
	WithEmbeddings bool `json:"with_embeddings,omitempty"`
}

ReindexFlags controls a Provider.Reindex call. Full requests a from-scratch rebuild; otherwise the backend MAY run an incremental update against its existing index. DryRun asks the backend to report what it WOULD do without writing anything to its index store. WithEmbeddings opts the rebuild in to embedding generation when the backend supports it.

type ReindexPolicy

type ReindexPolicy string

ReindexPolicy names how a repo's index is refreshed. The four canonical values mirror the design's table; unrecognized values are rejected at config-load time.

const (
	// PolicyPostMerge re-indexes after every merge to main. Matches
	// the existing GitNexus PostToolUse hook pattern.
	PolicyPostMerge ReindexPolicy = "post_merge"
	// PolicyScheduled re-indexes on a fixed cadence (e.g. nightly).
	PolicyScheduled ReindexPolicy = "scheduled"
	// PolicyManual re-indexes only when an operator runs an
	// explicit `gemba code-analysis reindex` command.
	PolicyManual ReindexPolicy = "manual"
	// PolicyOnDemand re-indexes lazily — the consuming persona
	// triggers a reindex right before relying on a query.
	PolicyOnDemand ReindexPolicy = "on_demand"
)

func (ReindexPolicy) IsValid

func (p ReindexPolicy) IsValid() bool

IsValid reports whether p is one of the four canonical policies. Empty string returns false; callers wanting "use default" semantics should resolve the empty value before calling IsValid.
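
Resolving the empty value before validating could look like this sketch. The type and constants mirror the ones documented above; resolvePolicy and the choice of fallback are hypothetical, since the text does not say which policy acts as the default.

```go
package main

import "fmt"

// ReindexPolicy and its constants mirror the documented declarations.
type ReindexPolicy string

const (
	PolicyPostMerge ReindexPolicy = "post_merge"
	PolicyScheduled ReindexPolicy = "scheduled"
	PolicyManual    ReindexPolicy = "manual"
	PolicyOnDemand  ReindexPolicy = "on_demand"
)

// IsValid is a local reimplementation; empty string is not valid.
func (p ReindexPolicy) IsValid() bool {
	switch p {
	case PolicyPostMerge, PolicyScheduled, PolicyManual, PolicyOnDemand:
		return true
	}
	return false
}

// resolvePolicy substitutes a caller-chosen fallback for the empty
// value and only then validates.
func resolvePolicy(p, fallback ReindexPolicy) (ReindexPolicy, error) {
	if p == "" {
		p = fallback
	}
	if !p.IsValid() {
		return "", fmt.Errorf("invalid reindex_policy %q", string(p))
	}
	return p, nil
}

func main() {
	fmt.Println(resolvePolicy("", PolicyManual))
	fmt.Println(resolvePolicy("scheduled", PolicyManual))
	fmt.Println(resolvePolicy("hourly", PolicyManual))
}
```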

type RepoConfig

type RepoConfig struct {
	Name          string        `toml:"name" json:"name"`
	Path          string        `toml:"path" json:"path"`
	Remote        string        `toml:"remote,omitempty" json:"remote,omitempty"`
	Backend       string        `toml:"backend,omitempty" json:"backend,omitempty"`
	ReindexPolicy ReindexPolicy `toml:"reindex_policy,omitempty" json:"reindex_policy,omitempty"`
}

RepoConfig is one row of the `[[repos]]` array. Name is the operator-chosen handle (must be unique within the file). Path is the working-tree path (absolute or workspace-relative — the loader leaves it as written; the caller is responsible for resolving). Backend is optional and falls back to CodeAnalysis.DefaultBackend.

func (RepoConfig) ResolvedBackend

func (r RepoConfig) ResolvedBackend(defaultBackend string) string

ResolvedBackend returns the backend name to use for this repo — its own Backend if set, otherwise the cross-cutting default. Empty string means "no default and none declared", which Validate flags.

func (RepoConfig) ToRepoRef

func (r RepoConfig) ToRepoRef() RepoRef

ToRepoRef projects a parsed RepoConfig to a RepoRef for passing across the Provider interface.

type RepoRef

type RepoRef struct {
	Name   string `json:"name"`
	Path   string `json:"path"`
	Remote string `json:"remote,omitempty"`
}

RepoRef is the repository identity passed across the Provider interface. Name is the operator-chosen handle that matches the `[[repos]]` entry in `code_analysis.toml`. Path is the local working-tree path (absolute or workspace-relative — the loader resolves to absolute). Remote is the canonical upstream URL when known; backends that can attach to remote indexes use it.

type RiskLevel

type RiskLevel string

RiskLevel classifies an ImpactReport's severity. Empty string is "unclassified" — callers MAY treat that as low.

const (
	RiskLow      RiskLevel = "low"
	RiskMedium   RiskLevel = "medium"
	RiskHigh     RiskLevel = "high"
	RiskCritical RiskLevel = "critical"
)
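
Because RiskLevel is a string, comparing severities needs an explicit ordering. The sketch below is one option: riskRank and atLeast are hypothetical helpers, the empty value is ranked as low (which the text above permits), and ranking unrecognised strings as low is an additional assumption.

```go
package main

import "fmt"

// RiskLevel and its constants mirror the documented declarations.
type RiskLevel string

const (
	RiskLow      RiskLevel = "low"
	RiskMedium   RiskLevel = "medium"
	RiskHigh     RiskLevel = "high"
	RiskCritical RiskLevel = "critical"
)

// riskRank maps a level onto an integer so levels can be compared.
// Unclassified ("") and unrecognised values rank as low.
func riskRank(r RiskLevel) int {
	switch r {
	case RiskMedium:
		return 1
	case RiskHigh:
		return 2
	case RiskCritical:
		return 3
	default:
		return 0
	}
}

// atLeast reports whether r is at or above threshold.
func atLeast(r, threshold RiskLevel) bool {
	return riskRank(r) >= riskRank(threshold)
}

func main() {
	for _, r := range []RiskLevel{"", RiskLow, RiskMedium, RiskHigh, RiskCritical} {
		fmt.Printf("%-10q needs review: %v\n", string(r), atLeast(r, RiskHigh))
	}
}
```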

type SymbolRef

type SymbolRef struct {
	Name string `json:"name"`
	File string `json:"file,omitempty"`
	Line int    `json:"line,omitempty"`
}

SymbolRef is a lightweight pointer to a symbol the backend has surfaced (e.g. a god-class, an undocumented API). The fields are deliberately narrow: Name is the qualified symbol identifier; File / Line locate it in the working tree when known.
