analyzer

package
v0.4.2
Warning

This package is not in the latest version of its module.

Published: May 14, 2026 License: MIT Imports: 25 Imported by: 0

Documentation


Constants

const DefaultBatchSize = 500

DefaultBatchSize matches the Java side's tuned default (recorded as a gotcha in CLAUDE.md).

Variables

var DefaultExcludeDirs = map[string]bool{
	"node_modules": true, "build": true, "target": true, "dist": true,
	"out": true, "bin": true, "obj": true,
	".git": true, ".svn": true, ".idea": true, ".vscode": true,
	".eclipse": true, ".settings": true,
	"__pycache__": true, "venv": true, ".venv": true, ".tox": true,
	".mypy_cache": true, ".pytest_cache": true, ".eggs": true,
	".gradle": true, ".mvn": true,
	"bower_components": true, ".next": true, ".nuxt": true, "coverage": true,
	".nyc_output": true, ".parcel-cache": true, ".turbo": true, ".cache": true,
	"vendor":  true,
	".codeiq": true,
}

DefaultExcludeDirs mirrors the Java FileDiscovery.DEFAULT_EXCLUDES set.

Functions

This section is empty.

Types

type Analyzer

type Analyzer struct {
	// contains filtered or unexported fields
}

Analyzer orchestrates the index pipeline.

func NewAnalyzer

func NewAnalyzer(opts Options) *Analyzer

NewAnalyzer returns an analyzer wired to opts.

func (*Analyzer) Run

func (a *Analyzer) Run(root string) (Stats, error)

Run executes FileDiscovery → parse → detectors → GraphBuilder → cache writes and returns aggregate stats. Errors from individual file processing are logged to stderr but do not stop the run — partial output is better than no output (matches Java's per-file try/catch behaviour).

type DiscoveredFile

type DiscoveredFile struct {
	AbsPath  string
	RelPath  string // forward-slash, relative to root
	Language parser.Language
	Ext      string
}

DiscoveredFile is one file discovered for analysis.

type EnrichOptions

type EnrichOptions struct {
	// GraphDir overrides the Kuzu output directory. When "", the default
	// `<root>/.codeiq/graph/codeiq.kuzu` is used.
	GraphDir string
	// StoreBufferPoolBytes caps Kuzu's buffer pool. Zero -> graph package
	// default (2 GiB).
	StoreBufferPoolBytes uint64
	// StoreCopyThreads caps Kuzu COPY FROM parallelism. Zero -> graph
	// package default (min(4, GOMAXPROCS)).
	StoreCopyThreads uint64
}

EnrichOptions configures Enrich. The zero value is usable; GraphDir defaults to `<root>/.codeiq/graph/codeiq.kuzu` when empty.

type EnrichSummary

type EnrichSummary struct {
	Nodes    int
	Edges    int
	Services int
}

EnrichSummary reports per-run counters from a successful Enrich.

func Enrich

func Enrich(root string, c *cache.Cache, opts EnrichOptions) (EnrichSummary, error)

Enrich loads the SQLite cache for `root`, runs the linker / classifier / lexical / language-extractor / service-detector passes, bulk-loads the resulting graph into Kuzu, and creates the FTS-equivalent indexes. The returned summary reports total nodes / edges / service nodes after every pass has run.

Mirrors the `enrich` pipeline in Java (Analyzer.java + GraphStore.java). The pipeline order matches the Java side exactly:

  1. Linkers (TopicLinker, EntityLinker, ModuleContainmentLinker)
  2. LayerClassifier
  3. LexicalEnricher (doc comments + config keys)
  4. LanguageEnricher (Java, TypeScript, Python, Go extractors)
  5. ServiceDetector (filesystem walk for build files)
  6. graph.Store.BulkLoadNodes / BulkLoadEdges / CreateIndexes

All steps are deterministic — repeated calls against the same cache + root produce identical Kuzu output.

type FileDiscovery

type FileDiscovery struct{}

FileDiscovery walks a repo and emits language-tagged files. It runs `git ls-files -co --exclude-standard` first and falls back to a filesystem walk.

func NewFileDiscovery

func NewFileDiscovery() *FileDiscovery

NewFileDiscovery returns a discovery instance.

func (*FileDiscovery) Discover

func (d *FileDiscovery) Discover(root string) ([]DiscoveredFile, error)

Discover walks root and returns files sorted by RelPath.

type GraphBuilder

type GraphBuilder struct {
	// contains filtered or unexported fields
}

GraphBuilder buffers detector results across batches. It is safe for concurrent use.

Phase 1 (plan §1.1, §1.2):

  • Nodes are deduped by ID via mergeNode (confidence-aware).
  • Edges are deduped by canonical (source, target, kind) key via mergeEdge.

Snapshot() produces a deterministic sorted view with phantom edges (those whose endpoint is still missing) dropped, and exposes the dedup/drop counts so the CLI can surface "deduped N, dropped K" diagnostics.
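The dedup-then-snapshot flow described above can be sketched with two maps behind a mutex. The `node`, `edge`, and `builder` types are simplified, hypothetical stand-ins for model.CodeNode, model.CodeEdge, and GraphBuilder. "Keep the higher-confidence copy" is an assumed merge rule, since the real mergeNode logic is not shown here. Like the documented Snapshot, `snapshot` is single-shot and nils its maps.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type node struct {
	ID         string
	Confidence float64
}

type edge struct{ Source, Target, Kind string }

type builder struct {
	mu     sync.Mutex
	nodes  map[string]*node
	edges  map[edge]bool // the comparable struct is the canonical key
	dedupN int
	dedupE int
}

func newBuilder() *builder {
	return &builder{nodes: map[string]*node{}, edges: map[edge]bool{}}
}

// add merges nodes by ID (keeping the higher-confidence copy) and edges by
// (source, target, kind), counting every collision.
func (b *builder) add(ns []*node, es []edge) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, n := range ns {
		if old, ok := b.nodes[n.ID]; ok {
			b.dedupN++
			if n.Confidence > old.Confidence {
				b.nodes[n.ID] = n
			}
			continue
		}
		b.nodes[n.ID] = n
	}
	for _, e := range es {
		if b.edges[e] {
			b.dedupE++
			continue
		}
		b.edges[e] = true
	}
}

// snapshot returns sorted nodes and edges, dropping phantom edges whose
// source or target is missing, then releases both maps.
func (b *builder) snapshot() (ns []*node, es []edge, dedupN, dedupE, dropped int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, n := range b.nodes {
		ns = append(ns, n)
	}
	sort.Slice(ns, func(i, j int) bool { return ns[i].ID < ns[j].ID })
	for e := range b.edges {
		if b.nodes[e.Source] == nil || b.nodes[e.Target] == nil {
			dropped++
			continue
		}
		es = append(es, e)
	}
	sort.Slice(es, func(i, j int) bool {
		a, c := es[i], es[j]
		if a.Source != c.Source {
			return a.Source < c.Source
		}
		if a.Target != c.Target {
			return a.Target < c.Target
		}
		return a.Kind < c.Kind
	})
	dedupN, dedupE = b.dedupN, b.dedupE
	b.nodes, b.edges = nil, nil
	return
}

func main() {
	b := newBuilder()
	b.add([]*node{{"A", 0.5}, {"B", 1}}, []edge{{"A", "B", "CALLS"}})
	b.add([]*node{{"A", 0.9}}, []edge{{"A", "B", "CALLS"}, {"A", "Z", "CALLS"}})
	ns, es, dn, de, dr := b.snapshot()
	fmt.Println(len(ns), ns[0].Confidence, len(es), dn, de, dr) // 2 0.9 1 1 1 1
}
```

Sorting both slices is what makes the output independent of Go's randomized map iteration order.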

func NewGraphBuilder

func NewGraphBuilder() *GraphBuilder

NewGraphBuilder returns an empty builder.

func (*GraphBuilder) Add

func (b *GraphBuilder) Add(r *detector.Result)

Add merges a detector result. Duplicate node IDs and duplicate edge (source, target, kind) tuples collapse with confidence-aware merging.

func (*GraphBuilder) Snapshot

func (b *GraphBuilder) Snapshot() Snapshot

Snapshot returns the current state as a sorted, dangling-edge-free Snapshot with surfaced dedup/drop counts.

After this call returns, the builder's internal dedup maps are cleared (set to nil). At ~/projects/ scale this releases roughly 280 MB of reference pressure: the downstream enrich pipeline holds the returned Snapshot slices for the lifetime of the calling function, and keeping the dedup maps alive alongside them was the largest in-memory duplication in the pipeline. Snapshot is therefore single-shot: subsequent calls to Snapshot or Add on the same builder are not supported.

type LayerClassifier

type LayerClassifier struct{}

LayerClassifier assigns a Layer value to every CodeNode based on (kind, framework, file_path) heuristics. Pure, deterministic, first-match wins. Priority order mirrors LayerClassifier.java:

  1. Node kind (frontend / backend / infra)
  2. Language (infra)
  3. File extension + path
  4. Framework
  5. Shared node kinds
  6. Fallback package/path heuristics + Java src/main convention
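"First match wins" over an ordered rule list can be sketched as a slice of predicate functions. The `codeNode` type, node kinds, layer names, the `.tsx` and `src/main/java/` rules, and the `"unknown"` fallback below are all hypothetical illustrations. The real rule tables live in LayerClassifier and mirror LayerClassifier.java.

```go
package main

import (
	"fmt"
	"strings"
)

// codeNode is a hypothetical, trimmed stand-in for model.CodeNode.
type codeNode struct {
	Kind, Framework, FilePath, Layer string
}

// rule reports a layer and true when it matches the node.
type rule func(n *codeNode) (string, bool)

// rules are tried in order; comments map them to the priority list above.
var rules = []rule{
	func(n *codeNode) (string, bool) { // 1. node kind
		switch n.Kind {
		case "component":
			return "frontend", true
		case "endpoint":
			return "backend", true
		}
		return "", false
	},
	func(n *codeNode) (string, bool) { // 3. file extension
		if strings.HasSuffix(n.FilePath, ".tsx") {
			return "frontend", true
		}
		return "", false
	},
	func(n *codeNode) (string, bool) { // 6. Java src/main convention
		if strings.Contains(n.FilePath, "src/main/java/") {
			return "backend", true
		}
		return "", false
	},
}

// classify stamps the first matching rule's layer on each node.
func classify(nodes []*codeNode) {
	for _, n := range nodes {
		n.Layer = "unknown"
		for _, r := range rules {
			if layer, ok := r(n); ok {
				n.Layer = layer
				break // first match wins
			}
		}
	}
}

func main() {
	nodes := []*codeNode{
		{Kind: "endpoint", FilePath: "web/App.tsx"}, // kind rule beats extension rule
		{Kind: "class", FilePath: "web/Button.tsx"},
		{Kind: "class", FilePath: "svc/src/main/java/Foo.java"},
		{Kind: "class", FilePath: "README.md"},
	}
	classify(nodes)
	for _, n := range nodes {
		fmt.Println(n.FilePath, n.Layer)
	}
}
```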

func (*LayerClassifier) Classify

func (c *LayerClassifier) Classify(nodes []*model.CodeNode)

Classify sets the Layer property on every node in the slice.

type Options

type Options struct {
	Cache     *cache.Cache
	Registry  *detector.Registry
	BatchSize int // defaults to DefaultBatchSize
	Workers   int // defaults to 2 * GOMAXPROCS
}

Options configures an Analyzer.

type ServiceDetectionResult

type ServiceDetectionResult struct {
	Nodes []*model.CodeNode
	Edges []*model.CodeEdge
}

ServiceDetectionResult holds the new SERVICE nodes and the CONTAINS edges produced by a Detect call. The Detect call also mutates the incoming `nodes` slice in place by stamping each node's `service` property.

type ServiceDetector

type ServiceDetector struct{}

ServiceDetector walks the filesystem for build files (30+ build systems) and emits SERVICE nodes with CONTAINS edges to their child nodes. Mirrors src/main/java/io/github/randomcodespace/iq/analyzer/ServiceDetector.java.

Filesystem-driven by design: not all build files produce CodeNodes during indexing, so the node list alone cannot be relied on.

func (*ServiceDetector) Detect

func (sd *ServiceDetector) Detect(nodes []*model.CodeNode, edges []*model.CodeEdge,
	projectDir string, projectRoot string) ServiceDetectionResult

Detect walks `projectRoot`, identifies module boundaries, creates SERVICE nodes and CONTAINS edges. `projectDir` is used as the fallback service name for the root module when no name can be extracted from the build file.

As a side effect, each node in `nodes` whose filePath falls under a detected module has its `service` property set to that service's label.
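The side effect on `nodes` amounts to a longest-prefix match of each node's path against the detected module directories. The sketch is built on several assumptions. `modules`, `stamp`, and the `codeNode` type with a `Props` map are hypothetical. `buildFiles` is a three-entry subset of the 30+ recognised build files. Naming a service after its directory, rather than extracting the name from the build file, is a simplification. Creating the SERVICE nodes and CONTAINS edges is omitted.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// buildFiles is a small illustrative subset of recognised build files.
var buildFiles = map[string]bool{"pom.xml": true, "go.mod": true, "package.json": true}

type codeNode struct {
	FilePath string            // forward-slash, relative to the project root
	Props    map[string]string // receives the "service" property
}

// modules maps each directory holding a build file to a service name: the
// directory's base name, or projectDir for the root module (key "").
func modules(relPaths []string, projectDir string) map[string]string {
	mods := map[string]string{}
	for _, p := range relPaths {
		if !buildFiles[path.Base(p)] {
			continue
		}
		if dir := path.Dir(p); dir == "." {
			mods[""] = projectDir
		} else {
			mods[dir] = path.Base(dir)
		}
	}
	return mods
}

// stamp sets Props["service"] from the deepest module containing the file.
func stamp(nodes []*codeNode, mods map[string]string) {
	dirs := make([]string, 0, len(mods))
	for d := range mods {
		dirs = append(dirs, d)
	}
	// Longest directory first, so nested modules win over their parents.
	sort.Slice(dirs, func(i, j int) bool { return len(dirs[i]) > len(dirs[j]) })
	for _, n := range nodes {
		for _, d := range dirs {
			if d == "" || strings.HasPrefix(n.FilePath, d+"/") {
				n.Props["service"] = mods[d]
				break
			}
		}
	}
}

func main() {
	mods := modules([]string{"pom.xml", "orders/pom.xml", "orders/api/package.json"}, "shop")
	nodes := []*codeNode{
		{FilePath: "orders/src/A.java", Props: map[string]string{}},
		{FilePath: "orders/api/index.ts", Props: map[string]string{}},
		{FilePath: "tools/x.sh", Props: map[string]string{}},
	}
	stamp(nodes, mods)
	for _, n := range nodes {
		fmt.Println(n.FilePath, n.Props["service"]) // orders, api, shop respectively
	}
}
```

Matching on `d + "/"` rather than on `d` alone keeps a sibling such as `ordersX/` from being attributed to `orders`.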

type Snapshot

type Snapshot struct {
	Nodes []*model.CodeNode
	Edges []*model.CodeEdge

	// DedupedNodes is the count of node emissions that collided with an
	// existing node ID and were merged in. Zero on a graph where no
	// detector double-emitted.
	DedupedNodes int
	// DedupedEdges is the same for edges by (source, target, kind).
	DedupedEdges int
	// DroppedEdges is the count of edges that had no matching source or
	// target node in the final node set — phantom references usually
	// caused by a linker pointing at a node that no detector emitted.
	DroppedEdges int
}

Snapshot is the deterministic, sorted view of buffered state with phantom edges (source or target node missing) dropped. It also exposes the count of duplicate emissions collapsed during Add() and the count of dangling edges dropped during this Snapshot call.

type Stats

type Stats struct {
	Files        int
	Nodes        int
	Edges        int
	DedupedNodes int
	DedupedEdges int
	DroppedEdges int
}

Stats reports per-run counts.

Plan §1.5: DedupedNodes, DedupedEdges, and DroppedEdges expose dedup activity so operators can see messages such as "graph collapsed 312 duplicate nodes, dropped 14 phantom edges". That visibility is what makes the plan's "meaningful graph" goal diagnosable.
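A one-line operator summary can be rendered directly from these counters. The `stats` type below copies the documented fields, and the `summary` wording is a hypothetical format, not the CLI's actual output.

```go
package main

import "fmt"

// stats mirrors the documented Stats fields.
type stats struct {
	Files, Nodes, Edges                      int
	DedupedNodes, DedupedEdges, DroppedEdges int
}

// summary renders an operator-facing diagnostic line from the counters.
func summary(s stats) string {
	return fmt.Sprintf("%d files -> %d nodes, %d edges (collapsed %d duplicate nodes, %d duplicate edges; dropped %d phantom edges)",
		s.Files, s.Nodes, s.Edges, s.DedupedNodes, s.DedupedEdges, s.DroppedEdges)
}

func main() {
	fmt.Println(summary(stats{Files: 10, Nodes: 120, Edges: 300, DedupedNodes: 312, DroppedEdges: 14}))
}
```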

Directories

Path Synopsis
Package linker contains cross-file enrichers that run after detectors during `codeiq enrich`.
