package analyzer

Version: v0.4.2 (latest) | Published: May 14, 2026 | License: MIT | Imports: 25 | Imported by: 0

Repository

github.com/randomcodespace/codeiq

Documentation ¶

Index ¶

Constants
Variables
type Analyzer
- func NewAnalyzer(opts Options) *Analyzer
- func (a *Analyzer) Run(root string) (Stats, error)
type DiscoveredFile
type EnrichOptions
type EnrichSummary
- func Enrich(root string, c *cache.Cache, opts EnrichOptions) (EnrichSummary, error)
type FileDiscovery
- func NewFileDiscovery() *FileDiscovery
- func (d *FileDiscovery) Discover(root string) ([]DiscoveredFile, error)
type GraphBuilder
- func NewGraphBuilder() *GraphBuilder
- func (b *GraphBuilder) Add(r *detector.Result)
- func (b *GraphBuilder) Snapshot() Snapshot
type LayerClassifier
- func (c *LayerClassifier) Classify(nodes []*model.CodeNode)
type Options
type ServiceDetectionResult
type ServiceDetector
- func (sd *ServiceDetector) Detect(nodes []*model.CodeNode, edges []*model.CodeEdge, projectDir string, ...) ServiceDetectionResult
type Snapshot
type Stats

Constants ¶

const DefaultBatchSize = 500

DefaultBatchSize matches the tuned default used on the Java side (see the gotcha noted in CLAUDE.md).

Variables ¶

var DefaultExcludeDirs = map[string]bool{
	"node_modules": true, "build": true, "target": true, "dist": true,
	"out": true, "bin": true, "obj": true,
	".git": true, ".svn": true, ".idea": true, ".vscode": true,
	".eclipse": true, ".settings": true,
	"__pycache__": true, "venv": true, ".venv": true, ".tox": true,
	".mypy_cache": true, ".pytest_cache": true, ".eggs": true,
	".gradle": true, ".mvn": true,
	"bower_components": true, ".next": true, ".nuxt": true, "coverage": true,
	".nyc_output": true, ".parcel-cache": true, ".turbo": true, ".cache": true,
	"vendor":  true,
	".codeiq": true,
}

DefaultExcludeDirs mirrors the Java FileDiscovery.DEFAULT_EXCLUDES set.

Functions ¶

This section is empty.

Types ¶

type Analyzer ¶

type Analyzer struct {
	// contains filtered or unexported fields
}

Analyzer orchestrates the index pipeline.

func NewAnalyzer ¶

func NewAnalyzer(opts Options) *Analyzer

NewAnalyzer returns an analyzer wired to opts.

func (*Analyzer) Run ¶

func (a *Analyzer) Run(root string) (Stats, error)

Run executes FileDiscovery → parse → detectors → GraphBuilder → cache writes and returns aggregate stats. Errors from individual file processing are logged to stderr but do not stop the run — partial output is better than no output (matches Java's per-file try/catch behaviour).

type DiscoveredFile ¶

type DiscoveredFile struct {
	AbsPath  string
	RelPath  string // forward-slash, relative to root
	Language parser.Language
	Ext      string
}

DiscoveredFile is one file discovered for analysis.

type EnrichOptions ¶

type EnrichOptions struct {
	// GraphDir overrides the Kuzu output directory. When "", the default
	// `<root>/.codeiq/graph/codeiq.kuzu` is used.
	GraphDir string
	// StoreBufferPoolBytes caps Kuzu's buffer pool. Zero -> graph package
	// default (2 GiB).
	StoreBufferPoolBytes uint64
	// StoreCopyThreads caps Kuzu COPY FROM parallelism. Zero -> graph
	// package default (min(4, GOMAXPROCS)).
	StoreCopyThreads uint64
}

EnrichOptions configures Enrich. The zero value is usable; GraphDir defaults to `<root>/.codeiq/graph/codeiq.kuzu` when empty.

type EnrichSummary ¶

type EnrichSummary struct {
	Nodes    int
	Edges    int
	Services int
}

EnrichSummary reports per-run counters from a successful Enrich.

func Enrich ¶

func Enrich(root string, c *cache.Cache, opts EnrichOptions) (EnrichSummary, error)

Enrich loads the SQLite cache for `root`, runs the linker / classifier / lexical / language-extractor / service-detector passes, bulk-loads the resulting graph into Kuzu, and creates the FTS-equivalent indexes. The returned summary reports total nodes / edges / service nodes after every pass has run.

Mirrors the `enrich` pipeline in Java (Analyzer.java + GraphStore.java). The pipeline order matches the Java side exactly:

 1. Linkers (TopicLinker, EntityLinker, ModuleContainmentLinker)
 2. LayerClassifier
 3. LexicalEnricher (doc comments + config keys)
 4. LanguageEnricher (Java, TypeScript, Python, Go extractors)
 5. ServiceDetector (filesystem walk for build files)
 6. graph.Store.BulkLoadNodes / BulkLoadEdges / CreateIndexes

All steps are deterministic — repeated calls against the same cache + root produce identical Kuzu output.

type FileDiscovery ¶

type FileDiscovery struct{}

FileDiscovery walks a repo and emits language-tagged files. It first tries `git ls-files -co --exclude-standard` and falls back to a filesystem walk.

func NewFileDiscovery ¶

func NewFileDiscovery() *FileDiscovery

NewFileDiscovery returns a discovery instance.

func (*FileDiscovery) Discover ¶

func (d *FileDiscovery) Discover(root string) ([]DiscoveredFile, error)

Discover walks root and returns files sorted by RelPath.

type GraphBuilder ¶

type GraphBuilder struct {
	// contains filtered or unexported fields
}

GraphBuilder buffers detector results across batches. Concurrent-safe.

Phase 1 (plan §1.1, §1.2):

  - Nodes are deduped by ID via mergeNode (confidence-aware).
  - Edges are deduped by canonical (source, target, kind) key via mergeEdge.

Snapshot() produces a deterministic sorted view with phantom edges (those whose endpoint is still missing) dropped, and exposes the dedup/drop counts so the CLI can surface "deduped N, dropped K" diagnostics.

func NewGraphBuilder ¶

func NewGraphBuilder() *GraphBuilder

NewGraphBuilder returns an empty builder.

func (*GraphBuilder) Add ¶

func (b *GraphBuilder) Add(r *detector.Result)

Add merges a detector result. Duplicate node IDs and duplicate edge (source, target, kind) tuples collapse with confidence-aware merging.

func (*GraphBuilder) Snapshot ¶

func (b *GraphBuilder) Snapshot() Snapshot

Snapshot returns the current state as a sorted, dangling-edge-free Snapshot with surfaced dedup/drop counts.

After this call returns, the builder's internal dedup maps are cleared (set to nil). This releases ~280 MB of reference pressure at ~/projects/ scale where the downstream enrich pipeline holds the returned Snapshot slices for the lifetime of the function — coexisting with the dedup maps was the largest in-memory duplication in the pipeline. Snapshot is therefore single-shot: subsequent calls to Snapshot or Add on the same builder are not supported.

type LayerClassifier ¶

type LayerClassifier struct{}

LayerClassifier assigns a Layer value to every CodeNode based on (kind, framework, file_path) heuristics. Pure, deterministic, first-match wins. Priority order mirrors LayerClassifier.java:

 1. Node kind (frontend / backend / infra)
 2. Language (infra)
 3. File extension + path
 4. Framework
 5. Shared node kinds
 6. Fallback package/path heuristics + Java src/main convention

func (*LayerClassifier) Classify ¶

func (c *LayerClassifier) Classify(nodes []*model.CodeNode)

Classify sets the Layer property on every node in the slice.

type Options ¶

type Options struct {
	Cache     *cache.Cache
	Registry  *detector.Registry
	BatchSize int // defaults to DefaultBatchSize
	Workers   int // defaults to 2 * GOMAXPROCS
}

Options configures an Analyzer.

type ServiceDetectionResult ¶

type ServiceDetectionResult struct {
	Nodes []*model.CodeNode
	Edges []*model.CodeEdge
}

ServiceDetectionResult holds the new SERVICE nodes and the CONTAINS edges produced by a Detect call. The Detect call also mutates the incoming `nodes` slice in place by stamping each node's `service` property.

type ServiceDetector ¶

type ServiceDetector struct{}

ServiceDetector walks the filesystem for build files (30+ build systems) and emits SERVICE nodes with CONTAINS edges to their child nodes. Mirrors src/main/java/io/github/randomcodespace/iq/analyzer/ServiceDetector.java.

Filesystem-driven by design — not all build files produce CodeNodes during index, so we cannot rely on the node list alone.

func (*ServiceDetector) Detect ¶

func (sd *ServiceDetector) Detect(nodes []*model.CodeNode, edges []*model.CodeEdge,
	projectDir string, projectRoot string) ServiceDetectionResult

Detect walks `projectRoot`, identifies module boundaries, and creates SERVICE nodes and CONTAINS edges. `projectDir` is used as the fallback service name for the root module when no name can be extracted from the build file.

As a side effect, each node in `nodes` whose filePath falls under a detected module has its `service` property set to that service's label.

type Snapshot ¶

type Snapshot struct {
	Nodes []*model.CodeNode
	Edges []*model.CodeEdge

	// DedupedNodes is the count of node emissions that collided with an
	// existing node ID and were merged in. Zero on a graph where no
	// detector double-emitted.
	DedupedNodes int
	// DedupedEdges is the same for edges by (source, target, kind).
	DedupedEdges int
	// DroppedEdges is the count of edges that had no matching source or
	// target node in the final node set — phantom references usually
	// caused by a linker pointing at a node that no detector emitted.
	DroppedEdges int
}

Snapshot is the deterministic, sorted view of buffered state with phantom edges (source or target node missing) dropped. It also exposes the count of duplicate emissions collapsed during Add() and the count of dangling edges dropped during this Snapshot call.

type Stats ¶

type Stats struct {
	Files        int
	Nodes        int
	Edges        int
	DedupedNodes int
	DedupedEdges int
	DroppedEdges int
}

Stats reports per-run counts.

Plan §1.5 — DedupedNodes/DedupedEdges/DroppedEdges expose dedup activity so operators can see "graph collapsed 312 duplicate nodes, dropped 14 phantom edges" — the visibility is what makes "meaningful" diagnosable.

Directories ¶

Path	Synopsis
linker	Package linker contains cross-file enrichers that run after detectors during `codeiq enrich`.
