Documentation
Constants
const DefaultBatchSize = 500
DefaultBatchSize matches the tuned default batch size used on the Java side (see the corresponding gotcha in CLAUDE.md).
Variables
var DefaultExcludeDirs = map[string]bool{
	"node_modules": true, "build": true, "target": true, "dist": true, "out": true,
	"bin": true, "obj": true, ".git": true, ".svn": true, ".idea": true,
	".vscode": true, ".eclipse": true, ".settings": true, "__pycache__": true,
	"venv": true, ".venv": true, ".tox": true, ".mypy_cache": true,
	".pytest_cache": true, ".eggs": true, ".gradle": true, ".mvn": true,
	"bower_components": true, ".next": true, ".nuxt": true, "coverage": true,
	".nyc_output": true, ".parcel-cache": true, ".turbo": true, ".cache": true,
	"vendor": true, ".codeiq": true,
}
DefaultExcludeDirs mirrors the Java FileDiscovery.DEFAULT_EXCLUDES set.
Functions
This section is empty.
Types
type Analyzer
type Analyzer struct {
// contains filtered or unexported fields
}
Analyzer orchestrates the index pipeline.
func NewAnalyzer
NewAnalyzer returns an analyzer wired to opts.
func (*Analyzer) Run
Run executes the pipeline FileDiscovery → parse → detectors → GraphBuilder → cache writes and returns aggregate stats. An error while processing an individual file is logged to stderr but does not stop the run: partial output is better than no output. This matches the per-file try/catch behaviour of the Java implementation.
type DiscoveredFile
type DiscoveredFile struct {
AbsPath string
RelPath string // forward-slash, relative to root
Language parser.Language
Ext string
}
DiscoveredFile is one file discovered for analysis.
type EnrichOptions
type EnrichOptions struct {
// GraphDir overrides the Kuzu output directory. When "", the default
// `<root>/.codeiq/graph/codeiq.kuzu` is used.
GraphDir string
// StoreBufferPoolBytes caps Kuzu's buffer pool. Zero -> graph package
// default (2 GiB).
StoreBufferPoolBytes uint64
// StoreCopyThreads caps Kuzu COPY FROM parallelism. Zero -> graph
// package default (min(4, GOMAXPROCS)).
StoreCopyThreads uint64
}
EnrichOptions configures Enrich. The zero value is usable; GraphDir defaults to `<root>/.codeiq/graph/codeiq.kuzu` when empty.
type EnrichSummary
EnrichSummary reports per-run counters from a successful Enrich.
func Enrich
func Enrich(root string, c *cache.Cache, opts EnrichOptions) (EnrichSummary, error)
Enrich loads the SQLite cache for `root`, runs the linker / classifier / lexical / language-extractor / service-detector passes, bulk-loads the resulting graph into Kuzu, and creates the FTS-equivalent indexes. The returned summary reports total nodes / edges / service nodes after every pass has run.
Mirrors the `enrich` pipeline in Java (Analyzer.java + GraphStore.java). The pipeline order matches the Java side exactly:
- Linkers (TopicLinker, EntityLinker, ModuleContainmentLinker)
- LayerClassifier
- LexicalEnricher (doc comments + config keys)
- LanguageEnricher (Java, TypeScript, Python, Go extractors)
- ServiceDetector (filesystem walk for build files)
- graph.Store.BulkLoadNodes / BulkLoadEdges / CreateIndexes
All steps are deterministic — repeated calls against the same cache + root produce identical Kuzu output.
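type FileDiscovery
type FileDiscovery struct{}
FileDiscovery walks a repository and emits language-tagged files. It first lists files with `git ls-files -co --exclude-standard` (tracked files plus untracked files not excluded by the standard ignore rules) and otherwise falls back to a filesystem walk.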
func NewFileDiscovery
func NewFileDiscovery() *FileDiscovery
NewFileDiscovery returns a discovery instance.
func (*FileDiscovery) Discover
func (d *FileDiscovery) Discover(root string) ([]DiscoveredFile, error)
Discover walks root and returns files sorted by RelPath.
type GraphBuilder
type GraphBuilder struct {
// contains filtered or unexported fields
}
GraphBuilder buffers detector results across batches. It is safe for concurrent use.
Phase 1 (plan §1.1, §1.2):
- Nodes are deduped by ID via mergeNode (confidence-aware).
- Edges are deduped by canonical (source, target, kind) key via mergeEdge.
Snapshot() produces a deterministic sorted view with phantom edges (those whose endpoint is still missing) dropped, and exposes the dedup/drop counts so the CLI can surface "deduped N, dropped K" diagnostics.
func NewGraphBuilder
func NewGraphBuilder() *GraphBuilder
NewGraphBuilder returns an empty builder.
func (*GraphBuilder) Add
func (b *GraphBuilder) Add(r *detector.Result)
Add merges a detector result. Duplicate node IDs and duplicate edge (source, target, kind) tuples collapse with confidence-aware merging.
func (*GraphBuilder) Snapshot
func (b *GraphBuilder) Snapshot() Snapshot
Snapshot returns the current state as a sorted, dangling-edge-free Snapshot with surfaced dedup/drop counts.
After this call returns, the builder's internal dedup maps are cleared (set to nil). At ~/projects/ scale this releases roughly 280 MB of retained references: the downstream enrich pipeline holds the returned Snapshot slices for the lifetime of the function, and keeping the dedup maps alive alongside them was the largest in-memory duplication in the pipeline. Snapshot is therefore single-shot: calling Snapshot or Add again on the same builder is not supported.
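A hypothetical sketch of a single-shot snapshot: sort, drop edges whose endpoints are missing, then release the maps. The edge sort key (source, target, kind) and the string map key are assumptions of this sketch, not the real GraphBuilder internals.

```go
package main

import (
	"fmt"
	"sort"
)

type node struct{ ID string }

type edge struct{ Source, Target, Kind string }

// snapshot is a trimmed stand-in for the Snapshot type.
type snapshot struct {
	Nodes        []*node
	Edges        []*edge
	DroppedEdges int
}

type builder struct {
	nodes map[string]*node
	edges map[string]*edge // keyed by "source|target|kind" in this sketch
}

// snapshot sorts nodes by ID and edges by (source, target, kind), drops and
// counts edges whose source or target node is missing, then sets both maps
// to nil. Calling snapshot or adding again afterwards is unsupported.
func (b *builder) snapshot() snapshot {
	var s snapshot
	for _, n := range b.nodes {
		s.Nodes = append(s.Nodes, n)
	}
	sort.Slice(s.Nodes, func(i, j int) bool { return s.Nodes[i].ID < s.Nodes[j].ID })
	for _, e := range b.edges {
		if b.nodes[e.Source] == nil || b.nodes[e.Target] == nil {
			s.DroppedEdges++ // phantom edge
			continue
		}
		s.Edges = append(s.Edges, e)
	}
	sort.Slice(s.Edges, func(i, j int) bool {
		x, y := s.Edges[i], s.Edges[j]
		if x.Source != y.Source {
			return x.Source < y.Source
		}
		if x.Target != y.Target {
			return x.Target < y.Target
		}
		return x.Kind < y.Kind
	})
	b.nodes, b.edges = nil, nil // release map storage
	return s
}

func main() {
	b := &builder{
		nodes: map[string]*node{"b": {ID: "b"}, "a": {ID: "a"}},
		edges: map[string]*edge{
			"a|b|CALLS":  {Source: "a", Target: "b", Kind: "CALLS"},
			"a|zz|CALLS": {Source: "a", Target: "zz", Kind: "CALLS"}, // "zz" never emitted
		},
	}
	s := b.snapshot()
	fmt.Println(len(s.Nodes), len(s.Edges), s.DroppedEdges) // prints 2 1 1
}
```

The returned slices still point at the same node and edge values; what is released is the map storage that would otherwise duplicate those references.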
type LayerClassifier
type LayerClassifier struct{}
LayerClassifier assigns a Layer value to every CodeNode based on (kind, framework, file_path) heuristics. Pure, deterministic, first-match wins. Priority order mirrors LayerClassifier.java:
- Node kind (frontend / backend / infra)
- Language (infra)
- File extension + path
- Framework
- Shared node kinds
- Fallback package/path heuristics + Java src/main convention
func (*LayerClassifier) Classify
func (c *LayerClassifier) Classify(nodes []*model.CodeNode)
Classify sets the Layer property on every node in the slice.
type Options
type Options struct {
Cache *cache.Cache
Registry *detector.Registry
BatchSize int // defaults to DefaultBatchSize
Workers int // defaults to 2 * GOMAXPROCS
}
Options configures an Analyzer.
type ServiceDetectionResult
ServiceDetectionResult holds the new SERVICE nodes and the CONTAINS edges produced by a Detect call. Detect also mutates the nodes referenced by the incoming `nodes` slice in place, setting their `service` property.
type ServiceDetector
type ServiceDetector struct{}
ServiceDetector walks the filesystem for build files (30+ build systems) and emits SERVICE nodes with CONTAINS edges to their child nodes. Mirrors src/main/java/io/github/randomcodespace/iq/analyzer/ServiceDetector.java.
ServiceDetector is filesystem-driven by design: not every build file produces a CodeNode during indexing, so the node list alone is not enough to find module boundaries.
func (*ServiceDetector) Detect
func (sd *ServiceDetector) Detect(nodes []*model.CodeNode, edges []*model.CodeEdge, projectDir string, projectRoot string) ServiceDetectionResult
Detect walks `projectRoot`, identifies module boundaries, creates SERVICE nodes and CONTAINS edges. `projectDir` is used as the fallback service name for the root module when no name can be extracted from the build file.
As a side effect, each node in `nodes` whose filePath falls under a detected module has its `service` property set to that service's label.
type Snapshot
type Snapshot struct {
Nodes []*model.CodeNode
Edges []*model.CodeEdge
// DedupedNodes is the count of node emissions that collided with an
// existing node ID and were merged in. Zero on a graph where no
// detector double-emitted.
DedupedNodes int
// DedupedEdges is the same for edges by (source, target, kind).
DedupedEdges int
// DroppedEdges is the count of edges that had no matching source or
// target node in the final node set — phantom references usually
// caused by a linker pointing at a node that no detector emitted.
DroppedEdges int
}
Snapshot is the deterministic, sorted view of buffered state with phantom edges (source or target node missing) dropped. It also exposes the count of duplicate emissions collapsed during Add() and the count of dangling edges dropped during this Snapshot call.
type Stats
type Stats struct {
Files int
Nodes int
Edges int
DedupedNodes int
DedupedEdges int
DroppedEdges int
}
Stats reports per-run counts.
Plan §1.5: DedupedNodes, DedupedEdges and DroppedEdges expose dedup activity, so operators can see a summary such as "graph collapsed 312 duplicate nodes, dropped 14 phantom edges". That visibility is what makes "meaningful" diagnosable.