extractor

package

v0.4.2 (latest) | Published: May 14, 2026 | License: MIT | Imports: 8 | Imported by: 0

Repository

github.com/randomcodespace/codeiq

Documentation ¶

Overview ¶

Package extractor defines the LanguageExtractor interface and the Enricher orchestrator that drives per-language extractors over a node list.

Mirrors src/main/java/.../intelligence/extractor/{LanguageExtractor, LanguageExtractionResult}.java. Each extractor is registered for one language and runs against nodes whose file path's extension maps to that language via DetectLanguage.

Index ¶

func DetectLanguage(path string) string
type Context
type Enricher
- func NewEnricher(exts ...LanguageExtractor) *Enricher
- func (en *Enricher) Enrich(nodes []*model.CodeNode, edges *[]*model.CodeEdge, root string)
type LanguageExtractor
type Result
- func EmptyResult() Result

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func DetectLanguage ¶

func DetectLanguage(path string) string

DetectLanguage maps a file path to an extractor language key, lower-case. Returns "" for unsupported extensions; the orchestrator then skips the file entirely.
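
The following is a minimal sketch of how an extension-based lookup of this kind could work. The extension table and the helper name detectLanguage are assumptions for illustration; the actual mapping used by DetectLanguage is not shown in this documentation.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extLanguages is a hypothetical extension table. Only the keys "java" and
// "typescript" appear as examples in the package docs; the rest is assumed.
var extLanguages = map[string]string{
	".java": "java",
	".ts":   "typescript",
	".py":   "python",
}

// detectLanguage returns the lower-case language key for path, or "" when
// the extension is not in the table.
func detectLanguage(path string) string {
	ext := strings.ToLower(filepath.Ext(path))
	return extLanguages[ext] // a missing key yields the zero value ""
}

func main() {
	for _, p := range []string{"src/App.java", "web/index.ts", "README.md"} {
		fmt.Printf("%q -> %q\n", p, detectLanguage(p))
	}
}
```

Returning "" for unknown extensions lets a caller skip the file with a single comparison, which matches the behavior described above.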

Types ¶

type Context ¶

type Context struct {
	// FilePath is the path stamped onto CodeNode.FilePath (project-relative).
	FilePath string
	// Language is the canonical language key returned by LanguageExtractor.Language()
	// (lower-case, e.g. "java", "typescript").
	Language string
	// Content is the raw file source.
	Content string
	// Registry maps node ID and (when non-empty) FQN to the originating
	// CodeNode, so extractors can look up call targets, type bases, etc.
	Registry map[string]*model.CodeNode
}

Context is the per-file context an extractor sees during enrich. The orchestrator reads the file once and passes the contents to every node-level Extract call for that file.
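
As an illustration of the Registry keying described in the field comment, the sketch below indexes each node by ID and, when present, by FQN. The codeNode type is a local stand-in for model.CodeNode, and its ID and FQN field names are assumptions.

```go
package main

import "fmt"

// codeNode is a local stand-in for model.CodeNode; the field names are
// assumed for this sketch.
type codeNode struct {
	ID  string
	FQN string
}

// buildRegistry indexes every node by ID and, when the FQN is non-empty,
// by FQN as well, so either key resolves to the same node.
func buildRegistry(nodes []*codeNode) map[string]*codeNode {
	reg := make(map[string]*codeNode, len(nodes)*2)
	for _, n := range nodes {
		reg[n.ID] = n
		if n.FQN != "" {
			reg[n.FQN] = n
		}
	}
	return reg
}

func main() {
	reg := buildRegistry([]*codeNode{
		{ID: "n1", FQN: "com.example.Foo"},
		{ID: "n2"}, // no FQN: reachable by ID only
	})
	fmt.Println(len(reg), reg["com.example.Foo"].ID)
}
```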

type Enricher ¶

type Enricher struct {
	// contains filtered or unexported fields
}

Enricher orchestrates per-language extractors over a node list. Mirrors LanguageEnricher.java. The zero value is unusable; use NewEnricher.

func NewEnricher ¶

func NewEnricher(exts ...LanguageExtractor) *Enricher

NewEnricher returns an enricher that dispatches each registered extractor against nodes whose file extension maps (via DetectLanguage) to the extractor's Language(). If two extractors are registered for the same language, the one registered last wins.
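
One plausible way to get this behavior is a map keyed by Language(), where a later registration overwrites an earlier one. The sketch below assumes that approach; the interface is trimmed to the one method it needs, and namedExtractor and register are hypothetical names.

```go
package main

import "fmt"

// languageExtractor is trimmed to the single method this sketch needs.
type languageExtractor interface {
	Language() string
}

// namedExtractor is a hypothetical extractor that carries a label so the
// winner of a duplicate registration can be observed.
type namedExtractor struct {
	lang string
	name string
}

func (e namedExtractor) Language() string { return e.lang }

// register indexes extractors by language key. Assigning into the map
// overwrites any earlier entry, so the last extractor per language wins.
func register(exts ...languageExtractor) map[string]languageExtractor {
	byLang := make(map[string]languageExtractor, len(exts))
	for _, e := range exts {
		byLang[e.Language()] = e
	}
	return byLang
}

func main() {
	byLang := register(
		namedExtractor{lang: "java", name: "first"},
		namedExtractor{lang: "java", name: "second"},
	)
	fmt.Println(byLang["java"].(namedExtractor).name)
}
```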

func (*Enricher) Enrich ¶

func (en *Enricher) Enrich(nodes []*model.CodeNode, edges *[]*model.CodeEdge, root string)

Enrich runs all registered extractors against the in-memory node list, appending new edges to *edges and stamping type-hint properties onto the nodes themselves. Source files are read at most once across all nodes sharing a file path. Per-file work runs on a goroutine per file; results merge back in sorted-file order so the output is deterministic regardless of scheduler timing.

`root` is the project root that node.FilePath is relative to. Files that cannot be read under root (failed reads, missing files) are silently skipped; extractors are best-effort.
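
The deterministic merge can be sketched as follows: sort the file paths, give each goroutine its own result slot, and concatenate the slots in order once all goroutines finish. This is an assumption about the general technique, not the package's actual implementation; processFiles and its string-based results are hypothetical, and the input paths are assumed to be de-duplicated already.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// processFiles runs work once per path on its own goroutine, then merges
// the per-file results in sorted-path order.
func processFiles(paths []string, work func(path string) []string) []string {
	sorted := append([]string(nil), paths...) // copy so the caller's slice is untouched
	sort.Strings(sorted)

	results := make([][]string, len(sorted))
	var wg sync.WaitGroup
	for i, p := range sorted {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			results[i] = work(p) // each goroutine writes only its own slot
		}(i, p)
	}
	wg.Wait()

	// Merge in index order, which is sorted-path order, so scheduling
	// does not affect the output.
	var merged []string
	for _, r := range results {
		merged = append(merged, r...)
	}
	return merged
}

func main() {
	out := processFiles([]string{"b.java", "a.java"}, func(p string) []string {
		return []string{p + ":edge"}
	})
	fmt.Println(out)
}
```

Writing into a pre-sized slice by index avoids a mutex around the shared output and keeps ordering independent of which goroutine finishes first.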

type LanguageExtractor ¶

type LanguageExtractor interface {
	// Language returns the canonical language key, lower-case (e.g. "java").
	// This key must match DetectLanguage for the orchestrator to dispatch.
	Language() string
	// Extract runs the extractor against a single node, parsing ctx.Content
	// internally. Retained as the single-node convenience wrapper for tests
	// and ad-hoc callers; the orchestrator uses ExtractFromTree to avoid
	// re-parsing N times for a file with N nodes.
	Extract(ctx Context, node *model.CodeNode) Result
	// ExtractFromTree runs the extractor against every node in `nodes` using
	// a single pre-parsed tree. Returns one Result per input node in matching
	// order, so callers can stamp TypeHints back onto the corresponding node.
	// `tree` may be nil when ctx.Language has no tree-sitter grammar — the
	// extractor must handle that by returning len(nodes) EmptyResult entries.
	ExtractFromTree(ctx Context, tree *parser.Tree, nodes []*model.CodeNode) []Result
}

LanguageExtractor mirrors the Java LanguageExtractor interface. Implementors MUST be stateless and safe to call concurrently from multiple goroutines — the orchestrator fans out per-file work to a goroutine pool.

type Result ¶

type Result struct {
	// CallEdges holds CALLS-kind edges discovered for this node.
	CallEdges []*model.CodeEdge
	// SymbolReferences holds IMPORTS / DEPENDS_ON edges produced by import
	// or symbol-resolution heuristics.
	SymbolReferences []*model.CodeEdge
	// TypeHints stamps key/value strings into the node's Properties map.
	TypeHints map[string]string
	// Confidence is the capability-level confidence for this extraction.
	Confidence model.CapabilityLevel
}

Result is what one extractor returns for one node. Mirrors LanguageExtractionResult in the Java tree.
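
A caller consuming a Result would typically append its edges to the shared edge list and copy TypeHints into the node's properties. The sketch below shows one way to do that under stated assumptions: the types are local stand-ins, the edge field names are invented, and Properties is assumed to be a map[string]string.

```go
package main

import "fmt"

// Local stand-ins for model.CodeEdge and model.CodeNode; field names and
// the Properties value type are assumptions.
type codeEdge struct {
	Kind, From, To string
}

type codeNode struct {
	ID         string
	Properties map[string]string
}

type result struct {
	CallEdges        []*codeEdge
	SymbolReferences []*codeEdge
	TypeHints        map[string]string
}

// applyResult appends the result's edges to *edges and stamps its type
// hints onto the node, allocating Properties on first use.
func applyResult(n *codeNode, edges *[]*codeEdge, r result) {
	*edges = append(*edges, r.CallEdges...)
	*edges = append(*edges, r.SymbolReferences...)
	if len(r.TypeHints) == 0 {
		return
	}
	if n.Properties == nil {
		n.Properties = make(map[string]string, len(r.TypeHints))
	}
	for k, v := range r.TypeHints {
		n.Properties[k] = v
	}
}

func main() {
	n := &codeNode{ID: "n1"}
	var edges []*codeEdge
	applyResult(n, &edges, result{
		CallEdges: []*codeEdge{{Kind: "CALLS", From: "n1", To: "n2"}},
		TypeHints: map[string]string{"return_type": "String"},
	})
	fmt.Println(len(edges), n.Properties["return_type"])
}
```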

func EmptyResult ¶

func EmptyResult() Result

EmptyResult is the canonical zero result with PARTIAL confidence. Matches LanguageExtractionResult.empty() on the Java side.

Directories ¶

Path	Synopsis
golang	Package golang implements the Go language extractor.
java	Package java implements the Java language extractor.
python	Package python implements the Python language extractor.
typescript	Package typescript implements the TypeScript language extractor.
