extractor

package
v0.4.2 Latest
Published: May 14, 2026 License: MIT Imports: 8 Imported by: 0

Documentation

Overview

Package extractor defines the LanguageExtractor interface and the Enricher orchestrator that drives per-language extractors over a node list.

Mirrors src/main/java/.../intelligence/extractor/{LanguageExtractor, LanguageExtractionResult}.java. Each extractor is registered for one language and runs against nodes whose file path's extension maps to that language via DetectLanguage.

Functions

func DetectLanguage

func DetectLanguage(path string) string

DetectLanguage maps a file path to an extractor language key, lower-case. Returns "" for unsupported extensions; the orchestrator then skips the file entirely.
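
The mapping described above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: `detectLanguage` and the `extToLang` table are hypothetical, and the real set of supported extensions is not listed in this documentation.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extToLang is a hypothetical extension table used only for illustration;
// the real package's table may contain different or additional entries.
var extToLang = map[string]string{
	".java": "java",
	".ts":   "typescript",
}

// detectLanguage is an illustrative stand-in for DetectLanguage. It returns a
// lower-case language key, or "" when the extension is not in the table.
func detectLanguage(path string) string {
	ext := strings.ToLower(filepath.Ext(path))
	return extToLang[ext] // a missing key yields the zero value ""
}

func main() {
	for _, p := range []string{"src/App.java", "web/index.ts", "README.md"} {
		fmt.Printf("%q -> %q\n", p, detectLanguage(p))
	}
}
```

Returning "" rather than an error keeps the caller's skip logic to a single comparison, which fits the "skip the file entirely" behaviour described above.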

Types

type Context

type Context struct {
	// FilePath is the path stamped onto CodeNode.FilePath (project-relative).
	FilePath string
	// Language is the canonical language key returned by
	// LanguageExtractor.Language() (lower-case, e.g. "java", "typescript").
	Language string
	// Content is the raw file source.
	Content string
	// Registry maps node ID and (when non-empty) FQN to the originating
	// CodeNode, so extractors can look up call targets, type bases, etc.
	Registry map[string]*model.CodeNode
}

Context is the per-file context an extractor sees during enrich. The orchestrator reads the file once and passes the contents to every node-level Extract call for that file.
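
As a rough sketch of how a Registry like the one documented on Context could be populated, the snippet below indexes nodes by ID and, when present, by FQN. The `codeNode` type and its `ID` and `FQN` field names are assumptions standing in for model.CodeNode, whose actual fields are not shown here.

```go
package main

import "fmt"

// codeNode is a hypothetical stand-in for model.CodeNode; the field names
// ID and FQN are assumptions made for this example.
type codeNode struct {
	ID  string
	FQN string
}

// buildRegistry indexes every node by ID and, when non-empty, also by FQN,
// so a lookup by either key returns the same pointer.
func buildRegistry(nodes []*codeNode) map[string]*codeNode {
	reg := make(map[string]*codeNode, 2*len(nodes))
	for _, n := range nodes {
		reg[n.ID] = n
		if n.FQN != "" {
			reg[n.FQN] = n
		}
	}
	return reg
}

func main() {
	nodes := []*codeNode{
		{ID: "n1", FQN: "com.example.Foo"},
		{ID: "n2"}, // no FQN: reachable by ID only
	}
	reg := buildRegistry(nodes)
	fmt.Println(len(reg), reg["com.example.Foo"] == reg["n1"])
}
```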

type Enricher

type Enricher struct {
	// contains filtered or unexported fields
}

Enricher orchestrates per-language extractors over a node list. Mirrors LanguageEnricher.java. The zero value is unusable; use NewEnricher.

func NewEnricher

func NewEnricher(exts ...LanguageExtractor) *Enricher

NewEnricher returns an enricher that dispatches each registered extractor against nodes whose file extension maps (via DetectLanguage) to the extractor's Language(). If two extractors are registered for the same language, the last one wins.
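
The registration rule can be illustrated with a plain map keyed by Language(). This is a sketch under assumptions: `extractor` is a trimmed, hypothetical version of LanguageExtractor, and `register` is not the package's constructor.

```go
package main

import "fmt"

// extractor is a trimmed, hypothetical version of LanguageExtractor that
// keeps only the method needed for dispatch.
type extractor interface {
	Language() string
}

// named is a toy extractor carrying a label so the two can be told apart.
type named struct{ lang, label string }

func (n named) Language() string { return n.lang }

// register keys extractors by Language(); a later entry for the same key
// overwrites the earlier one.
func register(exts ...extractor) map[string]extractor {
	byLang := make(map[string]extractor, len(exts))
	for _, e := range exts {
		byLang[e.Language()] = e
	}
	return byLang
}

func main() {
	reg := register(
		named{lang: "java", label: "first"},
		named{lang: "java", label: "second"},
	)
	fmt.Println(reg["java"].(named).label)
}
```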

func (*Enricher) Enrich

func (en *Enricher) Enrich(nodes []*model.CodeNode, edges *[]*model.CodeEdge, root string)

Enrich runs all registered extractors against the in-memory node list, appending new edges to *edges and stamping type-hint properties onto the nodes themselves. Source files are read at most once across all nodes sharing a file path. Per-file work runs on a goroutine per file; results merge back in sorted-file order so the output is deterministic regardless of scheduler timing.

`root` is the project root that node.FilePath is relative to. Files that cannot be read (outside the root, missing, or otherwise unreadable) are silently skipped, because extraction is best-effort.
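
One way to get deterministic output from concurrent per-file work, as described for Enrich, is to sort the distinct file paths first, let each goroutine write into its own result slot, and merge the slots in order. The sketch below shows that pattern only; `processFiles` and its `work` callback are hypothetical and do not reflect the Enricher's internals.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// processFiles runs work once per distinct path, each on its own goroutine,
// and returns the per-file results concatenated in sorted-path order.
func processFiles(paths []string, work func(path string) []string) []string {
	// Deduplicate so that each file is handled at most once.
	seen := make(map[string]bool, len(paths))
	var files []string
	for _, p := range paths {
		if !seen[p] {
			seen[p] = true
			files = append(files, p)
		}
	}
	sort.Strings(files)

	results := make([][]string, len(files))
	var wg sync.WaitGroup
	for i, p := range files {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			results[i] = work(p) // each goroutine writes only its own slot
		}(i, p)
	}
	wg.Wait()

	// Merge in index order, which is sorted-path order.
	var merged []string
	for _, r := range results {
		merged = append(merged, r...)
	}
	return merged
}

func main() {
	edges := processFiles([]string{"b.java", "a.java", "b.java"}, func(p string) []string {
		return []string{p + ":edge"}
	})
	fmt.Println(edges)
}
```

Because the merge order depends only on the sorted path list, the result is the same no matter which goroutine finishes first.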

type LanguageExtractor

type LanguageExtractor interface {
	// Language returns the canonical language key, lower-case (e.g. "java").
	// This key must match DetectLanguage for the orchestrator to dispatch.
	Language() string
	// Extract runs the extractor against a single node, parsing ctx.Content
	// internally. Retained as the single-node convenience wrapper for tests
	// and ad-hoc callers; the orchestrator uses ExtractFromTree to avoid
	// re-parsing N times for a file with N nodes.
	Extract(ctx Context, node *model.CodeNode) Result
	// ExtractFromTree runs the extractor against every node in `nodes` using
	// a single pre-parsed tree. Returns one Result per input node in matching
	// order, so callers can stamp TypeHints back onto the corresponding node.
	// `tree` may be nil when ctx.Language has no tree-sitter grammar — the
	// extractor must handle that by returning len(nodes) EmptyResult entries.
	ExtractFromTree(ctx Context, tree *parser.Tree, nodes []*model.CodeNode) []Result
}

LanguageExtractor mirrors the Java LanguageExtractor interface. Implementors MUST be stateless and safe to call concurrently from multiple goroutines — the orchestrator fans out per-file work to a goroutine pool.
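
A minimal sketch of an extractor that honours the two contracts above (no mutable state, and one result per node even when the tree is nil). All types here are hypothetical stand-ins for Context, Result, model.CodeNode and parser.Tree, and the "source_file" hint key is invented for the example.

```go
package main

import "fmt"

// Hypothetical stand-ins for model.CodeNode, parser.Tree, Context and Result.
type codeNode struct{ ID string }
type tree struct{}
type fileContext struct{ FilePath, Language, Content string }
type result struct{ TypeHints map[string]string }

// emptyResult stands in for EmptyResult; confidence is omitted here.
func emptyResult() result { return result{} }

// hintExtractor has no fields, so concurrent calls share no mutable state.
type hintExtractor struct{}

func (hintExtractor) Language() string { return "java" }

// ExtractFromTree returns exactly len(nodes) results in input order. With a
// nil tree every entry is an empty result.
func (hintExtractor) ExtractFromTree(ctx fileContext, t *tree, nodes []*codeNode) []result {
	out := make([]result, len(nodes))
	for i, n := range nodes {
		if t == nil {
			out[i] = emptyResult()
			continue
		}
		out[i] = result{TypeHints: map[string]string{
			"source_file": ctx.FilePath, // invented hint key
			"node":        n.ID,
		}}
	}
	return out
}

func main() {
	nodes := []*codeNode{{ID: "a"}, {ID: "b"}}
	ctx := fileContext{FilePath: "A.java", Language: "java"}
	fmt.Println(len(hintExtractor{}.ExtractFromTree(ctx, nil, nodes)))
	fmt.Println(hintExtractor{}.ExtractFromTree(ctx, &tree{}, nodes)[1].TypeHints["node"])
}
```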

type Result

type Result struct {
	// CallEdges holds CALLS-kind edges discovered for this node.
	CallEdges []*model.CodeEdge
	// SymbolReferences holds IMPORTS / DEPENDS_ON edges produced by import
	// or symbol-resolution heuristics.
	SymbolReferences []*model.CodeEdge
	// TypeHints stamps key/value strings into the node's Properties map.
	TypeHints map[string]string
	// Confidence is the capability-level confidence for this extraction.
	Confidence model.CapabilityLevel
}

Result is what one extractor returns for one node. Mirrors LanguageExtractionResult in the Java tree.
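
To illustrate how a caller might consume a Result, the sketch below appends both edge lists to a shared slice and copies the type hints into the node's properties. The types are hypothetical stand-ins (in particular, Properties is assumed to be a map[string]string), and `apply` is not part of the package.

```go
package main

import "fmt"

// Hypothetical stand-ins; the real model.CodeNode and model.CodeEdge may differ.
type codeEdge struct{ Kind, From, To string }

type codeNode struct {
	ID         string
	Properties map[string]string // assumed type for this example
}

type result struct {
	CallEdges        []*codeEdge
	SymbolReferences []*codeEdge
	TypeHints        map[string]string
}

// apply appends the result's edges to *edges and copies its type hints into
// the node's Properties map, allocating the map if needed.
func apply(node *codeNode, r result, edges *[]*codeEdge) {
	*edges = append(*edges, r.CallEdges...)
	*edges = append(*edges, r.SymbolReferences...)
	if len(r.TypeHints) == 0 {
		return
	}
	if node.Properties == nil {
		node.Properties = make(map[string]string, len(r.TypeHints))
	}
	for k, v := range r.TypeHints {
		node.Properties[k] = v
	}
}

func main() {
	n := &codeNode{ID: "n1"}
	var edges []*codeEdge
	apply(n, result{
		CallEdges: []*codeEdge{{Kind: "CALLS", From: "n1", To: "n2"}},
		TypeHints: map[string]string{"return_type": "String"},
	}, &edges)
	fmt.Println(len(edges), n.Properties["return_type"])
}
```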

func EmptyResult

func EmptyResult() Result

EmptyResult is the canonical zero result with PARTIAL confidence. Matches LanguageExtractionResult.empty() on the Java side.

Directories

Path Synopsis
Package golang implements the Go language extractor.
Package java implements the Java language extractor.
Package python implements the Python language extractor.
Package typescript implements the TypeScript language extractor.
