Documentation ¶
Overview ¶
Package syntax provides a unified AST representation for code duplication detection.
This package bridges the gap between language-specific AST parsers (go/ast for Go code) and the language-agnostic suffix tree used by the detection algorithm.
Core Types:
- Node: Unified syntax tree node representing any language construct
- Match: Represents a clone match with fragments (group of nodes)
- Frags: Slice of node sequences (each fragment is a sequence of nodes)
Design:
- Language-agnostic: Works with any language that provides a parser
- Type-safe: Uses int32 for types (see golang/constants for mapping)
- Memory-optimized: Careful field ordering for cache efficiency
- Position-aware: Tracks byte positions and line numbers for all nodes
Usage Flow:
1. Parse source files -> language-specific AST (go/ast, etc.)
2. Transform AST -> unified syntax.Node tree (see syntax/golang/)
3. Build suffix tree from Node sequence (suffixtree.Update())
4. Find duplicates using suffix tree (FindDuplOver())
5. Convert matches to complete syntax units (FindSyntaxUnits())
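Steps 1 and 2 of the usage flow can be sketched with the standard go/parser and go/ast packages. This is a minimal illustration, not the package's actual transformer: the node struct, the typ* type codes, and the helpers typeOf, transform, flatten, and parseAndFlatten are all hypothetical names, and the pre-order flattening is an assumption about how a tree becomes the sequence fed to the suffix tree.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// node is a hypothetical stand-in for syntax.Node, reduced to what this sketch needs.
type node struct {
	Type     int32 // language-agnostic type code
	Pos, End int32 // byte offsets within the file
	Children []*node
}

// Hypothetical type codes, for illustration only.
const (
	typOther int32 = iota
	typFile
	typFuncDecl
	typBlockStmt
	typReturnStmt
)

// typeOf maps a go/ast node to one of the illustrative type codes.
func typeOf(n ast.Node) int32 {
	switch n.(type) {
	case *ast.File:
		return typFile
	case *ast.FuncDecl:
		return typFuncDecl
	case *ast.BlockStmt:
		return typBlockStmt
	case *ast.ReturnStmt:
		return typReturnStmt
	}
	return typOther
}

// transform rebuilds the go/ast tree under root as a tree of unified nodes.
func transform(fset *token.FileSet, root ast.Node) *node {
	var top *node
	var stack []*node
	ast.Inspect(root, func(n ast.Node) bool {
		if n == nil { // Inspect passes nil once a node's children are done
			stack = stack[:len(stack)-1]
			return false
		}
		cur := &node{
			Type: typeOf(n),
			Pos:  int32(fset.Position(n.Pos()).Offset),
			End:  int32(fset.Position(n.End()).Offset),
		}
		if len(stack) == 0 {
			top = cur
		} else {
			parent := stack[len(stack)-1]
			parent.Children = append(parent.Children, cur)
		}
		stack = append(stack, cur)
		return true
	})
	return top
}

// flatten appends the tree to out in pre-order (assumed serialization order).
func flatten(n *node, out []*node) []*node {
	out = append(out, n)
	for _, c := range n.Children {
		out = flatten(c, out)
	}
	return out
}

// parseAndFlatten runs parse -> transform -> flatten on one source string.
func parseAndFlatten(src string) ([]*node, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	return flatten(transform(fset, file), nil), nil
}

func main() {
	seq, err := parseAndFlatten("package p\n\nfunc add(a, b int) int { return a + b }\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%d nodes, root type %d\n", len(seq), seq[0].Type)
}
```

Because the suffix tree only sees the sequence of type codes, two functions that differ only in identifier names would produce the same sequence in this sketch.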
Key Functions:
- FindSyntaxUnits(): Converts suffix tree matches to complete syntax units
- hashSeq(): Creates hash of node sequence for duplicate detection
- isCyclic/spansMultipleFiles(): Validation helpers
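The idea behind hashing a node sequence can be illustrated as follows. This is only a sketch under assumptions: the real hashSeq is unexported and its algorithm is not documented here, so the FNV-1a hash, the hex output, the reduced node struct, and the name hashSeqSketch are all hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// node is a hypothetical stand-in for syntax.Node; only the type code is hashed.
type node struct {
	Type int32
}

// hashSeqSketch hashes the type codes of a node sequence with FNV-1a.
// Sequences with identical type codes, in the same order, get the same hash.
func hashSeqSketch(nodes []*node) string {
	h := fnv.New64a()
	var buf [4]byte
	for _, n := range nodes {
		binary.LittleEndian.PutUint32(buf[:], uint32(n.Type))
		h.Write(buf[:]) // hash.Hash writes never return an error
	}
	return fmt.Sprintf("%016x", h.Sum64())
}

func main() {
	a := []*node{{Type: 1}, {Type: 2}, {Type: 3}}
	b := []*node{{Type: 1}, {Type: 2}, {Type: 3}}
	fmt.Println(hashSeqSketch(a) == hashSeqSketch(b))
}
```

Grouping fragments by such a hash lets structurally identical clones be collected without comparing every pair of fragments node by node.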
Performance:
- maxChildrenSerial constant prevents goroutine stack overflow
- Node struct is 40B (37.5% reduction from 64B) via int32 fields
- See MEMORY_LAYOUT_OPTIMIZATION_PLAN.md for details
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CountUniqueFiles ¶
CountUniqueFiles returns the number of unique files in a clone group.
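Counting the unique files in a clone group amounts to collecting filenames into a set. The signature of CountUniqueFiles is not shown on this page, so the sketch below assumes a group is a slice of fragments and that each node carries a Filename field; the node struct and the name countUniqueFilesSketch are hypothetical.

```go
package main

import "fmt"

// node is a hypothetical stand-in for syntax.Node; only the filename matters here.
type node struct {
	Filename string
}

// countUniqueFilesSketch returns how many distinct filenames occur in a clone group,
// where the group is assumed to be a slice of fragments (node sequences).
func countUniqueFilesSketch(group [][]*node) int {
	seen := make(map[string]struct{})
	for _, frag := range group {
		for _, n := range frag {
			seen[n.Filename] = struct{}{}
		}
	}
	return len(seen)
}

func main() {
	group := [][]*node{
		{{Filename: "a.go"}, {Filename: "a.go"}},
		{{Filename: "a.go"}},
		{{Filename: "b.go"}},
	}
	fmt.Println(countUniqueFilesSketch(group))
}
```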
Types ¶
type Match ¶
func FindSyntaxUnits ¶
func FindSyntaxUnits(data []*Node, m suffixtree.Match, threshold int) Match
FindSyntaxUnits finds all complete syntax units in the match group and returns them with the corresponding hash.
type Node ¶
Node represents a syntax tree node.
Memory Layout Optimized with int32 fields:
- int32 fields grouped for cache efficiency (4B each, 16B total)
- pointer field (8B)
- string header at end (16B)
Total: 40B (37.5% reduction from 64B).
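The 40B figure can be checked with unsafe.Sizeof on a struct of the described shape. The sketch below is an illustration only: the field names are hypothetical, and interleavedNode is not the package's earlier 64B layout, just an example of how field ordering alone introduces padding on a 64-bit platform.

```go
package main

import (
	"fmt"
	"unsafe"
)

// compactNode follows the described layout: four int32 fields (16B),
// one pointer (8B), then a string header (16B). Field names are hypothetical.
type compactNode struct {
	Type, Pos, End, Owns int32
	Parent               *compactNode
	Filename             string
}

// interleavedNode holds the same fields in an order that forces padding
// before each 8-byte-aligned field on 64-bit platforms.
type interleavedNode struct {
	Type     int32
	Parent   *interleavedNode
	Pos      int32
	Filename string
	End      int32
	Owns     int32
}

func compactSize() uintptr     { return unsafe.Sizeof(compactNode{}) }
func interleavedSize() uintptr { return unsafe.Sizeof(interleavedNode{}) }
func is64Bit() bool            { return unsafe.Sizeof(uintptr(0)) == 8 }

func main() {
	// On a 64-bit platform this reports 40 and 48 bytes respectively.
	fmt.Println("compact:", compactSize(), "interleaved:", interleavedSize())
}
```

Grouping the 4-byte fields together lets them pack without gaps, which is the effect the layout notes above describe.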
func NewSyntheticFileNode ¶
NewSyntheticFileNode creates a synthetic node representing an entire file. This is used for file-level duplicate detection where we want to match entire files rather than specific code fragments.
func (*Node) AddChildren ¶
func (*Node) Val ¶
func (n *Node) Val() suffixtree.TokenValue
Val returns the token value for suffix tree compatibility. Implements the suffixtree.Token interface.
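How a node can satisfy a token interface is sketched below. The local TokenValue and Token definitions are assumptions standing in for suffixtree.TokenValue and suffixtree.Token, whose exact definitions are not shown on this page; returning the type code from Val is likewise an assumption.

```go
package main

import "fmt"

// TokenValue and Token are local stand-ins for the suffixtree types (assumed shapes).
type TokenValue int32

type Token interface {
	Val() TokenValue
}

// node is a hypothetical stand-in for syntax.Node.
type node struct {
	Type int32
}

// Val exposes the node's type code as its token value (assumed behavior).
func (n *node) Val() TokenValue { return TokenValue(n.Type) }

// Compile-time check that *node implements Token.
var _ Token = (*node)(nil)

func main() {
	a, b := &node{Type: 7}, &node{Type: 7}
	// Distinct nodes with the same type code compare equal as tokens.
	fmt.Println(a.Val() == b.Val())
}
```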