Documentation ¶
Overview ¶
Package printer provides output formatting and statistics for duplicate detection.
This package implements multiple output formats and statistical analysis of detected code duplicates.
Output Formats:
- TextPrinter: Human-readable text output
- JSONPrinter: Structured JSON output
- HTMLPrinter: HTML report with syntax highlighting
- PlumbingPrinter: Machine-readable output for scripting
- StatsPrinter: Comprehensive statistics and health analysis
Design:
- Printer interface: Common API for all formats
- Format-specific implementations in separate files
- Statistics split across focused files:
- stats.go: Core type and interface methods
- stats_collector.go: SetX methods for data collection
- stats_health.go: Health score calculation
- stats_formatter.go: Output formatters (CSV, JSON, Text)
- stats_visualization.go: Visualization helpers
- stats_recommendations.go: Recommendation generation
- stats_styles.go: Style management
- stats_data.go: Data structures
Configuration:
- SortBy criteria: Size, Occurrence, Hash, TotalTokens
- OutputFormat selection: Text, HTML, JSON, Plumbing, Simple JSON
- Threshold settings: Minimum size for duplicates
- Verbosity: Detailed output for debugging
Performance:
- Streaming output for large projects
- Efficient memory usage for statistics
- Lazy evaluation where possible
Index ¶
- Constants
- func BuildCloneGroups(duplChan <-chan syntax.Match) map[string][][]*syntax.Node
- func ComputeUniqueCounts(groups map[string][][]*syntax.Node) map[string]int
- func GetCloneSize(group [][]*syntax.Node) int
- func SortCloneGroupKeys(keys []string, sortBy SortBy, groups map[string][][]*syntax.Node, ...)
- func SortCloneGroups(groups []CloneGroup, sortBy SortBy)
- func SortClonesByHash(dups [][]*syntax.Node) [][]*syntax.Node
- func SortClonesByOccurrence(dups [][]*syntax.Node) [][]*syntax.Node
- func SortClonesBySize(dups [][]*syntax.Node) [][]*syntax.Node
- func SortClonesByTotalTokens(dups [][]*syntax.Node) [][]*syntax.Node
- func SortNodesByCriteria(dups [][]*syntax.Node, sortBy SortBy) [][]*syntax.Node
- func TestCloneSortingWithData(t *testing.T, sortFunc func([][]*syntax.Node) [][]*syntax.Node, ...)
- type Clone
- type CloneGroup
- type FileInfo
- type Format
- type Issue
- type Issuer
- type JSONClone
- type JSONOutput
- type JSONPrinter
- func (p *JSONPrinter) OutputJSON(threshold int, sortBy SortBy, detectionMethod string) error
- func (p *JSONPrinter) OutputSimpleJSON() error
- func (p *JSONPrinter) PrintClones(dups [][]*syntax.Node, sortBy ...SortBy) error
- func (*JSONPrinter) PrintFooter() error
- func (p *JSONPrinter) PrintHeader() error
- func (p *JSONPrinter) SetFilesCount(count int)
- func (p *JSONPrinter) SetHash(hash string)
- type Printer
- type ReadFile
- type SimpleCloneGroup
- type SimpleJSONClone
- type SimpleJSONOutput
- type SortBy
- type StatsData
- type StatsPrinter
- type Summary
Constants ¶
const (
// FormatText produces human-readable text output (default).
FormatText = config.OutputFormatText
// FormatJSON produces machine-readable JSON output.
FormatJSON = config.OutputFormatJSON
// FormatCSV produces machine-readable CSV output.
FormatCSV = config.OutputFormatCSV
)
The Format constants are aliases to config.OutputFormat values, kept for backward compatibility.
Variables ¶
This section is empty.
Functions ¶
func BuildCloneGroups ¶
BuildCloneGroups builds a map of hash to clone groups from matches.
func ComputeUniqueCounts ¶
ComputeUniqueCounts calculates unique file counts for each clone group.
func GetCloneSize ¶
GetCloneSize returns the size (token count) of the first clone in a group. Returns 0 if the group is empty or has no fragments.
func SortCloneGroupKeys ¶
func SortCloneGroupKeys(
keys []string,
sortBy SortBy,
groups map[string][][]*syntax.Node,
uniqueCounts map[string]int,
)
SortCloneGroupKeys sorts clone group hashes based on specified criteria.
func SortCloneGroups ¶
func SortCloneGroups(groups []CloneGroup, sortBy SortBy)
SortCloneGroups sorts a slice of CloneGroup values by the specified criterion.
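The following is a minimal sketch of what sorting clone groups by size could look like. It is not the package's implementation: the types are local copies of the documented struct shapes, `sortGroupsBySize` is a hypothetical helper, and breaking ties by hash is an assumption made only to keep the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// JSONClone and CloneGroup are local copies of the documented struct shapes.
type JSONClone struct {
	Filename  string `json:"filename"`
	LineStart int    `json:"line_start"`
	LineEnd   int    `json:"line_end"`
	Fragment  string `json:"fragment"`
}

type CloneGroup struct {
	Hash  string      `json:"hash"`
	Size  int         `json:"size"`
	Files []JSONClone `json:"files"`
}

// sortGroupsBySize is a hypothetical helper: it orders groups by Size,
// largest first, and breaks ties by Hash in ascending order (an assumption).
func sortGroupsBySize(groups []CloneGroup) {
	sort.SliceStable(groups, func(i, j int) bool {
		if groups[i].Size != groups[j].Size {
			return groups[i].Size > groups[j].Size
		}
		return groups[i].Hash < groups[j].Hash
	})
}

func main() {
	groups := []CloneGroup{
		{Hash: "b2", Size: 40},
		{Hash: "a1", Size: 120},
		{Hash: "c3", Size: 40},
	}
	sortGroupsBySize(groups)
	for _, g := range groups {
		fmt.Println(g.Hash, g.Size)
	}
}
```

The sort works in place, which matches the documented signature of SortCloneGroups (no return value).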
func SortClonesByHash ¶
SortClonesByHash sorts clone groups by hash (alphabetical, ascending order).
func SortClonesByOccurrence ¶
SortClonesByOccurrence sorts clone groups by number of files (most files first, descending order).
func SortClonesBySize ¶
SortClonesBySize sorts clone groups by token count (largest first, descending order).
func SortClonesByTotalTokens ¶
SortClonesByTotalTokens sorts clone groups by total token count across all files (largest first).
func SortNodesByCriteria ¶
SortNodesByCriteria applies sorting criteria to node arrays using a unified switch.
Types ¶
type CloneGroup ¶
type CloneGroup struct {
Hash string `json:"hash"`
Size int `json:"size"`
Files []JSONClone `json:"files"`
}
CloneGroup represents a group of duplicate code fragments.
type FileInfo ¶
FileInfo represents processed file information.
func ProcessFileContent ¶
ProcessFileContent provides unified file processing for all printers.
type Format ¶
type Format = config.OutputFormat
Format is an alias to config.OutputFormat for backward compatibility. This consolidates the stats format with the main output format type.
func ParseFormat ¶
ParseFormat converts a string to a Format with validation. Returns an error if the value is not a valid format.
type JSONClone ¶
type JSONClone struct {
Filename string `json:"filename"`
LineStart int `json:"line_start"`
LineEnd int `json:"line_end"`
Fragment string `json:"fragment"`
}
JSONClone represents a single code fragment duplicate for JSON output.
type JSONOutput ¶
type JSONOutput struct {
Version string `json:"version"`
Timestamp time.Time `json:"timestamp"`
Threshold int `json:"threshold"`
FilesAnalyzed int `json:"files_analyzed"`
DetectionMethod string `json:"detection_method,omitempty"`
CloneGroups []CloneGroup `json:"clone_groups"`
Summary Summary `json:"summary"`
}
JSONOutput represents the structured JSON output.
type JSONPrinter ¶
type JSONPrinter struct {
ReadFile
// contains filtered or unexported fields
}
func (*JSONPrinter) OutputJSON ¶
func (p *JSONPrinter) OutputJSON(threshold int, sortBy SortBy, detectionMethod string) error
OutputJSON generates the complete JSON output.
func (*JSONPrinter) OutputSimpleJSON ¶
func (p *JSONPrinter) OutputSimpleJSON() error
OutputSimpleJSON generates simple JSON output format (from duplicates project). This provides a simpler, more straightforward JSON format for users who prefer it.
func (*JSONPrinter) PrintClones ¶
func (p *JSONPrinter) PrintClones(dups [][]*syntax.Node, sortBy ...SortBy) error
func (*JSONPrinter) PrintFooter ¶
func (*JSONPrinter) PrintFooter() error
func (*JSONPrinter) PrintHeader ¶
func (p *JSONPrinter) PrintHeader() error
func (*JSONPrinter) SetFilesCount ¶
func (p *JSONPrinter) SetFilesCount(count int)
SetFilesCount sets the total number of files analyzed.
func (*JSONPrinter) SetHash ¶
func (p *JSONPrinter) SetHash(hash string)
SetHash sets the current hash for the clone group being processed.
type Printer ¶
type Printer interface {
PrintHeader() error
PrintClones(dups [][]*syntax.Node, sortBy ...SortBy) error // Add optional sortBy parameter
}
type SimpleCloneGroup ¶
type SimpleCloneGroup struct {
Hash string `json:"hash"`
Score int `json:"score"` // Impact score: tokens × instances
Instances []SimpleJSONClone `json:"instances"`
}
SimpleCloneGroup represents a clone group in simple format (from duplicates project).
type SimpleJSONClone ¶
type SimpleJSONClone struct {
Filename string `json:"filename"`
StartLine int `json:"start_line"`
EndLine int `json:"end_line"`
TokenCount int `json:"token_count"`
}
SimpleJSONClone represents a single code clone instance in simple format (from duplicates project).
type SimpleJSONOutput ¶
type SimpleJSONOutput []SimpleCloneGroup
SimpleJSONOutput represents the simple JSON output format (from duplicates project).
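The Score field is documented as "tokens × instances". One possible reading, sketched below, multiplies the token count of the first instance by the number of instances. The `newSimpleGroup` helper and that interpretation are assumptions, not the package's confirmed behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented struct shapes.
type SimpleJSONClone struct {
	Filename   string `json:"filename"`
	StartLine  int    `json:"start_line"`
	EndLine    int    `json:"end_line"`
	TokenCount int    `json:"token_count"`
}

type SimpleCloneGroup struct {
	Hash      string            `json:"hash"`
	Score     int               `json:"score"`
	Instances []SimpleJSONClone `json:"instances"`
}

type SimpleJSONOutput []SimpleCloneGroup

// newSimpleGroup is a hypothetical constructor. It assumes every instance in
// a group has the same token count, so the first instance is representative.
func newSimpleGroup(hash string, instances []SimpleJSONClone) SimpleCloneGroup {
	score := 0
	if len(instances) > 0 {
		score = instances[0].TokenCount * len(instances)
	}
	return SimpleCloneGroup{Hash: hash, Score: score, Instances: instances}
}

func main() {
	out := SimpleJSONOutput{
		newSimpleGroup("abc123", []SimpleJSONClone{
			{Filename: "a.go", StartLine: 10, EndLine: 20, TokenCount: 50},
			{Filename: "b.go", StartLine: 5, EndLine: 15, TokenCount: 50},
			{Filename: "c.go", StartLine: 1, EndLine: 11, TokenCount: 50},
		}),
	}
	data, err := json.MarshalIndent(out, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(data))
}
```

Because SimpleJSONOutput is a slice type, the encoded document is a top-level JSON array rather than an object.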
type SortBy ¶
type SortBy string
SortBy represents a sorting criterion for clone groups.
func ParseSortBy ¶
ParseSortBy converts a string to SortBy with validation. Returns an error if the value is not a valid sorting criterion.
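A validating parser of this kind might look like the sketch below. The constant names and their string spellings are hypothetical: the overview lists the criteria (Size, Occurrence, Hash, TotalTokens) but not how they are written on the command line or in configuration, and `parseSortBy` is an illustrative stand-in rather than the package's function.

```go
package main

import "fmt"

type SortBy string

// Hypothetical constants; the actual names and string values may differ.
const (
	SortBySize        SortBy = "size"
	SortByOccurrence  SortBy = "occurrence"
	SortByHash        SortBy = "hash"
	SortByTotalTokens SortBy = "total-tokens"
)

// parseSortBy converts a string to a SortBy and rejects unknown values.
func parseSortBy(s string) (SortBy, error) {
	switch v := SortBy(s); v {
	case SortBySize, SortByOccurrence, SortByHash, SortByTotalTokens:
		return v, nil
	default:
		return "", fmt.Errorf("invalid sort criterion %q", s)
	}
}

func main() {
	for _, in := range []string{"size", "lines"} {
		v, err := parseSortBy(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("parsed:", v)
	}
}
```

Validating at the parsing boundary means the sort functions can switch over a known set of values without a separate error path.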
type StatsData ¶
type StatsData struct {
// Count metrics
TotalFilesScanned int `json:"total_files_scanned"`
TotalCloneGroups int `json:"total_clone_groups"`
TotalClones int `json:"total_clones"`
// Size metrics
TotalDuplicateLines int `json:"total_duplicate_lines"`
TotalTokens int `json:"total_tokens"`
TotalEstimatedLines int `json:"total_estimated_lines"` // Estimated total lines for duplication percentage
AverageCloneSize int `json:"average_clone_size"`
// Complexity and impact metrics
ComplexityScore float64 `json:"complexity_score"`
ImpactScore int `json:"impact_score"`
DuplicationRatio float64 `json:"duplication_ratio"` // Percentage of duplicated code
// Quality metrics
HealthScore string `json:"health_score"` // A-F grade based on metrics
// Time metrics
AnalysisDuration string `json:"analysis_duration"` // Time taken for analysis
Timestamp string `json:"timestamp"` // ISO 8601 timestamp
// Aggregation metrics
FileDuplication map[string]int `json:"file_duplication"` // filename -> duplicate line count
SizeDistribution map[string]int `json:"size_distribution"` // size range -> count (lines)
TokenDistribution map[string]int `json:"token_distribution"` // token range -> count
SeverityBreakdown map[string]int `json:"severity_breakdown"` // severity -> count (small/medium/large/huge)
// Filter metrics (NEW)
FilesFiltered int `json:"files_filtered,omitempty"` // Total files filtered out
FilterBreakdown map[string]int `json:"filter_breakdown,omitempty"` // Reason -> count (e.g., "templ" -> 12)
// Metadata
DetectionMethods string `json:"detection_methods"` // Comma-separated detection methods used
SemanticDetection bool `json:"semantic_detection"` // Whether semantic-aware detection was enabled
}
StatsData holds all aggregated statistics about code duplication analysis.
Fields:
- Count metrics: TotalFilesScanned, TotalCloneGroups, TotalClones
- Size metrics: TotalTokens, TotalDuplicateLines, AverageCloneSize
- Complexity metrics: ComplexityScore, ImpactScore
- Quality metrics: DuplicationRatio, HealthScore
- Time metrics: AnalysisDuration, Timestamp
- Aggregation metrics: FileDuplication, SizeDistribution
- Filter metrics: FilesFiltered, FilterBreakdown (NEW)
- Metadata: DetectionMethods
Domain Types Status:
- Uses primitive types (int, float64, string) for JSON compatibility
- Could use domain types (FileCount, TokenCount, etc.) in future
- See TODO in stats.go for migration path
JSON Marshaling:
- All fields are JSON tagged for easy marshaling
- Use printer.JSONPrinter for formatted JSON output.
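The comments on TotalEstimatedLines and DuplicationRatio suggest the ratio is a percentage of duplicated lines over estimated total lines. A minimal sketch of that calculation follows. The `duplicationRatio` helper is hypothetical, and the zero-line guard is an assumption about how an empty project would be handled.

```go
package main

import "fmt"

// duplicationRatio is a hypothetical helper that returns the share of
// duplicated lines as a percentage. It returns 0 when no lines were estimated.
func duplicationRatio(duplicateLines, estimatedLines int) float64 {
	if estimatedLines <= 0 {
		return 0
	}
	return float64(duplicateLines) / float64(estimatedLines) * 100
}

func main() {
	fmt.Printf("%.1f%%\n", duplicationRatio(50, 200)) // prints "25.0%"
}
```

How this percentage feeds into the A-F HealthScore is not stated in this documentation, so the grade is left out of the sketch.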
type StatsPrinter ¶
type StatsPrinter interface {
Printer
SetFilesCount(count int)
SetDetectionMethods(methods string)
SetSemanticDetection(enabled bool)
SetFormat(format Format)
SetTimestamp(timestamp string)
SetAnalysisDuration(duration time.Duration)
SetTotalEstimatedLines(lines int)
SetFilterStats(filesFiltered int, breakdown map[string]int)
GetStatsData() any
}
StatsPrinter extends the Printer interface with stats-specific setters.
type Summary ¶
type Summary struct {
TotalCloneGroups int `json:"total_clone_groups"`
TotalClones int `json:"total_clones"`
ComplexityScore float64 `json:"complexity_score"`
// ImpactScore represents total duplicated code volume (tokens × instances)
// This is the simple scoring metric from the duplicates project
ImpactScore int `json:"impact_score,omitempty"`
}
Summary provides analysis summary statistics.