printer

package
v0.0.0-...-904b7e1
Warning

This package is not in the latest version of its module.

Published: Mar 2, 2026 | License: MIT | Imports: 18 | Imported by: 0

Documentation

Overview

Package printer provides output formatting and statistics for duplicate detection.

This package implements multiple output formats and statistical analysis of detected code duplicates.

Output Formats:

  • TextPrinter: Human-readable text output
  • JSONPrinter: Structured JSON output
  • HTMLPrinter: HTML report with syntax highlighting
  • PlumbingPrinter: Machine-readable output for scripting
  • StatsPrinter: Comprehensive statistics and health analysis

Design:

  • Printer interface: Common API for all formats
  • Format-specific implementations in separate files
  • Statistics split across focused files:
      • stats.go: Core type and interface methods
      • stats_collector.go: SetX methods for data collection
      • stats_health.go: Health score calculation
      • stats_formatter.go: Output formatters (CSV, JSON, Text)
      • stats_visualization.go: Visualization helpers
      • stats_recommendations.go: Recommendation generation
      • stats_styles.go: Style management
      • stats_data.go: Data structures

Configuration:

  • SortBy criteria: Size, Occurrence, Hash, TotalTokens
  • OutputFormat selection: Text, HTML, JSON, Plumbing, Simple JSON
  • Threshold settings: Minimum size for duplicates
  • Verbosity: Detailed output for debugging

Performance:

  • Streaming output for large projects
  • Efficient memory usage for statistics
  • Lazy evaluation where possible

Constants

const (
	// FormatText produces human-readable text output (default).
	FormatText = config.OutputFormatText
	// FormatJSON produces machine-readable JSON output.
	FormatJSON = config.OutputFormatJSON
	// FormatCSV produces machine-readable CSV output.
	FormatCSV = config.OutputFormatCSV
)

The Format constants are aliases for config.OutputFormat values, kept for backward compatibility.

Variables

This section is empty.

Functions

func BuildCloneGroups

func BuildCloneGroups(duplChan <-chan syntax.Match) map[string][][]*syntax.Node

BuildCloneGroups builds a map of hash to clone groups from matches.

func ComputeUniqueCounts

func ComputeUniqueCounts(groups map[string][][]*syntax.Node) map[string]int

ComputeUniqueCounts calculates unique file counts for each clone group.
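As a rough illustration of what "unique file counts" could mean for the map shape above, the sketch below counts distinct filenames per hash. It does not use the real package: Node is a local stand-in for syntax.Node, and the Filename field and the uniqueFileCounts helper are assumptions made for this example only.

```go
package main

import "fmt"

// Node is a local stand-in for syntax.Node. Only a Filename field is
// assumed here; the real type is not shown in this documentation.
type Node struct {
	Filename string
}

// uniqueFileCounts (hypothetical helper) counts the distinct filenames
// that appear among the fragments stored under each hash.
func uniqueFileCounts(groups map[string][][]*Node) map[string]int {
	counts := make(map[string]int, len(groups))
	for hash, group := range groups {
		seen := make(map[string]struct{})
		for _, fragment := range group {
			for _, n := range fragment {
				if n != nil {
					seen[n.Filename] = struct{}{}
				}
			}
		}
		counts[hash] = len(seen)
	}
	return counts
}

func main() {
	groups := map[string][][]*Node{
		"abc": {
			{{Filename: "a.go"}},
			{{Filename: "b.go"}},
			{{Filename: "a.go"}}, // second fragment in the same file
		},
	}
	fmt.Println(uniqueFileCounts(groups)["abc"]) // prints 2
}
```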

func GetCloneSize

func GetCloneSize(group [][]*syntax.Node) int

GetCloneSize returns the size (token count) of the first clone in a group. Returns 0 if the group is empty or has no fragments.

func SortCloneGroupKeys

func SortCloneGroupKeys(
	keys []string,
	sortBy SortBy,
	groups map[string][][]*syntax.Node,
	uniqueCounts map[string]int,
)

SortCloneGroupKeys sorts clone group hashes based on specified criteria.

func SortCloneGroups

func SortCloneGroups(groups []CloneGroup, sortBy SortBy)

SortCloneGroups sorts a slice of CloneGroup in place by the specified criterion.
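The following sketch shows one way such an in-place sort could be written with sort.SliceStable. The types are local copies of the documented ones, sortGroups is a hypothetical name, and the orderings are assumptions: size, occurrence and total tokens descending, hash ascending, with total tokens taken as Size multiplied by the number of files.

```go
package main

import (
	"fmt"
	"sort"
)

// Local copies of the documented types so the sketch compiles on its own.
type JSONClone struct {
	Filename  string `json:"filename"`
	LineStart int    `json:"line_start"`
	LineEnd   int    `json:"line_end"`
	Fragment  string `json:"fragment"`
}

type CloneGroup struct {
	Hash  string      `json:"hash"`
	Size  int         `json:"size"`
	Files []JSONClone `json:"files"`
}

type SortBy string

const (
	SortBySize        SortBy = "size"
	SortByOccurrence  SortBy = "occurrence"
	SortByHash        SortBy = "hash"
	SortByTotalTokens SortBy = "total-tokens"
)

// sortGroups (hypothetical) orders groups in place. Unknown criteria fall
// back to sorting by size, which is an assumption of this sketch.
func sortGroups(groups []CloneGroup, by SortBy) {
	sort.SliceStable(groups, func(i, j int) bool {
		a, b := groups[i], groups[j]
		switch by {
		case SortByOccurrence:
			return len(a.Files) > len(b.Files)
		case SortByHash:
			return a.Hash < b.Hash
		case SortByTotalTokens:
			return a.Size*len(a.Files) > b.Size*len(b.Files)
		default:
			return a.Size > b.Size
		}
	})
}

func main() {
	groups := []CloneGroup{
		{Hash: "b", Size: 10, Files: make([]JSONClone, 3)},
		{Hash: "a", Size: 40, Files: make([]JSONClone, 2)},
	}
	sortGroups(groups, SortByOccurrence)
	fmt.Println(groups[0].Hash) // prints b (three files beats two)
}
```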

func SortClonesByHash

func SortClonesByHash(dups [][]*syntax.Node) [][]*syntax.Node

SortClonesByHash sorts clone groups by hash (alphabetical, ascending order).

func SortClonesByOccurrence

func SortClonesByOccurrence(dups [][]*syntax.Node) [][]*syntax.Node

SortClonesByOccurrence sorts clone groups by number of files (most files first, descending order).

func SortClonesBySize

func SortClonesBySize(dups [][]*syntax.Node) [][]*syntax.Node

SortClonesBySize sorts clone groups by token count (largest first, descending order).

func SortClonesByTotalTokens

func SortClonesByTotalTokens(dups [][]*syntax.Node) [][]*syntax.Node

SortClonesByTotalTokens sorts clone groups by total token count across all files (largest first).

func SortNodesByCriteria

func SortNodesByCriteria(dups [][]*syntax.Node, sortBy SortBy) [][]*syntax.Node

SortNodesByCriteria applies sorting criteria to node arrays using a unified switch.

func TestCloneSortingWithData

func TestCloneSortingWithData(
	t *testing.T,
	sortFunc func([][]*syntax.Node) [][]*syntax.Node,
	sortName string,
	clones [][]*syntax.Node,
)

TestCloneSortingWithData is a helper function for testing clone sorting algorithms that takes pre-created clones.

Types

type Clone

type Clone clone

func (Clone) Filename

func (c Clone) Filename() string

func (Clone) Fragment

func (c Clone) Fragment() []byte

func (Clone) LineEnd

func (c Clone) LineEnd() int

func (Clone) LineStart

func (c Clone) LineStart() int

type CloneGroup

type CloneGroup struct {
	Hash  string      `json:"hash"`
	Size  int         `json:"size"`
	Files []JSONClone `json:"files"`
}

CloneGroup represents a group of duplicate code fragments.

type FileInfo

type FileInfo struct {
	Filename  string
	LineStart int
	LineEnd   int
	Content   []byte
	Node      *syntax.Node
}

FileInfo represents processed file information.

func ProcessFileContent

func ProcessFileContent(fread ReadFile, node *syntax.Node) (*FileInfo, error)

ProcessFileContent provides unified file processing for all printers.

func ProcessNodeRange

func ProcessNodeRange(fread ReadFile, startNode, endNode *syntax.Node) (*FileInfo, error)

ProcessNodeRange processes a range of nodes (start to end).

type Format

type Format = config.OutputFormat

Format is an alias to config.OutputFormat for backward compatibility. This consolidates the stats format with the main output format type.

func ParseFormat

func ParseFormat(value string) (Format, error)

ParseFormat converts a string to a Format with validation. Returns an error if the value is not a valid format.

type Issue

type Issue struct {
	From, To Clone
}

type Issuer

type Issuer struct {
	ReadFile
}

func NewIssuer

func NewIssuer(fread ReadFile) *Issuer

func (*Issuer) MakeIssues

func (p *Issuer) MakeIssues(dups [][]*syntax.Node) ([]Issue, error)

type JSONClone

type JSONClone struct {
	Filename  string `json:"filename"`
	LineStart int    `json:"line_start"`
	LineEnd   int    `json:"line_end"`
	Fragment  string `json:"fragment"`
}

JSONClone represents a single code fragment duplicate for JSON output.

type JSONOutput

type JSONOutput struct {
	Version         string       `json:"version"`
	Timestamp       time.Time    `json:"timestamp"`
	Threshold       int          `json:"threshold"`
	FilesAnalyzed   int          `json:"files_analyzed"`
	DetectionMethod string       `json:"detection_method,omitempty"`
	CloneGroups     []CloneGroup `json:"clone_groups"`
	Summary         Summary      `json:"summary"`
}

JSONOutput represents the structured JSON output.

type JSONPrinter

type JSONPrinter struct {
	ReadFile
	// contains filtered or unexported fields
}

func (*JSONPrinter) OutputJSON

func (p *JSONPrinter) OutputJSON(threshold int, sortBy SortBy, detectionMethod string) error

OutputJSON generates the complete JSON output.

func (*JSONPrinter) OutputSimpleJSON

func (p *JSONPrinter) OutputSimpleJSON() error

OutputSimpleJSON generates simple JSON output format (from duplicates project). This provides a simpler, more straightforward JSON format for users who prefer it.

func (*JSONPrinter) PrintClones

func (p *JSONPrinter) PrintClones(dups [][]*syntax.Node, sortBy ...SortBy) error

func (*JSONPrinter) PrintFooter

func (*JSONPrinter) PrintFooter() error

func (*JSONPrinter) PrintHeader

func (p *JSONPrinter) PrintHeader() error

func (*JSONPrinter) SetFilesCount

func (p *JSONPrinter) SetFilesCount(count int)

SetFilesCount sets the total number of files analyzed.

func (*JSONPrinter) SetHash

func (p *JSONPrinter) SetHash(hash string)

SetHash sets the current hash for the clone group being processed.

type Printer

type Printer interface {
	PrintHeader() error
	PrintClones(dups [][]*syntax.Node, sortBy ...SortBy) error // sortBy is optional (variadic)
	PrintFooter() error
}

func NewHTML

func NewHTML(w io.Writer, fread ReadFile, threshold ...int) Printer

func NewJSON

func NewJSON(w io.Writer, fread ReadFile) Printer

func NewPlumbing

func NewPlumbing(w io.Writer, fread ReadFile) Printer

func NewStats

func NewStats(w io.Writer, fread ReadFile, threshold int) Printer

NewStats creates a new stats printer.

Note: Accepts `threshold int` for backward compatibility. For a type-safe version, use domain.Threshold at the call site.

func NewText

func NewText(w io.Writer, fread ReadFile) Printer

type ReadFile

type ReadFile func(filename string) ([]byte, error)

type SimpleCloneGroup

type SimpleCloneGroup struct {
	Hash      string            `json:"hash"`
	Score     int               `json:"score"` // Impact score: tokens × instances
	Instances []SimpleJSONClone `json:"instances"`
}

SimpleCloneGroup represents a clone group in simple format (from duplicates project).

type SimpleJSONClone

type SimpleJSONClone struct {
	Filename   string `json:"filename"`
	StartLine  int    `json:"start_line"`
	EndLine    int    `json:"end_line"`
	TokenCount int    `json:"token_count"`
}

SimpleJSONClone represents a single code clone instance in simple format (from duplicates project).

type SimpleJSONOutput

type SimpleJSONOutput []SimpleCloneGroup

SimpleJSONOutput represents the simple JSON output format (from duplicates project).

type SortBy

type SortBy string

SortBy represents a sorting criterion for clone groups.

const (
	SortBySize        SortBy = "size"
	SortByOccurrence  SortBy = "occurrence"
	SortByHash        SortBy = "hash"
	SortByTotalTokens SortBy = "total-tokens"
)

func ParseSortBy

func ParseSortBy(value string) (SortBy, error)

ParseSortBy converts a string to SortBy with validation. Returns an error if the value is not a valid sorting criterion.

func (SortBy) IsValid

func (s SortBy) IsValid() bool

IsValid returns true if the SortBy value is valid.

func (SortBy) String

func (s SortBy) String() string

String returns the string representation of SortBy.
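The validation pattern behind ParseSortBy, IsValid and String can be sketched as follows. This is a standalone approximation built only from the constants listed above: parseSortBy is a local function, its error text is invented, and whether the real function trims whitespace or ignores case is not documented, so the sketch matches values exactly.

```go
package main

import "fmt"

// SortBy and its constants are local copies of the documented declarations.
type SortBy string

const (
	SortBySize        SortBy = "size"
	SortByOccurrence  SortBy = "occurrence"
	SortByHash        SortBy = "hash"
	SortByTotalTokens SortBy = "total-tokens"
)

// IsValid reports whether s is one of the four known criteria.
func (s SortBy) IsValid() bool {
	switch s {
	case SortBySize, SortByOccurrence, SortByHash, SortByTotalTokens:
		return true
	}
	return false
}

// String returns the underlying string value.
func (s SortBy) String() string { return string(s) }

// parseSortBy (local sketch) converts a string and rejects unknown values.
func parseSortBy(value string) (SortBy, error) {
	s := SortBy(value)
	if !s.IsValid() {
		return "", fmt.Errorf("invalid sort criterion %q", value)
	}
	return s, nil
}

func main() {
	for _, v := range []string{"size", "total-tokens", "lines"} {
		s, err := parseSortBy(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("ok:", s)
	}
}
```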

type StatsData

type StatsData struct {
	// Count metrics
	TotalFilesScanned int `json:"total_files_scanned"`
	TotalCloneGroups  int `json:"total_clone_groups"`
	TotalClones       int `json:"total_clones"`

	// Size metrics
	TotalDuplicateLines int `json:"total_duplicate_lines"`
	TotalTokens         int `json:"total_tokens"`
	TotalEstimatedLines int `json:"total_estimated_lines"` // Estimated total lines for duplication percentage
	AverageCloneSize    int `json:"average_clone_size"`

	// Complexity and impact metrics
	ComplexityScore  float64 `json:"complexity_score"`
	ImpactScore      int     `json:"impact_score"`
	DuplicationRatio float64 `json:"duplication_ratio"` // Percentage of duplicated code

	// Quality metrics
	HealthScore string `json:"health_score"` // A-F grade based on metrics

	// Time metrics
	AnalysisDuration string `json:"analysis_duration"` // Time taken for analysis
	Timestamp        string `json:"timestamp"`         // ISO 8601 timestamp

	// Aggregation metrics
	FileDuplication   map[string]int `json:"file_duplication"`   // filename -> duplicate line count
	SizeDistribution  map[string]int `json:"size_distribution"`  // size range -> count (lines)
	TokenDistribution map[string]int `json:"token_distribution"` // token range -> count
	SeverityBreakdown map[string]int `json:"severity_breakdown"` // severity -> count (small/medium/large/huge)

	// Filter metrics (NEW)
	FilesFiltered   int            `json:"files_filtered,omitempty"`   // Total files filtered out
	FilterBreakdown map[string]int `json:"filter_breakdown,omitempty"` // Reason -> count (e.g., "templ" -> 12)

	// Metadata
	DetectionMethods  string `json:"detection_methods"`  // Comma-separated detection methods used
	SemanticDetection bool   `json:"semantic_detection"` // Whether semantic-aware detection was enabled
}

StatsData holds all aggregated statistics about code duplication analysis.

Fields:

  • Count metrics: TotalFilesScanned, TotalCloneGroups, TotalClones
  • Size metrics: TotalDuplicateLines, TotalTokens, TotalEstimatedLines, AverageCloneSize
  • Complexity and impact metrics: ComplexityScore, ImpactScore, DuplicationRatio
  • Quality metrics: HealthScore
  • Time metrics: AnalysisDuration, Timestamp
  • Aggregation metrics: FileDuplication, SizeDistribution, TokenDistribution, SeverityBreakdown
  • Filter metrics: FilesFiltered, FilterBreakdown (NEW)
  • Metadata: DetectionMethods, SemanticDetection

Domain Types Status:

  • Uses primitive types (int, float64, string) for JSON compatibility
  • Could use domain types (FileCount, TokenCount, etc.) in future
  • See TODO in stats.go for migration path

JSON Marshaling:

  • All fields are JSON tagged for easy marshaling
  • Use printer.JSONPrinter for formatted JSON output
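The field comments suggest that DuplicationRatio is a percentage derived from TotalDuplicateLines and TotalEstimatedLines. The exact formula is not documented, so the sketch below is only a plausible reading; duplicationRatio is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// duplicationRatio (hypothetical) returns duplicated lines as a percentage
// of the estimated total. It returns 0 when no total is available, to
// avoid dividing by zero.
func duplicationRatio(duplicateLines, estimatedLines int) float64 {
	if estimatedLines <= 0 {
		return 0
	}
	return float64(duplicateLines) / float64(estimatedLines) * 100
}

func main() {
	fmt.Printf("%.1f%%\n", duplicationRatio(50, 200)) // prints 25.0%
}
```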

type StatsPrinter

type StatsPrinter interface {
	Printer
	SetFilesCount(count int)
	SetDetectionMethods(methods string)
	SetSemanticDetection(enabled bool)
	SetFormat(format Format)
	SetTimestamp(timestamp string)
	SetAnalysisDuration(duration time.Duration)
	SetTotalEstimatedLines(lines int)
	SetFilterStats(filesFiltered int, breakdown map[string]int)
	GetStatsData() any
}

StatsPrinter extends Printer interface with stats-specific setters.

type Summary

type Summary struct {
	TotalCloneGroups int     `json:"total_clone_groups"`
	TotalClones      int     `json:"total_clones"`
	ComplexityScore  float64 `json:"complexity_score"`
	// ImpactScore represents total duplicated code volume (tokens × instances)
	// This is the simple scoring metric from the duplicates project
	ImpactScore int `json:"impact_score,omitempty"`
}

Summary provides analysis summary statistics.
