processor

package
v3.2.0
Warning

This package is not in the latest version of its module.

Published: Nov 27, 2023 License: MIT, Unlicense Imports: 30 Imported by: 4

Documentation

Constants

const (
	TString int = iota + 1
	TSlcomment
	TMlcomment
	TComplexity
)

Used by the trie structure to store token types

const (
	SBlank             int64 = 1
	SCode              int64 = 2
	SComment           int64 = 3
	SCommentCode       int64 = 4 // Indicates comment after code
	SMulticomment      int64 = 5
	SMulticommentCode  int64 = 6 // Indicates multi comment after code
	SMulticommentBlank int64 = 7 // Indicates multi comment ended with blank afterwards
	SString            int64 = 8
	SDocString         int64 = 9
)

These constants are used as identifiers for the states of the code state machine

const SheBang string = "#!"

SheBang is a global constant for indicating a shebang file header

Variables

var AllowListExtensions = []string{}

AllowListExtensions is a list of extensions which are allowed to be processed

var AverageWage int64 = 56286

AverageWage is the average wage in dollars used for the COCOMO cost estimate

var BloomTable [256]uint64
var ByteOrderMarks = [][]byte{
	{254, 255},
	{255, 254},
	{0, 0, 254, 255},
	{255, 254, 0, 0},
	{43, 47, 118, 56},
	{43, 47, 118, 57},
	{43, 47, 118, 43},
	{43, 47, 118, 47},
	{43, 47, 118, 56, 45},
	{247, 100, 76},
	{221, 115, 102, 115},
	{14, 254, 255},
	{251, 238, 40},
	{132, 49, 149, 51},
}

ByteOrderMarks are taken from https://en.wikipedia.org/wiki/Byte_order_mark#Byte_order_marks_by_encoding. They indicate that we cannot count the file correctly, so we can at least warn the user
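
A table like this is typically consulted with a prefix check on the file content. The following is a minimal sketch of that idea, not the package's own code: `byteOrderMarks` is a shortened copy of the table and `hasByteOrderMark` is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"fmt"
)

// byteOrderMarks is a shortened, illustrative copy of the table above.
var byteOrderMarks = [][]byte{
	{254, 255},       // UTF-16 (BE)
	{255, 254},       // UTF-16 (LE)
	{0, 0, 254, 255}, // UTF-32 (BE)
}

// hasByteOrderMark is a hypothetical helper: it reports whether content
// starts with any of the listed byte order marks.
func hasByteOrderMark(content []byte) bool {
	for _, bom := range byteOrderMarks {
		if bytes.HasPrefix(content, bom) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasByteOrderMark([]byte{254, 255, 0, 'a'})) // true
	fmt.Println(hasByteOrderMark([]byte("package main")))   // false
}
```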

var Ci = false

Ci indicates that scc is running inside a CI environment, so box drawing characters are disabled

var Cocomo = false

Cocomo toggles the COCOMO calculation

var CocomoProjectType = "organic"

CocomoProjectType allows the flipping between project types which impacts the calculation

var Complexity = false

Complexity toggles complexity calculation

var ConfigureLimits func()

ConfigureLimits configures ulimits where possible

var CountAs = ""

CountAs is a rule for mapping known or new extensions to other rules

var CurrencySymbol = ""

CurrencySymbol allows setting the currency symbol for cocomo project cost estimation

var Debug = false

Debug enables debug logging output

var DirFilePaths = []string{}

DirFilePaths is not set via flags but by the arguments that follow them, which name the files or directories to process

var DirectoryWalkerJobWorkers = runtime.NumCPU()

DirectoryWalkerJobWorkers is the number of workers which will walk the directory tree

var DisableCheckBinary = false

DisableCheckBinary toggles checking for binary files using NUL bytes
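
The NUL-byte heuristic mentioned here can be sketched in a few lines. This is an illustration only: `looksBinary` is a hypothetical name, and how much of the file the package actually inspects is not stated on this page.

```go
package main

import (
	"bytes"
	"fmt"
)

// looksBinary is a hypothetical helper: it treats content as binary
// when it contains at least one NUL (0x00) byte.
func looksBinary(content []byte) bool {
	return bytes.IndexByte(content, 0) != -1
}

func main() {
	fmt.Println(looksBinary([]byte("plain text\n")))
	fmt.Println(looksBinary([]byte{'E', 'L', 'F', 0, 1, 2}))
}
```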

var Duplicates = false

Duplicates enables duplicate file detection

var EAF float64 = 1.0

EAF is the effort adjustment factor derived from the cost drivers, i.e. 1.0 if rated nominal

var Exclude = []string{}

Exclude is a list of regular expressions used to exclude files from being processed

var ExcludeFilename = []string{}

ExcludeFilename is a list of filenames which should be ignored

var ExcludeListExtensions = []string{}

ExcludeListExtensions is a list of extensions which should be ignored

var ExtensionToLanguage = map[string][]string{}

ExtensionToLanguage is loaded from the JSON that is in constants.go

var FileListQueueSize = runtime.NumCPU()

FileListQueueSize is the size of the queue of files found and ready to be read into memory

var FileOutput = ""

FileOutput sets the file that output should be written to

var FileProcessJobWorkers = runtime.NumCPU() * 4

FileProcessJobWorkers is the number of workers that process the file collecting stats

var FileSummaryJobQueueSize = runtime.NumCPU()

FileSummaryJobQueueSize is the size of the queue used to hold processed file statistics before formatting

var FilenameToLanguage = map[string]string{}

FilenameToLanguage is, like ExtensionToLanguage, loaded from the JSON in constants.go

var Files = false

Files indicates if there should be file output or not when formatting

var Format = ""

Format sets the output format of the formatter

var FormatMulti = ""

FormatMulti is a rule for defining multiple output formats

var GcFileCount = 10000

GcFileCount is the number of files to process before turning the GC back on

var Generated = false

Generated enables generated file detection

var GeneratedMarkers []string

GeneratedMarkers defines head markers for generated file detection

var GitIgnore = false

GitIgnore disables .gitignore checks

var Ignore = false

Ignore disables ignore file checks

var IgnoreGenerated = false

IgnoreGenerated suppresses printing counts for generated files

var IgnoreMinified = false

IgnoreMinified suppresses printing counts for minified files

var IgnoreMinifiedGenerate = false

IgnoreMinifiedGenerate suppresses printing counts for minified/generated files

var IncludeSymLinks = false

IncludeSymLinks, if set to true, counts symlinked files

var LanguageFeatures = map[string]LanguageFeature{}

LanguageFeatures contains the processed languages from processLanguageFeature

var LanguageFeaturesMutex = sync.Mutex{}

LanguageFeaturesMutex is the shared mutex used to control getting and setting of language features. It is used rather than sync.Map because it turned out to be marginally faster

var Languages = false

Languages indicates if the command line should print out the supported languages

var LargeByteCount int64 = 1000000

LargeByteCount is the number of bytes above which a file is counted as large, based on https://github.com/pinpt/ripsrc/blob/master/ripsrc/fileinfo/fileinfo.go#L44

var LargeLineCount int64 = 40000

LargeLineCount is the number of lines above which a file is counted as large, based on https://github.com/pinpt/ripsrc/blob/master/ripsrc/fileinfo/fileinfo.go#L44

var Minified = false

Minified enables minified file detection

var MinifiedGenerated = false

MinifiedGenerated enables minified/generated file detection

var MinifiedGeneratedLineByteLength = 255

MinifiedGeneratedLineByteLength is the average number of bytes per line used to determine whether a file is minified/generated
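
The average-line-length heuristic can be illustrated as follows. This is a sketch under assumptions, not the package's code: `looksMinified` is a hypothetical helper, and whether the real comparison is `>=` or `>` is not stated on this page.

```go
package main

import "fmt"

// looksMinified is a hypothetical helper: it flags a file whose average
// line length in bytes reaches the threshold. Using >= is an assumption.
func looksMinified(byteCount, lineCount, threshold int64) bool {
	if lineCount == 0 {
		return false // empty file: nothing to average
	}
	return byteCount/lineCount >= threshold
}

func main() {
	const threshold = 255 // the default shown above
	fmt.Println(looksMinified(5100, 10, threshold))
	fmt.Println(looksMinified(800, 10, threshold))
}
```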

var More = false

More enables wider output with more information in formatter

var NoLarge = false

NoLarge, if set to true, ignores files over a certain number of lines or bytes

var Overhead float64 = 2.4

Overhead is the overhead multiplier for corporate overhead (facilities, equipment, accounting, etc.)

var PathDenyList = []string{}

PathDenyList sets the paths that should be skipped

var RemapAll = ""

RemapAll allows remapping of all files with a string to search the content for

var RemapUnknown = ""

RemapUnknown allows remapping of unknown files with a string to search the content for

var SLOCCountFormat = false

SLOCCountFormat prints a more SLOCCount-like COCOMO calculation

var SQLProject = ""

SQLProject is used to store the name for the SQL insert formats but is optional

var ShebangLookup = map[string][]string{}

ShebangLookup loaded from the JSON in constants.go contains shebang lookups

var Size = false

Size toggles the Size calculation

var SizeUnit = "si"

SizeUnit determines what size calculation is used for megabytes

var SortBy = ""

SortBy sets which column output in formatter should be sorted by

var Trace = false

Trace enables trace logging output which is extremely verbose

var Verbose = false

Verbose enables verbose logging output

var Version = "3.2.0"

Version indicates the version of the application

Functions

func BloomHash

func BloomHash(b byte) uint64

func ConfigureGc

func ConfigureGc()

ConfigureGc needs to be set outside of ProcessConstants because it should only be enabled in command line mode. See https://github.com/boyter/scc/issues/32

func ConfigureLazy

func ConfigureLazy(lazy bool)

ConfigureLazy is a simple setter used to turn on lazy loading. It is used only by the command line

func ConfigureLimitsUnix

func ConfigureLimitsUnix()

ConfigureLimitsUnix attempts to control ulimits on unix like OS

func CountStats

func CountStats(fileJob *FileJob)

CountStats will process the fileJob. If the file contains anything, even just a newline, its line count should be >= 1. If the file has a size of 0, its line count should be 0. Newlines belong to the line they started on, so a file of \n means only 1 line. This is the 'hot' path for the application and needs to be as fast as possible
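
The line-count rules stated here can be checked against a small standalone function. This sketch only illustrates those rules; `countLines` is a hypothetical name and is not how CountStats itself is implemented (it does not classify code, comments or blanks).

```go
package main

import (
	"bytes"
	"fmt"
)

// countLines is a hypothetical helper that follows the stated rules:
// an empty file has 0 lines, and a newline belongs to the line it ends.
func countLines(content []byte) int64 {
	if len(content) == 0 {
		return 0
	}
	lines := int64(bytes.Count(content, []byte{'\n'}))
	// A final line without a trailing newline still counts as a line.
	if content[len(content)-1] != '\n' {
		lines++
	}
	return lines
}

func main() {
	for _, s := range []string{"", "\n", "a", "a\nb", "a\nb\n"} {
		fmt.Printf("%q -> %d\n", s, countLines([]byte(s)))
	}
}
```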

func DetectLanguage

func DetectLanguage(name string) ([]string, string)

DetectLanguage detects a language based on the filename. Per its signature it returns two values, the candidate languages and a string (the extension), and no error value

func DetectSheBang

func DetectSheBang(content string) (string, error)

DetectSheBang, given some content, attempts to determine if it has a #! that maps to a known language and returns the language
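
A rough sketch of shebang detection follows. Everything in it is an assumption made for illustration: `detectSheBang` is a local function, `interpreterToLanguage` is a tiny stand-in table (not the package's ShebangLookup), and the handling of `env` reflects the common convention rather than documented behaviour.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

const sheBang = "#!"

// interpreterToLanguage is a hypothetical, simplified lookup table.
var interpreterToLanguage = map[string]string{
	"python3": "Python",
	"python":  "Python",
	"bash":    "Shell",
	"sh":      "Shell",
}

// detectSheBang inspects the first line of content and maps the
// interpreter it names to a language.
func detectSheBang(content string) (string, error) {
	if !strings.HasPrefix(content, sheBang) {
		return "", errors.New("no shebang found")
	}
	firstLine := strings.SplitN(content, "\n", 2)[0]
	fields := strings.Fields(strings.TrimPrefix(firstLine, sheBang))
	if len(fields) == 0 {
		return "", errors.New("empty shebang")
	}
	cmd := path.Base(fields[0])
	// "#!/usr/bin/env python3" names the interpreter as an argument.
	if cmd == "env" && len(fields) > 1 {
		cmd = fields[1]
	}
	if lang, ok := interpreterToLanguage[cmd]; ok {
		return lang, nil
	}
	return "", errors.New("unknown interpreter: " + cmd)
}

func main() {
	fmt.Println(detectSheBang("#!/usr/bin/env python3\nprint(1)\n"))
	fmt.Println(detectSheBang("#!/bin/bash\necho hi\n"))
	fmt.Println(detectSheBang("package main\n"))
}
```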

func DetermineLanguage

func DetermineLanguage(filename string, fallbackLanguage string, possibleLanguages []string, content []byte) string

DetermineLanguage, given a filename, fallback language, possible languages and content, makes a guess at the language. If multiple languages are possible, it guesses based on keywords, similar to how https://github.com/vmchale/polyglot does

func EstimateCost

func EstimateCost(effortApplied float64, averageWage int64, overhead float64) float64

EstimateCost calculates the cost in dollars applied using generic COCOMO weighted values based on the average yearly wage

func EstimateEffort

func EstimateEffort(sloc int64, eaf float64) float64

EstimateEffort calculates the effort applied using generic COCOMO weighted values

func EstimateScheduleMonths

func EstimateScheduleMonths(effortApplied float64) float64

EstimateScheduleMonths estimates the schedule in months based on the effort returned by EstimateEffort
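
The three estimate functions chain together: lines of code give effort, effort gives schedule and cost. The sketch below shows that chain using the textbook basic COCOMO coefficients for an "organic" project (2.4, 1.05, 2.5, 0.38). Those coefficients and the lowercase function names are assumptions for illustration; the package's own values and rounding are not shown on this page.

```go
package main

import (
	"fmt"
	"math"
)

// Textbook basic COCOMO coefficients for an "organic" project (assumed).
const (
	effortA   = 2.4
	effortB   = 1.05
	scheduleC = 2.5
	scheduleD = 0.38
)

// estimateEffort returns person-months for the given source lines of code.
func estimateEffort(sloc int64, eaf float64) float64 {
	kloc := float64(sloc) / 1000
	return effortA * math.Pow(kloc, effortB) * eaf
}

// estimateScheduleMonths returns calendar months for the given effort.
func estimateScheduleMonths(effort float64) float64 {
	return scheduleC * math.Pow(effort, scheduleD)
}

// estimateCost converts effort to money using a yearly wage and overhead.
func estimateCost(effort float64, averageWage int64, overhead float64) float64 {
	return effort * (float64(averageWage) / 12) * overhead
}

func main() {
	effort := estimateEffort(10000, 1.0)
	fmt.Printf("effort:   %.2f person-months\n", effort)
	fmt.Printf("schedule: %.2f months\n", estimateScheduleMonths(effort))
	fmt.Printf("cost:     %.0f\n", estimateCost(effort, 56286, 2.4))
}
```

The wage (56286) and overhead (2.4) passed in `main` are the AverageWage and Overhead defaults listed under Variables.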

func LoadLanguageFeature

func LoadLanguageFeature(loadName string)

LoadLanguageFeature will load a single feature as requested given the name

func Process

func Process()

Process is the main entry point of the command line. It sets everything up and starts running

func ProcessConstants

func ProcessConstants()

ProcessConstants is responsible for setting up the language features based on the JSON file that is stored in constants. It needs to be called at least once in order for anything to actually happen

Types

type CheckDuplicates

type CheckDuplicates struct {
	// contains filtered or unexported fields
}

CheckDuplicates is used to hold hashes if duplicate detection is enabled. It comes with a mutex that should be locked while a check is being performed and the hash is then added

func (*CheckDuplicates) Add

func (c *CheckDuplicates) Add(key int64, hash []byte)

Add is a non-thread-safe way to add a key to the duplicates check. The mutex inside the struct must be locked before calling it

func (*CheckDuplicates) Check

func (c *CheckDuplicates) Check(key int64, hash []byte) bool

Check is a non-thread-safe check to see if the key already exists. The mutex inside the struct must be locked before calling it
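
The lock, check, then add pattern described for CheckDuplicates can be sketched with a standalone type. This is not the package's struct (its fields are unexported and not shown here): `duplicateChecker`, its field names and `seenBefore` are hypothetical, and treating the key as a coarse bucket such as file size is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// duplicateChecker is a hypothetical stand-in: hashes are grouped by an
// int64 key and guarded by a mutex the caller must hold.
type duplicateChecker struct {
	mu     sync.Mutex
	hashes map[int64][][]byte
}

func newDuplicateChecker() *duplicateChecker {
	return &duplicateChecker{hashes: make(map[int64][][]byte)}
}

// add is not thread safe; callers hold mu.
func (c *duplicateChecker) add(key int64, hash []byte) {
	c.hashes[key] = append(c.hashes[key], hash)
}

// check is not thread safe; callers hold mu.
func (c *duplicateChecker) check(key int64, hash []byte) bool {
	for _, h := range c.hashes[key] {
		if bytes.Equal(h, hash) {
			return true
		}
	}
	return false
}

// seenBefore locks once around check and add so the pair is atomic.
func (c *duplicateChecker) seenBefore(key int64, hash []byte) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.check(key, hash) {
		return true
	}
	c.add(key, hash)
	return false
}

func main() {
	c := newDuplicateChecker()
	fmt.Println(c.seenBefore(3, []byte{1, 2}))
	fmt.Println(c.seenBefore(3, []byte{1, 2}))
}
```

Holding the lock across both calls matters: if two goroutines checked first and added later, both could conclude the same file was new.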

type DirectoryJob

type DirectoryJob struct {
	// contains filtered or unexported fields
}

DirectoryJob is a struct for dealing with directories we want to walk

type DirectoryWalker

type DirectoryWalker struct {
	// contains filtered or unexported fields
}

DirectoryWalker is responsible for actually walking directories using cuba

func NewDirectoryWalker

func NewDirectoryWalker(output chan<- *FileJob) *DirectoryWalker

NewDirectoryWalker creates the new directory walker

func (*DirectoryWalker) Readdir

func (dw *DirectoryWalker) Readdir(path string) ([]os.FileInfo, error)

Readdir reads a directory such that we know what files are in there

func (*DirectoryWalker) Run

func (dw *DirectoryWalker) Run()

Run continues to run everything

func (*DirectoryWalker) Start

func (dw *DirectoryWalker) Start(root string) error

Start actually starts directory traversal

func (*DirectoryWalker) Walk

func (dw *DirectoryWalker) Walk(handle *cuba.Handle)

Walk walks the directory as quickly as it can

type FileJob

type FileJob struct {
	Language           string
	PossibleLanguages  []string // Used to hold potentially more than one language which populates language when determined
	Filename           string
	Extension          string
	Location           string
	Symlocation        string
	Content            []byte `json:"-"`
	Bytes              int64
	Lines              int64
	Code               int64
	Comment            int64
	Blank              int64
	Complexity         int64
	WeightedComplexity float64
	Hash               hash.Hash
	Callback           FileJobCallback
	Binary             bool
	Minified           bool
	Generated          bool
	EndPoint           int
}

FileJob is a struct used to hold all of the results of processing internally before they are sent to the formatter

type FileJobCallback

type FileJobCallback interface {
	// ProcessLine should return true to continue processing or false to stop further processing and return
	ProcessLine(job *FileJob, currentLine int64, lineType LineType) bool
}

FileJobCallback is an interface that can be supplied on a FileJob (through its Callback field) to get a per line callback with the line type
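
A callback implementation might look like the sketch below. To keep it compilable on its own, it declares local stand-ins that mirror the LineType, FileJob and FileJobCallback declarations on this page (FileJob is cut down to one field); `lineTally` and its fields are hypothetical.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented declarations.
type LineType int32

const (
	LINE_BLANK LineType = iota
	LINE_CODE
	LINE_COMMENT
)

type FileJob struct {
	Filename string
}

type FileJobCallback interface {
	ProcessLine(job *FileJob, currentLine int64, lineType LineType) bool
}

// lineTally is a hypothetical callback that counts lines per type and
// asks processing to stop once maxLines lines have been seen.
type lineTally struct {
	counts   map[LineType]int64
	seen     int64
	maxLines int64
}

func (t *lineTally) ProcessLine(job *FileJob, currentLine int64, lineType LineType) bool {
	t.counts[lineType]++
	t.seen++
	return t.seen < t.maxLines // false stops further processing
}

func main() {
	tally := &lineTally{counts: map[LineType]int64{}, maxLines: 3}
	var cb FileJobCallback = tally
	job := &FileJob{Filename: "example.go"}
	// Simulate the processor reporting lines one at a time.
	for i, lt := range []LineType{LINE_COMMENT, LINE_CODE, LINE_BLANK, LINE_CODE} {
		if !cb.ProcessLine(job, int64(i), lt) {
			break
		}
	}
	fmt.Println(tally.counts[LINE_CODE], tally.counts[LINE_COMMENT], tally.counts[LINE_BLANK])
}
```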

type FileReader

type FileReader struct {
	Buffer *bytes.Buffer
}

FileReader is a struct responsible for reading files into its buffer

func NewFileReader

func NewFileReader() FileReader

NewFileReader creates a new file reader responsible for reading a file

func (*FileReader) ReadFile

func (reader *FileReader) ReadFile(path string, size int) ([]byte, error)

ReadFile actually reads the file into a buffer whose size is controlled by LargeByteCount

type Language

type Language struct {
	LineComment      []string   `json:"line_comment"`
	ComplexityChecks []string   `json:"complexitychecks"`
	Extensions       []string   `json:"extensions"`
	ExtensionFile    bool       `json:"extensionFile"`
	MultiLine        [][]string `json:"multi_line"`
	Quotes           []Quote    `json:"quotes"`
	NestedMultiLine  bool       `json:"nestedmultiline"`
	Keywords         []string   `json:"keywords"`
	FileNames        []string   `json:"filenames"`
	SheBangs         []string   `json:"shebangs"`
}

Language is a struct which contains the values for each language stored in languages.json

type LanguageFeature

type LanguageFeature struct {
	Complexity            *Trie
	MultiLineComments     *Trie
	SingleLineComments    *Trie
	Strings               *Trie
	Tokens                *Trie
	Nested                bool
	ComplexityCheckMask   byte
	SingleLineCommentMask byte
	MultiLineCommentMask  byte
	StringCheckMask       byte
	ProcessMask           byte
	Keywords              []string
	Quotes                []Quote
}

LanguageFeature is a struct which represents the conversion from Language into what is used for matching

type LanguageSummary

type LanguageSummary struct {
	Name               string
	Bytes              int64
	CodeBytes          int64
	Lines              int64
	Code               int64
	Comment            int64
	Blank              int64
	Complexity         int64
	Count              int64
	WeightedComplexity float64
	Files              []*FileJob
}

LanguageSummary is used to hold summarised results for a single language

type LineType

type LineType int32

LineType is the type of line being processed

const (
	LINE_BLANK LineType = iota
	LINE_CODE
	LINE_COMMENT
)

These are not meant to be CAMEL_CASE, but as they are used by an external project we cannot change them

type OpenClose

type OpenClose struct {
	Open  []byte
	Close []byte
}

OpenClose is used to hold an open/close pair for matching such as multi line comments

type Quote

type Quote struct {
	Start        string `json:"start"`
	End          string `json:"end"`
	IgnoreEscape bool   `json:"ignoreEscape"` // To enable turning off the \ check for C# @"\" string examples https://github.com/boyter/scc/issues/71
	DocString    bool   `json:"docString"`    // To enable docstring check for Python where "If the triple quote string starts following a newline with only white-space characters in front and ends followed by only a newline or white-space characters it is a comment" https://github.com/boyter/scc/issues/62
}

Quote is a struct which holds rules and start/end values for string quotes

type Trie

type Trie struct {
	Type  int
	Close []byte
	Table [256]*Trie
}

Trie is a structure used to store matches efficiently

func (*Trie) Insert

func (root *Trie) Insert(tokenType int, token []byte)

Insert inserts a string into the trie for matching

func (*Trie) InsertClose

func (root *Trie) InsertClose(tokenType int, openToken, closeToken []byte)

InsertClose closes off a string in the trie

func (*Trie) Match

func (root *Trie) Match(token []byte) (int, int, []byte)

Match checks the created trie structure for a match
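
A byte-indexed trie of this shape can be sketched as below. It is a simplified illustration rather than the package's implementation: the lowercase `trie` type omits the Close field, and `match` returns only a token type and matched length, stopping at the first complete token it reaches.

```go
package main

import "fmt"

// Token types, mirroring the constants at the top of this page.
const (
	tString int = iota + 1
	tSlcomment
	tMlcomment
	tComplexity
)

// trie is a simplified stand-in with one child slot per byte value.
type trie struct {
	typ   int
	table [256]*trie
}

// insert adds token to the trie and marks its last node with tokenType.
func (root *trie) insert(tokenType int, token []byte) {
	node := root
	for _, c := range token {
		if node.table[c] == nil {
			node.table[c] = &trie{}
		}
		node = node.table[c]
	}
	node.typ = tokenType
}

// match walks input from its first byte and returns the type and length
// of the first complete token reached, or (0, 0) if there is none.
func (root *trie) match(input []byte) (int, int) {
	node := root
	for i, c := range input {
		node = node.table[c]
		if node == nil {
			return 0, 0
		}
		if node.typ > 0 {
			return node.typ, i + 1
		}
	}
	return 0, 0
}

func main() {
	root := &trie{}
	root.insert(tSlcomment, []byte("//"))
	root.insert(tMlcomment, []byte("/*"))
	fmt.Println(root.match([]byte("// a comment")))
	fmt.Println(root.match([]byte("x := 1")))
}
```

Indexing children by byte value costs memory per node but makes each step a single array lookup, which suits a hot path that inspects every byte of a file.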
