hercules

v3.0.0-...-5a75d8f
Published: Aug 20, 2018 License: Apache-2.0 Imports: 41 Imported by: 0

README

Hercules

Amazingly fast and highly customizable Git repository analysis engine written in Go. Batteries included. Powered by go-git and Babelfish.

There are two tools: hercules and labours.py. The first is the program written in Go which takes a Git repository and runs a Directed Acyclic Graph (DAG) of analysis tasks. The second is the Python script which draws some predefined plots. These two tools are normally used together through a pipe. It is possible to write custom analyses using the plugin system. It is also possible to merge several analysis results together. There is a presentation available.

Hercules DAG of Burndown analysis

The DAG of burndown and couples analyses with UAST diff refining. Generated with hercules --burndown --burndown-people --couples --feature=uast --dry-run --dump-dag doc/dag.dot https://github.com/src-d/hercules

git/git image

torvalds/linux line burndown (granularity 30, sampling 30, resampled by year). Generated with hercules --burndown --pb https://github.com/torvalds/linux | python3 labours.py -f pb -m project

Installation

Grab hercules binary from the Releases page. labours.py requires the Python packages listed in requirements.txt:

pip3 install -r requirements.txt

Numpy and Scipy can be installed on Windows using http://www.lfd.uci.edu/~gohlke/pythonlibs/. Linux releases require libtensorflow.

Build from source

You are going to need Go (>= v1.8), protoc and Python 2 or 3.

go get -d gopkg.in/src-d/hercules.v3/cmd/hercules
cd $GOPATH/src/gopkg.in/src-d/hercules.v3
make

Replace $GOPATH with %GOPATH% on Windows.

Contributions

...are welcome! See CONTRIBUTING and code of conduct.

License

Apache 2.0

Usage

# Use "memory" go-git backend and display the burndown plot. "memory" is the fastest but the repository's git data must fit into RAM.
hercules --burndown https://github.com/src-d/go-git | python3 labours.py -m project --resample month
# Use "file system" go-git backend and print some basic information about the repository.
hercules /path/to/cloned/go-git
# Use "file system" go-git backend, cache the cloned repository to /tmp/repo-cache, use Protocol Buffers and display the burndown plot without resampling.
hercules --burndown --pb https://github.com/git/git /tmp/repo-cache | python3 labours.py -m project -f pb --resample raw

# Now something fun
# Get the linear history from git rev-list, reverse it
# Pipe to hercules, produce burndown snapshots for every 30 days grouped by 30 days
# Save the raw data to cache.yaml, so that it is possible to run python3 labours.py -i cache.yaml later
# Pipe the raw data to labours.py, set text font size to 16pt, use Agg matplotlib backend and save the plot to git.png
git rev-list HEAD | tac | hercules --commits - --burndown https://github.com/git/git | tee cache.yaml | python3 labours.py -m project --font-size 16 --backend Agg --output git.png

labours.py -i /path/to/yaml reads the output from hercules which was saved to disk.

Caching

It is possible to store the cloned repository on disk. The subsequent analysis can run on the corresponding directory instead of cloning from scratch:

# First time - cache
hercules https://github.com/git/git /tmp/repo-cache

# Second time - use the cache
hercules --some-analysis /tmp/repo-cache

Docker image
docker run --rm srcd/hercules hercules --burndown --pb https://github.com/git/git | docker run --rm -i -v $(pwd):/io srcd/hercules labours.py -f pb -m project -o /io/git_git.png

Built-in analyses

Project burndown
hercules --burndown
python3 labours.py -m project

Line burndown statistics for the whole repository. It does exactly what git-of-theseus does, but much faster. Blaming is performed efficiently and incrementally using a custom RB tree tracking algorithm, and only the last modification date is recorded while running the analysis.

All burndown analyses depend on the values of granularity and sampling. Granularity is the number of days each band in the stack consists of. Sampling is the frequency with which the burndown state is snapshotted. The smaller the value, the smoother the plot, but the more work is done.
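To make the roles of the two parameters concrete, here is a minimal Go sketch that estimates how many bands and snapshots a history of a given length produces. The helper names (ceilDiv, bandCount, sampleCount) are hypothetical and the rounding is an assumption; the exact counts hercules emits may differ at the boundaries.

```go
package main

import "fmt"

// ceilDiv returns ceil(a / b) for positive integers.
func ceilDiv(a, b int) int {
	return (a + b - 1) / b
}

// bandCount estimates the number of bands in the stack: one band per
// `granularity` days of project history (assumed rounding: up).
func bandCount(historyDays, granularity int) int {
	return ceilDiv(historyDays, granularity)
}

// sampleCount estimates the number of snapshots: one per `sampling` days
// (assumed rounding: up).
func sampleCount(historyDays, sampling int) int {
	return ceilDiv(historyDays, sampling)
}

func main() {
	days := 365
	fmt.Println(bandCount(days, 30), sampleCount(days, 30)) // 13 13
	fmt.Println(bandCount(days, 30), sampleCount(days, 15)) // 13 25
}
```

Halving the sampling value roughly doubles the number of snapshots, which is where the extra work comes from.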

There is an option to resample the bands inside labours.py, so that you can define a very precise distribution and visualize it in different ways. Besides, resampling aligns the bands across periodic boundaries, e.g. months or years. Unresampled bands are not aligned and start from the project's birth date.

Files
hercules --burndown --burndown-files
python3 labours.py -m file

Burndown statistics for every file in the repository which is alive in the latest revision.

Note: it will generate a separate graph for every file. You probably don't want to run it on a repository with many files.

People
hercules --burndown --burndown-people [-people-dict=/path/to/identities]
python3 labours.py -m person

Burndown statistics for the repository's contributors. If -people-dict is not specified, the identities are discovered by the following algorithm:

  1. We start from the root commit towards the HEAD. Emails and names are converted to lower case.
  2. If we process an unknown email and name, record them as a new developer.
  3. If we process a known email but unknown name, match to the developer with the matching email, and add the unknown name to the list of that developer's names.
  4. If we process an unknown email but known name, match to the developer with the matching name, and add the unknown email to the list of that developer's emails.
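The four steps above can be sketched in Go as follows. This is an illustration under stated assumptions, not the code of IdentityDetector: the identities type, its fields and the match method are hypothetical names, and the handling of a signature whose email and name are both already known (the email wins) is a guess.

```go
package main

import (
	"fmt"
	"strings"
)

// identities is a hypothetical in-memory index of discovered developers.
type identities struct {
	byEmail map[string]int // lower-cased email -> developer index
	byName  map[string]int // lower-cased name -> developer index
	count   int            // number of developers discovered so far
}

func newIdentities() *identities {
	return &identities{byEmail: map[string]int{}, byName: map[string]int{}}
}

// match returns the developer index for a commit signature and registers
// unknown names and emails along the way.
func (ids *identities) match(name, email string) int {
	// Step 1: names and emails are compared in lower case.
	name, email = strings.ToLower(name), strings.ToLower(email)
	if id, ok := ids.byEmail[email]; ok {
		// Step 3: known email; attach the name if it is new.
		if _, known := ids.byName[name]; !known {
			ids.byName[name] = id
		}
		return id
	}
	if id, ok := ids.byName[name]; ok {
		// Step 4: unknown email but known name; attach the email.
		ids.byEmail[email] = id
		return id
	}
	// Step 2: both are unknown, so this is a new developer.
	id := ids.count
	ids.count++
	ids.byEmail[email] = id
	ids.byName[name] = id
	return id
}

func main() {
	ids := newIdentities()
	fmt.Println(ids.match("Alice", "alice@example.com"))     // 0
	fmt.Println(ids.match("alice", "Alice@Work.example"))    // 0
	fmt.Println(ids.match("A. Smith", "alice@work.example")) // 0
	fmt.Println(ids.match("Bob", "bob@example.com"))         // 1
}
```

Commits have to be fed in order from the root towards HEAD, because the result depends on which names and emails were seen earlier.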

If -people-dict is specified, it should point to a text file with the custom identities. The format is: every line is a single developer, it contains all the matching emails and names separated by |. The case is ignored.
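A file in this format could be read along the following lines. The sketch is an assumption-laden illustration: parsePeopleDict is a hypothetical helper, and trimming whitespace and skipping blank lines are guesses that the description above does not confirm.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parsePeopleDict maps every lower-cased name or email to the index of the
// line (that is, the developer) it appears on.
func parsePeopleDict(text string) map[string]int {
	dict := map[string]int{}
	scanner := bufio.NewScanner(strings.NewReader(text))
	index := 0
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue // assumption: blank lines are skipped
		}
		for _, id := range strings.Split(line, "|") {
			// The case is ignored, so store everything in lower case.
			dict[strings.ToLower(strings.TrimSpace(id))] = index
		}
		index++
	}
	return dict
}

func main() {
	// Made-up identities: two developers, one per line.
	text := "Alice|alice@example.com|a.smith@example.com\nBob|bob@example.com\n"
	dict := parsePeopleDict(text)
	fmt.Println(dict["alice@example.com"], dict["bob"]) // 0 1
}
```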

Churn matrix

Wireshark top 20 churn matrix

Wireshark top 20 devs - churn matrix

hercules --burndown --burndown-people [-people-dict=/path/to/identities]
python3 labours.py -m churn_matrix

Besides the burndown information, --burndown-people collects the added and deleted line statistics per developer. It shows how many lines written by developer A are removed by developer B. The format is a matrix with N rows and (N+2) columns, where N is the number of developers.

  1. First column is the number of lines the developer wrote.
  2. Second column is how many lines were written by the developer and deleted by unidentified developers (if -people-dict is not specified, it is always 0).
  3. The rest of the columns show how many lines were written by the developer and deleted by identified developers.

The sequence of developers is stored in the people_sequence YAML node.
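The column layout described above translates into simple index arithmetic. A minimal Go sketch with hypothetical accessor names and made-up numbers:

```go
package main

import "fmt"

// written returns the number of lines the developer wrote (column 0).
func written(matrix [][]int64, dev int) int64 {
	return matrix[dev][0]
}

// removedByUnidentified returns how many of the developer's lines were
// deleted by unidentified developers (column 1).
func removedByUnidentified(matrix [][]int64, dev int) int64 {
	return matrix[dev][1]
}

// removedBy returns how many lines written by `author` were deleted by the
// identified developer `remover` (column remover + 2).
func removedBy(matrix [][]int64, author, remover int) int64 {
	return matrix[author][remover+2]
}

func main() {
	// Made-up numbers for N = 2 developers: 2 rows, N + 2 = 4 columns.
	matrix := [][]int64{
		{100, 0, 10, 5},
		{50, 3, 7, 2},
	}
	fmt.Println(written(matrix, 0))               // 100
	fmt.Println(removedByUnidentified(matrix, 1)) // 3
	fmt.Println(removedBy(matrix, 0, 1))          // 5
	fmt.Println(removedBy(matrix, 1, 0))          // 7
}
```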

Code ownership

Ember.js top 20 code ownership

Ember.js top 20 devs - code ownership

hercules --burndown --burndown-people [-people-dict=/path/to/identities]
python3 labours.py -m ownership

--burndown-people also allows drawing the code share over time as a stacked area plot, that is, how many lines are alive at the sampled moments in time for each identified developer.

Couples

Linux kernel file couples

torvalds/linux files' coupling in Tensorflow Projector

hercules --couples [-people-dict=/path/to/identities]
python3 labours.py -m couples -o <name> [--couples-tmp-dir=/tmp]

Important: it requires Tensorflow to be installed, please follow official instructions.

The files are coupled if they are changed in the same commit. The developers are coupled if they change the same file. hercules records the number of couples throughout the whole commit history and outputs the two corresponding co-occurrence matrices. labours.py then trains Swivel embeddings - dense vectors which reflect the co-occurrence probability through the Euclidean distance. The training requires a working Tensorflow installation. The intermediate files are stored in the system temporary directory or --couples-tmp-dir if it is specified. The trained embeddings are written to the current working directory with the name depending on -o. The output format is TSV and matches Tensorflow Projector so that the files and people can be visualized with t-SNE implemented in TF Projector.
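The file co-occurrence counting can be illustrated with a short Go sketch. It is not the CouplesAnalysis implementation: fileCouples is a hypothetical function, the input is a made-up list of commits, and each commit is assumed to list every touched file once.

```go
package main

import "fmt"

// fileCouples counts, for every pair of files, the number of commits which
// changed both. Each commit is given as the list of files it touched.
func fileCouples(commits [][]string) map[string]map[string]int {
	couples := map[string]map[string]int{}
	for _, files := range commits {
		for _, a := range files {
			if couples[a] == nil {
				couples[a] = map[string]int{}
			}
			for _, b := range files {
				// The diagonal (a == b) counts the file's own changes.
				couples[a][b]++
			}
		}
	}
	return couples
}

func main() {
	commits := [][]string{
		{"a.go", "b.go"},
		{"a.go", "c.go"},
		{"a.go", "b.go"},
	}
	couples := fileCouples(commits)
	fmt.Println(couples["a.go"]["b.go"]) // 2
	fmt.Println(couples["a.go"]["a.go"]) // 3
	fmt.Println(couples["b.go"]["c.go"]) // 0
}
```

The matrix is symmetric and sparse, which is why a map of maps is used here instead of a dense array.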

Structural hotness
      46  jinja2/compiler.py:visit_Template [FunctionDef]
      42  jinja2/compiler.py:visit_For [FunctionDef]
      34  jinja2/compiler.py:visit_Output [FunctionDef]
      29  jinja2/environment.py:compile [FunctionDef]
      27  jinja2/compiler.py:visit_Include [FunctionDef]
      22  jinja2/compiler.py:visit_Macro [FunctionDef]
      22  jinja2/compiler.py:visit_FromImport [FunctionDef]
      21  jinja2/compiler.py:visit_Filter [FunctionDef]
      21  jinja2/runtime.py:__call__ [FunctionDef]
      20  jinja2/compiler.py:visit_Block [FunctionDef]

Thanks to Babelfish, hercules is able to measure how many times each structural unit has been modified. By default, it looks at functions; refer to the UAST XPath manual to set another query.

hercules --shotness [--shotness-xpath-*]
python3 labours.py -m shotness

Couples analysis automatically loads "shotness" data if available.

Jinja2 functions grouped by structural hotness

hercules --shotness --pb https://github.com/pallets/jinja | python3 labours.py -m couples -f pb

Sentiment (positive and negative code)

Django sentiment

hercules --sentiment --pb https://github.com/django/django | python3 labours.py -m sentiment -f pb

We extract new or changed comments from the source code on every commit, apply the BiDiSentiment general-purpose sentiment recurrent neural network and plot the results. Requires libtensorflow. For example, "sadly, we need to hide the rect from the documentation finder for now" is negative and "Theano has a built-in optimization for logsumexp (...) so we can just write the expression directly" is positive. Don't expect too much though: as noted, the sentiment model is general purpose and code comments are of a different nature, so there is no magic (for now).

Everything in a single pass
hercules --burndown --burndown-files --burndown-people --couples --shotness [-people-dict=/path/to/identities]
python3 labours.py -m all

Plugins

Hercules has a plugin system which allows running custom analyses. See PLUGINS.md.

Merging

hercules combine is the command which joins several analysis results in Protocol Buffers format together.

hercules --burndown --pb https://github.com/src-d/go-git > go-git.pb
hercules --burndown --pb https://github.com/src-d/hercules > hercules.pb
hercules combine go-git.pb hercules.pb | python3 labours.py -f pb -m project --resample M

Bad unicode errors

YAML does not support the whole range of Unicode characters and the parser on labours.py side may raise exceptions. Filter the output from hercules through fix_yaml_unicode.py to discard such offending characters.

hercules --burndown --burndown-people https://github.com/... | python3 fix_yaml_unicode.py | python3 labours.py -m person
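For readers who post-process the output in Go rather than Python, the same idea can be sketched as a filter which drops every character outside the printable set of the YAML 1.2 specification. Whether fix_yaml_unicode.py uses exactly this set is an assumption; yamlPrintable and dropUnprintable are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// yamlPrintable reports whether r belongs to the printable character set
// of the YAML 1.2 specification.
func yamlPrintable(r rune) bool {
	switch {
	case r == 0x9, r == 0xA, r == 0xD, r == 0x85:
		return true
	case r >= 0x20 && r <= 0x7E:
		return true
	case r >= 0xA0 && r <= 0xD7FF:
		return true
	case r >= 0xE000 && r <= 0xFFFD:
		return true
	case r >= 0x10000 && r <= 0x10FFFF:
		return true
	}
	return false
}

// dropUnprintable removes the characters which a YAML parser may reject.
func dropUnprintable(s string) string {
	return strings.Map(func(r rune) rune {
		if yamlPrintable(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

func main() {
	fmt.Printf("%q\n", dropUnprintable("name: \x00Al\x07ice\t<a@example.com>"))
	// prints "name: Alice\t<a@example.com>"
}
```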

Plotting

These options affect all plots:

python3 labours.py [--style=white|black] [--backend=] [--size=Y,X]

--style changes the background to be either white ("black" foreground) or black ("white" foreground). --backend chooses the Matplotlib backend. --size sets the size of the figure in inches. The default is 12,9.

On macOS, where this is required, you can pin the default Matplotlib backend with

echo "backend: TkAgg" > ~/.matplotlib/matplotlibrc

These options are effective in burndown charts only:

python3 labours.py [--text-size] [--relative]

--text-size changes the font size, --relative activates the stretched burndown layout.

Custom plotting backend

It is possible to output all the information needed to draw the plots in JSON format. Simply append .json to the output (-o) and you are done. The data format is not fully specified and depends on the Python code which generates it. Each JSON file should contain "type" which reflects the plot kind.
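A custom backend would typically start by dispatching on that "type" field. The Go sketch below reads it with encoding/json; plotKind is a hypothetical helper and the sample payload, including the type name, is made up, since the rest of the data format is not specified.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plotKind extracts the "type" field from a JSON document and ignores
// every other field.
func plotKind(data []byte) (string, error) {
	var header struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(data, &header); err != nil {
		return "", err
	}
	if header.Type == "" {
		return "", fmt.Errorf("no \"type\" field found")
	}
	return header.Type, nil
}

func main() {
	// Made-up payload: only the presence of "type" comes from the text above.
	kind, err := plotKind([]byte(`{"type": "example_plot", "data": [1, 2, 3]}`))
	fmt.Println(kind, err) // example_plot <nil>
}
```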

Caveats

  1. Currently, go-git's file system storage backend is considerably slower than the in-memory one, so you should clone repos instead of reading them from disk whenever possible. Please note that the in-memory storage may require a lot of RAM; for example, the Linux kernel took over 200GB in 2017.
  2. Parsing YAML in Python is slow when the number of internal objects is big. hercules' output for the Linux kernel in "couples" mode is 1.5 GB and takes more than an hour / 180GB RAM to be parsed. However, most of the repositories are parsed within a minute. Try using Protocol Buffers instead (hercules --pb and labours.py -f pb).
  3. To speed up YAML parsing, install libyaml:
    # Debian, Ubuntu
    apt install libyaml-dev
    # macOS
    brew install yaml-cpp libyaml
    
    # you might need to re-install pyyaml for the changes to take effect
    pip uninstall pyyaml
    pip --no-cache-dir install pyyaml
    

Documentation

Overview

Package hercules contains the functions which are needed to gather various statistics from a Git repository.

The analysis is expressed in the form of a tree: there are nodes - "pipeline items" - which require some other nodes to be executed before themselves and in turn provide the data for dependent nodes. There are several service items which do not produce any useful statistics but rather provide the requirements for other items. The top-level items include:

- BurndownAnalysis - line burndown statistics for project, files and developers.

- CouplesAnalysis - coupling statistics for files and developers.

- ShotnessAnalysis - structural hotness and couples, by any Babelfish UAST XPath (functions by default).
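The "requires" and "provides" relation between items can be resolved into an execution order with a depth-first topological sort. The sketch below is only an illustration of that idea, not the Pipeline implementation: the item type and the resolve function are hypothetical, and the dependencies listed in main are simplified guesses which merely reuse the dependency names defined in this package ("changes", "file_diff", "day").

```go
package main

import "fmt"

// item is a hypothetical stand-in for a pipeline item.
type item struct {
	name     string
	requires []string
	provides []string
}

// resolve returns the item names in an order where every item runs after
// the providers of the entities it requires.
func resolve(items []item) ([]string, error) {
	provider := map[string]int{}
	for i, it := range items {
		for _, p := range it.provides {
			provider[p] = i
		}
	}
	const (
		unvisited = iota
		visiting
		done
	)
	state := make([]int, len(items)) // all zero, that is, unvisited
	var order []string
	var visit func(i int) error
	visit = func(i int) error {
		switch state[i] {
		case done:
			return nil
		case visiting:
			return fmt.Errorf("dependency cycle at %s", items[i].name)
		}
		state[i] = visiting
		for _, r := range items[i].requires {
			j, ok := provider[r]
			if !ok {
				return fmt.Errorf("%s requires %q which nobody provides", items[i].name, r)
			}
			if err := visit(j); err != nil {
				return err
			}
		}
		state[i] = done
		order = append(order, items[i].name)
		return nil
	}
	for i := range items {
		if err := visit(i); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	order, err := resolve([]item{
		{name: "Burndown", requires: []string{"file_diff", "day"}},
		{name: "FileDiff", requires: []string{"changes"}, provides: []string{"file_diff"}},
		{name: "TreeDiff", provides: []string{"changes"}},
		{name: "DaysSinceStart", provides: []string{"day"}},
	})
	fmt.Println(order, err) // [TreeDiff FileDiff DaysSinceStart Burndown] <nil>
}
```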

The typical API usage is to initialize a Pipeline:

import "gopkg.in/src-d/go-git.v4"

var repository *git.Repository
// ...initialize repository...
pipeline := hercules.NewPipeline(repository)

Then add the required analysis:

ba := pipeline.DeployItem(&hercules.BurndownAnalysis{}).(hercules.LeafPipelineItem)

This call will add all the needed intermediate pipeline items. Then link and execute the analysis tree:

pipeline.Initialize(nil)
result, err := pipeline.Run(pipeline.Commits())

Finally extract the result:

// A new variable name is needed: "result" is already declared by Run() above.
burndown := result[ba].(hercules.BurndownResult)

The actual usage example is cmd/hercules/root.go - the command line tool's code.

Hercules depends heavily on https://github.com/src-d/go-git and leverages the diff algorithm through https://github.com/sergi/go-diff.

Besides, BurndownAnalysis involves File and RBTree. These are low level data structures which enable incremental blaming. File carries an instance of RBTree and the current line burndown state. RBTree implements the red-black balanced binary tree and is based on https://github.com/yasushi-saito/rbtree.

Coupling stats are supposed to be further processed rather than observed directly. labours.py uses Swivel embeddings and visualises them in Tensorflow Projector.

Shotness analysis as well as other UAST-featured items relies on [Babelfish](https://doc.bblf.sh) and requires the server to be running.

Index

Constants

const (
	// ConfigBlobCacheIgnoreMissingSubmodules is the name of the configuration option for
	// BlobCache.Configure() to not check if the referenced submodules exist.
	ConfigBlobCacheIgnoreMissingSubmodules = "BlobCache.IgnoreMissingSubmodules"
	// DependencyBlobCache identifies the dependency provided by BlobCache.
	DependencyBlobCache = "blob_cache"
)
const (
	// ConfigBurndownGranularity is the name of the option to set BurndownAnalysis.Granularity.
	ConfigBurndownGranularity = "Burndown.Granularity"
	// ConfigBurndownSampling is the name of the option to set BurndownAnalysis.Sampling.
	ConfigBurndownSampling = "Burndown.Sampling"
	// ConfigBurndownTrackFiles enables burndown collection for files.
	ConfigBurndownTrackFiles = "Burndown.TrackFiles"
	// ConfigBurndownTrackPeople enables burndown collection for authors.
	ConfigBurndownTrackPeople = "Burndown.TrackPeople"
	// ConfigBurndownDebug enables some extra debug assertions.
	ConfigBurndownDebug = "Burndown.Debug"
	// DefaultBurndownGranularity is the default number of days for BurndownAnalysis.Granularity
	// and BurndownAnalysis.Sampling.
	DefaultBurndownGranularity = 30
)
const (
	// DependencyDay is the name of the dependency which DaysSinceStart provides - the number
	// of days since the first commit in the analysed sequence.
	DependencyDay = "day"

	// FactCommitsByDay contains the mapping between day indices and the corresponding commits.
	FactCommitsByDay = "DaysSinceStart.Commits"
)
const (
	// ConfigFileDiffDisableCleanup is the name of the configuration option (FileDiff.Configure())
	// to suppress diffmatchpatch.DiffCleanupSemanticLossless() which is supposed to improve
	// the human interpretability of diffs.
	ConfigFileDiffDisableCleanup = "FileDiff.NoCleanup"

	// DependencyFileDiff is the name of the dependency provided by FileDiff.
	DependencyFileDiff = "file_diff"
)
const (
	// AuthorMissing is the internal author index which denotes any unmatched identities
	// (IdentityDetector.Consume()).
	AuthorMissing = (1 << 18) - 1
	// AuthorMissingName is the string name which corresponds to AuthorMissing.
	AuthorMissingName = "<unmatched>"

	// FactIdentityDetectorPeopleDict is the name of the fact which is inserted in
	// IdentityDetector.Configure(). It corresponds to IdentityDetector.PeopleDict - the mapping
	// from the signatures to the author indices.
	FactIdentityDetectorPeopleDict = "IdentityDetector.PeopleDict"
	// FactIdentityDetectorReversedPeopleDict is the name of the fact which is inserted in
	// IdentityDetector.Configure(). It corresponds to IdentityDetector.ReversedPeopleDict -
	// the mapping from the author indices to the main signature.
	FactIdentityDetectorReversedPeopleDict = "IdentityDetector.ReversedPeopleDict"
	// ConfigIdentityDetectorPeopleDictPath is the name of the configuration option
	// (IdentityDetector.Configure()) which allows to set the external PeopleDict mapping from a file.
	ConfigIdentityDetectorPeopleDictPath = "IdentityDetector.PeopleDictPath"
	// FactIdentityDetectorPeopleCount is the name of the fact which is inserted in
	// IdentityDetector.Configure(). It is equal to the overall number of unique authors
	// (the length of ReversedPeopleDict).
	FactIdentityDetectorPeopleCount = "IdentityDetector.PeopleCount"

	// DependencyAuthor is the name of the dependency provided by IdentityDetector.
	DependencyAuthor = "author"
)
const (
	// ConfigPipelineDumpPath is the name of the Pipeline configuration option (Pipeline.Initialize())
	// which enables saving the items DAG to the specified file.
	ConfigPipelineDumpPath = "Pipeline.DumpPath"
	// ConfigPipelineDryRun is the name of the Pipeline configuration option (Pipeline.Initialize())
	// which disables Configure() and Initialize() invocation on each PipelineItem during the
	// Pipeline initialization.
	// Subsequent Run() calls are going to fail. Useful with ConfigPipelineDumpPath=true.
	ConfigPipelineDryRun = "Pipeline.DryRun"
	// ConfigPipelineCommits is the name of the Pipeline configuration option (Pipeline.Initialize())
	// which allows to specify the custom commit sequence. By default, Pipeline.Commits() is used.
	ConfigPipelineCommits = "commits"
)
const (
	// RenameAnalysisDefaultThreshold specifies the default percentage of common lines in a pair
	// of files to consider them linked. The exact code of the decision is sizesAreClose().
	RenameAnalysisDefaultThreshold = 90

	// ConfigRenameAnalysisSimilarityThreshold is the name of the configuration option
	// (RenameAnalysis.Configure()) which sets the similarity threshold.
	ConfigRenameAnalysisSimilarityThreshold = "RenameAnalysis.SimilarityThreshold"
)
const (
	// ConfigShotnessXpathStruct is the name of the configuration option (ShotnessAnalysis.Configure())
	// which sets the UAST XPath to choose the analysed nodes.
	ConfigShotnessXpathStruct = "Shotness.XpathStruct"
	// ConfigShotnessXpathName is the name of the configuration option (ShotnessAnalysis.Configure())
	// which sets the UAST XPath to find the name of the nodes chosen by ConfigShotnessXpathStruct.
	// These XPath-s can be different for some languages.
	ConfigShotnessXpathName = "Shotness.XpathName"

	// DefaultShotnessXpathStruct is the default UAST XPath to choose the analysed nodes.
	// It extracts functions.
	DefaultShotnessXpathStruct = "//*[@roleFunction and @roleDeclaration]"
	// DefaultShotnessXpathName is the default UAST XPath to choose the names of the analysed nodes.
	// It looks at the current tree level and at the immediate children.
	DefaultShotnessXpathName = "/*[@roleFunction and @roleIdentifier and @roleName] | /*/*[@roleFunction and @roleIdentifier and @roleName]"
)
const (
	// DependencyTreeChanges is the name of the dependency provided by TreeDiff.
	DependencyTreeChanges = "changes"
	// ConfigTreeDiffEnableBlacklist is the name of the configuration option
	// (TreeDiff.Configure()) which allows to skip blacklisted directories.
	ConfigTreeDiffEnableBlacklist = "TreeDiff.EnableBlacklist"
	// ConfigTreeDiffBlacklistedDirs is the name of the configuration option
	// (TreeDiff.Configure()) which allows to set blacklisted directories.
	ConfigTreeDiffBlacklistedDirs = "TreeDiff.BlacklistedDirs"
)
const (

	// ConfigUASTEndpoint is the name of the configuration option (UASTExtractor.Configure())
	// which sets the Babelfish server address.
	ConfigUASTEndpoint = "ConfigUASTEndpoint"
	// ConfigUASTTimeout is the name of the configuration option (UASTExtractor.Configure())
	// which sets the maximum amount of time to wait for a Babelfish server response.
	ConfigUASTTimeout = "ConfigUASTTimeout"
	// ConfigUASTPoolSize is the name of the configuration option (UASTExtractor.Configure())
	// which sets the number of goroutines to run for UAST parse queries.
	ConfigUASTPoolSize = "ConfigUASTPoolSize"
	// ConfigUASTFailOnErrors is the name of the configuration option (UASTExtractor.Configure())
	// which enables early exit in case of any Babelfish UAST parsing errors.
	ConfigUASTFailOnErrors = "ConfigUASTFailOnErrors"
	// ConfigUASTLanguages is the name of the configuration option (UASTExtractor.Configure())
	// which sets the list of languages to parse. Language names are at
	// https://doc.bblf.sh/languages.html Names are joined with a comma ",".
	ConfigUASTLanguages = "ConfigUASTLanguages"

	// FeatureUast is the name of the Pipeline feature which activates all the items related to UAST.
	FeatureUast = "uast"
	// DependencyUasts is the name of the dependency provided by UASTExtractor.
	DependencyUasts = "uasts"
)
const (
	// ConfigUASTChangesSaverOutputPath is the name of the configuration option
	// (UASTChangesSaver.Configure()) which sets the target directory where to save the files.
	ConfigUASTChangesSaverOutputPath = "UASTChangesSaver.OutputPath"
)
const (
	// DependencyUastChanges is the name of the dependency provided by UASTChanges.
	DependencyUastChanges = "changed_uasts"
)
const TreeEnd int = -1

TreeEnd denotes the value of the last leaf in the tree.

Variables

var BinaryGitHash = "<unknown>"

BinaryGitHash is the Git hash of the Hercules binary file which is executing.

var BinaryVersion = 0

BinaryVersion is Hercules' API version. It matches the package name.

var Registry = &PipelineItemRegistry{
	provided:     map[string][]reflect.Type{},
	registered:   map[string]reflect.Type{},
	flags:        map[string]reflect.Type{},
	featureFlags: arrayFeatureFlags{Flags: []string{}, Choices: map[string]bool{}},
}

Registry contains all known pipeline item types.

Functions

func BlobToString

func BlobToString(file *object.Blob) (string, error)

BlobToString reads *object.Blob and returns its contents as a string.

func CountLines

func CountLines(file *object.Blob) (int, error)

CountLines returns the number of lines in a *object.Blob.

func LoadCommitsFromFile

func LoadCommitsFromFile(path string, repository *git.Repository) ([]*object.Commit, error)

LoadCommitsFromFile reads the file by the specified FS path and generates the sequence of commits by interpreting each line as a Git commit hash.

func ParseMailmap

func ParseMailmap(contents string) map[string]object.Signature

ParseMailmap parses the contents of .mailmap and returns the mapping between signature parts. It does *not* follow the full signature matching convention, that is, developers are identified by email and by name independently.

func VisitEachNode

func VisitEachNode(root *uast.Node, payload func(*uast.Node))

VisitEachNode is a handy routine to execute a callback on every node in the subtree, including the root itself. Depth first tree traversal.
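A depth-first traversal of this kind can be written with an explicit stack. The sketch uses a hypothetical node type instead of *uast.Node, and the left-to-right visiting order of the children is an assumption rather than documented behaviour.

```go
package main

import "fmt"

// node is a hypothetical tree node; the real function works on *uast.Node.
type node struct {
	label    string
	children []*node
}

// visitEachNode calls payload on root and on every descendant, depth first.
func visitEachNode(root *node, payload func(*node)) {
	stack := []*node{root}
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		payload(n)
		// Push the children in reverse so that the leftmost one is popped first.
		for i := len(n.children) - 1; i >= 0; i-- {
			stack = append(stack, n.children[i])
		}
	}
}

func main() {
	tree := &node{"root", []*node{
		{"a", []*node{{"a1", nil}, {"a2", nil}}},
		{"b", nil},
	}}
	var order []string
	visitEachNode(tree, func(n *node) { order = append(order, n.label) })
	fmt.Println(order) // [root a a1 a2 b]
}
```

An explicit stack avoids deep recursion on very large trees.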

Types

type BlobCache

type BlobCache struct {
	// Specifies how to handle the situation when we encounter a git submodule - an object without
	// the blob. If false, we look inside .gitmodules and raise an error if it is not found there.
	// If true, we do not look inside .gitmodules and always succeed.
	IgnoreMissingSubmodules bool
	// contains filtered or unexported fields
}

BlobCache loads the blobs which correspond to the changed files in a commit. It is a PipelineItem. It must provide the old and the new objects; "blobCache" rotates and avoids loading the same blobs twice. Outdated objects are removed so "blobCache" never grows big.

func (*BlobCache) Configure

func (blobCache *BlobCache) Configure(facts map[string]interface{})

Configure sets the properties previously published by ListConfigurationOptions().

func (*BlobCache) Consume

func (blobCache *BlobCache) Consume(deps map[string]interface{}) (map[string]interface{}, error)

Consume runs this PipelineItem on the next commit data. `deps` contain all the results from upstream PipelineItem-s as requested by Requires(). Additionally, "commit" is always present there and represents the analysed *object.Commit. This function returns the mapping with analysis results. The keys must be the same as in Provides(). If there was an error, nil is returned.

func (*BlobCache) Initialize

func (blobCache *BlobCache) Initialize(repository *git.Repository)

Initialize resets the temporary caches and prepares this PipelineItem for a series of Consume() calls. The repository which is going to be analysed is supplied as an argument.

func (*BlobCache) ListConfigurationOptions

func (blobCache *BlobCache) ListConfigurationOptions() []ConfigurationOption

ListConfigurationOptions returns the list of changeable public properties of this PipelineItem.

func (*BlobCache) Name

func (blobCache *BlobCache) Name() string

Name of this PipelineItem. Uniquely identifies the type, used for mapping keys, etc.

func (*BlobCache) Provides

func (blobCache *BlobCache) Provides() []string

Provides returns the list of names of entities which are produced by this PipelineItem. Each produced entity will be inserted into `deps` of dependent Consume()-s according to this list. Also used by hercules.Registry to build the global map of providers.

func (*BlobCache) Requires

func (blobCache *BlobCache) Requires() []string

Requires returns the list of names of entities which are needed by this PipelineItem. Each requested entity will be inserted into `deps` of Consume(). In turn, those entities are Provides() upstream.

type BurndownAnalysis

type BurndownAnalysis struct {
	// Granularity sets the size of each band - the number of days it spans.
	// Smaller values provide better resolution but require more work and eat more
	// memory. 30 days is usually enough.
	Granularity int
	// Sampling sets how detailed is the statistic - the size of the interval in
	// days between consecutive measurements. It may not be greater than Granularity. Try 15 or 30.
	Sampling int

	// TrackFiles enables or disables the fine-grained per-file burndown analysis.
	// It does not change the project level burndown results.
	TrackFiles bool

	// The number of developers for which to collect the burndown stats. 0 disables it.
	PeopleNumber int

	// Debug activates the debugging mode. Analyse() runs slower in this mode
	// but it accurately checks all the intermediate states for invariant
	// violations.
	Debug bool
	// contains filtered or unexported fields
}

BurndownAnalysis gathers the line burndown statistics for a Git repository. It is a LeafPipelineItem. Reference: https://erikbern.com/2016/12/05/the-half-life-of-code.html

func (*BurndownAnalysis) Configure

func (analyser *BurndownAnalysis) Configure(facts map[string]interface{})

Configure sets the properties previously published by ListConfigurationOptions().

func (*BurndownAnalysis) Consume

func (analyser *BurndownAnalysis) Consume(deps map[string]interface{}) (map[string]interface{}, error)

Consume runs this PipelineItem on the next commit data. `deps` contain all the results from upstream PipelineItem-s as requested by Requires(). Additionally, "commit" is always present there and represents the analysed *object.Commit. This function returns the mapping with analysis results. The keys must be the same as in Provides(). If there was an error, nil is returned.

func (*BurndownAnalysis) Deserialize

func (analyser *BurndownAnalysis) Deserialize(pbmessage []byte) (interface{}, error)

Deserialize converts the specified protobuf bytes to BurndownResult.

func (*BurndownAnalysis) Finalize

func (analyser *BurndownAnalysis) Finalize() interface{}

Finalize returns the result of the analysis. Further Consume() calls are not expected.

func (*BurndownAnalysis) Flag

func (analyser *BurndownAnalysis) Flag() string

Flag for the command line switch which enables this analysis.

func (*BurndownAnalysis) Initialize

func (analyser *BurndownAnalysis) Initialize(repository *git.Repository)

Initialize resets the temporary caches and prepares this PipelineItem for a series of Consume() calls. The repository which is going to be analysed is supplied as an argument.

func (*BurndownAnalysis) ListConfigurationOptions

func (analyser *BurndownAnalysis) ListConfigurationOptions() []ConfigurationOption

ListConfigurationOptions returns the list of changeable public properties of this PipelineItem.

func (*BurndownAnalysis) MergeResults

func (analyser *BurndownAnalysis) MergeResults(
	r1, r2 interface{}, c1, c2 *CommonAnalysisResult) interface{}

MergeResults combines two BurndownResult-s together.

func (*BurndownAnalysis) Name

func (analyser *BurndownAnalysis) Name() string

Name of this PipelineItem. Uniquely identifies the type, used for mapping keys, etc.

func (*BurndownAnalysis) Provides

func (analyser *BurndownAnalysis) Provides() []string

Provides returns the list of names of entities which are produced by this PipelineItem. Each produced entity will be inserted into `deps` of dependent Consume()-s according to this list. Also used by hercules.Registry to build the global map of providers.

func (*BurndownAnalysis) Requires

func (analyser *BurndownAnalysis) Requires() []string

Requires returns the list of names of entities which are needed by this PipelineItem. Each requested entity will be inserted into `deps` of Consume(). In turn, those entities are Provides() upstream.

func (*BurndownAnalysis) Serialize

func (analyser *BurndownAnalysis) Serialize(result interface{}, binary bool, writer io.Writer) error

Serialize converts the analysis result as returned by Finalize() to text or bytes. The text format is YAML and the bytes format is Protocol Buffers.

type BurndownResult

type BurndownResult struct {
	// [number of samples][number of bands]
	// The number of samples depends on Sampling: the less Sampling, the bigger the number.
	// The number of bands depends on Granularity: the less Granularity, the bigger the number.
	GlobalHistory [][]int64
	// The key is the path inside the Git repository. The value's dimensions are the same as
	// in GlobalHistory.
	FileHistories map[string][][]int64
	// [number of people][number of samples][number of bands]
	PeopleHistories [][][]int64
	// [number of people][number of people + 2]
	// The first element is the total number of lines added by the author.
	// The second element is the number of removals by unidentified authors (outside reversedPeopleDict).
	// The rest of the elements are equal to the number of line removals by the corresponding
	// authors in reversedPeopleDict: 2 -> 0, 3 -> 1, etc.
	PeopleMatrix [][]int64
	// contains filtered or unexported fields
}

BurndownResult carries the result of running BurndownAnalysis - it is returned by BurndownAnalysis.Finalize().
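
The PeopleMatrix column convention from the struct comment can be read with two small helpers. This is a minimal sketch that assumes only that convention; `linesAdded` and `removalsBy` are hypothetical names which do not exist in the package, and the numbers are invented.

```go
package main

import "fmt"

// linesAdded returns column 0 of the author's row: the total number of lines
// added by that author.
func linesAdded(matrix [][]int64, author int) int64 {
	return matrix[author][0]
}

// removalsBy returns how many of author's lines were removed by remover.
// Pass remover = -1 for unidentified developers (column 1). Identified
// developers are shifted by 2, so developer 0 lives in column 2.
func removalsBy(matrix [][]int64, author, remover int) int64 {
	return matrix[author][remover+2]
}

func main() {
	// Two developers, so every row has 2 + 2 columns. The numbers are made up.
	matrix := [][]int64{
		{100, 5, 10, 20},
		{50, 0, 7, 3},
	}
	fmt.Println(linesAdded(matrix, 0))     // 100
	fmt.Println(removalsBy(matrix, 0, 1))  // 20
	fmt.Println(removalsBy(matrix, 1, -1)) // 0
}
```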

type ChangesXPather

type ChangesXPather struct {
	XPath string
}

ChangesXPather extracts changed UAST nodes from files changed in the current commit.

func (ChangesXPather) Extract

func (xpather ChangesXPather) Extract(changes []UASTChange) []*uast.Node

Extract returns the list of new or changed UAST nodes filtered by XPath.

type CommonAnalysisResult

type CommonAnalysisResult struct {
	// Time of the first commit in the analysed sequence.
	BeginTime int64
	// Time of the last commit in the analysed sequence.
	EndTime int64
	// The number of commits in the analysed sequence.
	CommitsNumber int
	// The duration of Pipeline.Run().
	RunTime time.Duration
}

CommonAnalysisResult holds the information which is always extracted at Pipeline.Run().

func MetadataToCommonAnalysisResult

func MetadataToCommonAnalysisResult(meta *pb.Metadata) *CommonAnalysisResult

MetadataToCommonAnalysisResult copies the data from a Protobuf message.

func (*CommonAnalysisResult) BeginTimeAsTime

func (car *CommonAnalysisResult) BeginTimeAsTime() time.Time

BeginTimeAsTime converts the UNIX timestamp of the beginning to Go time.

func (*CommonAnalysisResult) EndTimeAsTime

func (car *CommonAnalysisResult) EndTimeAsTime() time.Time

EndTimeAsTime converts the UNIX timestamp of the ending to Go time.

func (*CommonAnalysisResult) FillMetadata

func (car *CommonAnalysisResult) FillMetadata(meta *pb.Metadata) *pb.Metadata

FillMetadata copies the data to a Protobuf message.

func (*CommonAnalysisResult) Merge

func (car *CommonAnalysisResult) Merge(other *CommonAnalysisResult)

Merge combines the CommonAnalysisResult with another one. We choose the earlier BeginTime, the later EndTime, sum the number of commits and the elapsed run times.
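
The merge rule can be sketched as follows. `commonResult` is a local stand-in with the same fields as CommonAnalysisResult, not the package type itself, so the snippet runs without importing hercules.

```go
package main

import (
	"fmt"
	"time"
)

// commonResult is a local stand-in with the same fields as CommonAnalysisResult.
type commonResult struct {
	BeginTime     int64
	EndTime       int64
	CommitsNumber int
	RunTime       time.Duration
}

// merge folds other into r: earlier begin, later end, summed commits and run time.
func (r *commonResult) merge(other *commonResult) {
	if other.BeginTime < r.BeginTime {
		r.BeginTime = other.BeginTime
	}
	if other.EndTime > r.EndTime {
		r.EndTime = other.EndTime
	}
	r.CommitsNumber += other.CommitsNumber
	r.RunTime += other.RunTime
}

func main() {
	a := &commonResult{BeginTime: 1000, EndTime: 5000, CommitsNumber: 10, RunTime: 2 * time.Second}
	b := &commonResult{BeginTime: 500, EndTime: 4000, CommitsNumber: 3, RunTime: time.Second}
	a.merge(b)
	fmt.Println(a.BeginTime, a.EndTime, a.CommitsNumber, a.RunTime) // 500 5000 13 3s
}
```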

type ConfigurationOption

type ConfigurationOption struct {
	// Name identifies the configuration option in facts.
	Name string
	// Description represents the help text about the configuration option.
	Description string
	// Flag corresponds to the CLI token with "--" prepended.
	Flag string
	// Type specifies the kind of the configuration option's value.
	Type ConfigurationOptionType
	// Default is the initial value of the configuration option.
	Default interface{}
}

ConfigurationOption allows for the unified, retrospective way to set up PipelineItem-s.

func (ConfigurationOption) FormatDefault

func (opt ConfigurationOption) FormatDefault() string

FormatDefault converts the default value of ConfigurationOption to string. Used in the command line interface to show the argument's default value.

type ConfigurationOptionType

type ConfigurationOptionType int

ConfigurationOptionType represents the possible types of a ConfigurationOption's value.

const (
	// BoolConfigurationOption reflects the boolean value type.
	BoolConfigurationOption ConfigurationOptionType = iota
	// IntConfigurationOption reflects the integer value type.
	IntConfigurationOption
	// StringConfigurationOption reflects the string value type.
	StringConfigurationOption
	// FloatConfigurationOption reflects a floating point value type.
	FloatConfigurationOption
	// StringsConfigurationOption reflects the array of strings value type.
	StringsConfigurationOption
)

func (ConfigurationOptionType) String

func (opt ConfigurationOptionType) String() string

String() returns an empty string for the boolean type, "int" for integers and "string" for strings. It is used in the command line interface to show the argument's type.
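
A sketch of how such a type name could end up in a help line. `optionType` is a local copy of the first three constants only, the `usage` helper and the flag names are illustrative, and the fallback for other values is an assumption of this sketch rather than the package's behaviour.

```go
package main

import "fmt"

// optionType mirrors the first three ConfigurationOptionType constants.
type optionType int

const (
	boolOption optionType = iota
	intOption
	stringOption
)

// String follows the mapping described above. The fallback for other values
// is an assumption made for this sketch.
func (t optionType) String() string {
	switch t {
	case boolOption:
		return ""
	case intOption:
		return "int"
	case stringOption:
		return "string"
	}
	return "unknown"
}

// usage renders a help line such as "--granularity int". Boolean switches
// take no value, so nothing is appended for them.
func usage(flag string, t optionType) string {
	if name := t.String(); name != "" {
		return "--" + flag + " " + name
	}
	return "--" + flag
}

func main() {
	fmt.Println(usage("granularity", intOption))      // --granularity int
	fmt.Println(usage("burndown-people", boolOption)) // --burndown-people
}
```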

type CouplesAnalysis

type CouplesAnalysis struct {
	// PeopleNumber is the number of developers for which to build the matrix. 0 disables this analysis.
	PeopleNumber int
	// contains filtered or unexported fields
}

CouplesAnalysis calculates the number of common commits for files and authors. The results are matrices, where the cell at row X and column Y is the number of commits which changed X and Y together. In the case of people, the numbers are summed for every common file.
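
The file matrix idea can be illustrated with a plain co-occurrence count. This is a simplified sketch, not the package's implementation: `coupleFiles` is a hypothetical helper, it keys the matrix by path instead of by index, and it assumes every path appears at most once per commit.

```go
package main

import "fmt"

// coupleFiles counts, for every pair of files, the number of commits which
// changed both. commits holds the changed file paths of each commit.
// The diagonal ends up holding the number of commits touching that file.
func coupleFiles(commits [][]string) map[string]map[string]int {
	matrix := map[string]map[string]int{}
	for _, files := range commits {
		for _, x := range files {
			if matrix[x] == nil {
				matrix[x] = map[string]int{}
			}
			for _, y := range files {
				matrix[x][y]++
			}
		}
	}
	return matrix
}

func main() {
	m := coupleFiles([][]string{
		{"a.go", "b.go"},
		{"a.go", "b.go", "c.go"},
		{"c.go"},
	})
	fmt.Println(m["a.go"]["b.go"]) // 2
	fmt.Println(m["a.go"]["c.go"]) // 1
	fmt.Println(m["c.go"]["c.go"]) // 2
}
```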

func (*CouplesAnalysis) Configure

func (couples *CouplesAnalysis) Configure(facts map[string]interface{})

Configure sets the properties previously published by ListConfigurationOptions().

func (*CouplesAnalysis) Consume

func (couples *CouplesAnalysis) Consume(deps map[string]interface{}) (map[string]interface{}, error)

Consume runs this PipelineItem on the next commit data. `deps` contain all the results from upstream PipelineItem-s as requested by Requires(). Additionally, "commit" is always present there and represents the analysed *object.Commit. This function returns the mapping with analysis results. The keys must be the same as in Provides(). If there was an error, nil is returned.

func (*CouplesAnalysis) Deserialize

func (couples *CouplesAnalysis) Deserialize(pbmessage []byte) (interface{}, error)

Deserialize converts the specified protobuf bytes to CouplesResult.

func (*CouplesAnalysis) Finalize

func (couples *CouplesAnalysis) Finalize() interface{}

Finalize returns the result of the analysis. Further Consume() calls are not expected.

func (*CouplesAnalysis) Flag

func (couples *CouplesAnalysis) Flag() string

Flag for the command line switch which enables this analysis.

func (*CouplesAnalysis) Initialize

func (couples *CouplesAnalysis) Initialize(repository *git.Repository)

Initialize resets the temporary caches and prepares this PipelineItem for a series of Consume() calls. The repository which is going to be analysed is supplied as an argument.

func (*CouplesAnalysis) ListConfigurationOptions

func (couples *CouplesAnalysis) ListConfigurationOptions() []ConfigurationOption

ListConfigurationOptions returns the list of changeable public properties of this PipelineItem.

func (*CouplesAnalysis) MergeResults

func (couples *CouplesAnalysis) MergeResults(r1, r2 interface{}, c1, c2 *CommonAnalysisResult) interface{}

MergeResults combines two CouplesResult-s together.

func (*CouplesAnalysis) Name

func (couples *CouplesAnalysis) Name() string

Name of this PipelineItem. Uniquely identifies the type, used for mapping keys, etc.

func (*CouplesAnalysis) Provides

func (couples *CouplesAnalysis) Provides() []string

Provides returns the list of names of entities which are produced by this PipelineItem. Each produced entity will be inserted into `deps` of dependent Consume()-s according to this list. Also used by hercules.Registry to build the global map of providers.

func (*CouplesAnalysis) Requires

func (couples *CouplesAnalysis) Requires() []string

Requires returns the list of names of entities which are needed by this PipelineItem. Each requested entity will be inserted into `deps` of Consume(). In turn, those entities are Provides() upstream.

func (*CouplesAnalysis) Serialize

func (couples *CouplesAnalysis) Serialize(result interface{}, binary bool, writer io.Writer) error

Serialize converts the analysis result as returned by Finalize() to text or bytes. The text format is YAML and the bytes format is Protocol Buffers.

type CouplesResult

type CouplesResult struct {
	PeopleMatrix []map[int]int64
	PeopleFiles  [][]int
	FilesMatrix  []map[int]int64
	Files        []string
	// contains filtered or unexported fields
}

CouplesResult is returned by CouplesAnalysis.Finalize() and carries the couples matrices for authors and files.

type DaysSinceStart

type DaysSinceStart struct {
	// contains filtered or unexported fields
}

DaysSinceStart provides the relative date information for every commit. It is a PipelineItem.

func (*DaysSinceStart) Configure

func (days *DaysSinceStart) Configure(facts map[string]interface{})

Configure sets the properties previously published by ListConfigurationOptions().

func (*DaysSinceStart) Consume

func (days *DaysSinceStart) Consume(deps map[string]interface{}) (map[string]interface{}, error)

Consume runs this PipelineItem on the next commit data. `deps` contain all the results from upstream PipelineItem-s as requested by Requires(). Additionally, "commit" is always present there and represents the analysed *object.Commit. This function returns the mapping with analysis results. The keys must be the same as in Provides(). If there was an error, nil is returned.

func (*DaysSinceStart) Initialize

func (days *DaysSinceStart) Initialize(repository *git.Repository)

Initialize resets the temporary caches and prepares this PipelineItem for a series of Consume() calls. The repository which is going to be analysed is supplied as an argument.

func (*DaysSinceStart) ListConfigurationOptions

func (days *DaysSinceStart) ListConfigurationOptions() []ConfigurationOption

ListConfigurationOptions returns the list of changeable public properties of this PipelineItem.

func (*DaysSinceStart) Name

func (days *DaysSinceStart) Name() string

Name of this PipelineItem. Uniquely identifies the type, used for mapping keys, etc.

func (*DaysSinceStart) Provides

func (days *DaysSinceStart) Provides() []string

Provides returns the list of names of entities which are produced by this PipelineItem. Each produced entity will be inserted into `deps` of dependent Consume()-s according to this list. Also used by hercules.Registry to build the global map of providers.

func (*DaysSinceStart) Requires

func (days *DaysSinceStart) Requires() []string

Requires returns the list of names of entities which are needed by this PipelineItem. Each requested entity will be inserted into `deps` of Consume(). In turn, those entities are Provides() upstream.

type FeaturedPipelineItem

type FeaturedPipelineItem interface {
	PipelineItem
	// Features returns the list of names which enable this item to be automatically inserted
	// in Pipeline.DeployItem().
	Features() []string
}

FeaturedPipelineItem enables switching the automatic insertion of pipeline items on or off.

type File

type File struct {
	// contains filtered or unexported fields
}

File encapsulates a balanced binary tree to store line intervals and a cumulative mapping of values to the corresponding length counters. Users are not supposed to create File-s directly; instead, they should call NewFile(). NewFileFromTree() is the special constructor which is useful in the tests.

Len() returns the number of lines in File.

Update() mutates File by introducing tree structural changes and updating the length mapping.

Dump() writes the tree to a string and Validate() checks the tree integrity.

func NewFile

func NewFile(time int, length int, statuses ...Status) *File

NewFile initializes a new instance of File struct.

time is the starting value of the first node;

length is the starting length of the tree (the key of the second and the last node);

statuses are the attached interval length mappings.

func NewFileFromTree

func NewFileFromTree(keys []int, vals []int, statuses ...Status) *File

NewFileFromTree is an alternative constructor for File which is used in tests. The resulting tree is validated with Validate() to ensure the initial integrity.

keys is a slice with the starting tree keys.

vals is a slice with the starting tree values. Must match the size of keys.

statuses are the attached interval length mappings.

func (*File) Dump

func (file *File) Dump() string

Dump formats the underlying line interval tree into a string. Useful for error messages, panic()-s and debugging.

func (*File) Len

func (file *File) Len() int

Len returns the File's size - that is, the maximum key in the tree of line intervals.

func (*File) Status

func (file *File) Status(index int) interface{}

Status returns the bound status object by the specified index.

func (*File) Update

func (file *File) Update(time int, pos int, insLength int, delLength int)

Update modifies the underlying tree to adapt to the specified line changes.

time is the time when the requested changes are made. Sets the values of the inserted nodes.

pos is the index of the line at which the changes are introduced.

insLength is the number of inserted lines after pos.

delLength is the number of removed lines after pos. Deletions come before the insertions.

The code inside this function is probably the most important one throughout the project. It is extensively covered with tests. If you find a bug, please add the corresponding case in file_test.go.
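
The effect of Update on line ownership can be modelled naively with one slice element per line. This is deliberately not the interval tree which File uses. It is only a sketch of the same semantics under the parameter descriptions above, with `update` as a hypothetical free function.

```go
package main

import "fmt"

// update applies one change to lines, where lines[i] is the time at which
// line i was last written. delLength lines starting at pos are dropped first,
// then insLength lines stamped with time are inserted at pos.
func update(lines []int, time, pos, insLength, delLength int) []int {
	result := make([]int, 0, len(lines)-delLength+insLength)
	result = append(result, lines[:pos]...)
	for i := 0; i < insLength; i++ {
		result = append(result, time)
	}
	return append(result, lines[pos+delLength:]...)
}

func main() {
	lines := update(nil, 0, 0, 5, 0)  // a new file with 5 lines at time 0
	lines = update(lines, 1, 2, 3, 1) // at time 1 replace line 2 with 3 new lines
	fmt.Println(lines)                // [0 0 1 1 1 0 0]
}
```

The tree stores only the boundaries between runs of equal values, which is why it scales to large files where this per-line model would not.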

func (*File) Validate

func (file *File) Validate()

Validate checks the underlying line interval tree integrity. The checks are as follows:

1. The minimum key must be 0 because the first line index is always 0.

2. The last node must carry TreeEnd value. This is the maintained invariant which marks the ending of the last line interval.

3. Node keys must monotonically increase and never duplicate.
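
The three checks can be sketched on a tree flattened into sorted key and value slices, similar to the inputs of NewFileFromTree(). `treeEnd` is a hypothetical sentinel standing in for the package's TreeEnd, and this sketch returns an error where a real integrity check may react differently.

```go
package main

import (
	"errors"
	"fmt"
)

// treeEnd is a hypothetical sentinel standing in for the package's TreeEnd.
const treeEnd = -1

// validate checks the three invariants on a tree flattened into sorted
// (key, value) pairs.
func validate(keys, vals []int) error {
	if len(keys) == 0 || len(keys) != len(vals) {
		return errors.New("keys and vals must be non-empty and of equal length")
	}
	if keys[0] != 0 {
		return errors.New("the minimum key must be 0")
	}
	if vals[len(vals)-1] != treeEnd {
		return errors.New("the last node must carry treeEnd")
	}
	for i := 1; i < len(keys); i++ {
		if keys[i] <= keys[i-1] {
			return errors.New("keys must strictly increase")
		}
	}
	return nil
}

func main() {
	fmt.Println(validate([]int{0, 10, 25}, []int{3, 7, treeEnd})) // <nil>
	fmt.Println(validate([]int{0, 10, 10}, []int{3, 7, treeEnd})) // keys must strictly increase
}
```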

type FileDiff

type FileDiff struct {
	CleanupDisabled bool
}

FileDiff calculates the difference of files which were modified. It is a PipelineItem.

func (*FileDiff) Configure

func (diff *FileDiff) Configure(facts map[string]interface{})

Configure sets the properties previously published by ListConfigurationOptions().

func (*FileDiff) Consume

func (diff *FileDiff) Consume(deps map[string]interface{}) (map[string]interface{}, error)

Consume runs this PipelineItem on the next commit data. `deps` contain all the results from upstream PipelineItem-s as requested by Requires(). Additionally, "commit" is always present there and represents the analysed *object.Commit. This function returns the mapping with analysis results. The keys must be the same as in Provides(). If there was an error, nil is returned.

func (*FileDiff) Initialize

func (diff *FileDiff) Initialize(repository *git.Repository)

Initialize resets the temporary caches and prepares this PipelineItem for a series of Consume() calls. The repository which is going to be analysed is supplied as an argument.

func (*FileDiff) ListConfigurationOptions

func (diff *FileDiff) ListConfigurationOptions() []ConfigurationOption

ListConfigurationOptions returns the list of changeable public properties of this PipelineItem.

func (*FileDiff) Name

func (diff *FileDiff) Name() string

Name of this PipelineItem. Uniquely identifies the type, used for mapping keys, etc.

func (*FileDiff) Provides

func (diff *FileDiff) Provides() []string

Provides returns the list of names of entities which are produced by this PipelineItem. Each produced entity will be inserted into `deps` of dependent Consume()-s according to this list. Also used by hercules.Registry to build the global map of providers.

func (*FileDiff) Requires

func (diff *FileDiff) Requires() []string

Requires returns the list of names of entities which are needed by this PipelineItem. Each requested entity will be inserted into `deps` of Consume(). In turn, those entities are Provides() upstream.

type FileDiffData

type FileDiffData struct {
	OldLinesOfCode int
	NewLinesOfCode int
	Diffs          []diffmatchpatch.Diff
}

FileDiffData is the type of the dependency provided by FileDiff.

type FileDiffRefiner

type FileDiffRefiner struct {
}

FileDiffRefiner uses UASTs to improve the human interpretability of diffs. It is a PipelineItem. The idea behind this algorithm is simple: in case of multiple choices which are equally optimal, choose the one which touches fewer AST nodes.

func (*FileDiffRefiner) Configure

func (ref *FileDiffRefiner) Configure(facts map[string]interface{})

Configure sets the properties previously published by ListConfigurationOptions().

func (*FileDiffRefiner) Consume

func (ref *FileDiffRefiner) Consume(deps map[string]interface{}) (map[string]interface{}, error)

Consume runs this PipelineItem on the next commit data. `deps` contain all the results from upstream PipelineItem-s as requested by Requires(). Additionally, "commit" is always present there and represents the analysed *object.Commit. This function returns the mapping with analysis results. The keys must be the same as in Provides(). If there was an error, nil is returned.

func (*FileDiffRefiner) Features

func (ref *FileDiffRefiner) Features() []string

Features which must be enabled for this PipelineItem to be automatically inserted into the DAG.

func (*FileDiffRefiner) Initialize

func (ref *FileDiffRefiner) Initialize(repository *git.Repository)

Initialize resets the temporary caches and prepares this PipelineItem for a series of Consume() calls. The repository which is going to be analysed is supplied as an argument.

func (*FileDiffRefiner) ListConfigurationOptions

func (ref *FileDiffRefiner) ListConfigurationOptions() []ConfigurationOption

ListConfigurationOptions returns the list of changeable public properties of this PipelineItem.

func (*FileDiffRefiner) Name

func (ref *FileDiffRefiner) Name() string

Name of this PipelineItem. Uniquely identifies the type, used for mapping keys, etc.

func (*FileDiffRefiner) Provides

func (ref *FileDiffRefiner) Provides() []string

Provides returns the list of names of entities which are produced by this PipelineItem. Each produced entity will be inserted into `deps` of dependent Consume()-s according to this list. Also used by hercules.Registry to build the global map of providers.

func (*FileDiffRefiner) Requires

func (ref *FileDiffRefiner) Requires() []string

Requires returns the list of names of entities which are needed by this PipelineItem. Each requested entity will be inserted into `deps` of Consume(). In turn, those entities are Provides() upstream.

type FileGetter

type FileGetter func(path string) (*object.File, error)

FileGetter defines a function which loads the Git file by the specified path. The state can be arbitrary, though here it always corresponds to the currently processed commit.

type FileHistory

type FileHistory struct {
	// contains filtered or unexported fields
}

FileHistory contains the intermediate state which is mutated by Consume(). It should implement LeafPipelineItem.

func (*FileHistory) Configure

func (history *FileHistory) Configure(facts map[string]interface{})

Configure sets the properties previously published by ListConfigurationOptions().

func (*FileHistory) Consume

func (history *FileHistory) Consume(deps map[string]interface{}) (map[string]interface{}, error)

Consume runs this PipelineItem on the next commit data. `deps` contain all the results from upstream PipelineItem-s as requested by Requires(). Additionally, "commit" is always present there and represents the analysed *object.Commit. This function returns the mapping with analysis results. The keys must be the same as in Provides(). If there was an error, nil is returned.

func (*FileHistory) Finalize

func (history *FileHistory) Finalize() interface{}

Finalize returns the result of the analysis. Further Consume() calls are not expected.

func (*FileHistory) Flag

func (history *FileHistory) Flag() string

Flag for the command line switch which enables this analysis.

func (*FileHistory) Initialize

func (history *FileHistory) Initialize(repository *git.Repository)

Initialize resets the temporary caches and prepares this PipelineItem for a series of Consume() calls. The repository which is going to be analysed is supplied as an argument.

func (*FileHistory) ListConfigurationOptions

func (history *FileHistory) ListConfigurationOptions() []ConfigurationOption

ListConfigurationOptions returns the list of changeable public properties of this PipelineItem.

func (*FileHistory) Name

func (history *FileHistory) Name() string

Name of this PipelineItem. Uniquely identifies the type, used for mapping keys, etc.

func (*FileHistory) Provides

func (history *FileHistory) Provides() []string

Provides returns the list of names of entities which are produced by this PipelineItem. Each produced entity will be inserted into `deps` of dependent Consume()-s according to this list. Also used by hercules.Registry to build the global map of providers.

func (*FileHistory) Requires

func (history *FileHistory) Requires() []string

Requires returns the list of names of entities which are needed by this PipelineItem. Each requested entity will be inserted into `deps` of Consume(). In turn, those entities are Provides() upstream.

func (*FileHistory) Serialize

func (history *FileHistory) Serialize(result interface{}, binary bool, writer io.Writer) error

Serialize converts the analysis result as returned by Finalize() to text or bytes. The text format is YAML and the bytes format is Protocol Buffers.

type FileHistoryResult

type FileHistoryResult struct {
	Files map[string][]plumbing.Hash
}

FileHistoryResult is returned by Finalize() and represents the analysis result.

type IdentityDetector

type IdentityDetector struct {
	// PeopleDict maps email || name  -> developer id.
	PeopleDict map[string]int
	// ReversedPeopleDict maps developer id -> description
	ReversedPeopleDict []string
}

IdentityDetector determines the author of a commit. The same person can commit under different signatures, and we apply some heuristics to merge those together. It is a PipelineItem.

func (*IdentityDetector) Configure

func (id *IdentityDetector) Configure(facts map[string]interface{})

Configure sets the properties previously published by ListConfigurationOptions().

func (*IdentityDetector) Consume

func (id *IdentityDetector) Consume(deps map[string]interface{}) (map[string]interface{}, error)

Consume runs this PipelineItem on the next commit data. `deps` contain all the results from upstream PipelineItem-s as requested by Requires(). Additionally, "commit" is always present there and represents the analysed *object.Commit. This function returns the mapping with analysis results. The keys must be the same as in Provides(). If there was an error, nil is returned.

func (*IdentityDetector) GeneratePeopleDict

func (id *IdentityDetector) GeneratePeopleDict(commits []*object.Commit)

GeneratePeopleDict loads author signatures from the specified list of Git commits.

func (*IdentityDetector) Initialize

func (id *IdentityDetector) Initialize(repository *git.Repository)

Initialize resets the temporary caches and prepares this PipelineItem for a series of Consume() calls. The repository which is going to be analysed is supplied as an argument.

func (*IdentityDetector) ListConfigurationOptions

func (id *IdentityDetector) ListConfigurationOptions() []ConfigurationOption

ListConfigurationOptions returns the list of changeable public properties of this PipelineItem.

func (*IdentityDetector) LoadPeopleDict

func (id *IdentityDetector) LoadPeopleDict(path string) error

LoadPeopleDict loads author signatures from a text file. The format is one signature per line, and the signature consists of several keys separated by "|". The first key is the main one and used to reference all the rest.
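
The file format can be parsed along these lines. This is a sketch under assumptions: `parsePeopleDict` is a hypothetical helper which reads from a string instead of a path, it skips blank lines, and it keeps only the main key in the reversed list. The real method may store a different description.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parsePeopleDict reads one signature per line. Every "|"-separated key maps
// to the developer id, which is the zero-based index of the signature. The
// reversed list keeps only the first (main) key; that choice is an assumption.
func parsePeopleDict(text string) (map[string]int, []string) {
	dict := map[string]int{}
	var reversed []string
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue // blank lines are skipped in this sketch
		}
		id := len(reversed)
		keys := strings.Split(line, "|")
		for _, key := range keys {
			dict[key] = id
		}
		reversed = append(reversed, keys[0])
	}
	return dict, reversed
}

func main() {
	dict, reversed := parsePeopleDict("alice|alice@example.com|al\nbob|bob@example.com\n")
	fmt.Println(dict["al"], dict["bob@example.com"], reversed) // 0 1 [alice bob]
}
```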

func (IdentityDetector) MergeReversedDicts

func (id IdentityDetector) MergeReversedDicts(rd1, rd2 []string) (map[string][3]int, []string)

MergeReversedDicts joins two identity lists together, excluding duplicates, in-order.

func (*IdentityDetector) Name

func (id *IdentityDetector) Name() string

Name of this PipelineItem. Uniquely identifies the type, used for mapping keys, etc.

func (*IdentityDetector) Provides

func (id *IdentityDetector) Provides() []string

Provides returns the list of names of entities which are produced by this PipelineItem. Each produced entity will be inserted into `deps` of dependent Consume()-s according to this list. Also used by hercules.Registry to build the global map of providers.

func (*IdentityDetector) Requires

func (id *IdentityDetector) Requires() []string

Requires returns the list of names of entities which are needed by this PipelineItem. Each requested entity will be inserted into `deps` of Consume(). In turn, those entities are Provides() upstream.

type LeafPipelineItem

type LeafPipelineItem interface {
	PipelineItem
	// Flag returns the cmdline name of the item.
	Flag() string
	// Finalize returns the result of the analysis.
	Finalize() interface{}
	// Serialize encodes the object returned by Finalize() to YAML or Protocol Buffers.
	Serialize(result interface{}, binary bool, writer io.Writer) error
}

LeafPipelineItem corresponds to the top level pipeline items which produce the end results.

type MergeablePipelineItem

type MergeablePipelineItem interface {
	LeafPipelineItem
	// Deserialize loads the result from Protocol Buffers blob.
	Deserialize(pbmessage []byte) (interface{}, error)
	// MergeResults joins two results together. Common-s are specified as the global state.
	MergeResults(r1, r2 interface{}, c1, c2 *CommonAnalysisResult) interface{}
}

MergeablePipelineItem specifies the methods to combine several analysis results together.

type NodeSummary

type NodeSummary struct {
	InternalRole string
	Roles        []uast.Role
	Name         string
	File         string
}

NodeSummary carries the node attributes which annotate the "shotness" analysis' counters. These attributes are supposed to uniquely identify each node.

func (NodeSummary) String

func (node NodeSummary) String() string

type Pipeline

type Pipeline struct {
	// OnProgress is the callback which is invoked in Analyse() to output its
	// progress. The first argument is the number of processed commits and the
	// second is the total number of commits.
	OnProgress func(int, int)
	// contains filtered or unexported fields
}

Pipeline is the core Hercules entity which carries several PipelineItems and executes them. See the extended example of how a Pipeline works in doc.go.

func NewPipeline

func NewPipeline(repository *git.Repository) *Pipeline

NewPipeline initializes a new instance of Pipeline struct.

func (*Pipeline) AddItem

func (pipeline *Pipeline) AddItem(item PipelineItem) PipelineItem

AddItem inserts a PipelineItem into the pipeline. It does not check any dependencies. See also: DeployItem().

func (*Pipeline) Commits

func (pipeline *Pipeline) Commits() []*object.Commit

Commits returns the critical path in the repository's history. It starts from HEAD and traces commits backwards till the root. When it encounters a merge (more than one parent), it always chooses the first parent.
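
The first-parent walk can be sketched on a toy commit graph. `commit`, `firstParentPath` and `reversed` are hypothetical names used only for illustration. The reversal step is there because Run() expects the history in ascending order, and this sketch makes no claim about the order in which Commits() itself returns its result.

```go
package main

import "fmt"

// commit is a hypothetical stand-in for a Git commit.
type commit struct {
	ID      string
	Parents []*commit
}

// firstParentPath walks from head to the root, always following the first
// parent, and returns the visited commits newest first.
func firstParentPath(head *commit) []*commit {
	var path []*commit
	for c := head; c != nil; {
		path = append(path, c)
		if len(c.Parents) == 0 {
			break
		}
		c = c.Parents[0]
	}
	return path
}

// reversed returns a copy in root-first (ascending) order.
func reversed(path []*commit) []*commit {
	out := make([]*commit, len(path))
	for i, c := range path {
		out[len(path)-1-i] = c
	}
	return out
}

func main() {
	root := &commit{ID: "root"}
	main1 := &commit{ID: "main1", Parents: []*commit{root}}
	side := &commit{ID: "side", Parents: []*commit{root}}
	merge := &commit{ID: "merge", Parents: []*commit{main1, side}}
	for _, c := range reversed(firstParentPath(merge)) {
		fmt.Print(c.ID, " ")
	}
	fmt.Println() // the line reads: root main1 merge
}
```

The "side" commit never appears because it is only reachable through the second parent of the merge.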

func (*Pipeline) DeployItem

func (pipeline *Pipeline) DeployItem(item PipelineItem) PipelineItem

DeployItem inserts a PipelineItem into the pipeline. It also recursively creates all of its dependencies (PipelineItem.Requires()). Returns the same item as specified in the arguments.

func (*Pipeline) GetFact

func (pipeline *Pipeline) GetFact(name string) interface{}

GetFact returns the value of the fact with the specified name.

func (*Pipeline) GetFeature

func (pipeline *Pipeline) GetFeature(name string) (bool, bool)

GetFeature returns the state of the feature with the specified name (enabled/disabled) and whether it exists. See also: FeaturedPipelineItem.

func (*Pipeline) Initialize

func (pipeline *Pipeline) Initialize(facts map[string]interface{})

Initialize prepares the pipeline for the execution (Run()). This function resolves the execution DAG, Configure()-s and Initialize()-s the items in it in the topological dependency order. `facts` are passed inside Configure(). They are mutable.
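
Resolving the topological dependency order from Provides() and Requires() can be sketched with Kahn's algorithm. This is not the package's resolver: `item` is a reduced hypothetical type, `resolve` is a hypothetical function, and the item and entity names are chosen only for illustration.

```go
package main

import "fmt"

// item is a hypothetical, reduced pipeline item: a name plus the entity keys
// it provides and requires.
type item struct {
	Name     string
	Provides []string
	Requires []string
}

// resolve orders items so that every provider precedes its consumers
// (Kahn's algorithm). It fails on a missing provider or on a cycle.
func resolve(items []item) ([]string, error) {
	provider := map[string]int{}
	for i, it := range items {
		for _, key := range it.Provides {
			provider[key] = i
		}
	}
	pending := make([]int, len(items)) // unmet requirements per item
	consumers := make([][]int, len(items))
	for i, it := range items {
		for _, key := range it.Requires {
			p, ok := provider[key]
			if !ok {
				return nil, fmt.Errorf("%s: nothing provides %q", it.Name, key)
			}
			pending[i]++
			consumers[p] = append(consumers[p], i)
		}
	}
	var queue []int
	for i, n := range pending {
		if n == 0 {
			queue = append(queue, i)
		}
	}
	var order []string
	for len(queue) > 0 {
		i := queue[0]
		queue = queue[1:]
		order = append(order, items[i].Name)
		for _, c := range consumers[i] {
			if pending[c]--; pending[c] == 0 {
				queue = append(queue, c)
			}
		}
	}
	if len(order) != len(items) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return order, nil
}

func main() {
	order, err := resolve([]item{
		{Name: "Burndown", Requires: []string{"file_diff", "day"}},
		{Name: "FileDiff", Provides: []string{"file_diff"}, Requires: []string{"changes"}},
		{Name: "TreeDiff", Provides: []string{"changes"}},
		{Name: "DaysSinceStart", Provides: []string{"day"}},
	})
	fmt.Println(order, err) // [TreeDiff DaysSinceStart FileDiff Burndown] <nil>
}
```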

func (*Pipeline) Len

func (pipeline *Pipeline) Len() int

Len returns the number of items in the pipeline.

func (*Pipeline) RemoveItem

func (pipeline *Pipeline) RemoveItem(item PipelineItem)

RemoveItem deletes a PipelineItem from the pipeline. It leaves all the rest of the items intact.

func (*Pipeline) Run

func (pipeline *Pipeline) Run(commits []*object.Commit) (map[LeafPipelineItem]interface{}, error)

Run method executes the pipeline.

commits is a slice with the sequential commit history. It shall start from the root (ascending order).

Returns the mapping from each LeafPipelineItem to the corresponding analysis result. There is always a "nil" record with CommonAnalysisResult.

func (*Pipeline) SetFact

func (pipeline *Pipeline) SetFact(name string, value interface{})

SetFact sets the value of the fact with the specified name.

func (*Pipeline) SetFeature

func (pipeline *Pipeline) SetFeature(name string)

SetFeature sets the value of the feature with the specified name. See also: FeaturedPipelineItem.

func (*Pipeline) SetFeaturesFromFlags

func (pipeline *Pipeline) SetFeaturesFromFlags(registry ...*PipelineItemRegistry)

SetFeaturesFromFlags enables the features which were specified through the command line flags which belong to the given PipelineItemRegistry instance. See also: AddItem().

type PipelineItem

type PipelineItem interface {
	// Name returns the name of the analysis.
	Name() string
	// Provides returns the list of keys of reusable calculated entities.
	// Other items may depend on them.
	Provides() []string
	// Requires returns the list of keys of needed entities which must be supplied in Consume().
	Requires() []string
	// ListConfigurationOptions returns the list of available options which can be consumed by Configure().
	ListConfigurationOptions() []ConfigurationOption
	// Configure performs the initial setup of the object by applying parameters from facts.
	// It allows creating PipelineItems in a universal way.
	Configure(facts map[string]interface{})
	// Initialize prepares and resets the item. Consume() requires Initialize()
	// to be called at least once beforehand.
	Initialize(*git.Repository)
	// Consume processes the next commit.
	// deps contains the required entities which match Requires(). Besides, it always includes
	// "commit" and "index".
	// Returns the calculated entities which match Provides().
	Consume(deps map[string]interface{}) (map[string]interface{}, error)
}

PipelineItem is the interface for all the units in the Git commits analysis pipeline.

type PipelineItemRegistry

type PipelineItemRegistry struct {
	// contains filtered or unexported fields
}

PipelineItemRegistry contains all the known PipelineItem-s.

func (*PipelineItemRegistry) AddFlags

func (registry *PipelineItemRegistry) AddFlags(flagSet *pflag.FlagSet) (
	map[string]interface{}, map[string]*bool)

AddFlags inserts the cmdline options from PipelineItem.ListConfigurationOptions(), FeaturedPipelineItem.Features() and LeafPipelineItem.Flag() into the supplied flag set. Returns the "facts" which can be fed into PipelineItem.Configure() and the dictionary of runnable analysis (LeafPipelineItem) choices. E.g. if "BurndownAnalysis" was activated through the "--burndown" cmdline argument, this mapping would contain ["BurndownAnalysis"] = *true.
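
The name to *bool dictionary works because flag packages hand out pointers which are filled in only when the arguments are parsed. The sketch below uses the standard library `flag` package instead of pflag so that it stays self-contained. `parseLeaves` is a hypothetical helper and the registered switches are examples.

```go
package main

import (
	"flag"
	"fmt"
)

// parseLeaves registers one boolean switch per analysis in a throwaway flag
// set, parses args and returns the name -> *bool mapping. switches maps an
// analysis name to its command line switch.
func parseLeaves(switches map[string]string, args []string) (map[string]*bool, error) {
	flagSet := flag.NewFlagSet("sketch", flag.ContinueOnError)
	deployed := map[string]*bool{}
	for name, flagName := range switches {
		deployed[name] = flagSet.Bool(flagName, false, "run the "+name+" analysis")
	}
	// The pointers stored above receive their values during Parse().
	if err := flagSet.Parse(args); err != nil {
		return nil, err
	}
	return deployed, nil
}

func main() {
	deployed, err := parseLeaves(map[string]string{
		"BurndownAnalysis": "burndown",
		"CouplesAnalysis":  "couples",
	}, []string{"--burndown"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(*deployed["BurndownAnalysis"], *deployed["CouplesAnalysis"]) // true false
}
```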

func (*PipelineItemRegistry) GetFeaturedItems

func (registry *PipelineItemRegistry) GetFeaturedItems() map[string][]FeaturedPipelineItem

GetFeaturedItems returns all FeaturedPipelineItem-s registered.

func (*PipelineItemRegistry) GetLeaves

func (registry *PipelineItemRegistry) GetLeaves() []LeafPipelineItem

GetLeaves returns all LeafPipelineItem-s registered.

func (*PipelineItemRegistry) GetPlumbingItems

func (registry *PipelineItemRegistry) GetPlumbingItems() []PipelineItem

GetPlumbingItems returns all non-LeafPipelineItem-s registered.

func (*PipelineItemRegistry) Register

func (registry *PipelineItemRegistry) Register(example PipelineItem)

Register adds another PipelineItem to the registry.

func (*PipelineItemRegistry) Summon

func (registry *PipelineItemRegistry) Summon(providesOrName string) []PipelineItem

Summon searches for PipelineItem-s which provide the specified entity or are named after the specified string. It materializes all the found types and returns them.

type RenameAnalysis

type RenameAnalysis struct {
	// SimilarityThreshold adjusts the heuristic to determine file renames.
	// It has the same units as cgit's -X rename-threshold or -M. Better to
	// set it to the default value of 90 (90%).
	SimilarityThreshold int
	// contains filtered or unexported fields
}

RenameAnalysis improves TreeDiff's results by searching for changed blobs under different paths which are likely to be the result of a rename with subsequent edits. RenameAnalysis is a PipelineItem.

func (*RenameAnalysis) Configure

func (ra *RenameAnalysis) Configure(facts map[string]interface{})

Configure sets the properties previously published by ListConfigurationOptions().

func (*RenameAnalysis) Consume

func (ra *RenameAnalysis) Consume(deps map[string]interface{}) (map[string]interface{}, error)

Consume runs this PipelineItem on the next commit data. `deps` contains all the results from upstream PipelineItem-s as requested by Requires(). Additionally, "commit" is always present there and represents the analysed *object.Commit. This function returns the mapping with analysis results. The keys must be the same as in Provides(). If there was an error, nil is returned.

func (*RenameAnalysis) Initialize

func (ra *RenameAnalysis) Initialize(repository *git.Repository)

Initialize resets the temporary caches and prepares this PipelineItem for a series of Consume() calls. The repository which is going to be analysed is supplied as an argument.

func (*RenameAnalysis) ListConfigurationOptions

func (ra *RenameAnalysis) ListConfigurationOptions() []ConfigurationOption

ListConfigurationOptions returns the list of changeable public properties of this PipelineItem.

func (*RenameAnalysis) Name

func (ra *RenameAnalysis) Name() string

Name of this PipelineItem. Uniquely identifies the type, used for mapping keys, etc.

func (*RenameAnalysis) Provides

func (ra *RenameAnalysis) Provides() []string

Provides returns the list of names of entities which are produced by this PipelineItem. Each produced entity will be inserted into `deps` of dependent Consume()-s according to this list. Also used by hercules.Registry to build the global map of providers.

func (*RenameAnalysis) Requires

func (ra *RenameAnalysis) Requires() []string

Requires returns the list of names of entities which are needed by this PipelineItem. Each requested entity will be inserted into `deps` of Consume(). In turn, those entities are listed in Provides() of the upstream PipelineItem-s.
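To show how Requires() and Provides() together imply an execution order, here is a minimal sketch of a depth-first topological sort. The `node` type, the item names and the rule that each entity has exactly one provider are simplifying assumptions; this is not the resolver hercules uses.

```go
package main

import "fmt"

// node is a hypothetical description of an item's inputs and outputs.
type node struct {
	name     string
	requires []string
	provides []string
}

// order returns the node names so that every provider precedes its consumers.
// It fails when a requirement has no provider or the dependencies form a cycle.
func order(nodes []node) ([]string, error) {
	provider := map[string]int{}
	for i, n := range nodes {
		for _, p := range n.provides {
			provider[p] = i
		}
	}
	state := make([]int, len(nodes)) // 0 = new, 1 = visiting, 2 = done
	var result []string
	var visit func(i int) error
	visit = func(i int) error {
		switch state[i] {
		case 1:
			return fmt.Errorf("cycle at %s", nodes[i].name)
		case 2:
			return nil
		}
		state[i] = 1
		for _, r := range nodes[i].requires {
			j, ok := provider[r]
			if !ok {
				return fmt.Errorf("%s requires %q, which nobody provides", nodes[i].name, r)
			}
			if err := visit(j); err != nil {
				return err
			}
		}
		state[i] = 2
		result = append(result, nodes[i].name)
		return nil
	}
	for i := range nodes {
		if err := visit(i); err != nil {
			return nil, err
		}
	}
	return result, nil
}

func main() {
	names, err := order([]node{
		{name: "Report", requires: []string{"stats"}},
		{name: "Stats", requires: []string{"changes"}, provides: []string{"stats"}},
		{name: "Diff", provides: []string{"changes"}},
	})
	fmt.Println(names, err)
}
```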

type ShotnessAnalysis

type ShotnessAnalysis struct {
	XpathStruct string
	XpathName   string
	// contains filtered or unexported fields
}

ShotnessAnalysis contains the intermediate state which is mutated by Consume(). It should implement LeafPipelineItem.

func (*ShotnessAnalysis) Configure

func (shotness *ShotnessAnalysis) Configure(facts map[string]interface{})

Configure sets the properties previously published by ListConfigurationOptions().

func (*ShotnessAnalysis) Consume

func (shotness *ShotnessAnalysis) Consume(deps map[string]interface{}) (map[string]interface{}, error)

Consume runs this PipelineItem on the next commit data. `deps` contains all the results from upstream PipelineItem-s as requested by Requires(). Additionally, "commit" is always present there and represents the analysed *object.Commit. This function returns the mapping with analysis results. The keys must be the same as in Provides(). If there was an error, nil is returned.

func (*ShotnessAnalysis) Features

func (shotness *ShotnessAnalysis) Features() []string

Features which must be enabled for this PipelineItem to be automatically inserted into the DAG.
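The gating rule can be sketched in a few lines. The helper `allEnabled` and the feature name "other" are hypothetical; "uast" is the feature name used with --feature in the README.

```go
package main

import "fmt"

// allEnabled reports whether every required feature is switched on.
// An item without required features is always eligible.
func allEnabled(required []string, active map[string]bool) bool {
	for _, f := range required {
		if !active[f] {
			return false
		}
	}
	return true
}

func main() {
	active := map[string]bool{"uast": true}
	fmt.Println(allEnabled([]string{"uast"}, active))
	fmt.Println(allEnabled([]string{"uast", "other"}, active))
	fmt.Println(allEnabled(nil, active))
}
```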

func (*ShotnessAnalysis) Finalize

func (shotness *ShotnessAnalysis) Finalize() interface{}

Finalize returns the result of the analysis. Further Consume() calls are not expected.
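The Initialize, Consume, Finalize lifecycle can be illustrated with a toy item. `commitCounter` is hypothetical, and its Initialize() omits the *git.Repository argument of the real method to keep the sketch free of external imports.

```go
package main

import "fmt"

// commitCounter is a hypothetical leaf-style item that counts Consume() calls.
type commitCounter struct {
	seen int
}

// Initialize resets the accumulated state before a new run.
func (c *commitCounter) Initialize() { c.seen = 0 }

// Consume mutates the intermediate state; this sketch provides no entities.
func (c *commitCounter) Consume(deps map[string]interface{}) (map[string]interface{}, error) {
	c.seen++
	return nil, nil
}

// Finalize returns the accumulated result once all commits were consumed.
func (c *commitCounter) Finalize() interface{} { return c.seen }

func main() {
	counter := &commitCounter{}
	counter.Initialize()
	for i := 0; i < 3; i++ {
		if _, err := counter.Consume(map[string]interface{}{"commit": i}); err != nil {
			panic(err)
		}
	}
	fmt.Println(counter.Finalize())
}
```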

func (*ShotnessAnalysis) Flag

func (shotness *ShotnessAnalysis) Flag() string

Flag returns the command line switch which activates the analysis.

func (*ShotnessAnalysis) Initialize

func (shotness *ShotnessAnalysis) Initialize(repository *git.Repository)

Initialize resets the temporary caches and prepares this PipelineItem for a series of Consume() calls. The repository which is going to be analysed is supplied as an argument.

func (*ShotnessAnalysis) ListConfigurationOptions

func (shotness *ShotnessAnalysis) ListConfigurationOptions() []ConfigurationOption

ListConfigurationOptions returns the list of changeable public properties of this PipelineItem.

func (*ShotnessAnalysis) Name

func (shotness *ShotnessAnalysis) Name() string

Name of this PipelineItem. Uniquely identifies the type, used for mapping keys, etc.

func (*ShotnessAnalysis) Provides

func (shotness *ShotnessAnalysis) Provides() []string

Provides returns the list of names of entities which are produced by this PipelineItem. Each produced entity will be inserted into `deps` of dependent Consume()-s according to this list. Also used by hercules.Registry to build the global map of providers.

func (*ShotnessAnalysis) Requires

func (shotness *ShotnessAnalysis) Requires() []string

Requires returns the list of names of entities which are needed by this PipelineItem. Each requested entity will be inserted into `deps` of Consume(). In turn, those entities are listed in Provides() of the upstream PipelineItem-s.

func (*ShotnessAnalysis) Serialize

func (shotness *ShotnessAnalysis) Serialize(result interface{}, binary bool, writer io.Writer) error

Serialize converts the analysis result as returned by Finalize() to text or bytes. The text format is YAML and the bytes format is Protocol Buffers.

type ShotnessResult

type ShotnessResult struct {
	Nodes    []NodeSummary
	Counters []map[int]int
}

ShotnessResult is returned by ShotnessAnalysis.Finalize() and represents the analysis result.

type Status

type Status struct {
	// contains filtered or unexported fields
}

Status is something we would like to keep track of in File.Update().

func NewStatus

func NewStatus(data interface{}, update func(interface{}, int, int, int)) Status

NewStatus initializes a new instance of the Status struct. It is needed to set the only two private fields, which are not supposed to be replaced during the whole lifetime of the instance.
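The pairing of opaque data with an update callback can be sketched with a local type. `status` and `newStatus` below are stand-ins written for illustration; the meaning of the three int arguments is not stated in this section, so the sketch simply adds them up.

```go
package main

import "fmt"

// status is a local stand-in: opaque data plus the callback that updates it.
type status struct {
	data   interface{}
	update func(interface{}, int, int, int)
}

// newStatus sets both fields once; they are not replaced afterwards.
func newStatus(data interface{}, update func(interface{}, int, int, int)) status {
	return status{data: data, update: update}
}

func main() {
	total := 0
	s := newStatus(&total, func(data interface{}, a, b, c int) {
		// Assumption: the callback just accumulates its three arguments.
		*data.(*int) += a + b + c
	})
	s.update(s.data, 1, 2, 3)
	fmt.Println(total)
}
```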

type TreeDiff

type TreeDiff struct {
	SkipDirs []string
	// contains filtered or unexported fields
}

TreeDiff generates the list of changes for a commit. A change can be either one or two blobs under the same path: "before" and "after". If "before" is nil, the change is an addition. If "after" is nil, the change is a removal. Otherwise, it is a modification. TreeDiff is a PipelineItem.
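The before/after rule reads naturally as a switch. The `blob` and `change` types below are hypothetical placeholders and do not mirror the go-git types that TreeDiff works with.

```go
package main

import "fmt"

// blob and change are hypothetical placeholders for the real change objects.
type blob struct{ hash string }

type change struct {
	before *blob
	after  *blob
}

// kind classifies a change by which of its two sides is present.
func kind(c change) string {
	switch {
	case c.before == nil && c.after == nil:
		return "invalid"
	case c.before == nil:
		return "addition"
	case c.after == nil:
		return "removal"
	default:
		return "modification"
	}
}

func main() {
	old, cur := &blob{hash: "aaa"}, &blob{hash: "bbb"}
	fmt.Println(kind(change{after: cur}))
	fmt.Println(kind(change{before: old}))
	fmt.Println(kind(change{before: old, after: cur}))
}
```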

func (*TreeDiff) Configure

func (treediff *TreeDiff) Configure(facts map[string]interface{})

Configure sets the properties previously published by ListConfigurationOptions().

func (*TreeDiff) Consume

func (treediff *TreeDiff) Consume(deps map[string]interface{}) (map[string]interface{}, error)

Consume runs this PipelineItem on the next commit data. `deps` contains all the results from upstream PipelineItem-s as requested by Requires(). Additionally, "commit" is always present there and represents the analysed *object.Commit. This function returns the mapping with analysis results. The keys must be the same as in Provides(). If there was an error, nil is returned.

func (*TreeDiff) Initialize

func (treediff *TreeDiff) Initialize(repository *git.Repository)

Initialize resets the temporary caches and prepares this PipelineItem for a series of Consume() calls. The repository which is going to be analysed is supplied as an argument.

func (*TreeDiff) ListConfigurationOptions

func (treediff *TreeDiff) ListConfigurationOptions() []ConfigurationOption

ListConfigurationOptions returns the list of changeable public properties of this PipelineItem.

func (*TreeDiff) Name

func (treediff *TreeDiff) Name() string

Name of this PipelineItem. Uniquely identifies the type, used for mapping keys, etc.

func (*TreeDiff) Provides

func (treediff *TreeDiff) Provides() []string

Provides returns the list of names of entities which are produced by this PipelineItem. Each produced entity will be inserted into `deps` of dependent Consume()-s according to this list. Also used by hercules.Registry to build the global map of providers.

func (*TreeDiff) Requires

func (treediff *TreeDiff) Requires() []string

Requires returns the list of names of entities which are needed by this PipelineItem. Each requested entity will be inserted into `deps` of Consume(). In turn, those entities are listed in Provides() of the upstream PipelineItem-s.

type UASTChange

type UASTChange struct {
	Before *uast.Node
	After  *uast.Node
	Change *object.Change
}

UASTChange is the type of the items in the list of changes which is provided by UASTChanges.

type UASTChanges

type UASTChanges struct {
	// contains filtered or unexported fields
}

UASTChanges is a structured analog of TreeDiff: it provides UASTs for every logical change in a commit. It is a PipelineItem.

func (*UASTChanges) Configure

func (uc *UASTChanges) Configure(facts map[string]interface{})

Configure sets the properties previously published by ListConfigurationOptions().

func (*UASTChanges) Consume

func (uc *UASTChanges) Consume(deps map[string]interface{}) (map[string]interface{}, error)

Consume runs this PipelineItem on the next commit data. `deps` contains all the results from upstream PipelineItem-s as requested by Requires(). Additionally, "commit" is always present there and represents the analysed *object.Commit. This function returns the mapping with analysis results. The keys must be the same as in Provides(). If there was an error, nil is returned.

func (*UASTChanges) Features

func (uc *UASTChanges) Features() []string

Features which must be enabled for this PipelineItem to be automatically inserted into the DAG.

func (*UASTChanges) Initialize

func (uc *UASTChanges) Initialize(repository *git.Repository)

Initialize resets the temporary caches and prepares this PipelineItem for a series of Consume() calls. The repository which is going to be analysed is supplied as an argument.

func (*UASTChanges) ListConfigurationOptions

func (uc *UASTChanges) ListConfigurationOptions() []ConfigurationOption

ListConfigurationOptions returns the list of changeable public properties of this PipelineItem.

func (*UASTChanges) Name

func (uc *UASTChanges) Name() string

Name of this PipelineItem. Uniquely identifies the type, used for mapping keys, etc.

func (*UASTChanges) Provides

func (uc *UASTChanges) Provides() []string

Provides returns the list of names of entities which are produced by this PipelineItem. Each produced entity will be inserted into `deps` of dependent Consume()-s according to this list. Also used by hercules.Registry to build the global map of providers.

func (*UASTChanges) Requires

func (uc *UASTChanges) Requires() []string

Requires returns the list of names of entities which are needed by this PipelineItem. Each requested entity will be inserted into `deps` of Consume(). In turn, those entities are listed in Provides() of the upstream PipelineItem-s.

type UASTChangesSaver

type UASTChangesSaver struct {
	// OutputPath points to the target directory with UASTs
	OutputPath string
	// contains filtered or unexported fields
}

UASTChangesSaver dumps changed files and corresponding UASTs for every commit. It is a LeafPipelineItem.

func (*UASTChangesSaver) Configure

func (saver *UASTChangesSaver) Configure(facts map[string]interface{})

Configure sets the properties previously published by ListConfigurationOptions().

func (*UASTChangesSaver) Consume

func (saver *UASTChangesSaver) Consume(deps map[string]interface{}) (map[string]interface{}, error)

Consume runs this PipelineItem on the next commit data. `deps` contains all the results from upstream PipelineItem-s as requested by Requires(). Additionally, "commit" is always present there and represents the analysed *object.Commit. This function returns the mapping with analysis results. The keys must be the same as in Provides(). If there was an error, nil is returned.

func (*UASTChangesSaver) Features

func (saver *UASTChangesSaver) Features() []string

Features which must be enabled for this PipelineItem to be automatically inserted into the DAG.

func (*UASTChangesSaver) Finalize

func (saver *UASTChangesSaver) Finalize() interface{}

Finalize returns the result of the analysis. Further Consume() calls are not expected.

func (*UASTChangesSaver) Flag

func (saver *UASTChangesSaver) Flag() string

Flag returns the command line switch which enables this analysis.

func (*UASTChangesSaver) Initialize

func (saver *UASTChangesSaver) Initialize(repository *git.Repository)

Initialize resets the temporary caches and prepares this PipelineItem for a series of Consume() calls. The repository which is going to be analysed is supplied as an argument.

func (*UASTChangesSaver) ListConfigurationOptions

func (saver *UASTChangesSaver) ListConfigurationOptions() []ConfigurationOption

ListConfigurationOptions returns the list of changeable public properties of this PipelineItem.

func (*UASTChangesSaver) Name

func (saver *UASTChangesSaver) Name() string

Name of this PipelineItem. Uniquely identifies the type, used for mapping keys, etc.

func (*UASTChangesSaver) Provides

func (saver *UASTChangesSaver) Provides() []string

Provides returns the list of names of entities which are produced by this PipelineItem. Each produced entity will be inserted into `deps` of dependent Consume()-s according to this list. Also used by hercules.Registry to build the global map of providers.

func (*UASTChangesSaver) Requires

func (saver *UASTChangesSaver) Requires() []string

Requires returns the list of names of entities which are needed by this PipelineItem. Each requested entity will be inserted into `deps` of Consume(). In turn, those entities are listed in Provides() of the upstream PipelineItem-s.

func (*UASTChangesSaver) Serialize

func (saver *UASTChangesSaver) Serialize(result interface{}, binary bool, writer io.Writer) error

Serialize converts the analysis result as returned by Finalize() to text or bytes. The text format is YAML and the bytes format is Protocol Buffers.

type UASTExtractor

type UASTExtractor struct {
	Endpoint       string
	Context        func() (context.Context, context.CancelFunc)
	PoolSize       int
	Languages      map[string]bool
	FailOnErrors   bool
	ProcessedFiles map[string]int
	// contains filtered or unexported fields
}

UASTExtractor retrieves UASTs from a Babelfish server which correspond to changed files in a commit. It is a PipelineItem.

func (*UASTExtractor) Configure

func (exr *UASTExtractor) Configure(facts map[string]interface{})

Configure sets the properties previously published by ListConfigurationOptions().

func (*UASTExtractor) Consume

func (exr *UASTExtractor) Consume(deps map[string]interface{}) (map[string]interface{}, error)

Consume runs this PipelineItem on the next commit data. `deps` contains all the results from upstream PipelineItem-s as requested by Requires(). Additionally, "commit" is always present there and represents the analysed *object.Commit. This function returns the mapping with analysis results. The keys must be the same as in Provides(). If there was an error, nil is returned.

func (*UASTExtractor) Features

func (exr *UASTExtractor) Features() []string

Features which must be enabled for this PipelineItem to be automatically inserted into the DAG.

func (*UASTExtractor) Initialize

func (exr *UASTExtractor) Initialize(repository *git.Repository)

Initialize resets the temporary caches and prepares this PipelineItem for a series of Consume() calls. The repository which is going to be analysed is supplied as an argument.

func (*UASTExtractor) ListConfigurationOptions

func (exr *UASTExtractor) ListConfigurationOptions() []ConfigurationOption

ListConfigurationOptions returns the list of changeable public properties of this PipelineItem.

func (*UASTExtractor) Name

func (exr *UASTExtractor) Name() string

Name of this PipelineItem. Uniquely identifies the type, used for mapping keys, etc.

func (*UASTExtractor) Provides

func (exr *UASTExtractor) Provides() []string

Provides returns the list of names of entities which are produced by this PipelineItem. Each produced entity will be inserted into `deps` of dependent Consume()-s according to this list. Also used by hercules.Registry to build the global map of providers.

func (*UASTExtractor) Requires

func (exr *UASTExtractor) Requires() []string

Requires returns the list of names of entities which are needed by this PipelineItem. Each requested entity will be inserted into `deps` of Consume(). In turn, those entities are listed in Provides() of the upstream PipelineItem-s.

Directories

Path Synopsis
cmd
contrib
