Documentation
Overview ¶
Package git implements derivation of a file graph from git log.
Distance ¶
The distance is derived from the probability that two files appeared in the same commit. The core idea is that relevant files tend to be modified together.
    Distance(x, y) = -log(P(y is relevant to x))
    P(y is relevant to x) := sum(1/(len(c.files)-1) for c in x.commits if y in c.files) / len(x.commits)
In English: pick a commit that touches x uniformly at random, then pick one of the other files in that commit uniformly at random. The distance from x to y is -log of the probability that the picked file is y. The formula has further nuances, e.g. file removals are not counted towards len(c.files), and commits with len(c.files) = 1 or len(c.files) > limit are ignored.
Note that this formula penalizes large commits: the more files a commit modifies, the weaker its signal.
This graph defines distance only between files, and not directories.
Types ¶
type EdgeReader ¶
type EdgeReader struct {
    // Reversed indicates that incoming edges must be read instead of outgoing.
    // In other words, read the edges of the transposed graph.
    Reversed bool
}
EdgeReader implements filegraph.EdgeReader. It works only with nodes returned by Graph.Node().
type Graph ¶
type Graph struct {
    // Commit is the git commit that the graph state corresponds to.
    Commit string
    // contains filtered or unexported fields
}
Graph is a file graph based on the git history.
The graph represents the aggregated history of all file changes in the repo, rather than the state of the repo at a single point in time. In particular, the graph may include nodes for files that no longer exist. It is generally not possible to tell whether a node is a file or a directory, because it might have been a file at one point in time and a directory at another.
TODO(nodir): introduce a decay function to remove old nodes/edges.
func Load ¶
Load returns a file graph for a git repository. Caches the graph under the .git directory. May take minutes and log progress if the cache is cold.
If the cache exists, but no longer matches the current ref commit, then applies new changes to the loaded graph and updates the cache.
func (*Graph) Node ¶
Node returns a node by its name. Returns nil if the node is not found. See also Node.Name().
Idempotent: calling many times with the same name returns the same Node object.
func (*Graph) Update ¶
Update updates the graph based on changes in a git repository. This is the only way to mutate the Graph. Applies all changes reachable from rev, but not from g.Commit, and updates g.Commit.
If Update returns an error that was not returned by the callback, the graph may be corrupted.
func (*Graph) Write ¶
Write writes the graph to w. It is the opposite of (*Graph).Read().
Spec:
    graph = header version git-commit-hash root total-number-of-edges root-edges
    header = 54
    version = 0
    root = node
    node = prob-sum-denominator number-of-children children-sorted-by-base-name
    children-sorted-by-base-name = child*
    child = base-name node
    root-edges = node-edges
    node-edges = number-of-edges edge*
    edge = index-of-the-adjacent-node-as-found-in-the-file prob-sum edges-of-children-sorted-by-base-name
    edges-of-children-sorted-by-base-name = edge*

    where
      all integer types are encoded as varint
      all strings are encoded as length-prefixed utf8
      `*` means "0 or more"
type LoadOptions ¶
type LoadOptions struct {
    UpdateOptions

    // Ref is the git ref to load the graph for.
    // Defaults to refs/heads/main.
    //
    // If it is refs/heads/main, but it does not exist, then falls back to
    // refs/heads/master.
    Ref string
}
LoadOptions are options for Load() function.
type UpdateOptions ¶
type UpdateOptions struct {
    // Callback, if not nil, is called each time after each commit is processed
    // and Graph.Commit is updated.
    Callback func() error

    // MaxCommitSize is the maximum number of files touched by a commit.
    // Commits that exceed this limit are ignored.
    // The rationale is that large commits provide a weak signal of file
    // relatedness and are expensive to process, O(N^2).
    MaxCommitSize int
}
UpdateOptions are options for Graph.Update().
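The O(N^2) cost mentioned for MaxCommitSize follows from every file in a commit being related to every other file in it. A rough sketch of that growth, assuming one update per ordered pair of distinct files (pairUpdates is a hypothetical helper, not part of this package):

```go
package main

import "fmt"

// pairUpdates returns the number of ordered (x, y) pairs of distinct files
// in a commit that touches n files. Commits with fewer than 2 files
// contribute nothing.
func pairUpdates(n int) int {
	if n < 2 {
		return 0
	}
	return n * (n - 1)
}

func main() {
	for _, n := range []int{2, 10, 100, 1000} {
		fmt.Printf("%4d files -> %d pair updates\n", n, pairUpdates(n))
	}
}
```

Under this assumption a 1000-file commit costs about ten thousand times more than a 10-file one, while each of its edges carries a much weaker signal.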