Constants
This section is empty.
Variables
This section is empty.
Functions
Types
type FileHistory
type FileHistory struct {
	Path          string
	CommitCount   int
	FirstCommitAt int64
	LastCommitAt  int64
	AuthorCount   int
	LastAuthor    string
	LastSubject   string
}
FileHistory holds git commit statistics for a single file.
func CollectHistories (added in v0.3.0)
func CollectHistories(gitRoot string, relPaths []string, workers int) []FileHistory
CollectHistories collects a FileHistory for every relPath under gitRoot, fanning the per-file `git log --follow` forks across up to `workers` goroutines. On a large versioned corpus this is the dominant index cost: each fork runs CPU-bound `--follow` rename detection in a child process, so a serial loop pegs a single core while the rest sit idle. Measured on a 64k-commit vscode clone with 383 doc files on 14 cores: ~304s wall time serially vs ~32s with workers=NumCPU, a 9.4× speedup, with file_history rows bit-identical to the serial run.
CollectHistory is a pure per-file function with no shared state, so each goroutine writes its own disjoint results slot and needs no mutex or batcher (unlike similarity.runPairwiseWorkers, whose edge writes must funnel through one SQLite writer). Rows are returned in the same order as relPaths, so callers can keep their serial UpsertFileHistory loop unchanged. If workers <= 1, or relPaths is empty or holds a single path, the work runs serially; otherwise workers is clamped to len(relPaths).
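The disjoint-slot pattern described above can be illustrated with a minimal sketch. It is not the package's actual implementation: `fanOutOrdered` and its callback signature are hypothetical names chosen for the example, and the per-file work is replaced by a trivial function so the block runs without git.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOutOrdered is a hypothetical helper, not part of the documented package.
// It applies fn to every input using up to `workers` goroutines. Each
// goroutine writes only results[i] for the indices it receives, so the slots
// are disjoint, no mutex is needed, and output order matches input order.
func fanOutOrdered(inputs []string, workers int, fn func(string) int) []int {
	results := make([]int, len(inputs))

	// Clamp the worker count to the amount of work available.
	if workers > len(inputs) {
		workers = len(inputs)
	}
	// Serial path: covers workers <= 1, an empty list, and a single input.
	if workers <= 1 {
		for i, in := range inputs {
			results[i] = fn(in)
		}
		return results
	}

	idx := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range idx {
				results[i] = fn(inputs[i]) // each index is delivered exactly once
			}
		}()
	}
	for i := range inputs {
		idx <- i
	}
	close(idx)
	wg.Wait()
	return results
}

func main() {
	paths := []string{"a.md", "docs/bb.md", "ccc.md"}
	lens := fanOutOrdered(paths, 4, func(p string) int { return len(p) })
	fmt.Println(lens) // prints [4 10 6]
}
```

Because the result slice is indexed by input position, a caller that previously looped serially over the results needs no changes.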
Forks are bounded globally: every git child this function spawns is gated by the package-level forkSem (capacity NumCPU), so the total number of concurrent `git log` children stays ≤ NumCPU even when multiple callers fan out at once. Both call sites rely on this: the single-store index flush (one CollectHistories call at workers=NumCPU) and the multi-project workspace IndexAll (one CollectHistories call per project, with NumCPU projects in flight). The per-project `workers` value can therefore be NumCPU regardless of how many projects run concurrently. It only sets how wide each project *requests*, never how many forks actually run; forkSem decides that. Goroutines blocked waiting on the budget are cheap, and at most NumCPU forks are ever live.
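A package-level budget of this kind is commonly built from a buffered channel used as a counting semaphore. The sketch below assumes that approach; the source does not say how forkSem is implemented, and `forkBudget`, `withForkBudget`, and `peakConcurrency` are hypothetical names. The forked `git log` child is replaced by a counter so the bound can be observed directly.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// forkBudget is a hypothetical stand-in for a package-level fork semaphore:
// a buffered channel whose capacity caps how many holders run at once.
var forkBudget = make(chan struct{}, runtime.NumCPU())

// withForkBudget blocks until a slot is free, runs fn, then releases the slot.
func withForkBudget(fn func()) {
	forkBudget <- struct{}{}
	defer func() { <-forkBudget }()
	fn()
}

// peakConcurrency simulates `callers` independent fan-outs of `perCaller`
// goroutines each and returns the highest number of gated bodies that were
// ever running at the same time.
func peakConcurrency(callers, perCaller int) int64 {
	var live, peak int64
	var wg sync.WaitGroup
	for i := 0; i < callers*perCaller; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			withForkBudget(func() {
				n := atomic.AddInt64(&live, 1)
				// Record a new maximum if this goroutine observed one.
				for {
					p := atomic.LoadInt64(&peak)
					if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
						break
					}
				}
				runtime.Gosched() // stand-in for the real child process
				atomic.AddInt64(&live, -1)
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	// Many more goroutines than slots: requests are wide, live work is capped.
	peak := peakConcurrency(runtime.NumCPU(), runtime.NumCPU())
	fmt.Println(peak <= int64(cap(forkBudget))) // prints true
}
```

In this arrangement the requested width only determines how many goroutines queue for a slot; the channel capacity alone determines how many run.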
func CollectHistory
func CollectHistory(gitRoot, relPath string) FileHistory
CollectHistory runs git log to gather the change history for relPath within gitRoot. It returns a zero-value FileHistory (CommitCount == 0) on any error: git is not installed, the directory is not a git repository, or the file is untracked.
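The zero-value-on-error contract can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `collectHistorySketch` and `parseLog` are hypothetical names, the `--format` string is one plausible choice, and the timestamps are assumed to be Unix seconds. The FileHistory type is redeclared locally so the block is self-contained.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// FileHistory mirrors the documented struct so this sketch compiles alone.
type FileHistory struct {
	Path          string
	CommitCount   int
	FirstCommitAt int64
	LastCommitAt  int64
	AuthorCount   int
	LastAuthor    string
	LastSubject   string
}

// parseLog folds lines of "<unix-ts>\t<author>\t<subject>" (newest first, as
// git log prints them) into a FileHistory. Malformed lines are skipped; if no
// line parses, the zero value is returned.
func parseLog(relPath, out string) FileHistory {
	h := FileHistory{Path: relPath}
	authors := map[string]struct{}{}
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		parts := strings.SplitN(line, "\t", 3)
		if len(parts) != 3 {
			continue
		}
		ts, err := strconv.ParseInt(parts[0], 10, 64)
		if err != nil {
			continue
		}
		if h.CommitCount == 0 { // first parsed line is the newest commit
			h.LastAuthor, h.LastSubject = parts[1], parts[2]
			h.FirstCommitAt, h.LastCommitAt = ts, ts
		}
		if ts < h.FirstCommitAt {
			h.FirstCommitAt = ts
		}
		if ts > h.LastCommitAt {
			h.LastCommitAt = ts
		}
		authors[parts[1]] = struct{}{}
		h.CommitCount++
	}
	if h.CommitCount == 0 {
		return FileHistory{}
	}
	h.AuthorCount = len(authors)
	return h
}

// collectHistorySketch runs git log for one file and returns the zero value
// on any failure (git missing, not a repository, bad directory).
func collectHistorySketch(gitRoot, relPath string) FileHistory {
	cmd := exec.Command("git", "log", "--follow",
		"--format=%at%x09%an%x09%s", "--", relPath)
	cmd.Dir = gitRoot
	out, err := cmd.Output()
	if err != nil {
		return FileHistory{}
	}
	return parseLog(relPath, string(out))
}

func main() {
	h := parseLog("README.md", "1700000200\tBo\tFix typo\n1700000100\tAda\tAdd README\n")
	fmt.Println(h.CommitCount, h.AuthorCount, h.LastAuthor, h.FirstCommitAt)
	// prints 2 2 Bo 1700000100

	// A directory that does not exist makes the command fail to start.
	fmt.Println(collectHistorySketch("/no/such/repo", "README.md").CommitCount == 0)
	// prints true
}
```

Callers can treat `CommitCount == 0` as "no history available" without distinguishing between the underlying failure causes.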