gitstats

v0.0.0-...-3037a55
Published: Apr 10, 2024 License: EUPL-1.2 Imports: 21 Imported by: 0

README

git-stats displays aggregate statistics for git repos.

There's a public instance at https://gitstats.arp242.net, with various popular repos and some things I found interesting, updated semi-regularly. There isn't really any way to add something to that (yet), partly because this isn't really finished yet, and partly because importing a repo can be relatively slow so I need to build something to manage that a bit better. But if you ask nicely I can add something.

Note there are MANY inaccuracies with all of this:

  • A "commit" can mean different things in different projects; I've seen people merge small bugfixes with 20 commits, and I've seen people do a huge feature in a single commit.

  • "Number of commits" doesn't tell the full story; Linus Torvalds is ranked #57 for Linux and he's not even the top Linus – Linus Walleij is ranked #22 with 4,338 commits (vs. Torvalds' 2,586).

  • Domains may be misleading; for example the top two committers for ElasticSearch work for Elastic, but neither uses an @elasticsearch.com email address.

  • Committing code isn't the only way to contribute to a project.

  • Some commits aren't code; for example README typo fixes, i18n updates, and things like that. For many projects this is a negligible amount of commits, but for some it's a large number. It filters some of these out but there is no guarantee it filters everything.

  • People can use multiple accounts to commit to a project. It tries to merge as much as reasonably safe, but this is not 100%. There is also a small chance of a false positive in cases where two committers have exactly the same name (projects where two notable committers have exactly identical names are probably very rare though).

  • People's affiliations are not fixed. There are people who committed to Go before they were employed at Google, while they were employed at Google, and after they were employed at Google. Or any combination of the above.

  • Some projects use git "incorrectly" and don't record authorship information in the Author header. For example PostgreSQL, Vim, NeoVim, bash, probably more.

But all of that said, I still feel it's useful. Linux is not the average project; there tends to be a large amount of overlap between the top contributors and the people doing reviews, support, and the like, and the rest can be solved by actually looking at the data and using your brain before uncritically accepting any numbers here.

As an aside, GitHub's graph isn't accurate as it only shows commits that can be linked to a GitHub account and excludes everything else. It's not too uncommon for people to commit with an email address not associated with their account, have their accounts deleted, or things like that. This can be fixed with a .mailmap file, but many don't bother.

A good example is mpv: compare with git-stats, where a major mpv author deleted their GitHub account, and many of the MPlayer/MPlayer2 authors don't have a GitHub account. The "top" author on GitHub is actually #11 (and the margin is huge). The chart also starts in 2010 instead of 2001. All combined it gives a completely misleading picture.

Also this codebase isn't super great; I quickly wrote much of this about 4 years ago to prove a point about something, and I quickly hacked up some more stuff to look at the contributors for the recent Redis license change. There's tons of obvious things that can be improved, made less ugly, etc.

Installation

Install from source with:

% go install zgo.at/git-stats@latest

Or just use go build (or install) after a git clone.

Other than a PostgreSQL database, there are no dependencies.

Usage

You will need a PostgreSQL database, as getting the data out of git is too slow for many repos, and this drastically reduces storage requirements.

You can also use git-cache on local repositories, or it can automatically clone remote repositories to a cache directory.

The git repo is stored in a cache directory (default: /tmp/git-stats). This is /tmp by default as it's ephemeral: you don't need to keep the cache around; the last commit will be recorded, and on an update it will fetch a shallow clone from there.

Insert data in the database:

% git-stats update https://github.com/golang/go

Or a local repo:

% git-stats update ~/code/my-git-repo

There are two interfaces: the CLI and web UI. Start the web UI with:

% git-stats serve

And most of the interface should be self-explanatory.

You can mix the web and CLI usage: both are backed by the same database.

CLI usage

You can get stats with e.g.:

% git-stats author https://github.com/golang/go

Or with the short name:

% git-stats author go

The authors command is the main meat, as this was my primary interest, but there are also some others:

% git-stats ls go
% git-stats activity go

The web interface has a few more features though, basically just out of laziness.

Alternatives

Documentation

Index

Constants

This section is empty.

Variables

var DBFiles embed.FS
var TplFiles embed.FS

Functions

func CountCommits

func CountCommits(ctx context.Context, repoID int32, rng ztime.Range) (int, error)

func FindOrInsertAuthor

func FindOrInsertAuthor(ctx context.Context, c Commit, names, emails map[string]int64) (int64, error)

Find author by name or email, or insert if it doesn't exist.

TODO: treat as case-insensitive, now we have:

Eric Anholt <anholt@freebsd.org>, <anholt@FreeBSD.org>
walter harms, Walter Harms <wharms@bfs.de>

And that's not too useful.
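
The TODO above could be approached by normalizing names and emails before they are used as lookup keys. The following is only a minimal sketch of that idea, not the package's implementation: the authorIndex type and the normalizeKey and findOrInsert helpers are hypothetical, and it works on in-memory maps rather than the database.

```go
package main

import (
	"fmt"
	"strings"
)

// authorIndex is a hypothetical in-memory lookup keyed on normalized names
// and emails; it is not the package's actual data structure.
type authorIndex struct {
	names  map[string]int64
	emails map[string]int64
	nextID int64
}

func newAuthorIndex() *authorIndex {
	return &authorIndex{names: map[string]int64{}, emails: map[string]int64{}}
}

// normalizeKey trims surrounding whitespace and lower-cases s, so that
// "anholt@FreeBSD.org" and "anholt@freebsd.org" produce the same key.
func normalizeKey(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// findOrInsert returns the ID already recorded for the email or name,
// or assigns a new ID when neither is known. Empty keys are ignored so
// that commits without a name or email are not merged with each other.
func (idx *authorIndex) findOrInsert(name, email string) int64 {
	n, e := normalizeKey(name), normalizeKey(email)

	var (
		id int64
		ok bool
	)
	if e != "" {
		id, ok = idx.emails[e]
	}
	if !ok && n != "" {
		id, ok = idx.names[n]
	}
	if !ok {
		idx.nextID++
		id = idx.nextID
	}

	if n != "" {
		idx.names[n] = id
	}
	if e != "" {
		idx.emails[e] = id
	}
	return id
}

func main() {
	idx := newAuthorIndex()
	a := idx.findOrInsert("Eric Anholt", "anholt@freebsd.org")
	b := idx.findOrInsert("Eric Anholt", "anholt@FreeBSD.org")
	c := idx.findOrInsert("walter harms", "wharms@bfs.de")
	d := idx.findOrInsert("Walter Harms", "wharms@bfs.de")
	fmt.Println(a == b, c == d, a == c) // true true false
}
```

Note that strings.ToLower is not full Unicode case folding, and matching on name alone carries the same false-positive risk the README mentions for committers with identical names.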

func FindOrInsertFile

func FindOrInsertFile(ctx context.Context, c Commit, files map[string]int64) ([]int64, Ints, Ints, error)

Types

type Author

type Author struct {
	ID     int64   `db:"author_id,id"`
	RepoID int32   `db:"repo_id"`
	Names  Strings `db:"names"`
	Emails Strings `db:"emails"`

	Commits int    `db:"-"`
	Added   int    `db:"-"`
	Removed int    `db:"-"`
	First   string `db:"-"`
	Last    string `db:"-"`
}

func (*Author) ByID

func (a *Author) ByID(ctx context.Context, repoID int32, authorID int64) error

type AuthorStat

type AuthorStat struct {
	AuthorID    int       `db:"author_id"`
	Commits     int       `db:"commits"`
	Added       int       `db:"added"`
	Removed     int       `db:"removed"`
	CommitPerc  float32   `db:"commit_perc"`
	AddedPerc   float32   `db:"added_perc"`
	RemovedPerc float32   `db:"removed_perc"`
	Names       Strings   `db:"names"`
	Emails      Strings   `db:"emails"`
	First       time.Time `db:"first"`
	Last        time.Time `db:"last"`

	Domains []string `db:"-"`
}

func (*AuthorStat) ByID

func (s *AuthorStat) ByID(ctx context.Context, repoID int32, authorID int64, rng ztime.Range) error

type AuthorStats

type AuthorStats []AuthorStat

func (AuthorStats) Domains

func (s AuthorStats) Domains() []Domain
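
The documentation does not say how Domains derives its result. As a rough illustration of the general idea (counting the domain part of email addresses), here is a sketch under assumptions: countDomains is a hypothetical helper, the Domain struct mirrors the one declared in this package, and whether the real method counts per email or per author is not known from this page.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Domain mirrors the struct of the same name in this package.
type Domain struct {
	Domain string
	Count  int
}

// countDomains tallies the lower-cased domain part of each address and
// returns the result sorted by count (descending), then by domain name.
// Addresses without an "@" or with nothing after it are skipped.
func countDomains(emails []string) []Domain {
	counts := make(map[string]int)
	for _, e := range emails {
		i := strings.LastIndexByte(e, '@')
		if i < 0 || i == len(e)-1 {
			continue
		}
		counts[strings.ToLower(e[i+1:])]++
	}

	out := make([]Domain, 0, len(counts))
	for d, n := range counts {
		out = append(out, Domain{Domain: d, Count: n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Domain < out[j].Domain
	})
	return out
}

func main() {
	for _, d := range countDomains([]string{
		"anholt@freebsd.org",
		"anholt@FreeBSD.org",
		"wharms@bfs.de",
		"not-an-email",
	}) {
		fmt.Println(d.Domain, d.Count)
	}
}
```

As the README notes, a domain count like this says little about affiliation: people commit from personal addresses and change employers.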

func (*AuthorStats) List

func (s *AuthorStats) List(ctx context.Context, repoID int32, order string, rng ztime.Range) error

type Authors

type Authors []Author

func (*Authors) List

func (a *Authors) List(ctx context.Context, repoID int32) error

type Commit

type Commit struct {
	RepoID     int32        `db:"repo_id"`
	Hash       Hash         `db:"hash,id"`
	Date       time.Time    `db:"date"`
	AuthorID   int          `db:"author_id"`
	Exclude    sql.NullBool `db:"exclude"`
	Files      Strings      `db:"files"`
	Added      Ints         `db:"added"`
	Removed    Ints         `db:"removed"`
	AddedSum   int          `db:"added_sum"`
	RemovedSum int          `db:"removed_sum"`
	Subject    string       `db:"subject"`

	Email    string       `db:"-"`
	Name     string       `db:"-"`
	UpdFiles []CommitFile `db:"files"`
}

func (*Commit) ShouldExclude

func (c *Commit) ShouldExclude() bool

Also: toml 782628a7 (gofmt -s)

type CommitFile

type CommitFile struct {
	Path, Added, Removed string
	Exclude              sql.NullBool
}

func (*CommitFile) ShouldExclude

func (c *CommitFile) ShouldExclude() bool

type CommitStat

type CommitStat struct {
	Date    time.Time `db:"date" json:"date"`
	Commits int       `db:"commits" json:"commits"`
	Added   int       `db:"added" json:"added"`
	Removed int       `db:"removed" json:"removed"`
}

type CommitStats

type CommitStats []CommitStat

func (*CommitStats) List

func (s *CommitStats) List(ctx context.Context, repoID int32, rng ztime.Range, authorID int64, groupMonth bool) error

type Commits

type Commits []Commit

func (*Commits) ByAuthor

func (c *Commits) ByAuthor(ctx context.Context, repoID int32, authorID int64, rng ztime.Range) error

type Domain

type Domain struct {
	Domain string
	Count  int
}

type Event

type Event struct {
	EventID int32     `db:"event_id,id" json:"-"`
	RepoID  int32     `db:"repo_id" json:"-"`
	Name    string    `db:"name" json:"name"`
	Date    time.Time `db:"date" json:"date"`
	Kind    EventKind `db:"kind" json:"kind"`
}

func (*Event) Find

func (e *Event) Find(ctx context.Context) error

func (*Event) Insert

func (e *Event) Insert(ctx context.Context) error

type EventKind

type EventKind byte

func (EventKind) String

func (e EventKind) String() string

type Events

type Events []Event

func (*Events) List

func (t *Events) List(ctx context.Context, repoID int32) error

type File

type File struct {
	ID      int64  `db:"file_id,id"`
	RepoID  int    `db:"repo_id"`
	Path    string `db:"path"`
	Exclude bool   `db:"exclude"`
}

type FileStat

type FileStat []struct {
	ID         int    `db:"file_id"`
	Path       string `db:"path"`
	NumCommits string `db:"num_commits"`
}

func (*FileStat) List

func (s *FileStat) List(ctx context.Context, repoID int32) error

TODO:

positional: "/foo/bar%"
-depth      group stuff by max depth
Add authors info

type Files

type Files []File

func (*Files) List

func (f *Files) List(ctx context.Context, repoID int32) error

func (*Files) Map

func (f *Files) Map() map[string]int64

Map returns a path → file_id map.
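
A path → ID map of this shape has the same type as the files parameter of FindOrInsertFile. As a sketch of what building such a map involves, assuming nothing about the real implementation, the file and files types and the pathMap method below are hypothetical simplifications:

```go
package main

import "fmt"

// file and files are simplified, hypothetical stand-ins for the
// package's File and Files types.
type file struct {
	ID   int64
	Path string
}

type files []file

// pathMap returns a path → ID lookup. If two entries share a path,
// the later one wins.
func (f files) pathMap() map[string]int64 {
	m := make(map[string]int64, len(f))
	for _, ff := range f {
		m[ff.Path] = ff.ID
	}
	return m
}

func main() {
	fs := files{{ID: 1, Path: "README.md"}, {ID: 2, Path: "cmd/main.go"}}
	m := fs.pathMap()

	id, ok := m["cmd/main.go"]
	fmt.Println(id, ok) // 2 true

	_, ok = m["missing.go"]
	fmt.Println(ok) // false
}
```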

type Hash

type Hash [20]byte

func NewHash

func NewHash(s string) Hash

func (Hash) Link

func (h Hash) Link(repoURL string) template.HTML

func (*Hash) Scan

func (h *Hash) Scan(v any) error

func (Hash) Short

func (h Hash) Short() string

func (Hash) String

func (h Hash) String() string

func (Hash) URL

func (h Hash) URL(repoURL string) template.URL

func (Hash) Value

func (h Hash) Value() (driver.Value, error)
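
Hash is a 20-byte array, which is the size of a SHA-1 commit ID. The following sketch shows one way such a type can be converted to and from the 40-character hex form git prints. It is an illustration only: the hash type, parseHash, and short are hypothetical, parseHash returns an error whereas the documented NewHash does not, and the 7-character abbreviation length is an assumption.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// hash is a hypothetical stand-in for the package's Hash type.
type hash [20]byte

// parseHash decodes a 40-character hex string into a hash. It returns an
// error if the input has the wrong length or contains non-hex characters.
func parseHash(s string) (hash, error) {
	var h hash
	if want := hex.EncodedLen(len(h)); len(s) != want {
		return h, fmt.Errorf("parseHash: want %d hex characters, got %d", want, len(s))
	}
	if _, err := hex.Decode(h[:], []byte(s)); err != nil {
		return hash{}, fmt.Errorf("parseHash: %w", err)
	}
	return h, nil
}

// String returns the full lower-case hex form.
func (h hash) String() string { return hex.EncodeToString(h[:]) }

// short returns the first 7 hex characters (an assumed abbreviation length).
func (h hash) short() string { return h.String()[:7] }

func main() {
	h, err := parseHash("da39a3ee5e6b4b0d3255bfef95601890afd80709")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(h)         // da39a3ee5e6b4b0d3255bfef95601890afd80709
	fmt.Println(h.short()) // da39a3e
}
```

Storing the hash as raw bytes rather than hex text halves its size, which fits the README's remark about reducing storage requirements, though the page does not state that this is the reason for the design.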

type Ints

type Ints []int64

func (*Ints) Scan

func (l *Ints) Scan(v any) error

func (Ints) String

func (l Ints) String() string

func (Ints) Sum

func (l Ints) Sum() int64

func (Ints) Value

func (l Ints) Value() (driver.Value, error)

type Repo

type Repo struct {
	ID            int32      `db:"repo_id,id"`
	Path          string     `db:"path"`
	Name          string     `db:"name"`
	FirstCommit   *Hash      `db:"first_commit"`
	LastCommit    *Hash      `db:"last_commit"`
	FirstCommitAt *time.Time `db:"first_commit_at"`
	LastCommitAt  *time.Time `db:"last_commit_at"`

	Commits int `db:"commits,noinsert"`
}

func (*Repo) ByName

func (r *Repo) ByName(ctx context.Context, name string) error

func (*Repo) ByPath

func (r *Repo) ByPath(ctx context.Context, path string) error

func (*Repo) Find

func (r *Repo) Find(ctx context.Context, name string) error

func (*Repo) Insert

func (r *Repo) Insert(ctx context.Context) error

func (Repo) Remote

func (r Repo) Remote() bool

func (*Repo) Update

func (r *Repo) Update(ctx context.Context) error

type Repos

type Repos []Repo

func (*Repos) List

func (s *Repos) List(ctx context.Context) error

type Strings

type Strings []string

func (Strings) Join

func (l Strings) Join(sep string) string

func (*Strings) Scan

func (l *Strings) Scan(v any) error

func (Strings) String

func (l Strings) String() string

func (Strings) Value

func (l Strings) Value() (driver.Value, error)

Directories

Path Synopsis
cmd
