gitproviderreceiver

package module
v0.0.0-...-7513c8a
Published: Apr 12, 2024 · License: Apache-2.0 · Imports: 12 · Imported by: 0

README

Git Provider Receiver

Status

Stability: development (metrics)
Distributions: liatrio

The Git Provider receiver scrapes data from Git vendors.

As a starting point, this receiver can infer many of the same core Git metrics across vendors, while also receiving additional vendor-specific data.

The current default set of metrics common across all vendors can be found in documentation.md.

These default metrics can be used as leading indicators to the DORA metrics, helping provide insight into modern-day engineering practices.

GitHub Metrics

The current metrics available via scraping from GitHub are:

  • Repository count
  • Repository branch time
  • Repository branch count
  • Repository contributor count
  • Repository pull request open time
  • Repository pull request time to merge
  • Repository pull request time to approval
  • Repository pull request count (stores a pull_request_state attribute equal to open or merged)

Note: Some metrics are disabled by default and must be explicitly enabled; the repository contributor count metric is one example. It relies on the REST API, which is subject to lower rate limits.
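As a sketch, enabling an opt-in metric is done under the scraper's metrics block (the org name below is hypothetical):

```yaml
receivers:
    gitprovider:
        scrapers:
            github:
                github_org: myfancyorg   # hypothetical org
                metrics:
                    # opt-in: uses the REST API, which has lower rate limits
                    git.repository.contributor.count:
                        enabled: true
```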

GitLab Metrics

The current metrics available via scraping from GitLab are:

  • Repository count
  • Repository branch time
  • Repository branch count
  • Repository contributor count
  • Repository pull request time
  • Repository pull request merge time
  • Repository pull request approval time
  • Repository pull request deployment time

Getting Started

The collection interval is common to all scrapers and is set to 30 seconds by default.

Note: Generally speaking, if the vendor allows for anonymous API calls, then you won't have to configure any authentication, but you may only see public repositories and organizations.

gitprovider:
    collection_interval: <duration> #default = 30s
    scrapers:
        <scraper1>:
        <scraper2>:
        ...

A more complete example using the GitHub & GitLab scrapers with authentication is as follows:

extensions:
    basicauth/github:
        client_auth:
            username: ${env:GH_USER}
            password: ${env:GH_PAT}
    bearertokenauth/gitlab:
        token: ${env:GITLAB_PAT}

receivers:
    gitprovider:
        initial_delay: 1s
        collection_interval: 60s
        scrapers:
            github:
                metrics:
                    git.repository.contributor.count:
                        enabled: true
                github_org: myfancyorg
                #optional query override, defaults to "{org,user}:<github_org>"
                search_query: "org:myfancyorg topic:o11yalltheway"
                endpoint: "https://selfmanagedenterpriseserver.com"
                auth:
                    authenticator: basicauth/github
service:
    extensions: [basicauth/github, bearertokenauth/gitlab]
    pipelines:
        metrics:
            receivers: [..., gitprovider]
            processors: []
            exporters: [...]

This receiver is developed upstream in the liatrio-otel-collector distribution, where a quick start with an example config is available.

The available scrapers are:

Scraper   Description
github    Git metrics from GitHub
gitlab    Git metrics from GitLab

Rate Limiting

Because this receiver scrapes data from Git providers, it is subject to rate limiting. The following sections give some sensible defaults for each provider.

GitHub

The GitHub scraper within this receiver primarily interacts with GitHub's GraphQL API. The default rate limit for the GraphQL API is 5,000 points per hour (or 10,000 if your PAT is associated with a GitHub Enterprise Cloud organization). The receiver costs on average 4 points per repository, allowing it to scrape up to 1,250 repositories per hour under normal conditions.

Given this average cost, a good collection interval in seconds is:

$$\text{collection\_interval (seconds)} = \frac{4n}{r/3600} + 300$$

where $n$ is the number of repositories and $r$ is the hourly rate limit.

$r$ is likely 5,000, but there are factors that can change this; for more information see GitHub's docs. The $300$ is a buffer that accounts for this being a rough estimate and for the initial query that fetches the repositories.
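As a rough illustration of the formula (not part of the receiver itself), the interval for a hypothetical org can be computed like so:

```go
package main

import "fmt"

// collectionInterval estimates a safe scrape interval in seconds per the
// formula above: 4 points per repository divided by the per-second rate
// limit, plus a 300-second buffer.
func collectionInterval(repos, hourlyRateLimit int) float64 {
	return (4.0*float64(repos))/(float64(hourlyRateLimit)/3600.0) + 300.0
}

func main() {
	// Hypothetical org with 1000 repositories and the default 5,000-point limit.
	fmt.Printf("%.0f seconds\n", collectionInterval(1000, 5000)) // prints: 3180 seconds
}
```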

In addition to these primary rate limits, GitHub enforces secondary rate limits to prevent abuse and maintain API availability. The following secondary limit is particularly relevant:

  • Concurrent Requests Limit: The API allows no more than 100 concurrent requests. This limit is shared across the REST and GraphQL APIs. Since the scraper creates a goroutine per repository, having more than 100 repositories returned by the search_query will result in exceeding this limit.

It is recommended to use the search_query config option to limit the number of repositories that are scraped. We recommend one instance of the receiver per team (note: team is not a valid qualifier when searching repositories, but topic is). Remember that each instance of the receiver should have its own token for authentication, since rate limits are tied to the token.

In summary, we recommend the following:

  • One instance of the receiver per team
  • Each instance of the receiver should have its own token
  • Leverage the search_query config option to limit repositories returned to 100 or fewer per instance
  • collection_interval should be long enough to avoid rate limiting (see the formula above); recall these are lagging indicators, so a longer interval is acceptable
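A sketch of what the per-team layout above might look like, assuming hypothetical team topics, token environment variables, and org name:

```yaml
extensions:
    basicauth/team-a:
        client_auth:
            username: ${env:TEAM_A_GH_USER}
            password: ${env:TEAM_A_GH_PAT}   # team A's own PAT
    basicauth/team-b:
        client_auth:
            username: ${env:TEAM_B_GH_USER}
            password: ${env:TEAM_B_GH_PAT}   # team B's own PAT

receivers:
    gitprovider/team-a:
        collection_interval: 3600s
        scrapers:
            github:
                github_org: myfancyorg
                # topic scoping keeps the result set at 100 repos or fewer
                search_query: "org:myfancyorg topic:team-a"
                auth:
                    authenticator: basicauth/team-a
    gitprovider/team-b:
        collection_interval: 3600s
        scrapers:
            github:
                github_org: myfancyorg
                search_query: "org:myfancyorg topic:team-b"
                auth:
                    authenticator: basicauth/team-b
```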


Updating tests

After running make gen you may find your tests failing. This could be because expected_happy_path.yaml is missing some of the changes from your code, or is out of order.

You can resolve this manually by updating the file, or regenerate it by uncommenting the lines starting with //golden.WriteMetrics in your test files and rerunning the unit tests. Then comment the lines out again and commit the new changes.
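Assuming you are working from the root of the liatrio-otel-collector repo, the workflow might look like:

```shell
# Regenerate code after changing the emitted metrics.
make gen

# Uncomment the //golden.WriteMetrics(...) lines in the scraper tests,
# then run the unit tests to rewrite expected_happy_path.yaml:
go test ./...

# Re-comment those lines, confirm the tests pass, and commit the result.
go test ./...
```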

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewFactory

func NewFactory() receiver.Factory

NewFactory creates a factory for the git provider receiver

Types

type Config

type Config struct {
	scraperhelper.ControllerConfig `mapstructure:",squash"`
	Scrapers                       map[string]internal.Config `mapstructure:"scrapers"`
	metadata.MetricsBuilderConfig  `mapstructure:",squash"`
}

Config that is exposed to this github receiver through the OTEL config.yaml

func (*Config) Unmarshal

func (cfg *Config) Unmarshal(componentParser *confmap.Conf) error

Unmarshal a config.Parser into the config struct.

func (*Config) Validate

func (cfg *Config) Validate() error

Validate the configuration passed through the OTEL config.yaml
