# Git Provider Receiver
The Git Provider receiver scrapes data from Git vendors. As a starting point, it can infer many of the same core Git metrics across vendors, while also being able to receive additional vendor-specific data. The current default set of metrics common across all vendors can be found in `documentation.md`. These default metrics can be used as leading indicators to the DORA metrics, helping provide insight into modern-day engineering practices.
## GitHub Metrics
The current metrics available via scraping from GitHub are:
- Repository count
- Repository branch time
- Repository branch count
- Repository contributor count
- Repository pull request open time
- Repository pull request time to merge
- Repository pull request time to approval
- Repository pull request count (stores a `pull_request_state` attribute equal to `open` or `merged`)
Note: Some metrics are disabled by default and must be explicitly enabled. For example, the repository contributor count metric is disabled because it relies on the REST API, which is subject to lower rate limits.
## GitLab Metrics
The current metrics available via scraping from GitLab are:
- Repository count
- Repository branch time
- Repository branch count
- Repository contributor count
- Repository pull request time
- Repository pull request merge time
- Repository pull request approval time
- Repository pull request deployment time
## Getting Started
The collection interval is common to all scrapers and is set to 30 seconds by default.
Note: Generally speaking, if the vendor allows for anonymous API calls, then you
won't have to configure any authentication, but you may only see public repositories
and organizations.
```yaml
gitprovider:
  collection_interval: <duration> # default = 30s
  scrapers:
    <scraper1>:
    <scraper2>:
    ...
```
A more complete example using the GitHub & GitLab scrapers with authentication
is as follows:
```yaml
extensions:
  basicauth/github:
    client_auth:
      username: ${env:GH_USER}
      password: ${env:GH_PAT}
  bearertokenauth/gitlab:
    token: ${env:GITLAB_PAT}

receivers:
  gitprovider:
    initial_delay: 1s
    collection_interval: 60s
    scrapers:
      github:
        metrics:
          git.repository.contributor.count:
            enabled: true
        github_org: myfancyorg
        # optional query override, defaults to "{org,user}:<github_org>"
        search_query: "org:myfancyorg topic:o11yalltheway"
        endpoint: "https://selfmanagedenterpriseserver.com"
        auth:
          authenticator: basicauth/github

service:
  extensions: [basicauth/github, bearertokenauth/gitlab]
  pipelines:
    metrics:
      receivers: [..., gitprovider]
      processors: []
      exporters: [...]
```
This receiver is developed upstream in the liatrio-otel-collector distribution, where a quick start with an example config is available.
The available scrapers are:

| Scraper  | Description             |
|----------|-------------------------|
| [github] | Git metrics from GitHub |
| [gitlab] | Git metrics from GitLab |
## Rate Limiting
Because this receiver scrapes data from Git providers, it is subject to rate limiting. The following sections give sensible defaults for each Git provider.
### GitHub
The GitHub scraper within this receiver primarily interacts with GitHub's GraphQL API. The default rate limit for the GraphQL API is 5,000 points per hour (or 10,000 if your PAT is associated with a GitHub Enterprise Cloud organization). On average, the receiver costs 4 points per repository, allowing it to scrape up to 1,250 repositories per hour under normal conditions.
Given this average cost, a good collection interval in seconds is:

$$
\text{collection\_interval (seconds)} = \frac{4n}{r/3600} + 300
$$

$$
\begin{aligned}
\text{where:} \\
n &= \text{number of repositories} \\
r &= \text{hourly rate limit}
\end{aligned}
$$

$r$ is likely 5,000, but there are factors that can change this; for more information see GitHub's docs. The $300$ is a buffer to account for this being a rough estimate and for the initial query that retrieves the list of repositories.
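As a quick sanity check, the formula above can be computed directly. This is a small illustrative sketch (not part of the receiver); the function name is our own:

```python
def collection_interval_seconds(n: int, r: int = 5000) -> float:
    """Estimate a safe collection_interval per the formula above.

    n: number of repositories scraped
    r: hourly GraphQL point limit (5,000 default; 10,000 for Enterprise Cloud)
    """
    # 4n points per scrape, replenished at r/3600 points per second,
    # plus a 300-second buffer for the initial repository-listing query.
    # Algebraically identical to 4n / (r/3600) + 300.
    return (4 * n) * 3600 / r + 300

# e.g. 200 repositories against the default 5,000-point limit:
print(collection_interval_seconds(200))  # 876.0 seconds (~15 minutes)
```

So a fleet of 200 repositories under the default limit suggests a collection interval of roughly 15 minutes.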
In addition to these primary rate limits, GitHub enforces secondary rate limits to prevent abuse and maintain API availability. The following secondary limit is particularly relevant:

- Concurrent Requests Limit: The API allows no more than 100 concurrent requests. This limit is shared across the REST and GraphQL APIs. Since the scraper creates a goroutine per repository, having more than 100 repositories returned by the `search_query` will result in exceeding this limit.

It is recommended to use the `search_query` config option to limit the number of repositories that are scraped. We recommend one instance of the receiver per team (note: `team` is not a valid qualifier when searching repositories; `topic` is). As a reminder, each instance of the receiver should have its own corresponding token for authentication, since rate limits are tied to the token.
In summary, we recommend the following:

- One instance of the receiver per team.
- Each instance of the receiver should have its own token.
- Leverage the `search_query` config option to limit the repositories returned to 100 or fewer per instance.
- `collection_interval` should be long enough to avoid rate limiting (see the formula above); recall these are lagging indicators, so a longer interval is acceptable.
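The per-team pattern above can be sketched as two named receiver instances, each scoped by a repository topic. This is an illustrative fragment: the topic names (`team-a`, `team-b`) and org are hypothetical, and each instance would be paired with its own authentication token as described above.

```yaml
receivers:
  # One receiver instance per team; scope each with a topic-based
  # search_query so no instance returns more than ~100 repositories.
  gitprovider/team-a:
    collection_interval: 15m
    scrapers:
      github:
        github_org: myfancyorg
        search_query: "org:myfancyorg topic:team-a"
  gitprovider/team-b:
    collection_interval: 15m
    scrapers:
      github:
        github_org: myfancyorg
        search_query: "org:myfancyorg topic:team-b"
```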
## Updating tests
After using `make gen`, you may find your tests failing. This could be due to `expected_happy_path.yaml` missing some of the changes from your code, or being out of order.

You can resolve this manually by updating the file, or by regenerating it: uncomment the lines starting with `//golden.WriteMetrics` in your test files and rerun the unit tests. Then comment the lines out again and commit the new changes.