collector

package
v0.3.16
Published: May 22, 2026 License: MPL-2.0 Imports: 20 Imported by: 0

Documentation

Constants

const (
	BehaviorWhenCommitIsAddedKeepApprovalsId = iota + 1
	BehaviorWhenCommitIsAddedRemoveCodeOwnerApprovalsId
	BehaviorWhenCommitIsAddedRemoveApprovalsId
)

Behavior when commit is added constants

const (
	BehaviorWhenCommitIsAddedKeepApprovalsText   = "Keep approvals"
	BehaviorWhenCommitIsAddedRemoveCodeOwnerText = "Remove approvals by Code Owners if their files changed"
	BehaviorWhenCommitIsAddedRemoveApprovalsText = "Remove all approvals"
)

Behavior when commit is added text values
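
The two constant groups above line up by name, so an ID can presumably be translated to its display text. The following is a minimal sketch of such a lookup; `behaviorText` is a hypothetical helper that is not part of this package, and the ID-to-text pairing is inferred from the constant names only. The constants are redeclared locally so the example compiles on its own.

```go
package main

import "fmt"

// Local copies of the documented constants (IDs start at 1 because of iota + 1).
const (
	BehaviorWhenCommitIsAddedKeepApprovalsId = iota + 1
	BehaviorWhenCommitIsAddedRemoveCodeOwnerApprovalsId
	BehaviorWhenCommitIsAddedRemoveApprovalsId
)

const (
	BehaviorWhenCommitIsAddedKeepApprovalsText   = "Keep approvals"
	BehaviorWhenCommitIsAddedRemoveCodeOwnerText = "Remove approvals by Code Owners if their files changed"
	BehaviorWhenCommitIsAddedRemoveApprovalsText = "Remove all approvals"
)

// behaviorText is a hypothetical helper: it maps an ID to its text value and
// reports false for IDs outside the documented range. The pairing is assumed
// from the constant names, not confirmed by the package.
func behaviorText(id int) (string, bool) {
	switch id {
	case BehaviorWhenCommitIsAddedKeepApprovalsId:
		return BehaviorWhenCommitIsAddedKeepApprovalsText, true
	case BehaviorWhenCommitIsAddedRemoveCodeOwnerApprovalsId:
		return BehaviorWhenCommitIsAddedRemoveCodeOwnerText, true
	case BehaviorWhenCommitIsAddedRemoveApprovalsId:
		return BehaviorWhenCommitIsAddedRemoveApprovalsText, true
	}
	return "", false
}

func main() {
	for id := 0; id <= 3; id++ {
		text, ok := behaviorText(id)
		fmt.Println(id, ok, text)
	}
}
```

Because the IDs start at 1, the zero value of an int never matches a valid behavior, which lets callers tell "unset" apart from "Keep approvals".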

const (
	SquashOptionNever      = "never"       // Never squash
	SquashOptionAlways     = "always"      // Always squash
	SquashOptionDefaultOn  = "default_on"  // Squash by default (can be turned off)
	SquashOptionDefaultOff = "default_off" // Don't squash by default (can be turned on)
)

GitLab squash option constants

const DataCollectionTypeGitlabPipelineImageVersion = "0.2.0"
const DataCollectionTypeGitlabPipelineOriginVersion = "0.2.0"
const (
	DataCollectionTypeGitlabProtectionVersion = "0.2.0"
)
const EnvDisableGitHubAPI = "PLUMBER_DISABLE_GITHUB_API"

EnvDisableGitHubAPI, when set to a truthy value, forces the GitHub metadata client into degraded mode regardless of gh auth state. Set to "1" by the test suite to keep unit tests offline and fast; production code does not read this variable.

Variables

var ErrAuthRequired = fmt.Errorf(
	"GitHub authentication required for upstream-fetch mode (--github-url). " +
		"Set up one of:\n" +
		"  export GH_TOKEN=<token>          # personal token (see README §Step 3 for scope guidance)\n" +
		"  export GITHUB_TOKEN=<token>      # auto-set in GitHub Actions runners\n" +
		"  gh auth login                    # recommended for local dev")

ErrAuthRequired is the actionable error surfaced when go-gh cannot resolve any auth credential. The message points the user at the three supported sources and the README section that documents scopes. Exported so cmd/ and control/ layers can detect the sentinel via errors.Is and short-circuit their normal wrap/log behaviour — a redundant logrus error log on top of cobra's "Error:" prefix on top of "analysis failed:" on top of "github api client:" produces a frame stack instead of the actionable message we want.

Functions

func CollectOverriddenJobs added in v0.3.0

func CollectOverriddenJobs(o *GitlabPipelineOriginDataFull, data *GitlabPipelineOriginData) []ir.OverriddenJob

CollectOverriddenJobs returns the jobs inherited from origin that were locally redefined with forbidden CI/CD keys. The IR uses it to expose override metadata to Rego policies; the PBOM generator reuses it so both paths share the same rule for what counts as an override.

func FetchGitHubBranchProtection added in v0.3.0

func FetchGitHubBranchProtection(host, owner, repo string, opts BranchFetchOptions) ([]ir.Branch, error)

FetchGitHubBranchProtection resolves branch-protection state for the names the caller asks about, with pagination/cost optimised for the typical "I just want main protected" config. See BranchFetchOptions for how ExactNames and Listing combine.

host is the GitHub API host (empty → api.github.com; non-empty → GHES). Auth is consumed from the same go-gh chain used elsewhere (GH_TOKEN / GH_ENTERPRISE_TOKEN / gh auth login). Without auth, or when the token lacks `repo` / Administration:read scope, the API returns 403/404; those responses degrade silently to whatever subset has already been collected (the rego rule then sees fewer branches and may emit fewer findings; for a partial-data control, quiet degradation is preferable to a crash).

Mapping decisions GitHub → IR shape:

  • branch.protected = true when the API marks the branch as such (the listing endpoint already merges classic Branch Protection and the newer Repository Rulesets, so this flag is correct regardless of which mechanism the repo uses).
  • allowForcePush = api.AllowForcePushes.Enabled.
  • codeOwnerApprovalRequired = api.RequiredPullRequestReviews.RequireCodeOwnerReviews.
  • min*AccessLevel: deliberately left 0 on GitHub. GitLab uses a numeric 0..60 access ladder where 0 = "no one allowed" (strictest). The legacy ISSUE-505 rule treats config min=0 as "always violates", which would false-positive on every GitHub branch that simply requires PR reviews. GitHub has no equivalent ladder; encoding an approximation produced misleading findings. The other ISSUE-505 reasons (allowForcePush, codeOwnerApprovalRequired) still apply.

func FetchGitHubDefaultBranch added in v0.3.0

func FetchGitHubDefaultBranch(host, owner, repo string) (string, error)

FetchGitHubDefaultBranch resolves the repo's default branch name via the REST API. Returns an empty string with no error when the repo can't be queried (degraded mode), which keeps the `defaultMustBeProtected` rule a silent no-op rather than a noisy crash.

func ParseGitlabComponentPath

func ParseGitlabComponentPath(path string, instanceURL string) (string, string, string)

ParseGitlabComponentPath parses a GitLab component path and returns, in order:

  1. The instance (if any)
  2. The clean path without the instance prefix
  3. The version (if any)

func ScanGitHubWorkflows added in v0.3.0

func ScanGitHubWorkflows(projectPath, defaultBranch, rootDir, apiHost string, enrichActionMetadata bool) (pipeline *ir.NormalizedPipeline, partialErrors []error, err error)

ScanGitHubWorkflows reads every .yml/.yaml file under <rootDir>/.github/workflows/ and aggregates them into a single NormalizedPipeline. Job names are namespaced by the workflow file base name ("ci/lint", "release/build", ...) so two workflows can expose identically-named jobs without clashing in the IR.

A missing workflows directory is not an error: the returned pipeline simply carries no jobs. Individual unreadable or unparseable files are returned in partialErrors so the caller can surface them without aborting the whole scan.
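
The job-name namespacing described above can be sketched like this. `namespacedJobName` is a hypothetical helper written for illustration; the package's actual implementation is not shown in this documentation and may differ, for example in how it treats unusual file names.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// namespacedJobName prefixes a job name with the workflow file's base name
// (extension stripped), so identically named jobs in different workflow
// files stay distinct.
func namespacedJobName(workflowFile, job string) string {
	base := filepath.Base(workflowFile)
	base = strings.TrimSuffix(base, filepath.Ext(base))
	return base + "/" + job
}

func main() {
	fmt.Println(namespacedJobName(".github/workflows/ci.yml", "lint"))
	fmt.Println(namespacedJobName(".github/workflows/release.yaml", "build"))
}
```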

func ScanGitHubWorkflowsRemote added in v0.3.0

func ScanGitHubWorkflowsRemote(host, owner, repo, ref string, enrichActionMetadata bool, progressFn ProgressFunc) (*ir.NormalizedPipeline, []error, error)

ScanGitHubWorkflowsRemote fetches `.github/workflows/*.{yml,yaml}` from a GitHub project via the Contents API and runs them through the same parser as the local scanner. Used by `plumber analyze --project owner/repo` (with optional --github-url for GHES) when the user is not inside a local checkout.

host empty → api.github.com. ref empty → repo's default branch. Auth resolves via the same go-gh chain the metadata client uses (GH_TOKEN / GH_ENTERPRISE_TOKEN / GITHUB_TOKEN / gh auth login).

Repo-side artefacts that need a local checkout (Dockerfiles, dependabot.yml, SECURITY.md, Renovate config) are NOT collected in remote mode — controls that depend on them simply see absent inputs and produce no findings. Same degraded-mode contract as missing API auth elsewhere.

func ScanGitHubWorkflowsWithProgress added in v0.3.0

func ScanGitHubWorkflowsWithProgress(projectPath, defaultBranch, rootDir, apiHost string, enrichActionMetadata bool, progressFn ProgressFunc) (pipeline *ir.NormalizedPipeline, partialErrors []error, err error)

ScanGitHubWorkflowsWithProgress mirrors ScanGitHubWorkflows but notifies the caller through progressFn as it works. The progress total is sized so the bar advances monotonically end-to-end:

step 1                 Scanning workflow files
step 2..(1+N)          Resolving action <n>      (N unique refs)
step 2+N               Scan complete

The last step (policy evaluation) is reported by the caller (RunGitHubAnalysis) using the same total so the bar keeps climbing. progressFn may be nil; callers that don't care about progress should call the plain ScanGitHubWorkflows variant.

func ToNormalizedPipeline added in v0.3.0

func ToNormalizedPipeline(
	projectPath string,
	defaultBranch string,
	ciConfigPath string,
	origin *GitlabPipelineOriginData,
	images *GitlabPipelineImageData,
	protection *GitlabProtectionAnalysisData,
) *ir.NormalizedPipeline

ToNormalizedPipeline projects the GitLab collector outputs onto a provider-agnostic IR. Phase 1b: only the fields required by the first rule ported to Rego (image/mutable_tag) are mapped. Additional fields (services, includes, branch protection, etc.) will be filled in as each rule is migrated.

This function is pure: no I/O, no external state. It is safe to call from tests with hand-built fixtures.

func TotalProgressStepsForPipeline added in v0.3.0

func TotalProgressStepsForPipeline(pipeline *ir.NormalizedPipeline) int

TotalProgressStepsForPipeline returns the grand total the caller (RunGitHubAnalysis / RunGitHubAnalysisRemote) should use when emitting its own progress updates for the post-scan phases, so the bar stays in sync with what the collector already reported.

Layout in slots, both modes:

1                    "Scanning" (local) or "Listing" (remote)
2..(1+N)             per-file fetch ticks (remote only;
                     WorkflowFileCount is 0 in local mode)
(2+N)..(1+N+M)       per-action enrichment ticks (M = unique refs)
(2+N+M)              "Resolving branch protection"
(3+N+M)              "Evaluating policies"
(4+N+M)              "Analysis complete"

Total = N + M + 4. WorkflowFileCount is populated by ScanGitHubWorkflowsRemote; local scans leave it at zero so the formula collapses to M + 4 there.
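
The slot arithmetic above can be checked with a small sketch. `totalSteps` and `tailSlots` are hypothetical helpers that restate the documented layout; they do not read a NormalizedPipeline the way TotalProgressStepsForPipeline does.

```go
package main

import "fmt"

// totalSteps restates the documented formula: n is the number of workflow
// files fetched (0 in local mode), m is the number of unique action refs.
func totalSteps(n, m int) int {
	return n + m + 4
}

// tailSlots returns the slots of the three post-scan phases:
// "Resolving branch protection", "Evaluating policies", "Analysis complete".
func tailSlots(n, m int) (branchProtection, policies, complete int) {
	return 2 + n + m, 3 + n + m, 4 + n + m
}

func main() {
	n, m := 3, 5 // remote scan: 3 workflow files, 5 unique action refs
	bp, pol, done := tailSlots(n, m)
	fmt.Println(totalSteps(n, m), bp, pol, done)
	fmt.Println(totalSteps(0, m)) // local scan: the formula collapses to m + 4
}
```

The final slot always equals the total, which is what keeps the bar from finishing early or overshooting.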

Types

type BranchFetchOptions added in v0.3.0

type BranchFetchOptions struct {
	// ExactNames are branch names without glob characters. Each is
	// fetched directly via /repos/{owner}/{repo}/branches/{name}; a
	// 404 is treated as "branch doesn't exist on this repo" and
	// silently skipped (the rego rule then has nothing to flag for
	// that name). Duplicates are deduped.
	ExactNames []string

	// Listing, when true, additionally paginates the /branches
	// endpoint (capped at maxBranchListingPages * 100 entries) so
	// wildcard patterns can match. Off by default because the
	// targeted path covers the typical config and avoids the
	// pagination foot-gun.
	Listing bool

	// OnProgress, when non-nil, is invoked at user-meaningful
	// checkpoints during the fetch: each listing page (with a
	// running branches-seen count) and each per-branch protection-
	// detail call. The caller is expected to forward these to its
	// progress spinner as label updates at the same global slot;
	// the messages are short single-line strings already shaped for
	// terminal display. Used by the CLI to keep the bar's label
	// alive during the otherwise-silent "Resolving branch
	// protection" phase on large repos (grafana/grafana has 772
	// branches across 8 listing pages, ~10s of API time).
	OnProgress func(message string)

	// InScope, when non-nil, gates the slow protection-detail calls
	// during the listing pagination: branches for which InScope
	// returns false are still added to the IR (Protected flag
	// preserved from the listing so ISSUE-501 still has the data it
	// needs) but the classic /protection + Rulesets endpoints are
	// skipped for them. Saves hundreds of API calls on repos where
	// the listing returns many protected branches that do not match
	// any of the user's configured namePatterns (grafana's hundreds
	// of `release-X.Y.Z` branches when the config asks for
	// `release/*`, for example). The rego rule applies the same
	// scope check at evaluation time; this just avoids paying for
	// data the rule is going to discard.
	InScope func(name string) bool
}

BranchFetchOptions controls which branches FetchGitHubBranchProtection reaches out for. The split between targeted and listing modes is deliberate. A single `?per_page=100` page on a busy repo (think grafana/grafana with thousands of `dependabot/*` and `release-*` branches) does not necessarily contain `main`: the listing is alphabetical, so page 1 ends long before it reaches the default branch, and a naive listing silently produces "0 branches to protect" findings on every realistic config. Targeted /branches/{name} requests bypass that problem entirely; listing is only used when the user has at least one wildcard pattern (e.g. `release/*`) that cannot be enumerated ahead of time.
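
One way a caller might derive ExactNames, Listing and InScope from configured name patterns is sketched below. `splitPatterns` and `inScope` are hypothetical helpers, and two things are assumptions rather than documented behaviour: the set of characters treated as glob characters (`*`, `?`, `[`) and the use of path.Match for matching.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// splitPatterns separates glob-free names (deduplicated, order preserved)
// from wildcard patterns and reports whether listing mode is needed.
func splitPatterns(patterns []string) (exact []string, listing bool) {
	seen := make(map[string]bool)
	for _, p := range patterns {
		if strings.ContainsAny(p, "*?[") { // assumed glob characters
			listing = true
			continue
		}
		if !seen[p] {
			seen[p] = true
			exact = append(exact, p)
		}
	}
	return exact, listing
}

// inScope builds a predicate that reports whether a branch name matches any
// configured pattern, using path.Match semantics (an assumption).
func inScope(patterns []string) func(name string) bool {
	return func(name string) bool {
		for _, p := range patterns {
			if ok, err := path.Match(p, name); err == nil && ok {
				return true
			}
		}
		return false
	}
}

func main() {
	patterns := []string{"main", "release/*", "main", "develop"}
	exact, listing := splitPatterns(patterns)
	fmt.Println(exact, listing)

	scope := inScope(patterns)
	fmt.Println(scope("release/1.0"), scope("release-1.2.3"))
}
```

With path.Match, `release/*` does not match `release-1.2.3`, which is the case the InScope comment describes: such branches would skip the protection-detail calls.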

type GitHubMetadata added in v0.3.0

type GitHubMetadata struct {
	RepoArchived     bool
	RefExists        bool
	RefKind          string
	TagSha           string
	LatestTag        string
	LatestReleaseSha string
	RefIsAmbiguous   bool
	Advisories       []string
}

GitHubMetadata holds the facts that the API-backed policies need about a single `owner/repo@ref` action reference.

  • RepoArchived: the GitHub repo hosting the action is archived.
  • RefExists: the ref (tag / branch / commit SHA) resolves.
  • RefKind: "tag", "branch", "commit", "unknown".
  • TagSha: when RefKind=="tag", the commit SHA the tag currently points at.
  • LatestTag: the repo's newest release tag, "" when the API returns no releases.
  • LatestReleaseSha: the SHA that tag resolves to upstream.
  • RefIsAmbiguous: the ref resolves as BOTH a tag and a branch (ref-confusion).
  • Advisories: security advisory identifiers from the GitHub Advisory Database whose affected version range covers this ref, if any.

Zero value (all fields empty / false) is explicitly "unknown" — it is also what the policies see when the API call failed. They should treat zero value as "I don't know" and stay silent.
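
The "zero value means unknown" contract can be illustrated with a short sketch. The struct is a local copy of the documented fields; `isUnknown` and `denyArchived` are hypothetical helpers, not part of this package. A field-by-field check is used because a struct containing a slice cannot be compared with `==`.

```go
package main

import "fmt"

// GitHubMetadata is a local copy of the documented struct.
type GitHubMetadata struct {
	RepoArchived     bool
	RefExists        bool
	RefKind          string
	TagSha           string
	LatestTag        string
	LatestReleaseSha string
	RefIsAmbiguous   bool
	Advisories       []string
}

// isUnknown reports whether m carries no information at all, which is what a
// policy sees when the API call failed or the client is in degraded mode.
func isUnknown(m GitHubMetadata) bool {
	return !m.RepoArchived && !m.RefExists && m.RefKind == "" &&
		m.TagSha == "" && m.LatestTag == "" && m.LatestReleaseSha == "" &&
		!m.RefIsAmbiguous && len(m.Advisories) == 0
}

// denyArchived keys on positive evidence only, so it stays silent on the
// zero value without needing a separate "unknown" branch.
func denyArchived(m GitHubMetadata) bool {
	return m.RepoArchived
}

func main() {
	var unknown GitHubMetadata
	archived := GitHubMetadata{RepoArchived: true, RefExists: true, RefKind: "tag"}
	fmt.Println(isUnknown(unknown), denyArchived(unknown))
	fmt.Println(isUnknown(archived), denyArchived(archived))
}
```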

type GitHubMetadataClient added in v0.3.0

type GitHubMetadataClient struct {
	// contains filtered or unexported fields
}

GitHubMetadataClient resolves `owner/repo@ref` references against the real GitHub REST API (via github.com/cli/go-gh which reuses the installed `gh` CLI's stored credentials) and caches every answer so the collector never hits the API twice for the same key. Safe for concurrent use.

When `gh` is not authenticated — or go-gh cannot find a token — the client operates in degraded mode: every lookup returns an empty GitHubMetadata and Available() returns false. Policies are expected to key their deny rules on the positive evidence the client surfaces, so the degraded-mode output is a zero-finding run rather than a crash.

func NewGitHubMetadataClient added in v0.3.0

func NewGitHubMetadataClient() *GitHubMetadataClient

NewGitHubMetadataClient builds a client using the gh-CLI auth store. Returns a usable client even when authentication is missing — see Available() to check. Honors the PLUMBER_DISABLE_GITHUB_API env var which short-circuits the client into degraded mode regardless of auth state.

Targets api.github.com by default. For GitHub Enterprise Server instances, use NewGitHubMetadataClientForHost with the GHES API host (e.g. "ghes.example.com" or "ghes.example.com/api/v3").

func NewGitHubMetadataClientForHost added in v0.3.0

func NewGitHubMetadataClientForHost(host string) *GitHubMetadataClient

NewGitHubMetadataClientForHost is the GHES-aware constructor. When host is empty the client targets api.github.com via the default go-gh resolution chain (gh auth, GH_TOKEN, GITHUB_TOKEN). When host is non-empty the client is bound to that host — pair with a GH_TOKEN (or GH_ENTERPRISE_TOKEN) that has access to the GHES instance.

func (*GitHubMetadataClient) Available added in v0.3.0

func (c *GitHubMetadataClient) Available() bool

Available reports whether the client has a usable gh auth token.

func (*GitHubMetadataClient) Resolve added in v0.3.0

func (c *GitHubMetadataClient) Resolve(ownerRepoRef string) GitHubMetadata

Resolve looks up "owner/repo@ref" and returns what the API told us. Never returns an error — all failures degrade to "unknown" (zero-valued GitHubMetadata). Repeated calls for the same key return the cached value.

func (*GitHubMetadataClient) ResolveTagSha added in v0.3.0

func (c *GitHubMetadataClient) ResolveTagSha(ownerRepo, tag string) string

ResolveTagSha exposes the tag → SHA lookup publicly so the ref-version-mismatch enrichment can query the commented tag without going through the full Resolve() probe chain.

type GitlabPipelineImageData

type GitlabPipelineImageData struct {
	// Gitlab CI configuration
	MergedConf *gitlab.GitlabCIConf
	CiValid    bool
	CiMissing  bool

	// Default image and variables
	DefaultImage string
	InstanceVars map[string]string
	GroupVars    map[string]string
	ProjectVars  map[string]string
	GlobalVars   map[string]string

	// Images found in the pipeline
	Images []GitlabPipelineImageInfo `json:"images"`
}

type GitlabPipelineImageDataCollection

type GitlabPipelineImageDataCollection struct{}

func (*GitlabPipelineImageDataCollection) Run

type GitlabPipelineImageInfo

type GitlabPipelineImageInfo struct {
	Link     string `json:"link"`
	Name     string `json:"image"`
	Tag      string `json:"tag"`
	Registry string `json:"registry"`
	Job      string `json:"job"`
}

type GitlabPipelineImageMetrics

type GitlabPipelineImageMetrics struct {
	Total                      uint `json:"total"`
	IssueUntrusted             uint `json:"issueUntrusted"`
	IssueUntrustedDismissed    uint `json:"issueUntrustedDismissed"`
	IssueForbiddenTag          uint `json:"issueForbiddenTag"`
	IssueForbiddenTagDismissed uint `json:"issueForbiddenTagDismissed"`
}

type GitlabPipelineJobData

type GitlabPipelineJobData struct {
	Name         string   `json:"name"`
	Extends      []string `json:"extends"`
	Lines        int      `json:"lines"`
	IsHardocded  bool     `json:"isHardcoded"`
	IsOverridden bool     `json:"isOverridden"`
}

type GitlabPipelineJobGitlabComponent

type GitlabPipelineJobGitlabComponent struct {
	RepoFullPath           string `json:"repoFullPath"`
	RepoWebPath            string `json:"repoWebPath"`
	RepoName               string `json:"repoName"`
	ComponentName          string `json:"componentName"`
	ComponentLatestVersion string `json:"componentLatestVersion"`
	ComponentIncludePath   string `json:"componentIncludePath"`
}

GitlabPipelineJobGitlabComponent represents a GitLab component

type GitlabPipelineJobPlumberOrigin added in v0.1.31

type GitlabPipelineJobPlumberOrigin struct {
	ID                uint   `json:"id"`
	Path              string `json:"path"`
	LatestVersion     string `json:"latestVersion"`
	RepoDefaultBranch string `json:"repoDefaultBranch"`
}

GitlabPipelineJobPlumberOrigin represents a Plumber template origin

type GitlabPipelineOriginData

type GitlabPipelineOriginData struct {

	// Gitlab CI catalog data
	GitlabCatalogResources    []gitlab.CICatalogResource
	GitlabCatalogComponentMap map[string]int      // path -> index in catalogResources
	VersionMap                map[string][]string // path -> []versions (newest first)

	// Gitlab CI configuration
	Conf            *gitlab.GitlabCIConf
	ConfString      string
	MergedConf      *gitlab.GitlabCIConf
	MergedResponse  *gitlab.MergedCIConfResponse
	CiValid         bool
	CiMissing       bool
	CiErrors        []string // Specific CI config errors for output
	LimitedAnalysis bool

	// Origins and jobs data
	Origins []GitlabPipelineOriginDataFull

	// CI conf content
	JobMap              map[string]*GitlabPipelineJobData
	JobExtendsMap       map[string][]string
	JobHardcodedMap     map[string]bool
	JobHardcodedContent map[string]interface{}
}

type GitlabPipelineOriginDataCollection

type GitlabPipelineOriginDataCollection struct{}

func (*GitlabPipelineOriginDataCollection) Run

type GitlabPipelineOriginDataFull

type GitlabPipelineOriginDataFull struct {
	// Origin data generic and specific
	GitlabPipelineOriginDataGeneric
	GitlabPipelineOriginDataProjectSpecific
}

type GitlabPipelineOriginDataGeneric

type GitlabPipelineOriginDataGeneric struct {
	OriginType          string                           `json:"originType"`
	FromPlumber         bool                             `json:"fromPlumber"`
	FromGitlabCatalog   bool                             `json:"fromGitlabCatalog"`
	PlumberOrigin       GitlabPipelineJobPlumberOrigin   `json:"plumberOrigin"`
	GitlabIncludeOrigin gitlab.IncludeOriginWithoutRef   `json:"gitlabIncludeOrigin"`
	GitlabComponent     GitlabPipelineJobGitlabComponent `json:"gitlabComponent"`
	OriginHash          uint64                           `json:"originHash"`
}

type GitlabPipelineOriginDataProjectSpecific

type GitlabPipelineOriginDataProjectSpecific struct {
	// Data specific to this project
	Version  string `json:"version"`
	UpToDate bool   `json:"upToDate"`
	Nested   bool   `json:"nested"`

	// Job related data
	Jobs []GitlabPipelineJobData `json:"jobs"`
}

type GitlabPipelineOriginMetrics

type GitlabPipelineOriginMetrics struct {

	// Data metrics: jobs
	JobTotal     uint `json:"jobTotal"`
	JobHardcoded uint `json:"jobHardcoded"`

	// Data metrics: origin
	OriginTotal         uint `json:"originTotal"`
	OriginComponent     uint `json:"originComponent"`
	OriginLocal         uint `json:"originLocal"`
	OriginProject       uint `json:"originProject"`
	OriginRemote        uint `json:"originRemote"`
	OriginTemplate      uint `json:"originTemplate"`
	OriginGitLabCatalog uint `json:"originGitLabCatalog"`
	OriginOutdated      uint `json:"originOutdated"`
}

type GitlabProtectionAnalysisData

type GitlabProtectionAnalysisData struct {
	Branches           []string                    `json:"branches"`
	BranchProtections  []gitlab.BranchProtection   `json:"branchProtections"`
	MRApprovalRules    []*glab.ProjectApprovalRule `json:"mrApprovalRules"`
	MRApprovalSettings *glab.ProjectApprovals      `json:"mrApprovalSettings"`
	MRSettings         *glab.Project               `json:"mrSettings"`
	ProjectMembers     []gitlab.GitlabMemberInfo   `json:"projectMembers"`
}

GitlabProtectionAnalysisData holds all the data needed by protection controls

type GitlabProtectionData

type GitlabProtectionData struct {
	Branches []*GitlabProtectionDataBranch `json:"branches"`
}

GitlabProtectionData holds the collected protection data

type GitlabProtectionDataBranch

type GitlabProtectionDataBranch struct {
	BranchName string `json:"branchName"`
	Default    bool   `json:"default"`
}

GitlabProtectionDataBranch holds branch information

type GitlabProtectionDataCollection

type GitlabProtectionDataCollection struct{}

GitlabProtectionDataCollection handles protection data collection

func (*GitlabProtectionDataCollection) Run

Run fetches all GitLab protection data needed by the controls

type GitlabProtectionMetrics

type GitlabProtectionMetrics struct {
	Branches int `json:"branches"`
}

GitlabProtectionMetrics holds metrics about protection data

type ProgressFunc added in v0.3.0

type ProgressFunc func(step, total int, message string)

ProgressFunc is the signature callers use to observe the progress of long-running collector operations — currently the GitHub API enrichment phase.
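
A caller-side sketch of the callback is shown below. The type is redeclared locally so the example compiles on its own; `report` and `formatProgress` are hypothetical helpers. The nil guard reflects the note elsewhere in this documentation that progressFn may be nil.

```go
package main

import "fmt"

// ProgressFunc is a local copy of the documented signature.
type ProgressFunc func(step, total int, message string)

// report invokes fn only when it is non-nil, so code paths that do not care
// about progress can pass nil.
func report(fn ProgressFunc, step, total int, message string) {
	if fn != nil {
		fn(step, total, message)
	}
}

// formatProgress renders one update as a single terminal line.
func formatProgress(step, total int, message string) string {
	return fmt.Sprintf("[%d/%d] %s", step, total, message)
}

func main() {
	printer := ProgressFunc(func(step, total int, message string) {
		fmt.Println(formatProgress(step, total, message))
	})
	report(printer, 1, 3, "Scanning workflow files")
	report(printer, 2, 3, "Resolving action")
	report(printer, 3, 3, "Scan complete")
	report(nil, 1, 1, "ignored") // safe: nothing happens
}
```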

type RuleParameters added in v0.3.0

type RuleParameters struct {
	// pull_request rule
	RequireCodeOwnerReview bool `json:"require_code_owner_review,omitempty"`
}

RuleParameters is the union of parameter shapes across rule types we care about. JSON unmarshal populates only the fields present in the source — extras are silently ignored.
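
The "extras are silently ignored" behaviour follows from how encoding/json handles unknown keys by default. A minimal sketch, using a local copy of the struct and a hypothetical `parseParams` helper; the second key in the sample payload is included only to show an extra field being dropped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RuleParameters is a local copy of the documented struct.
type RuleParameters struct {
	// pull_request rule
	RequireCodeOwnerReview bool `json:"require_code_owner_review,omitempty"`
}

// parseParams decodes a rule's parameters object. Keys with no matching
// struct field are dropped by json.Unmarshal without an error.
func parseParams(raw []byte) (RuleParameters, error) {
	var p RuleParameters
	err := json.Unmarshal(raw, &p)
	return p, err
}

func main() {
	raw := []byte(`{"require_code_owner_review": true, "required_approving_review_count": 2}`)
	p, err := parseParams(raw)
	fmt.Println(p.RequireCodeOwnerReview, err)
}
```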
