cluster-code-coverage-analysis

v0.0.0-...-a542726 Published: Feb 28, 2026 License: Apache-2.0

README

coverage-collector CLI

A command-line tool for processing Go code coverage collected from running applications across an entire OpenShift/Kubernetes cluster. It downloads coverage data from S3, compiles it into a SQLite database, clones source repositories, and generates interactive HTML reports with annotated source code.

A "collection" can span multiple cluster lifecycles and is not tied to a single cluster instance.

Installation

Build from source

go build -o coverage-collector ./cmd/coverage-collector

Or install directly:

go install github.com/jupierce/cluster-code-coverage-analysis/cmd/coverage-collector@latest

How It Works

Go binaries built with -cover can be linked with a small HTTP coverage server that serves their binary coverage data on a well-known port (by default, ports numbered from 53700 upward). A coverage producer running on the cluster collects this data and uploads it to S3. This tool downloads that data and processes it into interactive HTML reports.

Coverage data is cumulative: Go's in-process counters record every code path executed since the process started. Each collection pass adds further covcounters.* files instead of replacing earlier ones, so no data is lost. When the data is compiled and rendered, all counter files are merged into combined reports.

Lines originating from coverage_server.go (the embedded HTTP server) are automatically filtered out of all reports.

Workflow

The primary workflow uses the cluster command group. Run these subcommands in order:

coverage-collector cluster download      --collection <name>  # Download from S3
coverage-collector cluster compile       --collection <name>  # Build SQLite DB
coverage-collector cluster clone-sources --collection <name>  # Clone source repos
coverage-collector cluster render        --collection <name>  # Generate HTML

1. `cluster download`

Download coverage data (covmeta and covcounters files) from an S3 bucket. Generates metadata.json for each coverage entry from S3 path components.

coverage-collector cluster download --collection my-collection \
  --bucket art-ocp-code-coverage \
  --prefix openshift-ci/coverage \
  --profile saml \
  --region us-east-1

# Skip already-downloaded entries
coverage-collector cluster download --collection my-collection \
  --bucket art-ocp-code-coverage \
  --prefix openshift-ci/coverage \
  --skip-existing

Flags:

Flag	Default	Description
`--bucket`	(required)	S3 bucket name
`--prefix`	(required)	S3 path prefix
`--profile`		AWS CLI profile
`--region`		AWS region
`--skip-existing`	false	Skip entries that already have local data

2. `cluster compile`

Process raw coverage data into an SQLite database. This step:

Converts binary coverage to text format via go tool covdata textfmt
Filters out coverage_server.go lines
Groups reports by owner (Deployment, DaemonSet, StatefulSet, Job, Host, Pod)
Merges coverage from multiple pods of the same owner/binary
Resolves source repository URLs from info.json files and image labels/env vars
Computes per-file coverage statistics
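go tool covdata textfmt emits a leading mode: line followed by one line per coverage block, in the form file.go:startLine.startCol,endLine.endCol numStmts count. A minimal sketch of the filter, merge, and per-file statistics steps, assuming counts for the same block are summed across pods. Summing only makes sense for profiles of the same binary (the same covmeta hash), and in set mode it is harmless because only non-zero matters. The function and type names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"sort"
	"strconv"
	"strings"
)

// block is one coverage block after merging.
type block struct {
	File  string
	Stmts int // statements in the block
	Count int // merged execution count
}

// fileStats holds per-file statement totals.
type fileStats struct {
	Statements, Covered int
}

// mergeProfiles parses textfmt profiles of the SAME binary (same covmeta
// hash), drops blocks from coverage_server.go, and merges blocks with the
// same position by summing their counts.
func mergeProfiles(profiles ...string) (map[string]*block, error) {
	merged := map[string]*block{}
	for _, p := range profiles {
		sc := bufio.NewScanner(strings.NewReader(p))
		for sc.Scan() {
			line := strings.TrimSpace(sc.Text())
			if line == "" || strings.HasPrefix(line, "mode:") {
				continue
			}
			// file.go:startLine.startCol,endLine.endCol numStmts count
			fields := strings.Fields(line)
			colon := -1
			if len(fields) == 3 {
				colon = strings.LastIndex(fields[0], ":")
			}
			if colon < 0 {
				return nil, fmt.Errorf("malformed profile line %q", line)
			}
			key, file := fields[0], fields[0][:colon]
			if path.Base(file) == "coverage_server.go" {
				continue // the embedded coverage server is not reported
			}
			stmts, err1 := strconv.Atoi(fields[1])
			count, err2 := strconv.Atoi(fields[2])
			if err1 != nil || err2 != nil {
				return nil, fmt.Errorf("malformed counts in %q", line)
			}
			if b, ok := merged[key]; ok {
				b.Count += count
			} else {
				merged[key] = &block{File: file, Stmts: stmts, Count: count}
			}
		}
		if err := sc.Err(); err != nil {
			return nil, err
		}
	}
	return merged, nil
}

// perFileStats totals statements per file; a block's statements count as
// covered when its merged count is non-zero.
func perFileStats(blocks map[string]*block) map[string]fileStats {
	stats := map[string]fileStats{}
	for _, b := range blocks {
		s := stats[b.File]
		s.Statements += b.Stmts
		if b.Count > 0 {
			s.Covered += b.Stmts
		}
		stats[b.File] = s
	}
	return stats
}

func main() {
	podA := "mode: count\nexample.com/app/main.go:3.13,5.2 2 4\n" +
		"example.com/app/main.go:7.13,9.2 3 0\n" +
		"example.com/app/coverage_server.go:10.1,12.2 4 9\n"
	podB := "mode: count\nexample.com/app/main.go:3.13,5.2 2 0\n" +
		"example.com/app/main.go:7.13,9.2 3 1\n"

	blocks, err := mergeProfiles(podA, podB)
	if err != nil {
		panic(err)
	}
	stats := perFileStats(blocks)
	files := make([]string, 0, len(stats))
	for f := range stats {
		files = append(files, f)
	}
	sort.Strings(files)
	for _, f := range files {
		fmt.Printf("%s: %d/%d statements covered\n", f, stats[f].Covered, stats[f].Statements)
	}
}
```

Each pod alone covers only part of main.go; after merging, both blocks are covered, and the coverage_server.go block never reaches the statistics.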

Change detection uses MD5 hashes; only changed reports are reprocessed.

# Incremental compile
coverage-collector cluster compile --collection my-collection

# Force full recompilation
coverage-collector cluster compile --collection my-collection --update '*'

# Force recompilation for a specific namespace
coverage-collector cluster compile --collection my-collection \
  --update 'namespace=openshift-apiserver'

Flags:

Flag	Default	Description
`--update`		Force recomputation (repeatable, AND logic). Use `'*'` for all, or `field=glob` for `namespace`, `node`, `container`, `image`
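The --update selectors could be evaluated as below, with every selector required to match (AND logic). This sketch assumes globs follow Go's path.Match syntax; the tool's actual glob dialect is not documented here, and matchesUpdate and report are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// report carries the fields that --update selectors can match against.
type report struct {
	Namespace, Node, Container, Image string
}

// matchesUpdate reports whether r is selected by every selector (AND
// logic). A selector is either "*" or "field=glob".
func matchesUpdate(r report, selectors []string) (bool, error) {
	if len(selectors) == 0 {
		return false, nil // no --update: rely on change detection only
	}
	for _, sel := range selectors {
		if sel == "*" {
			continue
		}
		field, glob, ok := strings.Cut(sel, "=")
		if !ok {
			return false, fmt.Errorf("invalid selector %q", sel)
		}
		var value string
		switch field {
		case "namespace":
			value = r.Namespace
		case "node":
			value = r.Node
		case "container":
			value = r.Container
		case "image":
			value = r.Image
		default:
			return false, fmt.Errorf("unknown field %q", field)
		}
		matched, err := path.Match(glob, value)
		if err != nil {
			return false, err
		}
		if !matched {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	r := report{Namespace: "openshift-apiserver", Node: "worker-0", Container: "openshift-apiserver"}
	for _, sels := range [][]string{
		{"*"},
		{"namespace=openshift-*"},
		{"namespace=openshift-*", "node=master-*"},
	} {
		ok, err := matchesUpdate(r, sels)
		fmt.Println(sels, ok, err)
	}
}
```

With this dialect, path.Match's * does not cross /, so a selector such as image=quay.io/* would not match a multi-segment image reference.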

3. `cluster clone-sources`

Clone source repositories identified during compile. Uses source_url and source_commit from the image_sources table to clone at the exact commit.

coverage-collector cluster clone-sources --collection my-collection

Flags:

Flag	Default	Description
`--skip-existing`	true	Skip already-cloned repositories

4. `cluster render`

Generate HTML reports from the compiled database. Produces one HTML report per unique binary (identified by covmeta hash), plus an interactive index.html.

Multiple owners running the same binary share a single HTML report. The index shows all owners with their individual metadata, linking to the shared report.
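The one-report-per-binary rule amounts to grouping owners by covmeta hash. A minimal sketch with hypothetical types:

```go
package main

import (
	"fmt"
	"sort"
)

// owner identifies one workload that ran a covered binary.
type owner struct {
	Namespace, Kind, Name, Container string
	MetaHash                         string // covmeta hash of the binary
}

// reportsByBinary groups owners by covmeta hash, so each unique binary
// gets exactly one <hash>.html report that every owner's index row links to.
func reportsByBinary(owners []owner) map[string][]owner {
	byHash := map[string][]owner{}
	for _, o := range owners {
		byHash[o.MetaHash] = append(byHash[o.MetaHash], o)
	}
	return byHash
}

func main() {
	owners := []owner{
		{"openshift-etcd", "Pod", "etcd-master-0", "etcd", "a1b2"},
		{"openshift-etcd", "Pod", "etcd-master-1", "etcd", "a1b2"},
		{"openshift-dns", "DaemonSet", "dns-default", "dns", "c3d4"},
	}
	groups := reportsByBinary(owners)
	hashes := make([]string, 0, len(groups))
	for h := range groups {
		hashes = append(hashes, h)
	}
	sort.Strings(hashes)
	for _, h := range hashes {
		fmt.Printf("%s.html <- %d owner(s)\n", h, len(groups[h]))
	}
}
```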

# Render to the default location (<collection>/html/)
coverage-collector cluster render --collection my-collection

# Render to a custom directory
coverage-collector cluster render --collection my-collection \
  --output-dir my-collection/html-post-e2e

# Only generate the index (skip per-binary HTML)
coverage-collector cluster render --collection my-collection \
  --skip-component-html

Flags:

Flag	Default	Description
`--output-dir`	`<collection>/html`	Output directory for HTML reports
`--skip-component-html`	false	Only generate the index

Shared cluster flags

These flags apply to all cluster subcommands:

Flag	Default	Description
`--collection`	(required)	Collection name; also used as the working directory
`--verbosity`	info	Log verbosity: `error`, `info`, `debug`, `trace`
`--max-concurrency`	8	Maximum concurrent operations

Directory Structure

After a full run, the collection directory looks like:

<collection>/
  coverage/                 # Raw coverage data from S3
    <ns>-<pod>-<container>/
      metadata.json         # Pod/container/binary info
      info.json             # Source URL/commit (from S3 producer)
      covmeta.<hash>        # Coverage metadata (deterministic per binary)
      covcounters.<hash>    # Coverage counters (unique per collection)
  coverage.db               # SQLite database (~7GB for a full cluster)
  repos/                    # Cloned source repositories
    github.com/<org>/<repo>/<commit-prefix>/
  html/                     # Generated HTML reports
    index.html              # Interactive dashboard
    <hash>.html             # Per-binary coverage reports (named by covmeta hash)
  logs/                     # Timestamped log files

Interactive Index

The generated index.html provides:

Search by namespace, owner name, container, or binary
Filter by namespace, owner type, or coverage level
Sort by any column (namespace, owner, container/binary, coverage %, statements)
Color-coded coverage: Excellent (>=70%), Good (>=50%), Moderate (>=30%), Poor (>=15%), Critical (<15%)
Click-through to per-binary HTML reports with annotated source code
Expandable rows showing pods, hosts, and image details
Checkbox filters to hide e2e-* and openshift-must-gather-* namespaces (checked by default)
Deduplicated stats: Overall coverage percentages are computed by unique binary hash, so the same binary running in multiple owners is only counted once

Per-Binary HTML Reports

Each <hash>.html report includes:

Collapsible header listing all owner groups that run this binary (namespace, type, owner, containers, pod count, hosts)
Stat cards with source file count, overall coverage %, total and covered statements
File table with search, coverage level filter, and sortable columns
Source code viewer with line numbers, green/red coverage highlighting, and per-line execution counts
Split view mode for viewing the file list alongside source code
Deep linking via URL hash (#file0, #file1, etc.)
Unresolved files: Files without cloned source show "No source code resolved for this file" with their coverage stats still computed

Owner Grouping

Reports are grouped by owner type, inferred from pod name patterns:

Pattern	Owner Type
`name-<hash>-<5char>`	Deployment
`name-<number>`	StatefulSet
`name-<5char>`	DaemonSet
`installer-`, `pruner-`	Job
Host-level processes	Host
Unrecognized pods	Pod (No Owner)
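A heuristic classifier approximating the table above might look like this. The regular expressions (in particular the 6 to 10 character hash length for Deployment pods) are assumptions, and the isHost flag stands in for however the tool actually recognizes host-level processes.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Assumed patterns approximating the owner-grouping table; the tool's
// exact rules (for example, hash lengths) may differ.
var (
	deploymentRe  = regexp.MustCompile(`^(.+)-[a-z0-9]{6,10}-[a-z0-9]{5}$`)
	statefulSetRe = regexp.MustCompile(`^(.+)-[0-9]+$`)
	daemonSetRe   = regexp.MustCompile(`^(.+)-[a-z0-9]{5}$`)
)

// ownerType infers an owner type and owner name. isHost marks data from a
// host-level process, in which case name is the binary name.
func ownerType(name string, isHost bool) (kind, owner string) {
	switch {
	case isHost:
		return "Host", name
	case strings.HasPrefix(name, "installer-"), strings.HasPrefix(name, "pruner-"):
		return "Job", name
	}
	// Order matters: the Deployment shape also ends in a 5-char suffix, and
	// StatefulSet goes before DaemonSet so a 5-digit ordinal is not taken
	// for a random suffix.
	for _, c := range []struct {
		kind string
		re   *regexp.Regexp
	}{
		{"Deployment", deploymentRe},
		{"StatefulSet", statefulSetRe},
		{"DaemonSet", daemonSetRe},
	} {
		if m := c.re.FindStringSubmatch(name); m != nil {
			return c.kind, m[1]
		}
	}
	return "Pod (No Owner)", name
}

func main() {
	pods := []struct {
		name   string
		isHost bool
	}{
		{"apiserver-7d4f8b9c6-x2k9p", false},
		{"alertmanager-main-0", false},
		{"ovnkube-node-x2k9p", false},
		{"installer-7-master-0", false},
		{"kubelet", true},
		{"standalone", false},
	}
	for _, p := range pods {
		kind, owner := ownerType(p.name, p.isHost)
		fmt.Printf("%-28s %-15s %s\n", p.name, kind, owner)
	}
}
```

Name-based inference is inherently ambiguous: with these patterns a DaemonSet whose name ends in a 6 to 10 character segment (for example node-exporter) yields pod names that also fit the Deployment shape, and a static pod such as etcd-master-0 fits the StatefulSet shape, so the order of the checks matters.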

Owners with the same binary (same covmeta hash) share a single HTML report. This prevents inflated statement counts when the same binary runs in multiple pods with different names (e.g., static pods with per-node names).

Source Repository Resolution

The tool uses a 3-strategy cascade to find source code for annotated reports:

Image labels/env vars (fast): Looks up io.openshift.build.source-location / io.openshift.build.commit.id from container image labels, and __doozer_group / __doozer_key from image environment variables. Also checks info.json files from the coverage producer. Validates that the repo's Go module matches the coverage package path.
Package path matching: Walks cloned repos and scores by Go module prefix match.
Owner name fallback: Matches owner name to repository directory names.
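A sketch of the three-step cascade, assuming each candidate's Go module path is already known (for example, read from its go.mod). The types, the longest-prefix scoring, and the exact-name owner match are illustrative assumptions rather than the tool's code.

```go
package main

import (
	"fmt"
	"strings"
)

// imageSource is what strategy 1 reads from image labels, env vars, or
// info.json (hypothetical field names).
type imageSource struct {
	URL, Commit string
	Module      string // module path declared in the repo's go.mod
}

// clonedRepo describes one repository already present under repos/.
type clonedRepo struct {
	Name   string // repository directory name, e.g. "origin"
	Module string // module path declared in its go.mod ("" if none)
}

// inModule reports whether pkgPath belongs to module mod.
func inModule(mod, pkgPath string) bool {
	return mod != "" && (pkgPath == mod || strings.HasPrefix(pkgPath, mod+"/"))
}

// resolveSource tries validated image metadata first, then the cloned repo
// with the longest module prefix of pkgPath, then a repo whose name equals
// the owner name. It returns the strategy used and the result.
func resolveSource(pkgPath, owner string, img *imageSource, repos []clonedRepo) (string, string) {
	if img != nil && inModule(img.Module, pkgPath) {
		return "image", img.URL + "@" + img.Commit
	}
	best, bestLen := "", 0
	for _, r := range repos {
		if inModule(r.Module, pkgPath) && len(r.Module) > bestLen {
			best, bestLen = r.Name, len(r.Module)
		}
	}
	if best != "" {
		return "package-path", best
	}
	for _, r := range repos {
		if r.Name == owner {
			return "owner-name", r.Name
		}
	}
	return "unresolved", ""
}

func main() {
	repos := []clonedRepo{
		{Name: "kubernetes", Module: "k8s.io/kubernetes"},
		{Name: "origin", Module: "github.com/openshift/origin"},
		{Name: "oauth-proxy"}, // e.g. no go.mod found
	}
	img := &imageSource{URL: "https://github.com/openshift/origin", Commit: "abc123", Module: "github.com/openshift/origin"}
	fmt.Println(resolveSource("github.com/openshift/origin/pkg/cmd", "origin", img, repos))
	fmt.Println(resolveSource("k8s.io/kubernetes/cmd/kubelet", "kubelet", nil, repos))
	fmt.Println(resolveSource("main", "oauth-proxy", nil, repos))
}
```

Matching on mod + "/" rather than a bare string prefix keeps a module such as example.com/a from claiming packages of example.com/ab.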

For host binaries (no container image), source info comes from info.json files using synthetic host:<binary_name> keys.

BigQuery Ingest

The bigquery command group persists coverage data from the SQLite database into BigQuery for cross-collection analysis and querying.

# Ingest all coverage data
coverage-collector bigquery --project my-gcp-project --dataset my_dataset \
    ingest --collection my-collection

# Ingest only specific namespaces
coverage-collector bigquery --project my-gcp-project --dataset my_dataset \
    ingest --collection my-collection \
    --namespace 'openshift-apiserver' --namespace 'openshift-etcd'

# Filter by owner name
coverage-collector bigquery --project my-gcp-project --dataset my_dataset \
    ingest --collection my-collection \
    --owner 'kube-apiserver' --owner 'etcd'

Flags (bigquery group):

Flag	Default	Description
`--project`	(required)	GCP project ID
`--dataset`	(required)	BigQuery dataset name

Flags (ingest subcommand):

Flag	Default	Description
`--collection`	(required)	Collection name (same as cluster subcommands)
`--namespace`	`*`	Namespace glob filter (repeatable, OR logic)
`--owner`	`*`	Owner name glob filter (repeatable, OR logic)
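The ingest filters could be applied as follows: each repeated flag is an OR over its globs, and an empty list behaves like the default *. Requiring a row to pass both the namespace and the owner filter is an assumption, since the README does not say how the two flags combine; globs are assumed to use path.Match syntax, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// anyMatch reports whether value matches at least one glob (OR logic).
// An empty glob list behaves like the documented default of "*".
func anyMatch(globs []string, value string) (bool, error) {
	if len(globs) == 0 {
		return true, nil
	}
	for _, g := range globs {
		ok, err := path.Match(g, value)
		if err != nil {
			return false, err
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

// selected applies both filters; requiring both to pass is an assumption.
func selected(namespace, owner string, nsGlobs, ownerGlobs []string) (bool, error) {
	ok, err := anyMatch(nsGlobs, namespace)
	if err != nil || !ok {
		return false, err
	}
	return anyMatch(ownerGlobs, owner)
}

func main() {
	ns := []string{"openshift-apiserver", "openshift-etcd"}
	owners := []string{"kube-apiserver", "etcd"}
	fmt.Println(selected("openshift-etcd", "etcd", ns, owners))          // both filters pass
	fmt.Println(selected("openshift-etcd", "etcd-operator", ns, owners)) // owner filter fails
	fmt.Println(selected("openshift-dns", "dns-default", nil, nil))      // defaults select all
}
```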

BigQuery Tables

Two tables are created automatically:

coverage_data: One row per source line per binary. Includes source code text, line number, and execution count. Partitioned by ingestion_time, clustered by (binary_hash, collection_id).
coverage_generators: One row per unique binary hash. Lists all owners (namespace, owner, container, binary) as a repeated record. Includes software_group and software_key metadata from the coverage producer. Partitioned by ingestion_time, clustered by (software_group, binary_hash, collection_id, source_url).
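For orientation, the two tables could be modeled as Go structs with bigquery struct tags, the convention the Go BigQuery client (cloud.google.com/go/bigquery) uses when inferring a schema: a slice of structs becomes a REPEATED RECORD column and time.Time becomes a TIMESTAMP. Only binary_hash, collection_id, ingestion_time, source_url, software_group, and software_key come from this README; every other column name below is hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// coverageDataRow sketches one coverage_data row (one source line of one
// binary). file_path, line_number, source_code, and execution_count are
// hypothetical column names.
type coverageDataRow struct {
	BinaryHash     string    `bigquery:"binary_hash"`
	CollectionID   string    `bigquery:"collection_id"`
	FilePath       string    `bigquery:"file_path"`
	LineNumber     int64     `bigquery:"line_number"`
	SourceCode     string    `bigquery:"source_code"`
	ExecutionCount int64     `bigquery:"execution_count"`
	IngestionTime  time.Time `bigquery:"ingestion_time"` // partitioning column
}

// generatorOwner is one element of the repeated owners record
// (hypothetical column names).
type generatorOwner struct {
	Namespace string `bigquery:"namespace"`
	Owner     string `bigquery:"owner"`
	Container string `bigquery:"container"`
	Binary    string `bigquery:"binary"`
}

// coverageGeneratorRow sketches one coverage_generators row (one per
// unique binary hash); the Owners slice maps to a REPEATED RECORD.
type coverageGeneratorRow struct {
	BinaryHash    string           `bigquery:"binary_hash"`
	CollectionID  string           `bigquery:"collection_id"`
	SourceURL     string           `bigquery:"source_url"`
	SoftwareGroup string           `bigquery:"software_group"`
	SoftwareKey   string           `bigquery:"software_key"`
	Owners        []generatorOwner `bigquery:"owners"`
	IngestionTime time.Time        `bigquery:"ingestion_time"`
}

// columns lists the bigquery tag names of a struct's fields, roughly the
// top-level column names that schema inference would produce.
func columns(v any) []string {
	t := reflect.TypeOf(v)
	out := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		out = append(out, t.Field(i).Tag.Get("bigquery"))
	}
	return out
}

func main() {
	fmt.Println("coverage_data:", columns(coverageDataRow{}))
	fmt.Println("coverage_generators:", columns(coverageGeneratorRow{}))

	gen := coverageGeneratorRow{
		BinaryHash:    "a1b2",
		CollectionID:  "my-collection",
		Owners:        []generatorOwner{{Namespace: "openshift-etcd", Owner: "etcd", Container: "etcd", Binary: "etcd"}},
		IngestionTime: time.Now().UTC(),
	}
	fmt.Printf("%d owner(s) for %s\n", len(gen.Owners), gen.BinaryHash)
}
```

Storing owners as a repeated record keeps coverage_generators at one row per binary hash, mirroring how the HTML reports share one page per binary.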

Authentication

Uses GCP Application Default Credentials. Run gcloud auth application-default login before running ingest, or set the GOOGLE_APPLICATION_CREDENTIALS environment variable.

Prerequisites

Coverage data in S3: The coverage producer must have uploaded covmeta and covcounters files to the configured S3 bucket.
Go toolchain: Required for go tool covdata textfmt (converting binary coverage to text format).
AWS CLI: Required by download to fetch data from S3.
Git: Required by clone-sources to clone repositories.
oc CLI (optional): Used during compile to inspect container image labels and environment variables for source repository and software group/key info. Falls back to info.json if unavailable.
GCP credentials (for BigQuery ingest): gcloud auth application-default login or GOOGLE_APPLICATION_CREDENTIALS environment variable.

Acknowledgments

The coverage HTTP client was derived from psturc/go-coverage-http.

Directories

Path	Synopsis
cmd
coverage-collector command
pkg
log
