kollect

Module version: v0.6.0-rc.1
Published: Jun 9, 2026 License: MIT

Kollect

Your cluster, in Git, diffable. Declare GVK + CEL in CRDs and get a clean, Git-committed inventory of anything running in your cluster — no scripts, no apiserver hammering. When the cluster changes, the inventory commits change; git log is your audit trail and git diff is your drift report. The same snapshot fans out to Postgres, object stores, and event streams — consumers read export data, never unbounded list/watch against the live cluster.

Kubernetes is the source of truth for what is running; it is a poor system of record for stakeholder inventory. Kollect closes that gap: select resources by GVK → extract attributes (CEL or JSONPath) → aggregate across targets → debounce → export to pluggable sinks. Inventory is configuration, not code — owned per team in its own namespace, GitOps-friendly from day one.
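
As a rough illustration of the select and extract steps, here is a minimal Go sketch. It is not Kollect's implementation: the `Resource` and `Row` types and the `lookup` and `collect` functions are hypothetical names, and a plain dotted-path lookup stands in for the CEL or JSONPath expressions a real profile would declare.

```go
package main

import (
	"fmt"
	"strings"
)

// Resource is a hypothetical stand-in for an object read from the cluster.
type Resource struct {
	Kind   string
	Name   string
	Fields map[string]any
}

// Row is one hypothetical inventory row: attributes extracted from one resource.
type Row map[string]string

// lookup walks a dotted path such as "spec.image" through nested maps.
// It stands in for CEL/JSONPath evaluation and returns "" when the path is missing.
func lookup(fields map[string]any, path string) string {
	var cur any = fields
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return ""
		}
		cur = m[key]
	}
	if cur == nil {
		return ""
	}
	return fmt.Sprint(cur)
}

// collect selects resources of one kind and extracts the requested attributes.
func collect(resources []Resource, kind string, attrs map[string]string) []Row {
	var rows []Row
	for _, r := range resources {
		if r.Kind != kind {
			continue // not selected
		}
		row := Row{"name": r.Name}
		for attr, path := range attrs {
			row[attr] = lookup(r.Fields, path)
		}
		rows = append(rows, row)
	}
	return rows
}

func main() {
	resources := []Resource{
		{Kind: "Deployment", Name: "web", Fields: map[string]any{
			"spec": map[string]any{"image": "nginx:1.27"},
		}},
		{Kind: "Service", Name: "web"},
	}
	rows := collect(resources, "Deployment", map[string]string{"image": "spec.image"})
	fmt.Println(len(rows), rows[0]["image"])
}
```

Only the Deployment is selected, and the row carries just the attributes the caller asked for, which is the sense in which inventory is configuration rather than code.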

Read the docs: konih.github.io/kollect — architecture, quick start, CR reference, ADRs, and examples. This README is the front door; the site is the map.

Pre-beta. APIs and defaults may change until the first release candidate. See the roadmap for current status.

Why Kollect?

  • Decoupled read model — consumers query a sink, not the apiserver. No RBAC blast radius, no watch-storm risk, no etcd size limits (why).
  • Event-driven, no polling — one shared informer per GVK keeps inventory current as the cluster changes (ADR-0301).
  • Schema-flexible — declare the attributes you want in a KollectProfile; no bespoke collector per resource kind.
  • Pluggable sinks, no privileged backend — the same snapshot fans out to Git, Postgres, object store, or an event stream (sink taxonomy).
  • Multi-tenant by design — KollectScope gates which teams, namespaces, and sinks each tenant may use.
  • Fleet-ready — N single-mode operators → one shared sink, partitioned by spec.cluster; no central hub tier to operate (ADR-0501).
  • Built for scale — a 10,000-row baseline validated in CI, a 100,000-row design target per cluster with export sharding, plus tunable reconcile/dispatch concurrency (performance).
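
The "one shared informer per GVK" point can be pictured as a small registry that starts a watcher only for the first target interested in a type and reuses it afterwards. The Go sketch below is an assumption about the general shape, not Kollect's code: `registry`, `acquire`, and `release` are hypothetical names, reference counting is one possible way to decide when a watcher can stop, and a counter stands in for a real informer.

```go
package main

import (
	"fmt"
	"sync"
)

// GVK identifies a resource type by group, version, and kind.
type GVK struct{ Group, Version, Kind string }

// registry hands out one shared watcher per GVK (hypothetical; a real
// operator would hold an informer here rather than a counter).
type registry struct {
	mu      sync.Mutex
	refs    map[GVK]int
	started int // number of watchers actually started
}

func newRegistry() *registry { return &registry{refs: map[GVK]int{}} }

// acquire registers interest in a GVK and reports whether a new watcher was started.
func (r *registry) acquire(g GVK) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.refs[g]++
	if r.refs[g] == 1 {
		r.started++
		return true
	}
	return false
}

// release drops one reference and reports whether the watcher can now be stopped.
func (r *registry) release(g GVK) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.refs[g] == 0 {
		return false // nothing to release
	}
	r.refs[g]--
	if r.refs[g] == 0 {
		delete(r.refs, g)
		return true
	}
	return false
}

func main() {
	reg := newRegistry()
	deploy := GVK{Group: "apps", Version: "v1", Kind: "Deployment"}
	fmt.Println(reg.acquire(deploy)) // true: first target starts the watcher
	fmt.Println(reg.acquire(deploy)) // false: second target reuses it
	fmt.Println(reg.started)         // 1
}
```

However many targets select Deployments, the apiserver sees a single watch for that type in this model.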

See it end-to-end

A real pipeline is a handful of Kubernetes resources. This is the Deployment-inventory walkthrough — collect container images from Deployments and export them to Postgres (for portals) and Git (for audit) at the same time:

flowchart LR
  Profile["<b>KollectProfile</b><br/>Deployment schema"]
  Target["<b>KollectTarget</b><br/>select Deployments"]
  Inv["<b>KollectInventory</b><br/>aggregate · debounce · export"]
  Snap["<b>KollectSnapshotSink</b>"]
  Db["<b>KollectDatabaseSink</b>"]
  Ev["<b>KollectEventSink</b>"]
  K8s[("Kubernetes API")]

  Profile --> Target
  K8s -- "informer per GVK" --> Target
  Target --> Inv
  Inv --> Snap
  Inv --> Db
  Inv --> Ev
  Snap --> SnapOut["Git · GitLab · S3 · GCS"]
  Db --> DbOut["Postgres · MongoDB"]
  Ev --> EvOut["Kafka"]

Quick start (MVP)

Spin up the full pipeline on a local kind cluster in one command (needs Docker, kind, kubectl, and Task):

git clone https://github.com/konih/kollect.git && cd kollect
task dev-up                       # build, create kind cluster, install operator + sample CRs
kubectl get kinv,ktgt,ksnap,kdb -A    # watch the pipeline come up

task dev-up builds the manager, boots a kollect-dev kind cluster, installs the operator, and applies the sample Profile → Sink → Target → Inventory pipeline. Watch the KollectInventory Ready condition, then read your sink — the live demo repo shows what the Git export looks like.

Full walkthrough — prerequisites, Helm install, maturity notes: Quick start →

How it works

Figure: Kollect operator pipeline, from the Kubernetes API through shared informers and the in-memory collect store to debounced KollectInventory export into Git, GitLab, S3, GCS, Postgres, MongoDB, and Kafka sink projections.

The in-memory snapshot per inventory is canonical; every sink is a projection of it — no single backend is privileged (sink roles). Sinks are split into three CRD families (ADR-0414):

| Sink family | Examples | Good for |
|---|---|---|
| KollectSnapshotSink | Git, GitLab, S3, GCS | Audit, diff, GitOps-friendly history |
| KollectDatabaseSink | Postgres, MongoDB | Rich queries for portals and dashboards |
| KollectEventSink | Kafka | Change streams, downstream consumers |

Full payload lives in sinks; CR .status holds summaries only (etcd limits).
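
To make the payload-versus-summary split concrete, here is a minimal Go sketch of a status-sized summary: a row count plus a digest of the full payload. The `Summary` type, its field names, and the `summarize` function are assumptions for illustration; the source does not specify which fields Kollect actually writes to `.status`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Summary is a hypothetical status-sized description of a snapshot.
type Summary struct {
	Rows   int    `json:"rows"`
	Digest string `json:"digest"`
}

// summarize returns the row count and a SHA-256 digest of the JSON payload.
// encoding/json emits map keys in sorted order, so equal snapshots yield equal digests.
func summarize(rows map[string]map[string]string) (Summary, error) {
	payload, err := json.Marshal(rows)
	if err != nil {
		return Summary{}, err
	}
	sum := sha256.Sum256(payload)
	return Summary{Rows: len(rows), Digest: hex.EncodeToString(sum[:])}, nil
}

func main() {
	rows := map[string]map[string]string{
		"default/web": {"image": "nginx:1.27"},
		"default/api": {"image": "api:2"},
	}
	s, err := summarize(rows)
	if err != nil {
		panic(err)
	}
	// The summary stays a fixed, small size regardless of how many rows the payload holds.
	fmt.Println(s.Rows, len(s.Digest))
}
```

A fixed-size summary like this stays far below etcd object size limits however large the inventory grows, while the digest still lets a reader tell whether the exported payload changed.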

Performance

Kollect is built for large single clusters and multi-cluster fleets, with honest, tested targets (ADR-0603):

| Tier | Scope | Collected rows | Status |
|---|---|---|---|
| Baseline | 1 cluster | 10,000+ | Validated in nightly load tests |
| Design target | 1 cluster | 100,000 | Requires export sharding + Postgres bulk upsert + resourcesProfile: large |
| Fleet | Shared sink | 10k–100k × N operators | Partitioned by spec.cluster; no hub merge tier |
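
The fleet row relies on partitioning a shared sink by cluster. A minimal Go sketch of that idea follows, assuming a store keyed first by cluster name and then by row key. The `sharedSink` type and `replacePartition` method are hypothetical and say nothing about how Kollect's real sinks lay out data.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical exported inventory row.
type Row struct {
	Key   string
	Image string
}

// sharedSink is a hypothetical shared store: cluster name -> row key -> row.
type sharedSink map[string]map[string]Row

// replacePartition overwrites one cluster's partition and leaves the others untouched,
// so operators never need a merge step between them.
func (s sharedSink) replacePartition(cluster string, rows []Row) {
	part := make(map[string]Row, len(rows))
	for _, r := range rows {
		part[r.Key] = r
	}
	s[cluster] = part
}

// clusters lists the partitions in sorted order.
func (s sharedSink) clusters() []string {
	names := make([]string, 0, len(s))
	for name := range s {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	sink := sharedSink{}
	// Two operators write the same row key without colliding.
	sink.replacePartition("east", []Row{{Key: "default/web", Image: "nginx:1.27"}})
	sink.replacePartition("west", []Row{{Key: "default/web", Image: "nginx:1.26"}})
	for _, c := range sink.clusters() {
		fmt.Println(c, sink[c]["default/web"].Image)
	}
}
```

Each operator owns exactly one partition, which is why no hub tier is needed to reconcile writes from different clusters in this model.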

Tuning knobs — reconcile/dispatch concurrency, export debounce (exportMinInterval, default 30s), namespace-scoped informers, Git commit fingerprinting, and maxExportBytes caps — are catalogued in the performance guide.

Learn more

| Topic | Link |
|---|---|
| Problem statement, CRD model, reconciliation | Architecture |
| Locked platform decisions | Platform decisions |
| CR fields, RBAC, failure modes | CR reference |
| Multi-cluster fleet | ADR-0501 |
| Sink taxonomy (state vs stream) | ADR-0401 |
| Build-order phases and status | Roadmap |
| Examples index | Examples |
| Example: Deployment → Git export | Walkthrough |
| Live demo inventory (Git sink) | kollect-inventory-demo |

Developers: run task lint, task test, and task verify before opening a PR — CONTRIBUTING.md.

Community

Contributing CONTRIBUTING.md — DCO, PR workflow, good first tasks
Code of Conduct CODE_OF_CONDUCT.md — Contributor Covenant v2.1
Governance GOVERNANCE.md — roles, decisions, continuity

Security

Report vulnerabilities privately — see SECURITY.md. Security architecture: docs/ASSURANCE-CASE.md.

License

Copyright (c) 2026 Konrad Heimel. Licensed under the MIT License.

Directories

| Path | Synopsis |
|---|---|
| api/v1alpha1 | Package v1alpha1 contains API Schema definitions for the v1alpha1 API group. |
| internal/aggregate | Package aggregate holds cross-target rollup helpers for Phase 4 (ADR-0304). |
| internal/errors | Package errors provides typed reconcile error classes (ADR-0602). |
| internal/export | Package export defines the versioned inventory export data contract (ADR-0405). |
| internal/pathvalidate | Package pathvalidate holds shared relative-path rules for Git and object-store export paths. |
| internal/sink/cap | Package cap holds sink capability types shared by the registry and backends without import cycles. |
| internal/sink/layout | Package layout projects an inventory snapshot into the readable file tree written by Git/GitLab snapshot sinks (ADR-0419). |
| internal/sink/objectstore | Package objectstore holds shared helpers for Git/S3/GCS snapshot path layout (ADR-0401, ADR-0407). |
| internal/sink/parquet | Package parquet encodes inventory snapshots to Parquet (ADR-0401 hybrid schema, Q11). |
| internal/sink/preview | Package preview renders read-only sink implications without side effects (ADR-0416). |
| test | |