Version v0.6.0-rc.1, published Jun 9, 2026. License: MIT

Repository

github.com/konih/kollect

Kollect

Your cluster, in Git, diffable. Declare GVK + CEL in CRDs and get a clean, Git-committed inventory of anything running in your cluster — no scripts, no apiserver hammering. When the cluster changes, the inventory commits change; git log is your audit trail and git diff is your drift report. The same snapshot fans out to Postgres, object stores, and event streams — consumers read export data, never unbounded list/watch against the live cluster.

Kubernetes is the source of truth for what is running; it is a poor system of record for stakeholder inventory. Kollect closes that gap: select resources by GVK → extract attributes (CEL or JSONPath) → aggregate across targets → debounce → export to pluggable sinks. Inventory is configuration, not code — owned per team in its own namespace, GitOps-friendly from day one.
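The first three stages of that pipeline (select, extract, aggregate) can be sketched in a few lines of Go. This is an illustration only, not Kollect's implementation: `Resource`, `Row`, `lookup`, and `Collect` are hypothetical names, the dotted-path lookup is a toy stand-in for CEL or JSONPath evaluation, and debounce and export are left out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Resource is a simplified stand-in for an unstructured Kubernetes object.
// (Hypothetical type, not part of Kollect's API.)
type Resource struct {
	GVK    string // e.g. "apps/v1/Deployment"
	Object map[string]any
}

// Row is one inventory row: attribute name mapped to the extracted value.
type Row map[string]string

// lookup walks a dotted path such as "metadata.name" through nested maps.
// It is a toy replacement for CEL or JSONPath evaluation.
func lookup(obj map[string]any, path string) (string, bool) {
	var cur any = obj
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return "", false
		}
		cur, ok = m[key]
		if !ok {
			return "", false
		}
	}
	s, ok := cur.(string)
	return s, ok
}

// Collect selects resources by GVK, extracts the declared attributes, and
// returns rows sorted by their "name" attribute so that repeated runs over
// the same state produce identical, diffable output.
func Collect(resources []Resource, gvk string, attrs map[string]string) []Row {
	var rows []Row
	for _, r := range resources {
		if r.GVK != gvk {
			continue
		}
		row := Row{}
		for name, path := range attrs {
			if v, ok := lookup(r.Object, path); ok {
				row[name] = v
			}
		}
		rows = append(rows, row)
	}
	sort.Slice(rows, func(i, j int) bool { return rows[i]["name"] < rows[j]["name"] })
	return rows
}

func main() {
	resources := []Resource{
		{GVK: "apps/v1/Deployment", Object: map[string]any{
			"metadata": map[string]any{"name": "web", "namespace": "shop"},
		}},
		{GVK: "v1/Service", Object: map[string]any{
			"metadata": map[string]any{"name": "web"},
		}},
		{GVK: "apps/v1/Deployment", Object: map[string]any{
			"metadata": map[string]any{"name": "api", "namespace": "shop"},
		}},
	}
	attrs := map[string]string{"name": "metadata.name", "namespace": "metadata.namespace"}
	for _, row := range Collect(resources, "apps/v1/Deployment", attrs) {
		fmt.Println(row["namespace"] + "/" + row["name"])
	}
}
```

Sorting the rows is what makes the result useful as a Git export in this sketch: an unchanged cluster yields a byte-identical file, so `git diff` only shows real drift.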

Read the docs: konih.github.io/kollect — architecture, quick start, CR reference, ADRs, and examples. This README is the front door; the site is the map.

Pre-beta. APIs and defaults may change until the first release candidate. See the roadmap for current status.

Why Kollect?

Decoupled read model — consumers query a sink, not the apiserver. No RBAC blast radius, no watch-storm risk, no etcd size limits (why).
Event-driven, no polling — one shared informer per GVK keeps inventory current as the cluster changes (ADR-0301).
Schema-flexible — declare the attributes you want in a KollectProfile; no bespoke collector per resource kind.
Pluggable sinks, no privileged backend — the same snapshot fans out to Git, Postgres, object store, or an event stream (sink taxonomy).
Multi-tenant by design — KollectScope gates which teams, namespaces, and sinks each tenant may use.
Fleet-ready — N single-mode operators → one shared sink, partitioned by spec.cluster; no central hub tier to operate (ADR-0501).
Built for scale — a 10,000-row baseline validated in CI, a 100,000-row design target per cluster with export sharding, plus tunable reconcile/dispatch concurrency (performance).
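To make the "one shared informer per GVK" point concrete, here is a minimal sketch of the idea using only the Go standard library. It assumes nothing about Kollect's internals: `Registry`, `Subscribe`, and `Dispatch` are hypothetical names, and the upstream watch is reduced to a counter where a real operator would start a Kubernetes informer.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a simplified change notification for one object.
type Event struct {
	GVK  string
	Name string
}

// Handler receives events for a GVK it subscribed to.
type Handler func(Event)

// Registry keeps one handler list per GVK so that many targets share a
// single upstream watch. (Hypothetical type, for illustration only.)
type Registry struct {
	mu       sync.Mutex
	handlers map[string][]Handler
	started  int // number of upstream watches opened
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{handlers: make(map[string][]Handler)}
}

// Subscribe registers a handler. Only the first subscriber for a GVK opens
// an upstream watch; later subscribers reuse it.
func (r *Registry) Subscribe(gvk string, h Handler) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.handlers[gvk]; !ok {
		r.started++ // a real operator would start an informer here
	}
	r.handlers[gvk] = append(r.handlers[gvk], h)
}

// Dispatch fans one event out to every subscriber of its GVK. Handlers are
// copied under the lock and called outside it to avoid holding the mutex
// during user code.
func (r *Registry) Dispatch(e Event) {
	r.mu.Lock()
	hs := append([]Handler(nil), r.handlers[e.GVK]...)
	r.mu.Unlock()
	for _, h := range hs {
		h(e)
	}
}

// Started reports how many upstream watches have been opened.
func (r *Registry) Started() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.started
}

func main() {
	reg := NewRegistry()
	const gvk = "apps/v1/Deployment"
	reg.Subscribe(gvk, func(e Event) { fmt.Println("target-a saw", e.Name) })
	reg.Subscribe(gvk, func(e Event) { fmt.Println("target-b saw", e.Name) })
	reg.Dispatch(Event{GVK: gvk, Name: "web"})
	fmt.Println("upstream watches:", reg.Started())
}
```

Both targets receive the event while only one upstream watch is counted, which is the property that keeps apiserver load flat as more targets select the same kind.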

See it end-to-end

A real pipeline is a handful of Kubernetes resources. This is the Deployment-inventory walkthrough — collect container images from Deployments and export them to Postgres (for portals) and Git (for audit) at the same time:

flowchart LR
  Profile["<b>KollectProfile</b><br/>Deployment schema"]
  Target["<b>KollectTarget</b><br/>select Deployments"]
  Inv["<b>KollectInventory</b><br/>aggregate · debounce · export"]
  Snap["<b>KollectSnapshotSink</b>"]
  Db["<b>KollectDatabaseSink</b>"]
  Ev["<b>KollectEventSink</b>"]
  K8s[("Kubernetes API")]

  Profile --> Target
  K8s -- "informer per GVK" --> Target
  Target --> Inv
  Inv --> Snap
  Inv --> Db
  Inv --> Ev
  Snap --> SnapOut["Git · GitLab · S3 · GCS"]
  Db --> DbOut["Postgres · MongoDB"]
  Ev --> EvOut["Kafka"]

Quick start (MVP)

Spin up the full pipeline on a local kind cluster in one command (needs Docker, kind, kubectl, and Task):

git clone https://github.com/konih/kollect.git && cd kollect
task dev-up                       # build, create kind cluster, install operator + sample CRs
kubectl get kinv,ktgt,ksnap,kdb -A    # list the pipeline resources as they come up

task dev-up builds the manager, boots a kollect-dev kind cluster, installs the operator, and applies the sample Profile → Sink → Target → Inventory pipeline. Watch the KollectInventory Ready condition, then read your sink — the live demo repo shows what the Git export looks like.

Full walkthrough — prerequisites, Helm install, maturity notes: Quick start →

How it works

Figure: Kollect operator pipeline, from the Kubernetes API through shared informers and the in-memory collect store to debounced KollectInventory exports into Git, GitLab, S3, GCS, Postgres, MongoDB, and Kafka sink projections.

The in-memory snapshot per inventory is canonical; every sink is a projection of it — no single backend is privileged (sink roles). Sinks are split into three CRD families (ADR-0414):

Sink family	Examples	Good for
`KollectSnapshotSink`	Git, GitLab, S3, GCS	Audit, diff, GitOps-friendly history
`KollectDatabaseSink`	Postgres, MongoDB	Rich queries for portals and dashboards
`KollectEventSink`	Kafka	Change streams, downstream consumers

Full payload lives in sinks; CR .status holds summaries only (etcd limits).
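The "every sink is a projection" model can be sketched as a fan-out over a common interface. The code below is an assumption-laden illustration, not Kollect's sink API: `Snapshot`, `Sink`, `Summary`, and `FanOut` are hypothetical names, and the sinks are in-memory fakes. It needs Go 1.20 or later for `errors.Join`.

```go
package main

import (
	"errors"
	"fmt"
)

// Snapshot is a simplified stand-in for the canonical in-memory inventory.
type Snapshot struct {
	Inventory string
	Rows      []map[string]string
}

// Sink is a hypothetical projection target; it is not Kollect's real interface.
type Sink interface {
	Name() string
	Export(s Snapshot) error
}

// Summary is the small record that would fit in a CR .status field;
// the full payload stays in the sinks.
type Summary struct {
	Rows        int
	SinksOK     int
	SinksFailed int
}

// memorySink records how many rows it last received.
type memorySink struct {
	name string
	rows int
}

func (m *memorySink) Name() string { return m.name }

func (m *memorySink) Export(s Snapshot) error {
	m.rows = len(s.Rows)
	return nil
}

// brokenSink always fails, to show that other sinks still get the snapshot.
type brokenSink struct{}

func (brokenSink) Name() string          { return "broken" }
func (brokenSink) Export(Snapshot) error { return errors.New("backend unavailable") }

// FanOut hands the same snapshot to every sink. A failing sink is recorded
// and reported, but it does not stop the remaining sinks from exporting.
func FanOut(s Snapshot, sinks []Sink) (Summary, error) {
	sum := Summary{Rows: len(s.Rows)}
	var errs []error
	for _, sink := range sinks {
		if err := sink.Export(s); err != nil {
			sum.SinksFailed++
			errs = append(errs, fmt.Errorf("%s: %w", sink.Name(), err))
			continue
		}
		sum.SinksOK++
	}
	return sum, errors.Join(errs...)
}

func main() {
	snap := Snapshot{
		Inventory: "deployments",
		Rows:      []map[string]string{{"name": "web"}, {"name": "api"}},
	}
	git := &memorySink{name: "git"}
	pg := &memorySink{name: "postgres"}
	sum, err := FanOut(snap, []Sink{git, brokenSink{}, pg})
	fmt.Printf("summary=%+v git=%d postgres=%d err=%v\n", sum, git.rows, pg.rows, err)
}
```

Because no sink reads from another, any backend can be added or removed without changing what the others receive, which is what "no single backend is privileged" amounts to in this sketch.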

Performance

Kollect is built for large single clusters and multi-cluster fleets, with honest, tested targets (ADR-0603):

Tier	Scope	Collected rows	Status
Baseline	1 cluster	10,000+	Validated in nightly load tests
Design target	1 cluster	100,000	Requires export sharding + Postgres bulk upsert + `resourcesProfile: large`
Fleet	Shared sink	10k–100k × N operators	Partitioned by `spec.cluster`; no hub merge tier

Tuning knobs — reconcile/dispatch concurrency, export debounce (exportMinInterval, default 30s), namespace-scoped informers, Git commit fingerprinting, and maxExportBytes caps — are catalogued in the performance guide.

Learn more

Topic	Link
Problem statement, CRD model, reconciliation	Architecture
Locked platform decisions	Platform decisions
CR fields, RBAC, failure modes	CR reference
Multi-cluster fleet	ADR-0501
Sink taxonomy (state vs stream)	ADR-0401
Build-order phases and status	Roadmap
Examples index	Examples
Example: Deployment → Git export	Walkthrough
Live demo inventory (Git sink)	kollect-inventory-demo

Developers: run task lint, task test, and task verify before opening a PR — CONTRIBUTING.md.

Community


Contributing	CONTRIBUTING.md — DCO, PR workflow, good first tasks
Code of Conduct	CODE_OF_CONDUCT.md — Contributor Covenant v2.1
Governance	GOVERNANCE.md — roles, decisions, continuity

Security

Report vulnerabilities privately — see SECURITY.md. Security architecture: docs/ASSURANCE-CASE.md.

License

MIT.

Directories

Path	Synopsis
api
v1alpha1	Package v1alpha1 contains API Schema definitions for the v1alpha1 API group.
cmd
internal
aggregate	Package aggregate holds cross-target rollup helpers for Phase 4 (ADR-0304).
collect
controller
digest
errors	Package errors provides typed reconcile error classes (ADR-0602).
export	Package export defines the versioned inventory export data contract (ADR-0405).
inventory
metrics
operator
pathvalidate	Package pathvalidate holds shared relative-path rules for Git and object-store export paths.
scope
sink
sink/bigquery
sink/cap	Package cap holds sink capability types shared by the registry and backends without import cycles.
sink/gcs
sink/git
sink/gitlab
sink/kafka
sink/layout	Package layout projects an inventory snapshot into the readable file tree written by Git/GitLab snapshot sinks (ADR-0419).
sink/mongodb
sink/nats
sink/objectstore	Package objectstore holds shared helpers for Git/S3/GCS snapshot path layout (ADR-0401, ADR-0407).
sink/parquet	Package parquet encodes inventory snapshots to Parquet (ADR-0401 hybrid schema, Q11).
sink/postgres
sink/preview	Package preview renders read-only sink implications without side effects (ADR-0416).
sink/s3
validation
webhook/v1alpha1
test
schema
