# k8s-resource-inspector (kri)

Analyzes Kubernetes workload resource utilization by combining ArgoCD Application CRs with Prometheus metrics. Classifies pod behavior, emits right-sizing recommendations, and can open PRs to apply them.
## Installation

```sh
go install github.com/LiveViewTech/platform-lab/tools/k8s-resource-inspector@latest
```

Or build from source:

```sh
go build -o ~/go/bin/kri ./tools/k8s-resource-inspector/
```
## Configuration

`~/.kri/config.yaml`:

```yaml
clusters:
  - argocd_cluster: in-cluster   # matches spec.destination.name in the ArgoCD Application CR
    prometheus: http://localhost:9090

# Optional: namespace where ArgoCD Application CRs live (default: "argocd")
argocd_namespace: argocd

# Optional: floor values for recommendations (defaults shown)
minimums:
  cpu_millicores: 10
  memory_mi: 16

# Optional: git committer identity for kri-authored commits (defaults shown)
git:
  author_name: kri
  author_email: kri@noreply.local

# Optional: GitHub settings (defaults shown)
github:
  base_branch: main
  api_url: https://api.github.com   # override for GitHub Enterprise Server
```
## Usage

### Inspect

```sh
kri inspect [flags]
```

Flags:

```
--window string        Observation window for Prometheus queries (default "7d")
--confidence float     Minimum confidence threshold for recommendations (default 0.8)
--findings-only        Only show workloads with recommendations or HPA warnings
--app string           Filter to a single ArgoCD application by name
-n, --namespace string Filter to a specific namespace
-o, --output string    Output format: table (default) or json
--kubeconfig string    Path to kubeconfig (defaults to KUBECONFIG env / ~/.kube/config)
--context string       Kubeconfig context to use
--config string        Path to kri config file (default ~/.kri/config.yaml)
```
### Plan and apply

kri can open one PR per app containing a `values-resources.yaml` file with the recommended resource changes. See the Apply workflow section below for setup.

Two-step (recommended):

```sh
kri plan             # generates kri-plan.yaml
# edit kri-plan.yaml — set apply: false to skip an app, adjust values
kri apply            # prints summary, prompts for confirmation, opens PRs
kri apply --dry-run  # shows what would happen without opening PRs
```

One-shot:

```sh
kri apply --all           # runs inspect, prints table + findings, prompts, opens PRs
kri apply --all --dry-run
```

`kri plan` flags:

```
--window string       Observation window (default "7d")
--confidence float    Confidence threshold (default 0.8)
--dir string          Directory to write kri-plan.yaml (default: current directory)
```

`kri apply` flags:

```
--all                 Run inspect pipeline instead of reading kri-plan.yaml
--dry-run             Show what would be applied without opening PRs
--dir string          Directory to read kri-plan.yaml from (default: current directory)
--window string       Observation window (only with --all, default "7d")
--confidence float    Confidence threshold (only with --all, default 0.8)
```

`GITHUB_TOKEN` must be set in the environment when running `kri apply` (not required for `--dry-run`).
## Apply workflow

kri writes resource changes to a separate `values-resources.yaml` file alongside each app's main `values.yaml`. This keeps kri's changes isolated and avoids YAML round-trip issues with the main config.

One-time ArgoCD setup: add `values-resources.yaml` to your AppSet's `valueFiles` list with `ignoreMissingValueFiles: true`. The file is picked up automatically once kri creates it; before that, ArgoCD silently ignores the missing file.

```yaml
# In your ApplicationSet template
spec:
  source:
    helm:
      valueFiles:
        - values.yaml
        - values-resources.yaml
      ignoreMissingValueFiles: true
```

Values file format (`values-resources.yaml` — do not edit manually):

Single-container apps:

```yaml
# Generated by kri — do not edit manually
resources:
  requests:
    cpu: 10m
    memory: 16Mi
  limits:
    cpu: 10m
    memory: 16Mi
```

Multi-container apps use a `containers:` list with per-container `resources` blocks.

Note: when `requests == limits` (Guaranteed QoS), kri updates both together to preserve the QoS class.
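For multi-container apps, the generated file might look like the following sketch. The container names and values are hypothetical; the `containers:` list shape follows the note above, but the exact keys kri emits are an assumption, so treat a real generated file as the source of truth.

```yaml
# Generated by kri — do not edit manually
# Hypothetical multi-container layout (assumed schema)
containers:
  - name: app
    resources:
      requests:
        cpu: 50m
        memory: 128Mi
      limits:
        cpu: 50m
        memory: 128Mi
  - name: sidecar
    resources:
      requests:
        cpu: 10m
        memory: 16Mi
```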
## Output

### Table columns

| Column | Description |
|---|---|
| APP | ArgoCD Application CR name (`metadata.name`). Typically matches the Helm release name unless `spec.source.helm.releaseName` is set explicitly. |
| CLUSTER | ArgoCD destination cluster name (`spec.destination.name`) |
| NAMESPACE | Pod namespace |
| POD | Pod name |
| CONTAINER | Container name |
| CPU_REQ | CPU request from kube-state-metrics |
| CPU_P95 | p95 CPU usage over the observation window |
| CPU_P99 | p99 CPU usage over the observation window |
| MEM_REQ | Memory request from kube-state-metrics |
| MEM_P95 | p95 memory working set over the observation window |
| MEM_P99 | p99 memory working set over the observation window |
| MEM/LIM | p99 memory as a percentage of the memory limit |
| BEHAVIOR | Classified behavior (see below) |
| CONF | Classification confidence |
| HPA | HPA validation result: `-`, `OK`, `WARN`, or `ERROR` |
| REC | Recommendation flag: `-` (none), `ok` (within tolerance), `YES` (actionable), `hold` |

Rows flagged with WARN/ERROR in HPA or YES in REC are expanded in the Findings block printed below the table.
### Behavior classes

| Class | Meaning |
|---|---|
| STATIC | Low, stable utilization — good candidate for right-sizing |
| SPIKY | High p99/p50 ratio — bursting workload |
| GROWTH | Sustained memory trend upward toward the limit |
| RUNAWAY | Memory p99 at or near limit — OOM risk |
| MIXED | Pods within the same workload disagree — investigate divergence before acting |
| UNKNOWN | Insufficient data |

Classification thresholds:

- RUNAWAY: mem p99 ≥ 90% of limit
- SPIKY: CPU p99/p50 ≥ 2.0 or mem p99/p50 ≥ 1.8
- GROWTH: trend > 1% of mem p50/hr AND mem p99 ≥ 30% of limit (pods well within their limit are not classified GROWTH, to avoid trend noise on idle workloads)
- STATIC: CPU p99/p50 < 1.5 AND mem p99/p50 < 1.3 AND flat trend
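The thresholds above might compose roughly like this minimal sketch. The precedence order (most severe first) and the "flat trend" cutoff are assumptions, not kri's documented behavior, and MIXED is omitted because it compares pods within a workload rather than classifying a single series.

```python
def classify(cpu_ratio, mem_ratio, mem_p99, mem_limit, trend, mem_p50):
    """Sketch of the behavior thresholds (precedence order is an assumption).

    cpu_ratio / mem_ratio: p99/p50 over the observation window
    trend: memory slope in bytes/hour; the "flat" cutoff of 1% of p50/hr
           is an assumption mirroring the GROWTH threshold
    """
    flat = abs(trend) <= 0.01 * mem_p50
    if mem_p99 >= 0.90 * mem_limit:
        return "RUNAWAY"    # p99 at or near the memory limit: OOM risk
    if cpu_ratio >= 2.0 or mem_ratio >= 1.8:
        return "SPIKY"      # high p99/p50 ratio: bursting workload
    if trend > 0.01 * mem_p50 and mem_p99 >= 0.30 * mem_limit:
        return "GROWTH"     # sustained upward trend, already near the limit
    if cpu_ratio < 1.5 and mem_ratio < 1.3 and flat:
        return "STATIC"     # low, stable utilization
    return "UNKNOWN"        # no class matched / insufficient signal
```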
Recommendations add headroom above observed p99: +20% for CPU (rounded up to nearest 10m), +30% for memory (rounded up to nearest Mi). A change is only emitted when the recommended value differs from the current request by more than 10%.
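The headroom, rounding, floor, and tolerance rules can be sketched as follows. The rules themselves are taken from this README (including the `minimums` floors from the configuration section); the function names are illustrative, not kri's API.

```python
import math

def round_up(value, step):
    """Round value up to the nearest multiple of step."""
    return math.ceil(value / step) * step

def recommend_cpu(p99_millicores, floor_millicores=10):
    # +20% headroom, rounded up to the nearest 10m, clamped to the configured floor
    return max(round_up(p99_millicores * 1.2, 10), floor_millicores)

def recommend_memory(p99_mi, floor_mi=16):
    # +30% headroom, rounded up to the nearest Mi, clamped to the configured floor
    return max(round_up(p99_mi * 1.3, 1), floor_mi)

def should_emit(recommended, current_request):
    # Only emit a change when it moves the request by more than 10%
    return abs(recommended - current_request) > 0.10 * current_request
```

For example, a container with CPU p99 of 35m gets 35 × 1.2 = 42, rounded up to 50m; a memory p99 of 5Mi would compute to 7Mi but is clamped to the 16Mi floor.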
### HPA validation

| Check | Condition | Severity |
|---|---|---|
| CPU request missing | HPA targets CPU but no CPU request set | ERROR |
| Memory request missing | HPA targets memory but no memory request set | ERROR |
| Target utilization too high | HPA target % above p95 actual utilization | WARN |
| Target utilization too low | HPA target % well below p50 | WARN |
| Min replicas too low | minReplicas = 1 on a SPIKY workload | WARN |
| Max replicas too low | maxReplicas hit in Prometheus history | WARN |
| Scaling metric mismatch | CPU HPA on a memory-bound workload | WARN |
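As an illustration of why the two missing-request checks are ERRORs, here is a sketch of that validation. The data shapes and function name are hypothetical; kri's actual implementation is not shown here. Utilization-based HPA targets are computed as usage / request, so they are meaningless when the request is absent.

```python
def check_missing_requests(hpa_targets, requests):
    """Return ERROR findings for HPA-targeted resources with no request set.

    hpa_targets: resource names the HPA scales on, e.g. {"cpu", "memory"}
    requests:    requests set on the container, e.g. {"cpu": "100m"}
    """
    findings = []
    for resource in sorted(hpa_targets):
        if resource not in requests:
            # Without a request, usage/request utilization is undefined
            findings.append(("ERROR", f"HPA targets {resource} but no {resource} request set"))
    return findings
```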
## Data sources

- Pod inventory: `kube_pod_container_resource_requests`, `kube_pod_container_resource_limits`, `kube_pod_status_phase` from kube-state-metrics
- Workload resolution: `kube_pod_owner` (pod → ReplicaSet) + `kube_replicaset_owner` (ReplicaSet → Deployment) chain
- CPU usage: `rate(container_cpu_usage_seconds_total[5m])` quantiles via `quantile_over_time`
- Memory usage: `container_memory_working_set_bytes` quantiles via `quantile_over_time`
- Memory trend: `deriv(container_memory_working_set_bytes[window]) * 3600` (bytes/hour)
- HPA config: `autoscaling/v2` HorizontalPodAutoscaler resources via the Kubernetes API
- Values files: Helm values read directly from git via shallow clone
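The pod → ReplicaSet → Deployment owner chain can be sketched like this. The two-hop lookup matches the `kube_pod_owner` / `kube_replicaset_owner` chain described above; the dict-based inputs are a simplification of real Prometheus query results.

```python
def resolve_workload(pod, pod_owner, replicaset_owner):
    """Resolve a pod to its top-level workload via kube-state-metrics owner labels.

    pod_owner:        pod name        -> (owner_kind, owner_name), from kube_pod_owner
    replicaset_owner: replicaset name -> (owner_kind, owner_name), from kube_replicaset_owner
    """
    kind, name = pod_owner.get(pod, ("<none>", pod))
    if kind == "ReplicaSet":
        # Second hop: ReplicaSet -> Deployment
        kind, name = replicaset_owner.get(name, (kind, name))
    return kind, name

# Hypothetical sample data in the shape of the two owner metrics
pods = {"web-7d9f4c-abcde": ("ReplicaSet", "web-7d9f4c")}
replicasets = {"web-7d9f4c": ("Deployment", "web")}
```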
## kri-operator

kri's logic is also automated by kri-operator, a Kubernetes operator that runs the inspect → plan → apply workflow on a schedule and posts rollback-diagnosis reports to Slack. The CLI remains fully functional as a developer and debugging interface to the same underlying logic.