k8s-resource-inspector (kri)

Published: Apr 26, 2026 · License: MIT

Analyzes Kubernetes workload resource utilization by combining ArgoCD Application CRs with Prometheus metrics. Classifies pod behavior, emits right-sizing recommendations, and can open PRs to apply them.

Installation

go install github.com/LiveViewTech/platform-lab/tools/k8s-resource-inspector@latest

Or build from source:

go build -o ~/go/bin/kri ./tools/k8s-resource-inspector/

Configuration

~/.kri/config.yaml:

clusters:
  - argocd_cluster: in-cluster   # matches spec.destination.name in ArgoCD Application CR
    prometheus: http://localhost:9090

# Optional: namespace where ArgoCD Application CRs live (default: "argocd")
argocd_namespace: argocd

# Optional: floor values for recommendations (defaults shown)
minimums:
  cpu_millicores: 10
  memory_mi: 16

# Optional: git committer identity for kri-authored commits (defaults shown)
git:
  author_name: kri
  author_email: kri@noreply.local

# Optional: GitHub settings (defaults shown)
github:
  base_branch: main
  api_url: https://api.github.com   # override for GitHub Enterprise Server

Usage

Inspect
kri inspect [flags]

Flags:
  --window string         Observation window for Prometheus queries (default "7d")
  --confidence float      Minimum confidence threshold for recommendations (default 0.8)
  --findings-only         Only show workloads with recommendations or HPA warnings
  --app string            Filter to a single ArgoCD application by name
  -n, --namespace string  Filter to a specific namespace
  -o, --output string     Output format: table (default) or json
  --kubeconfig string     Path to kubeconfig (defaults to KUBECONFIG env / ~/.kube/config)
  --context string        Kubeconfig context to use
  --config string         Path to kri config file (default ~/.kri/config.yaml)
Plan and apply

kri can open one PR per app containing a values-resources.yaml file with the recommended resource changes. See Apply workflow for setup.

Two-step (recommended):

kri plan               # generates kri-plan.yaml
# edit kri-plan.yaml — set apply: false to skip an app, adjust values
kri apply              # prints summary, prompts for confirmation, opens PRs
kri apply --dry-run    # shows what would happen without opening PRs

One-shot:

kri apply --all        # runs inspect, prints table + findings, prompts, opens PRs
kri apply --all --dry-run

kri plan [flags]
  --window string     Observation window (default "7d")
  --confidence float  Confidence threshold (default 0.8)
  --dir string        Directory to write kri-plan.yaml (default: current directory)

kri apply [flags]
  --all               Run inspect pipeline instead of reading kri-plan.yaml
  --dry-run           Show what would be applied without opening PRs
  --dir string        Directory to read kri-plan.yaml from (default: current directory)
  --window string     Observation window (only with --all, default "7d")
  --confidence float  Confidence threshold (only with --all, default 0.8)

GITHUB_TOKEN must be set in the environment when running kri apply (not required for --dry-run).

Apply workflow

kri writes resource changes to a separate values-resources.yaml file alongside each app's main values.yaml. This keeps kri's changes isolated and avoids YAML roundtrip issues with the main config.

One-time ArgoCD setup: Add values-resources.yaml to your ApplicationSet's valueFiles list with ignoreMissingValueFiles: true. The file is picked up automatically once kri creates it; until then, ArgoCD silently ignores the missing file.

# In your ApplicationSet template
spec:
  source:
    helm:
      valueFiles:
        - values.yaml
        - values-resources.yaml
      ignoreMissingValueFiles: true

Values file format (values-resources.yaml — do not edit manually):

Single-container apps:

# Generated by kri — do not edit manually
resources:
  requests:
    cpu: 10m
    memory: 16Mi
  limits:
    cpu: 10m
    memory: 16Mi

Multi-container apps use a containers: list with per-container resources blocks.
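
For reference, a multi-container file plausibly looks like the following. This is a hypothetical sketch: only the containers: key is documented above; the name field and exact nesting are assumptions, so treat a kri-generated file as authoritative.

```yaml
# Generated by kri — do not edit manually
# (hypothetical sketch — field names other than `containers` are assumed)
containers:
  - name: app            # assumed per-container identifier
    resources:
      requests:
        cpu: 10m
        memory: 16Mi
      limits:
        cpu: 10m
        memory: 16Mi
  - name: sidecar
    resources:
      requests:
        cpu: 10m
        memory: 16Mi
```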

Note: When requests == limits (Guaranteed QoS), kri updates both together to preserve the QoS class.

Output

Table columns
  • APP: ArgoCD Application CR name (metadata.name). Typically matches the Helm release name unless spec.source.helm.releaseName is set explicitly.
  • CLUSTER: ArgoCD destination cluster name (spec.destination.name)
  • NAMESPACE: Pod namespace
  • POD: Pod name
  • CONTAINER: Container name
  • CPU_REQ: CPU request from kube-state-metrics
  • CPU_P95: p95 CPU usage over the observation window
  • CPU_P99: p99 CPU usage over the observation window
  • MEM_REQ: Memory request from kube-state-metrics
  • MEM_P95: p95 memory working set over the observation window
  • MEM_P99: p99 memory working set over the observation window
  • MEM/LIM: p99 memory as a percentage of the memory limit
  • BEHAVIOR: Classified behavior (see below)
  • CONF: Classification confidence
  • HPA: HPA validation result: -, OK, WARN, or ERROR
  • REC: Recommendation flag: - (none), ok (within tolerance), YES (actionable), hold

Rows flagged with WARN/ERROR in HPA or YES in REC are expanded in the Findings block printed below the table.

Behavior classes
  • STATIC: Low, stable utilization — good candidate for right-sizing
  • SPIKY: High p99/p50 ratio — bursting workload
  • GROWTH: Sustained memory trend upward toward the limit
  • RUNAWAY: Memory p99 at or near limit — OOM risk
  • MIXED: Pods within the same workload disagree — investigate divergence before acting
  • UNKNOWN: Insufficient data

Classification thresholds:

  • RUNAWAY: mem p99 ≥ 90% of limit
  • SPIKY: CPU p99/p50 ≥ 2.0 or mem p99/p50 ≥ 1.8
  • GROWTH: trend > 1% of mem p50/hr AND mem p99 ≥ 30% of limit (pods well within their limit are not classified GROWTH, to avoid trend noise on idle workloads)
  • STATIC: CPU p99/p50 < 1.5 AND mem p99/p50 < 1.3 AND flat trend
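
The thresholds above can be condensed into a small decision function. This is a hypothetical sketch, not kri's actual code: the evaluation order (RUNAWAY → SPIKY → GROWTH → STATIC) and the 1%/hr flat-trend cutoff are assumptions, and the MIXED / insufficient-data cases (which depend on per-pod agreement) are omitted.

```go
package main

import "fmt"

// classify sketches the documented thresholds. Ratios are p99/p50;
// memFrac is mem p99 as a fraction of the memory limit; trendFrac is
// the hourly memory trend as a fraction of mem p50.
func classify(cpuRatio, memRatio, memFrac, trendFrac float64) string {
	switch {
	case memFrac >= 0.90:
		return "RUNAWAY" // mem p99 at or near limit: OOM risk
	case cpuRatio >= 2.0 || memRatio >= 1.8:
		return "SPIKY" // bursting workload
	case trendFrac > 0.01 && memFrac >= 0.30:
		return "GROWTH" // sustained upward memory trend
	case cpuRatio < 1.5 && memRatio < 1.3 && trendFrac <= 0.01:
		return "STATIC" // low, stable utilization
	default:
		return "UNKNOWN"
	}
}

func main() {
	fmt.Println(classify(1.1, 1.05, 0.20, 0.0)) // flat and well under limit → STATIC
	fmt.Println(classify(3.0, 1.2, 0.50, 0.0))  // high CPU p99/p50 ratio → SPIKY
}
```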

Recommendations add headroom above observed p99: +20% for CPU (rounded up to nearest 10m), +30% for memory (rounded up to nearest Mi). A change is only emitted when the recommended value differs from the current request by more than 10%.
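
The headroom and tolerance arithmetic can be sketched as follows. This is a hypothetical illustration of the stated rules, not kri's actual code; the function names and the application of the default config floors (10m / 16Mi) are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// recommendCPUMillicores: p99 + 20% headroom, rounded up to the
// nearest 10m, floored at the default minimum of 10m.
func recommendCPUMillicores(p99 float64) int {
	rec := int(math.Ceil(p99*1.2/10)) * 10
	if rec < 10 {
		rec = 10
	}
	return rec
}

// recommendMemoryMi: p99 + 30% headroom, rounded up to the nearest Mi,
// floored at the default minimum of 16Mi.
func recommendMemoryMi(p99 float64) int {
	rec := int(math.Ceil(p99 * 1.3))
	if rec < 16 {
		rec = 16
	}
	return rec
}

// worthEmitting applies the 10% tolerance: a change is only emitted
// when the recommendation differs from the current request by more
// than 10%.
func worthEmitting(current, recommended float64) bool {
	return math.Abs(recommended-current) > 0.10*current
}

func main() {
	fmt.Printf("%dm\n", recommendCPUMillicores(43)) // 43 * 1.2 = 51.6 → 60m
	fmt.Printf("%dMi\n", recommendMemoryMi(45))     // 45 * 1.3 = 58.5 → 59Mi
	fmt.Println(worthEmitting(100, 105))            // within 10% tolerance → false
}
```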

HPA validation
  • CPU request missing: HPA targets CPU but no CPU request set (ERROR)
  • Memory request missing: HPA targets memory but no memory request set (ERROR)
  • Target utilization too high: HPA target % above p95 actual utilization (WARN)
  • Target utilization too low: HPA target % well below p50 (WARN)
  • Min replicas too low: minReplicas = 1 on a SPIKY workload (WARN)
  • Max replicas too low: maxReplicas hit in Prometheus history (WARN)
  • Scaling metric mismatch: CPU HPA on a memory-bound workload (WARN)

Data sources

  • Pod inventory: kube_pod_container_resource_requests, kube_pod_container_resource_limits, kube_pod_status_phase from kube-state-metrics
  • Workload resolution: kube_pod_owner (pod → ReplicaSet) + kube_replicaset_owner (RS → Deployment) chain
  • CPU usage: rate(container_cpu_usage_seconds_total[5m]) quantiles via quantile_over_time
  • Memory usage: container_memory_working_set_bytes quantiles via quantile_over_time
  • Memory trend: deriv(container_memory_working_set_bytes[window]) * 3600 (bytes/hour)
  • HPA config: autoscaling/v2 HorizontalPodAutoscaler resources via Kubernetes API
  • Values files: Helm values read directly from git via shallow clone
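
The CPU quantile query plausibly composes the pieces listed above with a PromQL subquery. A hedged sketch (the helper name, label selectors, and the 5m subquery resolution are assumptions, not kri's actual query construction):

```go
package main

import "fmt"

// cpuQuantileQuery builds a PromQL expression that takes a 5m rate of
// container_cpu_usage_seconds_total and feeds it to quantile_over_time
// via a subquery spanning the observation window.
func cpuQuantileQuery(q float64, namespace, pod, window string) string {
	return fmt.Sprintf(
		"quantile_over_time(%.2f, rate(container_cpu_usage_seconds_total{namespace=%q,pod=%q}[5m])[%s:5m])",
		q, namespace, pod, window)
}

func main() {
	// → quantile_over_time(0.95, rate(container_cpu_usage_seconds_total{namespace="default",pod="api-7c9f"}[5m])[7d:5m])
	fmt.Println(cpuQuantileQuery(0.95, "default", "api-7c9f", "7d"))
}
```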

kri-operator

kri logic is automated by kri-operator, a Kubernetes operator that runs the inspect → plan → apply workflow on a schedule and posts rollback diagnosis reports to Slack. The CLI remains fully functional as a developer and debug interface into the same underlying logic.

