catpaw
catpaw is a lightweight monitoring agent with AI-powered diagnostics.
It detects anomalies through plugin-based checks, produces standardized events, and, when an alert fires, can automatically trigger AI root-cause analysis using 70+ built-in diagnostic tools.
Events can be forwarded to any alert platform (Flashduty, PagerDuty, or any HTTP endpoint), or simply printed to the console for quick validation.
Key Features
- Lightweight, zero heavy dependencies: single binary, easy to deploy
- Plugin-based monitoring: 25+ check plugins, enable only what you need
- AI-powered diagnosis: automatic root-cause analysis triggered by alerts
- Interactive AI chat: troubleshoot issues conversationally with AI + tools
- Proactive health inspection: on-demand AI-driven health checks
- 70+ diagnostic tools: system, network, storage, security, process, kernel
- Flexible notification: console, generic WebAPI, Flashduty, PagerDuty, or any combination
- Self-monitoring friendly: ideal for monitoring your monitoring systems
Architecture Overview
+-----------------------------------------------------------------+
|                          catpaw agent                           |
|                                                                 |
|  +-------------+   alert   +--------------+    AI + Tools       |
|  |  25+ Check  | --------> |  AI Diagnose | ----------+         |
|  |   Plugins   |  trigger  |    Engine    |           |         |
|  +------+------+           +--------------+           |         |
|         |                                             v         |
|         | events           +--------------+   +---------------+ |
|         +----------------> |  Notifiers   |   | 70+ Diagnose  | |
|                            |  (multiple)  |   |     Tools     | |
|                            +--------------+   +---------------+ |
|                                                                 |
|  +-------------+                                                |
|  |   AI Chat   | <---- interactive troubleshoot                 |
|  |    (CLI)    |                                                |
|  +-------------+                                                |
+-----------------------------------------------------------------+
Check Plugins
| Plugin | Description |
| --- | --- |
| cert | TLS certificate expiry check (remote TLS + local files; STARTTLS, SNI, glob) |
| conntrack | Linux conntrack table usage; prevents silent packet drops |
| cpu | CPU utilization and per-core normalized load average |
| disk | Disk space, inode, and writability check |
| dns | DNS resolution check |
| docker | Docker container monitoring (state, restart, health, CPU/mem) |
| exec | Run scripts/commands to produce events (JSON and Nagios modes) |
| filecheck | File existence, mtime, and checksum check |
| filefd | System-level file descriptor usage (Linux) |
| http | HTTP availability, status code, response body, cert expiry |
| journaltail | Incremental journalctl log reading with keyword matching (Linux) |
| logfile | Log file monitoring (offset tracking, rotation, glob, multi-encoding) |
| mem | Memory and swap usage check |
| mount | Mount point baseline (fs type, options compliance; Linux) |
| neigh | ARP/neighbor table usage; prevents new-IP failures (K8s) |
| net | TCP/UDP connectivity and response time |
| netif | Network interface health (link state, error/drop delta; Linux) |
| ntp | NTP sync, clock offset, stratum (Linux) |
| ping | ICMP reachability, packet loss, latency |
| procfd | Per-process fd usage; prevents nofile exhaustion |
| procnum | Process count check (multiple lookup methods) |
| redis | Redis monitoring for standalone, master/replica, and Redis Cluster; includes Redis-specific AI diagnosis tools |
| redis_sentinel | Redis Sentinel monitoring for quorum, master reachability from Sentinel's view, and Sentinel-specific AI diagnosis tools |
| scriptfilter | Script output filter-rule matching |
| secmod | SELinux/AppArmor baseline (Linux) |
| sockstat | TCP listen queue overflow detection (Linux) |
| sysctl | Kernel parameter baseline; detects silent resets (Linux) |
| systemd | systemd service status (Linux) |
| tcpstate | TCP state monitoring (CLOSE_WAIT/TIME_WAIT; Netlink; Linux) |
| uptime | Unexpected reboot detection |
| zombie | Zombie process detection |
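The exec plugin's Nagios mode suggests that any script following the usual Nagios plugin convention (one status line on stdout, exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN) can feed events into catpaw. The sketch below is a hypothetical disk-usage check written in that convention; the thresholds, function names, and output wording are illustrative assumptions, and how catpaw maps the output to event fields is not covered here (see the exec plugin's own docs).

```python
#!/usr/bin/env python3
"""Hypothetical Nagios-style check: disk usage of one path."""
import shutil
import sys

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_pct, warn=80.0, crit=90.0):
    """Map a usage percentage to an exit code (thresholds are illustrative)."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check(path="/"):
    """Return (exit_code, status_line) for the filesystem containing path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN - cannot stat {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"UNKNOWN - {path} reports zero total size"
    used_pct = 100.0 * usage.used / usage.total
    code = classify(used_pct)
    # Text after "|" is Nagios performance data.
    line = f"{LABELS[code]} - {path} is {used_pct:.1f}% used | used_pct={used_pct:.1f}"
    return code, line


def main(argv):
    """Print the status line and return the exit code."""
    code, line = check(argv[1] if len(argv) > 1 else "/")
    print(line)
    return code

# When saved as a standalone script, finish with:
#   sys.exit(main(sys.argv))
```

The exit code carries the severity and the first line of stdout carries the human-readable message, so the same script can also be run by hand to see what the agent would receive.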
When AI diagnosis is triggered (by alert, inspection, or chat), the AI agent has access to a rich toolkit:
- System & Process: CPU top, memory breakdown, OOM history, cgroup limits, process threads (with wchan), open files, environment variables, PSI pressure
- Network: ping, traceroute, DNS resolve, ARP neighbors, TCP connection states, socket details (RTT/cwnd), retransmission rate, connection latency summary, listen queue overflow, TCP tuning check, softnet stats, route table, IP addresses, interface stats, firewall rules
- Storage: disk I/O latency, block device topology, LVM status, mount info
- Kernel & Security: dmesg, interrupts distribution, conntrack stats, NUMA stats, thermal zones, sysctl snapshot, SELinux/AppArmor status, coredump list
- Logs: log tail, log grep (with pattern matching), journald query
- Services: systemd service status, failed services list, timer list, Docker ps/inspect

Remote plugins (Redis, Redis Sentinel, etc.) contribute their own specialized diagnostic tools for deep introspection.
For Redis-specific checks, cluster semantics, and diagnosis tools, see plugins/redis/README.md.
For Redis Sentinel-specific checks, diagnosis tools, and config semantics, see plugins/redis_sentinel/README.md.
CLI Commands
catpaw run [flags] # Start the monitoring agent
catpaw chat [-v] # Interactive AI chat for troubleshooting
catpaw inspect <plugin> [target] # Proactive AI health inspection
catpaw diagnose list|show <id> # View past diagnosis records
catpaw selftest [filter] [-q] # Smoke-test all diagnostic tools
Quick Start
Installation
Download the binary from GitHub Releases.
Basic Monitoring
- Enable plugin configs under conf.d/p.<plugin>/
- Start: ./catpaw run
The default config enables [notify.console], so events are printed to the terminal with colored output; no external service is needed for a quick test.
Event Notification
catpaw supports multiple notification channels. Configure one or more in conf.d/config.toml:
| Channel | Config Section | Description |
| --- | --- | --- |
| Console | [notify.console] | Print events to terminal (enabled by default) |
| WebAPI | [notify.webapi] | Push raw Event JSON to any HTTP endpoint |
| Flashduty | [notify.flashduty] | Forward to Flashduty alert platform |
| PagerDuty | [notify.pagerduty] | Forward to PagerDuty incident management |
Multiple channels can be active simultaneously. For example, you can print to console for debugging while also forwarding to your alert platform.
Console (default, for quick validation):
[notify.console]
enabled = true
WebAPI (push raw Event JSON to any HTTP endpoint):
[notify.webapi]
url = "https://your-service.example.com/api/v1/events"
# method = "POST"
# timeout = "10s"
[notify.webapi.headers]
Authorization = "Bearer ${WEBAPI_TOKEN}"
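Since the WebAPI channel pushes raw Event JSON to an HTTP endpoint, a tiny local receiver can be enough to see what arrives before wiring up a real service. The sketch below uses only the Python standard library and treats the event body as opaque JSON, because the Event schema is not described in this README; `EventHandler`, `start_receiver`, and the in-memory `received` list are hypothetical names for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # events seen so far, kept in memory for the demo


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            event = json.loads(body)
        except ValueError:  # covers invalid JSON and undecodable bytes
            self.send_error(400, "body is not valid JSON")
            return
        received.append(event)  # schema is treated as opaque here
        self.send_response(204)  # accepted, no response body
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence per-request logging


def start_receiver(host="127.0.0.1", port=0):
    """Start the receiver in a background thread; port=0 picks a free port."""
    server = HTTPServer((host, port), EventHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To try it, call `start_receiver(port=8080)` in a Python session and point `url` in `[notify.webapi]` at `http://127.0.0.1:8080/` (the port is an arbitrary choice); each delivered event is then appended to `received`. This sketch does not check the `Authorization` header, which a real endpoint would normally validate.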
Flashduty:
[notify.flashduty]
integration_key = "your-integration-key"
PagerDuty:
[notify.pagerduty]
routing_key = "your-routing-key"
AI Diagnosis (optional)
Add to conf.d/config.toml:
[ai]
enabled = true
model_priority = ["default"]
[ai.models.default]
base_url = "https://api.openai.com/v1"
api_key = "${OPENAI_API_KEY}"
model = "gpt-4o"
Now, when an alert fires, the AI automatically analyzes the root cause using the built-in diagnostic tools.
Interactive Chat
./catpaw chat
Ask questions like "Why is CPU high?" or "Check disk I/O latency". The AI uses diagnostic tools and shell commands (with confirmation) to investigate.
Configuration
- Global config: conf.d/config.toml
- Local override: conf.d/config.local.toml (loaded last, git-ignored, ideal for developer-only changes)
- Plugin configs: conf.d/p.<plugin>/*.toml (multiple files merged on load)
- Top-level load order: config.toml -> other files in conf.d/ -> config.local.toml
- Hot-reload plugin configs with SIGHUP: kill -HUP $(pidof catpaw)
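The top-level load order above can be pictured as a simple layering rule: config.toml first, config.local.toml last, everything else in between. The sketch below only illustrates that rule and is not catpaw's actual loader; the lexical ordering of the middle files and the shallow last-wins merge are assumptions, and both function names are hypothetical.

```python
from pathlib import PurePath

FIRST = "config.toml"
LAST = "config.local.toml"


def top_level_load_order(filenames):
    """Order top-level conf.d/ TOML files: config.toml, the rest, config.local.toml."""
    tomls = [f for f in filenames if PurePath(f).suffix == ".toml"]
    first = [f for f in tomls if f == FIRST]
    last = [f for f in tomls if f == LAST]
    # Assumption: the remaining files load in lexical order.
    middle = sorted(f for f in tomls if f not in (FIRST, LAST))
    return first + middle + last


def merge(layers):
    """Later layers override earlier ones key by key (shallow merge, an assumption)."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


order = top_level_load_order(["config.local.toml", "extra.toml", "config.toml"])
print(order)
```

Under these assumptions, a key set in config.local.toml takes precedence over the same key in config.toml, which matches the "loaded last" description of the local override file.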
Documentation
WeChat: add picobyte and mention catpaw to join the group.