Litmus: Deterministic Alertmanager Validation

Stop hoping your alerts work. Start proving they do.
Litmus is a testing and validation framework for Prometheus Alertmanager configurations. It brings software engineering rigor to alert routing through automated regression snapshots, behavioral unit tests, and static analysis.
Features
✅ Interactive Web UI — Explore routes, run tests, and view diffs in your browser
✅ Regression Testing — Detect unintended routing changes
✅ Behavioral Tests — Verify alert logic under real-world conditions
✅ Static Analysis — Catch shadowed routes and circular inhibitions
✅ CI/CD Ready — Fast validation, exit codes, clear reporting
✅ Production Parity — Uses official Alertmanager libraries
Quick Start
# Initialize workspace
litmus init
# Create regression baseline
litmus snapshot capture
# Validate configuration
litmus check
# See what changed
litmus diff
Interactive UI
Litmus includes a web-based UI for interactive testing and exploration.
# Start the web server
litmus serve
This launches a local web server that provides:
- Route Explorer — Interactively trace how an alert is routed.
- Test Lab — Run behavioral and regression tests from your browser.
- Diff Viewer — See regression test failures in a rich UI.
For a full guide, see the Interactive UI Guide.
Installation
Script (curl)
curl -sSL https://raw.githubusercontent.com/nyambati/litmus/main/scripts/install.sh | sh
Docker
docker run ghcr.io/nyambati/litmus:latest litmus check
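A container cannot see files on the host unless they are mounted, so the workspace has to be made available to the image. The wrapper below is a minimal sketch, assuming the image resolves `litmus.yaml` and the Alertmanager config relative to its working directory. The `/work` mount point and the `litmus_docker` helper name are illustrative and not part of Litmus.

```shell
# Hypothetical helper: run any litmus subcommand inside the container
# against the current directory.
litmus_docker() {
    # --rm    remove the container when the command finishes
    # -v      mount the current directory at /work (arbitrary path)
    # -w      make /work the working directory inside the container
    docker run --rm \
        -v "$PWD:/work" \
        -w /work \
        ghcr.io/nyambati/litmus:latest litmus "$@"
}
```

With the helper defined, `litmus_docker check` runs the same validation as the plain `litmus check` shown in Quick Start.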
Example Workflow
# 1. Initialize
$ litmus init
Created litmus.yaml, tests/, .gitattributes
# 2. Create baseline
$ litmus snapshot capture
✓ Generated baseline: regressions/regressions.litmus.yml
✓ Archived: regressions/<timestamp>.mpk
# 3. Write a test
$ cat > tests/critical-alert.yml << 'EOF'
- name: "Critical alerts reach on-call"
  system_state:
    active_alerts: []
  alert:
    labels:
      severity: critical
      team: database
  expect:
    receivers: [database-oncall]
EOF
# 4. Validate
$ litmus check
Litmus Check: alertmanager.yaml
--------------------------------------------------
1. Sanity (Static Analysis)
[OK] No shadowed routes detected
2. Regressions (Automated)
[PASS] 42/42 cases passed
3. Behavioral (Unit Tests)
[PASS] 1/1 unit tests passed
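The validation step can also gate a CI job. The function below is a sketch under one assumption: that `litmus check` exits with a non-zero status when any of the three layers fails, as the "exit codes" feature suggests. The `ci_gate` name is illustrative.

```shell
# Hypothetical CI wrapper around `litmus check`.
# Assumption: a non-zero exit status means at least one layer failed.
ci_gate() {
    rc=0
    litmus check || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "litmus check failed (exit code $rc); routing changes follow:" >&2
        # Print the routing diff to help triage; its own status is ignored.
        litmus diff >&2 || true
    fi
    # Propagate the original status so the CI job fails when the check does.
    return "$rc"
}
```

Calling `ci_gate` as the last command of a pipeline step makes the step fail whenever the check does.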
Design Philosophy
Parity with Production
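Litmus evaluates routing with the official Prometheus Alertmanager libraries, so an alert that resolves to a receiver in Litmus should resolve to the same receiver in production, provided both use the same configuration and a compatible Alertmanager version.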
Three Layers of Validation
- Regression — Catch accidental routing changes
- Behavioral — Verify intent under specific conditions
- Sanity — Find dead code and logical errors
Shift Left
Catch alert routing errors during development, not in production.
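One way to shift the check left is a Git pre-commit hook. The sketch below assumes the file names that appear elsewhere in this README (`alertmanager.yaml`, `litmus.yaml`, `tests/`) and that `litmus check` exits non-zero on failure. The `pre_commit_check` function name is illustrative.

```shell
# Hypothetical pre-commit logic: run `litmus check` only when a staged
# file could affect alert routing.
pre_commit_check() {
    # List the paths staged for this commit.
    staged=$(git diff --cached --name-only) || return $?
    if printf '%s\n' "$staged" |
        grep -Eq '(^|/)alertmanager\.ya?ml$|^tests/|^litmus\.yaml$'; then
        # A failing check returns non-zero, which aborts the commit.
        litmus check
    else
        echo "litmus: no alerting files staged, skipping"
    fi
}
```

Saved in an executable `.git/hooks/pre-commit` file with `pre_commit_check` as its last line, this blocks commits that fail validation.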
Use Cases
"Did I break anything?"
litmus diff # See exactly what changed
litmus check # Validate the change
"How do I know my alerts work?"
Write behavioral tests. Litmus verifies routing and silencing logic.
"Can I safely refactor my config?"
Refactor, review the resulting routing changes with litmus diff, accept the intended ones with litmus snapshot update, then confirm with litmus check.
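Accepting a new baseline is safest when it follows a review of the diff. The function below is a sketch of that sequence using only the subcommands shown in this README. The `accept_changes` name and the confirmation prompt are illustrative and not part of Litmus.

```shell
# Hypothetical helper: show the routing diff, ask for confirmation,
# and only then update the baseline and re-validate.
accept_changes() {
    # Show what changed; the exit status of `diff` is not relied on here.
    litmus diff || true
    printf 'Accept these routing changes as the new baseline? [y/N] '
    read -r answer || answer=""
    case "$answer" in
        y|Y) ;;
        *) echo "baseline left untouched"; return 1 ;;
    esac
    # Record the new baseline, then confirm everything passes against it.
    litmus snapshot update || return $?
    litmus check
}
```

Anything other than `y` leaves the existing baseline in place, so an unreviewed change cannot silently become the new reference.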
Project Status
Version: 0.2.0-alpha
Status: Active Development
See Backlog for planned features.
Contributing
Contributions welcome! See Engineering Standards.
License
MIT License. See LICENSE file.
Made with ❤️
By the Litmus team. Part of the observability ecosystem.