Documentation
Overview
Command bench_mamba benchmarks Mamba-3 SSM vs Transformer attention decode throughput using synthetic FLOPs-based timing estimates. No GPU required.
Mamba-3 SSM has O(1) per-token decode cost (state recurrence), while Transformer attention has O(n) cost per token (KV cache scan grows with sequence length). This benchmark quantifies the throughput advantage at sequence lengths 512, 2048, and 8192.
Usage:
bench_mamba [--layers 24] [--d-model 2048] [--d-state 16] [--d-inner 4096] [--heads 16] [--head-dim 128] [--gpu-tflops 150]
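The estimate can be sketched with a simple cost model: SSM decode does a constant amount of work per token (a state update of roughly layers × d_inner × d_state multiply-accumulates), while attention decode scans the KV cache, costing roughly layers × 2 × heads × head_dim × seq_len multiply-accumulates per token. The formulas and constants below are illustrative assumptions, not the tool's exact model; they use the flag defaults shown above.

```go
package main

import "fmt"

func main() {
	// Flag defaults from the usage line above.
	const (
		layers  = 24
		dInner  = 4096
		dState  = 16
		heads   = 16
		headDim = 128
		tflops  = 150.0 // assumed effective GPU throughput, in TFLOP/s
	)

	// Assumed per-token decode cost (2 FLOPs per multiply-accumulate):
	// SSM state recurrence is O(1) in sequence length.
	ssmFlops := float64(2 * layers * dInner * dState)

	for _, seqLen := range []int{512, 2048, 8192} {
		// Attention reads K and V for every cached position: O(seqLen).
		attnFlops := float64(2 * layers * 2 * heads * headDim * seqLen)

		// tokens/sec = device FLOP/s ÷ FLOPs per token
		ssmTPS := tflops * 1e12 / ssmFlops
		attnTPS := tflops * 1e12 / attnFlops

		fmt.Printf("seq=%5d  ssm=%.3e tok/s  attn=%.3e tok/s  speedup=%.0fx\n",
			seqLen, ssmTPS, attnTPS, ssmTPS/attnTPS)
	}
}
```

Under this toy model the SSM's advantage grows linearly with sequence length, which is why the benchmark samples 512, 2048, and 8192: the gap at 8192 is 16× the gap at 512.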