bench_mamba

command
v1.26.2
Published: Mar 27, 2026 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Overview

Command bench_mamba benchmarks Mamba-3 SSM vs Transformer attention decode throughput using synthetic FLOPs-based timing estimates. No GPU required.

Mamba-3 SSM has O(1) per-token decode cost (a fixed-size state recurrence), while Transformer attention has O(n) per-token cost (each decode step scans a KV cache that grows with sequence length). This benchmark quantifies the resulting throughput advantage at sequence lengths 512, 2048, and 8192.

Usage:

bench_mamba [--layers 24] [--d-model 2048] [--d-state 16] [--d-inner 4096] [--heads 16] [--head-dim 128] [--gpu-tflops 150]
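The O(1)-vs-O(n) gap can be sketched with a simple FLOPs-based timing model. The program below is a minimal illustration, not the tool's actual model: the constants mirror the flag defaults in the usage line, but the per-token FLOP formulas (a d_inner x d_state state update for the SSM; QK^T and AV cache scans for attention) and the idealized time = FLOPs / (TFLOPs * 1e12) conversion are assumptions made for this sketch.

```go
package main

import "fmt"

// Defaults taken from the usage line above; the FLOP formulas
// themselves are illustrative assumptions.
const (
	layers    = 24
	dInner    = 4096
	dState    = 16
	heads     = 16
	headDim   = 128
	gpuTFLOPs = 150.0
)

// ssmDecodeFLOPs models the per-token state-recurrence cost:
// one multiply-add per (dInner, dState) entry per layer,
// independent of sequence position.
func ssmDecodeFLOPs() float64 {
	return float64(layers) * 2 * float64(dInner) * float64(dState)
}

// attnDecodeFLOPs models the per-token attention cost at position n:
// QK^T and AV each scan the KV cache, so cost grows linearly with n.
func attnDecodeFLOPs(n int) float64 {
	return float64(layers) * 4 * float64(heads) * float64(headDim) * float64(n)
}

func main() {
	for _, n := range []int{512, 2048, 8192} {
		tSSM := ssmDecodeFLOPs() / (gpuTFLOPs * 1e12)
		tAttn := attnDecodeFLOPs(n) / (gpuTFLOPs * 1e12)
		fmt.Printf("n=%5d  ssm=%.3gs/tok  attn=%.3gs/tok  ratio=%.0fx\n",
			n, tSSM, tAttn, tAttn/tSSM)
	}
}
```

Because the SSM term is constant while the attention term scales with n, the modeled throughput ratio doubles with each doubling of sequence length.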
