bench_mamba

command
v1.26.2
Published: Mar 27, 2026 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Overview

Command bench_mamba benchmarks Mamba-3 SSM vs Transformer attention decode throughput using synthetic FLOPs-based timing estimates. No GPU required.

Mamba-3 SSM has O(1) per-token decode cost (a fixed-size state recurrence), while Transformer attention has O(n) per-token cost (each decode step scans a KV cache that grows with sequence length). This benchmark quantifies the resulting throughput advantage at sequence lengths 512, 2048, and 8192.

Usage:

bench_mamba [--layers 24] [--d-model 2048] [--d-state 16] [--d-inner 4096] [--heads 16] [--head-dim 128] [--gpu-tflops 150]
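The O(1)-vs-O(n) gap can be sketched with a simple FLOPs-based timing model. The program below is a minimal illustration, not the tool's actual model: the constants mirror the flag defaults in the usage line, but the per-token FLOP formulas (a d_inner x d_state state update for the SSM; QK^T and AV cache scans for attention) and the idealized time = FLOPs / (TFLOPs * 1e12) conversion are assumptions made for this sketch.

```go
package main

import "fmt"

// Defaults taken from the usage line above; the FLOP formulas
// themselves are illustrative assumptions.
const (
	layers    = 24
	dInner    = 4096
	dState    = 16
	heads     = 16
	headDim   = 128
	gpuTFLOPs = 150.0
)

// ssmDecodeFLOPs models the per-token state-recurrence cost:
// one multiply-add per (dInner, dState) entry per layer,
// independent of sequence position.
func ssmDecodeFLOPs() float64 {
	return float64(layers) * 2 * float64(dInner) * float64(dState)
}

// attnDecodeFLOPs models the per-token attention cost at position n:
// QK^T and AV each scan the KV cache, so cost grows linearly with n.
func attnDecodeFLOPs(n int) float64 {
	return float64(layers) * 4 * float64(heads) * float64(headDim) * float64(n)
}

func main() {
	for _, n := range []int{512, 2048, 8192} {
		tSSM := ssmDecodeFLOPs() / (gpuTFLOPs * 1e12)
		tAttn := attnDecodeFLOPs(n) / (gpuTFLOPs * 1e12)
		fmt.Printf("n=%5d  ssm=%.3gs/tok  attn=%.3gs/tok  ratio=%.0fx\n",
			n, tSSM, tAttn, tAttn/tSSM)
	}
}
```

Because the SSM term is constant while the attention term scales with n, the modeled throughput ratio doubles with each doubling of sequence length.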
