benchmark

package
v1.38.1
Published: Mar 30, 2026 License: Apache-2.0 Imports: 7 Imported by: 0

Documentation

Overview

Package benchmark provides a standardized benchmark suite for measuring ML model inference performance: tok/s decode, tok/s prefill, memory usage, and time to first token.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ResultsJSON

func ResultsJSON(results []BenchmarkResult) ([]byte, error)

ResultsJSON returns the benchmark results as a JSON byte slice.

Types

type BenchmarkResult

type BenchmarkResult struct {
	ModelName           string  `json:"model_name"`
	Quantization        string  `json:"quantization"`
	BatchSize           int     `json:"batch_size"`
	DecodeTokensPerSec  float64 `json:"decode_tokens_per_sec"`
	PrefillTokensPerSec float64 `json:"prefill_tokens_per_sec"`
	MemoryUsageMB       float64 `json:"memory_usage_mb"`
	TimeToFirstTokenMS  float64 `json:"time_to_first_token_ms"`
	Timestamp           string  `json:"timestamp"`
}

BenchmarkResult holds the metrics from a single benchmark configuration.

func RunB

func RunB(b *testing.B, cfg Config, infer InferenceFunc) []BenchmarkResult

RunB is a helper for integrating with Go's testing.B. It creates a suite and runs it within the benchmark function, reporting decode tok/s as the benchmark metric.

type Config

type Config struct {
	Models        []ModelSpec `json:"models"`
	Quantizations []string    `json:"quantizations"`
	BatchSizes    []int       `json:"batch_sizes"`
	WarmupRuns    int         `json:"warmup_runs"`
	BenchmarkRuns int         `json:"benchmark_runs"`
}

Config controls what the benchmark suite measures.

func (Config) Validate

func (c Config) Validate() error

Validate checks that the configuration is well-formed.

type InferenceFunc

type InferenceFunc func(ctx context.Context, model ModelSpec, quantization string, batchSize int) (RunMetrics, error)

InferenceFunc is the function signature that the suite calls to run a single inference benchmark. Implementations should return metrics for one run of the given model, quantization, and batch size.

type ModelSpec

type ModelSpec struct {
	Path         string `json:"path"`
	Name         string `json:"name"`
	Architecture string `json:"architecture"`
}

ModelSpec identifies a model to benchmark.

type RunMetrics

type RunMetrics struct {
	DecodeTokensPerSec  float64
	PrefillTokensPerSec float64
	MemoryUsageMB       float64
	TimeToFirstTokenMS  float64
}

RunMetrics holds the raw measurements from a single inference run.

type Suite

type Suite struct {
	// contains filtered or unexported fields
}

Suite orchestrates running standardized benchmarks across all combinations of models, quantizations, and batch sizes.

func NewSuite

func NewSuite(cfg Config, infer InferenceFunc) (*Suite, error)

NewSuite creates a benchmark suite with the given configuration and inference function.

func (*Suite) Run

func (s *Suite) Run(ctx context.Context) ([]BenchmarkResult, error)

Run executes all model × quantization × batch-size combinations, performing warmup runs followed by benchmark runs. It returns one BenchmarkResult per combination with mean metrics across the benchmark runs.
