Documentation
Overview
Package benchmark provides a standardized benchmark suite for measuring ML model inference performance: decode throughput (tok/s), prefill throughput (tok/s), memory usage, and time to first token.
Functions
func ResultsJSON
func ResultsJSON(results []BenchmarkResult) ([]byte, error)
ResultsJSON returns the benchmark results as a JSON byte slice.
Types
type BenchmarkResult
type BenchmarkResult struct {
	ModelName           string  `json:"model_name"`
	Quantization        string  `json:"quantization"`
	BatchSize           int     `json:"batch_size"`
	DecodeTokensPerSec  float64 `json:"decode_tokens_per_sec"`
	PrefillTokensPerSec float64 `json:"prefill_tokens_per_sec"`
	MemoryUsageMB       float64 `json:"memory_usage_mb"`
	TimeToFirstTokenMS  float64 `json:"time_to_first_token_ms"`
	Timestamp           string  `json:"timestamp"`
}
BenchmarkResult holds the metrics from a single benchmark configuration.
func RunB
func RunB(b *testing.B, cfg Config, infer InferenceFunc) []BenchmarkResult
RunB is a helper for integrating with Go's testing.B. It creates a suite and runs it within the benchmark function, reporting decode tok/s as the benchmark metric.
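How RunB reports its metric internally is not shown on this page. As a rough sketch of the general pattern, the standard library lets a benchmark attach a custom metric with b.ReportMetric. In the code below, infer, benchDecode, and the unit name "decode_tok/s" are all hypothetical, and averaging over b.N runs is an assumption rather than documented behavior.

```go
package main

import (
	"context"
	"fmt"
	"testing"
)

// RunMetrics is trimmed to the one field this sketch needs.
type RunMetrics struct {
	DecodeTokensPerSec float64
}

// infer is a hypothetical inference call that returns a fixed measurement
// instead of running a model.
func infer(ctx context.Context) (RunMetrics, error) {
	return RunMetrics{DecodeTokensPerSec: 42}, nil
}

// benchDecode runs infer b.N times and reports the mean decode throughput
// as a custom benchmark metric.
func benchDecode(b *testing.B) {
	ctx := context.Background()
	var total float64
	for i := 0; i < b.N; i++ {
		m, err := infer(ctx)
		if err != nil {
			b.Fatal(err)
		}
		total += m.DecodeTokensPerSec
	}
	b.ReportMetric(total/float64(b.N), "decode_tok/s")
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	res := testing.Benchmark(benchDecode)
	fmt.Printf("%.1f decode_tok/s\n", res.Extra["decode_tok/s"])
}
```

In a real test file, the same body would live in a func BenchmarkXxx(b *testing.B) and be run with go test -bench.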
type Config
type Config struct {
	Models        []ModelSpec `json:"models"`
	Quantizations []string    `json:"quantizations"`
	BatchSizes    []int       `json:"batch_sizes"`
	WarmupRuns    int         `json:"warmup_runs"`
	BenchmarkRuns int         `json:"benchmark_runs"`
}
Config controls what the benchmark suite measures.
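Because Config carries JSON tags, a configuration can plausibly be kept in a JSON file. The sketch below decodes one and counts how many (model, quantization, batch size) combinations it describes. The types are local copies, parseConfig and combinations are hypothetical helpers, and the model name, path, and quantization labels are invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented types.
type ModelSpec struct {
	Path         string `json:"path"`
	Name         string `json:"name"`
	Architecture string `json:"architecture"`
}

type Config struct {
	Models        []ModelSpec `json:"models"`
	Quantizations []string    `json:"quantizations"`
	BatchSizes    []int       `json:"batch_sizes"`
	WarmupRuns    int         `json:"warmup_runs"`
	BenchmarkRuns int         `json:"benchmark_runs"`
}

// exampleConfig holds made-up values that only demonstrate the field names.
const exampleConfig = `{
  "models": [{"path": "/models/example", "name": "example-model", "architecture": "example-arch"}],
  "quantizations": ["q4", "q8"],
  "batch_sizes": [1, 4, 16],
  "warmup_runs": 2,
  "benchmark_runs": 5
}`

// parseConfig decodes a JSON document into a Config.
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// combinations returns the number of (model, quantization, batch size)
// triples described by cfg.
func combinations(cfg Config) int {
	return len(cfg.Models) * len(cfg.Quantizations) * len(cfg.BatchSizes)
}

func main() {
	cfg, err := parseConfig([]byte(exampleConfig))
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("configurations to benchmark:", combinations(cfg))
}
```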
type InferenceFunc
type InferenceFunc func(ctx context.Context, model ModelSpec, quantization string, batchSize int) (RunMetrics, error)
InferenceFunc is the function signature that the suite calls to run a single inference benchmark. Implementations should return metrics for one run of the given model, quantization, and batch size.
type ModelSpec
type ModelSpec struct {
	Path         string `json:"path"`
	Name         string `json:"name"`
	Architecture string `json:"architecture"`
}
ModelSpec identifies a model to benchmark.
type RunMetrics
type RunMetrics struct {
	DecodeTokensPerSec  float64
	PrefillTokensPerSec float64
	MemoryUsageMB       float64
	TimeToFirstTokenMS  float64
}
RunMetrics holds the raw measurements from a single inference run.
type Suite
type Suite struct {
	// contains filtered or unexported fields
}
Suite orchestrates running standardized benchmarks across all combinations of models, quantizations, and batch sizes.