## Overview

The LLM API Benchmark Tool is a flexible Go-based utility designed to measure and analyze the performance of OpenAI-compatible API endpoints across different concurrency levels. It provides in-depth insight into API throughput, generation speed, and token-processing behavior.
## Key Features

- Dynamic concurrency testing
- Comprehensive performance metrics
- Flexible configuration
- Markdown result reporting
- Compatible with any OpenAI-like API
- Dynamic input prompts of arbitrary length
## Performance Metrics

### Generation Throughput
- Measures tokens generated per second
- Calculated across multiple concurrency levels

### Prompt Throughput
- Measures input-token processing speed
- Helps gauge the API's prompt-handling efficiency

### Time to First Token (TTFT)
- Measures initial response latency
- Reports both the minimum and maximum TTFT observed
- Critical for understanding real-time responsiveness (see the measurement sketch below)
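The snippet below is an illustrative sketch, not the tool's actual source: it measures TTFT and generation throughput for a single streaming request using the go-openai client (`github.com/sashabaranov/go-openai`). The endpoint URL, API key, and the chunk-per-token approximation are assumptions made for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"time"

	openai "github.com/sashabaranov/go-openai"
)

// measureRequest issues one streaming chat completion and returns the
// time to first token and the generation throughput (tokens/s).
// Tokens are approximated by counting stream chunks, which is close
// enough for a sketch; the real tool may count tokens differently.
func measureRequest(client *openai.Client, model, prompt string, maxTokens int) (ttft time.Duration, tokensPerSec float64, err error) {
	ctx := context.Background()
	start := time.Now()

	stream, err := client.CreateChatCompletionStream(ctx, openai.ChatCompletionRequest{
		Model:     model,
		MaxTokens: maxTokens,
		Stream:    true,
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleUser, Content: prompt},
		},
	})
	if err != nil {
		return 0, 0, err
	}
	defer stream.Close()

	var tokens int
	for {
		resp, recvErr := stream.Recv()
		if errors.Is(recvErr, io.EOF) {
			break
		}
		if recvErr != nil {
			return 0, 0, recvErr
		}
		if len(resp.Choices) > 0 && resp.Choices[0].Delta.Content != "" {
			if tokens == 0 {
				ttft = time.Since(start) // first token arrived
			}
			tokens++
		}
	}

	elapsed := time.Since(start).Seconds()
	if elapsed > 0 {
		tokensPerSec = float64(tokens) / elapsed
	}
	return ttft, tokensPerSec, nil
}

func main() {
	cfg := openai.DefaultConfig("YOUR_API_KEY")
	cfg.BaseURL = "https://your-api-endpoint.com/v1" // assumed endpoint
	client := openai.NewClientWithConfig(cfg)

	ttft, tps, err := measureRequest(client, "gpt-3.5-turbo", "Tell me a long story.", 512)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("TTFT: %.2fs, generation throughput: %.2f tokens/s\n", ttft.Seconds(), tps)
}
```

At higher concurrency levels, the tool runs requests like this in parallel and aggregates the per-request measurements, which is why throughput figures rise with concurrency in the example below.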
## Example Output

- Input Tokens: 45
- Output Tokens: 512
- Test Model: Qwen2.5-7B-Instruct-AWQ
- Latency: 2.20 ms
| Concurrency | Generation Throughput (tokens/s) | Prompt Throughput (tokens/s) | Min TTFT (s) | Max TTFT (s) |
|-------------|----------------------------------|------------------------------|--------------|--------------|
| 1           | 58.49                            | 846.81                       | 0.05         | 0.05         |
| 2           | 114.09                           | 989.94                       | 0.08         | 0.09         |
| 4           | 222.62                           | 1193.99                      | 0.11         | 0.15         |
| 8           | 414.35                           | 1479.76                      | 0.11         | 0.24         |
| 16          | 752.26                           | 1543.29                      | 0.13         | 0.47         |
| 32          | 653.94                           | 1625.07                      | 0.14         | 0.89         |
## Usage

### Minimal Configuration

Linux:

```bash
./llmapibenchmark_linux_amd64 --base-url https://your-api-endpoint.com/v1
```

Windows:

```bat
llmapibenchmark_windows_amd64.exe --base-url https://your-api-endpoint.com/v1
```

### Full Configuration

Linux:

```bash
./llmapibenchmark_linux_amd64 \
  --base-url https://your-api-endpoint.com/v1 \
  --api-key YOUR_API_KEY \
  --model gpt-3.5-turbo \
  --concurrency 1,2,4,8,16 \
  --max-tokens 512 \
  --num-words 513 \
  --prompt "Your custom prompt here" \
  --format json
```

Windows:

```bat
llmapibenchmark_windows_amd64.exe ^
  --base-url https://your-api-endpoint.com/v1 ^
  --api-key YOUR_API_KEY ^
  --model gpt-3.5-turbo ^
  --concurrency 1,2,4,8,16 ^
  --max-tokens 512 ^
  --num-words 513 ^
  --prompt "Your custom prompt here" ^
  --format json
```
## Command-Line Parameters

| Parameter | Short | Description | Default | Required |
|-----------|-------|-------------|---------|----------|
| `--base-url` | `-u` | Base URL for the LLM API endpoint | Empty (must be specified) | Yes |
| `--api-key` | `-k` | API authentication key | None | No |
| `--model` | `-m` | Specific model to test | First available model, discovered automatically | No |
| `--concurrency` | `-c` | Comma-separated concurrency levels to test | `1,2,4,8,16,32,64,128` | No |
| `--max-tokens` | `-t` | Maximum tokens to generate per request | `512` | No |
| `--num-words` | `-n` | Number of words for the random input prompt | `0` | No |
| `--prompt` | `-p` | Text prompt for generating responses | A long story | No |
| `--format` | `-f` | Output format (`json`, `yaml`) | `""` | No |
| `--help` | `-h` | Show help message | `false` | No |
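For orientation, here is a hypothetical sketch (not the tool's real implementation) of how a subset of these flags, including the comma-separated concurrency list, could be handled with Go's standard `flag` package:

```go
package main

import (
	"flag"
	"fmt"
	"strconv"
	"strings"
)

func main() {
	// Register each option under its long and short name; the
	// remaining flags from the table would follow the same pattern.
	var baseURL, concurrency string
	var maxTokens int
	for _, n := range []string{"base-url", "u"} {
		flag.StringVar(&baseURL, n, "", "Base URL for LLM API endpoint")
	}
	for _, n := range []string{"concurrency", "c"} {
		flag.StringVar(&concurrency, n, "1,2,4,8,16,32,64,128", "Comma-separated concurrency levels")
	}
	for _, n := range []string{"max-tokens", "t"} {
		flag.IntVar(&maxTokens, n, 512, "Maximum tokens per request")
	}
	flag.Parse()

	if baseURL == "" {
		fmt.Println("--base-url is required")
		return
	}

	// Parse the comma-separated concurrency list into integers.
	var levels []int
	for _, s := range strings.Split(concurrency, ",") {
		v, err := strconv.Atoi(strings.TrimSpace(s))
		if err != nil {
			fmt.Println("invalid concurrency level:", s)
			return
		}
		levels = append(levels, v)
	}
	fmt.Println("levels:", levels, "max tokens:", maxTokens)
}
```

Go's `flag` package accepts both `-name` and `--name` forms, so the long options shown in the table parse as written.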
## Output

The tool provides output in multiple formats, controlled by the `--format` flag.

### Default (CLI Table and Markdown File)

If no format is specified, the tool generates:
- Real-time console results: a table displayed in the terminal with live updates.
- A Markdown file: a detailed report saved to `API_Throughput_{ModelName}.md`.

Markdown file columns:
- Concurrency: number of concurrent requests
- Generation Throughput: tokens generated per second
- Prompt Throughput: input-token processing speed
- Min TTFT: minimum time to first token
- Max TTFT: maximum time to first token

### JSON and YAML

With `--format json`, results are printed to the console as JSON; with `--format yaml`, they are printed as YAML.
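The exact JSON schema is not documented in this section, so the struct below is hypothetical: it assumes one result object per concurrency level, with field names mirroring the table columns. A downstream consumer in Go might look like this:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// BenchmarkResult mirrors the table columns; the actual field names
// emitted by the tool may differ -- these are assumptions.
type BenchmarkResult struct {
	Concurrency          int     `json:"concurrency"`
	GenerationThroughput float64 `json:"generation_throughput"`
	PromptThroughput     float64 `json:"prompt_throughput"`
	MinTTFT              float64 `json:"min_ttft"`
	MaxTTFT              float64 `json:"max_ttft"`
}

func main() {
	// Read JSON results from stdin, e.g.:
	//   ./llmapibenchmark_linux_amd64 -u ... -f json | ./this-program
	var results []BenchmarkResult
	if err := json.NewDecoder(os.Stdin).Decode(&results); err != nil {
		fmt.Fprintln(os.Stderr, "decode failed:", err)
		os.Exit(1)
	}
	for _, r := range results {
		fmt.Printf("concurrency %d: %.2f tokens/s\n", r.Concurrency, r.GenerationThroughput)
	}
}
```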
## Best Practices

- Test with various prompt lengths and complexities
- Compare different models
- Monitor for consistent performance across repeated runs
- Be mindful of API rate limits
- Use `--num-words` to control input length (see the sketch below)
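As an illustration of the `--num-words` idea, here is a minimal sketch of generating a random prompt with a given word count; the vocabulary and sampling method are assumptions, not the tool's actual wordlist:

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// randomPrompt builds a prompt of numWords words drawn from a small
// sample vocabulary. The real tool's wordlist and sampling method may
// differ; this only illustrates the idea behind --num-words.
func randomPrompt(numWords int) string {
	vocab := []string{"story", "mountain", "river", "quantum", "journey", "ancient", "machine", "garden"}
	words := make([]string, numWords)
	for i := range words {
		words[i] = vocab[rand.Intn(len(vocab))]
	}
	return strings.Join(words, " ")
}

func main() {
	fmt.Println(randomPrompt(12))
}
```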
## Limitations

- Requires an active API connection
- Results may vary with network conditions
- Does not simulate complex real-world scenarios

## Disclaimer

This tool is intended for performance analysis and should be used responsibly, in compliance with your API provider's usage policies.