Documentation
Overview
benchrunner is a black-box benchmark harness that compares coding agents by running them against a set of tasks and collecting structured traces.
Usage:
    go run ./bench/cmd/benchrunner/ [flags]
    go build ./bench/cmd/benchrunner/ && ./benchrunner [flags]
Flags:
    --agent string      Filter to a single agent ID (e.g., "deepseekcode-current")
    --task string       Filter to a single task ID (e.g., "ctx-long-readonly")
    --dry-run           Show what would run without executing
    --bench-dir string  Root bench directory (default "bench")
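For readers who want to see how a flag set like this behaves, the following is a minimal sketch using Go's standard flag package. It is not benchrunner's actual source: the options struct, its field names, and the parseOptions helper are hypothetical, and only the flag names, help strings, and the "bench" default come from the list above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options mirrors the four documented flags. The struct and its field
// names are hypothetical; they are not taken from benchrunner's source.
type options struct {
	Agent    string
	Task     string
	DryRun   bool
	BenchDir string
}

// parseOptions parses args (without the program name) into options.
// It is an illustrative helper, assumed for this sketch.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("benchrunner", flag.ContinueOnError)
	fs.StringVar(&o.Agent, "agent", "", "Filter to a single agent ID")
	fs.StringVar(&o.Task, "task", "", "Filter to a single task ID")
	fs.BoolVar(&o.DryRun, "dry-run", false, "Show what would run without executing")
	fs.StringVar(&o.BenchDir, "bench-dir", "bench", "Root bench directory")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		// The flag package has already reported the problem on stderr.
		os.Exit(2)
	}
	fmt.Printf("agent=%q task=%q dry-run=%t bench-dir=%q\n",
		o.Agent, o.Task, o.DryRun, o.BenchDir)
}
```

The standard flag package treats one and two leading dashes as equivalent, so the `--agent` spelling shown in the flag list parses as written. Assuming the two filters can be combined, an invocation such as `go run ./bench/cmd/benchrunner/ --agent deepseekcode-current --task ctx-long-readonly --dry-run` would preview a single agent and task pair without executing it.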