# CochaBench

CochaBench is a comprehensive coding challenge benchmark suite designed to evaluate and compare the performance of developers and AI coding agents across multiple programming languages.
## Features

- Multi-Language Support: Challenges available in JavaScript, Python, and Go
- Standardized Evaluation: Consistent metrics across all challenges, including:
  - Test execution results (pass/fail rates)
  - Execution time tracking
  - AI-powered code quality assessment (quality, maintainability, security)
- Flexible Workflow: Initialize, start, stop, and evaluate coding challenge attempts
- Challenge Management: Easy download and management of challenge sets
- Persistent Storage: SQLite-based tracking of all runs and evaluations
## Installation

### Prerequisites

- Go 1.25.5 or higher
- Git
- For JavaScript challenge evaluation: `npm`
- For Python challenge evaluation: `python3` (or `python`), `venv`, and `pip`
- For AI-assisted evaluation: an API key in `LLM_API_KEY`

### Build from Source

```shell
git clone https://github.com/EinfachNiklas/cochabench.git
cd cochabench
go build -o cochabench ./cmd/cochabench
```

### Install Globally

```shell
go install github.com/EinfachNiklas/cochabench/cmd/cochabench@latest
```
## Quick Start

### 1. Initialize Configuration

```shell
cochabench config init
```

The config file is created at `~/.config/cochabench/config.json`.

### 2. List Available Challenges

```shell
cochabench challenge list
```

### 3. Download a Challenge

```shell
cochabench challenge get <challenge-id>
```

Or download all challenges:

```shell
cochabench challenge get all
```
### 4. Create a Run

Navigate to a challenge directory and initialize a run:

```shell
cd <challenge-directory>
cochabench run init --name "my-first-attempt"
```

Each challenge directory is expected to contain:

- `src/` with the starter implementation
- `test/` with the benchmark tests
- `challenge.config.json` with challenge metadata
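The schema of `challenge.config.json` is not documented in this README; purely as an illustration, a minimal metadata file might look like the sketch below, where every field name is an assumption and the real schema may differ:

```json
{
  "id": "example-challenge",
  "name": "Example Challenge",
  "language": "python"
}
```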
### 5. Start Working

```shell
cochabench run start --id <run-id>
```

Work on your solution in the `solutions/<run-id>/` directory.
CochaBench also creates a local `cochabench.db` SQLite database in the challenge directory to persist run metadata and evaluation results.

### 6. Stop the Run

```shell
cochabench run stop --id <run-id>
```

### 7. Evaluate Your Solution

```shell
cochabench run eval --runID <run-id>
```
## Usage

### Challenge Management

```shell
# List all available challenges
cochabench challenge list

# Download a specific challenge
cochabench challenge get <challenge-id>

# Download all challenges
cochabench challenge get all
```
### Run Management

```shell
# Initialize a new run
cochabench run init --name "attempt-1"

# Start a run
cochabench run start --id <run-id>

# Stop a run
cochabench run stop --id <run-id>

# Cancel a run
cochabench run cancel --id <run-id>

# List all runs for the current challenge
cochabench run list
```
### Evaluation

```shell
# Evaluate a completed run
cochabench run eval --runID <run-id>

# Evaluate without AI assessment
cochabench run eval --runID <run-id> --no-ai-eval

# Debug mode (keep temporary files)
cochabench run eval --runID <run-id> --debug
```
### Configuration

```shell
# Initialize config
cochabench config init

# Show all config values
cochabench config show

# Get a specific config value
cochabench config get <key>

# Set a config value
cochabench config set <key> <value>
```
## Evaluation Metrics
CochaBench provides comprehensive evaluation metrics:
- Test Results: Total, passed, and failed test counts
- Execution Time: Duration of test execution
- Quality Score: AI-evaluated code quality (1-10, averaged across multiple runs)
- Maintainability Score: AI-evaluated code maintainability (1-10, averaged across multiple runs)
- Security Score: AI-evaluated code security (1-10, averaged across multiple runs)
## AI Support

Currently, only Anthropic (Claude), OpenAI (ChatGPT, Codex), and Google (Gemini) are supported.
The generated config file exposes the following keys:

- `LLM_PROVIDER` (`anthropic`, `openai`, or `google`)
- `LLM_BASE_PATH`
- `LLM_MODEL`
- `CHALLENGE_SERVER`
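Assuming the config is stored as a flat JSON object keyed by these names (the key names come from the list above; the values below are illustrative placeholders, not shipped defaults), `~/.config/cochabench/config.json` might look like:

```json
{
  "LLM_PROVIDER": "anthropic",
  "LLM_BASE_PATH": "https://api.anthropic.com",
  "LLM_MODEL": "<model-name>",
  "CHALLENGE_SERVER": "<challenge-server-repo>"
}
```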
## Environment Variables

CochaBench supports the following environment variables:

- `GITHUB_TOKEN`: GitHub personal access token for API requests (optional; required to connect to private challenge server repos)
- `LLM_API_KEY`: API key used for AI evaluation
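Both variables are read from your shell environment; a typical setup before running an evaluation might look like this (the values are placeholders, substitute your own credentials):

```shell
# Placeholder values -- substitute your own credentials.
export LLM_API_KEY="your-api-key"        # used for AI-assisted evaluation
export GITHUB_TOKEN="your-github-token"  # only needed for private challenge repos
```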
## Security Notice
CochaBench executes challenge code, test suites, and package installation commands locally on your machine.
- Do not run untrusted challenges or untrusted solution code without additional isolation.
- JavaScript evaluation runs `npm install` and `npm test`.
- Python evaluation creates a virtual environment and installs packages with `pip`.
- Go evaluation may download modules with `go mod download` and `go mod tidy`.
CochaBench does not currently provide sandboxing or container isolation for evaluation.
For vulnerability reporting and additional security guidance, see SECURITY.md.
## Development

### Running Tests

```shell
go test ./...
```

### Project Structure

```
cochabench/
├── cmd/
│   └── cochabench/       # Main CLI application
├── internal/
│   ├── challenge/        # Challenge download and management
│   ├── cochabenchData/   # Database operations
│   ├── config/           # Configuration management
│   ├── eval/             # Evaluation engine
│   │   └── agent/        # AI evaluation agent
│   ├── run/              # Run lifecycle management
│   └── tools/            # Shared utilities
└── .github/
    └── workflows/        # CI/CD pipelines
```
## Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
Please read CONTRIBUTING.md before opening a pull request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feat/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feat/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the GNU GPL v3.0. See LICENSE for the full license text.
## Acknowledgments
For questions, issues, or suggestions, please open an issue on GitHub.