errata-cli

v0.1.5 Published: Mar 23, 2026 License: Apache-2.0

Repository

github.com/errata-app/errata-cli

README

Errata

Compare AI models on real tasks. Same prompt, same tools, different models.

Prerequisites

You'll need Go installed.

Install

git clone https://github.com/errata-app/errata-cli.git
cd errata-cli
go build -o errata ./cmd/errata

or

go install github.com/errata-app/errata-cli/cmd/errata@latest

or

Windows

git clone https://github.com/errata-app/errata-cli.git
cd errata-cli
go build -o errata.exe ./cmd/errata

Quick Start

Set your API keys in a .env file:

ANTHROPIC_API_KEY=sk-ant-...
OPENROUTER_API_KEY=sk-or-...
GOOGLE_API_KEY=AI...
OPENAI_API_KEY=sk...

The example recipe assumes the OpenRouter naming convention that Errata uses for models. If you call provider APIs directly, you may be able to drop the provider prefix (the */ part) from the model names.
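Before spending money on a run, it can help to confirm which provider keys are actually present. The sketch below is not Errata's own config loader: `parseDotEnv` and `missingKeys` are hypothetical helpers that read simple KEY=VALUE lines, and whether Errata needs all four keys for a given recipe is an assumption you should check against the models you list.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseDotEnv reads KEY=VALUE lines, skipping blank lines and # comments.
// It is a simplified reader and does not handle multi-line values or escapes.
func parseDotEnv(content string) map[string]string {
	vars := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue // not a KEY=VALUE line
		}
		// Strip surrounding whitespace and optional quotes from the value.
		vars[strings.TrimSpace(key)] = strings.Trim(strings.TrimSpace(value), `"'`)
	}
	return vars
}

// missingKeys returns the wanted names that are absent or empty.
func missingKeys(vars map[string]string, wanted []string) []string {
	var missing []string
	for _, name := range wanted {
		if vars[name] == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Placeholder values only; real keys belong in your .env file.
	env := "# provider keys\nANTHROPIC_API_KEY=sk-ant-placeholder\nOPENAI_API_KEY=\n"
	wanted := []string{
		"ANTHROPIC_API_KEY", "OPENROUTER_API_KEY",
		"GOOGLE_API_KEY", "OPENAI_API_KEY",
	}
	fmt.Println("missing:", missingKeys(parseDotEnv(env), wanted))
}
```

In practice you would pass the contents of your .env file (for example via `os.ReadFile`) instead of the inline sample string.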

Run the example recipe (each run costs between $1 and $2):

./errata run -r go_docstore.md --verbose

or

errata run -r go_docstore.md --verbose

or

Windows

.\errata.exe run -r go_docstore.md --verbose

Sample output:

errata: DocStore Bug Fix Challenge (1 task, 6 models, task_mode=isolated)

[1/1] The Go project at go_gauntlet_test/challenge11_docstore/ has failing te...
    [claude-sonnet-4-6] bash: cd go_gauntlet_test/challenge11_docstore && go test -v ./...
    [claude-sonnet-4-6] reading: go_gauntlet_test/challenge11_docstore/collection.go
    [claude-sonnet-4-6] reading: go_gauntlet_test/challenge11_docstore/query.go
    [claude-sonnet-4-6] writing: go_gauntlet_test/challenge11_docstore/collection.go
    [claude-sonnet-4-6] writing: go_gauntlet_test/challenge11_docstore/index.go
    [claude-sonnet-4-6] bash: cd go_gauntlet_test/challenge11_docstore && go test -v ./...
    [gemini-2.5-pro] bash: cd go_gauntlet_test/challenge11_docstore && go test -v ./...
    [gemini-2.5-pro] reading: go_gauntlet_test/challenge11_docstore/document.go
    ...
  claude-sonnet-4-6      PASS   12804ms  $0.0891  4/4 criteria
  gemini-2.5-pro         PASS   18443ms  $0.0467  4/4 criteria
  o3                     PASS   21587ms  $0.1203  4/4 criteria
  gpt-4.1                FAIL   15872ms  $0.0312  2/4 criteria
  claude-haiku-4.5       FAIL    9241ms  $0.0038  1/4 criteria
  llama-3.1-8b-instruct  FAIL    6102ms  $0.0004  0/4 criteria

Summary: 1 task, $0.2915 total cost
Report saved to data/outputs/rpt_019cba97.json

Write Your Own Recipe

Create a Markdown file with these sections:

## Models
<!-- Which models to test -->
- claude-sonnet-4-6
- openai/gpt-4o
- google/gemini-2.5-flash

## System Prompt
<!-- Instructions given to every model -->
You are a senior Go developer. Always run tests before proposing changes.

## Tools
<!-- Which tools models can use, empty section for none, omit section entirely for all -->
- read_file
- edit_file
- bash
- search_code

## Tasks
<!-- The prompts to send to each model -->
- Write a CLI that fetches weather data from an API
- Refactor the handler to use dependency injection

## Success Criteria
<!-- How to score each response -->
- run: go build ./...
- run: go test ./...
- file_exists: main.go

Then run it:

errata run -r my-recipe.md
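To make the recipe layout concrete, here is a minimal sketch of how such a file could be split into sections. It is not the parser from Errata's recipe package: `parseSections` is a hypothetical helper, and it only handles second-level headings, single-line HTML comments, and "- " list items as shown in the example above. It keeps empty sections in the map so that "empty Tools section" and "no Tools section" stay distinguishable, matching the comment in the Tools block.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseSections maps each "## Heading" to the non-comment lines under it.
// List items lose their leading "- "; other lines are kept as written.
func parseSections(md string) map[string][]string {
	sections := make(map[string][]string)
	current := ""
	sc := bufio.NewScanner(strings.NewReader(md))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "## "):
			current = strings.TrimSpace(strings.TrimPrefix(line, "## "))
			sections[current] = []string{} // present, possibly empty
		case current == "" || line == "":
			// Skip blank lines and text before the first heading.
		case strings.HasPrefix(line, "<!--") && strings.HasSuffix(line, "-->"):
			// Skip single-line HTML comments.
		default:
			sections[current] = append(sections[current], strings.TrimPrefix(line, "- "))
		}
	}
	return sections
}

func main() {
	recipe := `## Models
<!-- Which models to test -->
- claude-sonnet-4-6
- openai/gpt-4o

## Tools

## Success Criteria
- run: go test ./...
- file_exists: main.go
`
	s := parseSections(recipe)
	fmt.Println("models:", s["Models"])

	if tools, ok := s["Tools"]; ok {
		fmt.Println("tools listed:", len(tools)) // empty section: no tools
	} else {
		fmt.Println("tools: all") // section omitted entirely
	}

	// Split each criterion into its kind ("run", "file_exists") and argument.
	for _, c := range s["Success Criteria"] {
		kind, arg, _ := strings.Cut(c, ": ")
		fmt.Printf("criterion %q -> %q\n", kind, arg)
	}
}
```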

Interactive Mode (TUI)

Start the TUI with errata (no subcommand). Type a prompt, and every configured model works on it concurrently. Live panels show each model's tool activity. When they finish, pick the best response; its file writes are applied to disk and your choice is logged to data/preferences.jsonl.

errata                           # fresh session
errata --continue                # resume most recent session
errata --resume <id>             # resume a specific session
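The "every configured model works on it concurrently" behavior is a standard fan-out pattern in Go. The sketch below illustrates that pattern only and is not taken from Errata's runner package: `result`, `fanOut`, and the `ask` callback are hypothetical names, and the stub stands in for a real model call.

```go
package main

import (
	"fmt"
	"sync"
)

// result is a hypothetical record of one model's response.
type result struct {
	Model  string
	Output string
}

// fanOut calls ask once per model, concurrently, and returns the results
// in the same order as models. Each goroutine writes to its own slice
// index, so no mutex is needed.
func fanOut(models []string, prompt string, ask func(model, prompt string) string) []result {
	results := make([]result, len(models))
	var wg sync.WaitGroup
	for i, m := range models {
		wg.Add(1)
		go func(i int, m string) {
			defer wg.Done()
			results[i] = result{Model: m, Output: ask(m, prompt)}
		}(i, m)
	}
	wg.Wait() // block until every model has finished
	return results
}

func main() {
	// Stub in place of a real provider call.
	stub := func(model, prompt string) string {
		return model + " answered: " + prompt
	}
	for _, r := range fanOut([]string{"model-a", "model-b"}, "fix the failing test", stub) {
		fmt.Println(r.Output)
	}
}
```

A real implementation would also need cancellation (for example via `context.Context`) so that an interrupted run can be stopped and later resumed.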

Key commands

Command	What it does
`/config`	Browse and edit the session recipe (models, tools, constraints, system prompt)
`/stats`	Show model win counts and session cost
`/compact`	Summarize conversation history to free context window
`/resume`	Re-run only interrupted models from a cancelled run
`/rewind`	Undo the last run (revert writes and remove from context)
`/save [path]`	Save the session recipe to disk
`/load <path>`	Load a recipe file into the session
`/export [path]`	Export output report
`/publish`	Publish recipe to errata.app
`/pull <author/slug>`	Pull recipe from errata.app
`/verbose`	Toggle showing model text alongside tool events
`/help`	Show all commands

Data and output

Errata stores all data under data/:

Path	Contents
`data/preferences.jsonl`	Every model selection (append-only)
`data/outputs/`	JSON reports from headless runs and `/export`
`data/sessions/`	Per-session history, feed, and recipe state
`data/prompt_history.jsonl`	Prompt recall for Up-arrow / Ctrl-R

View a summary with /stats in the TUI or errata stats from the command line. Filter by recipe with errata stats --recipe <name>.

Querying with jq

Win counts per model:

jq -s 'group_by(.selected) | map({model: .[0].selected, wins: length}) | sort_by(-.wins)' data/preferences.jsonl

All models that passed every criterion in a headless report:

jq '.tasks[].criteria_results | to_entries[] | select(.value | all(.passed)) | .key' data/outputs/*.json

Per-model pass rate from a headless report:

jq '.summary.per_model | to_entries[] | "\(.key): \(.value.criteria_passed)/\(.value.criteria_total)"' data/outputs/*.json

Share recipes via errata.app. Authenticate with GitHub, then publish and pull recipes from the command line or the TUI.

errata login                     # authenticate via GitHub
errata whoami                    # show current user
errata publish                   # publish the session recipe
errata pull alice/code-review    # download a community recipe
errata logout                    # revoke and delete token

In the TUI, use /publish and /pull <author/slug> for the same functionality.

Community

Browse and share recipes: errata.app
Report issues: GitHub Issues

Directories

Path	Synopsis
cmd
  errata	(command)
internal
  adapters
  api	Package api provides an HTTP client for the errata.app backend API.
  capabilities	Package capabilities provides hardcoded model capability defaults and merging logic for user-provided overrides.
  checkpoint	Package checkpoint provides save/load for interrupted run state, enabling resume of partially-completed agent runs.
  commands	Package commands defines the canonical list of Errata slash commands.
  config	Package config loads Errata settings from environment variables and .env.
  criteria	Package criteria parses and evaluates success criteria from Errata recipe files.
  datastore	Package datastore provides a unified data layer for session-scoped persistence.
  diff	Package diff computes unified-style diffs for proposed file writes.
  headless	Package headless runs Errata recipe tasks without user interaction.
  hooks	Package hooks provides lifecycle event hooks for the agentic loop.
  jsonutil	Package jsonutil provides generic helpers for atomic JSON file I/O.
  logging	Package logging provides optional per-run logging for all model adapter calls.
  mcp	Package mcp implements a minimal MCP (Model Context Protocol) client.
  models	Package models defines the ModelAdapter interface and shared data types.
  output	Package output generates structured JSON reports after each Errata run.
  paths	Package paths provides a single source of truth for all data directory paths.
  pricing
  prompt
  prompthistory	Package prompthistory persists the user's submitted prompts so they can be recalled across sessions (Up-arrow cycling, Ctrl-R search).
  reminders	Package reminders provides conditional mid-conversation prompt injection.
  runner	Package runner fans out prompts to multiple model adapters concurrently.
  sandbox	Package sandbox provides OS-level process sandboxing for bash subprocesses spawned by Errata's agentic tool loop.
  session	Package session manages ephemeral session lifecycle: IDs, per-session directory paths, metadata persistence, and feed serialization for replay.
  subagent	Package subagent implements sub-agent spawning for the spawn_agent tool.
  tooloutput	Package tooloutput provides deterministic truncation of tool output before it is fed back into the model context.
  tools	Package tools defines the canonical tool schemas and file I/O executors.
  ui	Package ui implements the bubbletea TUI for Errata.
  uid	Package uid provides type-prefixed UUID v7 generation for all Errata IDs.
pkg
  recipe	Package recipe parses and resolves Errata recipe.md configuration files.
  recipestore	Package recipestore provides a content-addressed store for recipe/configuration snapshots.