errata-cli

Module version: v0.1.5
Published: Mar 23, 2026 License: Apache-2.0

Errata

Compare AI models on real tasks. Same prompt, same tools, different models.

Prerequisites

You'll need Go installed.

Install

git clone https://github.com/errata-app/errata-cli.git
cd errata-cli
go build -o errata ./cmd/errata

or

go install github.com/errata-app/errata-cli/cmd/errata@latest

or

Windows

git clone https://github.com/errata-app/errata-cli.git
cd errata-cli
go build -o errata.exe ./cmd/errata

Quick Start

Set your API keys in a .env file:

ANTHROPIC_API_KEY=sk-ant-...
OPENROUTER_API_KEY=sk-or-...
GOOGLE_API_KEY=AI...
OPENAI_API_KEY=sk...
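To check which provider keys a .env file actually defines before starting a paid run, something like the following can help. This is a minimal sketch and not Errata's own loader (that lives in internal/config); `parseDotEnv` is a hypothetical helper that only handles plain KEY=VALUE lines and "#" comments.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseDotEnv reads KEY=VALUE lines, skipping blank lines and "#" comments.
// Quoting and variable expansion are deliberately not handled.
func parseDotEnv(content string) map[string]string {
	env := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue // not a KEY=VALUE line
		}
		env[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return env
}

func main() {
	// Placeholder content; in practice this would be the contents of .env.
	env := parseDotEnv("ANTHROPIC_API_KEY=sk-ant-placeholder\n# OPENAI_API_KEY=\nGOOGLE_API_KEY=\n")
	keys := []string{"ANTHROPIC_API_KEY", "OPENROUTER_API_KEY", "GOOGLE_API_KEY", "OPENAI_API_KEY"}
	for _, k := range keys {
		fmt.Printf("%-20s set=%t\n", k, env[k] != "")
	}
}
```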

The example recipe uses the OpenRouter naming convention for models, where each name carries a provider prefix (for example openai/gpt-4o). If you are calling provider APIs directly, you may be able to drop the prefix (the */ part) from the model names.
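If you want to convert a list of OpenRouter-style names in bulk, the prefix can be stripped mechanically. The sketch below is illustrative only: `stripProvider` is a hypothetical helper, not part of Errata, and whether a given provider accepts the resulting bare name is an assumption you should check against that provider's model list.

```go
package main

import (
	"fmt"
	"strings"
)

// stripProvider drops a leading "provider/" prefix from a model name,
// e.g. "openai/gpt-4o" -> "gpt-4o". Names without a slash are returned
// unchanged. Hypothetical helper, not part of Errata.
func stripProvider(model string) string {
	if _, name, found := strings.Cut(model, "/"); found {
		return name
	}
	return model
}

func main() {
	models := []string{"openai/gpt-4o", "google/gemini-2.5-flash", "claude-sonnet-4-6"}
	for _, m := range models {
		fmt.Printf("%s -> %s\n", m, stripProvider(m))
	}
}
```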

Run the example recipe (each run costs between $1 and $2):

./errata run -r go_docstore.md --verbose

or

errata run -r go_docstore.md --verbose

or

Windows

.\errata.exe run -r go_docstore.md --verbose

Sample output:

errata: DocStore Bug Fix Challenge (1 task, 6 models, task_mode=isolated)

[1/1] The Go project at go_gauntlet_test/challenge11_docstore/ has failing te...
    [claude-sonnet-4-6] bash: cd go_gauntlet_test/challenge11_docstore && go test -v ./...
    [claude-sonnet-4-6] reading: go_gauntlet_test/challenge11_docstore/collection.go
    [claude-sonnet-4-6] reading: go_gauntlet_test/challenge11_docstore/query.go
    [claude-sonnet-4-6] writing: go_gauntlet_test/challenge11_docstore/collection.go
    [claude-sonnet-4-6] writing: go_gauntlet_test/challenge11_docstore/index.go
    [claude-sonnet-4-6] bash: cd go_gauntlet_test/challenge11_docstore && go test -v ./...
    [gemini-2.5-pro] bash: cd go_gauntlet_test/challenge11_docstore && go test -v ./...
    [gemini-2.5-pro] reading: go_gauntlet_test/challenge11_docstore/document.go
    ...
  claude-sonnet-4-6      PASS   12804ms  $0.0891  4/4 criteria
  gemini-2.5-pro         PASS   18443ms  $0.0467  4/4 criteria
  o3                     PASS   21587ms  $0.1203  4/4 criteria
  gpt-4.1                FAIL   15872ms  $0.0312  2/4 criteria
  claude-haiku-4.5       FAIL    9241ms  $0.0038  1/4 criteria
  llama-3.1-8b-instruct  FAIL    6102ms  $0.0004  0/4 criteria

Summary: 1 task, $0.2915 total cost
Report saved to data/outputs/rpt_019cba97.json
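If you only have the terminal output at hand, the per-model summary rows can be parsed with a few lines of Go. This is a rough sketch: the column layout is inferred from the sample above rather than from any documented format, and `parseRow` is a hypothetical helper. The JSON report is likely the more robust source for scripting.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// result is one parsed row of the per-model summary table.
type result struct {
	Model  string
	Passed bool
	Cost   float64
}

// parseRow parses a row such as
// "claude-sonnet-4-6  PASS  12804ms  $0.0891  4/4 criteria".
// The layout (model, status, latency, cost, ...) is an assumption
// based on the sample output.
func parseRow(line string) (result, error) {
	f := strings.Fields(line)
	if len(f) < 4 || !strings.HasPrefix(f[3], "$") {
		return result{}, fmt.Errorf("unrecognized row: %q", line)
	}
	cost, err := strconv.ParseFloat(strings.TrimPrefix(f[3], "$"), 64)
	if err != nil {
		return result{}, err
	}
	return result{Model: f[0], Passed: f[1] == "PASS", Cost: cost}, nil
}

func main() {
	// Rows copied from the sample output above.
	rows := []string{
		"claude-sonnet-4-6      PASS   12804ms  $0.0891  4/4 criteria",
		"gemini-2.5-pro         PASS   18443ms  $0.0467  4/4 criteria",
		"o3                     PASS   21587ms  $0.1203  4/4 criteria",
		"gpt-4.1                FAIL   15872ms  $0.0312  2/4 criteria",
		"claude-haiku-4.5       FAIL    9241ms  $0.0038  1/4 criteria",
		"llama-3.1-8b-instruct  FAIL    6102ms  $0.0004  0/4 criteria",
	}
	var total float64
	passed := 0
	for _, row := range rows {
		r, err := parseRow(row)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		total += r.Cost
		if r.Passed {
			passed++
		}
	}
	// Prints: 3/6 models passed, $0.2915 total
	fmt.Printf("%d/%d models passed, $%.4f total\n", passed, len(rows), total)
}
```

The computed total matches the Summary line in the sample output.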

Write Your Own Recipe

Create a Markdown file with these sections:

## Models
<!-- Which models to test -->
- claude-sonnet-4-6
- openai/gpt-4o
- google/gemini-2.5-flash

## System Prompt
<!-- Instructions given to every model -->
You are a senior Go developer. Always run tests before proposing changes.

## Tools
<!-- Which tools models can use, empty section for none, omit section entirely for all -->
- read_file
- edit_file
- bash
- search_code

## Tasks
<!-- The prompts to send to each model -->
- Write a CLI that fetches weather data from an API
- Refactor the handler to use dependency injection

## Success Criteria
<!-- How to score each response -->
- run: go build ./...
- run: go test ./...
- file_exists: main.go

Then run it:

errata run -r my-recipe.md
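To make the recipe layout concrete, here is a minimal sketch of how such a file could be split into sections. It is not Errata's parser (that lives in pkg/recipe); `parseSections` is a hypothetical helper that only collects "- item" bullets under each "## Heading" and ignores comments and free-form text such as the system prompt. Splitting a criterion into kind and argument at ": " is likewise an assumption based on the example above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseSections groups "- item" bullets under their "## Heading".
// A heading with no bullets still gets an (empty) entry, which keeps
// "empty section" distinguishable from "section omitted".
func parseSections(recipe string) map[string][]string {
	sections := make(map[string][]string)
	current := ""
	sc := bufio.NewScanner(strings.NewReader(recipe))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "## "):
			current = strings.TrimPrefix(line, "## ")
			sections[current] = []string{}
		case strings.HasPrefix(line, "- ") && current != "":
			sections[current] = append(sections[current], strings.TrimPrefix(line, "- "))
		}
	}
	return sections
}

func main() {
	recipe := `## Models
- claude-sonnet-4-6
- openai/gpt-4o

## Tools
- read_file
- bash

## Success Criteria
- run: go test ./...
- file_exists: main.go
`
	s := parseSections(recipe)
	fmt.Println("models:", s["Models"])
	fmt.Println("tools:", s["Tools"])
	for _, c := range s["Success Criteria"] {
		kind, arg, _ := strings.Cut(c, ": ")
		fmt.Printf("criterion kind=%q arg=%q\n", kind, arg)
	}
	_, hasTasks := s["Tasks"]
	fmt.Println("tasks section present:", hasTasks)
}
```

Tracking section presence separately from section contents matters for Tools, where an empty section means no tools and an omitted section means all tools.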

Interactive Mode (TUI)

Start the TUI with errata (no subcommand). Type a prompt, and every configured model works on it concurrently. Live panels show each model's tool activity. When they finish, pick the best response — its file writes are applied to disk and your choice is logged to data/preferences.jsonl.

errata                           # fresh session
errata --continue                # resume most recent session
errata --resume <id>             # resume a specific session

Key commands

Command              What it does
/config              Browse and edit the session recipe (models, tools, constraints, system prompt)
/stats               Show model win counts and session cost
/compact             Summarize conversation history to free context window
/resume              Re-run only interrupted models from a cancelled run
/rewind              Undo the last run (revert writes and remove from context)
/save [path]         Save the session recipe to disk
/load <path>         Load a recipe file into the session
/export [path]       Export output report
/publish             Publish recipe to errata.app
/pull <author/slug>  Pull recipe from errata.app
/verbose             Toggle showing model text alongside tool events
/help                Show all commands

Data and output

Errata stores all data under data/:

Path                       Contents
data/preferences.jsonl     Every model selection (append-only)
data/outputs/              JSON reports from headless runs and /export
data/sessions/             Per-session history, feed, and recipe state
data/prompt_history.jsonl  Prompt recall for Up-arrow / Ctrl-R

View a summary with /stats in the TUI or errata stats from the command line. Filter by recipe with errata stats --recipe <name>.

Querying with jq

Win counts per model:

jq -s 'group_by(.selected) | map({model: .[0].selected, wins: length}) | sort_by(-.wins)' data/preferences.jsonl

All models that passed every criterion in a headless report:

jq '.tasks[].criteria_results | to_entries[] | select(.value | all(.passed)) | .key' data/outputs/*.json

Per-model pass rate from a headless report:

jq '.summary.per_model | to_entries[] | "\(.key): \(.value.criteria_passed)/\(.value.criteria_total)"' data/outputs/*.json
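For use inside a Go program, the pass-rate query above translates into a small struct and a loop. The struct below mirrors only the fields that query reads (summary.per_model, criteria_passed, criteria_total); the rest of the report schema is not assumed, and the sample JSON is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// report mirrors only the fields read by the jq pass-rate query.
type report struct {
	Summary struct {
		PerModel map[string]struct {
			CriteriaPassed int `json:"criteria_passed"`
			CriteriaTotal  int `json:"criteria_total"`
		} `json:"per_model"`
	} `json:"summary"`
}

// passRates returns "model: passed/total" lines, sorted so that the
// output is stable despite Go's random map iteration order.
func passRates(data []byte) ([]string, error) {
	var rep report
	if err := json.Unmarshal(data, &rep); err != nil {
		return nil, err
	}
	lines := make([]string, 0, len(rep.Summary.PerModel))
	for model, m := range rep.Summary.PerModel {
		lines = append(lines, fmt.Sprintf("%s: %d/%d", model, m.CriteriaPassed, m.CriteriaTotal))
	}
	sort.Strings(lines)
	return lines, nil
}

func main() {
	// Made-up report fragment; a real one would be read from data/outputs/.
	sample := []byte(`{"summary":{"per_model":{
		"gpt-4.1":{"criteria_passed":2,"criteria_total":4},
		"o3":{"criteria_passed":4,"criteria_total":4}}}}`)
	lines, err := passRates(sample)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```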

Recipe Sharing

Share recipes via errata.app. Authenticate with GitHub, then publish and pull recipes from the command line or the TUI.

errata login                     # authenticate via GitHub
errata whoami                    # show current user
errata publish                   # publish the session recipe
errata pull alice/code-review    # download a community recipe
errata logout                    # revoke and delete token

In the TUI, use /publish and /pull <author/slug> for the same functionality.

Community

Directories

Path                    Synopsis
cmd/errata              command
internal/api            Package api provides an HTTP client for the errata.app backend API.
internal/capabilities   Package capabilities provides hardcoded model capability defaults and merging logic for user-provided overrides.
internal/checkpoint     Package checkpoint provides save/load for interrupted run state, enabling resume of partially-completed agent runs.
internal/commands       Package commands defines the canonical list of Errata slash commands.
internal/config         Package config loads Errata settings from environment variables and .env.
internal/criteria       Package criteria parses and evaluates success criteria from Errata recipe files.
internal/datastore      Package datastore provides a unified data layer for session-scoped persistence.
internal/diff           Package diff computes unified-style diffs for proposed file writes.
internal/headless       Package headless runs Errata recipe tasks without user interaction.
internal/hooks          Package hooks provides lifecycle event hooks for the agentic loop.
internal/jsonutil       Package jsonutil provides generic helpers for atomic JSON file I/O.
internal/logging        Package logging provides optional per-run logging for all model adapter calls.
internal/mcp            Package mcp implements a minimal MCP (Model Context Protocol) client.
internal/models         Package models defines the ModelAdapter interface and shared data types.
internal/output         Package output generates structured JSON reports after each Errata run.
internal/paths          Package paths provides a single source of truth for all data directory paths.
internal/prompthistory  Package prompthistory persists the user's submitted prompts so they can be recalled across sessions (Up-arrow cycling, Ctrl-R search).
internal/reminders      Package reminders provides conditional mid-conversation prompt injection.
internal/runner         Package runner fans out prompts to multiple model adapters concurrently.
internal/sandbox        Package sandbox provides OS-level process sandboxing for bash subprocesses spawned by Errata's agentic tool loop.
internal/session        Package session manages ephemeral session lifecycle: IDs, per-session directory paths, metadata persistence, and feed serialization for replay.
internal/subagent       Package subagent implements sub-agent spawning for the spawn_agent tool.
internal/tooloutput     Package tooloutput provides deterministic truncation of tool output before it is fed back into the model context.
internal/tools          Package tools defines the canonical tool schemas and file I/O executors.
internal/ui             Package ui implements the bubbletea TUI for Errata.
internal/uid            Package uid provides type-prefixed UUID v7 generation for all Errata IDs.
pkg/recipe              Package recipe parses and resolves Errata recipe.md configuration files.
pkg/recipestore         Package recipestore provides a content-addressed store for recipe/configuration snapshots.