E2E Tests

End-to-end tests for the `entire` CLI, run against real agents (Claude Code, Gemini CLI, OpenCode, Cursor, Factory AI Droid, Copilot CLI).

Commands

mise run test:e2e [filter]                           # all registered agents; omit filter to run every test
mise run test:e2e --agent claude-code [filter]       # Claude Code only
mise run test:e2e --agent gemini-cli [filter]        # Gemini CLI only
mise run test:e2e --agent opencode [filter]          # OpenCode only
mise run test:e2e --agent cursor [filter]            # Cursor only
mise run test:e2e --agent factoryai-droid [filter]   # Factory AI Droid only
mise run test:e2e --agent copilot-cli [filter]       # Copilot CLI only
go build ./...                                       # compile check (no agent CLI needed)

Do NOT run E2E tests proactively. They make real API calls that consume tokens and cost money. Only run when explicitly asked.

Structure

e2e/
├── agents/       # Agent abstraction (Agent interface, tmux sessions, concurrency gates)
├── bootstrap/    # CI pre-test setup (auth config, warmup)
├── entire/       # `entire` CLI wrapper (enable, rewind, etc.)
├── exploratory/  # Experimental tests, not run by CI
├── tests/        # Blessed test files (run by CI)
└── testutil/     # Repo setup, assertions, artifact capture

Key Patterns

Every test uses testutil.ForEachAgent, which runs the test once per registered agent and handles repo setup, concurrency gating, and timeout scaling.
All operations go through RepoState (s.RunPrompt, s.Git) so they're logged to console.log.
Use the entire package for CLI interactions, not raw exec.Command.
Skip tests pending CLI fixes with t.Skip("ENT-XXX: reason").

Adding a New Agent

1. Create agents/<name>.go implementing the Agent interface.
2. Register it in init() with Register(&YourAgent{}).
3. Add a Bootstrap() method for any CI-specific setup (auth config, warmup).
4. Add a RegisterGate("<name>", N) call if concurrency needs limiting.
5. Ensure the agent name is accepted by mise run test:e2e --agent <name>.
6. Add the agent to the .github/workflows/e2e.yml matrix and the e2e-isolated.yml options.
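
As a rough illustration of the registration steps above, the sketch below models a registry and a concurrency gate in one file. The names Register and RegisterGate come from this README, but their signatures, the two-method Agent interface, the acquire helper, and exampleAgent are assumptions made for the example; the real interface in agents/ will have more methods.

```go
package main

import (
	"fmt"
	"sort"
)

// Agent is a hypothetical, trimmed-down version of the interface.
type Agent interface {
	Name() string
	Bootstrap() error // CI-specific setup such as auth config or warmup
}

var (
	registry = map[string]Agent{}
	gates    = map[string]chan struct{}{}
)

// Register adds an agent to the registry under its name.
func Register(a Agent) { registry[a.Name()] = a }

// RegisterGate limits the named agent to n concurrent tests by using a
// buffered channel as a counting semaphore.
func RegisterGate(name string, n int) { gates[name] = make(chan struct{}, n) }

// acquire blocks until a slot is free for the agent and returns a function
// that frees the slot. Agents without a gate are not limited.
func acquire(name string) (release func()) {
	g, ok := gates[name]
	if !ok {
		return func() {}
	}
	g <- struct{}{}
	return func() { <-g }
}

// exampleAgent is a placeholder agent used only in this sketch.
type exampleAgent struct{}

func (exampleAgent) Name() string     { return "example-agent" }
func (exampleAgent) Bootstrap() error { return nil }

func init() {
	Register(&exampleAgent{})
	RegisterGate("example-agent", 2)
}

// registeredNames returns the registered agent names in sorted order.
func registeredNames() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	release := acquire("example-agent")
	defer release()
	fmt.Println(registeredNames())
}
```

Registering in init() means an agent becomes available simply by being compiled into the package, which is why no central list needs editing in the Go code itself.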

Environment Variables

Variable	Description	Default
`E2E_AGENT`	Agent to test (`claude-code`, `gemini-cli`, `opencode`, `cursor`, `factoryai-droid`, `copilot-cli`)	all registered
`E2E_ENTIRE_BIN`	Path to a pre-built `entire` binary	builds from source
`E2E_TIMEOUT`	Timeout per prompt	`2m`
`E2E_KEEP_REPOS`	Set to `1` to preserve temp repos after test	unset
`E2E_ARTIFACT_DIR`	Override artifact output directory	`e2e/artifacts/<timestamp>`
`ANTHROPIC_API_KEY`	Required for Claude Code	—
`GEMINI_API_KEY`	Required for Gemini CLI	—
`COPILOT_GITHUB_TOKEN`	Required for Copilot CLI (or `gh auth login`)	—

Debugging Failures

Artifacts are captured to e2e/artifacts/ on every run (git-log, git-tree, console.log, checkpoint metadata, entire logs). Set E2E_KEEP_REPOS=1 to preserve the temp repo — a symlink appears in the artifact dir pointing to it.

Use the debug-e2e skill (.claude/skills/debug-e2e/) for a structured workflow when investigating failures.

Reading artifacts

console.log — full operation transcript including agent stdout/stderr
git-log.txt — commit history at time of failure
git-tree.txt — working tree state
entire-logs/ — internal CLI logs

Fixing flaky tests

When a test fails once but passes on retry, the problem is usually agent non-determinism, not a CLI bug. Common patterns:

Agent asked for confirmation instead of acting: The model output contains "Does this look right?" or "Should I proceed?". Fix: append "Do not ask for confirmation, just make the change." to the prompt.
Agent wrote to the wrong path or created extra files: Fix: be more explicit about exact file paths and about what not to do.
Agent committed when it shouldn't have: Fix: add "Do not commit" to the prompt.
Checkpoint wait timeout: WaitForCheckpoint or WaitForCheckpointAdvanceFrom exceeded its deadline. Fix: increase the timeout argument.

To diagnose, read console.log in the failing test's artifact directory and compare what the agent actually did with what the test expected.
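
The confirmation pattern lends itself to a small helper. The sketch below checks a transcript for the two phrases quoted above and appends the suggested instruction to a prompt; askedForConfirmation and hardenPrompt are hypothetical names and are not part of testutil, and real agent output may phrase the question differently.

```go
package main

import (
	"fmt"
	"strings"
)

// noConfirm is the instruction suggested in this README for prompts that
// trigger a confirmation question.
const noConfirm = "Do not ask for confirmation, just make the change."

// confirmationPhrases are the two examples quoted in this README.
var confirmationPhrases = []string{"Does this look right?", "Should I proceed?"}

// askedForConfirmation reports whether the transcript contains one of the
// known confirmation phrases.
func askedForConfirmation(transcript string) bool {
	for _, phrase := range confirmationPhrases {
		if strings.Contains(transcript, phrase) {
			return true
		}
	}
	return false
}

// hardenPrompt appends the no-confirmation instruction unless the prompt
// already contains it.
func hardenPrompt(prompt string) string {
	if strings.Contains(prompt, noConfirm) {
		return prompt
	}
	return strings.TrimSpace(prompt) + " " + noConfirm
}

func main() {
	transcript := "I plan to edit main.go. Should I proceed?"
	if askedForConfirmation(transcript) {
		fmt.Println(hardenPrompt("Rename the function foo to bar in main.go."))
	}
}
```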

CI Workflows

.github/workflows/e2e.yml — Runs full suite on push to main. Matrix: [claude-code, opencode, gemini-cli, cursor-cli, factoryai-droid, copilot-cli].
.github/workflows/e2e-isolated.yml — Manual dispatch for debugging a single test. Inputs: agent + test name filter.

Both workflows run go run ./e2e/bootstrap before tests to handle agent-specific CI setup (auth config, warmup).

Directories

Path	Synopsis
agents	
bootstrap	Package main provides a pre-test bootstrap command that runs agent-specific setup (auth config, warmup) before E2E tests.
cmd/testreport	command
entire	
testutil	
vogon	vogon is a deterministic agent binary for E2E canary tests.