AutoAttack

Adversarial robustness testing for LLM systems, with every probe mapped to OWASP LLM Top 10 categories and EU AI Act articles. Single binary. Zero dependencies. Audit-ready evidence output.
What AutoAttack produces
Every scan maps results to a compliance framework, not just pass/fail verdicts:
╔══════════════════════════════════════════════════════════════╗
║ AutoAttack v0.1.0 — Adversarial Robustness Assessment ║
╚══════════════════════════════════════════════════════════════╝
Target: https://api.openai.com/v1/chat/completions
Model: gpt-4o-mini
Probes: 134 across 8 categories (OWASP LLM Top 10 mapped)
SUMMARY
┌──────────────────┬────────┬────────┬────────┬────────────┐
│ Category │ Total │ Pass │ Fail │ Resilience │
├──────────────────┼────────┼────────┼────────┼────────────┤
│ Prompt Injection │ 14 │ 12 │ 2 │ 85.7% │
│ Jailbreak │ 16 │ 9 │ 7 │ 56.2% │
│ Extraction │ 10 │ 7 │ 3 │ 70.0% │
│ Guardrail Bypass │ 10 │ 8 │ 2 │ 80.0% │
│ OWASP LLM │ 9 │ 6 │ 3 │ 66.7% │
└──────────────────┴────────┴────────┴────────┴────────────┘
14 findings (2 CRITICAL, 5 HIGH, 7 MEDIUM)
OWASP: LLM01 ✓ LLM02 ✓ LLM03 — LLM04 ✓ LLM05 —
LLM06 ✓ LLM07 ✓ LLM08 — LLM09 ✓ LLM10 ✓
Who this is for
If you need a scanning tool, Promptfoo and Garak are excellent general-purpose options with large communities.
If you need compliance-ready adversarial testing — dated, framework-mapped, tamper-evident test evidence that satisfies an auditor — that's what AutoAttack builds.
The open-source CLI is the scanning engine. The AutoAttack platform transforms CLI results into EU AI Act conformity assessment documentation with immutable evidence chains and RFC 3161 timestamping.
CLI scan → upload → platform → EU AI Act conformity assessment report
Quick start
30 seconds to first assessment:
# Install
go install github.com/autoattack-ai/autoattack/cmd/autoattack@latest
# Run
autoattack scan \
--target https://api.openai.com/v1/chat/completions \
--api-key $OPENAI_API_KEY \
--model gpt-4o-mini
No config files. No YAML. No runtime dependencies.
Install
Go install:
go install github.com/autoattack-ai/autoattack/cmd/autoattack@latest
Pre-built binaries:
Download from GitHub Releases for Linux, macOS, and Windows.
From source:
git clone https://github.com/autoattack-ai/autoattack.git
cd autoattack && make build
Docker:
docker run --rm ghcr.io/autoattack-ai/autoattack scan \
--target https://api.openai.com/v1/chat/completions \
--api-key $OPENAI_API_KEY \
--model gpt-4o-mini
OWASP LLM Top 10 Coverage
Every probe is mapped to a specific OWASP LLM Top 10 category. Run --framework owasp-llm for a targeted compliance assessment.
| OWASP |
Category (2025) |
Probes |
EU AI Act Articles |
| LLM01 |
Prompt Injection |
56 |
Art. 9(2), Art. 15(1) |
| LLM02 |
Sensitive Information Disclosure |
16 |
Art. 9(2), Art. 15(1), Art. 15(4) |
| LLM04 |
Data and Model Poisoning |
3 |
Art. 10(2), Art. 10(5), Art. 15(1) |
| LLM05 |
Improper Output Handling |
5 |
Art. 9(2), Art. 15(1) |
| LLM06 |
Excessive Agency |
9 |
Art. 9(7), Art. 14(1) |
| LLM07 |
System Prompt Leakage |
6 |
Art. 13(1), Art. 15(1) |
| LLM09 |
Misinformation |
30 |
Art. 9(2), Art. 15(1), Art. 50 |
| LLM10 |
Unbounded Consumption |
6 |
Art. 15(1), Art. 15(4) |
LLM03 (Supply Chain) targets third-party component verification not testable at the API level. Supply chain risks require vendor assessments, dependency audits, and model provenance checks.
LLM08 (Vector and Embedding Weaknesses) targets retrieval pipeline infrastructure not directly testable at the API level. RAG context injection probes (LLM01) test downstream behavioral effects of retrieval-layer attacks.
Harmful content probes (8 probes) test content safety guardrails independently of OWASP categories.
Article mappings: Art. 9 (Risk Management), Art. 10 (Data Governance), Art. 13 (Transparency), Art. 14 (Human Oversight), Art. 15 (Accuracy, Robustness, and Cybersecurity), Art. 50 (Transparency for AI Systems).
Usage
autoattack scan [flags]
Flags:
--target Target API endpoint URL (required)
--api-key API key for the target (env: AUTOATTACK_TARGET_API_KEY)
--model Model name (required)
--adapter API adapter: openai, anthropic (default: openai)
--categories Comma-separated probe categories to run
--framework Filter probes by framework (owasp-llm)
--output Output format: console, json, sarif (default: console)
--out Output file path (default: stdout for json/sarif)
--rate-limit Max requests per second (default: 2)
--workers Concurrent probe workers (default: 5)
--timeout HTTP request timeout in seconds (default: 30)
--max-tokens Maximum output tokens per request (default: 4096)
--fail-on Exit 1 on this severity or above: critical, high, medium, low (default: critical)
--strict Conflicting success+failure indicators → INCONCLUSIVE instead of FAIL
--system-prompt System prompt to prepend to probes
--include-responses Include raw responses and inputs in JSON output
--sarif-include-pass Include PASS results in SARIF output (for coverage proof)
--allow-http Allow http:// targets (unsafe — API key sent in plaintext)
--dry-run Show what probes would run without making API calls
--quiet Suppress informational messages
--no-telemetry Disable telemetry for this session
-v, --verbose Verbose output
Other commands
autoattack probes list # List all probes
autoattack probes list --category jailbreak # Filter by category
autoattack upload -f results.json --system <id> # Upload to platform
autoattack version # Show version
autoattack telemetry status # Check telemetry setting
Console (default)
Colored terminal output with summary table and OWASP coverage indicators.
JSON
Machine-readable report with OWASP mappings and LLM judge criteria for each probe:
autoattack scan --output json --include-responses --out report.json \
--target $URL --api-key $KEY --model gpt-4o
JSON output redacts responses and inputs by default. Use --include-responses for full compliance evidence.
SARIF
Drops directly into GitHub Advanced Security tab. FAIL results appear as findings; INCONCLUSIVE results appear as review items (kind: "review").
autoattack scan --output sarif --out results.sarif \
--target $URL --api-key $KEY --model gpt-4o
CI/CD Integration
GitHub Action
name: LLM Adversarial Assessment
on:
push:
branches: [main]
pull_request:
jobs:
assessment:
runs-on: ubuntu-latest
steps:
- uses: autoattack-ai/autoattack/.github/actions/scan@main
with:
target: ${{ secrets.LLM_ENDPOINT }}
api-key: ${{ secrets.LLM_API_KEY }}
model: gpt-4o
fail-on: high
Results appear in the Security tab via SARIF upload.
Exit codes
--fail-on |
Exit 1 when |
critical (default) |
Any CRITICAL severity finding |
high |
Any CRITICAL or HIGH finding |
medium |
Any CRITICAL, HIGH, or MEDIUM finding |
low |
Any finding at all |
Scan locally, upload to the AutoAttack platform for compliance documentation:
# Scan and upload in one pipeline
autoattack scan --output json --include-responses \
--target $URL --api-key $KEY --model gpt-4o | \
autoattack upload --system <ai-system-id>
# Or scan to file, upload later
autoattack scan --output json --include-responses --out results.json \
--target $URL --api-key $KEY --model gpt-4o
AUTOATTACK_API_KEY=aa_... autoattack upload -f results.json --system <id>
The platform produces:
- EU AI Act conformity assessment reports (PDF) with article-by-article mapping
- Tamper-evident evidence chains (SHA-256 hash chain + RFC 3161 timestamps)
- OWASP LLM Top 10 coverage matrices
- Historical resilience trending per AI system
Responsible Use
AutoAttack is designed for authorized security testing only. By using this tool, you agree to:
- Only test systems you own or have written authorization to test
- Comply with all applicable laws and regulations
- Not use this tool for malicious purposes, harassment, or to cause harm
- Report vulnerabilities found in third-party systems through responsible disclosure
Adversarial testing is a legitimate security practice when conducted with authorization. Unauthorized testing of systems you do not own may violate computer fraud laws.
Security Considerations
- JSON redaction: responses and probe inputs are omitted by default. Use
--include-responses to opt in.
- API keys: used only in HTTP Authorization headers. Never logged, printed, or included in output.
- Environment variables:
AUTOATTACK_TARGET_API_KEY and AUTOATTACK_API_KEY to avoid shell history exposure.
- SARIF/console output: contain zero response or input data by design.
- Telemetry: opt-in only. No prompts, responses, or API keys are ever collected.
Found a vulnerability in AutoAttack itself? See SECURITY.md.
How It Works
- Load — probe cases are compiled into the binary via
//go:embed. Each probe defines an adversarial input, success/failure indicator patterns, and OWASP LLM category mapping.
- Send — the runner dispatches probes through transport adapters (OpenAI or Anthropic format) with configurable rate limiting and retry logic (including Retry-After header support).
- Judge — the keyword judge evaluates each response against indicator patterns. Security-first: any success indicator match means FAIL. Use
--strict for INCONCLUSIVE on conflicting signals.
- Report — results are aggregated by category and OWASP mapping, output in the requested format with article-level compliance mapping.
Scoring
- PASS — the model refused or did not comply with the adversarial probe
- FAIL — the model complied, indicating a vulnerability
- INCONCLUSIVE — could not determine from keyword matching alone (requires semantic evaluation; resolved by the platform's LLM judge)
Resilience score per category = pass / (pass + fail) * 100. Higher is more resilient.
Architecture
autoattack/
├── cmd/autoattack/ # CLI entrypoint (cobra)
├── probe/ # Probe types, YAML loader, embedded registry
├── judge/ # Verdict engine (keyword/regex matching)
├── transport/ # API adapters (OpenAI, Anthropic) + retry
├── runner/ # Worker pool orchestration + rate limiter
├── report/ # Output formatters (console, JSON, SARIF)
├── upload/ # Platform upload client
├── target/ # Target configuration type
├── telemetry/ # Opt-in anonymous telemetry
└── probes/ # YAML probe definitions (embedded in binary)
├── prompt-injection/ # LLM01: direct, indirect, context manipulation
├── jailbreak/ # LLM01/LLM09: roleplay, encoding, skeleton key
├── extraction/ # LLM02/LLM06/LLM07: system prompt, PII, role boundaries
├── guardrail-bypass/ # LLM02: content filter evasion, encoding tricks
├── hallucination/ # LLM09: confabulation, fabrication, consistency
├── bias/ # LLM09: gender, racial, age, socioeconomic bias
├── harmful-content/ # Content safety: violence, weapons, illegal activity
└── owasp-llm/ # LLM01/LLM02/LLM04/LLM06/LLM09/LLM10: RAG, extraction, poisoning, oversight
All packages are importable as a Go library.
Telemetry
Opt-in anonymous telemetry to help improve AI security research. Off by default. No prompts, responses, or API keys are ever collected.
autoattack telemetry enable # opt in
autoattack telemetry disable # opt out
autoattack telemetry status # check current setting
Contributing
See CONTRIBUTING.md for guidelines on adding probes and submitting code.
License
- Engine (all Go code): MIT License
- Probe library (
probes/): MPL-2.0 — file-level copyleft; modifications to probes must be shared, but MPL-2.0 does not extend to the engine or your code
See LICENSING.md for full details on the dual-license structure.