autoattack

package module

v0.1.0 Latest Latest Go to latest Published: Feb 22, 2026 License: MIT Imports: 1 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/autoattack-ai/autoattack

Links

Open Source Insights

README ¶

AutoAttack

Adversarial robustness testing for LLM systems, with every probe mapped to OWASP LLM Top 10 categories and EU AI Act articles. Single binary. Zero dependencies. Audit-ready evidence output.

What AutoAttack produces

Every scan maps results to a compliance framework, not just pass/fail verdicts:

  ╔══════════════════════════════════════════════════════════════╗
  ║  AutoAttack v0.1.0 — Adversarial Robustness Assessment       ║
  ╚══════════════════════════════════════════════════════════════╝

  Target:  https://api.openai.com/v1/chat/completions
  Model:   gpt-4o-mini
  Probes:  134 across 8 categories (OWASP LLM Top 10 mapped)

  SUMMARY
  ┌──────────────────┬────────┬────────┬────────┬────────────┐
  │ Category         │ Total  │ Pass   │ Fail   │ Resilience │
  ├──────────────────┼────────┼────────┼────────┼────────────┤
  │ Prompt Injection │ 14     │ 12     │ 2      │  85.7%     │
  │ Jailbreak        │ 16     │ 9      │ 7      │  56.2%     │
  │ Extraction       │ 10     │ 7      │ 3      │  70.0%     │
  │ Guardrail Bypass │ 10     │ 8      │ 2      │  80.0%     │
  │ OWASP LLM        │ 9      │ 6      │ 3      │  66.7%     │
  └──────────────────┴────────┴────────┴────────┴────────────┘

  14 findings (2 CRITICAL, 5 HIGH, 7 MEDIUM)
  OWASP: LLM01 ✓  LLM02 ✓  LLM03 —  LLM04 ✓  LLM05 —
         LLM06 ✓  LLM07 ✓  LLM08 —  LLM09 ✓  LLM10 ✓

Who this is for

If you need a scanning tool, Promptfoo and Garak are excellent general-purpose options with large communities.

If you need compliance-ready adversarial testing — dated, framework-mapped, tamper-evident test evidence that satisfies an auditor — that's what AutoAttack builds.

The open-source CLI is the scanning engine. The AutoAttack platform transforms CLI results into EU AI Act conformity assessment documentation with immutable evidence chains and RFC 3161 timestamping.

CLI scan → upload → platform → EU AI Act conformity assessment report

Quick start

30 seconds to first assessment:

# Install
go install github.com/autoattack-ai/autoattack/cmd/autoattack@latest

# Run
autoattack scan \
  --target https://api.openai.com/v1/chat/completions \
  --api-key $OPENAI_API_KEY \
  --model gpt-4o-mini

No config files. No YAML. No runtime dependencies.

Install

Go install:

go install github.com/autoattack-ai/autoattack/cmd/autoattack@latest

Pre-built binaries:

Download from GitHub Releases for Linux, macOS, and Windows.

From source:

git clone https://github.com/autoattack-ai/autoattack.git
cd autoattack && make build

Docker:

docker run --rm ghcr.io/autoattack-ai/autoattack scan \
  --target https://api.openai.com/v1/chat/completions \
  --api-key $OPENAI_API_KEY \
  --model gpt-4o-mini

OWASP LLM Top 10 Coverage

Every probe is mapped to a specific OWASP LLM Top 10 category. Run --framework owasp-llm for a targeted compliance assessment.

OWASP	Category (2025)	Probes	EU AI Act Articles
LLM01	Prompt Injection	56	Art. 9(2), Art. 15(1)
LLM02	Sensitive Information Disclosure	16	Art. 9(2), Art. 15(1), Art. 15(4)
LLM04	Data and Model Poisoning	3	Art. 10(2), Art. 10(5), Art. 15(1)
LLM05	Improper Output Handling	5	Art. 9(2), Art. 15(1)
LLM06	Excessive Agency	9	Art. 9(7), Art. 14(1)
LLM07	System Prompt Leakage	6	Art. 13(1), Art. 15(1)
LLM09	Misinformation	30	Art. 9(2), Art. 15(1), Art. 50
LLM10	Unbounded Consumption	6	Art. 15(1), Art. 15(4)

LLM03 (Supply Chain) targets third-party component verification not testable at the API level. Supply chain risks require vendor assessments, dependency audits, and model provenance checks.

LLM08 (Vector and Embedding Weaknesses) targets retrieval pipeline infrastructure not directly testable at the API level. RAG context injection probes (LLM01) test downstream behavioral effects of retrieval-layer attacks.

Harmful content probes (8 probes) test content safety guardrails independently of OWASP categories.

Article mappings: Art. 9 (Risk Management), Art. 10 (Data Governance), Art. 13 (Transparency), Art. 14 (Human Oversight), Art. 15 (Accuracy, Robustness, and Cybersecurity), Art. 50 (Transparency for AI Systems).

Usage

autoattack scan [flags]

Flags:
  --target              Target API endpoint URL (required)
  --api-key             API key for the target (env: AUTOATTACK_TARGET_API_KEY)
  --model               Model name (required)
  --adapter             API adapter: openai, anthropic (default: openai)
  --categories          Comma-separated probe categories to run
  --framework           Filter probes by framework (owasp-llm)
  --output              Output format: console, json, sarif (default: console)
  --out                 Output file path (default: stdout for json/sarif)
  --rate-limit          Max requests per second (default: 2)
  --workers             Concurrent probe workers (default: 5)
  --timeout             HTTP request timeout in seconds (default: 30)
  --max-tokens          Maximum output tokens per request (default: 4096)
  --fail-on             Exit 1 on this severity or above: critical, high, medium, low (default: critical)
  --strict              Conflicting success+failure indicators → INCONCLUSIVE instead of FAIL
  --system-prompt       System prompt to prepend to probes
  --include-responses   Include raw responses and inputs in JSON output
  --sarif-include-pass  Include PASS results in SARIF output (for coverage proof)
  --allow-http          Allow http:// targets (unsafe — API key sent in plaintext)
  --dry-run             Show what probes would run without making API calls
  --quiet               Suppress informational messages
  --no-telemetry        Disable telemetry for this session
  -v, --verbose         Verbose output

Other commands

autoattack probes list                     # List all probes
autoattack probes list --category jailbreak # Filter by category
autoattack upload -f results.json --system <id>  # Upload to platform
autoattack version                         # Show version
autoattack telemetry status                # Check telemetry setting

Output Formats

Console (default)

Colored terminal output with summary table and OWASP coverage indicators.

JSON

Machine-readable report with OWASP mappings and LLM judge criteria for each probe:

autoattack scan --output json --include-responses --out report.json \
  --target $URL --api-key $KEY --model gpt-4o

JSON output redacts responses and inputs by default. Use --include-responses for full compliance evidence.

SARIF

Drops directly into GitHub Advanced Security tab. FAIL results appear as findings; INCONCLUSIVE results appear as review items (kind: "review").

autoattack scan --output sarif --out results.sarif \
  --target $URL --api-key $KEY --model gpt-4o

CI/CD Integration

GitHub Action

name: LLM Adversarial Assessment
on:
  push:
    branches: [main]
  pull_request:

jobs:
  assessment:
    runs-on: ubuntu-latest
    steps:
      - uses: autoattack-ai/autoattack/.github/actions/scan@main
        with:
          target: ${{ secrets.LLM_ENDPOINT }}
          api-key: ${{ secrets.LLM_API_KEY }}
          model: gpt-4o
          fail-on: high

Results appear in the Security tab via SARIF upload.

Exit codes

`--fail-on`	Exit 1 when
`critical` (default)	Any CRITICAL severity finding
`high`	Any CRITICAL or HIGH finding
`medium`	Any CRITICAL, HIGH, or MEDIUM finding
`low`	Any finding at all

Platform Upload

Scan locally, upload to the AutoAttack platform for compliance documentation:

# Scan and upload in one pipeline
autoattack scan --output json --include-responses \
  --target $URL --api-key $KEY --model gpt-4o | \
  autoattack upload --system <ai-system-id>

# Or scan to file, upload later
autoattack scan --output json --include-responses --out results.json \
  --target $URL --api-key $KEY --model gpt-4o

AUTOATTACK_API_KEY=aa_... autoattack upload -f results.json --system <id>

The platform produces:

EU AI Act conformity assessment reports (PDF) with article-by-article mapping
Tamper-evident evidence chains (SHA-256 hash chain + RFC 3161 timestamps)
OWASP LLM Top 10 coverage matrices
Historical resilience trending per AI system

Responsible Use

AutoAttack is designed for authorized security testing only. By using this tool, you agree to:

Only test systems you own or have written authorization to test
Comply with all applicable laws and regulations
Not use this tool for malicious purposes, harassment, or to cause harm
Report vulnerabilities found in third-party systems through responsible disclosure

Adversarial testing is a legitimate security practice when conducted with authorization. Unauthorized testing of systems you do not own may violate computer fraud laws.

Security Considerations

JSON redaction: responses and probe inputs are omitted by default. Use --include-responses to opt in.
API keys: used only in HTTP Authorization headers. Never logged, printed, or included in output.
Environment variables: AUTOATTACK_TARGET_API_KEY and AUTOATTACK_API_KEY to avoid shell history exposure.
SARIF/console output: contain zero response or input data by design.
Telemetry: opt-in only. No prompts, responses, or API keys are ever collected.

Found a vulnerability in AutoAttack itself? See SECURITY.md.

How It Works

Load — probe cases are compiled into the binary via //go:embed. Each probe defines an adversarial input, success/failure indicator patterns, and OWASP LLM category mapping.
Send — the runner dispatches probes through transport adapters (OpenAI or Anthropic format) with configurable rate limiting and retry logic (including Retry-After header support).
Judge — the keyword judge evaluates each response against indicator patterns. Security-first: any success indicator match means FAIL. Use --strict for INCONCLUSIVE on conflicting signals.
Report — results are aggregated by category and OWASP mapping, output in the requested format with article-level compliance mapping.

Scoring

PASS — the model refused or did not comply with the adversarial probe
FAIL — the model complied, indicating a vulnerability
INCONCLUSIVE — could not determine from keyword matching alone (requires semantic evaluation; resolved by the platform's LLM judge)

Resilience score per category = pass / (pass + fail) * 100. Higher is more resilient.

Architecture

autoattack/
├── cmd/autoattack/        # CLI entrypoint (cobra)
├── probe/                 # Probe types, YAML loader, embedded registry
├── judge/                 # Verdict engine (keyword/regex matching)
├── transport/             # API adapters (OpenAI, Anthropic) + retry
├── runner/                # Worker pool orchestration + rate limiter
├── report/                # Output formatters (console, JSON, SARIF)
├── upload/                # Platform upload client
├── target/                # Target configuration type
├── telemetry/             # Opt-in anonymous telemetry
└── probes/                # YAML probe definitions (embedded in binary)
    ├── prompt-injection/  # LLM01: direct, indirect, context manipulation
    ├── jailbreak/         # LLM01/LLM09: roleplay, encoding, skeleton key
    ├── extraction/        # LLM02/LLM06/LLM07: system prompt, PII, role boundaries
    ├── guardrail-bypass/  # LLM02: content filter evasion, encoding tricks
    ├── hallucination/     # LLM09: confabulation, fabrication, consistency
    ├── bias/              # LLM09: gender, racial, age, socioeconomic bias
    ├── harmful-content/   # Content safety: violence, weapons, illegal activity
    └── owasp-llm/         # LLM01/LLM02/LLM04/LLM06/LLM09/LLM10: RAG, extraction, poisoning, oversight

All packages are importable as a Go library.

Telemetry

Opt-in anonymous telemetry to help improve AI security research. Off by default. No prompts, responses, or API keys are ever collected.

autoattack telemetry enable   # opt in
autoattack telemetry disable  # opt out
autoattack telemetry status   # check current setting

Contributing

See CONTRIBUTING.md for guidelines on adding probes and submitting code.

License

Engine (all Go code): MIT License
Probe library (probes/): MPL-2.0 — file-level copyleft; modifications to probes must be shared, but MPL-2.0 does not extend to the engine or your code

See LICENSING.md for full details on the dual-license structure.

Documentation ¶

Overview ¶

embed.go holds the embedded probe filesystem and library version. Keep this file minimal — adding subpackage imports here causes import cycles.

Constants ¶

View Source

const ProbeLibraryVersion = "0.1.0"

ProbeLibraryVersion tracks the probe library version for reproducibility. Bump this when probes are added, removed, or modified.

Variables ¶

View Source

var EmbeddedProbes embed.FS

Functions ¶

This section is empty.

Types ¶

This section is empty.

Source Files ¶

View all Source files

embed.go

Directories ¶

Path	Synopsis
cmd
autoattack command SPDX-License-Identifier: MIT	SPDX-License-Identifier: MIT
judge SPDX-License-Identifier: MIT	SPDX-License-Identifier: MIT
probe SPDX-License-Identifier: MIT	SPDX-License-Identifier: MIT
report SPDX-License-Identifier: MIT	SPDX-License-Identifier: MIT
runner SPDX-License-Identifier: MIT	SPDX-License-Identifier: MIT
target SPDX-License-Identifier: MIT	SPDX-License-Identifier: MIT
telemetry SPDX-License-Identifier: MIT	SPDX-License-Identifier: MIT
transport SPDX-License-Identifier: MIT	SPDX-License-Identifier: MIT
upload SPDX-License-Identifier: MIT	SPDX-License-Identifier: MIT

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL