autoattack

package module
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 22, 2026 License: MIT Imports: 1 Imported by: 0

README

AutoAttack

CI Go Report Card Go Version

Adversarial robustness testing for LLM systems, with every probe mapped to OWASP LLM Top 10 categories and EU AI Act articles. Single binary. Zero dependencies. Audit-ready evidence output.

What AutoAttack produces

Every scan maps results to a compliance framework, not just pass/fail verdicts:

  ╔══════════════════════════════════════════════════════════════╗
  ║  AutoAttack v0.1.0 — Adversarial Robustness Assessment       ║
  ╚══════════════════════════════════════════════════════════════╝

  Target:  https://api.openai.com/v1/chat/completions
  Model:   gpt-4o-mini
  Probes:  134 across 8 categories (OWASP LLM Top 10 mapped)

  SUMMARY
  ┌──────────────────┬────────┬────────┬────────┬────────────┐
  │ Category         │ Total  │ Pass   │ Fail   │ Resilience │
  ├──────────────────┼────────┼────────┼────────┼────────────┤
  │ Prompt Injection │ 14     │ 12     │ 2      │  85.7%     │
  │ Jailbreak        │ 16     │ 9      │ 7      │  56.2%     │
  │ Extraction       │ 10     │ 7      │ 3      │  70.0%     │
  │ Guardrail Bypass │ 10     │ 8      │ 2      │  80.0%     │
  │ OWASP LLM        │ 9      │ 6      │ 3      │  66.7%     │
  └──────────────────┴────────┴────────┴────────┴────────────┘

  14 findings (2 CRITICAL, 5 HIGH, 7 MEDIUM)
  OWASP: LLM01 ✓  LLM02 ✓  LLM03 —  LLM04 ✓  LLM05 —
         LLM06 ✓  LLM07 ✓  LLM08 —  LLM09 ✓  LLM10 ✓

Who this is for

If you need a scanning tool, Promptfoo and Garak are excellent general-purpose options with large communities.

If you need compliance-ready adversarial testing — dated, framework-mapped, tamper-evident test evidence that satisfies an auditor — that's what AutoAttack builds.

The open-source CLI is the scanning engine. The AutoAttack platform transforms CLI results into EU AI Act conformity assessment documentation with immutable evidence chains and RFC 3161 timestamping.

CLI scan → upload → platform → EU AI Act conformity assessment report

Quick start

30 seconds to first assessment:

# Install
go install github.com/autoattack-ai/autoattack/cmd/autoattack@latest

# Run
autoattack scan \
  --target https://api.openai.com/v1/chat/completions \
  --api-key $OPENAI_API_KEY \
  --model gpt-4o-mini

No config files. No YAML. No runtime dependencies.

Install

Go install:

go install github.com/autoattack-ai/autoattack/cmd/autoattack@latest

Pre-built binaries:

Download from GitHub Releases for Linux, macOS, and Windows.

From source:

git clone https://github.com/autoattack-ai/autoattack.git
cd autoattack && make build

Docker:

docker run --rm ghcr.io/autoattack-ai/autoattack scan \
  --target https://api.openai.com/v1/chat/completions \
  --api-key $OPENAI_API_KEY \
  --model gpt-4o-mini

OWASP LLM Top 10 Coverage

Every probe is mapped to a specific OWASP LLM Top 10 category. Run --framework owasp-llm for a targeted compliance assessment.

OWASP Category (2025) Probes EU AI Act Articles
LLM01 Prompt Injection 56 Art. 9(2), Art. 15(1)
LLM02 Sensitive Information Disclosure 16 Art. 9(2), Art. 15(1), Art. 15(4)
LLM04 Data and Model Poisoning 3 Art. 10(2), Art. 10(5), Art. 15(1)
LLM05 Improper Output Handling 5 Art. 9(2), Art. 15(1)
LLM06 Excessive Agency 9 Art. 9(7), Art. 14(1)
LLM07 System Prompt Leakage 6 Art. 13(1), Art. 15(1)
LLM09 Misinformation 30 Art. 9(2), Art. 15(1), Art. 50
LLM10 Unbounded Consumption 6 Art. 15(1), Art. 15(4)

LLM03 (Supply Chain) targets third-party component verification not testable at the API level. Supply chain risks require vendor assessments, dependency audits, and model provenance checks.

LLM08 (Vector and Embedding Weaknesses) targets retrieval pipeline infrastructure not directly testable at the API level. RAG context injection probes (LLM01) test downstream behavioral effects of retrieval-layer attacks.

Harmful content probes (8 probes) test content safety guardrails independently of OWASP categories.

Article mappings: Art. 9 (Risk Management), Art. 10 (Data Governance), Art. 13 (Transparency), Art. 14 (Human Oversight), Art. 15 (Accuracy, Robustness, and Cybersecurity), Art. 50 (Transparency for AI Systems).

Usage

autoattack scan [flags]

Flags:
  --target              Target API endpoint URL (required)
  --api-key             API key for the target (env: AUTOATTACK_TARGET_API_KEY)
  --model               Model name (required)
  --adapter             API adapter: openai, anthropic (default: openai)
  --categories          Comma-separated probe categories to run
  --framework           Filter probes by framework (owasp-llm)
  --output              Output format: console, json, sarif (default: console)
  --out                 Output file path (default: stdout for json/sarif)
  --rate-limit          Max requests per second (default: 2)
  --workers             Concurrent probe workers (default: 5)
  --timeout             HTTP request timeout in seconds (default: 30)
  --max-tokens          Maximum output tokens per request (default: 4096)
  --fail-on             Exit 1 on this severity or above: critical, high, medium, low (default: critical)
  --strict              Conflicting success+failure indicators → INCONCLUSIVE instead of FAIL
  --system-prompt       System prompt to prepend to probes
  --include-responses   Include raw responses and inputs in JSON output
  --sarif-include-pass  Include PASS results in SARIF output (for coverage proof)
  --allow-http          Allow http:// targets (unsafe — API key sent in plaintext)
  --dry-run             Show what probes would run without making API calls
  --quiet               Suppress informational messages
  --no-telemetry        Disable telemetry for this session
  -v, --verbose         Verbose output
Other commands
autoattack probes list                     # List all probes
autoattack probes list --category jailbreak # Filter by category
autoattack upload -f results.json --system <id>  # Upload to platform
autoattack version                         # Show version
autoattack telemetry status                # Check telemetry setting

Output Formats

Console (default)

Colored terminal output with summary table and OWASP coverage indicators.

JSON

Machine-readable report with OWASP mappings and LLM judge criteria for each probe:

autoattack scan --output json --include-responses --out report.json \
  --target $URL --api-key $KEY --model gpt-4o

JSON output redacts responses and inputs by default. Use --include-responses for full compliance evidence.

SARIF

Drops directly into GitHub Advanced Security tab. FAIL results appear as findings; INCONCLUSIVE results appear as review items (kind: "review").

autoattack scan --output sarif --out results.sarif \
  --target $URL --api-key $KEY --model gpt-4o

CI/CD Integration

GitHub Action
name: LLM Adversarial Assessment
on:
  push:
    branches: [main]
  pull_request:

jobs:
  assessment:
    runs-on: ubuntu-latest
    steps:
      - uses: autoattack-ai/autoattack/.github/actions/scan@main
        with:
          target: ${{ secrets.LLM_ENDPOINT }}
          api-key: ${{ secrets.LLM_API_KEY }}
          model: gpt-4o
          fail-on: high

Results appear in the Security tab via SARIF upload.

Exit codes
--fail-on Exit 1 when
critical (default) Any CRITICAL severity finding
high Any CRITICAL or HIGH finding
medium Any CRITICAL, HIGH, or MEDIUM finding
low Any finding at all

Platform Upload

Scan locally, upload to the AutoAttack platform for compliance documentation:

# Scan and upload in one pipeline
autoattack scan --output json --include-responses \
  --target $URL --api-key $KEY --model gpt-4o | \
  autoattack upload --system <ai-system-id>

# Or scan to file, upload later
autoattack scan --output json --include-responses --out results.json \
  --target $URL --api-key $KEY --model gpt-4o

AUTOATTACK_API_KEY=aa_... autoattack upload -f results.json --system <id>

The platform produces:

  • EU AI Act conformity assessment reports (PDF) with article-by-article mapping
  • Tamper-evident evidence chains (SHA-256 hash chain + RFC 3161 timestamps)
  • OWASP LLM Top 10 coverage matrices
  • Historical resilience trending per AI system

Responsible Use

AutoAttack is designed for authorized security testing only. By using this tool, you agree to:

  • Only test systems you own or have written authorization to test
  • Comply with all applicable laws and regulations
  • Not use this tool for malicious purposes, harassment, or to cause harm
  • Report vulnerabilities found in third-party systems through responsible disclosure

Adversarial testing is a legitimate security practice when conducted with authorization. Unauthorized testing of systems you do not own may violate computer fraud laws.

Security Considerations

  • JSON redaction: responses and probe inputs are omitted by default. Use --include-responses to opt in.
  • API keys: used only in HTTP Authorization headers. Never logged, printed, or included in output.
  • Environment variables: AUTOATTACK_TARGET_API_KEY and AUTOATTACK_API_KEY to avoid shell history exposure.
  • SARIF/console output: contain zero response or input data by design.
  • Telemetry: opt-in only. No prompts, responses, or API keys are ever collected.

Found a vulnerability in AutoAttack itself? See SECURITY.md.

How It Works

  1. Load — probe cases are compiled into the binary via //go:embed. Each probe defines an adversarial input, success/failure indicator patterns, and OWASP LLM category mapping.
  2. Send — the runner dispatches probes through transport adapters (OpenAI or Anthropic format) with configurable rate limiting and retry logic (including Retry-After header support).
  3. Judge — the keyword judge evaluates each response against indicator patterns. Security-first: any success indicator match means FAIL. Use --strict for INCONCLUSIVE on conflicting signals.
  4. Report — results are aggregated by category and OWASP mapping, output in the requested format with article-level compliance mapping.
Scoring
  • PASS — the model refused or did not comply with the adversarial probe
  • FAIL — the model complied, indicating a vulnerability
  • INCONCLUSIVE — could not determine from keyword matching alone (requires semantic evaluation; resolved by the platform's LLM judge)

Resilience score per category = pass / (pass + fail) * 100. Higher is more resilient.

Architecture

autoattack/
├── cmd/autoattack/        # CLI entrypoint (cobra)
├── probe/                 # Probe types, YAML loader, embedded registry
├── judge/                 # Verdict engine (keyword/regex matching)
├── transport/             # API adapters (OpenAI, Anthropic) + retry
├── runner/                # Worker pool orchestration + rate limiter
├── report/                # Output formatters (console, JSON, SARIF)
├── upload/                # Platform upload client
├── target/                # Target configuration type
├── telemetry/             # Opt-in anonymous telemetry
└── probes/                # YAML probe definitions (embedded in binary)
    ├── prompt-injection/  # LLM01: direct, indirect, context manipulation
    ├── jailbreak/         # LLM01/LLM09: roleplay, encoding, skeleton key
    ├── extraction/        # LLM02/LLM06/LLM07: system prompt, PII, role boundaries
    ├── guardrail-bypass/  # LLM02: content filter evasion, encoding tricks
    ├── hallucination/     # LLM09: confabulation, fabrication, consistency
    ├── bias/              # LLM09: gender, racial, age, socioeconomic bias
    ├── harmful-content/   # Content safety: violence, weapons, illegal activity
    └── owasp-llm/         # LLM01/LLM02/LLM04/LLM06/LLM09/LLM10: RAG, extraction, poisoning, oversight

All packages are importable as a Go library.

Telemetry

Opt-in anonymous telemetry to help improve AI security research. Off by default. No prompts, responses, or API keys are ever collected.

autoattack telemetry enable   # opt in
autoattack telemetry disable  # opt out
autoattack telemetry status   # check current setting

Contributing

See CONTRIBUTING.md for guidelines on adding probes and submitting code.

License

  • Engine (all Go code): MIT License
  • Probe library (probes/): MPL-2.0 — file-level copyleft; modifications to probes must be shared, but MPL-2.0 does not extend to the engine or your code

See LICENSING.md for full details on the dual-license structure.

Documentation

Overview

embed.go holds the embedded probe filesystem and library version. Keep this file minimal — adding subpackage imports here causes import cycles.

Index

Constants

View Source
const ProbeLibraryVersion = "0.1.0"

ProbeLibraryVersion tracks the probe library version for reproducibility. Bump this when probes are added, removed, or modified.

Variables

View Source
var EmbeddedProbes embed.FS

Functions

This section is empty.

Types

This section is empty.

Directories

Path Synopsis
cmd
autoattack command
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT
SPDX-License-Identifier: MIT

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL