contentguard

package
v1.0.2
Published: Apr 19, 2026 License: Apache-2.0 Imports: 14 Imported by: 0

README

contentguard

Content trust verification for agent tool calls. Protects against prompt injection by tracking content with trust metadata and verifying tool calls through a staged pipeline.

Usage

guard, err := contentguard.New(
    []contentguard.Stage{
        contentguard.NewScreener(cheapModel),
        contentguard.NewReviewer(capableModel),
    },
    contentguard.Escalatory(),
    contentguard.Config{
        Context:  map[string]string{"scope": "authorized pentest of lab network"},
        Patterns: []string{"exfil:send.*external"},
        Keywords: []string{"custom_secret"},
        Skip:     []string{"read", "list_files"},
    },
)
if err != nil {
    log.Fatal(err) // check the error before deferring Close
}
defer guard.Close()

// Track content as it enters the system
guard.Ingest(contentguard.Untrusted, contentguard.Data, true, html, "web_fetch")

// Track derived content with lineage
guard.IngestWithLineage(contentguard.Untrusted, contentguard.Data, true, derived, "llm:response", []string{parentID})

// Verify a tool call
result, err := guard.Check(ctx, "bash", args, originalGoal)
if err != nil {
    log.Fatal(err)
}
switch result.Verdict {
case contentguard.Allow:  // proceed
case contentguard.Deny:   // blocked; result.Rationale explains why
case contentguard.Modify: // blocked; result.Rationale has the suggested alternative
}

How It Works

  1. Deterministic check (built-in, always runs): detects untrusted content, matches injection patterns, and scans for sensitive keywords
  2. Configurable stages: run through the pipeline according to the chosen workflow

Workflows

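Workflow      Behavior
Escalatory()  Stops at the first Allow, Deny, or Modify verdict. Only an Escalate verdict passes the call to the next stage; if every stage escalates, the guard fails safe and denies.
Paranoid()    Runs all stages. Denies if any stage denies; allows only if every stage allows or escalates.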

Stages

Stages implement the Stage interface:

type Stage interface {
    Evaluate(ctx context.Context, req Request) (*Finding, error)
}

Built-in stages:

  • NewScreener(model) — quick LLM triage (YES/NO)
  • NewReviewer(model) — full LLM review (ALLOW/DENY/MODIFY)

Custom stages (rule engine, human approval, etc.) implement the same interface.

Context

Context flows from the guard into every stage's Request.Context:

contentguard.New(stages, workflow, contentguard.Config{
    Context: map[string]string{"scope": "authorized pentest"},
})

Stages read this context to adjust their behavior; for example, a declared research scope can change the prompts sent to the LLM.
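
One way a stage might use this map is to append the declared scope to its prompt. The helper below is a hypothetical sketch; the function name, the "scope" key convention, and the prompt wording are assumptions rather than the built-in stages' behavior.

```go
package main

import "strings"

// buildPrompt appends the guard-level scope, if any, to a base prompt.
// Hypothetical helper: the built-in stages may format prompts differently.
func buildPrompt(base string, guardCtx map[string]string) string {
	scope := strings.TrimSpace(guardCtx["scope"]) // reading a nil map is safe
	if scope == "" {
		return base
	}
	return base + "\n\nDeclared scope: " + scope
}
```

A stage's Evaluate method could call buildPrompt(base, req.Context) before sending the prompt to its model.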

Verdicts

Verdict   Meaning
Allow     Tool call is safe
Deny      Tool call is blocked
Modify    Tool call needs changes (rationale has the suggestion)
Escalate  Stage cannot decide and passes to the next stage (appears only in findings, never in the final result)

Trust

Level      Meaning
Trusted    Framework-generated (system prompts)
Vetted     Human-authored (goals)
Untrusted  External content (web fetches, tool results)

Documentation

Overview

Package contentguard provides prompt injection defense through tracked content and staged verification.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	Context  map[string]string // flows to stages (e.g., research scope)
	Patterns []string          // custom "name:regex" injection patterns
	Keywords []string          // custom sensitive keywords
	Skip     []string          // tools that skip verification
}

Config holds optional configuration for the guard. Use Defaults() to obtain a zero-value Config.
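
The Patterns field uses a "name:regex" format. The following sketch shows one plausible way to parse such an entry. It assumes the name ends at the first colon, so the regex itself may contain colons; the package's actual parsing rules are not documented here, and parsePattern and namedPattern are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// namedPattern is a parsed "name:regex" entry (hypothetical type).
type namedPattern struct {
	Name string
	Re   *regexp.Regexp
}

// parsePattern splits an entry at the first colon and compiles the regex.
// Assumption: both the name and the regex must be non-empty.
func parsePattern(entry string) (namedPattern, error) {
	name, expr, ok := strings.Cut(entry, ":")
	if !ok || name == "" || expr == "" {
		return namedPattern{}, fmt.Errorf("pattern %q: want \"name:regex\"", entry)
	}
	re, err := regexp.Compile(expr)
	if err != nil {
		return namedPattern{}, fmt.Errorf("pattern %q: %w", name, err)
	}
	return namedPattern{Name: name, Re: re}, nil
}
```

Validating entries up front means a malformed pattern surfaces as an error when the guard is configured instead of being skipped silently during a check.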

func Defaults

func Defaults() Config

Defaults returns a zero-value Config.

type Content

type Content struct {
	ID      string
	Trust   Trust
	Kind    Kind
	Mutable bool
	Text    string
	Source  string
	Origins []*Content // parent content that influenced this
}

Content represents a piece of tracked content with security metadata.

type Finding

type Finding struct {
	Verdict   Verdict
	Rationale string // why (deny), what instead (modify), why unsure (escalate)
	Source    string // which stage produced this
}

Finding is what one stage concluded about a tool call.

type Guard

type Guard struct {
	// contains filtered or unexported fields
}

Guard verifies tool calls against ingested content through a staged pipeline.

func New

func New(stages []Stage, workflow Workflow, cfg Config) (*Guard, error)

New creates a content guard.

func (*Guard) Check

func (g *Guard) Check(ctx context.Context, toolName string, args map[string]any, originalGoal string) (res *Result, err error)

Check runs the verification pipeline for a tool call.

func (*Guard) ClearContext

func (g *Guard) ClearContext()

ClearContext removes all tracked content.

func (*Guard) Close

func (g *Guard) Close()

Close cleans up resources.

func (*Guard) Find

func (g *Guard) Find(id string) *Content

Find returns tracked content by ID, or nil if not found.

func (*Guard) Ingest

func (g *Guard) Ingest(trust Trust, kind Kind, mutable bool, text, source string) *Content

Ingest adds content to the guard's tracking.

func (*Guard) IngestWithLineage

func (g *Guard) IngestWithLineage(trust Trust, kind Kind, mutable bool, text, source string, originIDs []string) *Content

IngestWithLineage adds content with explicit parent content IDs. Use this when the content was derived from other tracked content (e.g., an LLM response influenced by a web fetch result).
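
Lineage matters because a derived item can inherit risk from its parents. The sketch below walks the Origins links to collect untrusted ancestors. It uses a local mirror of Content trimmed to the fields needed, and untrustedAncestors is a hypothetical helper, not a Guard method.

```go
package main

// Local mirrors of the documented Trust and Content types (Content trimmed).
type Trust string

const (
	Trusted   Trust = "trusted"
	Vetted    Trust = "vetted"
	Untrusted Trust = "untrusted"
)

type Content struct {
	ID      string
	Trust   Trust
	Origins []*Content // parent content that influenced this
}

// untrustedAncestors returns the IDs of c and of every ancestor reachable
// through Origins whose Trust is Untrusted. The walk is depth-first, visits
// each node once, and tolerates cycles.
func untrustedAncestors(c *Content) []string {
	var ids []string
	seen := map[*Content]bool{}
	var walk func(n *Content)
	walk = func(n *Content) {
		if n == nil || seen[n] {
			return
		}
		seen[n] = true
		if n.Trust == Untrusted {
			ids = append(ids, n.ID)
		}
		for _, o := range n.Origins {
			walk(o)
		}
	}
	walk(c)
	return ids
}
```

With an LLM response whose origins are a web fetch and a human-authored goal, the helper reports the response and the web fetch when both are marked Untrusted, and leaves out the vetted goal.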

func (*Guard) UntrustedIDs

func (g *Guard) UntrustedIDs() []string

UntrustedIDs returns IDs of all untrusted content in context.

type Kind

type Kind string

Kind represents how content should be interpreted.

const (
	// Instruction means content contains executable instructions.
	Instruction Kind = "instruction"
	// Data means content is data only, never to be interpreted as instructions.
	Data Kind = "data"
)

type Request

type Request struct {
	ToolName      string
	ToolArgs      map[string]any
	Untrusted     []*Content
	OriginalGoal  string
	PriorFindings []*Finding        // what earlier stages found
	Context       map[string]string // guard-level context (e.g., research scope)
}

Request carries all information stages need to make a decision.

type Result

type Result struct {
	Verdict   Verdict
	Rationale string
	ToolName  string
	Findings  []*Finding // all findings, deterministic first
}

Result is the guard's final answer on a tool call.

type Reviewer

type Reviewer struct {
	// contains filtered or unexported fields
}

Reviewer is a Stage that performs full LLM-based security review.

func NewReviewer

func NewReviewer(provider llm.Model) *Reviewer

NewReviewer creates a Stage backed by a capable LLM for full review.

func (*Reviewer) Evaluate

func (r *Reviewer) Evaluate(ctx context.Context, req Request) (*Finding, error)

Evaluate implements Stage.

type Screener

type Screener struct {
	// contains filtered or unexported fields
}

Screener is a Stage that performs quick LLM-based triage.

func NewScreener

func NewScreener(provider llm.Model) *Screener

NewScreener creates a Stage backed by a cheap LLM for quick triage.

func (*Screener) Evaluate

func (s *Screener) Evaluate(ctx context.Context, req Request) (*Finding, error)

Evaluate implements Stage.

type Stage

type Stage interface {
	Evaluate(ctx context.Context, req Request) (*Finding, error)
}

Stage is one step in the verification pipeline.

type Trust

type Trust string

Trust represents the origin-based authenticity of content.

const (
	// Trusted is for framework-generated content (system prompt, supervisor messages).
	Trusted Trust = "trusted"
	// Vetted is for human-authored content (Agentfile goals, signed packages).
	Vetted Trust = "vetted"
	// Untrusted is for external content (tool results, file reads, web fetches).
	Untrusted Trust = "untrusted"
)

type Verdict

type Verdict string

Verdict is the outcome of a stage evaluation or the guard's final decision.

const (
	Allow    Verdict = "allow"
	Deny     Verdict = "deny"
	Modify   Verdict = "modify"
	Escalate Verdict = "escalate" // only in Finding, never in Result
)

type Workflow

type Workflow interface {
	Execute(ctx context.Context, stages []Stage, req Request) *Result
}

Workflow defines how stages are executed in the verification pipeline.

func Escalatory

func Escalatory() Workflow

Escalatory returns a Workflow that stops on the first allow, deny, or modify verdict. Only an escalate verdict passes the call to the next stage. If every stage escalates, the workflow fails safe and denies.

func Paranoid

func Paranoid() Workflow

Paranoid returns a Workflow that runs all stages regardless of individual verdicts. It denies if any stage denies and allows only if every stage allows or escalates.
