Documentation
¶
Overview ¶
Package danger classifies shell commands by risk level and provides a configurable approval system for dangerous operations.
Classification is token-based (not regex) — it respects quotes, pipes, redirects, compound commands (&&, ||, ;), and multi-line input. Each command is classified into one of 9 risk classes, and the user can configure which actions (allow/prompt/deny) apply to each class.
The gate fails CLOSED. A command whose program name is recognised but used benignly classifies as Safe (allow); a command whose verb is NOT recognised classifies as Unknown and is denied by default. The set of recognised-safe commands (safeCommands) is therefore an explicit read-only allowlist — extend it, or the per-profile allowlist, to permit a tool rather than relying on it slipping through unclassified.
Threat model ¶
The classifier is an adversarial filter, not a parser for well-behaved input. It assumes a prompt-injected agent is actively trying to make a dangerous command read as harmless so it slips past the approval gate. The design therefore errs toward the worse class when in doubt, and is built in layers that each close a category of evasion:
Normalisation (see normalize) rewrites the command so token-level analysis can see through shell tricks before classification runs: - $'…' ANSI-C escapes decodeANSIC ($'\x72\x6d' → rm) - $IFS word-splitting expandIFS (rm$IFS-rf$IFS/ → rm -rf /) - {a,b,c} brace expansion expandBraces ({rm,-rf,/} → rm -rf /) - $(…)/`…`/<(…)/>(…) subst. extractSubstitutions (bodies classified too) - command/exec/builtin stripCommandWrappers - \-escapes (r\m, \rm) collapseUnquotedBackslashes - absolute paths (/bin/rm) basenameFirstToken + commandName The tokenizer additionally treats quote boundaries as NON word boundaries, so empty/adjacent quotes like r""m and "rm" still resolve to the single word `rm`.
Structural decomposition. A command is split into segments (on ;, &&, ||), each segment into pipe stages (on |), and EVERY stage is classified — not just the head — so `true | dd of=/dev/sda` and `echo x | sudo rm -rf /home` are seen for what their later stages do. The worst class across all parts wins (see rank).
Wrapper unwrapping (unwrapWrappers). Leading execution wrappers (env, xargs, nohup, nice, setsid, timeout, …) are stripped so the real command underneath is classified; privileged wrappers (sudo, doas, pkexec) additionally impose a system_write floor and then let the inner command escalate further (sudo rm -rf /var → destructive).
Verb-independent resource scanning (classifyResourceToken). Some resources are dangerous regardless of the command touching them: /dev/tcp and /dev/udp pseudo-devices (reverse-shell channels) and sensitive credential paths (~/.ssh, /etc/shadow, ~/.aws/credentials, /proc/self/environ, …). These are flagged wherever they appear.
Payload re-classification. Shell -c strings (bash -c '…') and the bodies of command/process substitutions are themselves classified by re-entering Classify, so nested commands cannot hide a level deeper.
Limitations ¶
This is a heuristic defence-in-depth layer, NOT a sandbox or a complete shell interpreter. It does not, and cannot, catch everything:
- Variable indirection: `X=rm; $X -rf /` — the value of $X is not tracked. Note the fail-closed default turns this from a silent bypass into a denial: the unrecognised `$X` verb classifies as Unknown.
- Fully dynamic construction from runtime data, command output, or environment the classifier cannot evaluate.
- Arbitrary value transformations beyond the enumerated encodings (e.g. a secret piped through gzip/openssl before exfiltration).
- Interpreter escape hatches we do not special-case (awk 'BEGIN{system()}', editor `!` shells, language-specific eval paths). These read as a known command (awk/vim/…) used benignly, so they classify Safe — the known verb is the gap, not an unknown one.
Because these gaps exist, the classifier is paired with other controls: non-interactive denial, output redaction (internal/redact), and — for strong isolation — the container sandbox. When tuning, remember that over-classification only costs an extra prompt, while under-classification can let a destructive or exfiltrating command through silently; prefer the former.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func IsSafe ¶
IsSafe returns true if no injection threats are detected in content. This is the primary gate used before injecting untrusted content into the system prompt.
Types ¶
type Action ¶
type Action string
Action represents what to do when a command of a given risk class is detected.
type Approver ¶
type Approver interface {
// PromptCommand asks the user to approve or deny a shell command.
// cls is the risk class (system_write, network_egress, etc.).
// Returns nil on approve, error on deny or timeout.
PromptCommand(cls RiskClass, cmd, description string) error
// PromptOperation asks the user to approve or deny a native tool operation
// (read_file on /etc, browser to external URL, etc.).
PromptOperation(op ToolOperation) error
}
Approver is the interface for user approval of dangerous operations. Two implementations exist:
- TTYApprover — opens /dev/tty for interactive approval (CLI mode)
- WSApprover — sends approval requests via WebSocket (serve mode)
When nil (no approver configured), calls fall back to non-interactive behavior (NonInteractiveAction). Tools MUST inject an approver to get interactive approval in any mode.
type DangerousConfig ¶
type DangerousConfig struct {
// Classes maps risk classes to their configured action.
// Only overrides for non-default values need to be set.
Classes map[RiskClass]Action `json:"classes,omitempty"`
// Allowlist is a list of command strings that are always allowed,
// regardless of their risk classification. Exact match only.
// Takes priority over Denylist.
Allowlist []string `json:"allowlist,omitempty"`
// Denylist is a list of command strings that are always denied,
// regardless of their risk classification. Prefix match (after trimming).
Denylist []string `json:"denylist,omitempty"`
// DefaultAction is the global default action applied to ALL risk classes
// when set. Per-class overrides in Classes still win.
// "allow" → YOLO mode (everything runs without prompt)
// "deny" → lockdown (everything denied unless explicitly allowed)
// Not set → uses built-in defaults per class
DefaultAction *string `json:"action,omitempty"`
// NonInteractive specifies what to do when running without a TTY.
// "allow" (default) — run everything, "deny" — block all prompted ops.
NonInteractive *string `json:"non_interactive,omitempty"`
// Approver handles interactive approval prompts for dangerous operations.
// When set, all Prompt-class operations use this instead of /dev/tty.
// Tools can inject their own approver (e.g., WebSocket-based for odek serve).
// When nil, CheckOperation falls back to /dev/tty (CLI-compatible default).
Approver Approver `json:"-"`
}
DangerousConfig defines how dangerous operations are handled. Configurable via the standard 4-layer odek config chain.
Default behavior per class (no sandbox):
safe → allow, local_write → allow, system_write → prompt, destructive → deny, network_egress → prompt, code_execution → prompt, install → prompt, blocked → deny, unknown → deny
The classifier fails closed: a command whose program name is not recognised classifies as Unknown and is denied by default. Set "unknown": "prompt" (or add trusted tools to the allowlist) to soften this for a given profile.
func (*DangerousConfig) ActionFor ¶
func (c *DangerousConfig) ActionFor(cls RiskClass) Action
ActionFor returns the configured action for the given risk class. Per-class overrides in Classes win first, then the global default action (the "action" field), then built-in defaults, then Prompt.
func (*DangerousConfig) ActionForCommand ¶
func (c *DangerousConfig) ActionForCommand(cmd string) Action
ActionForCommand returns the action for a specific command string. Allowlist and denylist are checked first (exact match for allowlist, prefix match for denylist), then falls back to the risk-class-based action.
func (*DangerousConfig) CheckOperation ¶
func (c *DangerousConfig) CheckOperation(op ToolOperation, trustedClasses map[RiskClass]bool) error
CheckOperation checks whether a tool operation is allowed, denied, or needs approval. Returns nil on allow, error on deny, and prompts the user on prompt. Uses the configured Approver when set; falls back to /dev/tty (TTYApprover) when no approver is configured.
func (*DangerousConfig) NonInteractiveAction ¶
func (c *DangerousConfig) NonInteractiveAction() Action
NonInteractiveAction returns the action to use when no TTY is available.
type InjectionPattern ¶
InjectionPattern groups a compiled regex with a human-readable label describing what threat it detects.
type RiskClass ¶
type RiskClass string
RiskClass represents the risk level of a shell command.
const ( Safe RiskClass = "safe" LocalWrite RiskClass = "local_write" SystemWrite RiskClass = "system_write" Destructive RiskClass = "destructive" NetworkEgress RiskClass = "network_egress" CodeExecution RiskClass = "code_execution" Install RiskClass = "install" Blocked RiskClass = "blocked" // Unknown is the fall-through class for a command whose program name the // classifier does not recognise. It defaults to Deny (same as // Destructive): the gate fails CLOSED rather than open, so a novel or // obfuscated verb that dodged every known-dangerous check cannot run // unprompted. Recognised-but-benign usage classifies as Safe instead. Unknown RiskClass = "unknown" )
func Classify ¶
Classify determines the risk class of a shell command using token-level heuristics. Returns the highest-severity class detected.
Priority (highest to lowest): blocked > destructive > system_write > code_execution > network_egress > install > local_write > safe
Pipeline (see the package doc for the full evasion model):
raw cmd ─▶ isRawBlocked ─▶ normalize ─┬─▶ classifyOne(main) ─┐
└─▶ Classify(sub) ⟳ ───┴─▶ worst wins
normalize neutralises shell evasion tricks (ANSI-C/$IFS/brace expansion, $(…)/`…`/<(…) substitutions, command/exec wrappers, backslash escapes, absolute-path basenames) and returns the rewritten command plus any substitution bodies. classifyOne then splits into segments and pipe stages and classifies each (see classifyPipeline/classifyStage). Every extracted sub-expression is re-classified through Classify so nested commands cannot hide one level deeper; the worst class across the whole tree is returned.
func ClassifyPath ¶
ClassifyPath returns a RiskClass for a filesystem path. /tmp/*, working directory → local_write; /etc/*, /root/* → system_write; /boot/*, /dev/*, /sys/* → destructive; home sensitive dirs → system_write.
func ClassifyURL ¶
ClassifyURL returns a RiskClass for a browser URL. Internal IPs → system_write; external → network_egress. Uses proper IP parsing (handles decimal, octal, hex, IPv6 compressed, short forms like 127.1, and all other representations that browsers accept via inet_aton-style parsing) instead of string prefix matching which was trivially bypassable.
type ScanResult ¶
type ScanResult struct {
Label string // human-readable threat label
Pattern string // the regexp pattern that matched (for debugging)
}
ScanResult describes a single detected injection threat.
func ScanInjection ¶
func ScanInjection(content string) []ScanResult
ScanInjection checks content for prompt injection attempts. Returns nil if no threats detected, or a list of found threats. Each threat includes a label describing what was found.
type TTYApprover ¶
type TTYApprover struct {
DangerousConfig *DangerousConfig
TrustedClasses map[RiskClass]bool
TTYPath string // overridden in tests
// Approval-fatigue mitigation. After FrictionThreshold approvals of
// the same class within FrictionWindow, the next prompt requires
// the user to type the literal word "approve" (no single-letter
// shortcut) and prints a 1.5s pause before accepting input. This
// breaks reflexive click-through and gives the user a moment to
// notice they have approved an unusual number of dangerous calls.
FrictionThreshold int
FrictionWindow time.Duration
// contains filtered or unexported fields
}
TTYApprover implements Approver by reading from /dev/tty. This is the default approver used in CLI mode (odek run, odek repl). When /dev/tty is not available (piped stdin, CI), it falls back to the configured NonInteractiveAction.
func NewTTYApprover ¶
func NewTTYApprover(cfg *DangerousConfig) *TTYApprover
NewTTYApprover creates a TTYApprover with the given config.
func (*TTYApprover) PromptCommand ¶
func (a *TTYApprover) PromptCommand(cls RiskClass, cmd, description string) error
func (*TTYApprover) PromptOperation ¶
func (a *TTYApprover) PromptOperation(op ToolOperation) error
func (*TTYApprover) SetTrustAll ¶
func (a *TTYApprover) SetTrustAll(enabled bool)
SetTrustAll enables or disables blanket trust for all risk classes. When enabled, PromptCommand returns nil for every call (used by batch approval).
func (*TTYApprover) SetTrustedClasses ¶
func (a *TTYApprover) SetTrustedClasses(m map[RiskClass]bool)
SetTrustedClasses atomically sets the trusted classes map. Takes ownership of the provided map — caller must not write to it after calling.
type ToolOperation ¶
ToolOperation describes a native tool call for approval checking.