flow

package
v0.1.0 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 15, 2026 | License: Apache-2.0 | Imports: 3 | Imported by: 0

Documentation

Overview

Package flow is m-cli's control-flow / dataflow infrastructure for the path-sensitive lint rules (spec §3.1; the Python tool's "Phase 7"). It builds a per-label control-flow graph over the m-parse tree and provides the dataflow passes that drive M-MOD-017 ($TEST freshness), M-MOD-024 (definite assignment), M-MOD-025 (LOCK-held state), M-MOD-026 (open-transaction depth), M-MOD-027 ($ETRAP protection), and M-MOD-036 (taint). The CFG is built structurally, by walking top-level `line` nodes and attaching each command's dot-block flag, so it needs no parent pointers from the parser.

The CFG follows the reference model: each label gets an entry block, one block per `command` node in source order, and an exit block. Edges are labeled fall, branch, skip, exit, or if-skip. A QUIT inside a dot-block is modeled as a dot-block exit (fall-through) rather than a label exit.

Functions

func AnalyzeTaint

func AnalyzeTaint(cfg CFG, src []byte, formals []string, config TaintConfig) map[int]map[string]bool

AnalyzeTaint runs the forward union-meet taint dataflow over cfg and returns a map from block ID to the set of variable names tainted at that block's entry. The entry block's IN set is the label's formals when config.FormalsTainted is set; every other block's IN set is the union of its predecessors' OUT sets.
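The union-meet iteration can be sketched without the real parse tree. The following is a minimal illustration, not the package's implementation: the `node` type, its single-assignment transfer, and the function names are hypothetical simplifications, and block indices are assumed to double as block IDs.

```go
package main

import "fmt"

// node is a hypothetical, simplified block: at most one assignment
// "Target = f(Sources...)" plus successor block indices.
type node struct {
	Target  string
	Sources []string
	Succ    []int
}

// transfer computes OUT from IN: Target becomes tainted iff any source is
// tainted in IN, and clean otherwise. All other names pass through.
func transfer(n node, in map[string]bool) map[string]bool {
	out := make(map[string]bool, len(in))
	for name := range in {
		out[name] = true
	}
	if n.Target == "" {
		return out
	}
	delete(out, n.Target)
	for _, s := range n.Sources {
		if in[s] {
			out[n.Target] = true
			break
		}
	}
	return out
}

// taintAtEntry returns the IN set of every block, iterated to a fixpoint.
func taintAtEntry(nodes []node, formals []string) []map[string]bool {
	in := make([]map[string]bool, len(nodes))
	work := make([]int, len(nodes))
	for i := range nodes {
		in[i] = map[string]bool{}
		work[i] = i
	}
	for _, f := range formals {
		in[0][f] = true // stands in for config.FormalsTainted == true
	}
	for len(work) > 0 {
		id := work[0]
		work = work[1:]
		out := transfer(nodes[id], in[id])
		for _, s := range nodes[id].Succ {
			changed := false
			for name := range out {
				if !in[s][name] {
					in[s][name] = true // union meet
					changed = true
				}
			}
			if changed {
				work = append(work, s)
			}
		}
	}
	return in
}

func main() {
	nodes := []node{
		{Succ: []int{1}}, // entry
		{Target: "Y", Sources: []string{"X"}, Succ: []int{2, 3}}, // Y derived from X
		{Target: "Y", Succ: []int{3}},                            // Y reset to a constant
		{}, // exit
	}
	in := taintAtEntry(nodes, []string{"X"})
	fmt.Println(in[1]["Y"], in[2]["Y"], in[3]["Y"]) // false true true
}
```

Because the meet is a union, Y stays tainted at the exit even though one of the two paths resets it to a constant.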

func DefinitelyDefined

func DefinitelyDefined(cfg CFG, src []byte, formals []string) map[int]map[string]bool

DefinitelyDefined is a forward MUST-analysis (definite assignment) over cfg: in[B] is the set of local names guaranteed to have been DEF'd on EVERY path from the label entry to B. It drives M-MOD-024 (read of a local before it is definitely assigned).

It differs from classical reaching-definitions in two ways: the lattice element is a set of variable *names* (the analysis only needs "is X definitely defined", not which SET defined it), and the meet is intersection (defined on every path).

Formals are definitely defined at entry. For a block whose command ran, the transfer is (in - kills) ∪ defs, or ∅ when the command kills every local (KillsAll); skip and if-skip edges (command did not run) carry the in-set through unchanged. Unreachable blocks get ∅.
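The transfer and the intersection meet described above can be sketched on a toy graph. This is an illustrative sketch only: the `edge` and `block` types are hypothetical stand-ins (the real Block stores edge kinds as strings and derives effects from the parse tree), and a nil in-set is used here to represent "not reached yet".

```go
package main

import "fmt"

// edge is a hypothetical simplification of the documented edge kinds:
// Skip is true for skip / if-skip edges, where the command did not run.
type edge struct {
	To   int
	Skip bool
}

// block carries the per-command effects named in the text.
type block struct {
	Defs, Kills []string
	KillsAll    bool
	Out         []edge
}

// ran applies the documented transfer: (in - kills) ∪ defs, or ∅ on KillsAll.
func ran(b block, in map[string]bool) map[string]bool {
	out := map[string]bool{}
	if b.KillsAll {
		return out
	}
	for name := range in {
		out[name] = true
	}
	for _, k := range b.Kills {
		delete(out, k)
	}
	for _, d := range b.Defs {
		out[d] = true
	}
	return out
}

// definitelyDefined returns the in-set of every block.
func definitelyDefined(blocks []block, formals []string) []map[string]bool {
	in := make([]map[string]bool, len(blocks)) // nil means "not reached yet"
	in[0] = map[string]bool{}
	for _, f := range formals {
		in[0][f] = true
	}
	work := []int{0}
	for len(work) > 0 {
		id := work[0]
		work = work[1:]
		after := ran(blocks[id], in[id])
		for _, e := range blocks[id].Out {
			contrib := after
			if e.Skip {
				contrib = in[id] // command did not run: in-set flows through
			}
			if in[e.To] == nil { // first arrival: copy the contribution
				in[e.To] = map[string]bool{}
				for name := range contrib {
					in[e.To][name] = true
				}
				work = append(work, e.To)
				continue
			}
			changed := false
			for name := range in[e.To] { // intersection meet
				if !contrib[name] {
					delete(in[e.To], name)
					changed = true
				}
			}
			if changed {
				work = append(work, e.To)
			}
		}
	}
	for i := range in {
		if in[i] == nil {
			in[i] = map[string]bool{} // unreachable blocks get ∅
		}
	}
	return in
}

func main() {
	blocks := []block{
		{Out: []edge{{To: 1}}},                                         // entry
		{Defs: []string{"X"}, Out: []edge{{To: 2}, {To: 2, Skip: true}}}, // conditional SET X
		{Defs: []string{"Y"}, Out: []edge{{To: 3}}},                    // SET Y
		{}, // exit
	}
	in := definitelyDefined(blocks, []string{"A"})
	fmt.Println(in[2]["A"], in[2]["X"], in[3]["Y"]) // true false true
}
```

X is not definitely defined at block 2 because the skip edge reaches it without the SET having run.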

func DepthAtExit

func DepthAtExit(cfg CFG, src []byte) int

DepthAtExit returns the worst-case (maximum) open-transaction nesting depth on any path reaching the label's exit. A non-zero result means at least one path leaves a transaction open — the TSTART-leak signal (M-MOD-026).
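A worst-case depth is a max-meet over incoming paths. The sketch below is built on assumptions the documentation does not state: each block carries a hypothetical `Delta` (+1 for a TSTART, -1 for a TCOMMIT), depth is floored at zero, and the graph is acyclic with successors at higher indices, so a single pass in index order suffices.

```go
package main

import "fmt"

// blk is a hypothetical block: Delta is the change in open-transaction
// depth caused by its command (assumed +1 for TSTART, -1 for TCOMMIT).
type blk struct {
	Delta int
	Succ  []int
}

// depthAtExit returns the maximum depth reaching the last block.
// It assumes every successor index is greater than its predecessor's.
func depthAtExit(blocks []blk) int {
	in := make([]int, len(blocks))
	for i := range in {
		in[i] = -1 // -1 marks "not reached"
	}
	in[0] = 0
	for id, b := range blocks {
		if in[id] < 0 {
			continue
		}
		out := in[id] + b.Delta
		if out < 0 {
			out = 0 // assumption: depth never goes below zero
		}
		for _, s := range b.Succ {
			if out > in[s] {
				in[s] = out // max meet: keep the worst case
			}
		}
	}
	exit := len(blocks) - 1
	if in[exit] < 0 {
		return 0
	}
	return in[exit]
}

func main() {
	leaky := []blk{
		{Succ: []int{1}},              // entry
		{Delta: 1, Succ: []int{2, 3}}, // TSTART, then either commit or exit
		{Delta: -1, Succ: []int{3}},   // TCOMMIT
		{},                            // exit
	}
	fmt.Println(depthAtExit(leaky)) // 1
}
```

The result is 1 because one path reaches the exit without passing through the TCOMMIT block.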

func EtrapProtection

func EtrapProtection(cfg CFG, src []byte) map[int]bool

EtrapProtection is a forward MUST-analysis (AND meet): in_state[B] is true iff `NEW $ETRAP` (or `NEW $ET`) has executed on EVERY path from the label entry to B. The entry starts unprotected; other blocks start at the lattice top (true, the AND identity) and are refined downward as predecessors propagate.
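The AND-meet refinement can be illustrated with booleans alone. In this sketch the `eblk` type and its `NewsEtrap` flag are hypothetical simplifications of a command block; note that blocks never reached keep their initial value of true here, which may differ from how the package treats unreachable blocks.

```go
package main

import "fmt"

// eblk is a hypothetical block: NewsEtrap is true when its command is
// NEW $ETRAP (or NEW $ET).
type eblk struct {
	NewsEtrap bool
	Succ      []int
}

// protection returns, per block, whether NEW $ETRAP ran on every path to it.
func protection(blocks []eblk) []bool {
	in := make([]bool, len(blocks))
	for i := 1; i < len(in); i++ {
		in[i] = true // lattice top; the entry (index 0) starts unprotected
	}
	for changed := true; changed; {
		changed = false
		for id, b := range blocks {
			out := in[id] || b.NewsEtrap
			for _, s := range b.Succ {
				if in[s] && !out { // AND meet: one unprotected path suffices
					in[s] = false
					changed = true
				}
			}
		}
	}
	return in
}

func main() {
	branchy := []eblk{
		{Succ: []int{1, 2}},                // entry
		{NewsEtrap: true, Succ: []int{3}}, // NEW $ETRAP on one branch only
		{Succ: []int{3}},                   // other branch
		{Succ: []int{4}},                   // e.g. a SET $ETRAP site
		{},                                 // exit
	}
	fmt.Println(protection(branchy)[3]) // false
}
```

Block 3 is unprotected because the branch through block 2 reaches it without a NEW $ETRAP.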

func FormalParams

func FormalParams(root parse.Node, src []byte) map[int][]string

FormalParams maps each label's start row (0-based, matching CFG.LabelRow) to the names of its declared formal parameters. For example, a label LBL(A,B) starting on row r yields {r: ["A", "B"]}. Labels with no formals are absent from the map. The structural walk mirrors the CFG build, since m-parse exposes no parent pointers.

func HeldAtExit

func HeldAtExit(cfg CFG, src []byte) []string

HeldAtExit returns the set of lock names still held when control reaches the label's exit on any path — the LOCK-leak signal (M-MOD-025). The result is sorted for determinism.
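Sorting matters because Go randomizes map iteration order. A minimal sketch of turning a held-lock set into a deterministic slice follows; the `sortedNames` helper and the lock names are illustrative, not part of the package.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedNames flattens a name set into a sorted slice so that repeated runs
// produce identical output regardless of map iteration order.
func sortedNames(set map[string]bool) []string {
	names := make([]string, 0, len(set))
	for n := range set {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	held := map[string]bool{"^PAT": true, "^ACCT": true}
	fmt.Println(sortedNames(held)) // [^ACCT ^PAT]
}
```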

Types

type Block

type Block struct {
	ID      int
	Kind    string // "entry" | "command" | "exit"
	Command parse.Node
	HasCmd  bool
	Succ    []int
	Edges   []string
	Line    int // 1-based
}

Block is a node in a per-label CFG. Succ and Edges are parallel.
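Since Succ and Edges are parallel, Edges[i] labels the edge to Succ[i]. The sketch below shows one way to read them together; the local Block copy omits the parse.Node field so it compiles on its own, and `succsByKind` is a hypothetical helper.

```go
package main

import "fmt"

// Block mirrors the documented struct, minus Command and HasCmd, so the
// sketch does not depend on the m-parse package.
type Block struct {
	ID    int
	Kind  string
	Succ  []int
	Edges []string
	Line  int
}

// succsByKind returns the successors of b reached by edges labeled kind.
func succsByKind(b Block, kind string) []int {
	var out []int
	for i, succ := range b.Succ {
		// Guard against mismatched lengths rather than trusting the input.
		if i < len(b.Edges) && b.Edges[i] == kind {
			out = append(out, succ)
		}
	}
	return out
}

func main() {
	b := Block{ID: 1, Kind: "command", Succ: []int{2, 3}, Edges: []string{"fall", "skip"}, Line: 4}
	fmt.Println(succsByKind(b, "skip")) // [3]
}
```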

type CFG

type CFG struct {
	LabelName string
	LabelRow  int // 0-based
	LabelCol  int // 0-based
	Blocks    []Block
}

CFG is one label's control-flow graph. Block 0 is the entry; the last block is the exit.

func BuildCFGs

func BuildCFGs(root parse.Node, src []byte) []CFG

BuildCFGs builds one CFG per label in the routine.

func (CFG) ExitID

func (c CFG) ExitID() int

ExitID is the id of the exit block.
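Forward analyses need predecessor lists, which can be derived by inverting Succ. The sketch below uses trimmed local copies of the types and assumes that block IDs equal slice indices, so the exit ID is the last index; `exitID` and `predecessors` are hypothetical helpers, not the package's methods.

```go
package main

import "fmt"

// Trimmed local copies of the documented types.
type Block struct {
	ID   int
	Kind string
	Succ []int
}

type CFG struct {
	LabelName string
	Blocks    []Block
}

// exitID assumes IDs equal slice indices, so the last block's index is its ID.
func exitID(c CFG) int { return len(c.Blocks) - 1 }

// predecessors inverts the Succ lists: preds[b] holds every block with an
// edge into b, in block order.
func predecessors(c CFG) map[int][]int {
	preds := make(map[int][]int)
	for _, b := range c.Blocks {
		for _, s := range b.Succ {
			preds[s] = append(preds[s], b.ID)
		}
	}
	return preds
}

func main() {
	c := CFG{LabelName: "LBL", Blocks: []Block{
		{ID: 0, Kind: "entry", Succ: []int{1}},
		{ID: 1, Kind: "command", Succ: []int{2, 3}},
		{ID: 2, Kind: "command", Succ: []int{3}},
		{ID: 3, Kind: "exit"},
	}}
	fmt.Println(exitID(c), predecessors(c)[exitID(c)]) // 3 [1 2]
}
```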

type Effects

type Effects struct {
	Defs     map[string]bool
	Kills    map[string]bool
	KillsAll bool
	Uses     []VarUse
}

Effects are the local-variable effects of a command (or a single argument). Defs/Kills are name sets; KillsAll captures the argumentless KILL/NEW semantics (every local in the current frame); Uses is ordered so a diagnostic can point at a specific read site.

type EtrapLeak

type EtrapLeak struct {
	Label  string
	Line   int
	Col    int
	EndCol int
}

EtrapLeak is a `SET $ETRAP=...` site not guarded by a `NEW $ETRAP` on every path from the label entry — the new error handler escapes the label into the caller's stack. Positions are 1-based; Col/EndCol span the offending command.

func EtrapLeaks

func EtrapLeaks(cfg CFG, src []byte) []EtrapLeak

EtrapLeaks returns every unguarded SET $ETRAP site in the label (M-MOD-027). It runs the protection MUST-analysis once, then reports each SET-$ETRAP block whose entry is not protected.

type StaleTestRead

type StaleTestRead struct {
	Label  string
	Line   int
	Col    int
	EndCol int
}

StaleTestRead is a read of $TEST ($T) at a point where no $TEST-setting command is guaranteed to have run on every path from the label entry — the value may be left over from a much earlier command (M-MOD-017). Positions are 1-based; Col/EndCol span the special variable.

func StaleTestReads

func StaleTestReads(cfg CFG, src []byte) []StaleTestRead

StaleTestReads returns the stale $TEST reads in the label, one per source line (consecutive reads collapse). It runs the freshness MUST-analysis, then reports $TEST/$T reads in blocks whose entry is not fresh.

type TaintConfig

type TaintConfig struct {
	// FormalsTainted taints the label's formal parameters at entry. Default
	// true — public-label formals are attack surface.
	FormalsTainted bool
	// Sanitizers are uppercased intrinsic-function keywords whose output is
	// treated as clean regardless of input taint (e.g. $LENGTH returns a
	// number).
	Sanitizers map[string]bool
}

TaintConfig configures the taint analyzer.

func DefaultTaintConfig

func DefaultTaintConfig() TaintConfig

DefaultTaintConfig taints formals and treats $L/$LENGTH/$A/$ASCII as sanitizers — matching the Python reference defaults.
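A caller might start from the defaults and adjust them. The sketch below re-declares TaintConfig locally and uses a stand-in `defaultTaintConfig`; the exact key spelling in Sanitizers (here with the leading "$") is an assumption, as are the `addSanitizer` helper and the extra `$FIND` entry.

```go
package main

import (
	"fmt"
	"strings"
)

// TaintConfig mirrors the documented struct.
type TaintConfig struct {
	FormalsTainted bool
	Sanitizers     map[string]bool
}

// defaultTaintConfig is a stand-in for DefaultTaintConfig. Whether the real
// keys include the leading "$" is an assumption of this sketch.
func defaultTaintConfig() TaintConfig {
	return TaintConfig{
		FormalsTainted: true,
		Sanitizers: map[string]bool{
			"$L": true, "$LENGTH": true, "$A": true, "$ASCII": true,
		},
	}
}

// addSanitizer uppercases the keyword before storing it, since the
// documentation says sanitizer keys are uppercased.
func addSanitizer(c TaintConfig, keyword string) {
	c.Sanitizers[strings.ToUpper(keyword)] = true
}

func main() {
	cfg := defaultTaintConfig()
	cfg.FormalsTainted = false // e.g. for a label known to be internal
	addSanitizer(cfg, "$find") // hypothetical extra sanitizer
	fmt.Println(cfg.FormalsTainted, cfg.Sanitizers["$FIND"], len(cfg.Sanitizers)) // false true 5
}
```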

type TaintFlow

type TaintFlow struct {
	Label    string
	Name     string // the first tainted variable name in the sink subtree
	SinkKind string // "indirection (@…)" | "XECUTE argument"
	Line     int
	Col      int
	EndLine  int
	EndCol   int
}

TaintFlow is one sink reached by a tainted local variable (the raw M-MOD-036 signal, before the lint layer dedups per (label, var)). Positions are 1-based.

func TaintFlows

func TaintFlows(cfg CFG, src []byte, formals []string, config TaintConfig) []TaintFlow

TaintFlows returns, in source order, every indirection or XECUTE-argument sink in cfg whose subtree references a tainted variable, naming the first such var. No dedup is applied here — the lint layer collapses to one finding per (label, var).
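The per-(label, var) collapse that the lint layer applies could look roughly like this. It is a sketch of the idea only: the local TaintFlow copy is trimmed, and `dedupFlows` is a hypothetical name, not the lint layer's actual function.

```go
package main

import "fmt"

// TaintFlow is a trimmed local copy of the documented struct.
type TaintFlow struct {
	Label    string
	Name     string
	SinkKind string
	Line     int
}

// dedupFlows keeps the first flow for each (Label, Name) pair, preserving
// the source order of the input.
func dedupFlows(flows []TaintFlow) []TaintFlow {
	type key struct{ label, name string }
	seen := make(map[key]bool)
	var out []TaintFlow
	for _, f := range flows {
		k := key{f.Label, f.Name}
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, f)
	}
	return out
}

func main() {
	flows := []TaintFlow{
		{Label: "RUN", Name: "CMD", SinkKind: "XECUTE argument", Line: 3},
		{Label: "RUN", Name: "CMD", SinkKind: "indirection (@…)", Line: 5},
		{Label: "RUN", Name: "ARG", SinkKind: "indirection (@…)", Line: 6},
	}
	for _, f := range dedupFlows(flows) {
		fmt.Println(f.Label, f.Name, f.Line)
	}
}
```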

type UndefinedRead

type UndefinedRead struct {
	Label  string
	Name   string
	Line   int
	Col    int
	EndCol int
}

UndefinedRead is one read of a local variable that may not be definitely assigned on every path from the label entry (the raw M-MOD-024 signal, before the lint layer applies the Kernel allowlist and per-label dedup). Positions are 1-based; Col/EndCol span the variable name.

func UndefinedReads

func UndefinedReads(cfg CFG, src []byte, formals []string) []UndefinedRead

UndefinedReads returns, in source order, every local-variable read in cfg that is not guaranteed defined on every prior path. It runs the definite-assignment analysis, tracks running defs within each command (so S A=1,B=A sees A defined for B's RHS), and suppresses reads protected by the IF $G(X)="" SET X=default idiom. No dedup or allowlist is applied here.
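The running-defs behavior within a single command can be sketched as a left-to-right scan over its arguments. The `arg` type and `undefinedInCommand` are hypothetical simplifications: each argument lists the names it reads and the single name it defines, and the `$G` idiom suppression is not modeled.

```go
package main

import "fmt"

// arg is a hypothetical model of one SET argument: the names read on the
// right-hand side and the name assigned on the left.
type arg struct {
	Uses []string
	Def  string
}

// undefinedInCommand scans arguments left to right, checking each read
// against the block's in-set plus the defs of earlier arguments.
func undefinedInCommand(args []arg, in map[string]bool) []string {
	running := make(map[string]bool, len(in))
	for name := range in {
		running[name] = true
	}
	var bad []string
	for _, a := range args {
		for _, u := range a.Uses {
			if !running[u] {
				bad = append(bad, u)
			}
		}
		if a.Def != "" {
			running[a.Def] = true // visible to later arguments
		}
	}
	return bad
}

func main() {
	// Models: S A=1,B=A,C=D with nothing defined on entry.
	args := []arg{
		{Def: "A"},
		{Uses: []string{"A"}, Def: "B"},
		{Uses: []string{"D"}, Def: "C"},
	}
	fmt.Println(undefinedInCommand(args, nil)) // [D]
}
```

Only D is reported: A is defined by the first argument before the second argument reads it.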

Known limitations (faithful to the reference): GOTO targets within the routine are over-approximated as exits; FOR loops have no back-edge, so a first-iteration read may be under-reported; OPEN device-parameter syntax can parse as local variables and over-report on I/O code.

type VarUse

type VarUse struct {
	Name string
	Node parse.Node
	Line int
	Col  int
}

VarUse is a single read of a local variable, anchored at its AST node. Positions are 1-based.
