cfg

package
v1.5.7 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 12, 2026 License: MIT Imports: 2 Imported by: 0

Documentation

Overview

Package cfg provides control flow graph representation for dataflow analysis.

A control flow graph (CFG) models the flow of control through a function body. Each node represents a program point (assignment, branch, call, etc.) and edges represent possible control flow between points.

Core Types

Point: An index identifying a location in the CFG. Points are local to each function's CFG and serve as keys for type state maps.

Node: Contains metadata about a program point including its kind, target variable (for assignments), callee (for calls), and condition info (for branches).

Edge: A directed edge from one point to another, with a condition flag indicating whether it's the then-branch (true) or else-branch (false) of a conditional.

Graph: Interface for CFG access, implemented by CFG. Provides node lookup, edge traversal, predecessor/successor queries, and reverse post-order iteration.

SSA-Like Identity

The cfg package provides SymbolID for SSA-style value identity. Unlike SSA which creates new versions at each assignment, SymbolID identifies the lexical variable binding. Combined with Point, this enables tracking which value a name refers to at each program point.

Analysis Support

The CFG supports forward dataflow analysis via RPO (reverse post-order) traversal. Branch conditions are exposed for constraint extraction, and join points are identified for type merging.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BoundIdent

type BoundIdent struct {
	Symbol SymbolID
	Point  Point
	// contains filtered or unexported fields
}

BoundIdent represents an identifier that has been resolved to a symbol.

A BoundIdent is created when a name reference in the AST is resolved to its declaration. It captures both the SymbolID (for lookup) and the Point (for flow-sensitive analysis).

By using BoundIdent instead of raw strings, the type system enforces that variable identity comes from proper binding resolution, not string comparison. Two variables with the same name in different scopes have different BoundIdents.

func NewBoundIdent

func NewBoundIdent(name string, symbol SymbolID, point Point) BoundIdent

NewBoundIdent creates a BoundIdent from a name, symbol, and point. Returns zero value if name is empty or symbol is zero.

func (BoundIdent) IsValid

func (b BoundIdent) IsValid() bool

IsValid returns true if this represents a valid binding.

func (BoundIdent) Name

func (b BoundIdent) Name() string

Name returns the identifier name for display purposes only.

This method is for error messages and debugging output. For identity comparisons, use the Symbol field, which distinguishes between different variables that happen to have the same name.

type CFG

type CFG struct {
	Nodes []Node
	// contains filtered or unexported fields
}

CFG represents the control flow graph for a function.

A CFG is built during AST analysis and consumed by the type checker for flow-sensitive analysis. The graph is immutable after construction.

Structure:

  • Nodes: flat array indexed by Point
  • Edges: directed edges with predecessor/successor maps for efficient lookup
  • Entry/Exit: special nodes for function boundaries

The CFG is built incrementally via AddNode/AddEdge, then traversed via RPO for forward dataflow analysis.

func New

func New() *CFG

New creates an empty CFG.

func (*CFG) AddBranch

func (c *CFG) AddBranch(condVar SymbolID, condCheck CondCheck) Point

AddBranch adds a branch node with condition info.

func (*CFG) AddEdge

func (c *CFG) AddEdge(from, to Point, cond bool)

AddEdge adds an edge.

func (*CFG) AddNode

func (c *CFG) AddNode(kind NodeKind, target SymbolID, callee string) Point

AddNode adds a node and returns its point.

func (*CFG) EdgeCond

func (c *CFG) EdgeCond(from, to Point) (bool, bool)

EdgeCond returns the condition value for edge from->to. Returns (true, ok) for then-branch, (false, ok) for else-branch.

func (*CFG) Edges

func (c *CFG) Edges() []Edge

Edges returns all edges.

func (*CFG) Entry

func (c *CFG) Entry() Point

Entry returns the entry point.

func (*CFG) Exit

func (c *CFG) Exit() Point

Exit returns the exit point.

func (*CFG) ID

func (c *CFG) ID() uint64

ID returns the stable identifier for this CFG instance.

func (*CFG) IsBranch

func (c *CFG) IsBranch(p Point) bool

IsBranch returns true if p has multiple successors.

func (*CFG) IsJoin

func (c *CFG) IsJoin(p Point) bool

IsJoin returns true if p has multiple predecessors.

func (*CFG) Node

func (c *CFG) Node(p Point) *Node

Node returns the node at point p.

func (*CFG) Predecessor

func (c *CFG) Predecessor(p Point) Point

Predecessor returns single predecessor (for non-join nodes).

func (*CFG) Predecessors

func (c *CFG) Predecessors(p Point) []Point

Predecessors returns all predecessors of p.

func (*CFG) RPO

func (c *CFG) RPO() []Point

RPO returns nodes in reverse post-order for forward dataflow analysis.

Reverse post-order (RPO) is the optimal traversal order for forward dataflow analysis: predecessors are visited before successors (except for back-edges in loops). This ensures that type state propagates correctly through the control flow graph.

Only nodes reachable from the entry point are included. Unreachable code (after unconditional return, for example) is excluded.

func (*CFG) Reachable

func (c *CFG) Reachable() map[Point]bool

Reachable returns a set of all nodes reachable from entry.

func (*CFG) RemoveOutgoing

func (c *CFG) RemoveOutgoing(from Point)

RemoveOutgoing removes all outgoing edges from a node.

func (*CFG) Size

func (c *CFG) Size() int

Size returns node count.

func (*CFG) Successor

func (c *CFG) Successor(p Point) Point

Successor returns single successor (for non-branch nodes).

func (*CFG) Successors

func (c *CFG) Successors(p Point) []Point

Successors returns all successors of p.

func (*CFG) UnreachablePoints

func (c *CFG) UnreachablePoints() []Point

UnreachablePoints returns all nodes not reachable from entry.

func (*CFG) ValidateEdges

func (c *CFG) ValidateEdges() error

ValidateEdges checks that all edges reference valid node indices. Returns nil if valid, otherwise returns an error describing the issue.

type CondCheck

type CondCheck struct {
	Kind     CondCheckKind
	TypeName string // Only for CheckTypeEqual/CheckTypeNot
}

CondCheck represents a condition check in a branch node.

type CondCheckKind

type CondCheckKind uint8

CondCheckKind identifies the type of condition check in a branch.

The type checker uses CondCheckKind to extract constraints from branch conditions without re-analyzing the expression. This enables efficient type narrowing in then/else branches.

const (
	CheckNone      CondCheckKind = iota // Complex expression, no simple constraint
	CheckTruthy                         // if x: narrows x to truthy values
	CheckFalsy                          // if not x: narrows x to falsy values
	CheckNil                            // x == nil: narrows x to nil
	CheckNotNil                         // x ~= nil: narrows x to non-nil
	CheckLimit                          // Numeric for loop limit (i <= n)
	CheckTypeEqual                      // type(x) == "typename": narrows to that type
	CheckTypeNot                        // type(x) ~= "typename": excludes that type
)

Condition check kind constants represent recognizable branch patterns.

type Edge

type Edge struct {
	From Point
	To   Point
	Cond bool // true = then branch, false = else branch
}

Edge represents a directed control flow edge between two points.

For branch nodes (NodeBranch), the Cond field distinguishes branches:

  • Cond=true: the "then" branch, taken when condition is truthy
  • Cond=false: the "else" branch, taken when condition is falsy

For non-branch nodes, Cond is typically true (single successor).

type EdgeError

type EdgeError struct {
	From   Point
	To     Point
	Reason string
}

EdgeError describes an invalid edge.

func (*EdgeError) Error

func (e *EdgeError) Error() string

type Graph

type Graph interface {
	ID() uint64                           // Unique identifier for this CFG instance
	Entry() Point                         // Function entry point (always node 0)
	Exit() Point                          // Function exit point
	Node(p Point) *Node                   // Node metadata at point p
	RPO() []Point                         // Reverse post-order for forward analysis
	Predecessors(p Point) []Point         // Incoming edges
	Successor(p Point) Point              // Single successor (non-branch nodes)
	Successors(p Point) []Point           // All successors (branch nodes have 2)
	Edges() []Edge                        // All edges in the graph
	Size() int                            // Number of nodes
	EdgeCond(from, to Point) (bool, bool) // Branch condition for an edge
	IsJoin(p Point) bool                  // True if p has multiple predecessors
	IsBranch(p Point) bool                // True if p has multiple successors
}

Graph is the interface for control flow graphs.

This interface abstracts over different CFG implementations, allowing the type checker to work with various CFG builders.

Key methods for dataflow analysis:

  • RPO(): Returns nodes in reverse post-order for forward analysis
  • Predecessors/Successors: Enable backward/forward traversal
  • EdgeCond: Retrieves branch condition for type narrowing
  • IsJoin/IsBranch: Identifies merge points and decision points

type Node

type Node struct {
	Point      Point
	Kind       NodeKind
	Target     SymbolID  // Variable for assignments (0 = none or unresolved)
	Callee     string    // Function for calls (global/external name)
	CondVar    SymbolID  // Variable being tested (0 = none or complex expression)
	CondCheck  CondCheck // Condition check type and optional type name
	LoopVars   []SymbolID
	LoopLocals []SymbolID
	// LoopPreheader is the unique predecessor that enters a loop from outside
	// (i.e., not a back-edge). For loops with multiple entry points or complex
	// goto patterns, this points to the primary loop entry.
	LoopPreheader    Point
	LoopPreheaderSet bool
}

Node represents a CFG node with metadata about the program point.

Different node kinds use different fields:

  • NodeAssign: Target holds the assigned variable's SymbolID
  • NodeCall: Callee holds the function name for global/external calls
  • NodeBranch: CondVar and CondCheck describe the condition for narrowing
  • NodeJoin: LoopVars, LoopLocals, LoopPreheader describe loop structure

The Point field is the node's index in the CFG's Nodes slice.

type NodeKind

type NodeKind uint8

NodeKind identifies the type of CFG node.

const (
	NodeEntry      NodeKind = iota // Function entry point, always first node
	NodeExit                       // Function exit point, always last node
	NodeAssign                     // Assignment statement (local x = expr)
	NodeCall                       // Function call expression
	NodeBranch                     // Conditional branch (if, while, for condition)
	NodeJoin                       // Join point where multiple paths merge
	NodeReturn                     // Return statement
	NodeScopeEnter                 // Lexical scope entry (function, block, loop body)
	NodeScopeExit                  // Lexical scope exit (end of block)
	NodeTypeDef                    // Type definition (type annotation)
)

Node kind constants identify the type of each CFG node.

type ParamProvider

type ParamProvider interface {
	// ParamNames returns the function parameter names in order.
	// Returns nil for block graphs or functions with no parameters.
	ParamNames() []string

	// ParamSymbols returns the function parameter symbol IDs in order.
	// Returns nil for block graphs or functions with no parameters.
	ParamSymbols() []SymbolID

	// ParamDeclPoints returns the CFG points where parameters are declared.
	// Returns nil for block graphs or functions with no parameters.
	ParamDeclPoints() []Point
}

ParamProvider provides function parameter information.

This interface enables the type checker to access function parameter metadata for signature checking and type initialization.

type PhiNode

type PhiNode struct {
	Point    Point        // CFG point where the phi is located
	Target   Version      // The new version created by this phi
	Operands []PhiOperand // Incoming versions from each predecessor
}

PhiNode represents a phi function at a control flow join point.

Phi nodes are the mechanism for merging variable versions after control flow divergence. When an if/else or loop creates multiple definitions of a variable, a phi node at the join point unifies them into a single version.

Example:

if cond then
    x = "hello"  -- x@1
else
    x = "world"  -- x@2
end
-- phi node: x@3 = phi(x@1 from then-branch, x@2 from else-branch)
print(x)  -- uses x@3

For type checking, the phi node's type is typically the union of all operand types.

type PhiOperand

type PhiOperand struct {
	From    Point   // Predecessor point where this version comes from
	Version Version // The SSA version from that predecessor
}

PhiOperand represents one incoming value for a phi node.

At a join point where multiple control flow paths merge, a phi node needs to know which version of a variable comes from each predecessor. Each PhiOperand pairs a predecessor point with the version visible at that point.

type Point

type Point uint32

Point represents a location in a control flow graph. Each Point is an index into the CFG's node array, local to that CFG.

type SSAVersioned

type SSAVersioned interface {
	// VisibleVersion returns the SSA version of a symbol visible at a point.
	// Returns a zero Version if the symbol is not defined on all paths to this point.
	VisibleVersion(p Point, sym SymbolID) Version

	// AllVisibleVersions returns all symbol versions visible at a point.
	// The returned map should not be modified by callers.
	AllVisibleVersions(p Point) map[SymbolID]Version

	// PhiNodes returns all phi nodes in the graph.
	// Used to determine type at join points by unioning operand types.
	PhiNodes() []PhiNode
}

SSAVersioned provides SSA versioning data for a CFG.

This interface abstracts over SSA construction, allowing the type checker to query which version of a variable is visible at each program point without knowing the construction details.

The versioning information enables flow-sensitive type checking: different assignments can have different types, and the checker uses the version to look up the correct type.

type SymbolID

type SymbolID uint64

SymbolID uniquely identifies a variable declaration across the program.

SymbolID provides declaration-level identity, distinguishing between variables that happen to have the same name but are different bindings:

local x = 1        -- SymbolID 100
if cond then
    local x = 2    -- SymbolID 101 (different binding)
    print(x)       -- refers to SymbolID 101
end
print(x)           -- refers to SymbolID 100

The combination of SymbolID and Version forms a complete SSA identity, where SymbolID identifies which variable and Version identifies which definition of that variable.

SymbolID 0 is reserved for unresolved or unknown references.

func NextSymbolID

func NextSymbolID() SymbolID

NextSymbolID generates a unique symbol ID. Thread-safe via atomic operations.

func ReserveSymbolIDs

func ReserveSymbolIDs(n int) SymbolID

ReserveSymbolIDs reserves a contiguous block of symbol IDs and returns the first ID in the block. Returns 0 when n <= 0.

type SymbolKind

type SymbolKind int

SymbolKind classifies how a symbol was declared.

The declaration kind affects type checking behavior:

  • Param: Initialized by caller, type from function signature
  • Local: Initialized at declaration point
  • Global: Initialized from global environment
  • Upvalue: Captured from enclosing scope, may have multiple definitions
const (
	// SymbolUnknown indicates the symbol kind is not known.
	SymbolUnknown SymbolKind = iota
	// SymbolParam indicates a function parameter.
	SymbolParam
	// SymbolLocal indicates a local variable.
	SymbolLocal
	// SymbolGlobal indicates a global variable.
	SymbolGlobal
	// SymbolUpvalue indicates an upvalue (captured variable).
	SymbolUpvalue
)

type SymbolScopeProvider

type SymbolScopeProvider interface {
	// SymbolAt returns the SymbolID for a variable name at a specific CFG point.
	// Returns (0, false) if the name is not visible at that point.
	SymbolAt(p Point, name string) (SymbolID, bool)

	// AllSymbolsAt returns all visible symbols at a CFG point.
	// Returns a map of variable names to their SymbolIDs.
	AllSymbolsAt(p Point) map[string]SymbolID

	// DeclarationPoint returns the CFG point where a symbol was declared.
	// Returns (0, false) if the symbol is unknown.
	DeclarationPoint(sym SymbolID) (Point, bool)

	// NameOf returns the variable name for a symbol (for display purposes).
	// Returns empty string if the symbol is unknown.
	NameOf(sym SymbolID) string

	// SymbolKind returns the kind of a symbol (Param, Local, or Global).
	// Returns (SymbolUnknown, false) if the symbol is not known.
	SymbolKind(sym SymbolID) (SymbolKind, bool)
}

SymbolScopeProvider provides symbol visibility information at CFG points.

This interface enables the type checker to resolve variable names to their SymbolIDs based on lexical scoping rules at each program point.

type Version

type Version struct {
	Root   string   // Variable name (for display and lookup)
	Symbol SymbolID // Declaration identity (distinguishes same-named variables in different scopes)
	ID     int      // Version number within the function (0 = undefined/uninitialized)
}

Version represents a stable SSA version of a variable.

In SSA (Static Single Assignment) form, each assignment creates a new version of a variable. The Version type captures both the variable identity and which assignment created this particular value:

local x = 1     -- x@1
x = x + 1       -- x@2 (new version)
if cond then
    x = 10      -- x@3 in then-branch
else
    x = 20      -- x@4 in else-branch
end
-- x@5 = phi(x@3, x@4) at join point

Version components:

  • Root: The variable name for display purposes
  • Symbol: The SymbolID of the declaration (distinguishes shadowed names)
  • ID: The version number (0 = undefined, 1+ = specific assignment)

func (Version) IsZero

func (v Version) IsZero() bool

IsZero returns true if this is an undefined/uninitialized version. Zero versions indicate the variable has not been assigned on this path.

func (Version) Key

func (v Version) Key() string

Key returns a unique string key for this version, suitable for use as a map key. Format: "name#symbol@version" or "name@version" if symbol is zero. Note: This is an SSA-level key, not a PathKey. For PathKey-based lookups, use pathkey.Resolver.KeyAtVersion().

func (Version) String

func (v Version) String() string

String returns a human-readable representation of the version.

type VersionedGraph

type VersionedGraph interface {
	Graph
	SSAVersioned
	SymbolScopeProvider
	ParamProvider
}

VersionedGraph combines a CFG graph with SSA versioning and symbol scope info.

This is the full interface required by the type checker for flow-sensitive analysis. It combines:

  • Graph: control flow structure for traversal
  • SSAVersioned: version information for flow-sensitive types
  • SymbolScopeProvider: name resolution at each point
  • ParamProvider: function signature information

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL