Documentation
¶
Overview ¶
Package lexer implements lexical analysis for Buildfiles.
The lexer tokenizes Buildfile source code into a stream of tokens, handling:
- Indentation tracking with consistent space/tab enforcement
- Interpolation boundary detection ({var} syntax)
- Multiple lexing modes (line start, normal, value, interpolation)
- Escape sequences ({{ and }})
Lexer Modes ¶
The lexer operates in different modes depending on context:
- ModeLineStart: At beginning of line, consuming indentation
- ModeNormal: Normal token recognition (targets, identifiers, paths)
- ModeValue: After = or :, consuming value content as strings
- ModeInterp: Inside {} interpolation, lexing identifier and modifier
Mode transitions are managed with a stack to handle nested contexts, particularly for interpolations within values.
Token Types ¶
Token types are categorized as:
- Special: EOF, NEWLINE, INDENT, COMMENT, ERROR
- Dot Keywords: .shell, .parallel, .default, .include, etc.
- Keywords: lazy, if, elif, else, end, ifdef, ifndef, block
- Operators: =, :, ==, !=, (, ), ,
- Identifiers: IDENTIFIER, AT_IDENTIFIER, PATH, STRING
- Interpolation: INTERP_START, INTERP_END, INTERP_MOD, escapes
- Functions: shell, glob, basename, dirname, replace
Indentation Tracking ¶
The IndentTracker enforces consistent indentation across a Buildfile:
- First indented line establishes the indent unit (e.g., 4 spaces or 1 tab)
- Subsequent indents must be exact multiples of this unit
- Mixed tabs and spaces within a single indent string is an error
Interpolation Boundary Detection ¶
A { character is recognized as interpolation start if and only if:
- Preceded by a boundary character (whitespace, :, =, /, quotes, parens, etc.)
- Followed by a valid identifier start character (letter or underscore)
This distinguishes interpolations from shell variable syntax ($var) and literal braces in commands.
Package lexer implements the lexical analysis phase for Buildfile parsing.
Index ¶
- func IsInterpBoundary(prev byte, atSOL bool) bool
- func IsValidIdentifierChar(c byte) bool
- func IsValidIdentifierStart(c byte) bool
- func ScanEscapedCloseBrace(input string, pos int) (bool, int)
- type IndentChar
- type IndentError
- type IndentTracker
- type InterpResult
- type InterpResultKind
- type Lexer
- func (l *Lexer) IndentTracker() *IndentTracker
- func (l *Lexer) NextToken() Token
- func (l *Lexer) PeekIsVariableLine() bool
- func (l *Lexer) PeekNextIsBlock() bool
- func (l *Lexer) PeekNextIsDotKeyword() bool
- func (l *Lexer) SetCommandMode()
- func (l *Lexer) SetNormalMode()
- func (l *Lexer) SetValueMode()
- type LexerMode
- type SourceLocation
- type Token
- type TokenType
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func IsInterpBoundary ¶
IsInterpBoundary returns true if prev is a valid boundary character for interpolation recognition.
Per DESIGN.md Section 2.3.2 (extended for practical use): `{` is recognized as INTERP_START if preceded by:
- whitespace (space or tab)
- start-of-line (atSOL=true)
- `:` or `=` (operators)
- `/` (path separator, for patterns like {dir}/{file})
- `"` or `'` (quotes, for strings like "{var}")
- `(`, `)`, `,` (function call syntax)
- `>`, `<` (shell redirections)
func IsValidIdentifierChar ¶
IsValidIdentifierChar returns true if c can be part of an identifier. Per spec: letter, digit, underscore, or dot (for target.dir, target.file).
func IsValidIdentifierStart ¶
IsValidIdentifierStart returns true if c can start an identifier. Per spec: letter or underscore.
Types ¶
type IndentChar ¶
type IndentChar int
IndentChar represents the type of character used for indentation.
const ( // IndentUnknown indicates indentation character not yet determined. IndentUnknown IndentChar = iota // IndentSpace indicates spaces are used for indentation. IndentSpace // IndentTab indicates tabs are used for indentation. IndentTab )
func (IndentChar) String ¶
func (c IndentChar) String() string
String returns the string representation of the indent character type.
type IndentError ¶
IndentError represents an indentation-related error.
func (*IndentError) Error ¶
func (e *IndentError) Error() string
Error implements the error interface.
type IndentTracker ¶
type IndentTracker struct {
// contains filtered or unexported fields
}
IndentTracker tracks indentation state across lines in a Buildfile.
Per DESIGN.md Section 2.3.3:
- First indented line establishes the indent unit (e.g., 4 spaces)
- Subsequent indents must be multiples of this unit
- Mixed tabs/spaces after first line is an error
- Level 0 = column 0, Level 1 = one unit, Level 2 = two units
func NewIndentTracker ¶
func NewIndentTracker() *IndentTracker
NewIndentTracker creates a new IndentTracker with no established indent unit.
func (*IndentTracker) Char ¶
func (t *IndentTracker) Char() IndentChar
Char returns the established indentation character type.
func (*IndentTracker) Process ¶
func (t *IndentTracker) Process(indent string) (int, error)
Process analyzes an indentation string and returns the logical indentation level. The input should be the leading whitespace of a line (spaces and/or tabs only).
Returns:
- level: The logical indentation level (0, 1, 2, ...)
- error: Non-nil if the indentation is invalid (mixed chars, inconsistent width)
Behavior:
- Empty string always returns level 0
- First non-empty indent establishes the character type and unit width
- Subsequent indents must use the same character and be exact multiples of the unit
func (*IndentTracker) Reset ¶
func (t *IndentTracker) Reset()
Reset clears the tracker state, allowing a new indent unit to be established.
func (*IndentTracker) Width ¶
func (t *IndentTracker) Width() int
Width returns the width of one indentation unit.
type InterpResult ¶
type InterpResult struct {
Kind InterpResultKind
Name string // The identifier name (for InterpValid)
Raw bool // Whether :raw modifier was present
Error string // Error message (for InterpError)
}
InterpResult holds the result of scanning an interpolation.
func ScanInterpolation ¶
ScanInterpolation attempts to scan an interpolation starting at pos. The character at input[pos] must be '{'.
Returns:
- result: The scan result (valid interpolation, escape, not-interp, or error)
- end: The position after the scanned content
For InterpNotInterp, end equals pos (nothing consumed). For InterpEscapedOpen, end is pos+2 (consumed "{{"). For InterpValid/InterpError, end is position after the closing '}'.
type InterpResultKind ¶
type InterpResultKind int
InterpResultKind indicates the result of scanning for an interpolation.
const ( // InterpValid indicates a valid interpolation was found. InterpValid InterpResultKind = iota // InterpEscapedOpen indicates an escaped open brace {{ was found. InterpEscapedOpen // InterpNotInterp indicates the { is not an interpolation start. InterpNotInterp // InterpError indicates a malformed interpolation. InterpError )
func (InterpResultKind) String ¶
func (k InterpResultKind) String() string
String returns the string representation of the result kind.
type Lexer ¶
type Lexer struct {
// contains filtered or unexported fields
}
Lexer performs lexical analysis on Buildfile source code.
func (*Lexer) IndentTracker ¶
func (l *Lexer) IndentTracker() *IndentTracker
IndentTracker returns the lexer's indentation tracker.
func (*Lexer) PeekIsVariableLine ¶
PeekIsVariableLine checks if the current line is a variable definition. Returns true if '=' appears before ':' (or there is no ':' at all). This peeks ahead without consuming any tokens.
func (*Lexer) PeekNextIsBlock ¶
PeekNextIsBlock returns true if the next non-whitespace content is the block keyword.
func (*Lexer) PeekNextIsDotKeyword ¶
PeekNextIsDotKeyword returns true if the next non-whitespace character starts a dot keyword (like .shell, .after). This is used by the parser to decide whether to use command mode or normal mode for recipe lines.
func (*Lexer) SetCommandMode ¶
func (l *Lexer) SetCommandMode()
SetCommandMode switches the lexer to command mode where ALL spaces are preserved. This is used by the parser when lexing command lines in recipes.
func (*Lexer) SetNormalMode ¶
func (l *Lexer) SetNormalMode()
SetNormalMode switches the lexer back to normal mode.
func (*Lexer) SetValueMode ¶
func (l *Lexer) SetValueMode()
SetValueMode switches the lexer to value mode where leading spaces are skipped but internal spaces are preserved.
type LexerMode ¶
type LexerMode int
LexerMode represents the current lexing mode.
const ( // ModeLineStart is at the beginning of a line, consuming indentation. ModeLineStart LexerMode = iota // ModeNormal is normal token recognition. ModeNormal // ModeValue is consuming the rest of line as a value (after = or :). ModeValue // ModeInterp is inside an interpolation {}. ModeInterp // ModeCommand is for recipe command lines where all spaces are significant. ModeCommand )
type SourceLocation ¶
type SourceLocation struct {
File string // Source file path
Line int // 1-based line number
Column int // 1-based column number
}
SourceLocation represents a position in the source file.
func (SourceLocation) String ¶
func (l SourceLocation) String() string
String returns a human-readable representation of the source location.
type Token ¶
type Token struct {
Type TokenType
Literal string
Location SourceLocation
}
Token represents a lexical token with its type, literal value, and location.
type TokenType ¶
type TokenType int
TokenType represents the type of a lexical token.
const ( // Special tokens EOF TokenType = iota NEWLINE INDENT COMMENT ERROR // Dot keywords (directives) DOT_SHELL DOT_PARALLEL DOT_DEFAULT DOT_INCLUDE DOT_ENVIRONMENT DOT_USING DOT_SOURCE DOT_ARGS DOT_REQUIRES DOT_AFTER DOT_AUTODEPS // Keywords LAZY IF ELIF ELSE END IFDEF IFNDEF BLOCK // Operators EQUALS // = COLON // : DOUBLE_EQUALS // == NOT_EQUALS // != LPAREN // ( RPAREN // ) COMMA // , // Identifiers and literals IDENTIFIER // alphanumeric + underscore AT_IDENTIFIER // @name (phony target) PATH // file path STRING // general string content // Interpolation INTERP_START // { INTERP_END // } INTERP_MOD // :raw ESCAPE_LBRACE // {{ ESCAPE_RBRACE // }} // Built-in functions FUNC_SHELL FUNC_GLOB FUNC_BASENAME FUNC_DIRNAME FUNC_REPLACE )
func LookupDotKeyword ¶
LookupDotKeyword checks if a string is a dot keyword. Returns the token type and true if found, EOF and false otherwise.
func LookupKeyword ¶
LookupKeyword checks if an identifier is a keyword or function name. Returns IDENTIFIER if not found.
func (TokenType) IsDotKeyword ¶
IsDotKeyword returns true if the token type is a dot keyword (directive).
func (TokenType) IsFunction ¶
IsFunction returns true if the token type is a built-in function.