lexer

package

v0.1.1 Latest Latest Go to latest Published: Dec 19, 2025 License: Apache-2.0 Imports: 1 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/vinayprograms/build

Links

Open Source Insights

Documentation ¶

Overview ¶

Package lexer implements lexical analysis for Buildfiles.

The lexer tokenizes Buildfile source code into a stream of tokens, handling:

Indentation tracking with consistent space/tab enforcement
Interpolation boundary detection ({var} syntax)
Multiple lexing modes (line start, normal, value, interpolation)
Escape sequences ({{ and }})

Lexer Modes ¶

The lexer operates in different modes depending on context:

ModeLineStart: At beginning of line, consuming indentation
ModeNormal: Normal token recognition (targets, identifiers, paths)
ModeValue: After = or :, consuming value content as strings
ModeInterp: Inside {} interpolation, lexing identifier and modifier

Mode transitions are managed with a stack to handle nested contexts, particularly for interpolations within values.

Token Types ¶

Token types are categorized as:

Special: EOF, NEWLINE, INDENT, COMMENT, ERROR
Dot Keywords: .shell, .parallel, .default, .include, etc.
Keywords: lazy, if, elif, else, end, ifdef, ifndef, block
Operators: =, :, ==, !=, (, ), ,
Identifiers: IDENTIFIER, AT_IDENTIFIER, PATH, STRING
Interpolation: INTERP_START, INTERP_END, INTERP_MOD, escapes
Functions: shell, glob, basename, dirname, replace

Indentation Tracking ¶

The IndentTracker enforces consistent indentation across a Buildfile:

First indented line establishes the indent unit (e.g., 4 spaces or 1 tab)
Subsequent indents must be exact multiples of this unit
Mixed tabs and spaces within a single indent string is an error

Interpolation Boundary Detection ¶

A { character is recognized as interpolation start if and only if:

Preceded by a boundary character (whitespace, :, =, /, quotes, parens, etc.)
Followed by a valid identifier start character (letter or underscore)

This distinguishes interpolations from shell variable syntax ($var) and literal braces in commands.

Package lexer implements the lexical analysis phase for Buildfile parsing.

Index ¶

func IsInterpBoundary(prev byte, atSOL bool) bool
func IsValidIdentifierChar(c byte) bool
func IsValidIdentifierStart(c byte) bool
func ScanEscapedCloseBrace(input string, pos int) (bool, int)
type IndentChar
- func (c IndentChar) String() string
type IndentError
- func (e *IndentError) Error() string
type IndentTracker
- func NewIndentTracker() *IndentTracker
- func (t *IndentTracker) Char() IndentChar
- func (t *IndentTracker) Process(indent string) (int, error)
- func (t *IndentTracker) Reset()
- func (t *IndentTracker) Width() int
type InterpResult
- func ScanInterpolation(input string, pos int, prev byte, atSOL bool) (InterpResult, int)
type InterpResultKind
- func (k InterpResultKind) String() string
type Lexer
- func New(file, input string) *Lexer
- func (l *Lexer) IndentTracker() *IndentTracker
- func (l *Lexer) NextToken() Token
- func (l *Lexer) PeekIsVariableLine() bool
- func (l *Lexer) PeekNextIsBlock() bool
- func (l *Lexer) PeekNextIsDotKeyword() bool
- func (l *Lexer) SetCommandMode()
- func (l *Lexer) SetNormalMode()
- func (l *Lexer) SetValueMode()
type LexerMode
type SourceLocation
- func (l SourceLocation) String() string
type Token
- func (t Token) String() string
type TokenType
- func LookupDotKeyword(s string) (TokenType, bool)
- func LookupKeyword(ident string) TokenType
- func (t TokenType) IsDotKeyword() bool
- func (t TokenType) IsFunction() bool
- func (t TokenType) IsKeyword() bool
- func (t TokenType) String() string

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func IsInterpBoundary ¶

func IsInterpBoundary(prev byte, atSOL bool) bool

IsInterpBoundary returns true if prev is a valid boundary character for interpolation recognition.

Per DESIGN.md Section 2.3.2 (extended for practical use): `{` is recognized as INTERP_START if preceded by:

whitespace (space or tab)
start-of-line (atSOL=true)
`:` or `=` (operators)
`/` (path separator, for patterns like {dir}/{file})
`"` or `'` (quotes, for strings like "{var}")
`(`, `)`, `,` (function call syntax)
`>`, `<` (shell redirections)

func IsValidIdentifierChar ¶

func IsValidIdentifierChar(c byte) bool

IsValidIdentifierChar returns true if c can be part of an identifier. Per spec: letter, digit, underscore, or dot (for target.dir, target.file).

func IsValidIdentifierStart ¶

func IsValidIdentifierStart(c byte) bool

IsValidIdentifierStart returns true if c can start an identifier. Per spec: letter or underscore.

func ScanEscapedCloseBrace ¶

func ScanEscapedCloseBrace(input string, pos int) (bool, int)

ScanEscapedCloseBrace checks if there's an escaped close brace }} at pos. Returns (true, pos+2) if escape found, (false, pos) otherwise.

Types ¶

type IndentChar ¶

type IndentChar int

IndentChar represents the type of character used for indentation.

const (
	// IndentUnknown indicates indentation character not yet determined.
	IndentUnknown IndentChar = iota
	// IndentSpace indicates spaces are used for indentation.
	IndentSpace
	// IndentTab indicates tabs are used for indentation.
	IndentTab
)

func (IndentChar) String ¶

func (c IndentChar) String() string

String returns the string representation of the indent character type.

type IndentError ¶

type IndentError struct {
	Message string
	Line    int
	Column  int
}

IndentError represents an indentation-related error.

func (*IndentError) Error ¶

func (e *IndentError) Error() string

Error implements the error interface.

type IndentTracker ¶

type IndentTracker struct {
	// contains filtered or unexported fields
}

IndentTracker tracks indentation state across lines in a Buildfile.

Per DESIGN.md Section 2.3.3:

First indented line establishes the indent unit (e.g., 4 spaces)
Subsequent indents must be multiples of this unit
Mixed tabs/spaces after first line is an error
Level 0 = column 0, Level 1 = one unit, Level 2 = two units

func NewIndentTracker ¶

func NewIndentTracker() *IndentTracker

NewIndentTracker creates a new IndentTracker with no established indent unit.

func (*IndentTracker) Char ¶

func (t *IndentTracker) Char() IndentChar

Char returns the established indentation character type.

func (*IndentTracker) Process ¶

func (t *IndentTracker) Process(indent string) (int, error)

Process analyzes an indentation string and returns the logical indentation level. The input should be the leading whitespace of a line (spaces and/or tabs only).

Returns:

level: The logical indentation level (0, 1, 2, ...)
error: Non-nil if the indentation is invalid (mixed chars, inconsistent width)

Behavior:

Empty string always returns level 0
First non-empty indent establishes the character type and unit width
Subsequent indents must use the same character and be exact multiples of the unit

func (*IndentTracker) Reset ¶

func (t *IndentTracker) Reset()

Reset clears the tracker state, allowing a new indent unit to be established.

func (*IndentTracker) Width ¶

func (t *IndentTracker) Width() int

Width returns the width of one indentation unit.

type InterpResult ¶

type InterpResult struct {
	Kind  InterpResultKind
	Name  string // The identifier name (for InterpValid)
	Raw   bool   // Whether :raw modifier was present
	Error string // Error message (for InterpError)
}

InterpResult holds the result of scanning an interpolation.

func ScanInterpolation ¶

func ScanInterpolation(input string, pos int, prev byte, atSOL bool) (InterpResult, int)

ScanInterpolation attempts to scan an interpolation starting at pos. The character at input[pos] must be '{'.

Returns:

result: The scan result (valid interpolation, escape, not-interp, or error)
end: The position after the scanned content

For InterpNotInterp, end equals pos (nothing consumed). For InterpEscapedOpen, end is pos+2 (consumed "{{"). For InterpValid/InterpError, end is position after the closing '}'.

type InterpResultKind ¶

type InterpResultKind int

InterpResultKind indicates the result of scanning for an interpolation.

const (
	// InterpValid indicates a valid interpolation was found.
	InterpValid InterpResultKind = iota
	// InterpEscapedOpen indicates an escaped open brace {{ was found.
	InterpEscapedOpen
	// InterpNotInterp indicates the { is not an interpolation start.
	InterpNotInterp
	// InterpError indicates a malformed interpolation.
	InterpError
)

func (InterpResultKind) String ¶

func (k InterpResultKind) String() string

String returns the string representation of the result kind.

type Lexer ¶

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer performs lexical analysis on Buildfile source code.

func New ¶

func New(file, input string) *Lexer

New creates a new Lexer for the given source.

func (*Lexer) IndentTracker ¶

func (l *Lexer) IndentTracker() *IndentTracker

IndentTracker returns the lexer's indentation tracker.

func (*Lexer) NextToken ¶

func (l *Lexer) NextToken() Token

NextToken returns the next token from the input.

func (*Lexer) PeekIsVariableLine ¶

func (l *Lexer) PeekIsVariableLine() bool

PeekIsVariableLine checks if the current line is a variable definition. Returns true if '=' appears before ':' (or there is no ':' at all). This peeks ahead without consuming any tokens.

func (*Lexer) PeekNextIsBlock ¶

func (l *Lexer) PeekNextIsBlock() bool

PeekNextIsBlock returns true if the next non-whitespace content is the block keyword.

func (*Lexer) PeekNextIsDotKeyword ¶

func (l *Lexer) PeekNextIsDotKeyword() bool

PeekNextIsDotKeyword returns true if the next non-whitespace character starts a dot keyword (like .shell, .after). This is used by the parser to decide whether to use command mode or normal mode for recipe lines.

func (*Lexer) SetCommandMode ¶

func (l *Lexer) SetCommandMode()

SetCommandMode switches the lexer to command mode where ALL spaces are preserved. This is used by the parser when lexing command lines in recipes.

func (*Lexer) SetNormalMode ¶

func (l *Lexer) SetNormalMode()

SetNormalMode switches the lexer back to normal mode.

func (*Lexer) SetValueMode ¶

func (l *Lexer) SetValueMode()

SetValueMode switches the lexer to value mode where leading spaces are skipped but internal spaces are preserved.

type LexerMode ¶

type LexerMode int

LexerMode represents the current lexing mode.

const (
	// ModeLineStart is at the beginning of a line, consuming indentation.
	ModeLineStart LexerMode = iota
	// ModeNormal is normal token recognition.
	ModeNormal
	// ModeValue is consuming the rest of line as a value (after = or :).
	ModeValue
	// ModeInterp is inside an interpolation {}.
	ModeInterp
	// ModeCommand is for recipe command lines where all spaces are significant.
	ModeCommand
)

type SourceLocation ¶

type SourceLocation struct {
	File   string // Source file path
	Line   int    // 1-based line number
	Column int    // 1-based column number
}

SourceLocation represents a position in the source file.

func (SourceLocation) String ¶

func (l SourceLocation) String() string

String returns a human-readable representation of the source location.

type Token ¶

type Token struct {
	Type     TokenType
	Literal  string
	Location SourceLocation
}

Token represents a lexical token with its type, literal value, and location.

func (Token) String ¶

func (t Token) String() string

String returns a human-readable representation of the token.

type TokenType ¶

type TokenType int

TokenType represents the type of a lexical token.

const (
	// Special tokens
	EOF TokenType = iota
	NEWLINE
	INDENT
	COMMENT
	ERROR

	// Dot keywords (directives)
	DOT_SHELL
	DOT_PARALLEL
	DOT_DEFAULT
	DOT_INCLUDE
	DOT_ENVIRONMENT
	DOT_USING
	DOT_SOURCE
	DOT_ARGS
	DOT_REQUIRES
	DOT_AFTER
	DOT_AUTODEPS

	// Keywords
	LAZY
	IF
	ELIF
	ELSE
	END
	IFDEF
	IFNDEF
	BLOCK

	// Operators
	EQUALS        // =
	COLON         // :
	DOUBLE_EQUALS // ==
	NOT_EQUALS    // !=
	LPAREN        // (
	RPAREN        // )
	COMMA         // ,

	// Identifiers and literals
	IDENTIFIER    // alphanumeric + underscore
	AT_IDENTIFIER // @name (phony target)
	PATH          // file path
	STRING        // general string content

	// Interpolation
	INTERP_START  // {
	INTERP_END    // }
	INTERP_MOD    // :raw
	ESCAPE_LBRACE // {{
	ESCAPE_RBRACE // }}

	// Built-in functions
	FUNC_SHELL
	FUNC_GLOB
	FUNC_BASENAME
	FUNC_DIRNAME
	FUNC_REPLACE
)

func LookupDotKeyword ¶

func LookupDotKeyword(s string) (TokenType, bool)

LookupDotKeyword checks if a string is a dot keyword. Returns the token type and true if found, EOF and false otherwise.

func LookupKeyword ¶

func LookupKeyword(ident string) TokenType

LookupKeyword checks if an identifier is a keyword or function name. Returns IDENTIFIER if not found.

func (TokenType) IsDotKeyword ¶

func (t TokenType) IsDotKeyword() bool

IsDotKeyword returns true if the token type is a dot keyword (directive).

func (TokenType) IsFunction ¶

func (t TokenType) IsFunction() bool

IsFunction returns true if the token type is a built-in function.

func (TokenType) IsKeyword ¶

func (t TokenType) IsKeyword() bool

IsKeyword returns true if the token type is a control keyword.

func (TokenType) String ¶

func (t TokenType) String() string

String returns the string representation of the token type.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL