lexer

package
v0.1.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 19, 2025 License: Apache-2.0 Imports: 1 Imported by: 0

Documentation

Overview

Package lexer implements lexical analysis for Buildfiles.

The lexer tokenizes Buildfile source code into a stream of tokens, handling:

  • Indentation tracking with consistent space/tab enforcement
  • Interpolation boundary detection ({var} syntax)
  • Multiple lexing modes (line start, normal, value, interpolation)
  • Escape sequences ({{ and }})

Lexer Modes

The lexer operates in different modes depending on context:

  • ModeLineStart: At beginning of line, consuming indentation
  • ModeNormal: Normal token recognition (targets, identifiers, paths)
  • ModeValue: After = or :, consuming value content as strings
  • ModeInterp: Inside {} interpolation, lexing identifier and modifier

Mode transitions are managed with a stack to handle nested contexts, particularly for interpolations within values.

Token Types

Token types are categorized as:

  • Special: EOF, NEWLINE, INDENT, COMMENT, ERROR
  • Dot Keywords: .shell, .parallel, .default, .include, etc.
  • Keywords: lazy, if, elif, else, end, ifdef, ifndef, block
  • Operators: =, :, ==, !=, (, ), ,
  • Identifiers: IDENTIFIER, AT_IDENTIFIER, PATH, STRING
  • Interpolation: INTERP_START, INTERP_END, INTERP_MOD, escapes
  • Functions: shell, glob, basename, dirname, replace

Indentation Tracking

The IndentTracker enforces consistent indentation across a Buildfile:

  • First indented line establishes the indent unit (e.g., 4 spaces or 1 tab)
  • Subsequent indents must be exact multiples of this unit
  • Mixed tabs and spaces within a single indent string is an error

Interpolation Boundary Detection

A { character is recognized as interpolation start if and only if:

  • Preceded by a boundary character (whitespace, :, =, /, quotes, parens, etc.)
  • Followed by a valid identifier start character (letter or underscore)

This distinguishes interpolations from shell variable syntax ($var) and literal braces in commands.

Package lexer implements the lexical analysis phase for Buildfile parsing.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func IsInterpBoundary

func IsInterpBoundary(prev byte, atSOL bool) bool

IsInterpBoundary returns true if prev is a valid boundary character for interpolation recognition.

Per DESIGN.md Section 2.3.2 (extended for practical use): `{` is recognized as INTERP_START if preceded by:

  • whitespace (space or tab)
  • start-of-line (atSOL=true)
  • `:` or `=` (operators)
  • `/` (path separator, for patterns like {dir}/{file})
  • `"` or `'` (quotes, for strings like "{var}")
  • `(`, `)`, `,` (function call syntax)
  • `>`, `<` (shell redirections)

func IsValidIdentifierChar

func IsValidIdentifierChar(c byte) bool

IsValidIdentifierChar returns true if c can be part of an identifier. Per spec: letter, digit, underscore, or dot (for target.dir, target.file).

func IsValidIdentifierStart

func IsValidIdentifierStart(c byte) bool

IsValidIdentifierStart returns true if c can start an identifier. Per spec: letter or underscore.

func ScanEscapedCloseBrace

func ScanEscapedCloseBrace(input string, pos int) (bool, int)

ScanEscapedCloseBrace checks if there's an escaped close brace }} at pos. Returns (true, pos+2) if escape found, (false, pos) otherwise.

Types

type IndentChar

type IndentChar int

IndentChar represents the type of character used for indentation.

const (
	// IndentUnknown indicates indentation character not yet determined.
	IndentUnknown IndentChar = iota
	// IndentSpace indicates spaces are used for indentation.
	IndentSpace
	// IndentTab indicates tabs are used for indentation.
	IndentTab
)

func (IndentChar) String

func (c IndentChar) String() string

String returns the string representation of the indent character type.

type IndentError

type IndentError struct {
	Message string
	Line    int
	Column  int
}

IndentError represents an indentation-related error.

func (*IndentError) Error

func (e *IndentError) Error() string

Error implements the error interface.

type IndentTracker

type IndentTracker struct {
	// contains filtered or unexported fields
}

IndentTracker tracks indentation state across lines in a Buildfile.

Per DESIGN.md Section 2.3.3:

  • First indented line establishes the indent unit (e.g., 4 spaces)
  • Subsequent indents must be multiples of this unit
  • Mixed tabs/spaces after first line is an error
  • Level 0 = column 0, Level 1 = one unit, Level 2 = two units

func NewIndentTracker

func NewIndentTracker() *IndentTracker

NewIndentTracker creates a new IndentTracker with no established indent unit.

func (*IndentTracker) Char

func (t *IndentTracker) Char() IndentChar

Char returns the established indentation character type.

func (*IndentTracker) Process

func (t *IndentTracker) Process(indent string) (int, error)

Process analyzes an indentation string and returns the logical indentation level. The input should be the leading whitespace of a line (spaces and/or tabs only).

Returns:

  • level: The logical indentation level (0, 1, 2, ...)
  • error: Non-nil if the indentation is invalid (mixed chars, inconsistent width)

Behavior:

  • Empty string always returns level 0
  • First non-empty indent establishes the character type and unit width
  • Subsequent indents must use the same character and be exact multiples of the unit

func (*IndentTracker) Reset

func (t *IndentTracker) Reset()

Reset clears the tracker state, allowing a new indent unit to be established.

func (*IndentTracker) Width

func (t *IndentTracker) Width() int

Width returns the width of one indentation unit.

type InterpResult

type InterpResult struct {
	Kind  InterpResultKind
	Name  string // The identifier name (for InterpValid)
	Raw   bool   // Whether :raw modifier was present
	Error string // Error message (for InterpError)
}

InterpResult holds the result of scanning an interpolation.

func ScanInterpolation

func ScanInterpolation(input string, pos int, prev byte, atSOL bool) (InterpResult, int)

ScanInterpolation attempts to scan an interpolation starting at pos. The character at input[pos] must be '{'.

Returns:

  • result: The scan result (valid interpolation, escape, not-interp, or error)
  • end: The position after the scanned content

For InterpNotInterp, end equals pos (nothing consumed). For InterpEscapedOpen, end is pos+2 (consumed "{{"). For InterpValid/InterpError, end is position after the closing '}'.

type InterpResultKind

type InterpResultKind int

InterpResultKind indicates the result of scanning for an interpolation.

const (
	// InterpValid indicates a valid interpolation was found.
	InterpValid InterpResultKind = iota
	// InterpEscapedOpen indicates an escaped open brace {{ was found.
	InterpEscapedOpen
	// InterpNotInterp indicates the { is not an interpolation start.
	InterpNotInterp
	// InterpError indicates a malformed interpolation.
	InterpError
)

func (InterpResultKind) String

func (k InterpResultKind) String() string

String returns the string representation of the result kind.

type Lexer

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer performs lexical analysis on Buildfile source code.

func New

func New(file, input string) *Lexer

New creates a new Lexer for the given source.

func (*Lexer) IndentTracker

func (l *Lexer) IndentTracker() *IndentTracker

IndentTracker returns the lexer's indentation tracker.

func (*Lexer) NextToken

func (l *Lexer) NextToken() Token

NextToken returns the next token from the input.

func (*Lexer) PeekIsVariableLine

func (l *Lexer) PeekIsVariableLine() bool

PeekIsVariableLine checks if the current line is a variable definition. Returns true if '=' appears before ':' (or there is no ':' at all). This peeks ahead without consuming any tokens.

func (*Lexer) PeekNextIsBlock

func (l *Lexer) PeekNextIsBlock() bool

PeekNextIsBlock returns true if the next non-whitespace content is the block keyword.

func (*Lexer) PeekNextIsDotKeyword

func (l *Lexer) PeekNextIsDotKeyword() bool

PeekNextIsDotKeyword returns true if the next non-whitespace character starts a dot keyword (like .shell, .after). This is used by the parser to decide whether to use command mode or normal mode for recipe lines.

func (*Lexer) SetCommandMode

func (l *Lexer) SetCommandMode()

SetCommandMode switches the lexer to command mode where ALL spaces are preserved. This is used by the parser when lexing command lines in recipes.

func (*Lexer) SetNormalMode

func (l *Lexer) SetNormalMode()

SetNormalMode switches the lexer back to normal mode.

func (*Lexer) SetValueMode

func (l *Lexer) SetValueMode()

SetValueMode switches the lexer to value mode where leading spaces are skipped but internal spaces are preserved.

type LexerMode

type LexerMode int

LexerMode represents the current lexing mode.

const (
	// ModeLineStart is at the beginning of a line, consuming indentation.
	ModeLineStart LexerMode = iota
	// ModeNormal is normal token recognition.
	ModeNormal
	// ModeValue is consuming the rest of line as a value (after = or :).
	ModeValue
	// ModeInterp is inside an interpolation {}.
	ModeInterp
	// ModeCommand is for recipe command lines where all spaces are significant.
	ModeCommand
)

type SourceLocation

type SourceLocation struct {
	File   string // Source file path
	Line   int    // 1-based line number
	Column int    // 1-based column number
}

SourceLocation represents a position in the source file.

func (SourceLocation) String

func (l SourceLocation) String() string

String returns a human-readable representation of the source location.

type Token

type Token struct {
	Type     TokenType
	Literal  string
	Location SourceLocation
}

Token represents a lexical token with its type, literal value, and location.

func (Token) String

func (t Token) String() string

String returns a human-readable representation of the token.

type TokenType

type TokenType int

TokenType represents the type of a lexical token.

const (
	// Special tokens
	EOF TokenType = iota
	NEWLINE
	INDENT
	COMMENT
	ERROR

	// Dot keywords (directives)
	DOT_SHELL
	DOT_PARALLEL
	DOT_DEFAULT
	DOT_INCLUDE
	DOT_ENVIRONMENT
	DOT_USING
	DOT_SOURCE
	DOT_ARGS
	DOT_REQUIRES
	DOT_AFTER
	DOT_AUTODEPS

	// Keywords
	LAZY
	IF
	ELIF
	ELSE
	END
	IFDEF
	IFNDEF
	BLOCK

	// Operators
	EQUALS        // =
	COLON         // :
	DOUBLE_EQUALS // ==
	NOT_EQUALS    // !=
	LPAREN        // (
	RPAREN        // )
	COMMA         // ,

	// Identifiers and literals
	IDENTIFIER    // alphanumeric + underscore
	AT_IDENTIFIER // @name (phony target)
	PATH          // file path
	STRING        // general string content

	// Interpolation
	INTERP_START  // {
	INTERP_END    // }
	INTERP_MOD    // :raw
	ESCAPE_LBRACE // {{
	ESCAPE_RBRACE // }}

	// Built-in functions
	FUNC_SHELL
	FUNC_GLOB
	FUNC_BASENAME
	FUNC_DIRNAME
	FUNC_REPLACE
)

func LookupDotKeyword

func LookupDotKeyword(s string) (TokenType, bool)

LookupDotKeyword checks if a string is a dot keyword. Returns the token type and true if found, EOF and false otherwise.

func LookupKeyword

func LookupKeyword(ident string) TokenType

LookupKeyword checks if an identifier is a keyword or function name. Returns IDENTIFIER if not found.

func (TokenType) IsDotKeyword

func (t TokenType) IsDotKeyword() bool

IsDotKeyword returns true if the token type is a dot keyword (directive).

func (TokenType) IsFunction

func (t TokenType) IsFunction() bool

IsFunction returns true if the token type is a built-in function.

func (TokenType) IsKeyword

func (t TokenType) IsKeyword() bool

IsKeyword returns true if the token type is a control keyword.

func (TokenType) String

func (t TokenType) String() string

String returns the string representation of the token type.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL