lexer

package

Version: v0.0.0-...-db13919 | Published: Apr 23, 2025 | License: ISC | Imports: 3 | Imported by: 0

Repository

github.com/weirdhostel/algo

Documentation ¶

Overview ¶

Package lexer defines abstractions and data types for constructing lexers.

A lexer, also known as a lexical analyzer or scanner, is responsible for tokenizing input source code. It processes a stream of characters and converts them into a stream of tokens, which represent meaningful units of the language. These tokens are subsequently passed to a parser for syntax analysis and the construction of parse trees.

Lexical analysis (scanning) belongs to a different domain than syntax analysis (parsing). Lexical analysis deals with regular languages and grammars (Type 3 in the Chomsky hierarchy), while syntax analysis deals with context-free languages and grammars (Type 2). A lexical analyzer is, in principle, a deterministic finite automaton (DFA) with additional functionality built on top of it. Lexers can be either written by hand or generated automatically.
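To make the DFA view concrete, here is a minimal hand-written sketch. It is not taken from this package: the state names, the helpers, and the ID and NUM terminals are all assumptions chosen for illustration. The automaton has three states, and the "additional functionality" is the emit step that records a token whenever the automaton leaves an accepting state.

```go
package main

import "fmt"

// state is a DFA state. The states and token names below are illustrative
// assumptions, not part of the lexer package.
type state int

const (
	start state = iota // between tokens
	inID               // inside an identifier: letter (letter | digit)*
	inNUM              // inside a number: digit+
)

func isLetter(c byte) bool { return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c == '_' }
func isDigit(c byte) bool  { return c >= '0' && c <= '9' }

// scan runs the DFA over input and returns "TERMINAL:lexeme" strings.
// Whitespace separates tokens; any other byte is reported as an error.
func scan(input string) ([]string, error) {
	var out []string
	st, begin := start, 0

	// emit closes the token that spans input[begin:end], if any.
	emit := func(end int) {
		switch st {
		case inID:
			out = append(out, "ID:"+input[begin:end])
		case inNUM:
			out = append(out, "NUM:"+input[begin:end])
		}
		st = start
	}

	for i := 0; i < len(input); i++ {
		c := input[i]
		switch {
		case st == inID && (isLetter(c) || isDigit(c)):
			// stay in inID
		case st == inNUM && isDigit(c):
			// stay in inNUM
		case isLetter(c):
			emit(i)
			st, begin = inID, i
		case isDigit(c):
			emit(i)
			st, begin = inNUM, i
		case c == ' ' || c == '\t' || c == '\n':
			emit(i)
		default:
			return out, fmt.Errorf("unexpected character %q at offset %d", c, i)
		}
	}
	emit(len(input))
	return out, nil
}

func main() {
	tokens, err := scan("x1 42 count")
	fmt.Println(tokens, err) // prints: [ID:x1 NUM:42 ID:count] <nil>
}
```

A generated lexer would typically encode the same transitions in a table rather than a switch, but the structure (states, transitions on character classes, an action on accept) stays the same.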

Index ¶

type Lexer
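type Position
- func (p Position) Equal(rhs Position) bool
- func (p Position) IsZero() bool
- func (p Position) String() string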
type Token
- func (t Token) Equal(rhs Token) bool
- func (t Token) String() string

Examples ¶

Lexer

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Lexer ¶

type Lexer interface {
	// NextToken reads characters from the input source and returns the next token.
	// It may also return an error if there is an issue during tokenization.
	NextToken() (Token, error)
}

Lexer defines the interface for a lexical analyzer.

Example ¶

package main

import (
	"fmt"
	"strings"
	"text/scanner"

	"github.com/weirdhostel/algo/grammar"
	"github.com/weirdhostel/algo/lexer"
)

func main() {
	src := strings.NewReader(`Lorem ipsum dolor sit amet, consectetur adipiscing elit,
		sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.`)

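	// text/scanner does the character-level work; the loop below only wraps
	// each scanned token in a lexer.Token.
	var s scanner.Scanner
	s.Init(src)

	// With the scanner's default mode, punctuation such as "," and "." is
	// returned as separate tokens, so those are labeled WORD here as well.
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {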
		token := lexer.Token{
			Terminal: grammar.Terminal("WORD"),
			Lexeme:   s.TokenText(),
			Pos: lexer.Position{
				Filename: "lorem_ipsum",
				Offset:   s.Position.Offset,
				Line:     s.Position.Line,
				Column:   s.Position.Column,
			},
		}

		fmt.Println(token)
	}
}
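The example above builds tokens but does not implement the Lexer interface itself. A minimal sketch of an implementation might look like the following. To stay self-contained it declares local stand-ins for Token and Lexer instead of importing the real packages; wordLexer, newWordLexer, and collect are hypothetical names, and signaling the end of input with io.EOF is an assumption, since the interface documentation does not say how the end of input is reported.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Token and Lexer are local stand-ins that mirror the shapes documented
// for this package, so the sketch compiles without the real imports.
type Token struct {
	Terminal string // stand-in for the embedded grammar.Terminal
	Lexeme   string
}

type Lexer interface {
	NextToken() (Token, error)
}

// wordLexer is a hypothetical implementation that yields one WORD token
// per whitespace-separated field.
type wordLexer struct {
	words []string
	next  int
}

func newWordLexer(src string) *wordLexer {
	return &wordLexer{words: strings.Fields(src)}
}

// NextToken returns the next word. Reporting end of input as io.EOF is an
// assumption of this sketch; the interface itself does not prescribe it.
func (l *wordLexer) NextToken() (Token, error) {
	if l.next >= len(l.words) {
		return Token{}, io.EOF
	}
	tok := Token{Terminal: "WORD", Lexeme: l.words[l.next]}
	l.next++
	return tok, nil
}

// collect drains any Lexer until io.EOF, the way a parser might.
func collect(l Lexer) ([]Token, error) {
	var toks []Token
	for {
		tok, err := l.NextToken()
		if err == io.EOF {
			return toks, nil
		}
		if err != nil {
			return toks, err
		}
		toks = append(toks, tok)
	}
}

func main() {
	toks, err := collect(newWordLexer("lorem ipsum dolor"))
	fmt.Println(len(toks), err) // prints: 3 <nil>
}
```

Because collect depends only on the interface, the same loop would work with any other implementation, whether hand-written or generated.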

type Position ¶

type Position struct {
	Filename string // The name of the input source file (optional).
	Offset   int    // The byte offset from the beginning of the file.
	Line     int    // The line number (1-based).
	Column   int    // The column number on the line (1-based).
}

Position represents a specific location in an input source.

func (Position) Equal ¶

func (p Position) Equal(rhs Position) bool

Equal reports whether two positions are equal.

func (Position) IsZero ¶

func (p Position) IsZero() bool

IsZero reports whether the position is the zero (empty) value.

func (Position) String ¶

func (p Position) String() string

String implements the fmt.Stringer interface.

It returns a formatted string representation of the position.
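The fields of Position are related: Offset counts bytes from the start of the input, while Line and Column are 1-based and reset at each newline. The sketch below shows one way a lexer could keep them in sync while consuming input. It uses a local copy of the struct; advance and positionAt are hypothetical helpers, and counting columns in bytes rather than runes is an assumption.

```go
package main

import "fmt"

// Position is a local copy of the struct documented above.
type Position struct {
	Filename string
	Offset   int
	Line     int
	Column   int
}

// advance returns the position that follows p after consuming byte b.
// Counting columns in bytes (not runes) is an assumption of this sketch.
func advance(p Position, b byte) Position {
	p.Offset++
	if b == '\n' {
		p.Line++
		p.Column = 1
	} else {
		p.Column++
	}
	return p
}

// positionAt walks src and returns the position of the byte at offset.
func positionAt(filename, src string, offset int) Position {
	p := Position{Filename: filename, Offset: 0, Line: 1, Column: 1}
	for i := 0; i < offset && i < len(src); i++ {
		p = advance(p, src[i])
	}
	return p
}

func main() {
	p := positionAt("demo", "ab\ncd", 4)
	fmt.Printf("%s:%d:%d (offset %d)\n", p.Filename, p.Line, p.Column, p.Offset)
	// prints: demo:2:2 (offset 4)
}
```

The file:line:column layout printed here is only a common convention; the format produced by the package's own String method is not shown in this documentation.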

type Token ¶

type Token struct {
	grammar.Terminal
	Lexeme string
	Pos    Position
}

Token represents a unit of the input language.

A token consists of a terminal symbol, along with additional information such as the lexeme (the actual value of the token in the input) and its position in the input stream.

For example, identifiers in a programming language may have different names, but their token type (terminal symbol) is typically "ID". Similarly, the token "NUM" can have various lexeme values, representing different numerical values in the input.
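The distinction between terminal and lexeme can be sketched in a few lines. The types below are local stand-ins: Terminal is assumed to be a string-based type (suggested by the conversion grammar.Terminal("WORD") in the Lexer example), the position field is left out for brevity, and sameKind is a hypothetical helper that is not part of the package.

```go
package main

import "fmt"

// Terminal stands in for grammar.Terminal; treating it as a string type is
// an assumption of this sketch.
type Terminal string

// Token mirrors the documented struct, minus the position.
type Token struct {
	Terminal
	Lexeme string
}

// sameKind reports whether two tokens carry the same terminal symbol,
// regardless of their lexemes. It is a hypothetical helper.
func sameKind(a, b Token) bool {
	return a.Terminal == b.Terminal
}

func main() {
	x := Token{Terminal: "ID", Lexeme: "x"}
	count := Token{Terminal: "ID", Lexeme: "count"}
	n := Token{Terminal: "NUM", Lexeme: "42"}

	fmt.Println(sameKind(x, count)) // true: both are identifiers
	fmt.Println(sameKind(x, n))     // false: identifier vs. number
}
```

A parser usually cares only about the terminal when matching grammar rules, while later stages such as symbol tables or constant evaluation read the lexeme.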

func (Token) Equal ¶

func (t Token) Equal(rhs Token) bool

Equal reports whether two tokens are equal.

func (Token) String ¶

func (t Token) String() string

String implements the fmt.Stringer interface.

It returns a formatted string representation of the token.

Source Files ¶

lexer.go

Directories ¶

Path	Synopsis
input	Package input implements a two-buffer input reader.