lexer

package v0.0.0-...-db13919
Published: Apr 23, 2025 License: ISC Imports: 3 Imported by: 0

Documentation

Overview

Package lexer defines abstractions and data types for constructing lexers.

A lexer, also known as a lexical analyzer or scanner, is responsible for tokenizing input source code. It processes a stream of characters and converts them into a stream of tokens, which represent meaningful units of the language. These tokens are subsequently passed to a parser for syntax analysis and the construction of parse trees.

Lexical analysis (scanning) belongs to a different domain than syntax analysis (parsing). Lexical analysis deals with regular languages and grammars (Type 3), while syntax analysis deals with context-free languages and grammars (Type 2). A lexical analyzer is, in principle, a deterministic finite automaton (DFA) with additional functionality built on top of it. Lexers can be written by hand or generated automatically.
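
To make the DFA analogy concrete, here is a minimal hand-written lexer for a toy language, sketched outside this package. The switch on character classes plays the role of the DFA's transition function; the token kinds and names are invented for this illustration.

package main

import (
	"fmt"
	"unicode"
)

// Hypothetical token kinds for a toy language; this sketch is not part of
// the lexer package and only illustrates the DFA-style character loop.
type kind int

const (
	kindNum kind = iota
	kindIdent
)

type tok struct {
	kind   kind
	lexeme string
}

// scan walks the input rune by rune; the switch on character classes acts
// as the transition function of a small DFA, and each case consumes a
// maximal run of characters belonging to one token.
func scan(src []rune) []tok {
	var toks []tok
	for i := 0; i < len(src); {
		switch r := src[i]; {
		case unicode.IsSpace(r):
			i++ // skip whitespace between tokens
		case unicode.IsDigit(r):
			j := i
			for j < len(src) && unicode.IsDigit(src[j]) {
				j++
			}
			toks = append(toks, tok{kindNum, string(src[i:j])})
			i = j
		case unicode.IsLetter(r):
			j := i
			for j < len(src) && (unicode.IsLetter(src[j]) || unicode.IsDigit(src[j])) {
				j++
			}
			toks = append(toks, tok{kindIdent, string(src[i:j])})
			i = j
		default:
			i++ // skip characters this toy lexer does not recognize
		}
	}
	return toks
}

func main() {
	fmt.Println(scan([]rune("x1 42 foo")))
}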

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Lexer

type Lexer interface {
	// NextToken reads characters from the input source and returns the next token.
	// It may also return an error if there is an issue during tokenization.
	NextToken() (Token, error)
}

Lexer defines the interface for a lexical analyzer.

Example
package main

import (
	"fmt"
	"strings"
	"text/scanner"

	"github.com/weirdhostel/algo/grammar"
	"github.com/weirdhostel/algo/lexer"
)

func main() {
	src := strings.NewReader(`Lorem ipsum dolor sit amet, consectetur adipiscing elit,
		sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.`)

	var s scanner.Scanner
	s.Init(src)

	// Wrap each token produced by text/scanner in a lexer.Token,
	// recording the lexeme and its position in the input.
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		token := lexer.Token{
			Terminal: grammar.Terminal("WORD"),
			Lexeme:   s.TokenText(),
			Pos: lexer.Position{
				Filename: "lorem_ipsum",
				Offset:   s.Position.Offset,
				Line:     s.Position.Line,
				Column:   s.Position.Column,
			},
		}

		fmt.Println(token)
	}
}
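
Any implementation of the interface can be driven the same way: call NextToken in a loop until the input is exhausted. The sketch below adapts text/scanner to the interface; the wordLexer type is written for this example, and returning io.EOF at end of input is an assumed convention, since the interface leaves error values open.

package main

import (
	"errors"
	"fmt"
	"io"
	"log"
	"strings"
	"text/scanner"

	"github.com/weirdhostel/algo/grammar"
	"github.com/weirdhostel/algo/lexer"
)

// wordLexer adapts text/scanner to the lexer.Lexer interface. It is written
// for this sketch and is not a type provided by the package.
type wordLexer struct {
	s scanner.Scanner
}

// NextToken returns the next word as a WORD token. Returning io.EOF at end
// of input is an assumed convention, not one the interface prescribes.
func (l *wordLexer) NextToken() (lexer.Token, error) {
	if l.s.Scan() == scanner.EOF {
		return lexer.Token{}, io.EOF
	}
	return lexer.Token{
		Terminal: grammar.Terminal("WORD"),
		Lexeme:   l.s.TokenText(),
		Pos: lexer.Position{
			Offset: l.s.Position.Offset,
			Line:   l.s.Position.Line,
			Column: l.s.Position.Column,
		},
	}, nil
}

func main() {
	var l wordLexer
	l.s.Init(strings.NewReader("Lorem ipsum dolor"))

	var lex lexer.Lexer = &l
	for {
		token, err := lex.NextToken()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(token)
	}
}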

type Position

type Position struct {
	Filename string // The name of the input source file (optional).
	Offset   int    // The byte offset from the beginning of the file.
	Line     int    // The line number (1-based).
	Column   int    // The column number on the line (1-based).
}

Position represents a specific location in an input source.

func (Position) Equal

func (p Position) Equal(rhs Position) bool

Equal reports whether two positions are the same.

func (Position) IsZero

func (p Position) IsZero() bool

IsZero reports whether a position is the zero (empty) value.

func (Position) String

func (p Position) String() string

String implements the fmt.Stringer interface.

It returns a formatted string representation of the position.
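
A short sketch of these methods in use. The field values are arbitrary, and the exact output of String is defined by the package, so it is not shown here.

package main

import (
	"fmt"

	"github.com/weirdhostel/algo/lexer"
)

func main() {
	// Two positions that agree in every field; values are illustrative.
	p := lexer.Position{Filename: "main.go", Offset: 12, Line: 2, Column: 5}
	q := lexer.Position{Filename: "main.go", Offset: 12, Line: 2, Column: 5}

	fmt.Println(p.Equal(q))                // true: every field matches
	fmt.Println(lexer.Position{}.IsZero()) // true: the zero value counts as empty
	fmt.Println(p)                         // printed via the String method
}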

type Token

type Token struct {
	grammar.Terminal
	Lexeme string
	Pos    Position
}

Token represents a unit of the input language.

A token consists of a terminal symbol, along with additional information such as the lexeme (the actual value of the token in the input) and its position in the input stream.

For example, identifiers in a programming language may have different names, but their token type (terminal symbol) is typically "ID". Similarly, the token "NUM" can have various lexeme values, representing different numerical values in the input.

func (Token) Equal

func (t Token) Equal(rhs Token) bool

Equal reports whether two tokens are the same.

func (Token) String

func (t Token) String() string

String implements the fmt.Stringer interface.

It returns a formatted string representation of the token.
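
A short sketch of these methods in use. The terminal name, lexeme, and position are invented for the illustration; the two tokens agree in every field, so Equal returns true.

package main

import (
	"fmt"

	"github.com/weirdhostel/algo/grammar"
	"github.com/weirdhostel/algo/lexer"
)

func main() {
	t := lexer.Token{Terminal: grammar.Terminal("ID"), Lexeme: "count", Pos: lexer.Position{Line: 1, Column: 1}}
	u := lexer.Token{Terminal: grammar.Terminal("ID"), Lexeme: "count", Pos: lexer.Position{Line: 1, Column: 1}}

	fmt.Println(t.Equal(u)) // true: the tokens agree in every field
	fmt.Println(t)          // printed via the String method
}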

Directories

Path Synopsis
input	Package input implements a two-buffer input reader.
