lexer

package
v0.5.0
Published: Mar 31, 2024 License: GPL-3.0 Imports: 9 Imported by: 0

README

lexer

Responsibilities

  • Define the token type and token kinds
  • Turn streams of data into streams of tokens

Organization

The lexer is split into its interface and implementation:

  • Lexer: public facing lexer interface
  • fsplLexer: private implementation of Lexer, with public constructors

The lexer is bound to a data stream at the time of creation, and its Next() method may be called to read and return the next token from the stream.
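The calling pattern can be sketched as follows. The `Lexer` interface and the `Token` methods mirror the documented API, but the `sliceLexer` type and the local `Token`/`TokenKind` definitions are stand-ins invented for this self-contained example; the real package provides its own.

```go
package main

import "fmt"

// Local stand-ins for the package's types, so this sketch is self-contained.
type TokenKind int

const (
	EOF   TokenKind = -1
	Ident TokenKind = 0
)

type Token struct {
	Kind  TokenKind
	Value string
}

// EOF reports whether the token is an EOF token.
func (tok Token) EOF() bool { return tok.Kind == EOF }

// Lexer mirrors the documented interface.
type Lexer interface {
	Next() (Token, error)
}

// sliceLexer is a toy Lexer that replays a fixed token stream, standing in
// for a lexer bound to a real data stream at creation time.
type sliceLexer struct {
	tokens []Token
	index  int
}

func (l *sliceLexer) Next() (Token, error) {
	if l.index >= len(l.tokens) {
		return Token{Kind: EOF}, nil
	}
	tok := l.tokens[l.index]
	l.index++
	return tok, nil
}

func main() {
	var lx Lexer = &sliceLexer{tokens: []Token{
		{Kind: Ident, Value: "hello"},
		{Kind: Ident, Value: "world"},
	}}
	// Pull tokens one at a time until an EOF token arrives.
	for {
		tok, err := lx.Next()
		if err != nil {
			fmt.Println("unexpected EOF:", err)
			return
		}
		if tok.EOF() {
			break
		}
		fmt.Println(tok.Value)
	}
}
```

Because `Next()` only ever returns an error alongside an EOF token, checking the error first and then the EOF flag covers both the expected and unexpected end-of-stream cases.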

Operation

fsplLexer carries state information about which rune from the data stream is currently being processed. This state must remain filled out as long as there is still data in the stream to read from. All lexer routines start off by using this rune, and end by advancing to the next rune for the next routine to use.

The lexer follows this general flow:

  1. Upon creation, grab the first rune to initialize the lexer state
  2. When Next() is called...
  3. Create a new token
  4. Set the token's position
  5. Switch off of the current rune to set the token's kind and invoke specific lexing behavior
  6. Expand the token's position to cover the full range

When an EOF is detected, the lexer is marked as spent (eof: true) and will only return EOF tokens. The lexer will only return an error alongside an EOF token if the EOF was unexpected.

The lexer also keeps track of its current position in order to embed it into tokens, and to print errors. It is important that the lowest level operation used to advance the lexer's position is fsplLexer.nextRune(), as it contains logic for keeping the position correct and maintaining the current lexer state.
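The flow above can be sketched as a minimal rune-state lexer. The name nextRune and the general shape (initialize on creation, switch off of the current rune, advance via the lowest-level routine) follow the README; everything else here, including the miniLexer and miniToken types and the tokens it recognizes, is an illustrative assumption, not the package's actual source.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"unicode"
)

// miniLexer holds the rune currently being processed, plus the position
// that nextRune keeps up to date.
type miniLexer struct {
	reader *strings.Reader
	rune   rune
	eof    bool
	row    int
	column int
}

// nextRune is the lowest-level advance operation: it maintains the current
// rune and position, and marks the lexer spent on EOF.
func (l *miniLexer) nextRune() error {
	ch, _, err := l.reader.ReadRune()
	if err == io.EOF {
		l.eof = true
		return nil
	}
	if err != nil {
		return err
	}
	if l.rune == '\n' {
		l.row++
		l.column = 0
	} else {
		l.column++
	}
	l.rune = ch
	return nil
}

type miniToken struct {
	Row, Column int
	Kind        string
	Value       string
}

// Next records the position, switches off of the current rune to pick a
// kind, and leaves the lexer on the first rune of the next token.
func (l *miniLexer) Next() (miniToken, error) {
	for !l.eof && unicode.IsSpace(l.rune) {
		if err := l.nextRune(); err != nil {
			return miniToken{}, err
		}
	}
	tok := miniToken{Row: l.row, Column: l.column, Kind: "EOF"}
	if l.eof {
		return tok, nil // spent: only EOF tokens from here on
	}
	switch {
	case unicode.IsLetter(l.rune):
		tok.Kind = "Ident"
		for !l.eof && unicode.IsLetter(l.rune) {
			tok.Value += string(l.rune)
			if err := l.nextRune(); err != nil {
				return tok, err
			}
		}
	default:
		tok.Kind = "Symbol"
		tok.Value = string(l.rune)
		if err := l.nextRune(); err != nil {
			return tok, err
		}
	}
	return tok, nil
}

func newMiniLexer(src string) (*miniLexer, error) {
	l := &miniLexer{reader: strings.NewReader(src), rune: ' '}
	// Grab the first rune to initialize the lexer state.
	return l, l.nextRune()
}

func main() {
	lx, _ := newMiniLexer("abc = x")
	for {
		tok, _ := lx.Next()
		if tok.Kind == "EOF" {
			break
		}
		fmt.Printf("%s %q\n", tok.Kind, tok.Value)
	}
}
```

Note how every branch of the switch ends with a call to nextRune, upholding the invariant that each routine leaves the current rune ready for the next one.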

Documentation

Overview

Package lexer implements the lexical analysis stage of the FSPL compiler. Its job is to convert text into a series of tokens (lexemes) which are then passed to other parts of the compiler to be interpreted into more complex data structures. The lexer is able to read in new tokens as needed instead of reading them in all at once.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Lexer

type Lexer interface {
	// Next returns the next token. If there are no more tokens, it returns
	// an EOF token. It only returns an error on EOF if the file terminated
	// unexpectedly.
	Next() (Token, error)
}

Lexer is an object capable of producing tokens.

func LexFile

func LexFile(filename string) (Lexer, error)

LexFile creates a new default lexer that reads from the given file.

func LexReader

func LexReader(filename string, reader io.Reader) (Lexer, error)

LexReader creates a new default lexer that reads from the given reader. The filename parameter is used for token locations and error messages.

type Token

type Token struct {
	Position errors.Position // The position of the token in its file
	Kind     TokenKind       // Which kind of token it is
	Value    string          // The token's value
}

Token represents a single lexeme of an FSPL file.

func (Token) EOF

func (tok Token) EOF() bool

EOF returns whether or not the token is an EOF token.

func (Token) Is

func (tok Token) Is(kinds ...TokenKind) bool

Is returns whether or not the token kind matches any of the given kinds.

func (Token) String

func (tok Token) String() string

String returns a string representation of the token, which is of the form:

KIND 'VALUE'

or if the value is empty:

KIND
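The documented format can be reproduced with a short sketch. The local Token and TokenKind definitions here are stand-ins for illustration; only the output format follows the documentation.

```go
package main

import "fmt"

// Local stand-ins for the package's Token and TokenKind types.
type TokenKind int

const (
	Ident TokenKind = iota
	Colon
)

// String maps each kind to the name of its constant.
func (kind TokenKind) String() string {
	switch kind {
	case Ident:
		return "Ident"
	case Colon:
		return "Colon"
	}
	return "Unknown"
}

type Token struct {
	Kind  TokenKind
	Value string
}

// String prints KIND 'VALUE', or just KIND when the value is empty.
func (tok Token) String() string {
	if tok.Value == "" {
		return tok.Kind.String()
	}
	return fmt.Sprintf("%v '%s'", tok.Kind, tok.Value)
}

func main() {
	fmt.Println(Token{Kind: Ident, Value: "main"}) // Ident 'main'
	fmt.Println(Token{Kind: Colon})                // Colon
}
```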

func (Token) ValueIs

func (tok Token) ValueIs(values ...string) bool

ValueIs returns whether or not the token value matches any of the given values.
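Both Is and ValueIs are variadic membership tests. A sketch of how they might behave, using local stand-in types rather than the package's actual source:

```go
package main

import "fmt"

// Local stand-ins mirroring the documented Is and ValueIs signatures.
type TokenKind int

const (
	Ident TokenKind = iota
	Symbol
	Colon
)

type Token struct {
	Kind  TokenKind
	Value string
}

// Is reports whether the token kind matches any of the given kinds.
func (tok Token) Is(kinds ...TokenKind) bool {
	for _, kind := range kinds {
		if tok.Kind == kind {
			return true
		}
	}
	return false
}

// ValueIs reports whether the token value matches any of the given values.
func (tok Token) ValueIs(values ...string) bool {
	for _, value := range values {
		if tok.Value == value {
			return true
		}
	}
	return false
}

func main() {
	tok := Token{Kind: Symbol, Value: "->"}
	fmt.Println(tok.Is(Symbol, Colon))   // true
	fmt.Println(tok.ValueIs("->", "=>")) // true
	fmt.Println(tok.Is(Ident))           // false
}
```

This shape is convenient in a parser, where a single call can check a token against every kind or spelling that is legal at the current position.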

type TokenKind

type TokenKind int

TokenKind is an enumeration of all tokens the FSPL compiler recognizes.

const (
	EOF TokenKind = -(iota + 1)

	// Name      Rough regex-ish description
	Ident     // [a-z][a-zA-Z0-9]*
	TypeIdent // [A-Z][a-zA-Z0-9]*
	Int       // (0b|0x)?[0-9a-fA-F]+
	Float     // [0-9]*\.[0-9]+
	String    // \'.*\'

	Symbol      // [~!@#$%^&*-_=+\\|;,<>/?]+
	LParen      // \(
	LBrace      // \{
	LBracket    // \[
	RParen      // \)
	RBrace      // \}
	RBracket    // \]
	Colon       // :
	DoubleColon // ::
	Dot         // .
	DoubleDot   // ..
	Star        // \*
)

func (TokenKind) String

func (kind TokenKind) String() string

String returns a string representation of the token kind. The result for any kind corresponds directly to the name of the constant which defines it.
