lexer

package module
v0.8.0
Published: Jan 2, 2013 License: BSD-3-Clause Imports: 8 Imported by: 20

README

go_lexer

Lexer API in Go

ABOUT

The 'go_lexer' package is an API to help you create hand-written lexers and parsers.

The package was inspired by Rob Pike's talk Lexical Scanning in Go and by Go's 'template' package.

LEXER INTERFACE

Below is the interface for the main Lexer type:

// lexer.Lexer helps you tokenize bytes
type Lexer interface {

	// PeekRune allows you to look ahead at runes without consuming them
	PeekRune(int) rune

	// NextRune consumes and returns the next rune in the input
	NextRune() rune

	// BackupRune un-consumes the last rune from the input
	BackupRune()

	// BackupRunes un-consumes the last n runes from the input
	BackupRunes(int)

	// NewLine increments the line number counter, resets the column counter
	NewLine()

	// Line returns the current line number, 1-based
	Line() int

	// Column returns the current column number, 1-based
	Column() int

	// EmitToken emits a token of the specified type, consuming the matched
	// runes without including them in the token
	EmitToken(TokenType)

	// EmitTokenWithBytes emits a token along with all the consumed runes
	EmitTokenWithBytes(TokenType)

	// IgnoreToken ignores the consumed bytes without emitting any tokens
	IgnoreToken()

	// EmitEOF emits a token of type TokenEOF
	EmitEOF()

	// NextToken retrieves the next emitted token from the input
	NextToken() *Token

	// Marker returns a marker that you can use to reset the lexer state later
	Marker() *Marker

	// CanReset confirms if the marker is still valid
	CanReset(*Marker) bool

	// Reset resets the lexer state to the specified marker
	Reset(*Marker)

	// MatchZeroOrOneBytes consumes the next rune if it matches, always returning true
	MatchZeroOrOneBytes([]byte) bool

	// MatchZeroOrOneRunes consumes the next rune if it matches, always returning true
	MatchZeroOrOneRunes([]rune) bool

	// MatchZeroOrOneRune consumes the next rune if it matches, always returning true
	MatchZeroOrOneRune(rune) bool

	// MatchZeroOrOneFunc consumes the next rune if it matches, always returning true
	MatchZeroOrOneFunc(MatchFn) bool

	// MatchZeroOrMoreBytes consumes a run of matching runes, always returning true
	MatchZeroOrMoreBytes([]byte) bool

	// MatchZeroOrMoreRunes consumes a run of matching runes, always returning true
	MatchZeroOrMoreRunes([]rune) bool

	// MatchZeroOrMoreFunc consumes a run of matching runes, always returning true
	MatchZeroOrMoreFunc(MatchFn) bool

	// MatchOneBytes consumes the next rune if it's in the list of bytes
	MatchOneBytes([]byte) bool

	// MatchOneRunes consumes the next rune if it's in the list of runes
	MatchOneRunes([]rune) bool

	// MatchOneRune consumes the next rune if it matches
	MatchOneRune(rune) bool

	// MatchOneFunc consumes the next rune if it matches
	MatchOneFunc(MatchFn) bool

	// MatchOneOrMoreBytes consumes a run of matching runes
	MatchOneOrMoreBytes([]byte) bool

	// MatchOneOrMoreRunes consumes a run of matching runes
	MatchOneOrMoreRunes([]rune) bool

	// MatchOneOrMoreFunc consumes a run of matching runes
	MatchOneOrMoreFunc(MatchFn) bool

	// MatchMinMaxBytes consumes a specified run of matching runes
	MatchMinMaxBytes([]byte, int, int) bool

	// MatchMinMaxRunes consumes a specified run of matching runes
	MatchMinMaxRunes([]rune, int, int) bool

	// MatchMinMaxFunc consumes a specified run of matching runes
	MatchMinMaxFunc(MatchFn, int, int) bool

	// NonMatchZeroOrOneBytes consumes the next rune if it does not match, always returning true
	NonMatchZeroOrOneBytes([]byte) bool

	// NonMatchZeroOrOneRunes consumes the next rune if it does not match, always returning true
	NonMatchZeroOrOneRunes([]rune) bool

	// NonMatchZeroOrOneFunc consumes the next rune if it does not match, always returning true
	NonMatchZeroOrOneFunc(MatchFn) bool

	// NonMatchZeroOrMoreBytes consumes a run of non-matching runes, always returning true
	NonMatchZeroOrMoreBytes([]byte) bool

	// NonMatchZeroOrMoreRunes consumes a run of non-matching runes, always returning true
	NonMatchZeroOrMoreRunes([]rune) bool

	// NonMatchZeroOrMoreFunc consumes a run of non-matching runes, always returning true
	NonMatchZeroOrMoreFunc(MatchFn) bool

	// NonMatchOneBytes consumes the next rune if it's NOT in the list of bytes
	NonMatchOneBytes([]byte) bool

	// NonMatchOneRunes consumes the next rune if it's NOT in the list of runes
	NonMatchOneRunes([]rune) bool

	// NonMatchOneFunc consumes the next rune if it does NOT match
	NonMatchOneFunc(MatchFn) bool

	// NonMatchOneOrMoreBytes consumes a run of non-matching runes
	NonMatchOneOrMoreBytes([]byte) bool

	// NonMatchOneOrMoreRunes consumes a run of non-matching runes
	NonMatchOneOrMoreRunes([]rune) bool

	// NonMatchOneOrMoreFunc consumes a run of non-matching runes
	NonMatchOneOrMoreFunc(MatchFn) bool

	// MatchEOF tries to match the next rune against RuneEOF
	MatchEOF() bool
}
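
The Marker, CanReset and Reset methods support speculative matching: record the lexer state, attempt a match, and rewind if it fails. Below is a minimal sketch of that pattern, not taken from the package itself; the T_HEXNUMBER constant and the byte lists are illustrative assumptions.

// T_HEXNUMBER is an illustrative token type, not defined by the package
const T_HEXNUMBER lexer.TokenType = lexer.TokenTypeEOF + 1

// tryHexNumber attempts to match a hex literal like "0x1F". If the input
// turns out not to be one, it rewinds to the saved marker and reports false.
func tryHexNumber(l lexer.Lexer) bool {
	m := l.Marker() // remember the current lexer state

	if l.MatchOneRune('0') &&
		l.MatchOneBytes([]byte{'x', 'X'}) &&
		l.MatchOneOrMoreBytes([]byte("0123456789abcdefABCDEF")) {
		l.EmitTokenWithBytes(T_HEXNUMBER)
		return true
	}

	// Not a hex literal, so restore the state saved above
	if l.CanReset(m) {
		l.Reset(m)
	}
	return false
}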

EXAMPLE

Below is a sample word count program that uses the lexer API:

package main

import (
	"fmt"
	"os"

	"github.com/iNamik/go_lexer"
)

// Usage : wordcount <filename>
func usage() {
	fmt.Printf("usage: %s <filename>\n", os.Args[0])
}

// We define our lexer tokens starting from the pre-defined EOF token
const (
	T_EOF lexer.TokenType = lexer.TokenTypeEOF
	T_NIL                 = lexer.TokenTypeEOF + iota
	T_SPACE
	T_NEWLINE
	T_WORD
)

// List gleaned from isspace(3) manpage
var bytesNonWord = []byte{' ', '\t', '\f', '\v', '\n', '\r'}

var bytesSpace = []byte{' ', '\t', '\f', '\v'}

const charNewLine = '\n'

const charReturn = '\r'

func main() {
	if len(os.Args) < 2 {
		usage()
		return
	}

	file, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	defer file.Close()

	var chars, words, spaces, lines int

	// To help us track the last line
	var emptyLine = true

	// Create our lexer
	// NewSize(startState, reader, readerBufLen, channelCap)
	lex := lexer.NewSize(lexFunc, file, 100, 1)

	var lastTokenType lexer.TokenType = T_NIL

	// Process lexer-emitted tokens
	for t := lex.NextToken(); lexer.TokenTypeEOF != t.Type(); t = lex.NextToken() {

		chars += len(t.Bytes())

		switch t.Type() {
		case T_WORD:
			if lastTokenType != T_WORD {
				words++
			}
			emptyLine = false

		case T_NEWLINE:
			lines++
			spaces++
			emptyLine = true

		case T_SPACE:
			spaces += len(t.Bytes())
			emptyLine = false

		default:
			panic("unreachable")
		}

		lastTokenType = t.Type()
	}

	// If last line not empty, up line count
	if !emptyLine {
		lines++
	}

	fmt.Printf("%d words, %d spaces, %d lines, %d chars\n", words, spaces, lines, chars)
}

func lexFunc(l lexer.Lexer) lexer.StateFn {
	// EOF
	if l.MatchEOF() {
		l.EmitEOF()
		return nil // We're done here
	}

	// Non-Space run
	if l.NonMatchOneOrMoreBytes(bytesNonWord) {
		l.EmitTokenWithBytes(T_WORD)

		// Space run
	} else if l.MatchOneOrMoreBytes(bytesSpace) {
		l.EmitTokenWithBytes(T_SPACE)

		// Line Feed
	} else if charNewLine == l.PeekRune(0) {
		l.NextRune()
		l.EmitTokenWithBytes(T_NEWLINE)
		l.NewLine()

		// Carriage-Return with optional line-feed immediately following
	} else if charReturn == l.PeekRune(0) {
		l.NextRune()
		if charNewLine == l.PeekRune(0) {
			l.NextRune()
		}
		l.EmitTokenWithBytes(T_NEWLINE)
		l.NewLine()
	} else {
		panic("unreachable")
	}

	return lexFunc
}

INSTALL

The package is built using the Go tool. Assuming you have correctly set the $GOPATH variable, you can run the following command:

go get github.com/iNamik/go_lexer

DEPENDENCIES

AUTHORS

  • David Farrell

Documentation

Index

Constants

const RuneEOF = -1

Rune representing EOF

Variables

This section is empty.

Functions

This section is empty.

Types

type Lexer

type Lexer interface {

	// PeekRune allows you to look ahead at runes without consuming them
	PeekRune(int) rune

	// NextRune consumes and returns the next rune in the input
	NextRune() rune

	// BackupRune un-consumes the last rune from the input
	BackupRune()

	// BackupRunes un-consumes the last n runes from the input
	BackupRunes(int)

	// NewLine increments the line number counter, resets the column counter
	NewLine()

	// Line returns the current line number, 1-based
	Line() int

	// Column returns the current column number, 1-based
	Column() int

	// EmitToken emits a token of the specified type, consuming the matched
	// runes without including them in the token
	EmitToken(TokenType)

	// EmitTokenWithBytes emits a token along with all the consumed runes
	EmitTokenWithBytes(TokenType)

	// IgnoreToken ignores the consumed bytes without emitting any tokens
	IgnoreToken()

	// EmitEOF emits a token of type TokenEOF
	EmitEOF()

	// NextToken retrieves the next emitted token from the input
	NextToken() *Token

	// Marker returns a marker that you can use to reset the lexer state later
	Marker() *Marker

	// CanReset confirms if the marker is still valid
	CanReset(*Marker) bool

	// Reset resets the lexer state to the specified marker
	Reset(*Marker)

	// MatchZeroOrOneBytes consumes the next rune if it matches, always returning true
	MatchZeroOrOneBytes([]byte) bool

	// MatchZeroOrOneRunes consumes the next rune if it matches, always returning true
	MatchZeroOrOneRunes([]rune) bool

	// MatchZeroOrOneRune consumes the next rune if it matches, always returning true
	MatchZeroOrOneRune(rune) bool

	// MatchZeroOrOneFunc consumes the next rune if it matches, always returning true
	MatchZeroOrOneFunc(MatchFn) bool

	// MatchZeroOrMoreBytes consumes a run of matching runes, always returning true
	MatchZeroOrMoreBytes([]byte) bool

	// MatchZeroOrMoreRunes consumes a run of matching runes, always returning true
	MatchZeroOrMoreRunes([]rune) bool

	// MatchZeroOrMoreFunc consumes a run of matching runes, always returning true
	MatchZeroOrMoreFunc(MatchFn) bool

	// MatchOneBytes consumes the next rune if it's in the list of bytes
	MatchOneBytes([]byte) bool

	// MatchOneRunes consumes the next rune if it's in the list of runes
	MatchOneRunes([]rune) bool

	// MatchOneRune consumes the next rune if it matches
	MatchOneRune(rune) bool

	// MatchOneFunc consumes the next rune if it matches
	MatchOneFunc(MatchFn) bool

	// MatchOneOrMoreBytes consumes a run of matching runes
	MatchOneOrMoreBytes([]byte) bool

	// MatchOneOrMoreRunes consumes a run of matching runes
	MatchOneOrMoreRunes([]rune) bool

	// MatchOneOrMoreFunc consumes a run of matching runes
	MatchOneOrMoreFunc(MatchFn) bool

	// MatchMinMaxBytes consumes a specified run of matching runes
	MatchMinMaxBytes([]byte, int, int) bool

	// MatchMinMaxRunes consumes a specified run of matching runes
	MatchMinMaxRunes([]rune, int, int) bool

	// MatchMinMaxFunc consumes a specified run of matching runes
	MatchMinMaxFunc(MatchFn, int, int) bool

	// NonMatchZeroOrOneBytes consumes the next rune if it does not match, always returning true
	NonMatchZeroOrOneBytes([]byte) bool

	// NonMatchZeroOrOneRunes consumes the next rune if it does not match, always returning true
	NonMatchZeroOrOneRunes([]rune) bool

	// NonMatchZeroOrOneFunc consumes the next rune if it does not match, always returning true
	NonMatchZeroOrOneFunc(MatchFn) bool

	// NonMatchZeroOrMoreBytes consumes a run of non-matching runes, always returning true
	NonMatchZeroOrMoreBytes([]byte) bool

	// NonMatchZeroOrMoreRunes consumes a run of non-matching runes, always returning true
	NonMatchZeroOrMoreRunes([]rune) bool

	// NonMatchZeroOrMoreFunc consumes a run of non-matching runes, always returning true
	NonMatchZeroOrMoreFunc(MatchFn) bool

	// NonMatchOneBytes consumes the next rune if it's NOT in the list of bytes
	NonMatchOneBytes([]byte) bool

	// NonMatchOneRunes consumes the next rune if it's NOT in the list of runes
	NonMatchOneRunes([]rune) bool

	// NonMatchOneFunc consumes the next rune if it does NOT match
	NonMatchOneFunc(MatchFn) bool

	// NonMatchOneOrMoreBytes consumes a run of non-matching runes
	NonMatchOneOrMoreBytes([]byte) bool

	// NonMatchOneOrMoreRunes consumes a run of non-matching runes
	NonMatchOneOrMoreRunes([]rune) bool

	// NonMatchOneOrMoreFunc consumes a run of non-matching runes
	NonMatchOneOrMoreFunc(MatchFn) bool

	// MatchEOF tries to match the next rune against RuneEOF
	MatchEOF() bool
}

lexer.Lexer helps you tokenize bytes

func New

func New(startState StateFn, reader io.Reader, channelCap int) Lexer

New returns a new Lexer object with an unlimited read-buffer

func NewFromBytes

func NewFromBytes(startState StateFn, input []byte, channelCap int) Lexer

NewFromBytes returns a new Lexer object for the specified byte array

func NewFromString

func NewFromString(startState StateFn, input string, channelCap int) Lexer

NewFromString returns a new Lexer object for the specified string
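
As a rough end-to-end sketch of how NewFromString, a StateFn and NextToken fit together (the T_WORD constant and lexWords function are illustrative, not part of the package):

package main

import (
	"fmt"
	"unicode"

	"github.com/iNamik/go_lexer"
)

// T_WORD is an illustrative, application-defined token type
const T_WORD lexer.TokenType = lexer.TokenTypeEOF + 1

// lexWords emits a T_WORD token for each run of letters and ignores
// any other rune.
func lexWords(l lexer.Lexer) lexer.StateFn {
	if l.MatchEOF() {
		l.EmitEOF()
		return nil
	}
	if l.MatchOneOrMoreFunc(unicode.IsLetter) {
		l.EmitTokenWithBytes(T_WORD)
	} else {
		l.NextRune()
		l.IgnoreToken()
	}
	return lexWords
}

func main() {
	lex := lexer.NewFromString(lexWords, "hello, lexer", 1)
	for t := lex.NextToken(); !t.EOF(); t = lex.NextToken() {
		fmt.Printf("%d:%d %s\n", t.Line(), t.Column(), t.Bytes())
	}
}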

func NewSize

func NewSize(startState StateFn, reader io.Reader, readerBufLen int, channelCap int) Lexer

NewSize returns a new Lexer object for the specified reader and read-buffer size

type Marker

type Marker struct {
	// contains filtered or unexported fields
}

Marker stores the state of the lexer to allow rewinding

type MatchFn

type MatchFn func(rune) bool

MatchFn represents a callback function for matching runes that are not feasible for a range
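
Any function with the signature func(rune) bool can serve as a MatchFn. A small sketch, assuming the 'unicode' package is imported and T_IDENT is an application-defined token type:

// isIdentRune reports whether r can appear in an identifier; sets like this
// are awkward to express as a byte or rune list, which is what MatchFn is for.
func isIdentRune(r rune) bool {
	return r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r)
}

// Inside a StateFn:
//	if l.MatchOneOrMoreFunc(isIdentRune) {
//		l.EmitTokenWithBytes(T_IDENT)
//	}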

type StateFn

type StateFn func(Lexer) StateFn

StateFn represents the state of the scanner as a function that returns the next state.

type Token

type Token struct {
	// contains filtered or unexported fields
}

Token represents a token (with optional text string) returned from the scanner.

func (*Token) Bytes

func (t *Token) Bytes() []byte

Bytes returns the byte array associated with the token, or nil if none

func (*Token) Column

func (t *Token) Column() int

Column returns the column number of the token

func (*Token) EOF

func (t *Token) EOF() bool

EOF returns true if the TokenType == TokenTypeEOF

func (*Token) Line

func (t *Token) Line() int

Line returns the line number of the token

func (*Token) Type

func (t *Token) Type() TokenType

Type returns the TokenType of the token

type TokenType

type TokenType int

TokenType identifies the type of lex tokens.

const TokenTypeEOF TokenType = -1

TokenType representing EOF

const TokenTypeUnknown TokenType = -2

TokenType representing an unknown rune(s)

Directories

Path Synopsis

examples

rangeutil
Package rangeutil provides services for conversion and iteration of range specifications for use with iNamik/go_lexer. Currently, a 'range specification' is defined as a string of Unicode characters, with the ability to specify a range of characters by using a '-' between two characters.
