scan

package module
v0.2.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 6, 2024 License: MIT Imports: 11 Imported by: 6

README

scan

An all-purpose scanner for your lexical needs.

Go Reference

Introduction

A scanner is used to extract the fundamental and basic elements of a structured text document. These basic elements are called tokens and represent the atomic units used when parsing a document. As an example, we will use the following GeoJSON file that is featured on the front page of https://geojson.org:

{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands"
  }
}

Some things in this document don't matter when parsing. Whitespace is one of those things. This document is indented by two spaces but it could have been four, eight, or even zero spaces without changing its meaning.

Some things really do matter. The brackets that are used to define an object. The strings that are enclosed by double quotes. The colon that separates a property name from its value. These are the tokens of the document. Parsing a document is much easier when working with a stream of tokens instead of individual characters.

Let's look at the third line of the example. It starts with a string token with the value of geometry. It is important to capture both bits of information-- its value of geometry and its type of string. A parser for this document is interested if the next token is a string or not but is not concerned about its value. Users of the parse result will be interested in the value.

When there are errors scanning and parsing the document, good error messages are useful to help identify where problems are located. Recording the line and column number with each token is important for this reporting.

The examples directory contains a JSON scanner using the scanner provided by this package. Run it with the following:

go run cmd/scan-json/scan-json.go example.json

A table should be printed to the terminal with the stream of tokens found in the file. Its output starts with the following:

Pos                 Type  Value              Literal
example.json:1:1    {     "{"                "{"
example.json:2:5    str   "type"             '"type"'
example.json:2:11   :     ":"                ":"
example.json:2:13   str   "Feature"          '"Feature"'
example.json:2:22   ,     ","                ","
example.json:3:5    str   "geometry"         '"geometry"'
example.json:3:15   :     ":"                ":"
example.json:3:17   {     "{"                "{"
example.json:4:9    str   "type"             '"type"'

Building a scanner manually isn't difficult but it can be a bit tedious at times. This scanner provides low level functions for iterating over a stream and for building tokens. It also has higher level functions that are useful for chaining together common scanning patterns. Scanners for files that follow standard patterns can be constructed easily using the definitions found in this package. Complex or non-standard cases can stick with the lower level functions which still provide a lot of value in getting a scanner up and running quickly.

Creating a Scanner

A scanner is initialized with a filename followed by an input source which can be either an io.Reader, a string or a byte slice. If the filename is not important or doesn't apply to the input, it can set to a blank value. Use one of the corresponding new functions to create a scanner or use an init function to either initialize a zero value scanner or to reset an existing scanner.

An example of using new to read from a file:

    f, err := os.Open(filename)
    if err != nil {
        return err
    }
    defer f.Close()
    s := scan.NewScanner(filename, f)

An example of using init to read from a string:

    var s scan.Scanner
    s.InitString("", "hello world")

Basic Run Loop

When using the lower level functions, a stream is processed one rune at a time. A buffer is used to accumulate these runes until a complete token has been seen. The basic scanning loop is to look at the upcoming rune in the input and to decide to either keep the rune for the current token or to emit the token. The following fields and methods are used for this:

s.This

The rune at the current position in the input stream.

s.Next

The next rune found in the input stream. When the scanner advances, this value will become s.This

s.HasMore()

Returns true if there is more runes in the input stream. Returns false when the end of file or an error condition is encountered.

s.Keep()

Adds the current rune to the token buffer and advances the scanner to the next rune.

s.Emit()

Returns the current token and resets the token buffer.

Example

Scanning a single binary number from an input stream can be done by looping over the stream while it has data and adding to the token buffer as long as the current rune is a zero or a one:

	s := scan.NewScannerFromString("", "1010234abc!")
	for s.HasMore() {
		if s.This == '0' || s.This == '1' {
			s.Keep()
			continue
		}
		break
	}
	tok := s.Emit()
	fmt.Println(tok.Val)

	// Output:
	// 1010

Example 1

Once the scanner encounters a non-binary digit, it breaks out of the loop and emits the token accumulated so far. The remaining contents of the stream, "234abc!", are not read.

Classes

Now we will update the example to scan for an unsigned integer in base 10. The only change necessary is to now check for digits 0 through 9 instead of just 0 and 1. This check can be placed into a separate function that takes a rune as input and returns a bool indicating if the rune is one of those digits. This type of function is a scan.Class function and it is used to determine if a rune is member of that class. Example:

	isDigit09 := func(r rune) bool {
		return r >= '0' && r <= '9'
	}

	s := scan.NewScannerFromString("", "1010234abc!")
	for s.HasMore() && isDigit09(s.This) {
		s.Keep()
	}
	tok := s.Emit()
	fmt.Println(tok.Val)

	// Output:
	// 1010234

Example 2

Class Functions

The scanner package provides some functions to easily construct various rune classes:

scan.Rune()

Creates a scan.Class function that returns true if it matches any of the specified runes. An example of a rune class that checks for common whitespace:

    isSpace := scan.Rune(' ', '\n', '\t', '\f', '\r')
scan.Range()

Creates a scan.Class function that returns true if it matches a range of runes as defined by their code point values. An example of a rune class that checks for digits between 0 and 9:

    isDigit09 := scan.Range('0', '9')
scan.Or()

Creates a scan.Class function that returns true if any of the specified classes returns true. An example of a rune class that checks for hexadecimal digits:

    isDigit0F := scan.Or(
        scan.Range('0', '9'),
        scan.Range('A', 'F'),
        scan.Range('a', 'f'),
    )
scan.Not()

Creates a scan.Class function that negates another class. For example, to create a class that accepts all non-whitespace runes:

    isNotSpace := scan.Not(isSpace)
Predefined Classes

Common rune classes are provided by this package and can be used instead. These classes return true under the following conditions:

  • scan.IsAny: always, as long as there are more runes available in the stream
  • scan.IsCurrency: currency symbol as defined by Unicode
  • scan.IsDigit: digit as defined by Unicode
  • scan.IsDigit01: binary digits
  • scan.IsDigit07: octal digits
  • scan.IsDigit09: decimal digits
  • scan.IsDigit0F: hexadecimal digits
  • scan.IsLetter: letter as defined by Unicode
  • scan.IsLetterAZ: letters A through Z and a through z
  • scan.IsLetterUnder: letter or an underscore
  • scan.IsLetterDigitUnder: letter, digit, or an underscore
  • scan.IsNone: never, always returns false
  • scan.IsPrintable: printable as defined by Unicode
  • scan.IsRune8: a code point that can fit in an 8-bit number
  • scan.IsRune16: a code point that can fit in a 16-bit number
  • scan.IsWhitespace: whitespace as defined by Unicode.
Updated Example

The previous example can now be updated using the predefined class for base 10 numbers and the s.Is() method:

	s := scan.NewScannerFromString("", "1010234abc!")
	for s.HasMore() && scan.IsDigit09(s.This) {
		s.Keep()
	}
	tok := s.Emit()
	fmt.Println(tok.Val)

	// Output:
	// 1010234

Example 3

Loop Functions

A common loop while scanning is to iterate over the stream while there is more data and then break out of the loop once a condition is met. The following functions can be used in simple loop operations:

scan.While()

Consume input from the stream while characters belong to a certain class. The function takes a *scan.Scanner, a scan.Class used to check membership, and a function that should be called for each matching character. For example, to add characters to the current token while digits are seen:

    scan.While(s, scan.Digit09, s.Keep)
scan.Until()

This function is similar to scan.While() but consumes characters from the stream until a character belongs to the provided class. For example, to add any non-whitespace characters to the curren token:

    scan.Until(s, scan.Whitespace, s.Keep)
scan.Repeat()

Sometimes it is necessary to call a function like scan.Keep() a fixed number of times to advance the scanner. Use the scan.Repeat() function for this:

    scan.Repeat(s.Keep, 3)
Updated Example

The example can now be updated to remove the loop with a call to scan.While():

	s := scan.NewScannerFromString("", "1010234abc!")
	scan.While(s, scan.IsDigit09, s.Keep)
	tok := s.Emit()
	fmt.Println(tok.Val)

	// Output:
	// 1010234

Example 4

Discarding

For runes like whitespace, it useful to simply discard them without creating tokens.

s.Discard()

Resets the current token buffer, discards the current rune, and advances the scanner to the next rune. The example below now discards any whitespace found at the beginning of the stream:

	s := scan.NewScannerFromString("", " \t 1010234abc!")
	scan.While(s, scan.IsSpace, s.Discard)
	scan.While(s, scan.IsDigit09, s.Keep)
	tok := s.Emit()
	fmt.Println(tok.Val)

	// Output:
	// 1010234

Example 5

Types

The next example is going to scan the entire stream and create tokens for each integer and word that is found. A word is going to be any sequence of letters found in the stream. Integers and words must be separated by whitespace. We will now place each token type will into a separate function and each token will now be identified with a type. A type can be any string value but there are some constants already defined in the scan package. The functions return a boolean value to indicate if there was a match for that token.

The function for the integer is as follows:

    func scanInt(s *scan.Scanner) bool {
        if !scan.IsDigit09(s.This) {
            return false
        }
        s.Type = scan.IntType
        scan.While(s, scan.IsDigit09, s.Keep)
        return true
    }

Example 6

The function for the word is similar and is as follows:

    func scanWord(s *scan.Scanner) bool {
        if !scan.IsLetter(s.This) {
            return false
        }
        s.Type = scan.WordType
        scan.While(s, scan.IsLetter, s.Keep)
        return true
    }

Example 6

Since whitespace is now significant, it will be returned as a token instead of being discarded:

    func scanSpace(s *scan.Scanner) bool {
        if !scan.IsSpace(s.This) {
            return false
        }
        s.Type = scan.SpaceType
        scan.While(s, scan.IsSpace, s.Keep)
        return true
    }

Example 6

Now create a main loop that checks each function for a match and then collects all of the tokens seen along the way:

	scanFuncs := []func(*scan.Scanner) bool{
		scanSpace,
		scanWord,
		scanInt,
	}

	var toks []scan.Token
	s := scan.NewScannerFromString("example6", "abc 123 !@#  \tdef456")
	for s.HasMore() {
		match := false
		for _, fn := range scanFuncs {
			if match = fn(s); match {
				toks = append(toks, s.Emit())
				break
			}
		}
		if !match {
			scan.Until(s, scan.IsSpace, s.Keep)
			s.Illegal("unexpected %v", scan.Quote(s.Val.String()))
			toks = append(toks, s.Emit())
		}
	}
	fmt.Println(scan.FormatTokenTable(toks))

	// Output:
	//
	// Pos            Type     Value         Literal
	// example6:1:1   word     "abc"         "abc"
	// example6:1:4   space    " "           " "
	// example6:1:5   int      "123"         "123"
	// example6:1:8   space    " "           " "
	// example6:1:9   illegal  "!@#"         "!@#"
	// example6:1:12: error: unexpected "!@#"
	// example6:1:12  space    "  {!ch:\t}"  "  {!ch:\t}"
	// example6:1:15  word     "def"         "def"
	// example6:1:18  int      "456"         "456"

Example 6

For each token that is generated, we can check the tok.Type field to see if it is an int, word, space, or an illegal token. The tok.Pos field contains the name of the file (or item) being scanned, the line number, and the column number of where the token starts.

Note that the role of the scanner is to simply emit tokens of the various types seen in the input stream. The "def456" string in the stream is illegal by this example's definition because a space must separate the word and the integer. That check should be handled by the parser that is processing the tokens, not by the scanner itself.

There are a few new functions or methods introduced in this example:

s.Illegal()

This function has the same arguments as fmt.Printf() and attaches an error to the token which can be found in tok.Errs. It also sets the token type field to scan.IllegalType.

scan.Quote()

This function surrounds a value with double quotes to make it clear where the value starts and ends. If the value contains double quotes, then single quotes will be used instead. If the value contains both double and single quotes, then backticks are used. In the rare event that all quotes are in use, the value is prefixed with {!quote: and ends with }.

Non-printable characters found in values are escaped with a prefix of {!ch: and ends with }. For characters that are normally escaped, they will be displayed like {!ch:\n} otherwise they are shown with a hexadecimal number like {!ch:a0}

scan.FormatTokenTable()

This function takes a slice of tokens and returns a string with the tokens formatted in a table to be used for display. For each token, it shows the position of where the token starts, the type of token, and the value and literal. In this example, the value and literal are always the same but that does not always have to be the case. If there are errors attached to a token, those are listed on subsequent lines.

Rules

Scanning for integers, words, and spaces are common things to scan for in a text document. In the next example, the scanning functions are modified to allow configuration of what is considered a digit, a letter, and a space. Instead of defining a function, a struct will now be used that contains the configuration and the scanning function will now be a method called Eval.

These structs will be called rules and they satisfy the scan.Rule interface. Changing the integer function to a rule looks like this:

    type IntRule struct {
        isDigit scan.Class
    }

    func NewIntRule(isDigit scan.Class) IntRule {
        return IntRule{isDigit: isDigit}
    }

    func (r IntRule) Eval(s *scan.Scanner) bool {
        if !r.isDigit(s.This) {
            return false
        }
        s.Type = scan.IntType
        scan.While(s, r.isDigit, s.Keep)
        return true
    }

Example 7

With this change, this one rule can be used for scanning decimal, binary, octal, and hexadecimal numbers simply by calling new with the corresponding digit class .The rules for word and whitespace looks similar and aren't show here. The full source code for the example can be found in the Example 7 links.

Now a function is needed to handle the case where a token does not match. This also has to be configurable to allow different definitions of whitespace. This will be a function that returns a function that scans the unexpected value until the specified class is found:

func UnexpectedUntil(c scan.Class) func(*scan.Scanner) {
    return func(s *scan.Scanner) {
        scan.Until(s, c, s.Keep)
        s.Illegal("unexpected %s", scan.Quote(s.Lit.String()))
    }
}

These new rules and the no match function can now be bundled up with a scan.RuleSet. Then a scan.Runner can be created with the scanner and rule set to iterate through the tokens using runner.HasMore and runner.Next. Or all the tokens can be obtained by calling runner.All():

    rules := scan.NewRuleSet(
        NewSpaceRule(scan.Whitespace),
        NewWordRule(scan.Letter),
        NewIntRule(scan.Digit),
    ).WithNoMatchFunc(UnexpectedUntil(scan.Whitespace))

    s := scan.NewScannerFromString("example7", "abc 123 !@#  \tdef456")
    runner := scan.NewRunner(s, rules)
    toks := runner.All()
    fmt.Println(scan.FormatTokenTable(toks))

    // Output:
    //
    // Pos            Type     Value         Literal
    // example7:1:1   word     "abc"         "abc"
    // example7:1:4   space    " "           " "
    // example7:1:5   int      "123"         "123"
    // example7:1:8   space    " "           " "
    // example7:1:9   illegal  "!@#"         "!@#"
    // example7:1:12: error: unexpected "!@#"
    // example7:1:12  space    "  {!ch:\t}"  "  {!ch:\t}"
    // example7:1:15  word     "def"         "def"
    // example7:1:18  int      "456"         "456"

Example 7

Preprocessing

Sometimes it is convenient to perform some light preprocessing of the token value before passing it off to the parser. A good example of this is allowing digit separators in an integer value such as "1,000". The parser is most likely going to pass this responsibility over to strconv.ParseInt() and the comma gets in the way. Another use case is to convert runes to another rune. For example, converting all whitespace runes to a single space value or changing newlines to semicolons. The following can be used for these cases:

s.Skip()

Adds the current rune to the token's literal but not to the token's value. The scanner then advances to the next rune.

s.Value

This is a string builder containing the token value seen so far. If newlines need to be converted to semicolons, this can be done in the following way:

    if s.This == '\n' {
        s.Skip()
        s.Value.WriteRune(';')
    }

Now the integer rule can be updated to include a digit separator. The digit class is mandatory but the digit separator is optional so only the digit class is found in the constructor. The digit separator can be configured with a chaining function:

type IntRule2 struct {
	isDigit    scan.Class
	isDigitSep scan.Class
}

func NewIntRule2(isDigit scan.Class) IntRule2 {
	return IntRule2{
		isDigit:    isDigit,
		isDigitSep: scan.IsNone,
	}
}

func (r IntRule2) WithDigitSep(isDigitSep scan.Class) IntRule2 {
	r.isDigitSep = isDigitSep
	return r
}

func (r IntRule2) Eval(s *scan.Scanner) bool {
	if !r.isDigit(s.This) {
		return false
	}
	s.Keep()
	for s.HasMore() {
		if r.isDigit(s.This) {
			s.Keep()
		} else if r.isDigitSep(s.This) {
			s.Skip()
		} else {
			break
		}
	}
	return true
}

Example 8

Now update the rule set with the new rule and modify the input to include some digit separators:

	rules := scan.NewRuleSet(
		NewSpaceRule(scan.IsSpace),
		NewWordRule(scan.IsLetter),
		NewIntRule2(scan.IsDigit).
			WithDigitSep(scan.Rune(',')),
	).WithNoMatchFunc(UnexpectedUntil(scan.IsSpace))

	s := scan.NewScannerFromString("example8", "abc 1,234 !@#  \tdef45,678")
	runner := scan.NewRunner(s, rules)
	toks := runner.All()
	fmt.Println(scan.FormatTokenTable(toks))

	// Output:
	//
	// Pos            Type     Value         Literal
	// example8:1:1   word     "abc"         "abc"
	// example8:1:4   space    " "           " "
	// example8:1:5   1234     "1234"        "1,234"
	// example8:1:10  space    " "           " "
	// example8:1:11  illegal  "!@#"         "!@#"
	// example8:1:14: error: unexpected "!@#"
	// example8:1:14  space    "  {!ch:\t}"  "  {!ch:\t}"
	// example8:1:17  word     "def"         "def"
	// example8:1:20  45678    "45678"       "45,678"

Example 8

Predefined Rules

The scan package already has rules defined for reading the token types used so far. The example can now be updated to use those rules instead:

	rules := scan.NewRuleSet(
		scan.KeepSpaceRule,
		scan.NewWhileRule(scan.IsLetter, scan.WordType),
		scan.IntRule.WithDigitSep(scan.Rune(',')),
	).WithNoMatchFunc(scan.UnexpectedUntil(scan.IsSpace))

	s := scan.NewScannerFromString("example9", "abc 1,234 !@#  \tdef45,678")
	runner := scan.NewRunner(s, rules)
	toks := runner.All()
	fmt.Println(scan.FormatTokenTable(toks))

	// Output:
	//
	// Pos            Type     Value         Literal
	// example9:1:1   word     "abc"         "abc"
	// example9:1:4   space    " "           " "
	// example9:1:5   int      "1234"        "1,234"
	// example9:1:10  space    " "           " "
	// example9:1:11  illegal  "!@#"         "!@#"
	// example9:1:14: error: unexpected "!@#"
	// example9:1:14  space    "  {!ch:\t}"  "  {!ch:\t}"
	// example9:1:17  word     "def"         "def"
	// example9:1:20  int      "45678"       "45,678"

Example 9

The predefined rules are as follows:

  • scan.CharEncRule: For converting character encodings such as \n to their actual values like U+000A
  • scan.ClassRule: For runes that are a member of a class.
  • scan.CommentRule: Comments which can be configured, at runtime, to emit tokens or discard the input.
  • scan.HexEncRule: For converting hex escape sequences such as \x7f to their actual values.
  • scan.IdentRule: Identifiers used in programming languages that usually need two characters classes: one for the first character, and one for the rest. Can also be configured with a list of keywords.
  • scan.LiteralRule: For matching exact strings in the input. Useful for matching against operators, punctuation, and keywords when not using an scan.IdentRule.
  • scan.NumRule: Integer and real number values. This is quite configurable for most standard needs but it doesn't cover every possibility that can be found in the wild.
  • scan.OctEncRule: For converting octal escape sequences such as \377 to their actual values.
  • scan.StrRule: For scanning strings which are bracketed by a start and ending rune. Can be configured to include escape sequences and multiline behavior.
  • scan.SpaceRule: For discarding whitespace but can be configured to emit tokens if needed.
  • scan.WhileRule: Scanning a token while a character class is true.

Some rules have predefined configurations as well. For example scan.Hex0xRule that uses a scan.NumRule configured for hexadecimal digits that are prefixed with 0x. See the API documentation for more information.

Lookahead and lookbehind

Most scanning operations can be done by looking at the current character, which is stored in s.This and the next character in the stream that is stored in s.Next. In some cases, it can be easier to look further ahead in the stream or to even speculate and advance the scanner and bail out if the stream doesn't contain what is expected. Use the following for these cases:

s.Peek()

Looks ahead in the stream. It takes an integer value which is the number of runes ahead of the current rule it should return. Using s.Peek(0) is the same as s.This and s.Peek(1) is the same as s.Next. The lookahead value can be as large as needed. If a peek operation would go past the end of the stream, the value of scan.EndOfText is returned. A negative value can be used to look back at the literal collected so far. For example, s.Peek(-1) would return the rune that was scanned before the current one. If a negative value goes past the length of the current literal, a value of scan.EndOfText is returned.

s.Undo()

When speculating with the scanner, calling this method will place the token literal seen so far back onto the input stream and then clear the token buffer.

Full Examples

There are two full examples provided with this package. The first is a JSON scanner. The scanner for JSON is quite simple but JSON does have some quirks:

  • Numbers can start with a negative sign, but not a positive sign.
  • Numbers cannot have leading zeros. "303" is okay, but "0303" is not.
  • Real numbers cannot have empty parts. "0.3" and "3.0" is valid, but ".3" and "3." is not.

There is also a Go scanner. This one is a bit more complicated but is mostly constructed from predefined rules. The exceptions are for a rule to handle imaginary numbers and a post token processor for automatic semicolon insertion.

A scanner, parser, and formatter for angles in degrees, minutes, and seconds can be found here:

https://github.com/blackchip-org/dms

Status

This package is still a work in progress and is subject to change. If you find this useful, please drop a line at zc at blackchip dot org.

License

MIT

Documentation

Index

Examples

Constants

View Source
const (
	BinType       = "bin"
	CommentType   = "comment"
	EndOfTextType = "end-of-text"
	ErrorType     = "error"
	HexType       = "hex"
	IdentType     = "ident"
	IllegalType   = "illegal"
	IntType       = "int"
	OctType       = "oct"
	RealType      = "real"
	SpaceType     = "space"
	StrType       = "str"
	WordType      = "word"
)
View Source
const (
	EndOfText = rune(-1)
)

Variables

View Source
var (
	Hex2EncRule = NewHexEncRule('x', 2)
	Hex4EncRule = NewHexEncRule('u', 4)
	Hex8EncRule = NewHexEncRule('U', 8)
)
View Source
var (
	StrDoubleQuoteRule = NewStrRule('"', '"').WithEscape('\\')
	StrSingleQuoteRule = NewStrRule('\'', '\'').WithEscape('\\')
)
View Source
var (
	SkipSpaceRule = NewWhileRule(IsSpace, SpaceType).WithKeep(false)
	KeepSpaceRule = NewWhileRule(IsSpace, SpaceType)
	WordRule      = NewWhileRule(Not(IsSpace), WordType)
)
View Source
var OctEnc = NewOctEncRule()
View Source
var TrueRule = trueRule{}

Functions

func All added in v0.2.0

func All(s *Scanner) string

func Escape

func Escape(ch rune) string

func FormatTokenTable

func FormatTokenTable(ts []Token) string

func Line added in v0.2.0

func Line(s *Scanner) string

func Quote

func Quote(s string) string

func QuoteRune

func QuoteRune(r rune) string

func Repeat

func Repeat(fn func(), n int)

Repeat calls function fn in a loop for n times.

func RunTests

func RunTests(t *testing.T, rules RuleSet, tests []Test)

func Space added in v0.2.0

func Space(s *Scanner)

func UnexpectedRune

func UnexpectedRune() func(*Scanner)

func UnexpectedUntil

func UnexpectedUntil(c Class) func(*Scanner)

func Until

func Until(s *Scanner, c Class, fn func())

Until calls function fn until this rune is a member of class c.

func UntilRule added in v0.1.0

func UntilRule(s *Scanner, r Rule, fn func())

func While

func While(s *Scanner, c Class, fn func())

While calls function fn while this rune is a member of class c.

func Word added in v0.2.0

func Word(s *Scanner) string

Types

type CharEnc

type CharEnc struct {
	From rune
	To   rune
}
var (
	AlertEnc          CharEnc = NewCharEnc('a', '\a')
	BackspaceEnc      CharEnc = NewCharEnc('b', '\b')
	BellEnc           CharEnc = AlertEnc
	CarriageReturnEnc CharEnc = NewCharEnc('r', '\r')
	FormFeedEnc       CharEnc = NewCharEnc('f', '\f')
	HorizontalTabEnc  CharEnc = NewCharEnc('t', '\t')
	LineFeedEnc       CharEnc = NewCharEnc('n', '\n')
	NewlineEnc        CharEnc = LineFeedEnc
	TabEnc            CharEnc = HorizontalTabEnc
	VerticalTabEnc    CharEnc = NewCharEnc('v', '\v')
)

func NewCharEnc

func NewCharEnc(from rune, to rune) CharEnc

type CharEncRule

type CharEncRule struct {
	// contains filtered or unexported fields
}

func NewCharEncRule

func NewCharEncRule(maps ...CharEnc) CharEncRule

func (CharEncRule) Eval

func (r CharEncRule) Eval(s *Scanner) bool

type Class

type Class func(rune) bool

Class returns true if the given rune is a member.

var (
	// IsAny returns true as long as there is a rune available in the input
	// stream.
	IsAny Class = func(r rune) bool {
		return r != EndOfText
	}

	// IsCurrency returns true when given a rune that is a currency symbol as
	// defined by Unicode.
	IsCurrency Class = func(r rune) bool {
		return unicode.Is(unicode.Sc, r)
	}

	// IsDigit returns true when given a digit as defined by Unicode.
	IsDigit Class = unicode.IsDigit

	// IsDigit01 returns true when given a valid binary digit.
	IsDigit01 Class = Rune('0', '1')

	// IsDigit07 returns true when given a valid octal digit.
	IsDigit07 Class = Range('0', '7')

	// IsDigit09 returns true when given a valid decimal digit.
	IsDigit09 Class = Range('0', '9')

	// IsDigit0F returns true when given a valid hexadecimal digit.
	IsDigit0F Class = Or(
		IsDigit09,
		Range('a', 'f'),
		Range('A', 'F'),
	)

	// IsLetter returns true when given rune is a letter as defined by Unicode.
	IsLetter Class = unicode.IsLetter

	// IsLetterAZ returns true when given letters from the Latin alphabet.
	IsLetterAZ Class = Or(
		Range('a', 'z'),
		Range('A', 'Z'),
	)

	// IsLetterUnder returns true when given letters as defined by Unicode
	// or an underscore.
	IsLetterUnder = Or(
		IsLetter,
		Rune('_'),
	)

	// IsLetterDigitUnder returns true when given letters as digits as defined
	// by Unicode or an underscore.
	IsLetterDigitUnder = Or(
		IsLetterUnder,
		IsDigit,
	)

	// IsNone always returns false.
	IsNone Class = func(r rune) bool {
		return false
	}

	// IsPrintable returns true when given a rune that is printable as defined
	// by Unicode.
	IsPrintable Class = unicode.IsPrint

	// IsRune8 returns true when given a rune that can be represented by
	// an 8-bit number.
	IsRune8 Class = Range(0, 0xff)

	// IsRune16 returns true when given a rune that can be represented by
	// a 16-bit number.
	IsRune16 Class = Range(0, 0xffff)

	// IsSign returns true when the rune is a positive (+) or negative (-)
	// numeric symbol.
	IsSign Class = Rune('+', '-')

	// IsSpace returns true when the given rune is whitespace as
	// defined by Unicode.
	IsSpace Class = unicode.IsSpace
)

func Not

func Not(c Class) Class

Not returns a Class function that returns true when given a rune that does not match class c.

Example
isA := Rune('a')
isNotA := Not(isA)
fmt.Printf("%c: %v\n", 'a', isNotA('a'))
fmt.Printf("%c: %v\n", 'b', isNotA('b'))
Output:

a: false
b: true

func Or

func Or(cs ...Class) Class

Or returns a Class that returns true when given a rune that matches any class found in cs.

Example
isLowerAZ := Range('a', 'z')
isUpperAZ := Range('A', 'Z')
isLetterAZ := Or(isLowerAZ, isUpperAZ)
fmt.Printf("%c: %v\n", 'f', isLetterAZ('f'))
fmt.Printf("%c: %v\n", 'F', isLetterAZ('F'))
fmt.Printf("%c: %v\n", '4', isLetterAZ('4'))
Output:

f: true
F: true
4: false

func Range

func Range(from rune, to rune) Class

Range returns a Class function that returns true when given a rune that is between from and to inclusive.

Example
isDigit09 := Range('0', '9')
fmt.Printf("%c: %v\n", '3', isDigit09('3'))
fmt.Printf("%c: %v\n", '6', isDigit09('6'))
fmt.Printf("%c: %v\n", 'a', isDigit09('a'))
Output:

3: true
6: true
a: false

func Rune

func Rune(rs ...rune) Class

Rune returns a Class function that returns true when given any rune found in rs.

Example
isAB := Rune('a', 'b')
fmt.Printf("%c: %v\n", 'a', isAB('a'))
fmt.Printf("%c: %v\n", 'b', isAB('b'))
fmt.Printf("%c: %v\n", 'c', isAB('c'))
Output:

a: true
b: true
c: false

type ClassRule added in v0.1.0

type ClassRule struct {
	// contains filtered or unexported fields
}

func NewClassRule added in v0.1.0

func NewClassRule(isClass Class) ClassRule

func (ClassRule) Eval added in v0.1.0

func (r ClassRule) Eval(s *Scanner) bool

func (ClassRule) WithType added in v0.1.0

func (r ClassRule) WithType(t string) ClassRule

type CommentRule

type CommentRule struct {
	// contains filtered or unexported fields
}

func NewCommentRule

func NewCommentRule(begin LiteralRule, end LiteralRule) CommentRule

func (CommentRule) Eval

func (r CommentRule) Eval(s *Scanner) bool

func (CommentRule) WithKeep

func (r CommentRule) WithKeep(keep *bool) CommentRule

type Error

type Error struct {
	Pos     Pos
	Message string
	Cause   error
}

func (Error) Equal added in v0.1.0

func (e Error) Equal(e2 Error) bool

func (Error) Error

func (e Error) Error() string

func (Error) Unwrap

func (e Error) Unwrap() error

type Errors

type Errors []Error

func (Errors) Equal added in v0.1.0

func (es Errors) Equal(es2 Errors) bool

func (Errors) Error

func (es Errors) Error() string

type HexEncRule

type HexEncRule struct {
	// contains filtered or unexported fields
}

func NewHexEncRule

func NewHexEncRule(flag rune, digits int) HexEncRule

func (HexEncRule) AsByte added in v0.0.1

func (r HexEncRule) AsByte(b bool) HexEncRule

func (HexEncRule) Eval

func (r HexEncRule) Eval(s *Scanner) bool

type IdentRule

type IdentRule struct {
	// contains filtered or unexported fields
}
var (
	StandardIdentRule IdentRule = NewIdentRule(IsLetterUnder, IsLetterDigitUnder)
)

func NewIdentRule

func NewIdentRule(isHead Class, isTail Class) IdentRule

func (IdentRule) Eval

func (r IdentRule) Eval(s *Scanner) bool

func (IdentRule) WithKeywords

func (r IdentRule) WithKeywords(ks ...string) IdentRule

type LiteralRule

type LiteralRule struct {
	// contains filtered or unexported fields
}

func Literal

func Literal(lits ...string) LiteralRule

func (LiteralRule) Eval

func (r LiteralRule) Eval(s *Scanner) bool

func (LiteralRule) WithSkip

func (r LiteralRule) WithSkip(b bool) LiteralRule

type NumRule

type NumRule struct {
	// contains filtered or unexported fields
}
var (
	BinRule           NumRule = NewNumRule(IsDigit01).WithIntType(BinType)
	Bin0bRule         NumRule = BinRule.WithPrefix(Literal("0b", "0B"))
	HexRule           NumRule = NewNumRule(IsDigit0F).WithIntType(HexType)
	Hex0xRule         NumRule = HexRule.WithPrefix(Literal("0x", "0X"))
	OctRule           NumRule = NewNumRule(IsDigit07).WithIntType(OctType)
	Oct0oRule         NumRule = OctRule.WithPrefix(Literal("0o", "0O"))
	IntRule           NumRule = NewNumRule(IsDigit09)
	RealRule          NumRule = IntRule.WithDecSep(Rune('.'))
	RealExpRule       NumRule = RealRule.WithExp(Rune('e', 'E')).WithExpSign(IsSign)
	SignedIntRule     NumRule = IntRule.WithSign(IsSign)
	SignedRealRule    NumRule = RealRule.WithSign(IsSign)
	SignedRealExpRule NumRule = RealExpRule.WithSign(IsSign)
)

func NewNumRule

func NewNumRule(digit Class) NumRule

func (NumRule) Eval

func (r NumRule) Eval(s *Scanner) bool

func (NumRule) WithDecSep

func (r NumRule) WithDecSep(c Class) NumRule

func (NumRule) WithDigitSep

func (r NumRule) WithDigitSep(c Class) NumRule

func (NumRule) WithEmptyPartsAllowed

func (r NumRule) WithEmptyPartsAllowed(b bool) NumRule

func (NumRule) WithExp

func (r NumRule) WithExp(c Class) NumRule

func (NumRule) WithExpSign

func (r NumRule) WithExpSign(c Class) NumRule

func (NumRule) WithIntType

func (r NumRule) WithIntType(t string) NumRule

func (NumRule) WithLeadingDigitSepAllowed

func (r NumRule) WithLeadingDigitSepAllowed(b bool) NumRule

func (NumRule) WithLeadingZeroAllowed

func (r NumRule) WithLeadingZeroAllowed(b bool) NumRule

func (NumRule) WithPrefix

func (r NumRule) WithPrefix(rule Rule) NumRule

func (NumRule) WithRealType

func (r NumRule) WithRealType(t string) NumRule

func (NumRule) WithSign

func (r NumRule) WithSign(c Class) NumRule

func (NumRule) WithSuffix

func (r NumRule) WithSuffix(rules ...Rule) NumRule

type OctEncRule

type OctEncRule struct {
}

func NewOctEncRule

func NewOctEncRule() OctEncRule

func (OctEncRule) Eval

func (r OctEncRule) Eval(s *Scanner) bool

type Pos

type Pos struct {
	Name string `json:"name,omitempty"`
	Line int    `json:"line"`
	Col  int    `json:"col"`
}

Pos represents a position within an input stream.

func NewPos

func NewPos(name string) Pos

NewPos returns a new position with the given name and the line number and column number set to one.

func (Pos) String

func (p Pos) String() string

type Rule

type Rule interface {
	Eval(*Scanner) bool
}

type RuleSet

type RuleSet struct {
	// contains filtered or unexported fields
}

func NewRuleSet

func NewRuleSet(rules ...Rule) RuleSet

func (RuleSet) Eval

func (r RuleSet) Eval(s *Scanner) bool

func (RuleSet) Next

func (r RuleSet) Next(s *Scanner) Token

func (RuleSet) WithNoMatchFunc

func (r RuleSet) WithNoMatchFunc(fn func(*Scanner)) RuleSet

func (RuleSet) WithPostTokenFunc

func (r RuleSet) WithPostTokenFunc(fn func(*Scanner, Token) Token) RuleSet

func (RuleSet) WithPreTokenFunc

func (r RuleSet) WithPreTokenFunc(fn func(*Scanner)) RuleSet

type Runner

type Runner struct {
	Rules RuleSet
	This  Token
	Next  Token
	// contains filtered or unexported fields
}

func NewRunner

func NewRunner(scan *Scanner, rules RuleSet) *Runner

func (*Runner) All

func (r *Runner) All() []Token

func (*Runner) HasMore

func (r *Runner) HasMore() bool

func (*Runner) Scan added in v0.1.0

func (r *Runner) Scan() Token

type Scanner

type Scanner struct {
	This rune
	Next rune
	Val  strings.Builder
	Lit  strings.Builder
	Pos  Pos
	Errs Errors
	Type string
	// contains filtered or unexported fields
}

func NewScanner

func NewScanner(name string, src io.Reader) *Scanner

func NewScannerFromBytes

func NewScannerFromBytes(name string, src []byte) *Scanner

func NewScannerFromString

func NewScannerFromString(name string, src string) *Scanner

func (*Scanner) Discard

func (s *Scanner) Discard()

Discard advances the string to the next rune without adding the current rune to the token.

func (*Scanner) Emit

func (s *Scanner) Emit() Token

Emit returns the token that has been built and resets the builder for the next token.

func (*Scanner) Eval added in v0.2.0

func (s *Scanner) Eval(r Rule) (Token, bool)

func (*Scanner) HasMore

func (s *Scanner) HasMore() bool

HasMore return true if there is more data to consume in the stream.

func (*Scanner) Illegal

func (s *Scanner) Illegal(format string, args ...any)

func (*Scanner) Init

func (s *Scanner) Init(name string, src io.Reader)

func (*Scanner) InitFromBytes

func (s *Scanner) InitFromBytes(name string, src []byte)

func (*Scanner) InitFromString

func (s *Scanner) InitFromString(name string, src string)

func (*Scanner) Keep

func (s *Scanner) Keep()

Keep advances the stream to the next rune and adds the current rune to the token value and literal.

func (*Scanner) Peek

func (s *Scanner) Peek(i int) rune

func (*Scanner) Skip

func (s *Scanner) Skip()

Skip advances the string to the next rune without adding the current rune to the token value. This current rune is written to the literal value.

func (*Scanner) Undo

func (s *Scanner) Undo()

type StrRule

type StrRule struct {
	// contains filtered or unexported fields
}

func NewStrRule

func NewStrRule(begin rune, end rune) StrRule

func (StrRule) Eval

func (r StrRule) Eval(s *Scanner) bool

func (StrRule) WithEscape

func (r StrRule) WithEscape(rn rune) StrRule

func (StrRule) WithEscapeRules

func (r StrRule) WithEscapeRules(rules ...Rule) StrRule

func (StrRule) WithMaxLen

func (r StrRule) WithMaxLen(l uint) StrRule

func (StrRule) WithMultiline

func (r StrRule) WithMultiline(b bool) StrRule

func (StrRule) WithType

func (r StrRule) WithType(t string) StrRule

type Test

type Test struct {
	// contains filtered or unexported fields
}

func NewTest

func NewTest(src string, val string, line int, col int, type_ string) Test

func (Test) And

func (t Test) And(val string, line int, col int, type_ string) Test

func (Test) WithError

func (t Test) WithError(message string) Test

type Token

type Token struct {
	Val  string `json:"val"`
	Lit  string `json:"lit,omitempty"`
	Type string `json:"type"`
	Pos  Pos    `json:"pos"`
	Errs Errors `json:"errs,omitempty"`
}

func (Token) Equal added in v0.1.0

func (t Token) Equal(t2 Token) bool

func (Token) IsEndOfText added in v0.1.0

func (t Token) IsEndOfText() bool

func (Token) IsValid

func (t Token) IsValid() bool

func (Token) String

func (t Token) String() string

type WhileRule

type WhileRule struct {
	// contains filtered or unexported fields
}

func NewWhileRule added in v0.2.0

func NewWhileRule(space Class, type_ string) WhileRule

func (WhileRule) Eval added in v0.2.0

func (r WhileRule) Eval(s *Scanner) bool

func (WhileRule) WithKeep added in v0.2.0

func (r WhileRule) WithKeep(b bool) WhileRule

Directories

Path Synopsis
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL