scan

package module

v0.3.0 Latest Latest Go to latest Published: Aug 29, 2024 License: MIT Imports: 11 Imported by: 11

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/blackchip-org/scan

Links

Open Source Insights

README ¶

scan

An all-purpose scanner for your lexical needs.

Introduction

A scanner is used to extract the fundamental and basic elements of a structured text document. These basic elements are called tokens and represent the atomic units used when parsing a document. As an example, we will use the following GeoJSON file that is featured on the front page of https://geojson.org:

{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands"
  }
}

Some things in this document don't matter when parsing. Whitespace is one of those things. This document is indented by two spaces but it could have been four, eight, or even zero spaces without changing its meaning.

Some things really do matter. The brackets that are used to define an object. The strings that are enclosed by double quotes. The colon that separates a property name from its value. These are the tokens of the document. Parsing a document is much easier when working with a stream of tokens instead of individual characters.

Let's look at the third line of the example. It starts with a string token with the value of geometry. It is important to capture both bits of information-- its value of geometry and its type of string. A parser for this document is interested if the next token is a string or not but is not concerned about its value. Users of the parse result will be interested in the value.

When there are errors scanning and parsing the document, good error messages are useful to help identify where problems are located. Recording the line and column number with each token is important for this reporting.

The examples directory contains a JSON scanner using the scanner provided by this package. Run it with the following:

go run cmd/scan-json/scan-json.go example.json

A table should be printed to the terminal with the stream of tokens found in the file. Its output starts with the following:

Pos                 Type  Value              Literal
example.json:1:1    {     "{"                "{"
example.json:2:5    str   "type"             '"type"'
example.json:2:11   :     ":"                ":"
example.json:2:13   str   "Feature"          '"Feature"'
example.json:2:22   ,     ","                ","
example.json:3:5    str   "geometry"         '"geometry"'
example.json:3:15   :     ":"                ":"
example.json:3:17   {     "{"                "{"
example.json:4:9    str   "type"             '"type"'

Building a scanner manually isn't difficult but it can be a bit tedious at times. This scanner provides low level functions for iterating over a stream and for building tokens. It also has higher level functions that are useful for chaining together common scanning patterns. Scanners for files that follow standard patterns can be constructed easily using the definitions found in this package. Complex or non-standard cases can stick with the lower level functions which still provide a lot of value in getting a scanner up and running quickly.

Creating a Scanner

A scanner is initialized with a filename followed by an input source which can be either an io.Reader, a string or a byte slice. If the filename is not important or doesn't apply to the input, it can set to a blank value. Use one of the corresponding new functions to create a scanner or use an init function to either initialize a zero value scanner or to reset an existing scanner.

An example of using new to read from a file:

    f, err := os.Open(filename)
    if err != nil {
        return err
    }
    defer f.Close()
    s := scan.NewScanner(filename, f)

An example of using init to read from a string:

    var s scan.Scanner
    s.InitString("", "hello world")

Basic Run Loop

When using the lower level functions, a stream is processed one rune at a time. A buffer is used to accumulate these runes until a complete token has been seen. The basic scanning loop is to look at the upcoming rune in the input and to decide to either keep the rune for the current token or to emit the token. The following fields and methods are used for this:

`s.This`

The rune at the current position in the input stream.

`s.Next`

The next rune found in the input stream. When the scanner advances, this value will become s.This

`s.HasMore()`

Returns true if there is more runes in the input stream. Returns false when the end of file or an error condition is encountered.

`s.Keep()`

Adds the current rune to the token buffer and advances the scanner to the next rune.

`s.Emit()`

Returns the current token and resets the token buffer.

Example

Scanning a single binary number from an input stream can be done by looping over the stream while it has data and adding to the token buffer as long as the current rune is a zero or a one:

    s := scan.NewScannerFromString("", "1010234abc!")
    for s.HasMore() {
        if s.This == '0' || s.This == '1' {
            s.Keep()
            continue
        }
        break
    }
    tok := s.Emit()
    fmt.Println(tok.Val)

    // Output:
    // 1010

Example 1

Once the scanner encounters a non-binary digit, it breaks out of the loop and emits the token accumulated so far. The remaining contents of the stream, "234abc!", are not read.

Classes

Now we will update the example to scan for an unsigned integer in base 10. The only change necessary is to now check for digits 0 through 9 instead of just 0 and 1. This check can be placed into a separate function that takes a rune as input and returns a bool indicating if the rune is one of those digits. This type of function is a scan.Class function and it is used to determine if a rune is member of that class. Example:

    isDigit09 := func(r rune) bool {
        return r >= '0' && r <= '9'
    }

    s := scan.NewScannerFromString("", "1010234abc!")
    for s.HasMore() && isDigit09(s.This) {
        s.Keep()
    }
    tok := s.Emit()
    fmt.Println(tok.Val)

    // Output:
    // 1010234

Example 2

Class Functions

The scanner package provides some functions to easily construct various rune classes:

`scan.Rune()`

Creates a scan.Class function that returns true if it matches any of the specified runes. An example of a rune class that checks for common whitespace:

    isSpace := scan.Rune(' ', '\n', '\t', '\f', '\r')

`scan.Range()`

Creates a scan.Class function that returns true if it matches a range of runes as defined by their code point values. An example of a rune class that checks for digits between 0 and 9:

    isDigit09 := scan.Range('0', '9')

`scan.Or()`

Creates a scan.Class function that returns true if any of the specified classes returns true. An example of a rune class that checks for hexadecimal digits:

    isDigit0F := scan.Or(
        scan.Range('0', '9'),
        scan.Range('A', 'F'),
        scan.Range('a', 'f'),
    )

`scan.Not()`

Creates a scan.Class function that negates another class. For example, to create a class that accepts all non-whitespace runes:

    isNotSpace := scan.Not(isSpace)

Predefined Classes

Common rune classes are provided by this package and can be used instead. These classes return true under the following conditions:

scan.IsAny: always, as long as there are more runes available in the stream
scan.IsCurrency: currency symbol as defined by Unicode
scan.IsDigit: digit as defined by Unicode
scan.IsDigit01: binary digits
scan.IsDigit07: octal digits
scan.IsDigit09: decimal digits
scan.IsDigit0F: hexadecimal digits
scan.IsLetter: letter as defined by Unicode
scan.IsLetterAZ: letters A through Z and a through z
scan.IsLetterUnder: letter or an underscore
scan.IsLetterDigitUnder: letter, digit, or an underscore
scan.IsNone: never, always returns false
scan.IsPrintable: printable as defined by Unicode
scan.IsRune8: a code point that can fit in an 8-bit number
scan.IsRune16: a code point that can fit in a 16-bit number
scan.IsSpace: whitespace as defined by Unicode.

Updated Example

The previous example can now be updated using the predefined class for base 10 numbers and the s.Is() method:

    s := scan.NewScannerFromString("", "1010234abc!")
    for s.HasMore() && scan.IsDigit09(s.This) {
        s.Keep()
    }
    tok := s.Emit()
    fmt.Println(tok.Val)

    // Output:
    // 1010234

Example 3

Loop Functions

A common loop while scanning is to iterate over the stream while there is more data and then break out of the loop once a condition is met. The following functions can be used in simple loop operations:

`scan.While()`

Consume input from the stream while characters belong to a certain class. The function takes a *scan.Scanner, a scan.Class used to check membership, and a function that should be called for each matching character. For example, to add characters to the current token while digits are seen:

    scan.While(s, scan.IsDigit09, s.Keep)

`scan.Until()`

This function is similar to scan.While() but consumes characters from the stream until a character belongs to the provided class. For example, to add any non-whitespace characters to the curren token:

    scan.Until(s, scan.IsSpace, s.Keep)

`scan.Repeat()`

Sometimes it is necessary to call a function like scan.Keep() a fixed number of times to advance the scanner. Use the scan.Repeat() function for this:

    scan.Repeat(s.Keep, 3)

Updated Example

The example can now be updated to remove the loop with a call to scan.While():

    s := scan.NewScannerFromString("", "1010234abc!")
    scan.While(s, scan.IsDigit09, s.Keep)
    tok := s.Emit()
    fmt.Println(tok.Val)

    // Output:
    // 1010234

Example 4

Discarding

For runes like whitespace, it useful to simply discard them without creating tokens.

`s.Discard()`

Resets the current token buffer, discards the current rune, and advances the scanner to the next rune. The example below now discards any whitespace found at the beginning of the stream:

    s := scan.NewScannerFromString("", " \t 1010234abc!")
    scan.While(s, scan.IsSpace, s.Discard)
    scan.While(s, scan.IsDigit09, s.Keep)
    tok := s.Emit()
    fmt.Println(tok.Val)

    // Output:
    // 1010234

Example 5

Types

The next example is going to scan the entire stream and create tokens for each integer and word that is found. A word is going to be any sequence of letters found in the stream. Integers and words must be separated by whitespace. We will now place each token type will into a separate function and each token will now be identified with a type. A type can be any string value but there are some constants already defined in the scan package. The functions return a boolean value to indicate if there was a match for that token.

The function for the integer is as follows:

    func scanInt(s *scan.Scanner) bool {
        if !scan.IsDigit09(s.This) {
            return false
        }
        s.Type = scan.IntType
        scan.While(s, scan.IsDigit09, s.Keep)
        return true
    }

Example 6

The function for the word is similar and is as follows:

    func scanWord(s *scan.Scanner) bool {
        if !scan.IsLetter(s.This) {
            return false
        }
        s.Type = scan.WordType
        scan.While(s, scan.IsLetter, s.Keep)
        return true
    }

Example 6

Since whitespace is now significant, it will be returned as a token instead of being discarded:

    func scanSpace(s *scan.Scanner) bool {
        if !scan.IsSpace(s.This) {
            return false
        }
        s.Type = scan.SpaceType
        scan.While(s, scan.IsSpace, s.Keep)
        return true
    }

Example 6

Now create a main loop that checks each function for a match and then collects all of the tokens seen along the way:

    scanFuncs := []func(*scan.Scanner) bool{
        scanSpace,
        scanWord,
        scanInt,
    }

    var toks []scan.Token
    s := scan.NewScannerFromString("example6", "abc 123 !@#  \tdef456")
    for s.HasMore() {
        match := false
        for _, fn := range scanFuncs {
            if match = fn(s); match {
                toks = append(toks, s.Emit())
                break
            }
        }
        if !match {
            scan.Until(s, scan.IsSpace, s.Keep)
            s.Illegal("unexpected %v", scan.Quote(s.Val.String()))
            toks = append(toks, s.Emit())
        }
    }
    fmt.Println(scan.FormatTokenTable(toks))

    // Output:
    //
    // Pos            Type     Value         Literal
    // example6:1:1   word     "abc"         "abc"
    // example6:1:4   space    " "           " "
    // example6:1:5   int      "123"         "123"
    // example6:1:8   space    " "           " "
    // example6:1:9   illegal  "!@#"         "!@#"
    // example6:1:12: error: unexpected "!@#"
    // example6:1:12  space    "  {!ch:\t}"  "  {!ch:\t}"
    // example6:1:15  word     "def"         "def"
    // example6:1:18  int      "456"         "456"

Example 6

For each token that is generated, we can check the tok.Type field to see if it is an int, word, space, or an illegal token. The tok.Pos field contains the name of the file (or item) being scanned, the line number, and the column number of where the token starts.

Note that the role of the scanner is to simply emit tokens of the various types seen in the input stream. The "def456" string in the stream is illegal by this example's definition because a space must separate the word and the integer. That check should be handled by the parser that is processing the tokens, not by the scanner itself.

There are a few new functions or methods introduced in this example:

`s.Illegal()`

This function has the same arguments as fmt.Printf() and attaches an error to the token which can be found in tok.Errs. It also sets the token type field to scan.IllegalType.

`scan.Quote()`

This function surrounds a value with double quotes to make it clear where the value starts and ends. If the value contains double quotes, then single quotes will be used instead. If the value contains both double and single quotes, then backticks are used. In the rare event that all quotes are in use, the value is prefixed with {!quote: and ends with }.

Non-printable characters found in values are escaped with a prefix of {!ch: and ends with }. For characters that are normally escaped, they will be displayed like {!ch:\n} otherwise they are shown with a hexadecimal number like {!ch:a0}

`scan.FormatTokenTable()`

This function takes a slice of tokens and returns a string with the tokens formatted in a table to be used for display. For each token, it shows the position of where the token starts, the type of token, and the value and literal. In this example, the value and literal are always the same but that does not always have to be the case. If there are errors attached to a token, those are listed on subsequent lines.

Rules

Scanning for integers, words, and spaces are common things to scan for in a text document. In the next example, the scanning functions are modified to allow configuration of what is considered a digit, a letter, and a space. Instead of defining a function, a struct will now be used that contains the configuration and the scanning function will now be a method called Eval.

These structs will be called rules and they satisfy the scan.Rule interface. Changing the integer function to a rule looks like this:

    type IntRule struct {
        isDigit scan.Class
    }

    func NewIntRule(isDigit scan.Class) IntRule {
        return IntRule{isDigit: isDigit}
    }

    func (r IntRule) Eval(s *scan.Scanner) bool {
        if !r.isDigit(s.This) {
            return false
        }
        s.Type = scan.IntType
        scan.While(s, r.isDigit, s.Keep)
        return true
    }

Example 7

With this change, this one rule can be used for scanning decimal, binary, octal, and hexadecimal numbers simply by calling new with the corresponding digit class. The rules for word and whitespace looks similar and aren't shown here. The full source code for the example can be found in the Example 7 links.

Now a function is needed to handle the case where a token does not match. This also has to be configurable to allow different definitions of whitespace. This will be a function that returns a function that scans the unexpected value until the specified class is found:

func UnexpectedUntil(c scan.Class) func(*scan.Scanner) {
    return func(s *scan.Scanner) {
        scan.Until(s, c, s.Keep)
        s.Illegal("unexpected %s", scan.Quote(s.Lit.String()))
    }
}

These new rules and the no match function can now be bundled up with a scan.RuleSet. Then a scan.Runner can be created with the scanner and rule set to iterate through the tokens using runner.HasMore and runner.Next. Or all the tokens can be obtained by calling runner.All():

    rules := scan.NewRuleSet(
        NewSpaceRule(scan.Whitespace),
        NewWordRule(scan.Letter),
        NewIntRule(scan.Digit),
    ).WithNoMatchFunc(UnexpectedUntil(scan.Whitespace))

    s := scan.NewScannerFromString("example7", "abc 123 !@#  \tdef456")
    runner := scan.NewRunner(s, rules)
    toks := runner.All()
    fmt.Println(scan.FormatTokenTable(toks))

    // Output:
    //
    // Pos            Type     Value         Literal
    // example7:1:1   word     "abc"         "abc"
    // example7:1:4   space    " "           " "
    // example7:1:5   int      "123"         "123"
    // example7:1:8   space    " "           " "
    // example7:1:9   illegal  "!@#"         "!@#"
    // example7:1:12: error: unexpected "!@#"
    // example7:1:12  space    "  {!ch:\t}"  "  {!ch:\t}"
    // example7:1:15  word     "def"         "def"
    // example7:1:18  int      "456"         "456"

Example 7

Preprocessing

Sometimes it is convenient to perform some light preprocessing of the token value before passing it off to the parser. A good example of this is allowing digit separators in an integer value such as "1,000". The parser is most likely going to pass this responsibility over to strconv.ParseInt() and the comma gets in the way. Another use case is to convert runes to another rune. For example, converting all whitespace runes to a single space value or changing newlines to semicolons. The following can be used for these cases:

`s.Skip()`

Adds the current rune to the token's literal but not to the token's value. The scanner then advances to the next rune.

`s.Value`

This is a string builder containing the token value seen so far. If newlines need to be converted to semicolons, this can be done in the following way:

    if s.This == '\n' {
        s.Skip()
        s.Value.WriteRune(';')
    }

Now the integer rule can be updated to include a digit separator. The digit class is mandatory but the digit separator is optional so only the digit class is found in the constructor. The digit separator can be configured with a chaining function:

type IntRule2 struct {
    isDigit    scan.Class
    isDigitSep scan.Class
}

func NewIntRule2(isDigit scan.Class) IntRule2 {
    return IntRule2{
        isDigit:    isDigit,
        isDigitSep: scan.IsNone,
    }
}

func (r IntRule2) WithDigitSep(isDigitSep scan.Class) IntRule2 {
    r.isDigitSep = isDigitSep
    return r
}

func (r IntRule2) Eval(s *scan.Scanner) bool {
    if !r.isDigit(s.This) {
        return false
    }
    s.Keep()
    for s.HasMore() {
        if r.isDigit(s.This) {
            s.Keep()
        } else if r.isDigitSep(s.This) {
            s.Skip()
        } else {
            break
        }
    }
    return true
}

Example 8

Now update the rule set with the new rule and modify the input to include some digit separators:

    rules := scan.NewRuleSet(
        NewSpaceRule(scan.IsSpace),
        NewWordRule(scan.IsLetter),
        NewIntRule2(scan.IsDigit).
            WithDigitSep(scan.Rune(',')),
    ).WithNoMatchFunc(UnexpectedUntil(scan.IsSpace))

    s := scan.NewScannerFromString("example8", "abc 1,234 !@#  \tdef45,678")
    runner := scan.NewRunner(s, rules)
    toks := runner.All()
    fmt.Println(scan.FormatTokenTable(toks))

    // Output:
    //
    // Pos            Type     Value         Literal
    // example8:1:1   word     "abc"         "abc"
    // example8:1:4   space    " "           " "
    // example8:1:5   1234     "1234"        "1,234"
    // example8:1:10  space    " "           " "
    // example8:1:11  illegal  "!@#"         "!@#"
    // example8:1:14: error: unexpected "!@#"
    // example8:1:14  space    "  {!ch:\t}"  "  {!ch:\t}"
    // example8:1:17  word     "def"         "def"
    // example8:1:20  45678    "45678"       "45,678"

Example 8

Predefined Rules

The scan package already has rules defined for reading the token types used so far. The example can now be updated to use those rules instead:

    rules := scan.NewRuleSet(
        scan.KeepSpaceRule,
        scan.NewWhileRule(scan.IsLetter, scan.WordType),
        scan.IntRule.WithDigitSep(scan.Rune(',')),
    ).WithNoMatchFunc(scan.UnexpectedUntil(scan.IsSpace))

    s := scan.NewScannerFromString("example9", "abc 1,234 !@#  \tdef45,678")
    runner := scan.NewRunner(s, rules)
    toks := runner.All()
    fmt.Println(scan.FormatTokenTable(toks))

    // Output:
    //
    // Pos            Type     Value         Literal
    // example9:1:1   word     "abc"         "abc"
    // example9:1:4   space    " "           " "
    // example9:1:5   int      "1234"        "1,234"
    // example9:1:10  space    " "           " "
    // example9:1:11  illegal  "!@#"         "!@#"
    // example9:1:14: error: unexpected "!@#"
    // example9:1:14  space    "  {!ch:\t}"  "  {!ch:\t}"
    // example9:1:17  word     "def"         "def"
    // example9:1:20  int      "45678"       "45,678"

Example 9

The predefined rules are as follows:

scan.CharEncRule: For converting character encodings such as \n to their actual values like U+000A
scan.ClassRule: For runes that are a member of a class.
scan.CommentRule: Comments which can be configured, at runtime, to emit tokens or discard the input.
scan.HexEncRule: For converting hex escape sequences such as \x7f to their actual values.
scan.IdentRule: Identifiers used in programming languages that usually need two characters classes: one for the first character, and one for the rest. Can also be configured with a list of keywords.
scan.LiteralRule: For matching exact strings in the input. Useful for matching against operators, punctuation, and keywords when not using an scan.IdentRule.
scan.NumRule: Integer and real number values. This is quite configurable for most standard needs but it doesn't cover every possibility that can be found in the wild.
scan.OctEncRule: For converting octal escape sequences such as \377 to their actual values.
scan.StrRule: For scanning strings which are bracketed by a start and ending rune. Can be configured to include escape sequences and multiline behavior.
scan.SpaceRule: For discarding whitespace but can be configured to emit tokens if needed.
scan.WhileRule: Scanning a token while a character class is true.

Some rules have predefined configurations as well. For example scan.Hex0xRule that uses a scan.NumRule configured for hexadecimal digits that are prefixed with 0x. See the API documentation for more information.

Lookahead and lookbehind

Most scanning operations can be done by looking at the current character, which is stored in s.This and the next character in the stream that is stored in s.Next. In some cases, it can be easier to look further ahead in the stream or to even speculate and advance the scanner and bail out if the stream doesn't contain what is expected. Use the following for these cases:

`s.Peek()`

Looks ahead in the stream. It takes an integer value which is the number of runes ahead of the current rule it should return. Using s.Peek(0) is the same as s.This and s.Peek(1) is the same as s.Next. The lookahead value can be as large as needed. If a peek operation would go past the end of the stream, the value of scan.EndOfText is returned. A negative value can be used to look back at the literal collected so far. For example, s.Peek(-1) would return the rune that was scanned before the current one. If a negative value goes past the length of the current literal, a value of scan.EndOfText is returned.

`s.Undo()`

When speculating with the scanner, calling this method will place the token literal seen so far back onto the input stream and then clear the token buffer.

Full Examples

There are two full examples provided with this package. The first is a JSON scanner. The scanner for JSON is quite simple but JSON does have some quirks:

Numbers can start with a negative sign, but not a positive sign.
Numbers cannot have leading zeros. "303" is okay, but "0303" is not.
Real numbers cannot have empty parts. "0.3" and "3.0" is valid, but ".3" and "3." is not.

There is also a Go scanner. This one is a bit more complicated but is mostly constructed from predefined rules. The exceptions are for a rule to handle imaginary numbers and a post token processor for automatic semicolon insertion.

In a separate repository, there is an example of a scanner, parser, and a formatter for DMS angles.

Status

This package is still a work in progress and is subject to change. If you find this useful, please drop a line at zc at blackchip dot org.

License

MIT

Documentation ¶

Index ¶

Constants
Variables
func All(s *Scanner) string
func Escape(ch rune) string
func FormatTokenTable(ts []Token) string
func Line(s *Scanner) string
func Quote(s string) string
func QuoteRune(r rune) string
func Repeat(fn func(), n int)
func RunTests(t *testing.T, rules RuleSet, tests []Test)
func Space(s *Scanner)
func UnexpectedRune() func(*Scanner)
func UnexpectedUntil(c Class) func(*Scanner)
func Until(s *Scanner, c Class, fn func())
func UntilRule(s *Scanner, r Rule, fn func())
func While(s *Scanner, c Class, fn func())
func Word(s *Scanner) string
type CharEnc
- func NewCharEnc(from rune, to rune) CharEnc
type CharEncRule
- func NewCharEncRule(maps ...CharEnc) CharEncRule
- func (r CharEncRule) Eval(s *Scanner) bool
type Class
- func Not(c Class) Class
- func Or(cs ...Class) Class
- func Range(from rune, to rune) Class
- func Rune(rs ...rune) Class
type ClassRule
- func NewClassRule(isClass Class) ClassRule
- func (r ClassRule) Eval(s *Scanner) bool
- func (r ClassRule) WithType(t string) ClassRule
type CommentRule
- func NewCommentRule(begin LiteralRule, end LiteralRule) CommentRule
- func (r CommentRule) Eval(s *Scanner) bool
- func (r CommentRule) WithKeep(keep *bool) CommentRule
type Error
- func (e Error) Equal(e2 Error) bool
- func (e Error) Error() string
- func (e Error) Unwrap() error
type Errors
- func (es Errors) Equal(es2 Errors) bool
- func (es Errors) Error() string
type HexEncRule
- func NewHexEncRule(flag rune, digits int) HexEncRule
- func (r HexEncRule) AsByte(b bool) HexEncRule
- func (r HexEncRule) Eval(s *Scanner) bool
type IdentRule
- func NewIdentRule(isHead Class, isTail Class) IdentRule
- func (r IdentRule) Eval(s *Scanner) bool
- func (r IdentRule) WithKeywords(ks ...string) IdentRule
type LiteralRule
- func Literal(lits ...string) LiteralRule
- func (r LiteralRule) Eval(s *Scanner) bool
- func (r LiteralRule) WithSkip(b bool) LiteralRule
type NumRule
- func NewNumRule(digit Class) NumRule
- func (r NumRule) Eval(s *Scanner) bool
- func (r NumRule) WithDecSep(c Class) NumRule
- func (r NumRule) WithDigitSep(c Class) NumRule
- func (r NumRule) WithEmptyPartsAllowed(b bool) NumRule
- func (r NumRule) WithExp(c Class) NumRule
- func (r NumRule) WithExpSign(c Class) NumRule
- func (r NumRule) WithIntType(t string) NumRule
- func (r NumRule) WithLeadingDigitSepAllowed(b bool) NumRule
- func (r NumRule) WithLeadingZeroAllowed(b bool) NumRule
- func (r NumRule) WithPrefix(rule Rule) NumRule
- func (r NumRule) WithRealType(t string) NumRule
- func (r NumRule) WithSign(c Class) NumRule
- func (r NumRule) WithSuffix(rules ...Rule) NumRule
type OctEncRule
- func NewOctEncRule() OctEncRule
- func (r OctEncRule) Eval(s *Scanner) bool
type Pos
- func NewPos(name string) Pos
- func (p Pos) String() string
type Rule
type RuleSet
- func NewRuleSet(rules ...Rule) RuleSet
- func (r RuleSet) Eval(s *Scanner) bool
- func (r RuleSet) Next(s *Scanner) Token
- func (r RuleSet) WithNoMatchFunc(fn func(*Scanner)) RuleSet
- func (r RuleSet) WithPostTokenFunc(fn func(*Scanner, Token) Token) RuleSet
- func (r RuleSet) WithPreTokenFunc(fn func(*Scanner)) RuleSet
type Runner
- func NewRunner(scan *Scanner, rules RuleSet) *Runner
- func (r *Runner) All() []Token
- func (r *Runner) HasMore() bool
- func (r *Runner) Scan() Token
type Scanner
- func NewScanner(name string, src io.Reader) *Scanner
- func NewScannerFromBytes(name string, src []byte) *Scanner
- func NewScannerFromString(name string, src string) *Scanner
- func (s *Scanner) Discard()
- func (s *Scanner) Emit() Token
- func (s *Scanner) Eval(r Rule) (Token, bool)
- func (s *Scanner) HasMore() bool
- func (s *Scanner) Illegal(format string, args ...any)
- func (s *Scanner) Init(name string, src io.Reader)
- func (s *Scanner) InitFromBytes(name string, src []byte)
- func (s *Scanner) InitFromString(name string, src string)
- func (s *Scanner) Keep()
- func (s *Scanner) Peek(i int) rune
- func (s *Scanner) Skip()
- func (s *Scanner) Undo()
type StrRule
- func NewStrRule(begin rune, end rune) StrRule
- func (r StrRule) Eval(s *Scanner) bool
- func (r StrRule) WithEscape(rn rune) StrRule
- func (r StrRule) WithEscapeRules(rules ...Rule) StrRule
- func (r StrRule) WithMaxLen(l uint) StrRule
- func (r StrRule) WithMultiline(b bool) StrRule
- func (r StrRule) WithNesting(t bool) StrRule
- func (r StrRule) WithOptionalTerminator(t bool) StrRule
- func (r StrRule) WithType(t string) StrRule
type Test
- func NewTest(src string, val string, line int, col int, type_ string) Test
- func (t Test) And(val string, line int, col int, type_ string) Test
- func (t Test) WithError(message string) Test
type Token
- func (t Token) Equal(t2 Token) bool
- func (t Token) IsEndOfText() bool
- func (t Token) IsValid() bool
- func (t Token) String() string
type WhileRule
- func NewWhileRule(space Class, type_ string) WhileRule
- func (r WhileRule) Eval(s *Scanner) bool
- func (r WhileRule) WithKeep(b bool) WhileRule

Constants ¶

View Source

const (
	BinType       = "bin"
	CommentType   = "comment"
	EndOfTextType = "end-of-text"
	ErrorType     = "error"
	HexType       = "hex"
	IdentType     = "ident"
	IllegalType   = "illegal"
	IntType       = "int"
	OctType       = "oct"
	RealType      = "real"
	SpaceType     = "space"
	StrType       = "str"
	WordType      = "word"
)

View Source

const (
	EndOfText = rune(-1)
)

Variables ¶

View Source

var (
	Hex2EncRule = NewHexEncRule('x', 2)
	Hex4EncRule = NewHexEncRule('u', 4)
	Hex8EncRule = NewHexEncRule('U', 8)
)

View Source

var (
	StrDoubleQuoteRule = NewStrRule('"', '"').WithEscape('\\')
	StrSingleQuoteRule = NewStrRule('\'', '\'').WithEscape('\\')
)

View Source

var (
	SkipSpaceRule = NewWhileRule(IsSpace, SpaceType).WithKeep(false)
	KeepSpaceRule = NewWhileRule(IsSpace, SpaceType)
	WordRule      = NewWhileRule(Not(IsSpace), WordType)
)

View Source

var OctEnc = NewOctEncRule()

View Source

var TrueRule = trueRule{}

Functions ¶

func All ¶ added in v0.2.0

func All(s *Scanner) string

func Escape ¶

func Escape(ch rune) string

func FormatTokenTable ¶

func FormatTokenTable(ts []Token) string

func Line ¶ added in v0.2.0

func Line(s *Scanner) string

func Quote ¶

func Quote(s string) string

func QuoteRune ¶

func QuoteRune(r rune) string

func Repeat ¶

func Repeat(fn func(), n int)

Repeat calls function fn in a loop for n times.

func RunTests ¶

func RunTests(t *testing.T, rules RuleSet, tests []Test)

func Space ¶ added in v0.2.0

func Space(s *Scanner)

func UnexpectedRune ¶

func UnexpectedRune() func(*Scanner)

func UnexpectedUntil ¶

func UnexpectedUntil(c Class) func(*Scanner)

func Until ¶

func Until(s *Scanner, c Class, fn func())

Until calls function fn until this rune is a member of class c.

func UntilRule ¶ added in v0.1.0

func UntilRule(s *Scanner, r Rule, fn func())

func While ¶

func While(s *Scanner, c Class, fn func())

While calls function fn while this rune is a member of class c.

func Word ¶ added in v0.2.0

func Word(s *Scanner) string

Types ¶

type CharEnc ¶

type CharEnc struct {
	From rune
	To   rune
}

var (
	AlertEnc          CharEnc = NewCharEnc('a', '\a')
	BackspaceEnc      CharEnc = NewCharEnc('b', '\b')
	BellEnc           CharEnc = AlertEnc
	CarriageReturnEnc CharEnc = NewCharEnc('r', '\r')
	FormFeedEnc       CharEnc = NewCharEnc('f', '\f')
	HorizontalTabEnc  CharEnc = NewCharEnc('t', '\t')
	LineFeedEnc       CharEnc = NewCharEnc('n', '\n')
	NewlineEnc        CharEnc = LineFeedEnc
	TabEnc            CharEnc = HorizontalTabEnc
	VerticalTabEnc    CharEnc = NewCharEnc('v', '\v')
)

func NewCharEnc ¶

func NewCharEnc(from rune, to rune) CharEnc

type CharEncRule ¶

type CharEncRule struct {
	// contains filtered or unexported fields
}

func NewCharEncRule ¶

func NewCharEncRule(maps ...CharEnc) CharEncRule

func (CharEncRule) Eval ¶

func (r CharEncRule) Eval(s *Scanner) bool

type Class ¶

type Class func(rune) bool

Class returns true if the given rune is a member.

var (
	// IsAny returns true as long as there is a rune available in the input
	// stream.
	IsAny Class = func(r rune) bool {
		return r != EndOfText
	}

	// IsCurrency returns true when given a rune that is a currency symbol as
	// defined by Unicode.
	IsCurrency Class = func(r rune) bool {
		return unicode.Is(unicode.Sc, r)
	}

	// IsDigit returns true when given a digit as defined by Unicode.
	IsDigit Class = unicode.IsDigit

	// IsDigit01 returns true when given a valid binary digit.
	IsDigit01 Class = Rune('0', '1')

	// IsDigit07 returns true when given a valid octal digit.
	IsDigit07 Class = Range('0', '7')

	// IsDigit09 returns true when given a valid decimal digit.
	IsDigit09 Class = Range('0', '9')

	// IsDigit0F returns true when given a valid hexadecimal digit.
	IsDigit0F Class = Or(
		IsDigit09,
		Range('a', 'f'),
		Range('A', 'F'),
	)

	// IsLetter returns true when given rune is a letter as defined by Unicode.
	IsLetter Class = unicode.IsLetter

	// IsLetterAZ returns true when given letters from the Latin alphabet.
	IsLetterAZ Class = Or(
		Range('a', 'z'),
		Range('A', 'Z'),
	)

	// IsLetterUnder returns true when given letters as defined by Unicode
	// or an underscore.
	IsLetterUnder = Or(
		IsLetter,
		Rune('_'),
	)

	// IsLetterDigitUnder returns true when given letters as digits as defined
	// by Unicode or an underscore.
	IsLetterDigitUnder = Or(
		IsLetterUnder,
		IsDigit,
	)

	// IsNone always returns false.
	IsNone Class = func(r rune) bool {
		return false
	}

	// IsPrintable returns true when given a rune that is printable as defined
	// by Unicode.
	IsPrintable Class = unicode.IsPrint

	// IsRune8 returns true when given a rune that can be represented by
	// an 8-bit number.
	IsRune8 Class = Range(0, 0xff)

	// IsRune16 returns true when given a rune that can be represented by
	// a 16-bit number.
	IsRune16 Class = Range(0, 0xffff)

	// IsSign returns true when the rune is a positive (+) or negative (-)
	// numeric symbol.
	IsSign Class = Rune('+', '-')

	// IsSpace returns true when the given rune is whitespace as
	// defined by Unicode.
	IsSpace Class = unicode.IsSpace
)

func Not ¶

func Not(c Class) Class

Not returns a Class function that returns true when given a rune that does not match class c.

Example ¶

isA := Rune('a')
isNotA := Not(isA)
fmt.Printf("%c: %v\n", 'a', isNotA('a'))
fmt.Printf("%c: %v\n", 'b', isNotA('b'))

Output:

a: false
b: true

func Or ¶

func Or(cs ...Class) Class

Or returns a Class that returns true when given a rune that matches any class found in cs.

Example ¶

isLowerAZ := Range('a', 'z')
isUpperAZ := Range('A', 'Z')
isLetterAZ := Or(isLowerAZ, isUpperAZ)
fmt.Printf("%c: %v\n", 'f', isLetterAZ('f'))
fmt.Printf("%c: %v\n", 'F', isLetterAZ('F'))
fmt.Printf("%c: %v\n", '4', isLetterAZ('4'))

Output:

f: true
F: true
4: false

func Range ¶

func Range(from rune, to rune) Class

Range returns a Class function that returns true when given a rune that is between from and to inclusive.

Example ¶

isDigit09 := Range('0', '9')
fmt.Printf("%c: %v\n", '3', isDigit09('3'))
fmt.Printf("%c: %v\n", '6', isDigit09('6'))
fmt.Printf("%c: %v\n", 'a', isDigit09('a'))

Output:

3: true
6: true
a: false

func Rune ¶

func Rune(rs ...rune) Class

Rune returns a Class function that returns true when given any rune found in rs.

Example ¶

isAB := Rune('a', 'b')
fmt.Printf("%c: %v\n", 'a', isAB('a'))
fmt.Printf("%c: %v\n", 'b', isAB('b'))
fmt.Printf("%c: %v\n", 'c', isAB('c'))

Output:

a: true
b: true
c: false

type ClassRule ¶ added in v0.1.0

type ClassRule struct {
	// contains filtered or unexported fields
}

func NewClassRule ¶ added in v0.1.0

func NewClassRule(isClass Class) ClassRule

func (ClassRule) Eval ¶ added in v0.1.0

func (r ClassRule) Eval(s *Scanner) bool

func (ClassRule) WithType ¶ added in v0.1.0

func (r ClassRule) WithType(t string) ClassRule

type CommentRule ¶

type CommentRule struct {
	// contains filtered or unexported fields
}

func NewCommentRule ¶

func NewCommentRule(begin LiteralRule, end LiteralRule) CommentRule

func (CommentRule) Eval ¶

func (r CommentRule) Eval(s *Scanner) bool

func (CommentRule) WithKeep ¶

func (r CommentRule) WithKeep(keep *bool) CommentRule

type Error ¶

type Error struct {
	Pos     Pos
	Message string
	Cause   error
}

func (Error) Equal ¶ added in v0.1.0

func (e Error) Equal(e2 Error) bool

func (Error) Error ¶

func (e Error) Error() string

func (Error) Unwrap ¶

func (e Error) Unwrap() error

type Errors ¶

type Errors []Error

func (Errors) Equal ¶ added in v0.1.0

func (es Errors) Equal(es2 Errors) bool

func (Errors) Error ¶

func (es Errors) Error() string

type HexEncRule ¶

type HexEncRule struct {
	// contains filtered or unexported fields
}

func NewHexEncRule ¶

func NewHexEncRule(flag rune, digits int) HexEncRule

func (HexEncRule) AsByte ¶ added in v0.0.1

func (r HexEncRule) AsByte(b bool) HexEncRule

func (HexEncRule) Eval ¶

func (r HexEncRule) Eval(s *Scanner) bool

type IdentRule ¶

type IdentRule struct {
	// contains filtered or unexported fields
}

var (
	StandardIdentRule IdentRule = NewIdentRule(IsLetterUnder, IsLetterDigitUnder)
)

func NewIdentRule ¶

func NewIdentRule(isHead Class, isTail Class) IdentRule

func (IdentRule) Eval ¶

func (r IdentRule) Eval(s *Scanner) bool

func (IdentRule) WithKeywords ¶

func (r IdentRule) WithKeywords(ks ...string) IdentRule

type LiteralRule ¶

type LiteralRule struct {
	// contains filtered or unexported fields
}

func Literal ¶

func Literal(lits ...string) LiteralRule

func (LiteralRule) Eval ¶

func (r LiteralRule) Eval(s *Scanner) bool

func (LiteralRule) WithSkip ¶

func (r LiteralRule) WithSkip(b bool) LiteralRule

type NumRule ¶

type NumRule struct {
	// contains filtered or unexported fields
}

var (
	BinRule           NumRule = NewNumRule(IsDigit01).WithIntType(BinType)
	Bin0bRule         NumRule = BinRule.WithPrefix(Literal("0b", "0B"))
	HexRule           NumRule = NewNumRule(IsDigit0F).WithIntType(HexType)
	Hex0xRule         NumRule = HexRule.WithPrefix(Literal("0x", "0X"))
	OctRule           NumRule = NewNumRule(IsDigit07).WithIntType(OctType)
	Oct0oRule         NumRule = OctRule.WithPrefix(Literal("0o", "0O"))
	IntRule           NumRule = NewNumRule(IsDigit09)
	RealRule          NumRule = IntRule.WithDecSep(Rune('.'))
	RealExpRule       NumRule = RealRule.WithExp(Rune('e', 'E')).WithExpSign(IsSign)
	SignedIntRule     NumRule = IntRule.WithSign(IsSign)
	SignedRealRule    NumRule = RealRule.WithSign(IsSign)
	SignedRealExpRule NumRule = RealExpRule.WithSign(IsSign)
)

func NewNumRule ¶

func NewNumRule(digit Class) NumRule

func (NumRule) Eval ¶

func (r NumRule) Eval(s *Scanner) bool

func (NumRule) WithDecSep ¶

func (r NumRule) WithDecSep(c Class) NumRule

func (NumRule) WithDigitSep ¶

func (r NumRule) WithDigitSep(c Class) NumRule

func (NumRule) WithEmptyPartsAllowed ¶

func (r NumRule) WithEmptyPartsAllowed(b bool) NumRule

func (NumRule) WithExp ¶

func (r NumRule) WithExp(c Class) NumRule

func (NumRule) WithExpSign ¶

func (r NumRule) WithExpSign(c Class) NumRule

func (NumRule) WithIntType ¶

func (r NumRule) WithIntType(t string) NumRule

func (NumRule) WithLeadingDigitSepAllowed ¶

func (r NumRule) WithLeadingDigitSepAllowed(b bool) NumRule

func (NumRule) WithLeadingZeroAllowed ¶

func (r NumRule) WithLeadingZeroAllowed(b bool) NumRule

func (NumRule) WithPrefix ¶

func (r NumRule) WithPrefix(rule Rule) NumRule

func (NumRule) WithRealType ¶

func (r NumRule) WithRealType(t string) NumRule

func (NumRule) WithSign ¶

func (r NumRule) WithSign(c Class) NumRule

func (NumRule) WithSuffix ¶

func (r NumRule) WithSuffix(rules ...Rule) NumRule

type OctEncRule ¶

type OctEncRule struct {
}

func NewOctEncRule ¶

func NewOctEncRule() OctEncRule

func (OctEncRule) Eval ¶

func (r OctEncRule) Eval(s *Scanner) bool

type Pos ¶

type Pos struct {
	Name string `json:"name,omitempty"`
	Line int    `json:"line"`
	Col  int    `json:"col"`
}

Pos represents a position within an input stream.

func NewPos ¶

func NewPos(name string) Pos

NewPos returns a new position with the given name and the line number and column number set to one.

func (Pos) String ¶

func (p Pos) String() string

type Rule ¶

type Rule interface {
	Eval(*Scanner) bool
}

type RuleSet ¶

type RuleSet struct {
	// contains filtered or unexported fields
}

func NewRuleSet ¶

func NewRuleSet(rules ...Rule) RuleSet

func (RuleSet) Eval ¶

func (r RuleSet) Eval(s *Scanner) bool

func (RuleSet) Next ¶

func (r RuleSet) Next(s *Scanner) Token

func (RuleSet) WithNoMatchFunc ¶

func (r RuleSet) WithNoMatchFunc(fn func(*Scanner)) RuleSet

func (RuleSet) WithPostTokenFunc ¶

func (r RuleSet) WithPostTokenFunc(fn func(*Scanner, Token) Token) RuleSet

func (RuleSet) WithPreTokenFunc ¶

func (r RuleSet) WithPreTokenFunc(fn func(*Scanner)) RuleSet

type Runner ¶

type Runner struct {
	Rules RuleSet
	This  Token
	Next  Token
	// contains filtered or unexported fields
}

func NewRunner ¶

func NewRunner(scan *Scanner, rules RuleSet) *Runner

func (*Runner) All ¶

func (r *Runner) All() []Token

func (*Runner) HasMore ¶

func (r *Runner) HasMore() bool

func (*Runner) Scan ¶ added in v0.1.0

func (r *Runner) Scan() Token

type Scanner ¶

type Scanner struct {
	This rune
	Next rune
	Val  strings.Builder
	Lit  strings.Builder
	Pos  Pos
	Errs Errors
	Type string
	// contains filtered or unexported fields
}

func NewScanner ¶

func NewScanner(name string, src io.Reader) *Scanner

func NewScannerFromBytes ¶

func NewScannerFromBytes(name string, src []byte) *Scanner

func NewScannerFromString ¶

func NewScannerFromString(name string, src string) *Scanner

func (*Scanner) Discard ¶

func (s *Scanner) Discard()

Discard advances the string to the next rune without adding the current rune to the token.

func (*Scanner) Emit ¶

func (s *Scanner) Emit() Token

Emit returns the token that has been built and resets the builder for the next token.

func (*Scanner) Eval ¶ added in v0.2.0

func (s *Scanner) Eval(r Rule) (Token, bool)

func (*Scanner) HasMore ¶

func (s *Scanner) HasMore() bool

HasMore return true if there is more data to consume in the stream.

func (*Scanner) Illegal ¶

func (s *Scanner) Illegal(format string, args ...any)

func (*Scanner) Init ¶

func (s *Scanner) Init(name string, src io.Reader)

func (*Scanner) InitFromBytes ¶

func (s *Scanner) InitFromBytes(name string, src []byte)

func (*Scanner) InitFromString ¶

func (s *Scanner) InitFromString(name string, src string)

func (*Scanner) Keep ¶

func (s *Scanner) Keep()

Keep advances the stream to the next rune and adds the current rune to the token value and literal.

func (*Scanner) Peek ¶

func (s *Scanner) Peek(i int) rune

func (*Scanner) Skip ¶

func (s *Scanner) Skip()

Skip advances the string to the next rune without adding the current rune to the token value. This current rune is written to the literal value.

func (*Scanner) Undo ¶

func (s *Scanner) Undo()

type StrRule ¶

type StrRule struct {
	// contains filtered or unexported fields
}

func NewStrRule ¶

func NewStrRule(begin rune, end rune) StrRule

func (StrRule) Eval ¶

func (r StrRule) Eval(s *Scanner) bool

func (StrRule) WithEscape ¶

func (r StrRule) WithEscape(rn rune) StrRule

func (StrRule) WithEscapeRules ¶

func (r StrRule) WithEscapeRules(rules ...Rule) StrRule

func (StrRule) WithMaxLen ¶

func (r StrRule) WithMaxLen(l uint) StrRule

func (StrRule) WithMultiline ¶

func (r StrRule) WithMultiline(b bool) StrRule

func (StrRule) WithNesting ¶ added in v0.3.0

func (r StrRule) WithNesting(t bool) StrRule

func (StrRule) WithOptionalTerminator ¶ added in v0.3.0

func (r StrRule) WithOptionalTerminator(t bool) StrRule

func (StrRule) WithType ¶

func (r StrRule) WithType(t string) StrRule

type Test ¶

type Test struct {
	// contains filtered or unexported fields
}

func NewTest ¶

func NewTest(src string, val string, line int, col int, type_ string) Test

func (Test) And ¶

func (t Test) And(val string, line int, col int, type_ string) Test

func (Test) WithError ¶

func (t Test) WithError(message string) Test

type Token ¶

type Token struct {
	Val  string `json:"val"`
	Lit  string `json:"lit,omitempty"`
	Type string `json:"type"`
	Pos  Pos    `json:"pos"`
	Errs Errors `json:"errs,omitempty"`
}

func (Token) Equal ¶ added in v0.1.0

func (t Token) Equal(t2 Token) bool

func (Token) IsEndOfText ¶ added in v0.1.0

func (t Token) IsEndOfText() bool

func (Token) IsValid ¶

func (t Token) IsValid() bool

func (Token) String ¶

func (t Token) String() string

type WhileRule ¶

type WhileRule struct {
	// contains filtered or unexported fields
}

func NewWhileRule ¶ added in v0.2.0

func NewWhileRule(space Class, type_ string) WhileRule

func (WhileRule) Eval ¶ added in v0.2.0

func (r WhileRule) Eval(s *Scanner) bool

func (WhileRule) WithKeep ¶ added in v0.2.0

func (r WhileRule) WithKeep(b bool) WhileRule

Source Files ¶

View all Source files

Directories ¶

Path	Synopsis
cmd
scan-go
scan-json
iotest
peek
scango
scanjson

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL