scanner

v1.2.1
Published: Nov 5, 2023 | License: MIT | Imports: 4 | Imported by: 2

README

Custom Go text token scanner implementation

Package scanner is a custom text scanner implementation. It exposes the same idiomatic Go scanner programming interface, and it lets the client freely navigate the buffer. The scanner can also peek ahead of the cursor. Runes it reads are returned as tokens that carry additional information on their position in the buffer. Consult the package documentation or the Usage section to see how to use it.

Installation

Use the following command to add the package to an existing project.

go get github.com/mdm-code/scanner

Usage

Here is a snippet showing the basic usage of the scanner to read text as a stream of tokens using the public API of the scanner package.

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"

    "github.com/mdm-code/scanner"
)

func main() {
    r := bufio.NewReader(os.Stdin)
    s, err := scanner.New(r)
    if err != nil {
        log.Fatalln(err)
    }
    var ts []scanner.Token
    for s.Scan() {
        t := s.Token()
        ts = append(ts, t)
    }
    fmt.Println(ts)
}

Development

Consult the Makefile to see how to format the code, examine it with go vet, run unit tests, lint it with golint, get test coverage, and check that the package builds.

Remember to install golint before you try to run tests and test the build:

go install golang.org/x/lint/golint@latest

License

Copyright (c) 2023 Michał Adamczyk.

This project is licensed under the MIT license. See LICENSE for more details.

Documentation

Overview

Package scanner is a custom text scanner implementation. It exposes the same idiomatic Go scanner programming interface, and it lets the client freely navigate the buffer. The scanner can also peek ahead of the cursor. Runes it reads are returned as tokens that carry additional information on their position in the buffer.

Variables

var ErrNilIOReader error = errors.New("provided io.Reader is nil")

ErrNilIOReader indicates that a nil value was passed for a parameter of the interface type io.Reader.
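
A sentinel error like this one is usually checked with errors.Is. The sketch below illustrates the pattern using only the standard library. Both errNilReader and newBuffer are hypothetical stand-ins invented for this illustration; they are not the package's own code.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errNilReader is a hypothetical stand-in for scanner.ErrNilIOReader.
var errNilReader = errors.New("provided io.Reader is nil")

// newBuffer is a hypothetical constructor that rejects a nil reader
// before attempting to read from it.
func newBuffer(r io.Reader) ([]byte, error) {
	if r == nil {
		return nil, errNilReader
	}
	return io.ReadAll(r)
}

func main() {
	_, err := newBuffer(nil)
	fmt.Println(errors.Is(err, errNilReader)) // true

	buf, err := newBuffer(strings.NewReader("Hello!"))
	fmt.Println(string(buf), err) // Hello! <nil>
}
```

With the real package, the equivalent check would presumably compare the error returned by scanner.New against scanner.ErrNilIOReader.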

var ErrRuneError error = errors.New("Unicode replacement character found")

ErrRuneError indicates that the Scanner encountered the Unicode replacement character (U+FFFD).
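
For context, the standard unicode/utf8 package yields the replacement character when it decodes an invalid byte sequence. The sketch below shows one way to locate such a sequence in a byte buffer. The firstInvalid helper is hypothetical, and whether the Scanner tells a literal U+FFFD in the input apart from invalid bytes is not stated here, so treat that distinction as an assumption of this sketch only.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstInvalid returns the byte offset of the first invalid UTF-8 sequence
// in buf, or -1 if buf is entirely valid.
func firstInvalid(buf []byte) int {
	for i := 0; i < len(buf); {
		r, size := utf8.DecodeRune(buf[i:])
		// A RuneError with size 1 signals an invalid encoding; a literal
		// U+FFFD in the input decodes with size 3.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	fmt.Println(firstInvalid([]byte("Hello!")))       // -1
	fmt.Println(firstInvalid([]byte{'H', 'i', 0xff})) // 2
}
```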

var Zero = Pos{Rune: '\u0000', Start: 0, End: 0}

Zero represents the initial state of the Scanner with the cursor pointing at the start of the byte buffer.

Types

type Pos

type Pos struct {
	Rune       rune
	Start, End int
}

Pos carries information about the position of the rune in the byte buffer.
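
To make the Start and End fields concrete, here is a minimal sketch that assumes they are byte offsets forming a half-open range, which is consistent with the 0:1, 1:2 positions printed in the examples for ASCII input. The pos type and the positions function are hypothetical stand-ins built on unicode/utf8, not the package's implementation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// pos is a hypothetical stand-in mirroring the fields of scanner.Pos.
type pos struct {
	Rune       rune
	Start, End int
}

// positions decodes buf rune by rune and records, for each rune, the
// half-open byte range [Start, End) it occupies.
func positions(buf []byte) []pos {
	var out []pos
	for i := 0; i < len(buf); {
		r, size := utf8.DecodeRune(buf[i:])
		out = append(out, pos{Rune: r, Start: i, End: i + size})
		i += size
	}
	return out
}

func main() {
	// "\u00e9" (e with acute accent) takes two bytes in UTF-8.
	for _, p := range positions([]byte("h\u00e9llo")) {
		fmt.Printf("%c %d:%d\n", p.Rune, p.Start, p.End)
	}
}
```

Under this assumption a multi-byte rune spans more than one offset: the second rune above occupies bytes 1:3, and the runes after it shift accordingly.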

func (Pos) String

func (p Pos) String() string

String returns a text representation of the Pos.

type Scanner

type Scanner struct {
	Buffer []byte
	Errors []error
	Cursor Pos
}

Scanner encapsulates the logic of scanning runes from a text file. A Scanner is stateful and is not safe for concurrent use by multiple goroutines.

func New

func New(r io.Reader) (*Scanner, error)

New creates an instance of the Scanner in its initial state.

func (*Scanner) Errored added in v1.2.1

func (s *Scanner) Errored() bool

Errored reports if the Scanner encountered errors while scanning the underlying byte buffer.

Example

ExampleScanner_Errored shows how to check whether errors were encountered while scanning the text buffer.

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/mdm-code/scanner"
)

func main() {
	text := "Hello!"
	r := strings.NewReader(text)
	s, err := scanner.New(r)
	if err != nil {
		log.Fatal(err)
	}
	s.Errors = append(s.Errors, scanner.ErrRuneError)
	fmt.Println(s.Errored())
}
Output:

true

func (*Scanner) Goto

func (s *Scanner) Goto(t Token)

Goto moves the cursor of the Scanner to the position of the t Token.

Example

ExampleScanner_Goto shows how an already emitted token can be used to move the cursor of the scanner back to the position it's pointing at.

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/mdm-code/scanner"
)

func main() {
	r := strings.NewReader("Hello!")
	s, err := scanner.New(r)
	if err != nil {
		log.Fatal(err)
	}

	var final scanner.Token
	for s.Scan() {
		if curr := s.Token(); curr.Rune == 'e' {
			final = curr
		}
	}
	s.Goto(final)
	fmt.Println(s.Token())
}
Output:

{ e 1:2 }

func (*Scanner) Peek

func (s *Scanner) Peek(v string) bool

Peek reports whether the v string matches the byte buffer starting from the position currently pointed at by the cursor. It returns true if there is a match. It returns false if there is no match or if v extends past the end of the buffer. It does not advance the Scanner.

Example

ExampleScanner_Peek shows how to peek ahead of the scanner cursor to see whether the buffer ahead matches the provided string.

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/mdm-code/scanner"
)

func main() {
	r := strings.NewReader("There's a match!")
	s, err := scanner.New(r)
	if err != nil {
		log.Fatal(err)
	}
	for s.Scan() {
		if t := s.Token(); t.Rune == 's' {
			break
		}
	}
	result := s.Peek(" a match!")
	fmt.Println(result)
}
Output:

true
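
The documented Peek semantics (prefix match from the cursor, false when the string runs past the buffer, no cursor movement) can be sketched with bytes.HasPrefix. The peek function and its from parameter are hypothetical; the sketch assumes the comparison starts at the byte right after the current token, which is what the example above suggests.

```go
package main

import (
	"bytes"
	"fmt"
)

// peek reports whether v matches buf starting at byte offset from.
// It returns false when from is out of range or v runs past the end of buf.
func peek(buf []byte, from int, v string) bool {
	if from < 0 || from > len(buf) {
		return false
	}
	return bytes.HasPrefix(buf[from:], []byte(v))
}

func main() {
	buf := []byte("There's a match!")
	// Offset 7 is the byte right after the 's' in "There's".
	fmt.Println(peek(buf, 7, " a match!"))  // true
	fmt.Println(peek(buf, 7, " a match!!")) // false: runs past the buffer
	fmt.Println(peek(buf, 7, "no match"))   // false
}
```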

func (*Scanner) Reset

func (s *Scanner) Reset()

Reset puts the Scanner back in its initial state with the cursor pointing at the start of the byte buffer and clears all the recorded scanner errors.

Example

ExampleScanner_Reset shows how to reset the scanner back to its initial, zero state. In the example, tokens produced by the scanner the usual way are discarded, and then the scanner gets reset back to its initial state.

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/mdm-code/scanner"
)

func main() {
	r := strings.NewReader("Hello!")
	s, err := scanner.New(r)
	if err != nil {
		log.Fatal(err)
	}
	var t scanner.Token
	for s.Scan() {
	}
	s.Reset()
	s.Scan()
	t = s.Token()
	fmt.Println(t)
}
Output:

{ H 0:1 }

func (*Scanner) Scan

func (s *Scanner) Scan() bool

Scan advances the cursor of the Scanner by a single UTF-8 encoded Unicode character. The method returns a boolean value so that it can be used idiomatically, the same way other scanners in the standard Go library are used.

Example

ExampleScanner_Scan shows how to translate text into a list of tokens with the Scanner public API. It combines New, Scan and Token to get a slice of tokens matching the provided "Hello!" input.

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/mdm-code/scanner"
)

func main() {
	in := "Hello!"
	r := strings.NewReader(in)
	s, err := scanner.New(r)
	if err != nil {
		log.Fatal(err)
	}

	var ts = []scanner.Token{}
	for s.Scan() {
		t := s.Token()
		ts = append(ts, t)
	}
	fmt.Println(ts)
}
Output:

[{ H 0:1 } { e 1:2 } { l 2:3 } { l 3:4 } { o 4:5 } { ! 5:6 }]
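
For comparison, this is the standard library idiom that the Scan loop mirrors: bufio.Scanner with the bufio.ScanRunes split function. The runes helper is a name chosen for this sketch. Unlike the Scanner documented here, bufio.Scanner yields only the text of each rune, with no position information and no way to move backwards.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// runes collects every rune of in using bufio.Scanner with the ScanRunes
// split function, following the usual for-Scan loop.
func runes(in string) ([]string, error) {
	s := bufio.NewScanner(strings.NewReader(in))
	s.Split(bufio.ScanRunes)
	var out []string
	for s.Scan() {
		out = append(out, s.Text())
	}
	// Err reports the first non-EOF error hit while scanning, if any.
	return out, s.Err()
}

func main() {
	rs, err := runes("Hello!")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Println(rs) // [H e l l o !]
}
```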

func (*Scanner) ScanAll added in v1.2.1

func (s *Scanner) ScanAll() ([]Token, bool)

ScanAll scans all Tokens representing UTF-8 encoded Unicode characters from the byte buffer underlying the Scanner.

Example

ExampleScanner_ScanAll shows how to convert text into a list of tokens with a single method call to ScanAll() instead of using a for loop to traverse the input one token at a time.

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/mdm-code/scanner"
)

func main() {
	in := "Hello!"
	r := strings.NewReader(in)
	s, err := scanner.New(r)
	if err != nil {
		log.Fatal(err)
	}
	ts, ok := s.ScanAll()
	if !ok {
		log.Fatal(s.Errors[0])
	}
	fmt.Println(ts)
}
Output:

[{ H 0:1 } { e 1:2 } { l 2:3 } { l 3:4 } { o 4:5 } { ! 5:6 }]

func (*Scanner) Token

func (s *Scanner) Token() Token

Token returns the Token currently pointed at by the cursor of the Scanner.

type Token

type Token struct {
	Pos
	Buffer *[]byte
}

Token represents a single rune read from the byte buffer.

func (Token) Position

func (t Token) Position() Pos

Position returns the position of the recorded character in the byte buffer.
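
Because a Token carries both a position and a pointer to the buffer, the bytes it spans can be recovered by slicing. The sketch below assumes Start and End form a half-open byte range into the buffer. The token type and the text helper are hypothetical stand-ins, not part of the package API.

```go
package main

import "fmt"

// token is a hypothetical stand-in for scanner.Token: a byte range plus
// a pointer to the buffer it was read from.
type token struct {
	Start, End int
	Buffer     *[]byte
}

// text returns the bytes the token spans, assuming [Start, End) is a
// half-open byte range into *Buffer. It returns "" for an invalid range.
func text(t token) string {
	if t.Buffer == nil || t.Start < 0 || t.End > len(*t.Buffer) || t.Start > t.End {
		return ""
	}
	return string((*t.Buffer)[t.Start:t.End])
}

func main() {
	buf := []byte("Hello!")
	fmt.Println(text(token{Start: 1, End: 2, Buffer: &buf})) // e
}
```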
