scanner

package
v1.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 26, 2023 License: MIT Imports: 2 Imported by: 0

Documentation

Overview

Package scanner provides a scanner and tokenizer for UTF-8-encoded text. It takes an io.Reader providing the source, which then can be tokenized through repeated calls to the Scan function. For compatibility with existing tools, the NUL character is not allowed. If the first character in the source is a UTF-8 encoded byte order mark (BOM), it is discarded.

By default, a Scanner skips white space and Go comments and recognizes all literals as defined by the Go language specification. It may be customized to recognize only a subset of those literals and to recognize different white space characters.

Basic usage pattern:

var s scanner.Scanner
s.Init(src)
tok := s.Scan()
for tok != scanner.EOF {
	// do something with tok
	tok = s.Scan()
}

Index

Constants

View Source
const (
	ScanIdents     = 1 << -Ident
	ScanInts       = 1 << -Int
	ScanFloats     = 1 << -Float
	ScanChars      = 1 << -Char
	ScanStrings    = 1 << -String
	ScanRawStrings = 1 << -RawString
	ScanComments   = 1 << -Comment
	SkipComments   = 1 << -skipComment
	GoTokens       = ScanIdents | ScanFloats | ScanChars | ScanStrings | ScanRawStrings | ScanComments | SkipComments
)

Predefined mode bits to control recognition of tokens. For instance, to configure a Scanner such that it only recognizes (Go) identifiers, integers, and skips comments, set the Scanner's Mode field to:

ScanIdents | ScanInts | SkipComments
View Source
const (
	EOF = -(iota + 1)
	Ident
	Int
	Float
	Char
	String
	RawString
	Comment
)

The result of Scan is one of the following tokens or a Unicode character.

View Source
const GoWhitespace = 1<<'\t' | 1<<'\n' | 1<<'\r' | 1<<' '

GoWhitespace is the default value for the Scanner's Whitespace field. Its value selects Go's white space characters.

Variables

This section is empty.

Functions

func TokenString

func TokenString(tok rune) string

TokenString returns a printable string for a token or Unicode character.

Types

type Position

type Position struct {
	Filename string
	Offset   int
	Line     int
	Column   int
}

A source position is represented by a Position value. A position is valid if Line > 0.

func (*Position) IsValid

func (pos *Position) IsValid() bool

IsValid returns true if the position is valid.

func (Position) String

func (pos Position) String() string

type Scanner

type Scanner struct {
	Error func(s *Scanner, msg string)

	ErrorCount int

	Mode uint

	Whitespace uint64

	Position
	// contains filtered or unexported fields
}

A Scanner implements reading of Unicode characters and tokens from an io.Reader.

func (*Scanner) Init

func (s *Scanner) Init(src io.Reader) *Scanner

Init initializes a Scanner with a new source and returns s. Error is set to nil, ErrorCount is set to 0, Mode is set to GoTokens, and Whitespace is set to GoWhitespace.

func (*Scanner) Next

func (s *Scanner) Next() rune

Next reads and returns the next Unicode character. It returns EOF at the end of the source. It reports a read error by calling s.Error, if not nil; otherwise it prints an error message to os.Stderr. Next does not update the Scanner's Position field; use Pos() to get the current position.

func (*Scanner) Peek

func (s *Scanner) Peek() rune

Peek returns the next Unicode character in the source without advancing the scanner. It returns EOF if the scanner's position is at the last character of the source.

func (*Scanner) Pos

func (s *Scanner) Pos() (pos Position)

Pos returns the position of the character immediately after the character or token returned by the last call to Next or Scan.

func (*Scanner) Scan

func (s *Scanner) Scan() rune

Scan reads the next token or Unicode character from source and returns it. It only recognizes tokens t for which the respective Mode bit (1<<-t) is set. It returns EOF at the end of the source. It reports scanner errors (read and token errors) by calling s.Error, if not nil; otherwise it prints an error message to os.Stderr.

func (*Scanner) TokenText

func (s *Scanner) TokenText() string

TokenText returns the string corresponding to the most recently scanned token. Valid after calling Scan().

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL