scan

package
v0.3.0 Latest
Published: May 22, 2026 License: BSD-3-Clause Imports: 7 Imported by: 0

README

Scanner

A scanner takes a string as input and returns a slice of tokens.

source → scanner → tokens → parser → AST

Tokens can be of the following kinds:

  • identifier
  • number
  • operator
  • separator
  • string
  • block

Resolving nested blocks in the scanner keeps the parser simple and generic, with no need to resort to parse tables.

The lexical rules are provided by a language specification, which includes the following:

  • a set of composable properties (one per bit of an integer) for each character in the ASCII range (where all separators, operators and reserved keywords must be defined).
  • for each block or string, the starting and ending delimiters.
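The per-character properties described above can be pictured as a lookup table of bit flags. The following is a minimal sketch under that reading; the flag names, the table and the `hasProp` helper are all hypothetical and do not reflect the actual layout of the language specification.

```go
package main

import "fmt"

// Character property flags, one bit each (illustrative names).
const (
	propDigit uint = 1 << iota
	propLetter
	propOperator
	propSeparator
)

// asciiProps maps each ASCII character to its combined properties.
var asciiProps [128]uint

func init() {
	for c := '0'; c <= '9'; c++ {
		asciiProps[c] |= propDigit
	}
	for c := 'a'; c <= 'z'; c++ {
		asciiProps[c] |= propLetter
		asciiProps[c-'a'+'A'] |= propLetter
	}
	asciiProps['_'] |= propLetter
	for _, c := range "+-*/=<>!" {
		asciiProps[c] |= propOperator
	}
	for _, c := range " \t\n,;" {
		asciiProps[c] |= propSeparator
	}
}

// hasProp reports whether c is ASCII and carries every bit in mask.
func hasProp(c byte, mask uint) bool {
	return c < 128 && asciiProps[c]&mask == mask
}

func main() {
	fmt.Println(hasProp('7', propDigit), hasProp('x', propLetter|propDigit))
}
```

Since each property is a single bit, a character can hold several at once and a scanner can test any combination with one mask operation.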

Development status

A passing test must be provided to confirm the status of each item.

  • numbers starting with a digit
  • numbers starting otherwise
  • unescaped strings (including multiline)
  • escaped strings (including multiline)
  • separators (in UTF-8 range)
  • single-line strings (\n not allowed)
  • identifiers (in UTF-8 range)
  • operators, concatenated or not
  • single-character block/string delimiters
  • arbitrarily nested blocks and strings
  • multi-character block/string delimiters
  • automatic semicolon insertion after newline
  • blocks delimited by operator characters
  • blocks delimited by identifiers
  • blocks with delimiter inclusion/exclusion rules
  • blocks delimited by indentation level (python, yaml, ...)

Documentation

Overview

Package scan provides a language-independent scanner.

Constants

This section is empty.

Variables

var (
	ErrBlock   = errors.New("block not terminated")
	ErrIllegal = errors.New("illegal token")
)

Error definitions.

Functions

This section is empty.

Types

type Scanner

type Scanner struct {
	*lang.Spec
	Sources Sources // source position registry (multi-file / REPL)
	PosBase int     // base offset for current source
	// contains filtered or unexported fields
}

Scanner contains the scanner rules for a language.

func NewScanner

func NewScanner(spec *lang.Spec) *Scanner

NewScanner returns a new scanner for a given language specification.

func (*Scanner) Next

func (sc *Scanner) Next(src string) (tok Token, err error)

Next returns the next token in src.

func (*Scanner) Scan

func (sc *Scanner) Scan(src string, semiEOF bool) (tokens []Token, err error)

Scan performs a lexical analysis on src and returns tokens or an error.

type Source

type Source struct {
	Name string
	Base int // base byte offset in the unified position space
	Len  int // length in bytes
	// contains filtered or unexported fields
}

Source describes a source text.

func (*Source) Content added in v0.2.0

func (s *Source) Content() string

Content returns the source text.

func (*Source) Lines added in v0.2.0

func (s *Source) Lines() int

Lines returns the number of lines in the source. An empty source has zero lines; a source whose last character is '\n' is not counted as having a trailing empty line.
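The counting rule can be made concrete with a small standalone function. This is only a sketch of the stated rule, not the method's actual code; `countLines` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// countLines applies the rule described above to a plain string.
func countLines(src string) int {
	if src == "" {
		return 0
	}
	n := strings.Count(src, "\n")
	if !strings.HasSuffix(src, "\n") {
		n++ // the last line has no terminating newline
	}
	return n
}

func main() {
	for _, s := range []string{"", "a", "a\n", "a\nb", "a\nb\n"} {
		fmt.Printf("%q -> %d\n", s, countLines(s))
	}
}
```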

type Sources

type Sources []Source

Sources is an ordered list of Source entries.

func (*Sources) Add

func (ss *Sources) Add(name, src string) int

Add registers a new source and returns its base offset.

func (Sources) ByName added in v0.2.0

func (ss Sources) ByName(name string) *Source

ByName returns the source with the given name, or nil if not found. When multiple sources share a name (e.g. the REPL's anonymous chunks), the first registered match is returned.

func (Sources) FormatPos

func (ss Sources) FormatPos(pos int) string

FormatPos converts a global byte offset to a "[file:]line:col" string.

func (Sources) LineText added in v0.2.0

func (ss Sources) LineText(pos int) string

LineText returns the source line containing pos, without the trailing newline. Returns "" if pos is out of range.

func (Sources) Resolve

func (ss Sources) Resolve(pos int) (name string, line, col int)

Resolve converts a global byte offset to (source name, line, col). Returns ("", 0, 0) if pos is out of range.
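To show how a unified position space can work, here is a minimal sketch with its own simplified types. It assumes that sources are laid out back to back and that lines and columns are 1-based; neither is confirmed by the documentation above, and the lowercase `source`, `sources`, `add` and `resolve` names are stand-ins rather than the package's types.

```go
package main

import (
	"fmt"
	"strings"
)

// source is a simplified stand-in for a registered source.
type source struct {
	name string
	base int
	text string
}

type sources []source

// add appends src after the previous sources and returns its base offset.
func (ss *sources) add(name, src string) int {
	base := 0
	if n := len(*ss); n > 0 {
		last := (*ss)[n-1]
		base = last.base + len(last.text)
	}
	*ss = append(*ss, source{name, base, src})
	return base
}

// resolve maps a global offset to (name, line, col), or ("", 0, 0).
func (ss sources) resolve(pos int) (string, int, int) {
	for _, s := range ss {
		off := pos - s.base
		if off < 0 || off >= len(s.text) {
			continue
		}
		line := 1 + strings.Count(s.text[:off], "\n")
		col := off - strings.LastIndex(s.text[:off], "\n")
		return s.name, line, col
	}
	return "", 0, 0
}

func main() {
	var ss sources
	ss.add("a.go", "x := 1\ny := 2\n")
	base := ss.add("b.go", "z := 3\n")
	fmt.Println(ss.resolve(base + 5))
}
```

With a single integer per position, a token only needs to store an offset; the file name, line and column are recovered on demand when a diagnostic is printed.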

func (Sources) Snippet added in v0.3.0

func (ss Sources) Snippet(pos int) string

Snippet renders the source line containing pos plus a caret pointing at pos, as

\n  <line> | <text>\n  <pad>^\n

Returns "" when pos has no resolvable source line. Shared by the runtime PanicError renderer (vm) and compile-time diagnostics (interp.Eval).

type Token

type Token struct {
	Tok lang.Token // token identifier
	Pos int        // position in source
	Str string     // string in source
	Beg int        // length of begin delimiter (block, string)
	End int        // length of end delimiter (block, string)
}

Token defines a scanner token.

func (*Token) Block

func (t *Token) Block() string

Block returns the block content of t.

func (*Token) Describe added in v0.3.0

func (t *Token) Describe() string

Describe returns a short human-readable form of t for diagnostics: the token kind plus a quoted, whitespace-collapsed, truncated snippet of its source.
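One way to build such a description is sketched below. The rune limit, the "..." suffix and the `describe` function are assumptions for illustration; the method's real truncation length and output format are not documented here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// describe returns kind followed by a quoted snippet of src in which runs
// of whitespace are collapsed and the text is cut to limit runes.
func describe(kind, src string, limit int) string {
	s := strings.Join(strings.Fields(src), " ")
	if r := []rune(s); len(r) > limit {
		s = string(r[:limit]) + "..."
	}
	return kind + " " + strconv.Quote(s)
}

func main() {
	fmt.Println(describe("block", "{\n\tx := 1\n\ty := 2\n}", 10))
}
```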

func (*Token) Name

func (t *Token) Name() string

Name returns the name of t (short string for debugging).

func (*Token) Prefix

func (t *Token) Prefix() string

Prefix returns the block starting delimiter of t.

func (*Token) String

func (t *Token) String() string
