common

package
v0.0.0-...-bf3c7c9
Warning

This package is not in the latest version of its module.

Published: Feb 23, 2023 | License: Apache-2.0 | Imports: 11 | Imported by: 0

Documentation

Overview

Package common contains common definitions and routines for the Hydra parser. This includes definitions of character classes, errors, locations, standard tokens, and the Profile, which enables dynamic changes to the way the parser functions. The Profile, in particular, allows for relatively easy versioning of the Hydra language.

Character classes are in classes.go, and common errors are in errors.go. The Location type is in locations.go. The Options structure, which holds the Profile, is in options.go; the Profile itself is defined in profile.go. Basic interfaces, such as the one defining a scanner, are in interfaces.go.

The basic tokens are defined in tokens.go; identifiers.go, operators.go, and strings.go contain the code describing those token types. (The identifiers.go file also contains the code associated with keywords, which are recognized by the identifier recognizer in the lexer.)


Constants

const (
	EOF rune = -(iota + 1) // End of file
	Err                    // An error occurred
)

Special character constants.
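Both constants are negative, so they cannot collide with a valid Unicode code point. The following sketch redeclares them locally (so it compiles on its own) to show the values the iota expression yields; isSpecial is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// Local redeclaration of the package's special character constants,
// copied here so the example stands alone.
const (
	EOF rune = -(iota + 1) // End of file
	Err                    // An error occurred
)

// isSpecial reports whether r is a sentinel value rather than a real
// character. Hypothetical helper, not part of the package.
func isSpecial(r rune) bool {
	return r < 0
}

func main() {
	fmt.Println(EOF, Err)                       // -1 -2
	fmt.Println(isSpecial(EOF), isSpecial('a')) // true false
}
```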

const (
	CharWS       uint16 = 1 << iota // Whitespace characters
	CharNL                          // Newline character
	CharBinDigit                    // Binary digit
	CharOctDigit                    // Octal digit
	CharDecDigit                    // Decimal digit
	CharHexDigit                    // Hexadecimal digit
	CharIDStart                     // Valid character for ID start
	CharIDCont                      // Valid character for ID continue
	CharStrFlag                     // String flag character
	CharQuote                       // String quote character
	CharComment                     // Comment character
)

Defined character classes.
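Each class occupies one bit, so a single character can belong to several classes at once. A minimal sketch of combining and testing the flags follows; the constants are redeclared locally, hasClass is a hypothetical helper, and the classification chosen for '7' is an assumption about how a profile might classify it.

```go
package main

import "fmt"

// Local redeclaration of the package's character class flags.
const (
	CharWS uint16 = 1 << iota
	CharNL
	CharBinDigit
	CharOctDigit
	CharDecDigit
	CharHexDigit
	CharIDStart
	CharIDCont
	CharStrFlag
	CharQuote
	CharComment
)

// hasClass reports whether every bit in want is set in class.
// Hypothetical helper, not part of the package.
func hasClass(class, want uint16) bool {
	return class&want == want
}

func main() {
	// Assumed classification for '7': an octal, decimal, and hex digit
	// that may also continue an identifier.
	seven := CharOctDigit | CharDecDigit | CharHexDigit | CharIDCont

	fmt.Println(hasClass(seven, CharDecDigit)) // true
	fmt.Println(hasClass(seven, CharBinDigit)) // false
	fmt.Println(CharComment)                   // 1024
}
```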

const (
	StrRaw    uint8 = 1 << iota // Raw strings, ignores escapes
	StrBytes                    // Byte strings
	StrMulti                    // Multi-line (triple-quoted) string
	StrTriple                   // Quote allows triples
)

Defined string flags.

Variables

var (
	ErrSplitEntity       = errors.New("entity split across files")
	ErrBadRune           = errors.New("illegal UTF-8 encoding")
	ErrBadIndent         = errors.New("inconsistent indentation")
	ErrBadOp             = errors.New("bad operator character")
	ErrMixedIndent       = errors.New("mixed whitespace types in indent")
	ErrDanglingBackslash = errors.New("dangling backslash")
	ErrBadNumber         = errors.New("bad character for number literal")
	ErrBadEscape         = errors.New("bad escape sequence")
	ErrBadStrChar        = errors.New("invalid character for string")
	ErrUnclosedStr       = errors.New("unclosed string literal")
	ErrBadIdent          = errors.New("bad identifier character")
)

Various errors that may occur during parsing.

var (
	TokError      = &Symbol{Name: "<Error>"}
	TokEOF        = &Symbol{Name: "<EOF>"}
	TokNewline    = &Symbol{Name: "<Newline>"}
	TokIndent     = &Symbol{Name: "<Indent>"}
	TokDedent     = &Symbol{Name: "<Dedent>"}
	TokIdent      = &Symbol{Name: "<Ident>"}
	TokInt        = &Symbol{Name: "<Int>"}
	TokFloat      = &Symbol{Name: "<Float>"}
	TokString     = &Symbol{Name: "<String>"}
	TokBytes      = &Symbol{Name: "<Bytes>"}
	TokDocComment = &Symbol{Name: "<DocComment>"}
)

Standard token symbols.

var CharClasses = utils.FlagSet16{
	CharWS:       "whitespace",
	CharNL:       "newline",
	CharBinDigit: "binary digit",
	CharOctDigit: "octal digit",
	CharDecDigit: "decimal digit",
	CharHexDigit: "hexadecimal digit",
	CharIDStart:  "ID start",
	CharIDCont:   "ID continue",
	CharStrFlag:  "string flag",
	CharQuote:    "quote",
	CharComment:  "comment",
}

CharClasses is a mapping of character class flags to names.

var StrFlags = utils.FlagSet8{
	StrRaw:    "raw",
	StrBytes:  "bytes",
	StrMulti:  "multi-line",
	StrTriple: "triple quote",
}

StrFlags is a mapping of string flags to names.

Functions

func ErrDanglingOpen

func ErrDanglingOpen(tok *Token) error

ErrDanglingOpen generates an error for a dangling open operator with no corresponding close operator.

func ErrNoOpen

func ErrNoOpen(sym *Symbol) error

ErrNoOpen generates an error for a close operator with no corresponding open operator.

func ErrOpMismatch

func ErrOpMismatch(openTok *Token, close *Symbol) error

ErrOpMismatch generates an error for a close operator that doesn't match the open operator.
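The three error constructors correspond to the three ways paired operators can fail to balance. The sketch below shows one way such a check could work, using a stack of open symbols. It is not the package's implementation: checkPairs and the error texts are hypothetical, and it assumes that an opening symbol names its partner in Close while a closing symbol names its partner in Open.

```go
package main

import "fmt"

// Symbol mirrors the package's Symbol type.
type Symbol struct {
	Name  string
	Open  string
	Close string
}

// Hypothetical paired operators. Assumption: an opener names its partner
// in Close, and a closer names its partner in Open.
var (
	lparen   = &Symbol{Name: "(", Close: ")"}
	rparen   = &Symbol{Name: ")", Open: "("}
	lbracket = &Symbol{Name: "[", Close: "]"}
	rbracket = &Symbol{Name: "]", Open: "["}
)

// checkPairs reports the first pairing problem in syms, or nil.
// The error messages are illustrative, not the package's.
func checkPairs(syms []*Symbol) error {
	var stack []*Symbol // open operators still awaiting a close
	for _, sym := range syms {
		switch {
		case sym.Close != "": // an opener
			stack = append(stack, sym)
		case sym.Open != "": // a closer
			if len(stack) == 0 {
				// The situation ErrNoOpen describes.
				return fmt.Errorf("close %q has no open", sym.Name)
			}
			top := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if top.Close != sym.Name {
				// The situation ErrOpMismatch describes.
				return fmt.Errorf("close %q does not match open %q", sym.Name, top.Name)
			}
		}
	}
	if len(stack) > 0 {
		// The situation ErrDanglingOpen describes.
		return fmt.Errorf("open %q is never closed", stack[len(stack)-1].Name)
	}
	return nil
}

func main() {
	fmt.Println(checkPairs([]*Symbol{lparen, lbracket, rbracket, rparen})) // <nil>
	fmt.Println(checkPairs([]*Symbol{lparen, rbracket}))
	fmt.Println(checkPairs([]*Symbol{rparen}))
	fmt.Println(checkPairs([]*Symbol{lparen}))
}
```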

Types

type AugChar

type AugChar struct {
	C     rune        // The character
	Class uint16      // The character's class
	Loc   Location    // The character's location
	Val   interface{} // The "value"; an integer for digits
}

AugChar is a struct that packages together a character, its class, its location, and any numeric value it may have. This is the type that the scanner returns.

type FilePos

type FilePos struct {
	L int // The line number of the position
	C int // The column number of the position
}

FilePos specifies a position within a given file.

type Keywords

type Keywords map[string]*Symbol

Keywords maps identifier strings to the symbols used for keyword tokens.

func (Keywords) Add

func (k Keywords) Add(sym *Symbol)

Add adds a new keyword.

func (Keywords) Copy

func (k Keywords) Copy() Keywords

Copy produces a new copy of a Keywords object.

func (Keywords) Remove

func (k Keywords) Remove(sym *Symbol)

Remove removes a keyword.
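Copy, Add, and Remove together make it possible to derive one language version's keyword set from another without touching the original. The sketch below reimplements the three methods locally to illustrate this; it assumes keywords are keyed by the symbol's Name, and the keyword names are hypothetical.

```go
package main

import "fmt"

// Symbol mirrors the package's Symbol type.
type Symbol struct {
	Name  string
	Open  string
	Close string
}

// Keywords mirrors the package's map type.
type Keywords map[string]*Symbol

// Add registers sym. Assumption: the map key is the symbol's Name.
func (k Keywords) Add(sym *Symbol) { k[sym.Name] = sym }

// Remove deletes sym, again keyed by Name.
func (k Keywords) Remove(sym *Symbol) { delete(k, sym.Name) }

// Copy returns a new map holding the same symbols.
func (k Keywords) Copy() Keywords {
	c := make(Keywords, len(k))
	for name, sym := range k {
		c[name] = sym
	}
	return c
}

func main() {
	base := Keywords{}
	base.Add(&Symbol{Name: "if"})
	base.Add(&Symbol{Name: "else"})

	// A hypothetical newer version drops "else" and adds "match".
	v2 := base.Copy()
	v2.Remove(&Symbol{Name: "else"})
	v2.Add(&Symbol{Name: "match"})

	_, inBase := base["match"]
	_, inV2 := v2["match"]
	fmt.Println(len(base), len(v2)) // 2 2
	fmt.Println(inBase, inV2)       // false true
}
```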

type Lexer

type Lexer interface {
	// Next retrieves the next token from the scanner.  If the end
	// of file is reached, an EOF token is returned; if an error
	// occurs while scanning or lexically analyzing the file, an
	// error token is returned with the error as the token's
	// semantic value.  After either an EOF token or an error
	// token, nil will be returned.
	Next() *Token

	// Push pushes a single token back onto the lexer.  Any number
	// of tokens may be pushed back.
	Push(tok *Token)
}

Lexer is an interface describing a lexer. A lexer pulls characters from a scanner and converts them to tokens, which may then be used by the parser.
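The Next and Push contract can be illustrated with a toy implementation that replays a fixed list of tokens. sliceLexer below is hypothetical and the types are trimmed-down local copies. It treats pushed-back tokens as a stack, which is one reasonable reading of "any number of tokens may be pushed back."

```go
package main

import "fmt"

// Trimmed-down local copies of the package's types.
type Symbol struct{ Name string }

type Token struct {
	Sym *Symbol
	Val interface{}
}

type Lexer interface {
	Next() *Token
	Push(tok *Token)
}

var (
	TokEOF   = &Symbol{Name: "<EOF>"}
	TokIdent = &Symbol{Name: "<Ident>"}
)

// sliceLexer is a hypothetical Lexer that replays a fixed token list.
type sliceLexer struct {
	toks   []*Token // tokens not yet delivered
	pushed []*Token // pushed-back tokens, most recent last
	done   bool     // set once the EOF token has been delivered
}

func (l *sliceLexer) Next() *Token {
	// Pushed-back tokens are returned first, most recent first.
	if n := len(l.pushed); n > 0 {
		tok := l.pushed[n-1]
		l.pushed = l.pushed[:n-1]
		return tok
	}
	if l.done {
		return nil // nothing follows the EOF token
	}
	if len(l.toks) == 0 {
		l.done = true
		return &Token{Sym: TokEOF}
	}
	tok := l.toks[0]
	l.toks = l.toks[1:]
	return tok
}

func (l *sliceLexer) Push(tok *Token) { l.pushed = append(l.pushed, tok) }

func main() {
	var lex Lexer = &sliceLexer{toks: []*Token{{Sym: TokIdent, Val: "x"}}}

	first := lex.Next()
	lex.Push(first) // "peek": look at a token, then put it back
	again := lex.Next()

	fmt.Println(first == again)      // true
	fmt.Println(lex.Next().Sym.Name) // <EOF>
	fmt.Println(lex.Next() == nil)   // true
}
```

Pushback is what lets a parser look ahead by one or more tokens without the lexer needing a separate peek method.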

type Location

type Location struct {
	File string  // The name of the file
	B    FilePos // The beginning of the range
	E    FilePos // The end of the range
}

Location specifies the exact range of locations of some entity.

func OctEscape

func OctEscape(ch AugChar, s Scanner, flags uint8) (rune, Location, error)

OctEscape is a StrEscape that consumes octal digits and returns the specified rune.

func (*Location) Advance

func (l *Location) Advance(offset FilePos)

Advance advances a location in place. The current range end becomes the range beginning, and the range end is the sum of the new range beginning and the provided offset.
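The effect of Advance is easiest to see on a small example. This sketch reimplements it locally and assumes the "sum" of two FilePos values is component-wise; the package may treat line changes differently (for instance, by resetting the column).

```go
package main

import "fmt"

// Local copies of the package's types.
type FilePos struct {
	L int
	C int
}

type Location struct {
	File string
	B    FilePos
	E    FilePos
}

// Advance follows the documented behaviour: the old end becomes the new
// beginning, and the new end is beginning plus offset.
// Assumption: the sum is taken component by component.
func (l *Location) Advance(offset FilePos) {
	l.B = l.E
	l.E = FilePos{L: l.B.L + offset.L, C: l.B.C + offset.C}
}

func main() {
	loc := Location{B: FilePos{1, 1}, E: FilePos{1, 1}}

	loc.Advance(FilePos{C: 3}) // step over a three-column entity
	fmt.Println(loc.B, loc.E)  // {1 1} {1 4}

	loc.Advance(FilePos{C: 2}) // the next entity starts where the last ended
	fmt.Println(loc.B, loc.E)  // {1 4} {1 6}
}
```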

func (*Location) AdvanceTab

func (l *Location) AdvanceTab(tabstop int)

AdvanceTab advances a location in place, as if by a tab character. The argument indicates the size of a tab stop.
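A tab moves the column to the next tab stop rather than by a fixed amount. The sketch below shows the arithmetic under two assumptions that the documentation does not state: columns are 1-based, and, as with Advance, the old range end becomes the new range beginning.

```go
package main

import "fmt"

// Local copies of the package's types.
type FilePos struct {
	L int
	C int
}

type Location struct {
	File string
	B    FilePos
	E    FilePos
}

// AdvanceTab moves the end column to the next tab stop.
// Assumptions: 1-based columns, tab stops every tabstop columns, and the
// old end becomes the new beginning.
func (l *Location) AdvanceTab(tabstop int) {
	l.B = l.E
	l.E.C = ((l.E.C-1)/tabstop+1)*tabstop + 1
}

func main() {
	loc := Location{B: FilePos{1, 1}, E: FilePos{1, 1}}

	loc.AdvanceTab(8)
	fmt.Println(loc.E.C) // 9

	loc.E.C = 12 // pretend some text moved the end to column 12
	loc.AdvanceTab(8)
	fmt.Println(loc.B.C, loc.E.C) // 12 17
}
```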

func (Location) String

func (l Location) String() string

String constructs a string representation of the location.

func (Location) Thru

func (l Location) Thru(other Location) Location

Thru creates a new Location that ranges from the beginning of this location to the beginning of another Location.

func (Location) ThruEnd

func (l Location) ThruEnd(other Location) Location

ThruEnd is similar to Thru, except that it creates a new Location that ranges from the beginning of this location to the ending of another Location.
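The difference between Thru and ThruEnd is whether the second location itself is included in the result. A minimal local sketch follows; the choice to carry over the receiver's File is an assumption.

```go
package main

import "fmt"

// Local copies of the package's types.
type FilePos struct {
	L int
	C int
}

type Location struct {
	File string
	B    FilePos
	E    FilePos
}

// Thru spans from the start of l to the start of other.
// Assumption: the result keeps l's File.
func (l Location) Thru(other Location) Location {
	return Location{File: l.File, B: l.B, E: other.B}
}

// ThruEnd spans from the start of l to the end of other.
func (l Location) ThruEnd(other Location) Location {
	return Location{File: l.File, B: l.B, E: other.E}
}

func main() {
	open := Location{B: FilePos{1, 5}, E: FilePos{1, 6}}  // e.g. an open operator
	closer := Location{B: FilePos{3, 1}, E: FilePos{3, 2}} // e.g. its close operator

	fmt.Println(open.Thru(closer).E)    // {3 1}: stops before the closer
	fmt.Println(open.ThruEnd(closer).E) // {3 2}: includes the closer
}
```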

type MockLexer

type MockLexer struct {
	mock.Mock
}

MockLexer is a mock object for lexers.

func (*MockLexer) Next

func (m *MockLexer) Next() *Token

Next retrieves the next token from the scanner. If the end of file is reached, an EOF token is returned; if an error occurs while scanning or lexically analyzing the file, an error token is returned with the error as the token's semantic value. After either an EOF token or an error token, nil will be returned.

func (*MockLexer) Push

func (m *MockLexer) Push(tok *Token)

Push pushes a single token back onto the lexer. Any number of tokens may be pushed back.

type MockScanner

type MockScanner struct {
	mock.Mock
}

MockScanner is a mock object for scanners.

func (*MockScanner) Next

func (m *MockScanner) Next() AugChar

Next retrieves the next rune from the file. An EOF augmented character is returned on end of file, and an Err augmented character is returned in the event of an error.

func (*MockScanner) Push

func (m *MockScanner) Push(ch AugChar)

Push pushes back a single augmented character onto the scanner. Any number of characters may be pushed back.

type Operators

type Operators struct {
	Sym *Symbol // The operator at this node
	// contains filtered or unexported fields
}

Operators is a structure for describing an operator tree. The lexer uses the operator tree to match operators, while allowing for backtracking; this enables selecting the longest match.
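The longest-match idea can be sketched with a small rune-keyed tree. opNode and longestOp below are hypothetical stand-ins, not the package's types; they show how a walk can run past the last complete operator and then fall back to it.

```go
package main

import "fmt"

// opNode is a hypothetical stand-in for one node of an operator tree.
type opNode struct {
	sym      string           // operator that ends at this node, "" if none
	children map[rune]*opNode // next nodes, keyed by operator rune
}

// add inserts op into the tree, creating nodes as needed.
func (n *opNode) add(op string) {
	cur := n
	for _, r := range op {
		if cur.children == nil {
			cur.children = make(map[rune]*opNode)
		}
		if cur.children[r] == nil {
			cur.children[r] = &opNode{}
		}
		cur = cur.children[r]
	}
	cur.sym = op
}

// next plays the role of Operators.Next: it returns nil when no node
// exists for r.
func (n *opNode) next(r rune) *opNode { return n.children[r] }

// longestOp returns the longest operator that prefixes input and the
// number of runes it spans. Runes read beyond that point would be pushed
// back onto the scanner by a real lexer.
func longestOp(root *opNode, input string) (string, int) {
	best, bestLen := "", 0
	cur, seen := root, 0
	for _, r := range input {
		if cur = cur.next(r); cur == nil {
			break
		}
		seen++
		if cur.sym != "" {
			best, bestLen = cur.sym, seen
		}
	}
	return best, bestLen
}

func main() {
	root := &opNode{}
	for _, op := range []string{"<", "<=", "<<="} {
		root.add(op)
	}
	fmt.Println(longestOp(root, "<<=1")) // <<= 3
	fmt.Println(longestOp(root, "<<1"))  // < 1
	fmt.Println(longestOp(root, "<=1"))  // <= 2
}
```

In the second call the walk reaches the node for "<<", which is not itself an operator, so the match backtracks to "<".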

func NewOperators

func NewOperators(ops ...*Symbol) *Operators

NewOperators constructs an Operators tree with all the specified operators.

func (*Operators) Add

func (o *Operators) Add(op *Symbol)

Add adds an operator to the operator tree.

func (*Operators) Children

func (o *Operators) Children() []utils.Visitable

Children implements the utils.Visitable interface, allowing an operator tree to be visualized using utils.Visualize().

func (*Operators) Copy

func (o *Operators) Copy() *Operators

Copy constructs a copy of this Operators tree. The copy will contain just the subtree rooted at this node, if this node is not the root.

func (*Operators) Next

func (o *Operators) Next(r rune) *Operators

Next looks up the next node in the tree, given an operator rune. Returns nil if no corresponding node exists in the tree.

func (*Operators) Remove

func (o *Operators) Remove(op *Symbol)

Remove removes an operator from the operator tree.

func (*Operators) String

func (o *Operators) String() string

String outputs the operator tree node as a string.

type Option

type Option func(opts *Options)

Option is the type for option functions. Each function mutates a passed-in Options structure to set the specific option.

func Encoding

func Encoding(encoding string) Option

Encoding sets the encoding for the file being scanned. If not set, an attempt is made to guess it from the source (depends on source implementing io.Seeker), and a default of "utf-8" is used if that fails.

func Filename

func Filename(file string) Option

Filename sets the filename being scanned. If not set, an attempt is made to guess it from the source (depends on source having a Name() method returning a string), and a default is used if that fails.

func TabStop

func TabStop(tabstop int) Option

TabStop sets the size of a tab stop. If not set, it defaults to 8.

type Options

type Options struct {
	Source   io.Reader // The source from which to read
	Filename string    // The name of the file being parsed
	Encoding string    // The encoding of the source
	Prof     *Profile  // The profile
	TabStop  int       // The size of a tab stop
}

Options contains the options for the parser.

func (*Options) Advance

func (opts *Options) Advance(ch rune, loc *Location)

Advance advances the location to account for the specified character.

func (*Options) Classify

func (opts *Options) Classify(ch rune, loc Location, err error) AugChar

Classify classifies a character and composes an AugChar describing the character.

func (*Options) Parse

func (o *Options) Parse(opts ...Option)

Parse parses a series of options into the Options structure.
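Option, the option constructors, and Options.Parse follow the functional options pattern. The sketch below reproduces the pattern with a trimmed-down Options (no Source or Prof fields). Seeding the documented defaults in the struct literal is an assumption about where defaults get applied, and the file name is hypothetical.

```go
package main

import "fmt"

// Options is a trimmed-down local copy of the package's structure.
type Options struct {
	Filename string
	Encoding string
	TabStop  int
}

// Option mutates an Options structure to set one option.
type Option func(opts *Options)

func Filename(file string) Option {
	return func(opts *Options) { opts.Filename = file }
}

func Encoding(encoding string) Option {
	return func(opts *Options) { opts.Encoding = encoding }
}

func TabStop(tabstop int) Option {
	return func(opts *Options) { opts.TabStop = tabstop }
}

// Parse applies each option in order; later options win.
func (o *Options) Parse(opts ...Option) {
	for _, opt := range opts {
		opt(o)
	}
}

func main() {
	// Assumption: defaults are in place before the options are applied.
	o := &Options{Encoding: "utf-8", TabStop: 8}
	o.Parse(Filename("example.src"), TabStop(4))

	fmt.Println(o.Filename, o.Encoding, o.TabStop) // example.src utf-8 4
}
```

Because each option is just a function, callers pass only the settings they care about, and new options can be added without changing the signature of whatever accepts them.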

type Profile

type Profile struct {
	IDStart   runes.Set          // Set of valid identifier start chars
	IDCont    runes.Set          // Set of valid identifier continue chars
	StrFlags  map[rune]uint8     // Valid string flags
	Quotes    map[rune]uint8     // Valid quote characters
	Escapes   map[rune]StrEscape // String escapes
	Keywords  Keywords           // Mapping of keywords
	Norm      norm.Form          // Normalization for identifiers
	Operators *Operators         // Recognized operators
}

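Profile describes a profile for the parser. A profile is the set of version-specific rules, with any desired options applied; it covers the sets of valid identifier characters, string flags, quote characters, string escapes, keywords, and operators, as well as the normalization form for identifiers.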

func (*Profile) Copy

func (p *Profile) Copy() *Profile

Copy generates a copy of a profile. An Options structure always contains a profile copy, to enable it to be mutated by options without accidentally changing the master profile.

type Scanner

type Scanner interface {
	// Next retrieves the next rune from the file.  An EOF
	// augmented character is returned on end of file, and an Err
	// augmented character is returned in the event of an error.
	Next() AugChar

	// Push pushes back a single augmented character onto the
	// scanner.  Any number of characters may be pushed back.
	Push(ch AugChar)
}

Scanner is an interface describing a scanner. A scanner reads a source one rune at a time, returning augmented characters.

type StrEscape

type StrEscape func(ch AugChar, s Scanner, flags uint8) (rune, Location, error)

StrEscape is a function type for handling string escapes. It is called with the character, the scanner, and the string flags, and should return a rune to add to the buffer and the escape sequence location. If an error is returned, the error location should be returned instead. If no character should be written, return EOF.

func HexEscape

func HexEscape(cnt int) StrEscape

HexEscape sets up a StrEscape that consumes the specified number of hexadecimal digits and returns the specified rune.

func SimpleEscape

func SimpleEscape(r rune) StrEscape

SimpleEscape sets up a StrEscape that returns a specified character.
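Both constructors return closures that satisfy StrEscape. The sketch below shows one plausible shape for them; it is not the package's implementation. Location is reduced to a pair of column offsets, sliceScanner and hexChars are hypothetical helpers, and the choice to report the offending digit's location on error is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Simplified stand-ins for the package's types. Location is reduced to a
// pair of column offsets to keep the sketch short.
type Location struct{ B, E int }

type AugChar struct {
	C     rune
	Class uint16
	Loc   Location
	Val   interface{}
}

type Scanner interface {
	Next() AugChar
	Push(ch AugChar)
}

type StrEscape func(ch AugChar, s Scanner, flags uint8) (rune, Location, error)

const CharHexDigit uint16 = 1 << 5 // same bit as in the package's const block

var ErrBadEscape = errors.New("bad escape sequence")

// SimpleEscape returns an escape that always yields r.
func SimpleEscape(r rune) StrEscape {
	return func(ch AugChar, _ Scanner, _ uint8) (rune, Location, error) {
		return r, ch.Loc, nil
	}
}

// HexEscape returns an escape that reads cnt hex digits from the scanner,
// relying on AugChar.Val holding each digit's integer value.
func HexEscape(cnt int) StrEscape {
	return func(ch AugChar, s Scanner, _ uint8) (rune, Location, error) {
		var r rune
		loc := ch.Loc
		for i := 0; i < cnt; i++ {
			d := s.Next()
			v, ok := d.Val.(int)
			if d.Class&CharHexDigit == 0 || !ok {
				return 0, d.Loc, ErrBadEscape // error location, not escape location
			}
			r = r<<4 | rune(v)
			loc.E = d.Loc.E
		}
		return r, loc, nil
	}
}

// sliceScanner is a hypothetical Scanner over a fixed list of characters.
type sliceScanner struct{ chars []AugChar }

func (s *sliceScanner) Next() AugChar {
	if len(s.chars) == 0 {
		return AugChar{C: -1} // -1 is the package's EOF value
	}
	ch := s.chars[0]
	s.chars = s.chars[1:]
	return ch
}

func (s *sliceScanner) Push(ch AugChar) {
	s.chars = append([]AugChar{ch}, s.chars...)
}

// hexChars classifies each rune of digits, starting at column col.
func hexChars(digits string, col int) []AugChar {
	var out []AugChar
	for i, c := range digits {
		ch := AugChar{C: c, Loc: Location{col + i, col + i + 1}}
		if v, err := strconv.ParseInt(string(c), 16, 32); err == nil {
			ch.Class = CharHexDigit
			ch.Val = int(v)
		}
		out = append(out, ch)
	}
	return out
}

func main() {
	nl := SimpleEscape('\n')
	r, _, _ := nl(AugChar{C: 'n'}, nil, 0)
	fmt.Printf("%q\n", r) // '\n'

	hex := HexEscape(2)
	s := &sliceScanner{chars: hexChars("41", 2)}
	r, loc, err := hex(AugChar{C: 'x', Loc: Location{1, 2}}, s, 0)
	fmt.Printf("%q %v %v\n", r, loc, err) // 'A' {1 4} <nil>
}
```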

type Symbol

type Symbol struct {
	Name  string // The name of the symbol, for display purposes
	Open  string // Paired operator that opens
	Close string // Paired operator that closes
}

Symbol represents a defined symbol, or token type. This could indicate something with a fixed value, like an operator, or something that has semantic value, such as a number literal.

func (Symbol) String

func (s Symbol) String() string

String constructs a string representation of a symbol, e.g., the symbol name.

type Token

type Token struct {
	Sym *Symbol     // The token type
	Loc Location    // The location range of the token
	Val interface{} // The semantic value of the token
}

Token represents a single token emitted by the lexer.

func (*Token) Children

func (t *Token) Children() []utils.Visitable

Children implements the utils.Visitable interface, allowing a Token to be part of an AST.

func (Token) String

func (t Token) String() string

String constructs a string representation of a token.
