match

package
v0.3.1
Warning

This package is not in the latest version of its module.

Published: Sep 3, 2020 License: BSD-3-Clause Imports: 10 Imported by: 0

Documentation

Overview

Package match defines matching algorithms and support code for the license checker.

Index

Constants

This section is empty.

Variables

var TraceDFA int

TraceDFA controls whether DFA execution prints debug tracing when it gets stuck. If TraceDFA > 0 and the DFA has followed a path of at least TraceDFA symbols since the last matching state but then hits a dead end, it prints information about the dead end.

Functions

This section is empty.

Types

type Dict

type Dict struct {
	// contains filtered or unexported fields
}

A Dict maps words to integer indexes in a word list, of type WordID. The zero Dict is an empty dictionary ready for use.

Lookup and Words are read-only operations, safe for any number of concurrent calls from multiple goroutines. Insert is a write operation; it must not run concurrently with any other call, whether to Insert, Lookup, or Words.
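Because Dict leaves synchronization to the caller, code that both inserts and looks up words from several goroutines needs its own lock. The sketch below uses a hypothetical `dict` type that only imitates the documented behavior (Insert returns the existing index for a repeated word, Lookup returns BadWord for a missing one, the zero value is ready to use), since the real Dict's fields are unexported. `lockedDict` is likewise an illustrative name: a `sync.RWMutex` gives Insert exclusive access while letting Lookup calls overlap.

```go
package main

import (
	"fmt"
	"sync"
)

// WordID and BadWord mirror the documented declarations.
type WordID int32

const BadWord WordID = -1

// dict is a hypothetical stand-in for Dict: a word list plus an index map.
// Like Dict, it does no locking of its own, and its zero value is usable.
type dict struct {
	index map[string]WordID
	words []string
}

// Insert adds w if needed and returns its index.
func (d *dict) Insert(w string) WordID {
	if id, ok := d.index[w]; ok {
		return id // already present: return the existing index
	}
	if d.index == nil {
		d.index = make(map[string]WordID)
	}
	id := WordID(len(d.words))
	d.index[w] = id
	d.words = append(d.words, w)
	return id
}

// Lookup returns the index of w, or BadWord if w is not in the list.
func (d *dict) Lookup(w string) WordID {
	if id, ok := d.index[w]; ok {
		return id
	}
	return BadWord
}

// lockedDict enforces the read/write contract: Insert holds the write lock,
// so it never overlaps any other call; Lookup holds only the read lock,
// so concurrent Lookups can run in parallel.
type lockedDict struct {
	mu sync.RWMutex
	d  dict
}

func (l *lockedDict) Insert(w string) WordID {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.d.Insert(w)
}

func (l *lockedDict) Lookup(w string) WordID {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.d.Lookup(w)
}

func main() {
	var ld lockedDict
	var wg sync.WaitGroup
	for _, w := range []string{"permission", "is", "hereby", "granted"} {
		wg.Add(1)
		go func(w string) {
			defer wg.Done()
			ld.Insert(w) // safe: serialized by the write lock
		}(w)
	}
	wg.Wait()
	fmt.Println(ld.Lookup("hereby") != BadWord) // true
}
```

An RWMutex matches the read/write split in the Dict contract; a plain `sync.Mutex` would also be correct but would serialize Lookup calls as well.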

func (*Dict) Insert

func (d *Dict) Insert(w string) WordID

Insert adds the word w to the word list, returning its index. If w is already in the word list, it is not added again; Insert returns the existing index.

func (*Dict) InsertSplit

func (d *Dict) InsertSplit(text string) []Word

InsertSplit splits text into a sequence of lowercase words, inserting any new words into the dictionary.

func (*Dict) Lookup

func (d *Dict) Lookup(w string) WordID

Lookup looks for the word w in the word list and returns its index. If w is not in the word list, Lookup returns BadWord.

func (*Dict) Split

func (d *Dict) Split(text string) []Word

Split splits text into a sequence of lowercase words. It does not add any new words to the dictionary. Unrecognized words are reported with ID BadWord.
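To make the Word offsets concrete, here is a sketch of a splitter with the documented behavior: words are lowercased for lookup, punctuation is skipped, unknown words get BadWord, and each word's byte range is kept in Lo and Hi. The word-boundary rule used here (a maximal run of Unicode letters or digits) is an assumption; this page does not say how Split treats hyphens, apostrophes, or other punctuation. A plain map stands in for the Dict, and `split` and `makeWord` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

type WordID int32

const BadWord WordID = -1

// Word mirrors the documented struct: the word appears at text[Lo:Hi].
type Word struct {
	ID     WordID
	Lo, Hi int32
}

// split is a hypothetical splitter. Assumption: a word is a maximal run of
// letters or digits; everything else (spaces, punctuation) separates words.
func split(text string, lookup map[string]WordID) []Word {
	var out []Word
	start := -1 // byte offset where the current word began, or -1 between words
	for i, r := range text {
		isWord := unicode.IsLetter(r) || unicode.IsDigit(r)
		switch {
		case isWord && start < 0:
			start = i
		case !isWord && start >= 0:
			out = append(out, makeWord(text, start, i, lookup))
			start = -1
		}
	}
	if start >= 0 { // text ended inside a word
		out = append(out, makeWord(text, start, len(text), lookup))
	}
	return out
}

// makeWord looks up the lowercased word text[lo:hi], using BadWord if absent.
func makeWord(text string, lo, hi int, lookup map[string]WordID) Word {
	id, ok := lookup[strings.ToLower(text[lo:hi])]
	if !ok {
		id = BadWord
	}
	return Word{ID: id, Lo: int32(lo), Hi: int32(hi)}
}

func main() {
	dict := map[string]WordID{"copyright": 0, "the": 1, "authors": 2}
	text := "Copyright (c) The Authors."
	for _, w := range split(text, dict) {
		fmt.Printf("%q id=%d\n", text[w.Lo:w.Hi], w.ID)
	}
}
```

Keeping byte offsets rather than copies of the words lets a caller map any word, or any run of words, back to the exact span of the original text, punctuation included.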

func (*Dict) Words

func (d *Dict) Words() []string

Words returns the current word list. The list is not a copy; the caller can read but must not modify the list.

type LRE

type LRE struct {
	// contains filtered or unexported fields
}

An LRE is a compiled license regular expression.

A license regular expression (LRE) is a pattern syntax intended for describing large English texts such as software licenses, with minor allowed variations. The pattern syntax and the matching are word-based and case-insensitive; punctuation is ignored in the pattern and in the matched text.

The valid LRE patterns are:

word            - a single case-insensitive word
__N__           - any sequence of up to N words
expr1 expr2     - concatenation
expr1 || expr2  - alternation
(( expr ))      - grouping
expr??          - zero or one instances of expr
//** text **//  - a comment

To make patterns harder to misread in large texts:

  • || must only appear inside (( ))
  • ?? must only follow (( ))
  • (( must be at the start of a line, preceded only by spaces
  • )) must be at the end of a line, followed only by spaces and ??.

For example:

//** https://en.wikipedia.org/wiki/Filler_text **//
Now is
((not))??
the time for all good
((men || women || people))
to come to the aid of their __1__.

func ParseLRE

func ParseLRE(d *Dict, file, s string) (*LRE, error)

ParseLRE parses the string s as a license regular expression, using the Dict d for its words. The file name is used in error messages if non-empty.
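The layout rules listed earlier are mechanical enough to check line by line. The sketch below checks the two rules about where `((` and `))` may appear; it is a hypothetical pre-check, not ParseLRE's code, and `layoutError` is an invented stand-in for SyntaxError that reports a line number rather than a byte offset. It does not special-case `//** ... **//` comments, and it does not check the `||` and `??` rules.

```go
package main

import (
	"fmt"
	"strings"
)

// layoutError is a hypothetical stand-in for SyntaxError that records a
// 1-based line number instead of a byte offset.
type layoutError struct {
	Line int
	Err  string
}

func (e *layoutError) Error() string {
	return fmt.Sprintf("line %d: %s", e.Line, e.Err)
}

// checkLayout enforces two of the documented readability rules:
//   - (( must be at the start of a line, preceded only by spaces
//   - )) must be at the end of a line, followed only by spaces and ??
//
// Comments (//** ... **//) are not special-cased.
func checkLayout(s string) error {
	for i, line := range strings.Split(s, "\n") {
		if j := strings.Index(line, "(("); j >= 0 {
			// Only spaces may precede the first ((, and no second (( may follow.
			if strings.TrimLeft(line[:j], " ") != "" || strings.Contains(line[j+2:], "((") {
				return &layoutError{Line: i + 1, Err: "(( must be at the start of a line"}
			}
		}
		if j := strings.Index(line, "))"); j >= 0 {
			// After the first )), only spaces and an optional ?? may remain.
			if rest := strings.Trim(line[j+2:], " "); rest != "" && rest != "??" {
				return &layoutError{Line: i + 1, Err: ")) must be at the end of a line"}
			}
		}
	}
	return nil
}

func main() {
	good := "Now is\n((not))??\nthe time for all good\n((men || women || people))\nto come"
	bad := "Now is ((not))?? the time"
	fmt.Println(checkLayout(good)) // <nil>
	fmt.Println(checkLayout(bad))  // line 1: (( must be at the start of a line
}
```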

func (*LRE) Dict

func (re *LRE) Dict() *Dict

Dict returns the Dict used by the LRE.

func (*LRE) File

func (re *LRE) File() string

File returns the file name passed to ParseLRE.

type Match

type Match struct {
	ID    int // index of LRE in list passed to NewMultiLRE
	Start int // word index of start of match
	End   int // word index of end of match
}

A Match records the position of a single match in a text.

type Matches

type Matches struct {
	Text  string  // the entire text
	Words []Word  // the text, split into Words
	List  []Match // the matches
}

A Matches is a collection of all leftmost-longest, non-overlapping matches in text.

type MultiLRE

type MultiLRE struct {
	// contains filtered or unexported fields
}

A MultiLRE matches multiple LREs simultaneously against a text. It is more efficient than matching each LRE in sequence against the text.

func NewMultiLRE

func NewMultiLRE(list []*LRE) (_ *MultiLRE, err error)

NewMultiLRE returns a MultiLRE looking for the given LREs. All the LREs must have been parsed using the same Dict; if not, NewMultiLRE panics.

func (*MultiLRE) Dict

func (re *MultiLRE) Dict() *Dict

Dict returns the Dict used by the MultiLRE.

func (*MultiLRE) Match

func (re *MultiLRE) Match(text string) *Matches

Match reports all leftmost-longest, non-overlapping matches in text. It always returns a non-nil *Matches, in order to return the split text. Check len(matches.List) to see whether any matches were found.
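Because Matches keeps both the text and its Words, a match's word range can be mapped back to the original bytes through the Lo and Hi offsets. The sketch below builds a Matches value by hand with local copies of the documented types, since the real one comes from MultiLRE.Match; `matchedText` is a hypothetical helper, and it again assumes End is exclusive. It follows the documented advice to check `len(List)` rather than comparing the result with nil.

```go
package main

import "fmt"

// Local copies of the documented types.
type WordID int32

type Word struct {
	ID     WordID
	Lo, Hi int32 // the word appears at text[Lo:Hi]
}

type Match struct{ ID, Start, End int }

type Matches struct {
	Text  string
	Words []Word
	List  []Match
}

// matchedText returns the slice of the original text covered by m.
// Assumption: End is exclusive, so the last matched word is Words[End-1].
func matchedText(ms *Matches, m Match) string {
	if m.Start >= m.End {
		return ""
	}
	lo := ms.Words[m.Start].Lo
	hi := ms.Words[m.End-1].Hi
	return ms.Text[lo:hi]
}

func main() {
	// Hand-built stand-in for a Match result; the word IDs are arbitrary.
	ms := &Matches{
		Text: "Intro. Permission is hereby granted.",
		Words: []Word{
			{0, 0, 5},   // Intro
			{1, 7, 17},  // Permission
			{2, 18, 20}, // is
			{3, 21, 27}, // hereby
			{4, 28, 35}, // granted
		},
		List: []Match{{ID: 0, Start: 1, End: 5}},
	}
	if len(ms.List) == 0 {
		fmt.Println("no matches")
		return
	}
	for _, m := range ms.List {
		fmt.Printf("LRE %d: %q\n", m.ID, matchedText(ms, m)) // LRE 0: "Permission is hereby granted"
	}
}
```

Slicing from the first word's Lo to the last word's Hi keeps the original spacing and internal punctuation, while the trailing period, which lies outside the last word, is excluded.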

type SyntaxError

type SyntaxError struct {
	File    string
	Offset  int
	Context string
	Err     string
}

A SyntaxError reports a syntax error during parsing.

func (*SyntaxError) Error

func (e *SyntaxError) Error() string

Error implements the error interface, so a *SyntaxError can be returned as an error.

type Word

type Word struct {
	ID WordID
	Lo int32 // Word appears at text[Lo:Hi].
	Hi int32
}

A Word represents a single word found in a text.

type WordID

type WordID int32

A WordID is the index of a word in a dictionary.

const AnyWord WordID = -2

AnyWord represents a wildcard matching any word.

const BadWord WordID = -1

BadWord represents a word not present in the dictionary.
