Documentation ¶
Overview ¶
Package match defines matching algorithms and support code for the license checker.
Constants ¶
This section is empty.
Variables ¶
var TraceDFA int
TraceDFA controls whether DFA execution prints debug tracing when stuck. If TraceDFA > 0 and the DFA has followed a path of at least TraceDFA symbols since the last matching state but hits a dead end, it prints out information about the dead end.
Functions ¶
This section is empty.
Types ¶
type Dict ¶
type Dict struct {
// contains filtered or unexported fields
}
A Dict maps words to their integer indexes in a word list; each index has type WordID. The zero Dict is an empty dictionary ready for use.
Lookup and Words are read-only operations, safe for any number of concurrent calls from multiple goroutines. Insert is a write operation; it must not run concurrently with any other call, whether to Insert, Lookup, or Words.
func (*Dict) Insert ¶
Insert adds the word w to the word list, returning its index. If w is already in the word list, it is not added again; Insert returns the existing index.
func (*Dict) InsertSplit ¶
InsertSplit splits text into a sequence of lowercase words, inserting any new words in the dictionary.
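The splitting step can be pictured with a minimal sketch. The helper name splitWords and its word-boundary rule (any rune that is not a letter or digit ends a word) are assumptions for illustration only; the package's own tokenizer may handle punctuation and Unicode differently, and it also records the words in the Dict, which this sketch does not.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitWords is a hypothetical helper, not part of package match.
// It lowercases text and splits it at every rune that is neither a
// letter nor a digit, so punctuation and spacing never end up in a word.
func splitWords(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

func main() {
	for _, w := range splitWords("Permission is hereby GRANTED, free of charge.") {
		fmt.Println(w)
	}
}
```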
func (*Dict) Lookup ¶
Lookup looks for the word w in the word list and returns its index. If w is not in the word list, Lookup returns BadWord.
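The Insert and Lookup contracts described above can be sketched with a toy dictionary. Everything here is a stand-in: the lowercase names dict, wordID and badWord, the map-plus-slice layout, and the value -1 for the "not found" index are assumptions, not the package's implementation.

```go
package main

import "fmt"

// wordID and badWord stand in for the package's WordID and BadWord.
// The underlying type and the value -1 are assumptions.
type wordID int

const badWord wordID = -1

// dict is a hypothetical miniature of Dict. Its zero value is usable
// because insert allocates the map lazily.
type dict struct {
	index map[string]wordID
	words []string
}

// insert returns the existing index for w, adding w only if it is new.
func (d *dict) insert(w string) wordID {
	if id, ok := d.index[w]; ok { // reading from a nil map is safe
		return id
	}
	if d.index == nil {
		d.index = make(map[string]wordID)
	}
	id := wordID(len(d.words))
	d.index[w] = id
	d.words = append(d.words, w)
	return id
}

// lookup never modifies the dictionary; unknown words yield badWord.
func (d *dict) lookup(w string) wordID {
	if id, ok := d.index[w]; ok {
		return id
	}
	return badWord
}

func main() {
	var d dict // zero value, ready for use
	fmt.Println(d.insert("license"), d.insert("mit"), d.insert("license"))
	fmt.Println(d.lookup("mit"), d.lookup("gpl"))
}
```

Because lookup only reads, concurrent lookups are safe in this sketch too, while insert would need external synchronization, which matches the concurrency rules stated for Dict.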
type LRE ¶
type LRE struct {
// contains filtered or unexported fields
}
An LRE is a compiled license regular expression.
TODO: Move this comment somewhere non-internal later.
A license regular expression (LRE) is a pattern syntax intended for describing large English texts such as software licenses, with minor allowed variations. The pattern syntax and the matching are word-based and case-insensitive; punctuation is ignored in the pattern and in the matched text.
The valid LRE patterns are:
    word            - a single case-insensitive word
    __N__           - any sequence of up to N words
    expr1 expr2     - concatenation
    expr1 || expr2  - alternation
    (( expr ))      - grouping
    expr??          - zero or one instances of expr
    //** text **//  - a comment
To make patterns harder to misread in large texts:
- || must only appear inside (( ))
- ?? must only follow (( ))
- (( must be at the start of a line, preceded only by spaces
- )) must be at the end of a line, followed only by spaces and ??.
For example:
    //** https://en.wikipedia.org/wiki/Filler_text **//
    Now is
    ((not))??
    the time for all good
    ((men || women || people))
    to come to the aid of their __1__.
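The two line-placement rules for (( and )) can be checked mechanically. The sketch below is a hypothetical helper, not the package's parser: it looks only at where (( and )) sit on each line, does not check the rules for || and ??, and does not skip comments.

```go
package main

import (
	"fmt"
	"strings"
)

// checkPlacement is a hypothetical helper, not part of package match.
// It enforces only two rules: "((" may be preceded only by spaces, and
// "))" may be followed only by spaces and "??".
func checkPlacement(pattern string) error {
	for i, line := range strings.Split(pattern, "\n") {
		// The last "((" on the line must have nothing but spaces before it.
		if j := strings.LastIndex(line, "(("); j >= 0 && strings.Trim(line[:j], " ") != "" {
			return fmt.Errorf("line %d: (( is not at the start of the line", i+1)
		}
		// The first "))" on the line must have only spaces and ?? after it.
		if j := strings.Index(line, "))"); j >= 0 {
			rest := strings.ReplaceAll(line[j+2:], "??", "")
			if strings.Trim(rest, " ") != "" {
				return fmt.Errorf("line %d: )) is not at the end of the line", i+1)
			}
		}
	}
	return nil
}

func main() {
	good := "Now is\n((not))??\nthe time for all good\n((men || women || people))\nto come to the aid of their __1__."
	fmt.Println(checkPlacement(good))
	fmt.Println(checkPlacement("Now is ((not))?? the time"))
}
```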
type Match ¶
type Match struct {
	ID    int // index of LRE in list passed to NewMultiLRE
	Start int // word index of start of match
	End   int // word index of end of match
}
A Match records the position of a single match in a text.
type Matches ¶
type Matches struct {
	Text  string  // the entire text
	Words []Word  // the text, split into Words
	List  []Match // the matches
}
A Matches is a collection of all leftmost-longest, non-overlapping matches in text.
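To make "leftmost-longest, non-overlapping" concrete, the sketch below selects such a set from a list of candidate matches. It is an illustration under stated assumptions, not the package's algorithm: the match type and the leftmostLongest helper are hypothetical, End is treated as exclusive, and the candidates are assumed to have been found already.

```go
package main

import (
	"fmt"
	"sort"
)

// match mirrors the fields of Match. Treating End as exclusive is an
// assumption made for this sketch.
type match struct {
	ID, Start, End int
}

// leftmostLongest is a hypothetical helper. It orders candidates by
// earliest start, then longest extent, then ID, and keeps each candidate
// that begins at or after the end of the previously kept one.
func leftmostLongest(cands []match) []match {
	sorted := append([]match(nil), cands...) // leave the input untouched
	sort.Slice(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.Start != b.Start {
			return a.Start < b.Start
		}
		if a.End != b.End {
			return a.End > b.End
		}
		return a.ID < b.ID
	})
	var out []match
	next := 0 // first word index not yet covered by a kept match
	for _, m := range sorted {
		if m.Start >= next {
			out = append(out, m)
			next = m.End
		}
	}
	return out
}

func main() {
	cands := []match{{0, 0, 3}, {1, 0, 5}, {2, 4, 8}, {3, 5, 7}, {4, 9, 10}}
	for _, m := range leftmostLongest(cands) {
		fmt.Printf("LRE %d matches words [%d, %d)\n", m.ID, m.Start, m.End)
	}
}
```

In this example, candidate 1 is kept over candidate 0 because both start at word 0 and candidate 1 is longer; candidate 2 is dropped because it overlaps candidate 1.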
type MultiLRE ¶
type MultiLRE struct {
// contains filtered or unexported fields
}
A MultiLRE matches multiple LREs simultaneously against a text. It is more efficient than matching each LRE in sequence against the text.
func NewMultiLRE ¶
NewMultiLRE returns a MultiLRE looking for the given LREs. All the LREs must have been parsed using the same Dict; if not, NewMultiLRE panics.
type SyntaxError ¶
A SyntaxError reports a syntax error during parsing.
func (*SyntaxError) Error ¶
func (e *SyntaxError) Error() string