Documentation ¶
Index ¶
- Constants
- type ConnectionTable
- type Dic
- type DoubleArray
- func (d *DoubleArray) Build(keywords []string) (err error)
- func (d *DoubleArray) BuildWithIds(keywords []string, ids []int) (err error)
- func (d *DoubleArray) CommonPrefixSearchBytes(input []byte) (ids, lens []int)
- func (d *DoubleArray) CommonPrefixSearchString(input string) (ids, lens []int)
- func (d *DoubleArray) FindBytes(input []byte) (id int, ok bool)
- func (d *DoubleArray) FindString(input string) (id int, ok bool)
- func (d *DoubleArray) PrefixSearchBytes(input []byte) (id int, ok bool)
- func (d *DoubleArray) PrefixSearchString(input string) (id int, ok bool)
- type Morph
- type NodeClass
- type Token
- type Tokenizer
- func (t *Tokenizer) Dot(input string, w io.Writer) (tokens []Token)
- func (t *Tokenizer) ExtendedModeTokenize(input string) (tokens []Token)
- func (t *Tokenizer) SearchModeTokenize(input string) (tokens []Token)
- func (t *Tokenizer) SetDic(dic *Dic)
- func (t *Tokenizer) SetUserDic(udic *UserDic)
- func (t *Tokenizer) Tokenize(input string) (tokens []Token)
- type Trie
- type UserDic
- type UserDicContent
Constants ¶
const BosEosId int = -1
BosEosId is the node id reserved for the BOS (beginning of sentence) and EOS (end of sentence) nodes.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type ConnectionTable ¶
ConnectionTable represents a connection matrix of morphs.
func (*ConnectionTable) At ¶
func (ct *ConnectionTable) At(row, col int) int16
At returns the connection cost of matrix[row, col].
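To illustrate what a lookup such as At computes, a connection matrix can be stored as a flat slice and indexed in row-major order. The sketch below is not the package's implementation: the fields of ConnectionTable are not shown here, so the type `connMatrix` and its fields `rows`, `cols` and `costs` are hypothetical names chosen for this example.

```go
package main

import "fmt"

// connMatrix is a hypothetical stand-in for ConnectionTable.
// The costs are stored row by row in a single slice.
type connMatrix struct {
	rows, cols int
	costs      []int16
}

// At returns the cost stored at matrix[row, col].
func (m *connMatrix) At(row, col int) int16 {
	return m.costs[row*m.cols+col]
}

func main() {
	m := &connMatrix{rows: 2, cols: 3, costs: []int16{0, 10, -5, 7, 3, 42}}
	fmt.Println(m.At(1, 2)) // prints 42
}
```

A flat slice keeps the whole matrix in one allocation, which is a common choice for tables that are read often and never resized.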
type Dic ¶
type Dic struct {
	Morphs       []Morph
	Contents     [][]string
	Connection   ConnectionTable
	Index        Trie
	IndexDup     map[int]int
	CharClass    []string
	CharCategory []byte
	InvokeList   []bool
	GroupList    []bool
	UnkMorphs    []Morph
	UnkIndex     map[int]int
	UnkIndexDup  map[int]int
	UnkContents  [][]string
}
Dic represents a dictionary of a tokenizer.
type DoubleArray ¶
type DoubleArray []struct { Base, Check int }
DoubleArray is a trie stored as a double array: each element holds a Base and a Check value.
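The following is a minimal sketch of how a Base/Check array can encode a trie, assuming a common textbook layout rather than the package's actual one. From state `s`, a byte `b` leads to slot `t = Base[s] + code(b)`, and the move is valid only if `Check[t] == s`. The names `node`, `doubleArray`, `insert`, `newDoubleArray` and `find` are hypothetical, and so is the choice to use a keyword's position in sorted order as its id.

```go
package main

import (
	"fmt"
	"sort"
)

// node has the same shape as an element of DoubleArray.
type node struct{ Base, Check int }

// doubleArray is a hypothetical stand-in used only in this sketch.
type doubleArray []node

const free = -1 // Check value of an unused slot

// grow extends the array so that index i exists.
func (d *doubleArray) grow(i int) {
	for len(*d) <= i {
		*d = append(*d, node{Check: free})
	}
}

// insert lays out the children of state s. The keys are sorted, share their
// first depth bytes, and keys[0] has id first. Code 0 marks the end of a
// keyword; a byte b is encoded as int(b)+1.
func (d *doubleArray) insert(keys []string, first, s, depth int) {
	type span struct{ code, lo, hi int }
	var kids []span
	for i, k := range keys {
		code := 0
		if depth < len(k) {
			code = int(k[depth]) + 1
		}
		if n := len(kids); n > 0 && kids[n-1].code == code {
			kids[n-1].hi = i + 1
		} else {
			kids = append(kids, span{code, i, i + 1})
		}
	}
	// Find the smallest base whose child slots are all unused.
	base := 0
search:
	for {
		base++
		for _, k := range kids {
			d.grow(base + k.code)
			if (*d)[base+k.code].Check != free {
				continue search
			}
		}
		break
	}
	(*d)[s].Base = base
	for _, k := range kids {
		(*d)[base+k.code].Check = s // claim every slot before recursing
	}
	for _, k := range kids {
		if t := base + k.code; k.code == 0 {
			(*d)[t].Base = first + k.lo // a terminal slot stores the id
		} else {
			d.insert(keys[k.lo:k.hi], first+k.lo, t, depth+1)
		}
	}
}

// newDoubleArray sorts a copy of the keywords and builds the array.
func newDoubleArray(keywords []string) doubleArray {
	keys := append([]string(nil), keywords...)
	sort.Strings(keys)
	d := doubleArray{{Base: 0, Check: 0}} // slot 0 is the root
	d.insert(keys, 0, 0, 0)
	return d
}

// find walks the array byte by byte and returns the id of an exact match.
func (d doubleArray) find(key string) (int, bool) {
	s := 0
	for i := 0; i <= len(key); i++ {
		code := 0 // end-of-keyword code after the last byte
		if i < len(key) {
			code = int(key[i]) + 1
		}
		t := d[s].Base + code
		if t >= len(d) || d[t].Check != s {
			return 0, false
		}
		s = t
	}
	return d[s].Base, true
}

func main() {
	da := newDoubleArray([]string{"b", "ab", "a"})
	fmt.Println(da.find("ab")) // prints 1 true
}
```

Each lookup step is one addition and one comparison, which is why double arrays are popular for dictionary indexes that are built once and queried many times.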
func (*DoubleArray) Build ¶
func (d *DoubleArray) Build(keywords []string) (err error)
Build constructs a double array from given keywords.
func (*DoubleArray) BuildWithIds ¶
func (d *DoubleArray) BuildWithIds(keywords []string, ids []int) (err error)
BuildWithIds constructs a double array from given keywords and ids.
func (*DoubleArray) CommonPrefixSearchBytes ¶
func (d *DoubleArray) CommonPrefixSearchBytes(input []byte) (ids, lens []int)
CommonPrefixSearchBytes finds every keyword that is a prefix of the input and returns their ids and lengths, if any are found.
func (*DoubleArray) CommonPrefixSearchString ¶
func (d *DoubleArray) CommonPrefixSearchString(input string) (ids, lens []int)
CommonPrefixSearchString finds every keyword that is a prefix of the input and returns their ids and lengths, if any are found.
func (*DoubleArray) FindBytes ¶
func (d *DoubleArray) FindBytes(input []byte) (id int, ok bool)
FindBytes searches the trie for the given keyword and returns its id if found.
func (*DoubleArray) FindString ¶
func (d *DoubleArray) FindString(input string) (id int, ok bool)
FindString searches the trie for the given keyword and returns its id if found.
func (*DoubleArray) PrefixSearchBytes ¶
func (d *DoubleArray) PrefixSearchBytes(input []byte) (id int, ok bool)
PrefixSearchBytes returns the id of the longest keyword that is a prefix of the input, if found.
func (*DoubleArray) PrefixSearchString ¶
func (d *DoubleArray) PrefixSearchString(input string) (id int, ok bool)
PrefixSearchString returns the id of the longest keyword that is a prefix of the input, if found.
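To make the difference from an exact lookup concrete, the sketch below shows longest-prefix matching over a plain map. It only illustrates the semantics described above and does not use the package; `longestPrefix` and the sample keywords are hypothetical.

```go
package main

import "fmt"

// longestPrefix is a hypothetical helper, not part of the package. It returns
// the id of the longest keyword that is a prefix of input.
func longestPrefix(keywords map[string]int, input string) (id int, ok bool) {
	// Try the longest candidate first and shrink it one byte at a time.
	// A cut in the middle of a UTF-8 sequence simply matches no keyword.
	for n := len(input); n > 0; n-- {
		if id, ok = keywords[input[:n]]; ok {
			return id, true
		}
	}
	return 0, false
}

func main() {
	keywords := map[string]int{"東京": 0, "東京都": 1, "京都": 2}
	fmt.Println(longestPrefix(keywords, "東京都庁")) // prints 1 true
	fmt.Println(longestPrefix(keywords, "大阪"))   // prints 0 false
}
```

Both "東京" and "東京都" are prefixes of the first input, and only the longer one is reported. A trie answers the same question in a single pass over the input instead of one map lookup per candidate length.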
type Morph ¶
type Morph struct {
LeftId, RightId, Weight int16
}
Morph represents the parts of speech and the occurrence cost of a morph.
type Token ¶
type Token struct {
	Id      int
	Class   NodeClass
	Start   int
	End     int
	Surface string
	// contains filtered or unexported fields
}
Token represents a morph of a sentence.
type Tokenizer ¶
type Tokenizer struct {
// contains filtered or unexported fields
}
Tokenizer represents a morphological analyzer.
func NewThreadsafeTokenizer ¶ added in v0.3.0
func NewThreadsafeTokenizer() (t *Tokenizer)
NewThreadsafeTokenizer creates a thread-safe tokenizer.
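func (*Tokenizer) ExtendedModeTokenize ¶ added in v0.3.0
func (t *Tokenizer) ExtendedModeTokenize(input string) (tokens []Token)
ExtendedModeTokenize returns the morphs of a sentence, analyzed in extended mode.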
func (*Tokenizer) SearchModeTokenize ¶ added in v0.3.0
func (t *Tokenizer) SearchModeTokenize(input string) (tokens []Token)
SearchModeTokenize returns the morphs of a sentence, analyzed in search mode.
func (*Tokenizer) SetUserDic ¶
func (t *Tokenizer) SetUserDic(udic *UserDic)
SetUserDic sets the user dictionary to udic.
type Trie ¶
type Trie interface {
	FindString(string) (id int, ok bool)               // search a dictionary by a keyword.
	CommonPrefixSearchString(string) (ids, lens []int) // finds keywords sharing common prefix in a dictionary.
}
Any type that implements the Trie interface may be used as a dictionary.
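As a rough illustration of that flexibility, the sketch below satisfies the interface with a plain map instead of a double array. The interface is copied locally so that the example compiles on its own. `mapTrie` is a hypothetical type, and reporting lengths in bytes is an assumption of this sketch, not a documented guarantee of the package.

```go
package main

import "fmt"

// Trie is copied from the interface above so that the sketch is self-contained.
type Trie interface {
	FindString(string) (id int, ok bool)
	CommonPrefixSearchString(string) (ids, lens []int)
}

// mapTrie is a hypothetical map-backed dictionary index.
type mapTrie map[string]int

// FindString reports the id of an exact keyword match.
func (m mapTrie) FindString(s string) (id int, ok bool) {
	id, ok = m[s]
	return id, ok
}

// CommonPrefixSearchString reports every keyword that is a prefix of s,
// shortest first, together with its length in bytes.
func (m mapTrie) CommonPrefixSearchString(s string) (ids, lens []int) {
	for n := 1; n <= len(s); n++ {
		if id, ok := m[s[:n]]; ok {
			ids = append(ids, id)
			lens = append(lens, n)
		}
	}
	return ids, lens
}

// Compile-time check that mapTrie satisfies Trie.
var _ Trie = mapTrie(nil)

func main() {
	var index Trie = mapTrie{"東京": 0, "東京都": 1, "京都": 2}
	fmt.Println(index.FindString("京都"))                 // prints 2 true
	fmt.Println(index.CommonPrefixSearchString("東京都庁")) // prints [0 1] [6 9]
}
```

A map costs one lookup per prefix length, so it is slower than a real trie on long inputs, but it can be handy as a test double for code that only depends on the interface.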
type UserDic ¶
type UserDic struct {
	Index    Trie
	Contents []UserDicContent
}
UserDic represents a user dictionary.
func NewUserDic ¶
NewUserDic builds a user dictionary from a file.
type UserDicContent ¶
UserDicContent represents contents of a word in a user dictionary.