Documentation ¶
Overview ¶
Package dic implements the dictionary of the morph analyzer.
Index ¶
Constants ¶
const (
	// IPADicPath represents the internal IPA dictionary path.
	IPADicPath = "dic/ipa/ipa.dic"
	// UniDicPath represents the internal UniDic dictionary path.
	UniDicPath = "dic/uni/uni.dic"
)
const UserDicColumnSize = 4
UserDicColumnSize is the column size of the user dictionary.
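To illustrate what a fixed column size implies for a record, here is a minimal standalone sketch. It assumes a comma-separated text format, which this page does not state; `splitRecord` and the lowercase `userDicColumnSize` are hypothetical names for this example and are not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// userDicColumnSize mirrors the documented UserDicColumnSize constant.
const userDicColumnSize = 4

// splitRecord is a hypothetical helper. It splits one comma-separated line
// (the separator is an assumption) and rejects any record that does not
// have exactly userDicColumnSize columns.
func splitRecord(line string) ([]string, error) {
	cols := strings.Split(line, ",")
	if len(cols) != userDicColumnSize {
		return nil, fmt.Errorf("invalid record: got %d columns, want %d",
			len(cols), userDicColumnSize)
	}
	return cols, nil
}

func main() {
	cols, err := splitRecord("c1,c2,c3,c4")
	fmt.Println(cols, err)

	_, err = splitRecord("c1,c2")
	fmt.Println(err)
}
```

Rejecting malformed records early keeps every later stage free to index columns without bounds checks.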
Variables ¶
This section is empty.
Functions ¶
func NewContents ¶ added in v1.3.0
NewContents creates dictionary contents from a byte slice.
Types ¶
type ConnectionTable ¶
ConnectionTable represents a connection matrix of morphs.
func LoadConnectionTable ¶
func LoadConnectionTable(r io.Reader) (ConnectionTable, error)
LoadConnectionTable loads ConnectionTable from io.Reader.
func (*ConnectionTable) At ¶
func (t *ConnectionTable) At(row, col int) int16
At returns the connection cost of matrix[row, col].
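As an illustration of a lookup by row and column, the sketch below stores a cost matrix in a flat slice. The row-major layout and the `matrix` type with its field names are assumptions made for this example; the page does not describe how ConnectionTable is stored.

```go
package main

import "fmt"

// matrix is a hypothetical stand-in for a connection table: a flat,
// row-major slice of costs plus its dimensions.
type matrix struct {
	rows, cols int
	vec        []int16
}

// at returns the cost stored at [row, col] using row-major indexing.
func (m *matrix) at(row, col int) int16 {
	return m.vec[row*m.cols+col]
}

func main() {
	m := &matrix{rows: 2, cols: 3, vec: []int16{0, 10, 20, 30, 40, 50}}
	// Row 1, column 2 maps to flat index 1*3+2 = 5.
	fmt.Println(m.at(1, 2)) // prints 50
}
```

A single flat slice keeps the whole matrix contiguous in memory, which suits a table that is read very often and never resized.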
type Dic ¶
type Dic struct {
	Morphs       []Morph
	POSTable     POSTable
	Contents     [][]string
	Connection   ConnectionTable
	Index        IndexTable
	CharClass    []string
	CharCategory []byte
	InvokeList   []bool
	GroupList    []bool

	UnkDic
}
Dic represents a dictionary of a tokenizer.
func LoadSimple ¶ added in v1.7.1
LoadSimple loads a dictionary from a file without contents.
func SysDicIPASimple ¶ added in v1.7.0
func SysDicIPASimple() *Dic
SysDicIPASimple returns the IPA system dictionary without contents.
func SysDicSimple ¶ added in v1.7.0
func SysDicSimple() *Dic
SysDicSimple returns the kagome system dictionary without contents.
func SysDicUni ¶ added in v1.3.0
func SysDicUni() *Dic
SysDicUni returns the UniDic system dictionary.
func SysDicUniSimple ¶ added in v1.7.0
func SysDicUniSimple() *Dic
SysDicUniSimple returns the UniDic system dictionary without contents.
func (Dic) CharacterCategory ¶ added in v1.4.0
CharacterCategory returns the category of a rune.
type IndexTable ¶
type IndexTable struct {
	Da  da.DoubleArray
	Dup map[int32]int32
}
IndexTable represents a dictionary index.
func BuildIndexTable ¶
func BuildIndexTable(sortedKeywords []string) (IndexTable, error)
BuildIndexTable constructs an index table from sorted keywords.
func ReadIndexTable ¶
func ReadIndexTable(r io.Reader) (IndexTable, error)
ReadIndexTable loads an index table.
func (IndexTable) CommonPrefixSearch ¶
func (idx IndexTable) CommonPrefixSearch(input string) (lens []int, ids [][]int)
CommonPrefixSearch finds the keywords that are prefixes of the input and returns their lengths and ids, if any are found.
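The behaviour of a common prefix search can be sketched without a double array. The example below is a naive map-backed stand-in, not the package's implementation. `toyIndex` and its functions are hypothetical. Two assumptions are made here: each keyword takes its position in the sorted input as its id, and match lengths are reported in bytes.

```go
package main

import "fmt"

// toyIndex is a hypothetical map-backed index. Keyword i of the sorted
// input receives id i, so a duplicated keyword maps to several ids.
type toyIndex map[string][]int

func buildToyIndex(sortedKeywords []string) toyIndex {
	idx := toyIndex{}
	for id, kw := range sortedKeywords {
		idx[kw] = append(idx[kw], id)
	}
	return idx
}

// commonPrefixSearch reports every keyword that is a prefix of input,
// returning the matched byte lengths and the ids for each match.
func (idx toyIndex) commonPrefixSearch(input string) (lens []int, ids [][]int) {
	check := func(end int) {
		if found, ok := idx[input[:end]]; ok {
			lens = append(lens, end)
			ids = append(ids, found)
		}
	}
	// Ranging over a string yields the byte offset of each rune start,
	// so prefixes are only ever cut on rune boundaries.
	for i := range input {
		if i > 0 {
			check(i)
		}
	}
	if len(input) > 0 {
		check(len(input))
	}
	return lens, ids
}

func main() {
	idx := buildToyIndex([]string{"東京", "東京", "東京都", "都"})
	lens, ids := idx.commonPrefixSearch("東京都庁")
	fmt.Println(lens, ids) // prints [6 9] [[0 1] [2]]
}
```

The map version probes every prefix separately. A trie-based index can instead collect all matches in one left-to-right pass over the input.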
func (IndexTable) CommonPrefixSearchCallback ¶ added in v1.5.1
func (idx IndexTable) CommonPrefixSearchCallback(input string, callback func(id, l int))
CommonPrefixSearchCallback finds the keywords that are prefixes of the input and invokes the callback with the id and length of each match.
func (IndexTable) Search ¶
func (idx IndexTable) Search(input string) []int
Search finds the given keyword and returns its ids if found.
type Morph ¶
type Morph struct {
	LeftID, RightID, Weight int16
}
Morph represents parts of speech and an occurrence cost.
type POSMap ¶ added in v1.7.0
POSMap represents a part-of-speech control table.
type POSTable ¶ added in v1.7.0
POSTable represents a table for managing parts of speech.
func ReadPOSTable ¶ added in v1.7.0
ReadPOSTable loads a POS table.
type Trie ¶
type Trie interface {
	Search(input string) []int32
	PrefixSearch(input string) (length int, output []int32)
	CommonPrefixSearch(input string) (lens []int, outputs [][]int32)
	CommonPrefixSearchCallback(input string, callback func(id, l int))
}
Trie is an interface representing retrieval ability.
type UnkDic ¶ added in v1.7.1
type UnkDic struct {
	UnkMorphs   []Morph
	UnkIndex    map[int32]int32
	UnkIndexDup map[int32]int32
	UnkContents [][]string
}
UnkDic represents an unknown word dictionary part.
func ReadUnkDic ¶ added in v1.7.1
ReadUnkDic loads an unknown word dictionary.
type UserDic ¶
type UserDic struct {
	Index    IndexTable
	Contents []UserDicContent
}
UserDic represents a user dictionary.
func NewUserDic ¶
NewUserDic builds a user dictionary from a file.
type UserDicContent ¶
UserDicContent represents the contents of a word in a user dictionary.