Documentation
Overview
Package samurai provides functions for splitting a token using the Samurai algorithm, along with the context information and frequency tables the algorithm requires.
Constants
This section is empty.
Variables
var Separator string = " "
Separator specifies the separator string currently in use. It defaults to a single space.
Functions
Types
type FrequencyTable
type FrequencyTable struct {
// contains filtered or unexported fields
}
FrequencyTable is a lookup table that stores the number of occurrences of each unique string in a set of strings.
func NewFrequencyTable
func NewFrequencyTable() *FrequencyTable
NewFrequencyTable creates and initializes an empty frequency table.
func (FrequencyTable) Frequency
func (f FrequencyTable) Frequency(token string) float64
Frequency determines how frequently a token occurs in a set of strings.
func (*FrequencyTable) SetOccurrences
func (f *FrequencyTable) SetOccurrences(token string, occurrences int) error
SetOccurrences sets how many times a token appeared in a set of strings.
func (FrequencyTable) TotalOccurrences
func (f FrequencyTable) TotalOccurrences() int
TotalOccurrences returns the total number of occurrences recorded in the frequency table.
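The FrequencyTable fields are unexported, so its exact behavior is not visible here. The following is a minimal, self-contained sketch of how such a table might work, not the package's implementation. The names freqTable, newFreqTable, setOccurrences, totalOccurrences, and frequency are hypothetical stand-ins. It assumes that Frequency is a relative frequency (a token's count divided by the total) and that SetOccurrences returns an error for a negative count; neither assumption is confirmed by the documentation above.

```go
package main

import (
	"errors"
	"fmt"
)

// freqTable is a hypothetical stand-in for FrequencyTable: it maps each
// token to its occurrence count and keeps a running total.
type freqTable struct {
	occurrences map[string]int
	total       int
}

// newFreqTable creates an empty table.
func newFreqTable() *freqTable {
	return &freqTable{occurrences: make(map[string]int)}
}

// setOccurrences records the count for token, replacing any previous count.
// Rejecting negative counts is an assumption about why an error is returned.
func (f *freqTable) setOccurrences(token string, n int) error {
	if n < 0 {
		return errors.New("occurrences must not be negative")
	}
	f.total += n - f.occurrences[token]
	f.occurrences[token] = n
	return nil
}

// totalOccurrences returns the sum of all recorded counts.
func (f *freqTable) totalOccurrences() int {
	return f.total
}

// frequency returns the token's share of all occurrences (assumed to be a
// relative frequency), or 0 when the table is empty.
func (f *freqTable) frequency(token string) float64 {
	if f.total == 0 {
		return 0
	}
	return float64(f.occurrences[token]) / float64(f.total)
}

func main() {
	ft := newFreqTable()
	_ = ft.setOccurrences("http", 3)
	_ = ft.setOccurrences("client", 1)

	fmt.Println(ft.totalOccurrences()) // 4
	fmt.Println(ft.frequency("http"))  // 0.75
}
```

Keeping a running total makes the total and the frequency lookups constant-time, at the cost of adjusting the total whenever a count is overwritten.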
type TokenContext
type TokenContext struct {
// contains filtered or unexported fields
}
TokenContext holds the local and global frequency tables in the context of a given token.
func NewTokenContext
func NewTokenContext(local *FrequencyTable, global *FrequencyTable) TokenContext
NewTokenContext creates a context for the token, setting the local and global frequency tables.
func (TokenContext) Score
func (ctx TokenContext) Score(word string) float64
Score calculates the score for a word from how frequently it appears in the program under analysis (the local frequency table) and in a larger set of programs (the global frequency table).
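The documentation does not state how the two frequencies are combined. As an illustration only, the sketch below uses the scoring formula usually attributed to the Samurai algorithm: the local frequency plus the global frequency divided by the base-10 logarithm of the total number of local occurrences. Whether this package computes Score exactly this way is an assumption. The function name score, its parameters, and the guard for totals of 1 or less are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// score combines a word's local frequency with its global frequency,
// damping the global part by log10 of the total local occurrences.
// This mirrors the commonly cited Samurai scoring formula; it is a
// sketch, not the package's confirmed implementation.
func score(localFreq, globalFreq float64, localTotal int) float64 {
	// Assumption: when the local total is 1 or less, log10 is zero or
	// undefined, so fall back to the local frequency alone.
	if localTotal <= 1 {
		return localFreq
	}
	return localFreq + globalFreq/math.Log10(float64(localTotal))
}

func main() {
	// 5 + 200/log10(100), which is approximately 105.
	fmt.Println(score(5, 200, 100))
}
```

Dividing by the logarithm of the local total means the global table carries less weight as the program under analysis supplies more evidence of its own.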