samurai

package
v0.0.0-...-a6cba19 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 14, 2020 License: MIT Imports: 6 Imported by: 3

Documentation

Overview

Package samurai provides functions to split a token using the Samurai algorithm, along with the context information and frequency tables the algorithm requires.

Constants

This section is empty.

Variables

var Separator string = " "

Separator is the string placed between the words returned by Split. It defaults to a single space.

Functions

func Split

func Split(token string, tCtx TokenContext, prefixes lists.List, suffixes lists.List) string

Split receives a token and returns a string of hard and soft words, separated by Separator, as produced by the Samurai algorithm proposed by Hill et al.
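To make the shape of the result concrete, here is a minimal, self-contained sketch of only the deterministic first stage of a Samurai-style split: breaking a token at non-letter characters and at lowercase-to-uppercase transitions, then joining the pieces with a separator. It is not this package's implementation. The names splitHard and separator are hypothetical, the sketch discards digits and special characters (an assumption), and it omits the frequency-based mixed-case and same-case splitting that relies on TokenContext and the prefix and suffix lists.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// separator plays the role of the package-level Separator variable
// (hypothetical name, used only in this sketch).
var separator = " "

// splitHard is a hypothetical helper. It breaks a token at non-letter
// characters and at lowercase-to-uppercase transitions, then joins the
// resulting words with separator. Non-letter characters are dropped.
func splitHard(token string) string {
	var words []string
	var current []rune

	// flush stores the word collected so far, if any.
	flush := func() {
		if len(current) > 0 {
			words = append(words, string(current))
			current = current[:0]
		}
	}

	var prev rune
	for _, r := range token {
		switch {
		case !unicode.IsLetter(r):
			// Underscores, digits and other symbols end the current word.
			flush()
		case unicode.IsUpper(r) && unicode.IsLower(prev):
			// A camelCase boundary starts a new word.
			flush()
			current = append(current, r)
		default:
			current = append(current, r)
		}
		prev = r
	}
	flush()

	return strings.Join(words, separator)
}

func main() {
	fmt.Println(splitHard("get_maxValue"))
}
```

A token written in a single case, such as "getmaxvalue", passes through this stage unchanged. That case is where the frequency tables described below come into play.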

Types

type FrequencyTable

type FrequencyTable struct {
	// contains filtered or unexported fields
}

FrequencyTable is a lookup table that stores the number of occurrences of each unique string in a set of strings.

func NewFrequencyTable

func NewFrequencyTable() *FrequencyTable

NewFrequencyTable creates and initializes an empty frequency table.

func (FrequencyTable) Frequency

func (f FrequencyTable) Frequency(token string) float64

Frequency determines how frequently a token occurs in a set of strings.

func (*FrequencyTable) SetOccurrences

func (f *FrequencyTable) SetOccurrences(token string, occurrences int) error

SetOccurrences sets how many times a token appeared in a set of strings.

func (FrequencyTable) TotalOccurrences

func (f FrequencyTable) TotalOccurrences() int

TotalOccurrences returns the total number of occurrences recorded in the frequency table.

type TokenContext

type TokenContext struct {
	// contains filtered or unexported fields
}

TokenContext holds the local and global frequency tables in the context of a given token.

func NewTokenContext

func NewTokenContext(local *FrequencyTable, global *FrequencyTable) TokenContext

NewTokenContext creates a context for the token, setting the local and global frequency tables.

func (TokenContext) Score

func (ctx TokenContext) Score(word string) float64

Score calculates the score for a string based on how frequently a word appears in the program under analysis and in a more global scope of a large set of programs.
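As a rough illustration of how a local and a global frequency can be combined, the sketch below uses the scoring formula as it is usually stated for the Samurai algorithm: the word's frequency in the program, plus its global frequency divided by the base-10 logarithm of the total number of string occurrences in the program. Whether this package computes Score in exactly this way is an assumption. The function score and its parameters are hypothetical, and the guard for totals of 1 or less is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// score is a hypothetical helper combining the two frequencies:
//
//	score = localFreq + globalFreq / log10(localTotal)
//
// localFreq is how often the word occurs in the program under analysis,
// globalFreq is how often it occurs across a large set of programs, and
// localTotal is the total number of occurrences in the local table.
func score(localFreq, globalFreq float64, localTotal int) float64 {
	// log10 is zero or undefined for totals of 1 or less, so this sketch
	// falls back to the local frequency alone (an assumption).
	if localTotal <= 1 {
		return localFreq
	}
	return localFreq + globalFreq/math.Log10(float64(localTotal))
}

func main() {
	// A word seen 2 times locally and 300 times globally, in a program
	// with 1000 string occurrences overall.
	fmt.Printf("%.2f\n", score(2, 300, 1000))
}
```

Dividing by the logarithm of the local total damps the global contribution as the program grows, so that local evidence weighs more in large code bases.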
