wordfreq
Word frequency calculation for text corpora in Go.
Supports Chinese and English.
This work is a derivative of wordfreq by Timothy Guan-tin Chien.
Install
With a correctly configured Go toolchain:
go get -u github.com/twsiyuan/wordfreq
Simple Example
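package main

import (
	"fmt"
	"github.com/twsiyuan/wordfreq"
)

func main() {
	wfreq, err := wordfreq.New(wordfreq.Options{}) // empty Options; defaults are listed below
	if err != nil {
		panic(err)
	}
	tlist := wfreq.Process("text") // term list
	fmt.Println(tlist)
}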
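The term list returned by Process is computed inside the library, and its exact type is not documented here. The stdlib-only sketch below gives a rough idea of what English term counting involves. It lowercases words, skips stop words case-insensitively, and drops terms below a minimum count, which loosely mirrors the StopWords and MinimumCount options described below. The Term type and the countEnglish function are hypothetical and are not part of wordfreq's API. Splitting on non-letter characters is a simplifying assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Term is a hypothetical (word, count) pair; wordfreq's own term-list type may differ.
type Term struct {
	Word  string
	Count int
}

// countEnglish (hypothetical) splits text on non-letter runes, lowercases each word,
// skips stop words (case-insensitive), drops words seen fewer than minCount times,
// and returns the rest sorted by descending count, then alphabetically.
func countEnglish(text string, stopWords []string, minCount int) []Term {
	stop := make(map[string]bool, len(stopWords))
	for _, w := range stopWords {
		stop[strings.ToLower(w)] = true
	}
	counts := make(map[string]int)
	words := strings.FieldsFunc(text, func(r rune) bool { return !unicode.IsLetter(r) })
	for _, w := range words {
		w = strings.ToLower(w)
		if !stop[w] {
			counts[w]++
		}
	}
	var terms []Term
	for w, c := range counts {
		if c >= minCount {
			terms = append(terms, Term{w, c})
		}
	}
	sort.Slice(terms, func(i, j int) bool {
		if terms[i].Count != terms[j].Count {
			return terms[i].Count > terms[j].Count
		}
		return terms[i].Word < terms[j].Word
	})
	return terms
}

func main() {
	text := "The cat sat on the mat. The cat ran; a dog sat."
	fmt.Println(countEnglish(text, []string{"the", "a", "on"}, 2)) // [{cat 2} {sat 2}]
}
```

In this example, "mat", "ran", and "dog" each appear only once. They therefore fall below the minimum count of 2, so only "cat" and "sat" remain.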
Available options in wordfreq.Options:
Languages: List of keywords selecting the languages to process. Available keywords: chinese, english. Defaults to both.
StopWordSets: List of keywords selecting the built-in stop-word sets to exclude from the count. Available: cjk, english1, english2. Defaults to all.
StopWords: List of additional words or phrases to exclude from the count. Case-insensitive. Defaults to empty.
MinimumCount: Minimum number of occurrences a term needs to be included in the returned list. Defaults to 2.
NoFilterSubstring: (Chinese only) If true, substrings that are recounted as part of a longer phrase are kept instead of being filtered out. Defaults to false.
MaxiumPhraseLength: (Chinese only) Maximum length of a phrase to consider. Defaults to 8.
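Chinese text has no spaces between words, so the Chinese-only options deal with phrase length and substrings. The sketch below shows one plausible approach, which is an assumption and is not taken from the library. It counts every substring of 2 to maxLen characters within runs of Han characters and keeps those that reach minCount. It then drops a phrase when a longer kept phrase containing it has the same count. The countChinese function and its parameters are hypothetical. They loosely correspond to MaxiumPhraseLength, MinimumCount, and NoFilterSubstring.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// countChinese is a hypothetical sketch, not wordfreq's implementation.
// It counts every substring of 2..maxLen runes inside runs of Han characters,
// drops substrings seen fewer than minCount times, and, unless noFilterSubstring
// is true, drops any phrase whose count equals that of a longer kept phrase containing it.
func countChinese(text string, maxLen, minCount int, noFilterSubstring bool) map[string]int {
	counts := make(map[string]int)
	// Punctuation and non-Han characters split the text into runs.
	runs := strings.FieldsFunc(text, func(r rune) bool { return !unicode.Is(unicode.Han, r) })
	for _, run := range runs {
		rs := []rune(run)
		for i := range rs {
			for n := 2; n <= maxLen && i+n <= len(rs); n++ {
				counts[string(rs[i:i+n])]++
			}
		}
	}
	for p, c := range counts {
		if c < minCount {
			delete(counts, p) // deleting during range is allowed in Go
		}
	}
	if noFilterSubstring {
		return counts
	}
	// Collect redundant substrings first so deletions don't affect comparisons.
	var redundant []string
	for p, c := range counts {
		for q, qc := range counts {
			if len(q) > len(p) && qc == c && strings.Contains(q, p) {
				redundant = append(redundant, p)
				break
			}
		}
	}
	for _, p := range redundant {
		delete(counts, p)
	}
	return counts
}

func main() {
	text := "我喜欢苹果，你喜欢苹果吗"
	fmt.Println(countChinese(text, 8, 2, false))     // map[喜欢苹果:2]
	fmt.Println(len(countChinese(text, 8, 2, true))) // 6
}
```

In this example, 喜欢, 苹果, and the other pieces of 喜欢苹果 appear only inside that longer phrase, so filtering keeps only 喜欢苹果. Passing true for noFilterSubstring keeps all six. This naive method can also favor repeated strings that span word boundaries.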