idf

package
v0.67.0
Published: May 6, 2021 License: Apache-2.0 Imports: 6 Imported by: 2

Documentation

Constants

This section is empty.

Variables

var StopWordMap = map[string]bool{
	"the":   true,
	"of":    true,
	"is":    true,
	"and":   true,
	"to":    true,
	"in":    true,
	"that":  true,
	"we":    true,
	"for":   true,
	"an":    true,
	"are":   true,
	"by":    true,
	"be":    true,
	"as":    true,
	"on":    true,
	"with":  true,
	"can":   true,
	"if":    true,
	"from":  true,
	"which": true,
	"you":   true,
	"it":    true,
	"this":  true,
	"then":  true,
	"at":    true,
	"have":  true,
	"all":   true,
	"not":   true,
	"one":   true,
	"has":   true,
	"or":    true,
}

StopWordMap contains the default stop words.
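
A map[string]bool of this shape makes stop-word filtering a single lookup per token. The following is a minimal sketch using only the standard library; the stopWords variable and the filterStopWords function are hypothetical names for illustration and are not part of this package, and lowercasing tokens before lookup is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWords is a small hypothetical set with the same shape as StopWordMap.
var stopWords = map[string]bool{
	"the": true,
	"of":  true,
	"is":  true,
	"and": true,
}

// filterStopWords returns the tokens that are not in the stop-word set.
// Tokens are lowercased before lookup; that normalization is an assumption.
func filterStopWords(tokens []string) []string {
	kept := make([]string, 0, len(tokens))
	for _, tok := range tokens {
		if stopWords[strings.ToLower(tok)] {
			continue
		}
		kept = append(kept, tok)
	}
	return kept
}

func main() {
	tokens := strings.Fields("The frequency of the word is high")
	fmt.Println(filterStopWords(tokens))
}
```

A missing key reads as false from the map, so no separate existence check is needed.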

Functions

This section is empty.

Types

type Idf

type Idf struct {
	// contains filtered or unexported fields
}

Idf represents a dictionary of words and their IDF (inverse document frequency) values.

func NewIdf

func NewIdf() *Idf

NewIdf creates a new Idf instance.

func (*Idf) AddToken

func (i *Idf) AddToken(text string, frequency float64, pos ...string)

AddToken adds a new word and its IDF to the dictionary.

func (*Idf) Frequency

func (i *Idf) Frequency(key string) (float64, bool)

Frequency returns the IDF of the given word.

func (*Idf) LoadDict

func (i *Idf) LoadDict(files ...string) error

LoadDict loads the IDF dictionary from the given files.
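
To illustrate what an IDF dictionary holds, here is a rough sketch that derives IDF values from a small tokenized corpus and looks them up with a (value, ok) result, the same shape that Frequency returns. The idfDict type and the buildIDF and frequency functions are hypothetical, and the formula ln(N / df) is one common definition of IDF, assumed here; the dictionaries this package loads may have been computed differently.

```go
package main

import (
	"fmt"
	"math"
)

// idfDict is a hypothetical stand-in for an IDF dictionary: word -> IDF.
type idfDict map[string]float64

// buildIDF computes idf(w) = ln(N / df(w)) over tokenized documents, where
// N is the number of documents and df(w) is the number that contain w.
func buildIDF(docs [][]string) idfDict {
	df := make(map[string]int)
	for _, doc := range docs {
		seen := make(map[string]bool)
		for _, w := range doc {
			if !seen[w] {
				seen[w] = true
				df[w]++
			}
		}
	}
	dict := make(idfDict, len(df))
	n := float64(len(docs))
	for w, c := range df {
		dict[w] = math.Log(n / float64(c))
	}
	return dict
}

// frequency returns the stored IDF and whether the word was found.
func (d idfDict) frequency(word string) (float64, bool) {
	v, ok := d[word]
	return v, ok
}

func main() {
	docs := [][]string{
		{"go", "segment", "text"},
		{"go", "keyword"},
		{"go", "text", "rank"},
	}
	dict := buildIDF(docs)
	for _, w := range []string{"go", "text", "missing"} {
		v, ok := dict.frequency(w)
		fmt.Printf("%s: %.3f %v\n", w, v, ok)
	}
}
```

A word that appears in every document gets an IDF of zero, which is why very common words contribute little to keyword scores.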

type Segment

type Segment struct {
	// contains filtered or unexported fields
}

Segment represents a word with weight.

func (Segment) Text

func (s Segment) Text() string

Text returns the segment's text.

func (Segment) Weight

func (s Segment) Weight() float64

Weight returns the segment's weight.

type Segments

type Segments []Segment

Segments represents a slice of Segment.

func (Segments) Len

func (ss Segments) Len() int

func (Segments) Less

func (ss Segments) Less(i, j int) bool

func (Segments) Swap

func (ss Segments) Swap(i, j int)
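
Len, Less and Swap together satisfy sort.Interface, so a Segments value can be passed to the standard sort package. The sketch below shows the pattern with a hypothetical local type; the weighted and byWeight types and the sortDescending function are not part of this package, and the ordering used in Less (ascending weight, ties broken by text) is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// weighted is a hypothetical stand-in for Segment.
type weighted struct {
	text   string
	weight float64
}

// byWeight implements sort.Interface over a slice of weighted values.
type byWeight []weighted

func (b byWeight) Len() int { return len(b) }

// Less orders by ascending weight and breaks ties by text (assumed ordering).
func (b byWeight) Less(i, j int) bool {
	if b[i].weight == b[j].weight {
		return b[i].text < b[j].text
	}
	return b[i].weight < b[j].weight
}

func (b byWeight) Swap(i, j int) { b[i], b[j] = b[j], b[i] }

// sortDescending sorts items in place from highest to lowest weight.
func sortDescending(items []weighted) {
	sort.Sort(sort.Reverse(byWeight(items)))
}

func main() {
	items := []weighted{
		{text: "text", weight: 0.4},
		{text: "rank", weight: 0.9},
		{text: "go", weight: 0.1},
	}
	sortDescending(items)
	for _, it := range items {
		fmt.Println(it.text, it.weight)
	}
}
```

Wrapping the value in sort.Reverse gives highest-weight-first order without a second Less implementation.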

type StopWord

type StopWord struct {
	// contains filtered or unexported fields
}

StopWord is a dictionary for all stop words.

func NewStopWord

func NewStopWord() *StopWord

NewStopWord creates a new StopWord with the default stop words.

func (*StopWord) AddStop (added in v0.63.0)

func (s *StopWord) AddStop(text string)

AddStop adds a token to the StopWord dictionary.

func (*StopWord) IsStopWord

func (s *StopWord) IsStopWord(word string) bool

IsStopWord reports whether the given word is a stop word.

func (*StopWord) LoadDict

func (s *StopWord) LoadDict(files ...string) error

LoadDict loads the stop word dictionary from the given files.

func (*StopWord) RemoveStop (added in v0.63.0)

func (s *StopWord) RemoveStop(text string)

RemoveStop removes a token from the StopWord dictionary.

type TagExtracter

type TagExtracter struct {
	Idf *Idf
	// contains filtered or unexported fields
}

TagExtracter extracts tags from a sentence.

func (*TagExtracter) ExtractTags

func (t *TagExtracter) ExtractTags(sentence string, topK int) (tags Segments)

ExtractTags extracts the topK keywords from the sentence.
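
As a rough illustration of IDF-based keyword extraction, the sketch below scores each distinct token by term frequency times IDF and keeps the topK highest. It is a standard-library sketch under stated assumptions, not this package's implementation: the scored type, the topKByTFIDF function and the fallback parameter (the IDF used for words missing from the dictionary) are all hypothetical, and the input is assumed to be already segmented and stop-word filtered.

```go
package main

import (
	"fmt"
	"sort"
)

// scored is a hypothetical stand-in for a word with a weight.
type scored struct {
	Text   string
	Weight float64
}

// topKByTFIDF weights each distinct token by (count / len(tokens)) * idf and
// returns at most topK entries, highest weight first. Words missing from idf
// use the fallback value (an assumption made for this sketch).
func topKByTFIDF(tokens []string, idf map[string]float64, fallback float64, topK int) []scored {
	if len(tokens) == 0 || topK <= 0 {
		return nil
	}
	tf := make(map[string]float64)
	for _, t := range tokens {
		tf[t]++
	}
	total := float64(len(tokens))
	out := make([]scored, 0, len(tf))
	for w, c := range tf {
		v, ok := idf[w]
		if !ok {
			v = fallback
		}
		out = append(out, scored{Text: w, Weight: c / total * v})
	}
	// Sort by descending weight; break ties by text for a stable result.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Weight != out[j].Weight {
			return out[i].Weight > out[j].Weight
		}
		return out[i].Text < out[j].Text
	})
	if len(out) > topK {
		out = out[:topK]
	}
	return out
}

func main() {
	tokens := []string{"go", "go", "text", "rank"}
	idf := map[string]float64{"go": 1.0, "text": 2.0, "rank": 3.0}
	for _, s := range topKByTFIDF(tokens, idf, 1.5, 2) {
		fmt.Println(s.Text, s.Weight)
	}
}
```

In this example "rank" outranks "go" even though "go" occurs twice, because its higher IDF outweighs the lower term frequency.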

func (*TagExtracter) LoadDict

func (t *TagExtracter) LoadDict(fileName ...string) error

LoadDict reads the given files and creates a new dictionary.

func (*TagExtracter) LoadIdf

func (t *TagExtracter) LoadIdf(fileName ...string) error

LoadIdf reads the given files and creates a new Idf dictionary.

func (*TagExtracter) LoadStopWords

func (t *TagExtracter) LoadStopWords(fileName ...string) error

LoadStopWords reads the given files and creates a new StopWord dictionary.

func (*TagExtracter) WithGse

func (t *TagExtracter) WithGse(segs gse.Segmenter)

WithGse registers the gse segmenter.

type TextRanker

type TextRanker struct {
	HMM bool
	// contains filtered or unexported fields
}

TextRanker extracts tags from a sentence.

func (*TextRanker) LoadDict

func (t *TextRanker) LoadDict(fileName ...string) error

LoadDict reads the given files and creates a new dictionary for the TextRanker.

func (*TextRanker) TextRank

func (t *TextRanker) TextRank(sentence string, topK int) Segments

TextRank extracts keywords from the sentence using the TextRank algorithm. The topK parameter specifies the maximum number of keywords to return.

func (*TextRanker) TextRankWithPOS

func (t *TextRanker) TextRankWithPOS(sentence string, topK int, allowPOS []string) Segments

TextRankWithPOS extracts keywords from the sentence using the TextRank algorithm. The allowPOS parameter specifies a custom list of allowed parts of speech (POS).
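
For readers unfamiliar with TextRank, the sketch below shows the general idea: build an undirected co-occurrence graph over tokens that fall within a sliding window, then run PageRank-style iterations and keep the topK highest-scoring words. It is a simplified illustration, not this package's implementation. The ranked type and the textRankSketch function are hypothetical, the damping factor (0.85) and iteration count (20) are assumed values, and part-of-speech filtering is omitted, so the input is assumed to be already segmented and filtered.

```go
package main

import (
	"fmt"
	"sort"
)

// ranked is a hypothetical stand-in for a word with a weight.
type ranked struct {
	Text   string
	Weight float64
}

// textRankSketch links tokens that co-occur within `window` positions and
// scores them with a fixed number of PageRank-style iterations.
// Tokens that never co-occur with a different token are not scored.
func textRankSketch(tokens []string, window, topK int) []ranked {
	graph := make(map[string]map[string]float64)
	addEdge := func(a, b string) {
		if graph[a] == nil {
			graph[a] = make(map[string]float64)
		}
		graph[a][b]++
	}
	for i := range tokens {
		for j := i + 1; j < len(tokens) && j < i+window; j++ {
			if tokens[i] == tokens[j] {
				continue
			}
			addEdge(tokens[i], tokens[j])
			addEdge(tokens[j], tokens[i])
		}
	}

	const damping = 0.85 // assumed value
	score := make(map[string]float64, len(graph))
	outSum := make(map[string]float64, len(graph))
	for n, edges := range graph {
		score[n] = 1.0
		for _, w := range edges {
			outSum[n] += w
		}
	}
	for iter := 0; iter < 20; iter++ { // assumed iteration count
		next := make(map[string]float64, len(graph))
		for n, edges := range graph {
			sum := 0.0
			for m, w := range edges {
				sum += w / outSum[m] * score[m]
			}
			next[n] = (1 - damping) + damping*sum
		}
		score = next
	}

	out := make([]ranked, 0, len(score))
	for w, s := range score {
		out = append(out, ranked{Text: w, Weight: s})
	}
	// Sort by descending score; break ties by text.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Weight != out[j].Weight {
			return out[i].Weight > out[j].Weight
		}
		return out[i].Text < out[j].Text
	})
	if topK >= 0 && len(out) > topK {
		out = out[:topK]
	}
	return out
}

func main() {
	tokens := []string{"a", "hub", "b", "hub", "c", "hub", "d"}
	for _, r := range textRankSketch(tokens, 2, 2) {
		fmt.Printf("%s %.3f\n", r.Text, r.Weight)
	}
}
```

Unlike IDF-based scoring, this approach needs no external dictionary: a word ranks highly when it co-occurs with many other words, as "hub" does in the example input.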

func (*TextRanker) WithGse

func (t *TextRanker) WithGse(segs gse.Segmenter)

WithGse registers the gse segmenter.
