idf

package

v0.67.0 (latest) Published: May 6, 2021 License: Apache-2.0 Imports: 6 Imported by: 2

Repository

github.com/go-ego/gse

Documentation ¶

Index ¶

Variables
type Idf
- func NewIdf() *Idf
type Segment
- func (s Segment) Text() string
- func (s Segment) Weight() float64
type Segments
type StopWord
- func NewStopWord() *StopWord
type TagExtracter
type TextRanker

Constants ¶

This section is empty.

Variables ¶

var StopWordMap = map[string]bool{
	"the":   true,
	"of":    true,
	"is":    true,
	"and":   true,
	"to":    true,
	"in":    true,
	"that":  true,
	"we":    true,
	"for":   true,
	"an":    true,
	"are":   true,
	"by":    true,
	"be":    true,
	"as":    true,
	"on":    true,
	"with":  true,
	"can":   true,
	"if":    true,
	"from":  true,
	"which": true,
	"you":   true,
	"it":    true,
	"this":  true,
	"then":  true,
	"at":    true,
	"have":  true,
	"all":   true,
	"not":   true,
	"one":   true,
	"has":   true,
	"or":    true,
}

StopWordMap contains the default stop words.
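As an illustration of how a map of this shape is typically used, here is a minimal, self-contained sketch that filters stop words out of a token list. The names `stopWords` and `filterStopWords` are hypothetical and not part of this package, the table is only a small subset of the entries above, and whitespace tokenization stands in for a real segmenter.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWords is a small hypothetical stand-in for a table such as StopWordMap;
// it is not the package's full list.
var stopWords = map[string]bool{
	"the": true,
	"of":  true,
	"is":  true,
	"and": true,
}

// filterStopWords lower-cases the text, splits it on whitespace, and drops
// every token that is present in stopWords.
func filterStopWords(text string) []string {
	var kept []string
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		if !stopWords[tok] {
			kept = append(kept, tok)
		}
	}
	return kept
}

func main() {
	fmt.Println(filterStopWords("The weight of the word is high"))
	// prints "[weight word high]"
}
```

A `map[string]bool` makes the membership test a single lookup, and a missing key reads as `false`, so no explicit existence check is needed.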

Functions ¶

This section is empty.

Types ¶

type Idf ¶

type Idf struct {
	// contains filtered or unexported fields
}

Idf represents a dictionary of all words with their IDFs (Inverse Document Frequency).

func NewIdf ¶

func NewIdf() *Idf

NewIdf creates a new Idf instance.

func (*Idf) AddToken ¶

func (i *Idf) AddToken(text string, frequency float64, pos ...string)

AddToken adds a new word with its IDF to the dictionary.

func (*Idf) Frequency ¶

func (i *Idf) Frequency(key string) (float64, bool)

Frequency returns the IDF of the given word and reports whether the word was found.

func (*Idf) LoadDict ¶

func (i *Idf) LoadDict(files ...string) error

LoadDict loads the IDF dictionary from the given files.
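The add-then-look-up pattern described for Idf can be sketched with a plain map. This is only a stand-in under stated assumptions: `idfTable`, `newIdfTable`, `addToken`, and `frequency` are hypothetical names, the optional `pos` arguments are omitted, and the package's real type may track additional state.

```go
package main

import "fmt"

// idfTable is a hypothetical map-backed stand-in for an IDF dictionary.
type idfTable struct {
	freq map[string]float64
}

// newIdfTable returns an empty table ready for use.
func newIdfTable() *idfTable {
	return &idfTable{freq: make(map[string]float64)}
}

// addToken stores the IDF value for text, overwriting any earlier value.
func (t *idfTable) addToken(text string, frequency float64) {
	t.freq[text] = frequency
}

// frequency returns the stored IDF and whether the word was present.
func (t *idfTable) frequency(key string) (float64, bool) {
	f, ok := t.freq[key]
	return f, ok
}

func main() {
	t := newIdfTable()
	t.addToken("segmenter", 9.5)

	if f, ok := t.frequency("segmenter"); ok {
		fmt.Printf("segmenter: %.1f\n", f)
	}
	if _, ok := t.frequency("missing"); !ok {
		fmt.Println("missing: not in dictionary")
	}
}
```

Returning `(float64, bool)` lets callers distinguish a word that is absent from one whose stored value happens to be zero.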

type Segment ¶

type Segment struct {
	// contains filtered or unexported fields
}

Segment represents a word with weight.

func (Segment) Text ¶

func (s Segment) Text() string

Text returns the segment's text.

func (Segment) Weight ¶

func (s Segment) Weight() float64

Weight returns the segment's weight.

type Segments ¶

type Segments []Segment

Segments represents a slice of Segment.

func (Segments) Len ¶

func (ss Segments) Len() int

func (Segments) Less ¶

func (ss Segments) Less(i, j int) bool

func (Segments) Swap ¶

func (ss Segments) Swap(i, j int)
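Len, Less, and Swap are the three methods of the standard library's `sort.Interface`, so a Segments value can be passed to `sort.Sort`. The sketch below shows that pattern on a hypothetical `weighted` type. The ordering used here (by weight, ties broken by text) is an assumption for illustration and not a statement about how this package's Less is defined.

```go
package main

import (
	"fmt"
	"sort"
)

// weighted is a hypothetical stand-in for a word with a weight.
type weighted struct {
	text   string
	weight float64
}

// byWeight implements sort.Interface, ordering by ascending weight and
// breaking ties by text.
type byWeight []weighted

func (w byWeight) Len() int { return len(w) }

func (w byWeight) Less(i, j int) bool {
	if w[i].weight == w[j].weight {
		return w[i].text < w[j].text
	}
	return w[i].weight < w[j].weight
}

func (w byWeight) Swap(i, j int) { w[i], w[j] = w[j], w[i] }

// sortedDesc returns a copy of items ordered heaviest first.
func sortedDesc(items []weighted) []weighted {
	out := append([]weighted(nil), items...)
	sort.Sort(sort.Reverse(byWeight(out)))
	return out
}

func main() {
	items := []weighted{{"go", 1.5}, {"idf", 3.0}, {"tag", 2.0}}
	for _, it := range sortedDesc(items) {
		fmt.Println(it.text, it.weight)
	}
}
```

Wrapping the slice in `sort.Reverse` gives a heaviest-first order without writing a second Less method.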

type StopWord ¶

type StopWord struct {
	// contains filtered or unexported fields
}

StopWord is a dictionary for all stop words.

func NewStopWord ¶

func NewStopWord() *StopWord

NewStopWord creates a new StopWord with the default stop words.

func (*StopWord) AddStop ¶ added in v0.63.0

func (s *StopWord) AddStop(text string)

AddStop adds a token to the StopWord dictionary.

func (*StopWord) IsStopWord ¶

func (s *StopWord) IsStopWord(word string) bool

IsStopWord reports whether the given word is a stop word.

func (*StopWord) LoadDict ¶

func (s *StopWord) LoadDict(files ...string) error

LoadDict loads the stop word dictionary from the given files.

func (*StopWord) RemoveStop ¶ added in v0.63.0

func (s *StopWord) RemoveStop(text string)

RemoveStop removes a token from the StopWord dictionary.

type TagExtracter ¶

type TagExtracter struct {
	Idf *Idf
	// contains filtered or unexported fields
}

TagExtracter is used to extract tags from a sentence.

func (*TagExtracter) ExtractTags ¶

func (t *TagExtracter) ExtractTags(sentence string, topK int) (tags Segments)

ExtractTags extracts the topK keywords from the sentence.
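For readers unfamiliar with IDF-based keyword extraction, the general idea can be sketched as follows: score each token by its term frequency times its IDF, sort by score, and keep the first topK. This is a generic TF-IDF sketch and not the package's implementation. The names `tag` and `extractTopK`, the whitespace tokenizer, and the `defaultIDF` fallback for unknown words are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tag is a hypothetical keyword with its TF-IDF weight.
type tag struct {
	text   string
	weight float64
}

// extractTopK scores each distinct token as (count / total tokens) * IDF and
// returns at most topK tags, heaviest first. Tokens missing from idf use
// defaultIDF. Ties are broken by text so the result is deterministic.
func extractTopK(sentence string, idf map[string]float64, defaultIDF float64, topK int) []tag {
	tokens := strings.Fields(strings.ToLower(sentence))
	if len(tokens) == 0 || topK <= 0 {
		return nil
	}

	counts := make(map[string]int)
	for _, tok := range tokens {
		counts[tok]++
	}

	tags := make([]tag, 0, len(counts))
	for tok, n := range counts {
		w, ok := idf[tok]
		if !ok {
			w = defaultIDF
		}
		tf := float64(n) / float64(len(tokens))
		tags = append(tags, tag{text: tok, weight: tf * w})
	}

	sort.Slice(tags, func(i, j int) bool {
		if tags[i].weight != tags[j].weight {
			return tags[i].weight > tags[j].weight
		}
		return tags[i].text < tags[j].text
	})
	if len(tags) > topK {
		tags = tags[:topK]
	}
	return tags
}

func main() {
	idf := map[string]float64{"go": 2.0, "search": 5.0, "engine": 4.0, "index": 6.0}
	for _, t := range extractTopK("go search engine go index", idf, 3.0, 2) {
		fmt.Println(t.text)
	}
}
```

In this example "go" appears twice but has a low IDF, so the rarer words "index" and "search" outrank it.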

func (*TagExtracter) LoadDict ¶

func (t *TagExtracter) LoadDict(fileName ...string) error

LoadDict reads the given files and creates a new dictionary.

func (*TagExtracter) LoadIdf ¶

func (t *TagExtracter) LoadIdf(fileName ...string) error

LoadIdf reads the given files and creates a new Idf dictionary.

func (*TagExtracter) LoadStopWords ¶

func (t *TagExtracter) LoadStopWords(fileName ...string) error

LoadStopWords reads the given files and creates a new StopWord dictionary.

func (*TagExtracter) WithGse ¶

func (t *TagExtracter) WithGse(segs gse.Segmenter)

WithGse registers the gse segmenter.

type TextRanker ¶

type TextRanker struct {
	HMM bool
	// contains filtered or unexported fields
}

TextRanker is used to extract keywords from a sentence using the TextRank algorithm.

func (*TextRanker) LoadDict ¶

func (t *TextRanker) LoadDict(fileName ...string) error

LoadDict reads the given files and creates a new dictionary for TextRanker.

func (*TextRanker) TextRank ¶

func (t *TextRanker) TextRank(sentence string, topK int) Segments

TextRank extracts keywords from the sentence using the TextRank algorithm. The topK parameter specifies the maximum number of top keywords to return.

func (*TextRanker) TextRankWithPOS ¶

func (t *TextRanker) TextRankWithPOS(sentence string, topK int, allowPOS []string) Segments

TextRankWithPOS extracts keywords from the sentence using the TextRank algorithm. The allowPOS parameter supplies a custom list of allowed part-of-speech tags.
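The TextRank algorithm itself can be sketched in a few dozen lines: build a co-occurrence graph over nearby tokens, then run PageRank-style iterations over it. The code below is a generic illustration and not this package's implementation. The function names, the window size, the iteration count, and the damping factor of 0.85 are assumptions chosen for the example, and part-of-speech filtering is left out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// textRank links distinct tokens that occur fewer than `window` positions
// apart, then runs `iterations` PageRank-style updates with damping factor d.
func textRank(tokens []string, window, iterations int, d float64) map[string]float64 {
	// weight[a][b] counts how often a and b co-occur (undirected).
	weight := make(map[string]map[string]float64)
	addEdge := func(a, b string) {
		if weight[a] == nil {
			weight[a] = make(map[string]float64)
		}
		weight[a][b]++
	}
	for i, a := range tokens {
		for j := i + 1; j < len(tokens) && j < i+window; j++ {
			if b := tokens[j]; a != b {
				addEdge(a, b)
				addEdge(b, a)
			}
		}
	}

	// A sorted node list keeps the summation order deterministic.
	nodes := make([]string, 0, len(weight))
	outSum := make(map[string]float64, len(weight))
	for n, edges := range weight {
		nodes = append(nodes, n)
		for _, w := range edges {
			outSum[n] += w
		}
	}
	sort.Strings(nodes)

	score := make(map[string]float64, len(nodes))
	for _, n := range nodes {
		score[n] = 1.0
	}
	for it := 0; it < iterations; it++ {
		next := make(map[string]float64, len(nodes))
		for _, n := range nodes {
			sum := 0.0
			for _, m := range nodes {
				if w := weight[m][n]; w > 0 {
					sum += w / outSum[m] * score[m]
				}
			}
			next[n] = (1 - d) + d*sum
		}
		score = next
	}
	return score
}

// topKeywords returns at most k words, highest score first, ties by text.
func topKeywords(score map[string]float64, k int) []string {
	words := make([]string, 0, len(score))
	for w := range score {
		words = append(words, w)
	}
	sort.Slice(words, func(i, j int) bool {
		if score[words[i]] != score[words[j]] {
			return score[words[i]] > score[words[j]]
		}
		return words[i] < words[j]
	})
	if k < 0 {
		k = 0
	}
	if k < len(words) {
		words = words[:k]
	}
	return words
}

func main() {
	tokens := strings.Fields("graph rank graph node graph edge")
	scores := textRank(tokens, 2, 30, 0.85)
	fmt.Println(topKeywords(scores, 1))
}
```

Unlike the IDF-based approach, this ranking needs no precomputed frequency dictionary: a word scores highly when it co-occurs with many other well-connected words in the same text.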

func (*TextRanker) WithGse ¶

func (t *TextRanker) WithGse(segs gse.Segmenter)

WithGse registers the gse segmenter.
