Documentation
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type Document added in v0.1.1

type Document interface {
	Summarize(length int, threshold float64, focus string) ([]*Sentence, error)
	Highlight(length int, merge bool) ([]*Keyword, error)
	Characters() (int, int)
}
A Document represents a given text, and is responsible for handling the summarization and keyword extraction process.
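A caller drives both summarization and highlighting through this interface. The sketch below is a toy, self-contained illustration of the calling pattern only: fakeDoc, its behavior, and the Keyword fields (Word, Weight) are stand-ins invented here, not part of the package.

```go
package main

import "fmt"

// Stand-ins for the package's types so this sketch compiles on its own.
type Token struct {
	Tag, Text string
	Order     int
}
type Sentence struct {
	Raw       string
	Tokens    []*Token
	Sentiment float64
	Score     float64
	Bias      float64
	Order     int
}
type Keyword struct {
	Word   string  // assumed field name
	Weight float64 // assumed field name
}

// Document mirrors the interface declared above.
type Document interface {
	Summarize(length int, threshold float64, focus string) ([]*Sentence, error)
	Highlight(length int, merge bool) ([]*Keyword, error)
	Characters() (int, int)
}

// fakeDoc is a toy implementation; the real package constructs a Document
// from raw text.
type fakeDoc struct{ sents []*Sentence }

func (d *fakeDoc) Summarize(length int, threshold float64, focus string) ([]*Sentence, error) {
	if length > len(d.sents) {
		length = len(d.sents)
	}
	return d.sents[:length], nil
}

func (d *fakeDoc) Highlight(length int, merge bool) ([]*Keyword, error) {
	return []*Keyword{{Word: "example", Weight: 1}}, nil
}

func (d *fakeDoc) Characters() (int, int) {
	n := 0
	for _, s := range d.sents {
		n += len(s.Raw)
	}
	// The meaning of the two counts is not documented above; returning the
	// same value twice is a placeholder.
	return n, n
}

func main() {
	var doc Document = &fakeDoc{sents: []*Sentence{{Raw: "One."}, {Raw: "Two."}}}
	summary, err := doc.Summarize(1, 0.0, "")
	if err != nil {
		panic(err)
	}
	for _, s := range summary {
		fmt.Println(s.Raw) // prints "One."
	}
}
```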
type Highlighter added in v0.1.1

type Highlighter interface {
	Initialize(tokens []*Token, filter TokenFilter, window int)
	Rank(iters int)
	Highlight(length int, merge bool) ([]*Keyword, error)
}
A Highlighter is responsible for extracting key words from a document.
type Keyword added in v0.2.0

A Keyword is a key word extracted from a highlighted document. It contains the raw word and its associated weight.
type Parser added in v0.1.1
A Parser is responsible for parsing and tokenizing a document into strings and words. A Parser also performs additional tasks such as POS-tagging and sentiment analysis.
type Sentence added in v0.1.1

type Sentence struct {
	Raw       string   // Raw sentence string.
	Tokens    []*Token // Tokenized sentence.
	Sentiment float64  // Sentiment score.
	Score     float64  // Score (weight) of the sentence.
	Bias      float64  // Bias assigned to the sentence for ranking.
	Order     int      // The sentence's order in the text.
}
A Sentence represents an individual sentence within the text.
type Similarity added in v0.1.1
type Similarity func(n1, n2 []*Token, filter TokenFilter) float64
A Similarity computes the similarity of two sentences after applying the token filter.
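One plausible Similarity is Jaccard similarity over the filtered token sets. The sketch below is an illustration, not the package's own implementation, and it assumes a func(*Token) bool shape for TokenFilter, which is not shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// Token and TokenFilter stand-ins; TokenFilter's exact signature is an
// assumption for this sketch (true = keep the token).
type Token struct {
	Tag, Text string
	Order     int
}
type TokenFilter func(*Token) bool

type Similarity func(n1, n2 []*Token, filter TokenFilter) float64

// jaccard: size of the intersection of the two filtered token sets divided
// by the size of their union.
func jaccard(n1, n2 []*Token, filter TokenFilter) float64 {
	set := func(toks []*Token) map[string]bool {
		s := make(map[string]bool)
		for _, t := range toks {
			if filter(t) {
				s[strings.ToLower(t.Text)] = true
			}
		}
		return s
	}
	a, b := set(n1), set(n2)
	inter := 0
	for w := range a {
		if b[w] {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

func main() {
	toks := func(words ...string) []*Token {
		out := make([]*Token, len(words))
		for i, w := range words {
			out[i] = &Token{Text: w, Order: i}
		}
		return out
	}
	keepAll := func(*Token) bool { return true }
	s1 := toks("the", "cat", "sat")
	s2 := toks("the", "cat", "ran")
	fmt.Println(jaccard(s1, s2, keepAll)) // 2 shared of 4 distinct: prints 0.5
}
```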
type Summarizer added in v0.1.1

type Summarizer interface {
	Initialize(sents []*Sentence, similar Similarity, filter TokenFilter, focusString *Sentence, threshold float64)
	Rank(iters int)
}
A Summarizer is responsible for extracting key sentences from a document.
type Token added in v0.1.1

type Token struct {
	Tag   string // The token's part-of-speech tag.
	Text  string // The token's actual content.
	Order int    // The token's order in the text.
}
A Token represents an individual token of text such as a word or punctuation symbol.
type TokenFilter added in v0.1.1

A TokenFilter represents a filter (blacklist or whitelist) applied to tokens before similarity calculations.
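TokenFilter's signature is not shown above; assuming it is func(*Token) bool (true = keep the token), a blacklist-style stopword filter and a whitelist-style part-of-speech filter might look like:

```go
package main

import "fmt"

type Token struct {
	Tag, Text string
	Order     int
}

// Assumed shape for TokenFilter in this sketch: true means keep the token.
type TokenFilter func(*Token) bool

// stopwordFilter builds a blacklist-style TokenFilter that drops the given words.
func stopwordFilter(stopwords ...string) TokenFilter {
	banned := make(map[string]bool, len(stopwords))
	for _, w := range stopwords {
		banned[w] = true
	}
	return func(t *Token) bool { return !banned[t.Text] }
}

// nounsOnly is a whitelist-style filter keeping noun-tagged tokens
// (Penn Treebank tags NN, NNS, NNP, NNPS all start with "NN").
func nounsOnly(t *Token) bool {
	return len(t.Tag) >= 2 && t.Tag[:2] == "NN"
}

func main() {
	filter := stopwordFilter("the", "a", "an")
	fmt.Println(filter(&Token{Text: "the"}))               // false
	fmt.Println(filter(&Token{Text: "cat"}))               // true
	fmt.Println(nounsOnly(&Token{Text: "cat", Tag: "NN"})) // true
}
```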
Directories

Path | Synopsis
---|---
internal |
internal/prose | Package prose is a repository of packages related to text processing, including tokenization, part-of-speech tagging, and named-entity extraction.