The highest tagged major version is v14.

tokenize

package

v8.0.2 Published: Feb 7, 2020 License: MIT Imports: 5 Imported by: 0

Repository

github.com/ndabAP/assocentity

Documentation ¶

Index ¶

Constants
type Lang
type NLP
- func NewNLP(credentialsFile string, lang Lang) (*NLP, error)
- func (nlp *NLP) Tokenize(text string) ([]Token, error)
type PoSDeterm
- func NewPoSDetermer(poS int) *PoSDeterm
- func (dps *PoSDeterm) Determ(tokenizedText []Token, tokenizedEntities [][]Token) ([]Token, error)
type PoSDetermer
type Token
type Tokenizer

Constants ¶

const (
	ADJ   = 1 << iota // Adjective
	ADP               // Adposition
	ADV               // Adverb
	AFFIX             // Affix
	CONJ              // Conjunction
	DET               // Determiner
	NOUN              // Noun
	NUM               // Cardinal number
	PRON              // Pronoun
	PRT               // Particle or other function word
	PUNCT             // Punctuation
	UNKN              // Unknown
	VERB              // Verb (all tenses and modes)
	X                 // Other: foreign words, typos, abbreviations
	ANY   = ADJ | ADP | ADV | AFFIX | CONJ | DET | NOUN | NUM | PRON | PRT | PUNCT | UNKN | VERB | X
)

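Part of speech tags. Each tag is a distinct bit flag, so tags can be combined with a bitwise OR. ANY is the union of all tags.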
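
Because the constants are built with 1 << iota, several tags fit into a single int. The sketch below copies the constant block locally so it compiles on its own. The helper hasPoS is a hypothetical name used only for illustration and is not part of the package.

```go
package main

import "fmt"

// Local copy of the package's part of speech flags, so the sketch is
// self-contained.
const (
	ADJ = 1 << iota
	ADP
	ADV
	AFFIX
	CONJ
	DET
	NOUN
	NUM
	PRON
	PRT
	PUNCT
	UNKN
	VERB
	X
	ANY = ADJ | ADP | ADV | AFFIX | CONJ | DET | NOUN | NUM | PRON | PRT | PUNCT | UNKN | VERB | X
)

// hasPoS is a hypothetical helper: it reports whether tag is set in mask.
func hasPoS(mask, tag int) bool {
	return mask&tag != 0
}

func main() {
	// Combine two tags into one mask with a bitwise OR.
	mask := NOUN | VERB

	fmt.Println(hasPoS(mask, NOUN)) // true
	fmt.Println(hasPoS(mask, ADJ))  // false
	fmt.Println(hasPoS(ANY, X))     // true
}
```

A mask built this way has the same int type as the poS parameter of NewPoSDetermer. Whether that parameter is interpreted as such a mask is an assumption, since the documentation here does not say so.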

Types ¶

type Lang ¶

type Lang string

Lang defines the language used to examine the text. Both ISO and BCP-47 language codes are accepted.

var AutoLang Lang = "auto"

AutoLang tries to automatically recognize the language

type NLP ¶

type NLP struct {
	// contains filtered or unexported fields
}

NLP tokenizes a text using natural language processing (NLP)

func NewNLP ¶

func NewNLP(credentialsFile string, lang Lang) (*NLP, error)

NewNLP returns a new NLP instance

func (*NLP) Tokenize ¶

func (nlp *NLP) Tokenize(text string) ([]Token, error)

Tokenize tokenizes a text

type PoSDeterm ¶

type PoSDeterm struct {
	// contains filtered or unexported fields
}

PoSDeterm represents the default part of speech determiner

func NewPoSDetermer ¶

func NewPoSDetermer(poS int) *PoSDeterm

NewPoSDetermer returns a new default part of speech determiner

func (*PoSDeterm) Determ ¶

func (dps *PoSDeterm) Determ(tokenizedText []Token, tokenizedEntities [][]Token) ([]Token, error)

Determ determines whether a part of speech tag should be deleted

type PoSDetermer ¶

type PoSDetermer interface {
	Determ(tokenizedText []Token, tokenizedEntities [][]Token) ([]Token, error)
}

PoSDetermer determines whether part of speech tags should be deleted
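
Any type with a matching Determ method satisfies the interface. The following is a minimal sketch of a custom implementation, not the package's own logic. The type maskDetermer and its filtering rule (keep tokens whose tag is in a mask, and always keep tokens whose text appears in an entity) are assumptions made for illustration. Token, the interface, and three of the constants are copied locally so the sketch compiles on its own.

```go
package main

import "fmt"

// Local copies with the same values as the package's flags.
const (
	DET  = 1 << 5
	NOUN = 1 << 6
	VERB = 1 << 12
)

// Token and PoSDetermer mirror the package's declarations.
type Token struct {
	PoS   int    // Part of speech
	Token string // Text
}

type PoSDetermer interface {
	Determ(tokenizedText []Token, tokenizedEntities [][]Token) ([]Token, error)
}

// maskDetermer is a hypothetical PoSDetermer that keeps tokens whose tag is
// set in mask, plus every token whose text matches an entity token.
type maskDetermer struct {
	mask int
}

func (d maskDetermer) Determ(tokenizedText []Token, tokenizedEntities [][]Token) ([]Token, error) {
	// Collect the text of every entity token for quick lookup.
	entityWords := make(map[string]struct{})
	for _, entity := range tokenizedEntities {
		for _, tok := range entity {
			entityWords[tok.Token] = struct{}{}
		}
	}

	kept := make([]Token, 0, len(tokenizedText))
	for _, tok := range tokenizedText {
		_, isEntity := entityWords[tok.Token]
		if isEntity || tok.PoS&d.mask != 0 {
			kept = append(kept, tok)
		}
	}
	return kept, nil
}

func main() {
	text := []Token{
		{PoS: DET, Token: "The"},
		{PoS: NOUN, Token: "dog"},
		{PoS: VERB, Token: "chased"},
		{PoS: NOUN, Token: "Max"},
	}
	entities := [][]Token{{{PoS: NOUN, Token: "Max"}}}

	var d PoSDetermer = maskDetermer{mask: VERB}
	kept, err := d.Determ(text, entities)
	if err != nil {
		fmt.Println("determ failed:", err)
		return
	}
	for _, tok := range kept {
		fmt.Println(tok.Token) // prints "chased", then "Max"
	}
}
```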

type Token ¶

type Token struct {
	PoS   int    // Part of speech
	Token string // Text
}

Token represents a tokenized text unit

type Tokenizer ¶

type Tokenizer interface {
	Tokenize(text string) ([]Token, error)
}

Tokenizer tokenizes a text and entities
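
Since Tokenizer is a one-method interface, a stand-in can replace NLP in tests that should not depend on credentials. The sketch below assumes a hypothetical whitespaceTokenizer that splits on whitespace and tags every token as UNKN. It makes no claim about how NLP itself tokenizes. Token, Tokenizer, and UNKN are copied locally so the sketch compiles on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// UNKN has the same value as the package's UNKN flag (1 << 11).
const UNKN = 1 << 11

// Token and Tokenizer mirror the package's declarations.
type Token struct {
	PoS   int    // Part of speech
	Token string // Text
}

type Tokenizer interface {
	Tokenize(text string) ([]Token, error)
}

// whitespaceTokenizer is a hypothetical Tokenizer for tests. It does no
// part of speech analysis and tags every token as UNKN.
type whitespaceTokenizer struct{}

func (whitespaceTokenizer) Tokenize(text string) ([]Token, error) {
	fields := strings.Fields(text)
	tokens := make([]Token, 0, len(fields))
	for _, f := range fields {
		tokens = append(tokens, Token{PoS: UNKN, Token: f})
	}
	return tokens, nil
}

func main() {
	var tokenizer Tokenizer = whitespaceTokenizer{}

	tokens, err := tokenizer.Tokenize("Max chased the dog")
	if err != nil {
		fmt.Println("tokenize failed:", err)
		return
	}
	for _, tok := range tokens {
		fmt.Printf("%s (PoS=%d)\n", tok.Token, tok.PoS)
	}
}
```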
