tokenize

package tokenize
v5.0.2
Warning

This package is not in the latest version of its module.

Published: Aug 12, 2019 License: MIT Imports: 6 Imported by: 0

Documentation

Constants

const (
	ADJ = 1 << iota
	ADP
	ADV
	AFFIX
	CONJ
	DET
	NOUN
	NUM
	PRON
	PRT
	PUNCT
	UNKN
	VERB
	X
	ANY = ADJ | ADP | ADV | AFFIX | CONJ | DET | NOUN | NUM | PRON | PRT | PUNCT | UNKN | VERB | X
)

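Part-of-speech tags. Each tag is a distinct bit flag (ADJ = 1, ADP = 2, ADV = 4, and so on up to X = 1 << 13), so tags can be combined with | and tested with &. ANY is the union of all tags.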
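
Because the tags are bit flags, a set of tags fits in a single int. The sketch below shows how such a mask might be built and queried. The constants are copied from the const block above so the example compiles on its own; hasPoS is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// Part-of-speech flags, copied from the package's const block.
// Each tag occupies its own bit.
const (
	ADJ = 1 << iota
	ADP
	ADV
	AFFIX
	CONJ
	DET
	NOUN
	NUM
	PRON
	PRT
	PUNCT
	UNKN
	VERB
	X
	ANY = ADJ | ADP | ADV | AFFIX | CONJ | DET | NOUN | NUM | PRON | PRT | PUNCT | UNKN | VERB | X
)

// hasPoS reports whether tag is set in mask.
// Hypothetical helper for illustration; not exported by the package.
func hasPoS(mask, tag int) bool {
	return mask&tag != 0
}

func main() {
	// Combine several tags into one mask with bitwise OR.
	content := NOUN | VERB | ADJ
	fmt.Println(hasPoS(content, NOUN))  // true
	fmt.Println(hasPoS(content, PUNCT)) // false
	fmt.Println(hasPoS(ANY, PUNCT))     // true
}
```

An int mask of this form is presumably what NewPoSDetermer expects for its poS parameter, though the documentation does not say so explicitly.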

const (
	// Whitespace is the default separator
	Whitespace = " "
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Join

type Join struct {
	// contains filtered or unexported fields
}

Join is the default joiner

func NewJoin

func NewJoin(sep string) *Join

NewJoin returns a new default Join that uses sep as the separator (Whitespace is the default separator)

func (*Join) Join

func (dj *Join) Join(dps PoSDetermer, tokenizer Tokenizer) ([]string, error)

Join joins the tokens produced by tokenizer, taking the part-of-speech determiner dps into account, and returns the resulting strings

type Joiner

type Joiner interface {
	Join(PoSDetermer, Tokenizer) ([]string, error)
}

Joiner joins the output of a Tokenizer, taking a part-of-speech determiner (PoSDetermer) into account
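The documentation does not spell out what the default Join produces, so the following is only a sketch of one way a Joiner could be written against these interfaces. Assumption: the joiner returns the words the determiner keeps, followed by each multi-token entity merged into one string with sep. The types mirror the package's declarations so the example is self-contained; entityJoiner, stubTokenizer and keepAll are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Token, Tokenizer, PoSDetermer and Joiner mirror the package's declarations.
type Token struct {
	PoS   int
	Token string
}

type Tokenizer interface {
	TokenizeText() ([]Token, error)
	TokenizeEntities() ([][]Token, error)
}

type PoSDetermer interface {
	Determ(Tokenizer) ([]string, error)
}

type Joiner interface {
	Join(PoSDetermer, Tokenizer) ([]string, error)
}

// Whitespace mirrors the package's default separator.
const Whitespace = " "

// entityJoiner is a hypothetical Joiner. Assumption: it returns the words
// kept by the determiner, followed by each entity's tokens joined with sep.
type entityJoiner struct{ sep string }

func (j entityJoiner) Join(dps PoSDetermer, t Tokenizer) ([]string, error) {
	words, err := dps.Determ(t)
	if err != nil {
		return nil, err
	}
	entities, err := t.TokenizeEntities()
	if err != nil {
		return nil, err
	}
	out := append([]string(nil), words...)
	for _, ent := range entities {
		parts := make([]string, len(ent))
		for i, tok := range ent {
			parts[i] = tok.Token
		}
		out = append(out, strings.Join(parts, j.sep))
	}
	return out, nil
}

// stubTokenizer returns fixed tokens so the sketch runs without an NLP backend.
type stubTokenizer struct {
	text     []Token
	entities [][]Token
}

func (s stubTokenizer) TokenizeText() ([]Token, error)       { return s.text, nil }
func (s stubTokenizer) TokenizeEntities() ([][]Token, error) { return s.entities, nil }

// keepAll is a stub determiner that keeps every text token.
type keepAll struct{}

func (keepAll) Determ(t Tokenizer) ([]string, error) {
	tokens, err := t.TokenizeText()
	if err != nil {
		return nil, err
	}
	words := make([]string, len(tokens))
	for i, tok := range tokens {
		words[i] = tok.Token
	}
	return words, nil
}

// Compile-time check that entityJoiner satisfies Joiner.
var _ Joiner = entityJoiner{}

func main() {
	t := stubTokenizer{
		text:     []Token{{Token: "visit"}},
		entities: [][]Token{{{Token: "New"}, {Token: "York"}}},
	}
	out, err := entityJoiner{sep: Whitespace}.Join(keepAll{}, t)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out) // ["visit" "New York"]
}
```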

type NLP

type NLP struct {
	// contains filtered or unexported fields
}

NLP tokenizes a text using natural language processing (NLP)

func NewNLP

func NewNLP(credentialsFile, text string, entities []string) (*NLP, error)

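NewNLP returns a new NLP instance for text and the given entities; credentialsFile names the credentials file used by the NLP backend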

func (*NLP) TokenizeEntities

func (nlp *NLP) TokenizeEntities() ([][]Token, error)

TokenizeEntities tokenizes the entities and returns one token slice per entity

func (*NLP) TokenizeText

func (nlp *NLP) TokenizeText() ([]Token, error)

TokenizeText tokenizes a text

type PoSDeterm

type PoSDeterm struct {
	// contains filtered or unexported fields
}

PoSDeterm is the default part-of-speech determiner

func NewPoSDetermer

func NewPoSDetermer(poS int) *PoSDeterm

NewPoSDetermer returns a new default part-of-speech determiner for the part-of-speech bitmask poS

func (*PoSDeterm) Determ

func (dps *PoSDeterm) Determ(tokenizer Tokenizer) ([]string, error)

Determ determines whether a part-of-speech tag should be deleted

type PoSDetermer

type PoSDetermer interface {
	Determ(Tokenizer) ([]string, error)
}

PoSDetermer determines whether part-of-speech tags should be deleted
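A custom PoSDetermer only needs a Determ method. The sketch below is a hypothetical implementation, assuming the mask names the tags whose tokens should be deleted; whether the package's own PoSDeterm treats poS as a drop list or a keep list is not stated in the documentation. The types and the two flag values mirror the package; dropDetermer and textOnly are illustrative names.

```go
package main

import "fmt"

// Token, Tokenizer and PoSDetermer mirror the package's declarations.
type Token struct {
	PoS   int
	Token string
}

type Tokenizer interface {
	TokenizeText() ([]Token, error)
	TokenizeEntities() ([][]Token, error)
}

type PoSDetermer interface {
	Determ(Tokenizer) ([]string, error)
}

// Flag values as assigned by the package's iota block.
const (
	NOUN  = 1 << 6
	PUNCT = 1 << 10
	VERB  = 1 << 12
)

// dropDetermer is a hypothetical PoSDetermer. Assumption: drop is a
// bitmask of tags whose tokens are deleted from the text.
type dropDetermer struct{ drop int }

func (d dropDetermer) Determ(t Tokenizer) ([]string, error) {
	tokens, err := t.TokenizeText()
	if err != nil {
		return nil, err
	}
	kept := make([]string, 0, len(tokens))
	for _, tok := range tokens {
		if tok.PoS&d.drop != 0 {
			continue // tag is in the drop mask: delete the token
		}
		kept = append(kept, tok.Token)
	}
	return kept, nil
}

// textOnly is a minimal Tokenizer stub that has text tokens but no entities.
type textOnly []Token

func (s textOnly) TokenizeText() ([]Token, error)       { return s, nil }
func (s textOnly) TokenizeEntities() ([][]Token, error) { return nil, nil }

// Compile-time check that dropDetermer satisfies PoSDetermer.
var _ PoSDetermer = dropDetermer{}

func main() {
	src := textOnly{{PoS: NOUN, Token: "Go"}, {PoS: VERB, Token: "rocks"}, {PoS: PUNCT, Token: "!"}}
	words, err := dropDetermer{drop: PUNCT}.Determ(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(words) // [Go rocks]
}
```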

type Token

type Token struct {
	PoS   int
	Token string
}

Token represents a tokenized text unit
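Since Token.PoS holds one of the bit-flag constants, printing it directly only shows a number. A small lookup like the sketch below can make tokens readable while debugging. posName and posNames are hypothetical helpers; the constants and the Token type are copied from the package.

```go
package main

import (
	"fmt"
	"strings"
)

// Part-of-speech flags copied from the package's const block.
const (
	ADJ = 1 << iota
	ADP
	ADV
	AFFIX
	CONJ
	DET
	NOUN
	NUM
	PRON
	PRT
	PUNCT
	UNKN
	VERB
	X
)

// posNames lists the tag names in bit order (index i names flag 1<<i).
var posNames = []string{"ADJ", "ADP", "ADV", "AFFIX", "CONJ", "DET", "NOUN",
	"NUM", "PRON", "PRT", "PUNCT", "UNKN", "VERB", "X"}

// Token mirrors the package's declaration.
type Token struct {
	PoS   int
	Token string
}

// posName returns the names of all tags set in mask, joined by "|",
// or "NONE" if no known tag is set. Hypothetical helper.
func posName(mask int) string {
	var names []string
	for i, name := range posNames {
		if mask&(1<<i) != 0 {
			names = append(names, name)
		}
	}
	if len(names) == 0 {
		return "NONE"
	}
	return strings.Join(names, "|")
}

func main() {
	tok := Token{PoS: NOUN, Token: "tokenizer"}
	fmt.Printf("%s/%s\n", tok.Token, posName(tok.PoS)) // tokenizer/NOUN
	fmt.Println(posName(NOUN | VERB))                  // NOUN|VERB
}
```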

type Tokenizer

type Tokenizer interface {
	TokenizeText() ([]Token, error)
	TokenizeEntities() ([][]Token, error)
}

Tokenizer tokenizes a text and its entities
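*NLP already satisfies Tokenizer, since it has both methods with matching signatures. Code that depends on the interface rather than on *NLP can therefore be exercised with an in-memory stand-in that needs no credentials file. The sketch below shows such a test double; staticTokenizer and countEntityTokens are hypothetical names, and the types mirror the package's declarations.

```go
package main

import "fmt"

// Token and Tokenizer mirror the package's declarations.
type Token struct {
	PoS   int
	Token string
}

type Tokenizer interface {
	TokenizeText() ([]Token, error)
	TokenizeEntities() ([][]Token, error)
}

// Two flag values as assigned by the package's iota block.
const (
	NOUN = 1 << 6
	VERB = 1 << 12
)

// staticTokenizer is a hypothetical Tokenizer that returns fixed tokens,
// e.g. as a test double in place of *NLP.
type staticTokenizer struct {
	text     []Token
	entities [][]Token
}

func (s staticTokenizer) TokenizeText() ([]Token, error)       { return s.text, nil }
func (s staticTokenizer) TokenizeEntities() ([][]Token, error) { return s.entities, nil }

// Compile-time check that staticTokenizer satisfies Tokenizer.
var _ Tokenizer = staticTokenizer{}

// countEntityTokens depends only on the interface, so it accepts either
// *NLP or the test double. It returns the total number of entity tokens.
func countEntityTokens(t Tokenizer) (int, error) {
	entities, err := t.TokenizeEntities()
	if err != nil {
		return 0, err
	}
	n := 0
	for _, e := range entities {
		n += len(e)
	}
	return n, nil
}

func main() {
	var t Tokenizer = staticTokenizer{
		text:     []Token{{PoS: NOUN, Token: "Gophers"}, {PoS: VERB, Token: "tokenize"}},
		entities: [][]Token{{{PoS: NOUN, Token: "New"}, {PoS: NOUN, Token: "York"}}},
	}
	n, err := countEntityTokens(t)
	fmt.Println(n, err) // 2 <nil>
}
```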
