sentencepiece

package sentencepiece
Version: v0.7.0
Warning: This package is not in the latest version of its module.
Published: May 24, 2021 License: BSD-2-Clause Imports: 5 Imported by: 0

Documentation

Types

type Tokenizer

type Tokenizer struct {
	// contains filtered or unexported fields
}

Tokenizer is a SentencePiece tokenizer.

func NewFromModelFolder

func NewFromModelFolder(path string, lowercase bool) (*Tokenizer, error)

NewFromModelFolder returns a new Tokenizer loaded from the model stored in the folder at path.

func (*Tokenizer) Detokenize

func (t *Tokenizer) Detokenize(tokens []string) string

Detokenize flattens and merges a list of tokens into a single string.
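As an illustration only, the sketch below shows how SentencePiece-style detokenization commonly works: the pieces are concatenated, the `▁` (U+2581) word-boundary marker is turned back into a space, and the space produced by the leading marker is trimmed. The helper `detokenize` is hypothetical and assumes that marker convention; it is not this package's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// wordBoundary is the meta symbol SentencePiece models conventionally use
// in place of spaces (an assumption about the loaded model).
const wordBoundary = "\u2581"

// detokenize is a hypothetical helper: it joins the pieces, replaces every
// boundary marker with a space, and trims surrounding whitespace (notably
// the space produced by the marker on the first piece).
func detokenize(tokens []string) string {
	joined := strings.Join(tokens, "")
	joined = strings.ReplaceAll(joined, wordBoundary, " ")
	return strings.TrimSpace(joined)
}

func main() {
	fmt.Println(detokenize([]string{"▁Hello", "▁wor", "ld", "!"})) // Hello world!
}
```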

func (*Tokenizer) IDsToTokens

func (t *Tokenizer) IDsToTokens(ids []int) []string

IDsToTokens returns a list of string terms from a list of token IDs. It panics if a token is not found in the vocabulary.
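A minimal sketch of the lookup IDsToTokens describes, assuming a hypothetical ID-indexed `vocab` slice. A real Tokenizer builds its vocabulary from the model folder, and its internal representation may differ; only the panic-on-missing-ID behaviour is taken from the documentation above.

```go
package main

import "fmt"

// vocab is a hypothetical ID-to-piece table; index i holds the piece for ID i.
var vocab = []string{"<unk>", "▁Hello", "▁wor", "ld", "!"}

// idsToTokens maps each ID to its piece and panics on an ID outside the
// vocabulary, mirroring the behaviour documented for IDsToTokens.
func idsToTokens(ids []int) []string {
	tokens := make([]string, len(ids))
	for i, id := range ids {
		if id < 0 || id >= len(vocab) {
			panic(fmt.Sprintf("sentencepiece: token ID %d not found in vocabulary", id))
		}
		tokens[i] = vocab[id]
	}
	return tokens
}

func main() {
	fmt.Println(idsToTokens([]int{1, 2, 3, 4})) // [▁Hello ▁wor ld !]
}
```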

func (*Tokenizer) Tokenize

func (t *Tokenizer) Tokenize(text string) []string

Tokenize performs SentencePiece tokenization of text and returns the resulting pieces.

func (*Tokenizer) TokensToIDs

func (t *Tokenizer) TokensToIDs(tokens []string) []int

TokensToIDs returns a list of token IDs from a list of string tokens. It panics if a token is not found in the vocabulary and no unknown token is found.

Directories

Path                     Synopsis
internal/sentencepiece   Package sentencepiece implements the SentencePiece encoder (Kudo and Richardson, 2018).
