Documentation
Overview
Package sentencepiece implements a tokenizers.Tokenizer based on SentencePiece tokenizer.
Index

Constants
This section is empty.
Variables
This section is empty.
Functions

This section is empty.

Types

type Tokenizer
type Tokenizer struct {
	*esentencepiece.Processor
	Info *esentencepiece.ModelInfo
}
Tokenizer implements the tokenizers.Tokenizer interface based on Google's SentencePiece tokenizer.
func (*Tokenizer) Decode
Decode returns the text from a sequence of ids. It implements sampler.Vocabulary.
func (*Tokenizer) Encode
Encode returns the text encoded into a sequence of ids. It implements sampler.Vocabulary.
func (*Tokenizer) SpecialTokenID
func (p *Tokenizer) SpecialTokenID(token api.SpecialToken) (int, error)
SpecialTokenID returns the token ID for the given symbol, or an error if it is not known.
Directories

Path | Synopsis
---|---
private |
private/protos | Package protos has the Protocol Buffer code for the sentencepiece_model.proto file, downloaded from https://github.com/google/sentencepiece/blob/master/src/sentencepiece_model.proto.