vocab

package
v0.2.4
Warning: This package is not in the latest version of its module.

Published: Sep 1, 2021 | License: MIT | Imports: 2 | Imported by: 3

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Dict

type Dict struct {
	// contains filtered or unexported fields
}

Dict is a container for tokens. NOTE: Python uses an OrderedDict here; the implications of that difference are unclear.

func FromFile

func FromFile(path string) (Dict, error)

FromFile reads a newline-delimited file into a Dict.

func New

func New(tokens []string) Dict

New returns a vocab Dict built from the given tokens. Each token's ID matches its index in the slice.
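The "IDs match index" rule can be pictured with a small self-contained sketch. The buildIndex function and the map representation are assumptions for illustration only, since Dict's fields are unexported. The behavior for duplicate tokens is also undocumented; in this sketch the later position wins.

```go
package main

import "fmt"

// ID mirrors the documented declaration: type ID int32.
type ID int32

// buildIndex is a hypothetical stand-in for the documented behavior of New:
// each token's ID is its position in the input slice.
func buildIndex(tokens []string) map[string]ID {
	index := make(map[string]ID, len(tokens))
	for i, tok := range tokens {
		// Assumption: a repeated token is overwritten by its later position.
		index[tok] = ID(i)
	}
	return index
}

func main() {
	index := buildIndex([]string{"[PAD]", "[UNK]", "hello"})
	fmt.Println(index["[PAD]"], index["[UNK]"], index["hello"])
}
```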

func (Dict) Add

func (v Dict) Add(token string)

Add adds an item to the vocabulary. It is not thread-safe.

func (Dict) ConvertItems

func (v Dict) ConvertItems(items []string) []ID

ConvertItems converts items to IDs.

func (Dict) ConvertTokens

func (v Dict) ConvertTokens(tokens []string) []ID

ConvertTokens converts tokens to IDs.

func (Dict) GetID

func (v Dict) GetID(token string) ID

GetID returns the ID of the token in the vocab. The ID is negative if the token does not exist.
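Since a missing token is signalled by a negative ID rather than an error, callers should check the sign before using the result. The sketch below shows that pattern with a hypothetical lookup function over a plain map. The exact sentinel (-1 here) and the fallback to an "[UNK]" entry are assumptions; the documentation only promises a negative value.

```go
package main

import "fmt"

// ID mirrors the documented declaration: type ID int32.
type ID int32

// lookup is a hypothetical stand-in for GetID.
// Assumption: -1 is used as the "not found" value.
func lookup(index map[string]ID, token string) ID {
	if id, ok := index[token]; ok {
		return id
	}
	return -1
}

func main() {
	index := map[string]ID{"[UNK]": 0, "hello": 1}
	for _, tok := range []string{"hello", "world"} {
		id := lookup(index, tok)
		if id < 0 {
			// Unknown token: fall back to the (assumed) unknown-token entry.
			id = index["[UNK]"]
		}
		fmt.Println(tok, id)
	}
}
```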

func (Dict) IsInVocab added in v0.1.1

func (v Dict) IsInVocab(token string) bool

func (Dict) LongestSubstring

func (v Dict) LongestSubstring(token string) string

LongestSubstring returns the longest vocab token that is a substring of the given token.
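One literal reading of this description is a search over every contiguous substring, longest first. The sketch below implements that reading with a hypothetical longestSubstring function over a set of tokens. It is not the package's implementation, which may, for example, consider only prefixes. Returning an empty string when nothing matches is also an assumption, and the slicing is byte-based, so it is only safe as written for ASCII input.

```go
package main

import "fmt"

// longestSubstring is a hypothetical sketch, not the package's method.
// It returns the longest entry of vocab found anywhere inside token,
// or "" if no entry matches (assumed behavior).
func longestSubstring(vocab map[string]bool, token string) string {
	// Try the longest candidate lengths first so the first hit is the longest.
	for length := len(token); length > 0; length-- {
		for start := 0; start+length <= len(token); start++ {
			if sub := token[start : start+length]; vocab[sub] {
				return sub
			}
		}
	}
	return ""
}

func main() {
	vocab := map[string]bool{"un": true, "believ": true, "able": true}
	fmt.Printf("%q\n", longestSubstring(vocab, "unbelievable"))
	fmt.Printf("%q\n", longestSubstring(vocab, "xyz"))
}
```

This brute-force scan is quadratic in the token length, which is usually acceptable for short word-level tokens; a trie would be one option if lookups became a bottleneck.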

func (Dict) Size

func (v Dict) Size() int

Size returns the size of the vocabulary

type ID

type ID int32

ID is used to identify vocab items

func (ID) Int32

func (id ID) Int32() int32

Int32 returns the int32 representation of an ID.

type Provider

type Provider interface {
	Vocab() Dict
}

Provider is an interface for exposing a vocab
