Documentation ¶
Overview ¶
Package fasttext provides a simple wrapper for the Facebook fastText word-vector dataset (https://fasttext.cc/docs/en/crawl-vectors.html). It allows fast look-up of word embeddings from a persistent data store (SQLite3).
Installation
go get -u github.com/ekzhu/go-fasttext
After downloading a .vec data file from the fastText project, you can initialize the SQLite3 database (in your code):
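import (
	"log"
	"os"

	"github.com/ekzhu/go-fasttext"
	_ "github.com/mattn/go-sqlite3"
)

...

ft := fasttext.NewFastText("/path/to/sqlite3/file")
vecFile, err := os.Open("/path/to/word/embedding/.vec/file")
if err != nil {
	log.Fatal(err)
}
defer vecFile.Close()
// Import every embedding from the .vec file into the SQLite3 database.
if err := ft.BuildDB(vecFile); err != nil {
	log.Fatal(err)
}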
This will create a new file on your disk for the SQLite3 database. Once the above step is finished, you can start looking up word embeddings (in your code):
emb, err := ft.GetEmb("king")
if err != nil {
	fmt.Println(err)
}
fmt.Println(emb)
Each word embedding vector is a slice of float32 with a length of 300 (the Dim constant).
Note that you only need to initialize the SQLite3 database once. The next time you use it you can skip the call to BuildDB.
For faster querying during runtime, you can use an in-memory database.
ft := fasttext.NewFastTextInMem("/path/to/sqlite3/file")
This creates an in-memory SQLite3 database that is a copy of the on-disk one. Queries against the in-memory copy are much faster, but loading the database takes a few minutes.
Constants ¶
const (
	// TableName used in SQLite3
	TableName = "fasttext"
	// Dim is the number of dimensions in FastText word embedding vectors
	Dim = 300
)
Variables ¶
var (
	// ErrNoEmbFound ...
	ErrNoEmbFound = errors.New("No embedding found for the given word")
	// ByteOrder is for the serialization of the embedding vector in
	// SQLite3 database.
	ByteOrder = binary.BigEndian
)
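To make the role of ByteOrder concrete, here is a minimal sketch of how a vector could be turned into a big-endian byte blob and read back using only the standard library. The helper names encodeVec and decodeVec are hypothetical, and the float32 element type is taken from the overview; the package's actual storage layout is not documented on this page, so treat this as an illustration rather than its implementation.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// byteOrder mirrors the package-level ByteOrder variable documented above.
var byteOrder = binary.BigEndian

// encodeVec serializes a vector into a byte slice, 4 bytes per float32.
// Hypothetical helper, not part of the fasttext package.
func encodeVec(vec []float32) ([]byte, error) {
	buf := new(bytes.Buffer)
	if err := binary.Write(buf, byteOrder, vec); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeVec reads dim float32 values back out of a blob.
// Hypothetical helper, not part of the fasttext package.
func decodeVec(blob []byte, dim int) ([]float32, error) {
	vec := make([]float32, dim)
	if err := binary.Read(bytes.NewReader(blob), byteOrder, vec); err != nil {
		return nil, err
	}
	return vec, nil
}

func main() {
	blob, err := encodeVec([]float32{0.5, -1.25, 3})
	if err != nil {
		panic(err)
	}
	vec, err := decodeVec(blob, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(blob), vec)
}
```

Fixing the byte order in one place means a blob written on one machine decodes identically on another, regardless of the host CPU's native endianness.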
Functions ¶
This section is empty.
Types ¶
type FastText ¶
type FastText struct {
// contains filtered or unexported fields
}
The FastText session. In a multithreaded setting, each thread must have its own FastText session; a single session cannot be shared among multiple threads.
func NewFastText ¶
NewFastText starts a new FastText session given the location of the SQLite3 database file.
func NewFastTextInMem ¶
NewFastTextInMem creates a new FastText session that uses an in-memory database for faster query time. This function loads the on-disk SQLite3 database (given by dbFilename) into an in-memory SQLite3 database, which takes a few minutes to finish.
func (*FastText) BuildDB ¶
BuildDB initializes the SQLite3 database by importing the word embeddings from the .vec file downloaded from https://fasttext.cc/docs/en/crawl-vectors.html