Model
Word2Vec
Word2Vec is a generic term covering the following modules:
model:
- Skip-Gram
- CBOW
optimizer:
- Hierarchical Softmax
- Negative Sampling
For training, select one model and one optimizer from the lists above. The model determines the architecture of the objective function, and the optimizer determines how that objective is approximated.
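To make the distinction concrete, the minimal Go sketch below shows the training pairs each model derives from a context window. It is illustrative only, not code from this repository (pair, skipGramPairs, and cbowPairs are invented names; Go 1.21+ is assumed for the built-in min and max):

package main

import "fmt"

// pair is one training example: predict target from context.
type pair struct {
	context []string
	target  string
}

// skipGramPairs: each center word predicts every word inside its
// context window, producing one pair per (center, neighbor).
func skipGramPairs(tokens []string, window int) []pair {
	var out []pair
	for i, center := range tokens {
		for j := max(0, i-window); j <= min(len(tokens)-1, i+window); j++ {
			if j != i {
				out = append(out, pair{context: []string{center}, target: tokens[j]})
			}
		}
	}
	return out
}

// cbowPairs: the whole window jointly predicts the center word
// (in CBOW the context vectors are averaged before the prediction).
func cbowPairs(tokens []string, window int) []pair {
	var out []pair
	for i := range tokens {
		var ctx []string
		for j := max(0, i-window); j <= min(len(tokens)-1, i+window); j++ {
			if j != i {
				ctx = append(ctx, tokens[j])
			}
		}
		out = append(out, pair{context: ctx, target: tokens[i]})
	}
	return out
}

func main() {
	tokens := []string{"the", "quick", "brown", "fox"}
	fmt.Println(skipGramPairs(tokens, 1)) // [{[the] quick} {[quick] the} {[quick] brown} ...]
	fmt.Println(cbowPairs(tokens, 1))     // [{[quick] the} {[the brown] quick} ...]
}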
Features
- Skip-Gram
- CBOW
- Hierarchical Softmax
- Negative Sampling
- Subsampling (sketched below)
- Learning rate updates during training (sketched below)
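Several of these features are heuristics carried over from the original word2vec. The Go sketch below is illustrative only, not this repository's code: keepProb, currentLR, and negativeTable are invented names, and the sqrt(threshold/freq) and 3/4-power formulas are the ones from the word2vec paper, which this implementation may or may not follow exactly. The parameters correspond to the --threshold, --initlr, --theta, --batchSize, and --sample flags described under Usage.

package main

import (
	"fmt"
	"math"
	"math/rand"
)

// keepProb is the subsampling heuristic from the word2vec paper:
// a word with relative corpus frequency freq survives with probability
// sqrt(threshold/freq); threshold corresponds to the --threshold flag.
func keepProb(freq, threshold float64) float64 {
	if freq <= threshold {
		return 1.0 // rare words are always kept
	}
	return math.Sqrt(threshold / freq)
}

// currentLR decays the learning rate linearly with training progress,
// never dropping below initlr*theta (the bound stated for --theta).
// The CLI refreshes the rate once per --batchSize words, not per word.
func currentLR(initlr, theta float64, processed, total int) float64 {
	lr := initlr * (1.0 - float64(processed)/float64(total))
	if floor := initlr * theta; lr < floor {
		lr = floor
	}
	return lr
}

// negativeTable builds a sampling distribution proportional to freq^0.75,
// which the original word2vec uses to draw negative examples
// (the --sample flag sets how many are drawn per target).
func negativeTable(freqs map[string]float64) ([]string, []float64) {
	words := make([]string, 0, len(freqs))
	weights := make([]float64, 0, len(freqs))
	var z float64
	for w, f := range freqs {
		p := math.Pow(f, 0.75)
		words = append(words, w)
		weights = append(weights, p)
		z += p
	}
	for i := range weights {
		weights[i] /= z
	}
	return words, weights
}

func main() {
	fmt.Printf("keep(the): %.3f\n", keepProb(0.05, 0.001)) // a very frequent word is mostly dropped
	fmt.Printf("lr at 50%%: %.5f\n", currentLR(0.025, 1e-4, 50, 100))

	words, weights := negativeTable(map[string]float64{"the": 0.05, "quick": 0.01, "fox": 0.0001})
	// Draw one negative by inverse-transform sampling over the weights.
	r, acc := rand.Float64(), 0.0
	for i, w := range weights {
		if acc += w; acc >= r {
			fmt.Println("negative:", words[i])
			break
		}
	}
}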
Usage
Embed words using word2vec
Usage:
  word-embedding word2vec [flags]

Flags:
      --batchSize int         Set the batch size for updating the learning rate (default 10000)
  -d, --dimension int         Set the dimension of the word vectors (default 10)
      --initlr float          Set the initial learning rate (default 0.025)
  -i, --inputFile string      Set the input file path to load the corpus (default "example/input.txt")
      --lower                 Whether to convert words in the corpus to lowercase (default true)
      --maxDepth int          Set the maximum depth to traverse in the Huffman tree; maxDepth=0 traverses the full path from root to word (hierarchical softmax only)
      --model string          Set the model of Word2Vec. One of: cbow|skip-gram (default "cbow")
      --optimizer string      Set the optimizer of Word2Vec. One of: hs|ns (default "hs")
  -o, --outputFile string     Set the output file path to save word vectors (default "example/word_vectors.txt")
      --sample int            Set the number of negative samples (negative sampling only) (default 5)
      --theta float           Set the lower limit of the learning rate (lr >= initlr * theta) (default 0.0001)
      --threshold float       Set the threshold for subsampling (default 0.001)
  -w, --window int            Set the context window size (default 5)
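As a concrete example, the invocation below uses only the flags documented above to train 100-dimensional Skip-Gram vectors with negative sampling on the bundled example corpus:

  word-embedding word2vec -i example/input.txt -o example/word_vectors.txt \
    --model skip-gram --optimizer ns -d 100 -w 5 --sample 5

Swapping in --model cbow or --optimizer hs selects the other model or optimizer; with hs, --maxDepth bounds how far the Huffman tree is traversed.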