minhashlsh

package module
v0.0.0-...-faac2c6 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 24, 2019 License: MIT Imports: 6 Imported by: 6

README

Minhash LSH in Golang

Build Status GoDoc

Documentation

Install: go get github.com/ekzhu/minhash-lsh

Run Benchmark

Set file format
  1. One set per line

  2. Each set, all items are separated by whitespaces

  3. If the parameter firstItemIsID is set to true, the first itme is the unique ID of the set.

  4. The rest of the items with the following format: <value>____<frequency>

    • value is an unique element of the set
    • frequency is an integer count of the occurance of value
    • ____ (4 underscores) is the separator
All Pair Benchmark
minhash-lsh-all-pair -input <set file name>

Documentation

Index

Constants

This section is empty.

Variables

View Source
var NewMinhashLSH = NewMinhashLSH32

NewMinhashLSH is the default constructor uses 32 bit hash value with pre-allocation of hash tables.

Functions

This section is empty.

Types

type Minhash

type Minhash struct {
	// contains filtered or unexported fields
}

Minhash represents a MinHash object

func NewMinhash

func NewMinhash(seed int64, numHash int) *Minhash

NewMinhash initialize a MinHash object with a seed and the number of hash functions.

func (*Minhash) Merge

func (m *Minhash) Merge(o *Minhash)

Merge combines the signature of the other Minhash with this one, making this one carry the signature of the union.

func (*Minhash) Push

func (m *Minhash) Push(b []byte)

Push a new value to the MinHash object. The value should be serialized to byte slice.

func (*Minhash) Signature

func (m *Minhash) Signature() []uint64

Signature exports the MinHash as a list of hash values.

type MinhashLSH

type MinhashLSH struct {
	// contains filtered or unexported fields
}

MinhashLSH represents a MinHash LSH implemented using LSH Forest (http://ilpubs.stanford.edu:8090/678/1/2005-14.pdf). It supports query-time setting of the MinHash LSH parameters L (number of bands) and K (number of hash functions per band).

func NewMinhashLSH16

func NewMinhashLSH16(numHash int, threshold float64, initSize int) *MinhashLSH

NewMinhashLSH16 uses 16-bit hash values and pre-allocation of hash tables. MinHash signatures with 64 or 32 bit hash values will have their hash values trimed.

func NewMinhashLSH32

func NewMinhashLSH32(numHash int, threshold float64, initSize int) *MinhashLSH

NewMinhashLSH32 uses 32-bit hash values and pre-allocation of hash tables. MinHash signatures with 64 bit hash values will have their hash values trimed.

func NewMinhashLSH64

func NewMinhashLSH64(numHash int, threshold float64, initSize int) *MinhashLSH

NewMinhashLSH64 uses 64-bit hash values and pre-allocation of hash tables.

func (*MinhashLSH) Add

func (f *MinhashLSH) Add(key interface{}, sig []uint64)

Add a key with MinHash signature into the index. The key won't be searchable until Index() is called.

func (*MinhashLSH) Index

func (f *MinhashLSH) Index()

Index makes all the keys added searchable.

func (*MinhashLSH) Params

func (f *MinhashLSH) Params() (k, l int)

Params returns the LSH parameters k and l

func (*MinhashLSH) Query

func (f *MinhashLSH) Query(sig []uint64) []interface{}

Query returns candidate keys given the query signature.

Directories

Path Synopsis
cmd

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL