bloomfilter

package

v0.0.0-...-e38b929 Latest Latest Go to latest Published: Nov 12, 2017 License: MIT Imports: 10 Imported by: 2

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/GrappigPanda/Olivia

Links

Open Source Insights

README ¶

Bloom Filter

A quick primer if you don't know what a bloom filter is: https://en.wikipedia.org/wiki/Bloom_filter

How Bloom Filters Help Olivia

Bloom filters are used in Olivia to aide in selecting a peer to retrieve a key not found in the current node. So, if a key is requested from an node and the node doesn't contain the value, the operating node will search all known peers for the key. Since latency is a thing which exists, bloom filters are used to improve search times so we can only query nodes which probably have the key.

Once a node is made aware of another node, each node will send a copy of their bloom filter (which will continued to be updated). Moreover, on a timed interval (default of 30-seconds), a new updated bloom filter will be transferred between each node.

This does mean there's a 30 second window where any updates to bloom filters are not known by remote nodes. This is a problem and potential solutions are being thought out. The currently courted (as of writing this) idea/solution is to send a list of updated hashes with each heartbeat. This allows quicker updating the bloom filters by simple O(1) insertions.

The bloomfilter is backed by a third-party library (https://github.com/willf/bitset) and will continue to be so for the foreseeable future. To cut down on network burden, all bloom filters are marshalled to JSON and then run-length encoded. This tends to heavily cut down on total size of data being transmitted.

Documentation ¶

Index ¶

func Decode(encodedString string) string
func Encode(inputString string) string
type Bitset
type BloomFilter
type SimpleBloomFilter
type WFBitset
- func NewWFBitset(maxSize uint) *WFBitset

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func Decode ¶

func Decode(encodedString string) string

Decode essentially works opposite of Encode and turns and encoded string into a normal, usable string.

func Encode ¶

func Encode(inputString string) string

Encode handles encoding the bloom filter. An example string before encoding: "AAAABBCCCDZZZRTTT" An example string after encoding "A4B2C3D1Z3R1T3". Please note: This could essentially return an inoptimal compression, as if we have a string with alternating single values (i.e., "01010101010101") this RLE encoding will result in a string twice as long. There is an optimization where we only count runs and not single characters, but for the time being, I don't think it's necessary.

Types ¶

type Bitset ¶

type Bitset interface {
	Add(uint)
	Contains(uint) bool
	ToString() string
	FromString(string)
	Compare(interface{}) bool
	IsSet(uint) bool
	Len() uint
}

type BloomFilter ¶

type BloomFilter interface {
	AddKey(key []byte) (bool, []uint)
	HasKey(key []byte) (bool, []uint)
	Serialize() string
	GetMaxSize() uint
	GetStorage() Bitset
	Compare(interface{}) bool
	HashKey([]byte) []uint
}

type SimpleBloomFilter ¶

type SimpleBloomFilter struct {

	// Total number of hashing functions
	HashFunctions uint

	HashCache *lru.LRUCacheInt32Array
	// contains filtered or unexported fields
}

func Deserialize ¶

func Deserialize(inputString string, maxSize uint) (*SimpleBloomFilter, error)

ConvertStringToBF Decodes the RLE'd bloom filter and then converts it to an actual bloom filter in-memory.

func NewByFailRate ¶

func NewByFailRate(items uint, probability float64) *SimpleBloomFilter

NewByFailRate allows generation of a bloom filter with a pre-conceived amount of items and a false-positive failure rate. We calculate our bloom filter bounds and generate the new bloom filter this way.

func NewSimpleBF ¶

func NewSimpleBF(maxSize uint, hashFuns uint) *SimpleBloomFilter

New Returns a pointer to a newly allocated `SimpleBloomFilter` object

func (*SimpleBloomFilter) AddKey ¶

func (bf *SimpleBloomFilter) AddKey(key []byte) (bool, []uint)

AddKey Adds a new key to the bloom filter

func (*SimpleBloomFilter) Compare ¶

func (bf *SimpleBloomFilter) Compare(remote interface{}) bool

Compare returns if the two bloomfilters are equal

func (*SimpleBloomFilter) GetMaxSize ¶

func (bf *SimpleBloomFilter) GetMaxSize() uint

GetMaxSize returns the max size. Just an ugly getter.

func (*SimpleBloomFilter) GetStorage ¶

func (bf *SimpleBloomFilter) GetStorage() Bitset

GetStorage handles returning the underlying bloomfilter bitset.

func (*SimpleBloomFilter) HasKey ¶

func (bf *SimpleBloomFilter) HasKey(key []byte) (bool, []uint)

HasKey verifies if a key is or isn't in the bloom filter.

func (*SimpleBloomFilter) HashKey ¶

func (bf *SimpleBloomFilter) HashKey(key []byte) []uint

HashKey Takes a string in as an argument and hashes it several times to create usable indexes for the bloom filter.

func (*SimpleBloomFilter) Serialize ¶

func (bf *SimpleBloomFilter) Serialize() string

ConvertToString handles conversion of a bloom filter to a string. Moreover, it enforces RLE encoding, so that fewer bytes are transferred per request.

type WFBitset ¶

type WFBitset struct {
	// contains filtered or unexported fields
}

WFBitset is a simple wrapper around the willf bitset library.

func NewWFBitset ¶

func NewWFBitset(maxSize uint) *WFBitset

NewWFBitset constructs a new bitset to be used with bloom filters.

func (*WFBitset) Add ¶

func (b *WFBitset) Add(index uint)

Add handles adding a new hashed index into the bitset.

func (*WFBitset) Compare ¶

func (b *WFBitset) Compare(compareTo interface{}) bool

func (*WFBitset) Contains ¶

func (b *WFBitset) Contains(index uint) bool

Contains verifies if a hash index is actually in the bitset or not.

func (*WFBitset) FromString ¶

func (b *WFBitset) FromString(inputString string)

FromString handles converting a (valid json) string to a valid underlying bitset.

func (*WFBitset) IsSet ¶

func (b *WFBitset) IsSet(bitIndex uint) bool

func (*WFBitset) Len ¶

func (b *WFBitset) Len() uint

func (*WFBitset) ToString ¶

func (b *WFBitset) ToString() string

ToString handles converting the bitset to a RLE usable string.

Source Files ¶

View all Source files

Directories ¶

Path	Synopsis
search

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL