bloomfilter

package
v0.0.0-...-e38b929 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 12, 2017 License: MIT Imports: 10 Imported by: 2

README

Bloom Filter

A quick primer if you don't know what a bloom filter is: https://en.wikipedia.org/wiki/Bloom_filter

How Bloom Filters Help Olivia

Bloom filters are used in Olivia to aide in selecting a peer to retrieve a key not found in the current node. So, if a key is requested from an node and the node doesn't contain the value, the operating node will search all known peers for the key. Since latency is a thing which exists, bloom filters are used to improve search times so we can only query nodes which probably have the key.

Once a node is made aware of another node, each node will send a copy of their bloom filter (which will continued to be updated). Moreover, on a timed interval (default of 30-seconds), a new updated bloom filter will be transferred between each node.

This does mean there's a 30 second window where any updates to bloom filters are not known by remote nodes. This is a problem and potential solutions are being thought out. The currently courted (as of writing this) idea/solution is to send a list of updated hashes with each heartbeat. This allows quicker updating the bloom filters by simple O(1) insertions.

The bloomfilter is backed by a third-party library (https://github.com/willf/bitset) and will continue to be so for the foreseeable future. To cut down on network burden, all bloom filters are marshalled to JSON and then run-length encoded. This tends to heavily cut down on total size of data being transmitted.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Decode

func Decode(encodedString string) string

Decode essentially works opposite of Encode and turns and encoded string into a normal, usable string.

func Encode

func Encode(inputString string) string

Encode handles encoding the bloom filter. An example string before encoding: "AAAABBCCCDZZZRTTT" An example string after encoding "A4B2C3D1Z3R1T3". Please note: This could essentially return an inoptimal compression, as if we have a string with alternating single values (i.e., "01010101010101") this RLE encoding will result in a string twice as long. There is an optimization where we only count runs and not single characters, but for the time being, I don't think it's necessary.

Types

type Bitset

type Bitset interface {
	Add(uint)
	Contains(uint) bool
	ToString() string
	FromString(string)
	Compare(interface{}) bool
	IsSet(uint) bool
	Len() uint
}

type BloomFilter

type BloomFilter interface {
	AddKey(key []byte) (bool, []uint)
	HasKey(key []byte) (bool, []uint)
	Serialize() string
	GetMaxSize() uint
	GetStorage() Bitset
	Compare(interface{}) bool
	HashKey([]byte) []uint
}

type SimpleBloomFilter

type SimpleBloomFilter struct {

	// Total number of hashing functions
	HashFunctions uint

	HashCache *lru.LRUCacheInt32Array
	// contains filtered or unexported fields
}

func Deserialize

func Deserialize(inputString string, maxSize uint) (*SimpleBloomFilter, error)

ConvertStringToBF Decodes the RLE'd bloom filter and then converts it to an actual bloom filter in-memory.

func NewByFailRate

func NewByFailRate(items uint, probability float64) *SimpleBloomFilter

NewByFailRate allows generation of a bloom filter with a pre-conceived amount of items and a false-positive failure rate. We calculate our bloom filter bounds and generate the new bloom filter this way.

func NewSimpleBF

func NewSimpleBF(maxSize uint, hashFuns uint) *SimpleBloomFilter

New Returns a pointer to a newly allocated `SimpleBloomFilter` object

func (*SimpleBloomFilter) AddKey

func (bf *SimpleBloomFilter) AddKey(key []byte) (bool, []uint)

AddKey Adds a new key to the bloom filter

func (*SimpleBloomFilter) Compare

func (bf *SimpleBloomFilter) Compare(remote interface{}) bool

Compare returns if the two bloomfilters are equal

func (*SimpleBloomFilter) GetMaxSize

func (bf *SimpleBloomFilter) GetMaxSize() uint

GetMaxSize returns the max size. Just an ugly getter.

func (*SimpleBloomFilter) GetStorage

func (bf *SimpleBloomFilter) GetStorage() Bitset

GetStorage handles returning the underlying bloomfilter bitset.

func (*SimpleBloomFilter) HasKey

func (bf *SimpleBloomFilter) HasKey(key []byte) (bool, []uint)

HasKey verifies if a key is or isn't in the bloom filter.

func (*SimpleBloomFilter) HashKey

func (bf *SimpleBloomFilter) HashKey(key []byte) []uint

HashKey Takes a string in as an argument and hashes it several times to create usable indexes for the bloom filter.

func (*SimpleBloomFilter) Serialize

func (bf *SimpleBloomFilter) Serialize() string

ConvertToString handles conversion of a bloom filter to a string. Moreover, it enforces RLE encoding, so that fewer bytes are transferred per request.

type WFBitset

type WFBitset struct {
	// contains filtered or unexported fields
}

WFBitset is a simple wrapper around the willf bitset library.

func NewWFBitset

func NewWFBitset(maxSize uint) *WFBitset

NewWFBitset constructs a new bitset to be used with bloom filters.

func (*WFBitset) Add

func (b *WFBitset) Add(index uint)

Add handles adding a new hashed index into the bitset.

func (*WFBitset) Compare

func (b *WFBitset) Compare(compareTo interface{}) bool

func (*WFBitset) Contains

func (b *WFBitset) Contains(index uint) bool

Contains verifies if a hash index is actually in the bitset or not.

func (*WFBitset) FromString

func (b *WFBitset) FromString(inputString string)

FromString handles converting a (valid json) string to a valid underlying bitset.

func (*WFBitset) IsSet

func (b *WFBitset) IsSet(bitIndex uint) bool

func (*WFBitset) Len

func (b *WFBitset) Len() uint

func (*WFBitset) ToString

func (b *WFBitset) ToString() string

ToString handles converting the bitset to a RLE usable string.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL