gloom

package module
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 13, 2018 License: MIT Imports: 5 Imported by: 0

README

gloom

GoDoc Build Status

A simple and intuitive implementation of a Bloom filter using enhanced double hashing.

Within gloom, enhanced double hashing is used to set bit positions. The choice for double hashing was shown to be effective without any loss in the asymptotic false positive probability, leading to less computation and potentially less need for randomness in practice by Adam Kirsch and Michael Mitzenmacher in Less Hashing, Same Performance: Building a Better Bloom Filter.

The enhanced double hash is of the form:

gi(x) = (H1(x) + iH2(x) + f(i)) mod m, where

H1 is FNV-1a 64-bit, H2 is Murmur3 64-bit, and f(i) = i3

What is a Bloom filter?

A Bloom filter is a space-efficient probabilistic data structure used for set inclusion queries. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set".

Essentially, a Bloom filter contains a single bit vector of size m and k independent and uniform hash functions to insert n set items. Upon inserting a set item into the filter, the k hash functions return all bit vector positions to set for said item.

To test inclusion of an item, all mapped k bit positions must contain a set bit. It is possible to get a false positive for an item, but under a desired and given probability. Although Bloom filters allow false positives, the space savings often outweigh this drawback.

API

To initialize a Bloom filter, a given set size and desired false positive probability is needed. The size of the bit vector and the number of hash functions to use is determined by these parameters.

import (
   	"github.com/alexanderbez/gloom"
)

bf, err := gloom.NewBloomFilter(n, gloom.DefaultFalsePosProb)

item := []byte("foo")
bf.Set(item)

ok, err := bf.Includes(item)

Tests

$ go test -v ./...

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b feature/my-new-feature)
  3. Commit your changes (git commit -m 'Add some feature')
  4. Push to the branch (git push origin feature/my-new-feature)
  5. Create a new Pull Request

Documentation

Index

Constants

View Source
const (
	// DefaultFalsePosProb is the default value (1%) for the probability of a
	// false positive in a Bloom filter.
	DefaultFalsePosProb = 0.01
)

Variables

This section is empty.

Functions

This section is empty.

Types

type BloomFilter

type BloomFilter struct {
	// contains filtered or unexported fields
}

BloomFilter implements a space-efficient randomized data structure for representing a set in order to support membership queries. The filters allow for false positives, however, the space savings often outweigh this drawback. In other words, the filter allows for one-sided errors: Either an element is "probably" in the set or it is definitely not in the set.

The Fundamental implementation contains two hash functions that allow for a double hashing technique to facilitate up to k-1 hashes. This is combined with the size of the bit vector, directly correlates to the probability of a false positive.

Choice for double hashing was shown to be effective without any loss in the asymptotic false positive probability, leading to less computation and potentially less need for randomness in practice by Adam Kirsch and Michael Mitzenmacher in: 'Less Hashing, Same Performance: Building a Better Bloom Filter'

Non-cryptographic hash functions FNV-1a and MurmurHash3 are used for speed performance.

func NewBloomFilter

func NewBloomFilter(n uint64, p float64) (*BloomFilter, error)

NewBloomFilter returns a reference to a new Bloom filter. It requires a, potentially approximate, set size n and false positive probability p. The optimal value of k, number of hash functions, and m, bit vector size will be used.

func (*BloomFilter) Includes

func (bf *BloomFilter) Includes(data []byte) (bool, error)

Includes returns whether or not some arbitrary set item (byte slice) is most likely in the Bloom filter. There is a possibility for a false positive with the probability being under the Bloom filter's p value. An error is returned if any hash function write fails.

func (*BloomFilter) NumItemsApprox

func (bf *BloomFilter) NumItemsApprox() uint64

NumItemsApprox returns the approximate total number of items in the Bloom filter.

func (*BloomFilter) Set

func (bf *BloomFilter) Set(data []byte) error

Set accepts a set item (byte slice) and sets the appropriate bits to 1 in the bit vector. An error is returned if any hash function write fails. Enhanced double hashing is used with two hash functions instead of k uniform random hash functions.

func (*BloomFilter) String

func (bf *BloomFilter) String() string

String implements the Stringer interface for a Bloom filter.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL