gloom

package module

v1.0.0 Latest Latest Go to latest Published: Jun 13, 2018 License: MIT Imports: 5 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/alexanderbez/gloom

Links

Open Source Insights

README ¶

gloom

A simple and intuitive implementation of a Bloom filter using enhanced double hashing.

Within gloom, enhanced double hashing is used to set bit positions. The choice for double hashing was shown to be effective without any loss in the asymptotic false positive probability, leading to less computation and potentially less need for randomness in practice by Adam Kirsch and Michael Mitzenmacher in Less Hashing, Same Performance: Building a Better Bloom Filter.

The enhanced double hash is of the form:

g_i(x) = (H₁(x) + iH₂(x) + f(i)) mod m, where

H₁ is FNV-1a 64-bit, H₂ is Murmur3 64-bit, and f(i) = i³

What is a Bloom filter?

A Bloom filter is a space-efficient probabilistic data structure used for set inclusion queries. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set".

Essentially, a Bloom filter contains a single bit vector of size m and k independent and uniform hash functions to insert n set items. Upon inserting a set item into the filter, the k hash functions return all bit vector positions to set for said item.

To test inclusion of an item, all mapped k bit positions must contain a set bit. It is possible to get a false positive for an item, but under a desired and given probability. Although Bloom filters allow false positives, the space savings often outweigh this drawback.

API

To initialize a Bloom filter, a given set size and desired false positive probability is needed. The size of the bit vector and the number of hash functions to use is determined by these parameters.

import (
   	"github.com/alexanderbez/gloom"
)

bf, err := gloom.NewBloomFilter(n, gloom.DefaultFalsePosProb)

item := []byte("foo")
bf.Set(item)

ok, err := bf.Includes(item)

Tests

$ go test -v ./...

Contributing

Fork it
Create your feature branch (git checkout -b feature/my-new-feature)
Commit your changes (git commit -m 'Add some feature')
Push to the branch (git push origin feature/my-new-feature)
Create a new Pull Request

Documentation ¶

Index ¶

Constants
type BloomFilter
- func NewBloomFilter(n uint64, p float64) (*BloomFilter, error)

Constants ¶

View Source

const (
	// DefaultFalsePosProb is the default value (1%) for the probability of a
	// false positive in a Bloom filter.
	DefaultFalsePosProb = 0.01
)

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type BloomFilter ¶

type BloomFilter struct {
	// contains filtered or unexported fields
}

BloomFilter implements a space-efficient randomized data structure for representing a set in order to support membership queries. The filters allow for false positives, however, the space savings often outweigh this drawback. In other words, the filter allows for one-sided errors: Either an element is "probably" in the set or it is definitely not in the set.

The Fundamental implementation contains two hash functions that allow for a double hashing technique to facilitate up to k-1 hashes. This is combined with the size of the bit vector, directly correlates to the probability of a false positive.

Choice for double hashing was shown to be effective without any loss in the asymptotic false positive probability, leading to less computation and potentially less need for randomness in practice by Adam Kirsch and Michael Mitzenmacher in: 'Less Hashing, Same Performance: Building a Better Bloom Filter'

Non-cryptographic hash functions FNV-1a and MurmurHash3 are used for speed performance.

func NewBloomFilter ¶

func NewBloomFilter(n uint64, p float64) (*BloomFilter, error)

NewBloomFilter returns a reference to a new Bloom filter. It requires a, potentially approximate, set size n and false positive probability p. The optimal value of k, number of hash functions, and m, bit vector size will be used.

func (*BloomFilter) Includes ¶

func (bf *BloomFilter) Includes(data []byte) (bool, error)

Includes returns whether or not some arbitrary set item (byte slice) is most likely in the Bloom filter. There is a possibility for a false positive with the probability being under the Bloom filter's p value. An error is returned if any hash function write fails.

func (*BloomFilter) NumItemsApprox ¶

func (bf *BloomFilter) NumItemsApprox() uint64

NumItemsApprox returns the approximate total number of items in the Bloom filter.

func (*BloomFilter) Set ¶

func (bf *BloomFilter) Set(data []byte) error

Set accepts a set item (byte slice) and sets the appropriate bits to 1 in the bit vector. An error is returned if any hash function write fails. Enhanced double hashing is used with two hash functions instead of k uniform random hash functions.

func (*BloomFilter) String ¶

func (bf *BloomFilter) String() string

String implements the Stringer interface for a Bloom filter.

Source Files ¶

View all Source files

Directories ¶

Path	Synopsis
murmur3

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL