bloom

package module
v0.0.0-...-65d2bcd Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 27, 2017 License: CC0-1.0 Imports: 7 Imported by: 0

README

bloom

Bloom filter in Go

What

Package bloom is a fast, space-efficient bloom filter.

Tips

Install or build this package with -tags=unsafe to utilize faster string to []byte conversions.

These conversions are only done when -tags=unsafe is added and only done inside the hash function.

On my old i3 laptop this increases the speed of the bloom filter from ~75ns to ~64ns.

Performance

BenchmarkAndreas-4	20000000	        75.8 ns/op
BenchmarkBloom-4  	20000000	        68.1 ns/op
BenchmarkWillf-4  	 5000000	       351 ns/op
BenchmarkSpencer-4	 5000000	       314 ns/op

GoDoc

GoDoc

Documentation

Overview

Package bloom is a fast, space-efficient bloom filter.

Index

Constants

This section is empty.

Variables

View Source
var (
	// ErrUnknownEncoding is returned from UnmarshalBinary if the version isn't
	// recognized.
	ErrUnknownEncoding = errors.New("bloom: unknown encoding")

	// ErrDataTooShort is returned from UnmarshalBinary if the length of the
	// data is too short.
	ErrDataTooShort = errors.New("bloom: data too short")
)

Functions

func Jaccard

func Jaccard(a, b *Filter) (float64, error)

Jaccard estimates the Jaccard Index of a and b. Its special cases are the same as Union's.

Types

type Dynamic

type Dynamic struct {
	// contains filtered or unexported fields
}

Dynamic is a Bloom filter that doesn't need a pre-set size. The idea comes from http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf

func NewDynamic

func NewDynamic(p float64) *Dynamic

NewDynamic creates a new Bloom filter for an unbounded number of items with probability p. See New for more information.

func (*Dynamic) Add

func (d *Dynamic) Add(key string)

func (*Dynamic) AddBytes

func (d *Dynamic) AddBytes(key []byte)

func (*Dynamic) Has

func (d *Dynamic) Has(key string) bool

func (*Dynamic) HasBytes

func (d *Dynamic) HasBytes(key []byte) bool

func (*Dynamic) MarshalBinary

func (d *Dynamic) MarshalBinary() (data []byte, err error)

func (*Dynamic) UnmarshalBinary

func (d *Dynamic) UnmarshalBinary(data []byte) error

type Filter

type Filter struct {
	N uint64  // original number of items
	P float64 // original p value
	// contains filtered or unexported fields
}

Filter is a Bloom filter.

func New

func New(n int, p float64) *Filter

New creates a new Bloom filter for n items with probability p. p should be the between 0 and 1 and indicate the probability of false positives wanted. n should be a positive integer describing the number of items in the filter.

func (*Filter) Add

func (f *Filter) Add(key string)

Add adds a key to the filter.

func (*Filter) AddBytes

func (f *Filter) AddBytes(key []byte)

AddBytes adds a key to the filter.

func (*Filter) ErrRate

func (f *Filter) ErrRate() float64

ErrRate returns the estimated false positive rate based on the number of items in the filter. This approaches 1 as more items are added to the filter.

func (*Filter) Has

func (f *Filter) Has(key string) bool

Has returns true if the key probably exists in the filter.

func (*Filter) HasBytes

func (f *Filter) HasBytes(key []byte) bool

HasBytes returns true if the key probably exists in the filter.

func (*Filter) Intersect

func (f *Filter) Intersect(f1, f2 *Filter) error

Intersect sets f to the intersection of f1 and f2. Intersect's special cases are the same as Union's.

func (*Filter) MarshalBinary

func (f *Filter) MarshalBinary() (data []byte, err error)

MarshalBinary implements encoding.BinaryMarshaler.

func (*Filter) Size

func (f *Filter) Size() int

Size returns the approximate number of items in the filter. At most it should be within 3.5% of the actual amount, assuming the number of items in the filter is <= the original size of the filter.

Full disclosure: in practice, the variance is less than 1%; 3.5% is absolute maximum, tested up to 1e7 elements (see bloom_test.go). The algorithm is from http://pubs.acs.org/doi/abs/10.1021/ci600526a, but since I do not have access to ACS I do not know if the authors of the authors published the variance of the algorithm.

func (*Filter) Stats

func (f *Filter) Stats() (hashes, nbits, popcount uint64)

Stats returns basic memory information. hashes is the number of hash functions and nbits is the number of usable bits. The total number of bits allocated by the filter will be nbits rounded up to the nearest multiple of 64. popcount is the number of set bits.

func (*Filter) Union

func (f *Filter) Union(f1, f2 *Filter) error

Union sets f to the union of f1 and f2.

The union of two filters can only be taken if their P values are the same and their bit vectors are the same length. In practice, this means f1.P == f2.P && f1.N == f2.N. f will not be modified if err != nil. It is safe for f to alias either f1 or f2.

func (*Filter) UnmarshalBinary

func (f *Filter) UnmarshalBinary(data []byte) error

UnmarshalBinary implements encoding.BinaryUnmarshaler.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL