README
¶
Important: Zeroth, consider if a Cuckoo filter could be right for your use-case.
Face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
Copyright © 2014-2016,2018 Barry Allard
WTF is a bloom filter
**TL;DR: **Probabilistic, extra lookup table to track a set of elements kept elsewhere to reduce expensive, unnecessary set element retrieval and/or iterator operations when an element is not present in the set. It's a classic time-storage tradeoff algoritm.
Properties
See wikipedia for algorithm details
Impact | What | Description |
---|---|---|
Good | No false negatives | know for certain if a given element is definitely NOT in the set |
Bad | False positives | uncertain if a given element is in the set |
Bad | Theoretical potential for hash collisions | in very large systems and/or badly hash.Hash64-conforming implementations |
Bad | Add only | Cannot remove an element, it would destroy information about other elements |
Good | Constant storage | uses only a fixed amount of memory |
Naming conventions
(Similar to algorithm)
Variable/function | Description | Range |
---|---|---|
m/M() | number of bits in the bloom filter (memory representation is about m/8 bytes in size) | >=2 |
n/N() | number of elements present | >=0 |
k/K() | number of keys to use (keys are kept private to user code but are de/serialized to Marshal and file I/O) | >=0 |
maxN | maximum capacity of intended structure | >0 |
p | maximum allowed probability of collision (for computing m and k for optimal sizing) | >0..<1 |
- Memory representation should be exactly
24 + 8*(k + (m+63)/64) + unsafe.Sizeof(RWMutex)
bytes. - Serialized (
BinaryMarshaler
) representation should be exactly72 + 8*(k + (m+63)/64)
bytes. (Disk format is less due to compression.)
Binary serialization format
All values in Little-endian format
Offset | Offset (Hex) | Length (bytes) | Name | Type |
---|---|---|---|---|
0 | 00 | 8 | k | uint64 |
8 | 08 | 8 | n | uint64 |
16 | 10 | 8 | m | uint64 |
24 | 18 | k | (keys) | [k]uint64 |
24+8*k | ... | (m+63)/64 | (bloom filter) | [(m+63)/64]uint64 |
24+8*k+8*((m+63)/64) | ... | 48 | (SHA384 of all previous fields, hashed in order) | [48]byte |
bloomfilter.Filter
conforms toencoding.BinaryMarshaler
and `encoding.BinaryUnmarshaler'
Usage
import "github.com/steakknife/bloomfilter"
const (
maxElements = 100000
probCollide = 0.0000001
)
bf, err := bloomfilter.NewOptimal(maxElements, probCollide)
if err != nil {
panic(err)
}
someValue := ... // must conform to hash.Hash64
bf.Add(someValue)
if bf.Contains(someValue) { // probably true, could be false
// whatever
}
anotherValue := ... // must also conform to hash.Hash64
if bf.Contains(anotherValue) {
panic("This should never happen")
}
err := bf.WriteFile("1.bf.gz") // saves this BF to a file
if err != nil {
panic(err)
}
bf2, err := bloomfilter.ReadFile("1.bf.gz") // read the BF to another var
if err != nil {
panic(err)
}
Design
Where possible, branch-free operations are used to avoid deep pipeline / execution unit stalls on branch-misses.
Get
go get -u github.com/steakknife/bloomfilter # master is always stable
Source
-
On the web: https://github.com/steakknife/bloomfilter
-
Git:
git clone https://github.com/steakknife/bloomfilter
Contact
License
Copyright © 2014-2016 Barry Allard
Documentation
¶
Overview ¶
Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
https://github.com/steakknife/bloomfilter
Copyright © 2014, 2015, 2018 Barry Allard ¶
MIT license ¶
Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
https://github.com/steakknife/bloomfilter
Copyright © 2014, 2015, 2018 Barry Allard ¶
MIT license ¶
Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
https://github.com/steakknife/bloomfilter
Copyright © 2014, 2015, 2018 Barry Allard ¶
MIT license ¶
Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
https://github.com/steakknife/bloomfilter
Copyright © 2014, 2015, 2018 Barry Allard ¶
MIT license ¶
Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
https://github.com/steakknife/bloomfilter
Copyright © 2014, 2015, 2018 Barry Allard ¶
MIT license ¶
Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
https://github.com/steakknife/bloomfilter
Copyright © 2014, 2015, 2018 Barry Allard ¶
MIT license ¶
Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
https://github.com/steakknife/bloomfilter
Copyright © 2014, 2015, 2018 Barry Allard ¶
MIT license ¶
Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
https://github.com/steakknife/bloomfilter
Copyright © 2014, 2015, 2018 Barry Allard ¶
MIT license ¶
Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
https://github.com/steakknife/bloomfilter
Copyright © 2014, 2015, 2018 Barry Allard ¶
MIT license ¶
Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
https://github.com/steakknife/bloomfilter
Copyright © 2014, 2015, 2018 Barry Allard ¶
MIT license ¶
Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
https://github.com/steakknife/bloomfilter
Copyright © 2014, 2015, 2018 Barry Allard ¶
MIT license ¶
Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
https://github.com/steakknife/bloomfilter
Copyright © 2014, 2015, 2018 Barry Allard ¶
MIT license ¶
Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
https://github.com/steakknife/bloomfilter
Copyright © 2014, 2015, 2018 Barry Allard ¶
MIT license ¶
Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go
https://github.com/steakknife/bloomfilter
Copyright © 2014, 2015, 2018 Barry Allard ¶
MIT license
Index ¶
- Constants
- func EnableDebugging()
- func OptimalK(m, maxN uint64) uint64
- func OptimalM(maxN uint64, p float64) uint64
- func UniqueKeys(keys []uint64) bool
- type Filter
- func New(m, k uint64) (*Filter, error)
- func NewOptimal(maxN uint64, p float64) (*Filter, error)
- func NewWithKeys(m uint64, origKeys []uint64) (f *Filter, err error)
- func ReadFile(filename string) (f *Filter, n int64, err error)
- func ReadFrom(r io.Reader) (f *Filter, n int64, err error)
- func UnmarshalText(text []byte) (f *Filter, err error)
- func (f *Filter) Add(v uint64)
- func (f *Filter) Contains(v uint64) bool
- func (f *Filter) Copy() (*Filter, error)
- func (f *Filter) FalsePosititveProbability() float64
- func (f *Filter) GobDecode(data []byte) error
- func (f *Filter) GobEncode() ([]byte, error)
- func (f *Filter) IsCompatible(f2 *Filter) bool
- func (f *Filter) K() uint64
- func (f *Filter) M() uint64
- func (f *Filter) MarshalBinary() (data []byte, err error)
- func (f *Filter) MarshalText() (text []byte, err error)
- func (f *Filter) N() uint64
- func (f *Filter) NewCompatible() (*Filter, error)
- func (f *Filter) PreciseFilledRatio() float64
- func (f *Filter) ReadFrom(r io.Reader) (n int64, err error)
- func (f *Filter) Union(f2 *Filter) (out *Filter, err error)
- func (f *Filter) UnionInPlace(f2 *Filter) error
- func (f *Filter) UnmarshalBinary(data []byte) (err error)
- func (f *Filter) UnmarshalText(text []byte) error
- func (f *Filter) WriteFile(filename string) (n int64, err error)
- func (f *Filter) WriteTo(w io.Writer) (n int64, err error)
Constants ¶
const ( // MMin is the minimum Bloom filter bits count MMin = 2 // KMin is the minimum number of keys KMin = 1 // Uint64Bytes is the number of bytes in type uint64 Uint64Bytes = 8 )
Variables ¶
This section is empty.
Functions ¶
func EnableDebugging ¶
func EnableDebugging()
EnableDebugging permits debug() logging of details to stderr
func OptimalK ¶
OptimalK calculates the optimal k value for creating a new Bloom filter maxn is the maximum anticipated number of elements
Types ¶
type Filter ¶
type Filter struct {
// contains filtered or unexported fields
}
Filter is an opaque Bloom filter type
func New ¶
New Filter with CSPRNG keys
m is the size of the Bloom filter, in bits, >= 2
k is the number of random keys, >= 1
func NewOptimal ¶
NewOptimal Bloom filter with random CSPRNG keys
func NewWithKeys ¶
NewWithKeys creates a new Filter from user-supplied origKeys
func ReadFile ¶
ReadFile from filename into a lossless-compressed Bloom Filter f Suggested file extension: .bf.gz
func UnmarshalText ¶
UnmarshalText conforms to TextUnmarshaler
func (*Filter) Contains ¶
Contains tests if f contains v false: f definitely does not contain value v true: f maybe contains value v
func (*Filter) FalsePosititveProbability ¶
FalsePosititveProbability is the upper-bound probability of false positives
(1 - exp(-k*(n+0.5)/(m-1))) ** k
func (*Filter) IsCompatible ¶
IsCompatible is true if f and f2 can be Union()ed together
func (*Filter) MarshalBinary ¶
MarshalBinary converts a Filter into []bytes
func (*Filter) MarshalText ¶
MarshalText conforms to encoding.TextMarshaler
func (*Filter) N ¶
N is how many elements have been inserted (actually, how many Add()s have been performed?)
func (*Filter) NewCompatible ¶
NewCompatible Filter compatible with f
func (*Filter) PreciseFilledRatio ¶
PreciseFilledRatio is an exhaustive count # of 1's
func (*Filter) UnionInPlace ¶
UnionInPlace merges Bloom filter f2 into f
func (*Filter) UnmarshalBinary ¶
UnmarshalBinary converts []bytes into a Filter conforms to encoding.BinaryUnmarshaler
func (*Filter) UnmarshalText ¶
UnmarshalText method overwrites f with data decoded from text