bloom

package module
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 5, 2026 License: Apache-2.0 Imports: 8 Imported by: 0

README

bloom

Go Reference Go Report Card Build Status Go Coverage LICENSE

An ultra fast, lightweight, concurrent-safe Bloom filter for Go.

Zero dependencies — only the Go standard library. Zero allocations per operation. Lock-free concurrent reads and writes via atomic.Uint64. Built for production use, small enough to understand (just over 100 LOC).

This library is based on my blog post: The Magic of Bloom Filters, which covers the theory, math, benchmarks, and design decisions in depth.

Features

  • Zero dependencies — uses only the Go standard library (hash/fnv, sync/atomic, sync, math)
  • Zero allocations — hashers are recycled via sync.Pool (0 B/op, 0 allocs/op)
  • Concurrent-safe with no locksatomic.Uint64 bitset for lock-free Add and Contains
  • Kirsch-Mitzenmacher optimization — one hash call simulates k independent hashes
  • Configurable — create from explicit (m, k) or from (n, p) probability parameters
  • Pluggable hash function — default FNV-1a, swappable via WithHashFunc option
  • Binary serializationMarshalBinary / UnmarshalBinary for persistence and network transfer

Install

go get github.com/phrozen/bloom

Usage

Bloom filters answer one question: "Is this item in the set?" The answer is either a definitive No or a probabilistic Probably. There are no false negatives — if the filter says no, the item was never added. This makes them ideal for guarding expensive lookups: check the filter first, skip the database/disk/network call when the answer is no.

Create from probability parameters

Most of the time, you know how many items you expect and what false positive rate you can tolerate. The constructor handles the math for you — it computes the optimal number of bits and hash functions automatically:

// Filter for 1M items with a 1% false positive rate.
// Internally this allocates ~1.14 MB (9.58M bits) and uses 7 hash functions.
f := bloom.NewFilterFromProbability(1_000_000, 0.01)
Create with explicit parameters

If you already know the exact bit size and hash count you want (e.g. reproducing a filter from known parameters), you can specify them directly:

// 10 million bits, 7 hash functions
f := bloom.NewFilter(10_000_000, 7)
Add and query

Both functions are concurrent-safe and cannot fail (no error). Passing an empty []byte is valid — it simply hashes like any other input.

id := "550e8400-e29b-41d4-a716-446655440000"
f.Add([]byte(id))

if !f.Contains([]byte(id)) {
    // Definitively NOT in the set — skip the database entirely
    return ErrNotFound
}
// Probably in the set — check the database to confirm
Custom hash function

The default hasher is FNV-1a from the standard library. If you need a different algorithm, swap it via the WithHashFunc option. Any hash.Hash64 implementation works:

import "github.com/cespare/xxhash/v2"

f := bloom.NewFilterFromProbability(1_000_000, 0.01,
    bloom.WithHashFunc(func() hash.Hash64 { return xxhash.New() }),
)
Serialize / Deserialize

Filters implement encoding.BinaryMarshaler and encoding.BinaryUnmarshaler, so you can persist them to disk, send them over the network, or embed them in any encoding that supports those interfaces:

// Save
data, err := f.MarshalBinary()

// Restore
var restored bloom.Filter
err = restored.UnmarshalBinary(data)

The binary format is compact: 16 bytes of header (m + k) followed by the raw bitset. A 1M-item filter at 1% FPR serializes to ~1.14 MB.

Benchmarks

All benchmarks on AMD Ryzen 7 5800X (16 threads), 16-byte UUID keys, go test -bench=. -benchmem:

Per-operation
Operation ns/op B/op allocs/op
Add ~36 ns 0 0
Contains ~25 ns 0 0
At scale (1M items, 1% FPR, 16 goroutines)
Benchmark ns/op B/op allocs/op
Concurrent Add ~12 ns 0 0
Concurrent Contains ~9 ns 0 0
Concurrent Add+Contains (50/50) ~14 ns 0 0
Filter memory footprint 1.14 MB
Actual false positive rate ~1.00%
Hash function comparison

The sync.Pool optimization eliminates the interface allocation penalty entirely. All hashers now perform within the same ballpark, despite wildly different internal struct sizes:

Hasher Add ns/op Contains ns/op allocs/op
FNV-1a (default, std lib) ~36 ns ~25 ns 0
xxHash ~40 ns ~26 ns 0
Murmur3 ~36 ns ~29 ns 0
XXH3 ~36 ns ~29 ns 0

FNV-1a is the default because it ships with Go — zero dependencies — and with Kirsch-Mitzenmacher splitting a 64-bit hash into two 32-bit halves, all four hashers produce identical false positive rates (~1% for a 1% target). Hash "quality" is a non-factor for this use case.

How it works

  1. Bit array — a []atomic.Uint64 slice aligned to CPU word size for single-instruction bit operations
  2. Kirsch-Mitzenmacher — hash once, split into two 32-bit halves h1 and h2, simulate k hashes via h1 + i*h2 (paper)
  3. sync.Pool — recycles hasher instances so Add/Contains do zero heap allocations
  4. Atomic operationsOr to set bits, Load to read them — fully concurrent, no mutex

API

// Constructors
func NewFilter(m int, k int, opts ...Option) *Filter
func NewFilterFromProbability(n int, p float64, opts ...Option) *Filter

// Options
func WithHashFunc(h HashFunc) Option

// Operations
func (f *Filter) Add(data []byte)
func (f *Filter) Contains(data []byte) bool

// Serialization (encoding.BinaryMarshaler / encoding.BinaryUnmarshaler)
func (f *Filter) MarshalBinary() ([]byte, error)
func (f *Filter) UnmarshalBinary(data []byte) error

License

Apache 2.0

Documentation

Overview

Package bloom provides a space-efficient, concurrent-safe Bloom filter with zero allocations per operation.

A Bloom filter is a probabilistic data structure that can answer the question "Is this item in the set?" with either a definitive No or a probabilistic Probably. There are no false negatives: if Contains returns false, the item was never added. False positives are possible but their rate is controlled by the filter's size and number of hash functions.

The filter uses atomic.Uint64 for its bitset, making Add and Contains fully safe for concurrent use without any locks. Hashers are recycled via sync.Pool to eliminate per-call heap allocations.

Hash independence is achieved through the Kirsch-Mitzenmacher optimization: a single 64-bit hash is split into two 32-bit halves to simulate k independent hash functions. The default hasher is FNV-1a from the standard library; a custom hash.Hash64 can be injected via WithHashFunc.

Filters implement encoding.BinaryMarshaler and encoding.BinaryUnmarshaler for compact serialization.

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Filter

type Filter struct {
	// contains filtered or unexported fields
}

Filter represents a Bloom filter. We use atomic.Uint64 so that Add and Contains are fully concurrent-safe without requiring an external mutex. On modern 64-bit architectures, atomic operations on aligned 64-bit words are essentially free.

func NewFilter

func NewFilter(m int, k int, opts ...Option) *Filter

NewFilter creates a Bloom filter directly with a specific bit size (m) and number of hash iterations (k).

func NewFilterFromProbability

func NewFilterFromProbability(n int, p float64, opts ...Option) *Filter

NewFilterFromProbability creates a Bloom filter tailored for an expected number of elements (`n`) and a desired false-positive probability (`p`). Example: NewFilterFromProbability(10000, 0.01) creates a filter for 10k items with a 1% chance of a false positive.

Example
// Create a filter for 1000 items with a 1% false positive rate
f := NewFilterFromProbability(1000, 0.01)

// Add an item
f.Add([]byte("my-database-key"))

// Check if it exists
if f.Contains([]byte("my-database-key")) {
	fmt.Println("Item is probably in the set.")
}

if !f.Contains([]byte("non-existent-key")) {
	fmt.Println("Item is definitively NOT in the set.")
}
Output:

Item is probably in the set.
Item is definitively NOT in the set.

func (*Filter) Add

func (f *Filter) Add(data []byte)

Add inserts data into the Bloom filter. It hashes the data once, splits the 64-bit result into two 32-bit halves, and simulates `k` independent hashes via Kirsch-Mitzenmacher. This method is safe for concurrent use.

func (*Filter) Contains

func (f *Filter) Contains(data []byte) bool

Contains checks if data might be in the set. If *any* of the `k` hash positions are 0, the element was definitively never added. If *all* of the positions are 1, it *might* have been added (or we have a collision). This method is safe for concurrent use.

func (*Filter) MarshalBinary

func (f *Filter) MarshalBinary() ([]byte, error)

MarshalBinary implements the encoding.BinaryMarshaler interface. It serializes the filter's parameters and bitset into a portable binary format. The hash function is NOT serialized — upon UnmarshalBinary, the default FNV-1a hasher is used. Use WithHashFunc after unmarshaling if needed.

func (*Filter) UnmarshalBinary

func (f *Filter) UnmarshalBinary(data []byte) error

UnmarshalBinary implements the encoding.BinaryUnmarshaler interface. It restores the filter's parameters and bitset from data produced by MarshalBinary. If the receiver was created with a custom hasher (via WithHashFunc), that hasher is preserved. Otherwise, it defaults to FNV-1a.

type HashFunc

type HashFunc func() hash.Hash64

HashFunc is a function that returns a new hash.Hash64. By defining an interface for our hasher, we avoid hardcoding a specific algorithm. This gives users the flexibility to swap the default hasher (FNV-1a) with a faster one (like Murmur3 or xxHash) if their use case requires maximum throughput.

type Option

type Option func(*Filter)

Option configures the Filter using the functional options pattern. This pattern allows us to provide sensible defaults while remaining open for extension later, without breaking the API.

func WithHashFunc

func WithHashFunc(h HashFunc) Option

WithHashFunc allows providing a custom hash function constructor.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL