Documentation
¶
Overview ¶
Package gloom provides high-performance bloom filter implementations for Go.
A bloom filter is a space-efficient probabilistic data structure that tests whether an element is a member of a set. False positive matches are possible, but false negatives are not – if the filter says an element is not present, it definitely is not. If it says an element might be present, it could be a false positive.
Architecture ¶
Gloom uses two key optimizations for maximum throughput:
Cache-line blocking: Each bloom filter is divided into 512-bit (64-byte) blocks that match the CPU cache line size. All k hash probes for a single key access the same block, meaning only one cache line is touched per operation. This dramatically reduces memory latency compared to traditional bloom filters where k probes might touch k different cache lines.
One-hashing with prime partitions: Instead of computing k independent hash functions, gloom computes a single xxh3 hash and derives k bit positions using modulo operations with distinct prime numbers. This technique, based on the paper "One-Hashing Bloom Filter", provides excellent bit distribution while being much faster than multiple hash computations.
Implementations ¶
Three filter implementations are provided for different use cases:
Filter is the fastest option for single-threaded workloads. It has no synchronization overhead and achieves ~20ns per Add/Test operation.
AtomicFilter provides thread-safety using lock-free atomic operations. Multiple goroutines can safely call Add and Test concurrently. It uses sync/atomic.Uint64.Or (Go 1.23+) for efficient atomic bit-setting.
ShardedAtomicFilter distributes keys across multiple independent shards to reduce contention under heavy parallel writes. The shard count is auto-tuned to GOMAXPROCS by default. Use this when you have many goroutines performing concurrent writes.
Choosing Parameters ¶
Use New, NewAtomic, or NewShardedAtomicDefault with your expected number of items and desired false positive rate:
// Filter for 1 million items with 1% false positive rate f := gloom.New(1_000_000, 0.01)
The functions automatically calculate optimal filter size and number of hash functions. For advanced use cases, NewWithParams and NewAtomicWithParams allow explicit control over the number of 512-bit blocks and hash functions.
False Positive Rate ¶
The false positive rate depends on:
- Filter capacity (number of blocks)
- Number of hash functions (k)
- Number of items added
When the filter is filled to its intended capacity, it will achieve approximately the target false positive rate. Adding more items than the capacity increases the false positive rate. Use Filter.EstimatedFalsePositiveRate to monitor the current rate.
Memory Usage ¶
Memory usage is determined by the number of 512-bit blocks:
memory_bytes = num_blocks * 64
For a filter sized for n items with false positive rate p:
memory_bits ≈ -n * ln(p) / (ln(2))²
Example: 1 million items at 1% FP rate ≈ 1.2 MB
Thread Safety ¶
Filter is NOT thread-safe. Use external synchronization or choose AtomicFilter or ShardedAtomicFilter for concurrent access.
AtomicFilter and ShardedAtomicFilter are safe for concurrent Add and Test operations.
Performance Tips ¶
- Use Filter for single-threaded workloads (fastest)
- Use ShardedAtomicFilter for write-heavy concurrent workloads
- Use AtomicFilter for read-heavy concurrent workloads
- Use string methods (Filter.AddString, Filter.TestString) to avoid allocating when you have string keys
- Build with GOAMD64=v2 or higher to enable hardware POPCNT for Filter.EstimatedFillRatio
References ¶
- One-Hashing Bloom Filter: https://yangtonghome.github.io/uploads/One_Hashing.pdf
- Cache-line blocking (RocksDB): https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter
- Less Hashing, Same Performance: https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf
Example ¶
This example demonstrates basic bloom filter usage for membership testing.
package main
import (
"fmt"
"github.com/jcalabro/gloom"
)
func main() {
// Create a filter for 10,000 items with 1% false positive rate
f := gloom.New(10_000, 0.01)
// Add some items
f.Add([]byte("apple"))
f.Add([]byte("banana"))
f.Add([]byte("cherry"))
// Test membership
fmt.Println("apple:", f.Test([]byte("apple"))) // true (added)
fmt.Println("banana:", f.Test([]byte("banana"))) // true (added)
fmt.Println("grape:", f.Test([]byte("grape"))) // false (not added)
}
Output: apple: true banana: true grape: false
Example (Concurrent) ¶
This example demonstrates using AtomicFilter for concurrent access.
package main
import (
"fmt"
"sync"
"github.com/jcalabro/gloom"
)
func main() {
// AtomicFilter is safe for concurrent Add and Test
f := gloom.NewAtomic(100_000, 0.01)
var wg sync.WaitGroup
// Spawn multiple writers
for i := range 4 {
wg.Add(1)
go func(id int) {
defer wg.Done()
for j := range 1000 {
key := fmt.Sprintf("worker-%d-item-%d", id, j)
f.AddString(key)
}
}(i)
}
// Spawn multiple readers (can run concurrently with writers)
for i := range 4 {
wg.Add(1)
go func(id int) {
defer wg.Done()
for j := range 1000 {
key := fmt.Sprintf("worker-%d-item-%d", id, j)
_ = f.TestString(key)
}
}(i)
}
wg.Wait()
fmt.Println("Items added:", f.Count())
}
Output: Items added: 4000
Example (CustomParameters) ¶
This example demonstrates creating a filter with explicit parameters.
package main
import (
"fmt"
"github.com/jcalabro/gloom"
)
func main() {
// Create with explicit block count and hash functions.
// 1000 blocks * 512 bits = 512,000 bits total capacity.
// k=7 hash functions is optimal for ~1% false positive rate.
f := gloom.NewWithParams(1000, 7)
f.AddString("custom")
fmt.Println("Contains 'custom':", f.TestString("custom"))
fmt.Printf("Blocks: %d, K: %d\n", f.NumBlocks(), f.K())
}
Output: Contains 'custom': true Blocks: 1000, K: 7
Example (Sharded) ¶
This example shows ShardedAtomicFilter for high-throughput concurrent writes.
package main
import (
"fmt"
"sync"
"github.com/jcalabro/gloom"
)
func main() {
// ShardedAtomicFilter distributes writes across shards to reduce contention.
// Use NewShardedAtomicDefault for auto-tuned shard count (based on GOMAXPROCS).
f := gloom.NewShardedAtomicDefault(1_000_000, 0.01)
var wg sync.WaitGroup
// High-throughput parallel writes
for i := range 8 {
wg.Add(1)
go func(id int) {
defer wg.Done()
for j := range 10_000 {
f.AddString(fmt.Sprintf("key-%d-%d", id, j))
}
}(i)
}
wg.Wait()
fmt.Println("Total items:", f.Count())
}
Output: Total items: 80000
Example (Statistics) ¶
This example shows how to monitor filter statistics.
package main
import (
"fmt"
"github.com/jcalabro/gloom"
)
func main() {
f := gloom.New(10_000, 0.01)
// Add some items
for i := range 5000 {
f.Add(fmt.Appendf(nil, "item-%d", i))
}
fmt.Printf("Capacity: %d bits\n", f.Cap())
fmt.Printf("Hash functions (k): %d\n", f.K())
fmt.Printf("Items added: %d\n", f.Count())
fmt.Printf("Fill ratio: %.1f%%\n", f.EstimatedFillRatio()*100)
}
Output: Capacity: 96256 bits Hash functions (k): 7 Items added: 5000 Fill ratio: 30.4%
Example (StringKeys) ¶
This example shows how to use string keys without allocation overhead.
package main
import (
"fmt"
"github.com/jcalabro/gloom"
)
func main() {
f := gloom.New(10_000, 0.01)
// AddString and TestString avoid allocating when you have string keys
f.AddString("user:12345")
f.AddString("user:67890")
fmt.Println("user:12345 exists:", f.TestString("user:12345"))
fmt.Println("user:99999 exists:", f.TestString("user:99999"))
}
Output: user:12345 exists: true user:99999 exists: false
Index ¶
- Constants
- Variables
- func ComputeOffsets(primes []uint32) []uint32
- func EstimateFalsePositiveRate(numBlocks uint64, k uint32, itemsAdded uint64) float64
- func GetPrimePartition(k uint32) []uint32
- func OptimalParams(expectedItems uint64, fpRate float64) (numBlocks uint64, k uint32, bitsPerItem float64)
- type AtomicFilter
- func (f *AtomicFilter) Add(data []byte)
- func (f *AtomicFilter) AddString(s string)
- func (f *AtomicFilter) Cap() uint64
- func (f *AtomicFilter) Count() uint64
- func (f *AtomicFilter) EstimatedFalsePositiveRate() float64
- func (f *AtomicFilter) EstimatedFillRatio() float64
- func (f *AtomicFilter) K() uint32
- func (f *AtomicFilter) NumBlocks() uint64
- func (f *AtomicFilter) Test(data []byte) bool
- func (f *AtomicFilter) TestString(s string) bool
- type Filter
- func (f *Filter) Add(data []byte)
- func (f *Filter) AddString(s string)
- func (f *Filter) Cap() uint64
- func (f *Filter) Count() uint64
- func (f *Filter) EstimatedFalsePositiveRate() float64
- func (f *Filter) EstimatedFillRatio() float64
- func (f *Filter) K() uint32
- func (f *Filter) MarshalBinary() ([]byte, error)
- func (f *Filter) NumBlocks() uint64
- func (f *Filter) Test(data []byte) bool
- func (f *Filter) TestString(s string) bool
- type ShardedAtomicFilter
- func (f *ShardedAtomicFilter) Add(data []byte)
- func (f *ShardedAtomicFilter) AddString(s string)
- func (f *ShardedAtomicFilter) Cap() uint64
- func (f *ShardedAtomicFilter) Count() uint64
- func (f *ShardedAtomicFilter) EstimatedFalsePositiveRate() float64
- func (f *ShardedAtomicFilter) EstimatedFillRatio() float64
- func (f *ShardedAtomicFilter) K() uint32
- func (f *ShardedAtomicFilter) NumBlocks() uint64
- func (f *ShardedAtomicFilter) NumShards() uint64
- func (f *ShardedAtomicFilter) Test(data []byte) bool
- func (f *ShardedAtomicFilter) TestString(s string) bool
Examples ¶
Constants ¶
const ( // BlockBits is the number of bits per block (cache line size). BlockBits = 512 // BlockWords is the number of uint64s per block. BlockWords = BlockBits / 64 // 8 )
Variables ¶
var ( // ErrInvalidData is returned when the serialized data is invalid or corrupted. ErrInvalidData = errors.New("gloom: invalid serialized data") // ErrUnsupportedVersion is returned when the serialization version is not supported. ErrUnsupportedVersion = errors.New("gloom: unsupported serialization version") // ErrInvalidK is returned when k value in serialized data is not supported. ErrInvalidK = errors.New("gloom: invalid k value in serialized data") )
Functions ¶
func ComputeOffsets ¶
ComputeOffsets computes the cumulative bit offsets for each partition. offset[i] = sum of primes[0..i-1]
func EstimateFalsePositiveRate ¶
EstimateFalsePositiveRate estimates the false positive rate for given parameters.
For a cache-line blocked bloom filter, items are distributed across blocks following a Poisson distribution (balls-into-bins). Some blocks receive more items than average, increasing their local FP rate. This function computes the expected per-block FP rate over this Poisson distribution:
FP = E[(1 - e^(-k*J/s))^k] where J ~ Poisson(n/B), s = BlockBits
This gives a more accurate estimate than the standard formula (1 - e^(-kn/m))^k, which assumes uniform bit placement and underestimates the FP rate of blocked filters.
Example ¶
package main
import (
"fmt"
"github.com/jcalabro/gloom"
)
func main() {
// Estimate false positive rate for given parameters
numBlocks := uint64(1000)
k := uint32(7)
itemsAdded := uint64(50000)
rate := gloom.EstimateFalsePositiveRate(numBlocks, k, itemsAdded)
fmt.Printf("Estimated FP rate: %.2f%%\n", rate*100)
}
Output: Estimated FP rate: 0.86%
func GetPrimePartition ¶
GetPrimePartition returns the prime partition for the given k value. Returns nil if k is not supported.
func OptimalParams ¶
func OptimalParams(expectedItems uint64, fpRate float64) (numBlocks uint64, k uint32, bitsPerItem float64)
OptimalParams calculates the optimal bloom filter parameters. Returns the number of blocks, number of hash functions (k), and bits per item.
Example ¶
package main
import (
"fmt"
"github.com/jcalabro/gloom"
)
func main() {
// Calculate optimal parameters for your use case
blocks, k, bitsPerItem := gloom.OptimalParams(1_000_000, 0.01)
fmt.Printf("For 1M items at 1%% FP rate:\n")
fmt.Printf(" Blocks: %d\n", blocks)
fmt.Printf(" Hash functions (k): %d\n", k)
fmt.Printf(" Bits per item: %.1f\n", bitsPerItem)
}
Output: For 1M items at 1% FP rate: Blocks: 18721 Hash functions (k): 7 Bits per item: 9.6
Types ¶
type AtomicFilter ¶
type AtomicFilter struct {
// contains filtered or unexported fields
}
AtomicFilter is a thread-safe bloom filter using atomic operations. It uses the same cache-line blocked one-hashing technique as Filter but with atomic.Uint64 for concurrent access.
func NewAtomic ¶
func NewAtomic(expectedItems uint64, fpRate float64) *AtomicFilter
NewAtomic creates a new thread-safe bloom filter optimized for the expected number of items and desired false positive rate.
Example ¶
package main
import (
"fmt"
"sync"
"github.com/jcalabro/gloom"
)
func main() {
// Create a thread-safe filter for concurrent access.
f := gloom.NewAtomic(100_000, 0.01)
// Safe to call from multiple goroutines
var wg sync.WaitGroup
wg.Add(2)
go func() {
defer wg.Done()
f.AddString("from-goroutine-1")
}()
go func() {
defer wg.Done()
f.AddString("from-goroutine-2")
}()
wg.Wait()
fmt.Println("Count:", f.Count())
}
Output: Count: 2
func NewAtomicWithParams ¶
func NewAtomicWithParams(numBlocks uint64, k uint32) *AtomicFilter
NewAtomicWithParams creates a new thread-safe bloom filter with explicit parameters.
func (*AtomicFilter) Add ¶
func (f *AtomicFilter) Add(data []byte)
Add adds data to the bloom filter atomically.
func (*AtomicFilter) AddString ¶
func (f *AtomicFilter) AddString(s string)
AddString adds a string to the bloom filter atomically without allocating.
func (*AtomicFilter) Cap ¶
func (f *AtomicFilter) Cap() uint64
Cap returns the capacity of the filter in bits.
func (*AtomicFilter) Count ¶
func (f *AtomicFilter) Count() uint64
Count returns the approximate number of items added to the filter.
func (*AtomicFilter) EstimatedFalsePositiveRate ¶
func (f *AtomicFilter) EstimatedFalsePositiveRate() float64
EstimatedFalsePositiveRate estimates the current false positive rate.
func (*AtomicFilter) EstimatedFillRatio ¶
func (f *AtomicFilter) EstimatedFillRatio() float64
EstimatedFillRatio estimates the proportion of bits that are set.
func (*AtomicFilter) K ¶
func (f *AtomicFilter) K() uint32
K returns the number of hash functions (partitions) used.
func (*AtomicFilter) NumBlocks ¶
func (f *AtomicFilter) NumBlocks() uint64
NumBlocks returns the number of 512-bit blocks in the filter.
func (*AtomicFilter) Test ¶
func (f *AtomicFilter) Test(data []byte) bool
Test checks if data might be in the bloom filter. This operation is safe to call concurrently with Add.
func (*AtomicFilter) TestString ¶
func (f *AtomicFilter) TestString(s string) bool
TestString checks if a string might be in the bloom filter.
type Filter ¶
type Filter struct {
// contains filtered or unexported fields
}
Filter is a non-thread-safe bloom filter using cache-line blocked one-hashing for optimal performance.
The filter divides memory into 512-bit (64-byte) blocks that fit in a single CPU cache line. Each block is partitioned into k segments using distinct prime sizes, enabling the one-hashing technique where a single hash value generates k independent bit positions via modulo operations.
func New ¶
New creates a new bloom filter optimized for the expected number of items and desired false positive rate.
Example ¶
package main
import (
"fmt"
"github.com/jcalabro/gloom"
)
func main() {
// Create a filter sized for 1 million items with 1% false positive rate.
// The filter automatically calculates optimal size and hash functions.
f := gloom.New(1_000_000, 0.01)
f.Add([]byte("hello"))
fmt.Println(f.Test([]byte("hello")))
}
Output: true
func NewWithParams ¶
NewWithParams creates a new bloom filter with explicit parameters. numBlocks is the number of 512-bit blocks, k is the number of hash functions.
func UnmarshalBinary ¶ added in v0.0.2
UnmarshalBinary deserializes a bloom filter from a byte slice. Returns an error if the data is invalid or corrupted.
func (*Filter) EstimatedFalsePositiveRate ¶
EstimatedFalsePositiveRate estimates the current false positive rate based on the number of items added.
func (*Filter) EstimatedFillRatio ¶
EstimatedFillRatio estimates the proportion of bits that are set.
func (*Filter) MarshalBinary ¶ added in v0.0.2
MarshalBinary serializes the bloom filter to a byte slice. The serialized format is:
- Version (1 byte): serialization format version
- K (4 bytes): number of hash functions (little-endian uint32)
- NumBlocks (8 bytes): number of 512-bit blocks (little-endian uint64)
- Count (8 bytes): number of items added (little-endian uint64)
- Blocks (numBlocks * 64 bytes): the bit array data (little-endian uint64s)
The primes and offsets are not serialized as they can be derived from k.
func (*Filter) Test ¶
Test checks if data might be in the bloom filter. Returns true if the data might be present (with false positive probability), or false if the data is definitely not present.
func (*Filter) TestString ¶
TestString checks if a string might be in the bloom filter without allocating.
type ShardedAtomicFilter ¶
type ShardedAtomicFilter struct {
// contains filtered or unexported fields
}
ShardedAtomicFilter is a thread-safe bloom filter that distributes writes across multiple shards to reduce contention under parallel workloads. Each shard is an independent AtomicFilter, and keys are consistently routed to shards based on their hash.
func NewShardedAtomic ¶
func NewShardedAtomic(expectedItems uint64, fpRate float64, numShards uint64) *ShardedAtomicFilter
NewShardedAtomic creates a new sharded thread-safe bloom filter. numShards must be a power of 2 (will be rounded up if not). The total capacity is distributed evenly across shards.
Example ¶
package main
import (
"fmt"
"github.com/jcalabro/gloom"
)
func main() {
// Create a sharded filter with explicit shard count.
// Use powers of 2 for optimal performance.
f := gloom.NewShardedAtomic(100_000, 0.01, 16)
f.AddString("key1")
fmt.Println("Shards:", f.NumShards())
}
Output: Shards: 16
func NewShardedAtomicDefault ¶
func NewShardedAtomicDefault(expectedItems uint64, fpRate float64) *ShardedAtomicFilter
NewShardedAtomicDefault creates a sharded filter with a number of shards automatically tuned to the current GOMAXPROCS value. This provides good parallel performance without over-sharding on smaller machines.
Example ¶
package main
import (
"fmt"
"github.com/jcalabro/gloom"
)
func main() {
// Create a sharded filter with auto-tuned shard count.
// Shard count is set to GOMAXPROCS (minimum 4).
f := gloom.NewShardedAtomicDefault(100_000, 0.01)
f.AddString("auto-tuned")
fmt.Println("Has key:", f.TestString("auto-tuned"))
}
Output: Has key: true
func (*ShardedAtomicFilter) Add ¶
func (f *ShardedAtomicFilter) Add(data []byte)
Add adds data to the bloom filter.
func (*ShardedAtomicFilter) AddString ¶
func (f *ShardedAtomicFilter) AddString(s string)
AddString adds a string to the bloom filter without allocating.
func (*ShardedAtomicFilter) Cap ¶
func (f *ShardedAtomicFilter) Cap() uint64
Cap returns the total capacity of all shards in bits.
func (*ShardedAtomicFilter) Count ¶
func (f *ShardedAtomicFilter) Count() uint64
Count returns the approximate total number of items added.
func (*ShardedAtomicFilter) EstimatedFalsePositiveRate ¶
func (f *ShardedAtomicFilter) EstimatedFalsePositiveRate() float64
EstimatedFalsePositiveRate estimates the current false positive rate. For sharded filters, this is approximately the average across shards.
func (*ShardedAtomicFilter) EstimatedFillRatio ¶
func (f *ShardedAtomicFilter) EstimatedFillRatio() float64
EstimatedFillRatio estimates the average fill ratio across all shards.
func (*ShardedAtomicFilter) K ¶
func (f *ShardedAtomicFilter) K() uint32
K returns the number of hash functions used per shard.
func (*ShardedAtomicFilter) NumBlocks ¶
func (f *ShardedAtomicFilter) NumBlocks() uint64
NumBlocks returns the total number of blocks across all shards.
func (*ShardedAtomicFilter) NumShards ¶
func (f *ShardedAtomicFilter) NumShards() uint64
NumShards returns the number of shards.
func (*ShardedAtomicFilter) Test ¶
func (f *ShardedAtomicFilter) Test(data []byte) bool
Test checks if data might be in the bloom filter.
func (*ShardedAtomicFilter) TestString ¶
func (f *ShardedAtomicFilter) TestString(s string) bool
TestString checks if a string might be in the bloom filter.