README

cfilter: Cuckoo Filter implementation in Go


A cuckoo filter is a Bloom filter replacement for approximate set-membership queries. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom filters.

Use cases that depend on approximate set-membership queries include databases, caches, routers, and storage systems, where the filter is used to decide whether a given item is in a (usually large) set, with some small false positive probability. Because it is designed as a viable replacement for Bloom filters, a cuckoo filter can also be used to reduce the space required in probabilistic routing tables, speed up longest-prefix matching for IP addresses, improve network state management and monitoring, and encode multicast forwarding information in packets, among many other applications.

Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (hence the name). It is essentially a cuckoo hash table that stores each key's fingerprint rather than the key itself. Cuckoo hash tables can be highly compact, so a cuckoo filter can use less space than conventional Bloom filters for applications that require low false positive rates (< 3%).
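To make "a cuckoo hash table storing each key's fingerprint" more concrete, here is a minimal sketch of partial-key cuckoo hashing as described in the research paper: every item gets a short fingerprint and two candidate buckets, and the second bucket can be derived from the first using the fingerprint alone. The names numBuckets, hash64, fingerprint, altIndex and indices are hypothetical, and the 16-bit fingerprint and power-of-two table size are assumptions made for illustration; this is not necessarily how cfilter computes its indices.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// numBuckets must be a power of two so that the XOR step stays in range
// and remains reversible. (Assumed value, chosen for illustration.)
const numBuckets = 1 << 16

// hash64 returns the 64-bit FNV-1 hash of data.
func hash64(data []byte) uint64 {
	h := fnv.New64()
	h.Write(data)
	return h.Sum64()
}

// fingerprint derives a short, non-zero 16-bit tag from the item.
// Zero is reserved here to mean "empty slot".
func fingerprint(item []byte) uint16 {
	fp := uint16(hash64(item) >> 48)
	if fp == 0 {
		fp = 1
	}
	return fp
}

// altIndex maps a bucket index to its partner bucket using only the
// fingerprint, so a stored fingerprint can be relocated without the key.
func altIndex(i uint64, fp uint16) uint64 {
	b := []byte{byte(fp >> 8), byte(fp)}
	return (i ^ hash64(b)) & (numBuckets - 1)
}

// indices returns the fingerprint and both candidate buckets for item.
func indices(item []byte) (uint16, uint64, uint64) {
	fp := fingerprint(item)
	i1 := hash64(item) & (numBuckets - 1)
	return fp, i1, altIndex(i1, fp)
}

func main() {
	fp, i1, i2 := indices([]byte("buongiorno"))
	fmt.Printf("fingerprint=%#04x i1=%d i2=%d\n", fp, i1, i2)

	// Applying altIndex to the second bucket leads back to the first.
	fmt.Println(altIndex(i2, fp) == i1) // true
}
```

Because the partner bucket depends only on the current bucket and the fingerprint, an entry can be moved between its two buckets during insertion even though the original key is no longer available.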

For details about the algorithm and citations please refer to the original research paper, "Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky.

Interface

A cuckoo filter supports the following operations:

  • Insert(item): insert an item into the filter
  • Lookup(item): return whether the item is in the filter (may return false positives, like Bloom filters)
  • Delete(item): delete the given item from the filter. Before calling this method, make sure the item is actually in the filter (e.g., based on records in external storage); otherwise, a different item with the same fingerprint may be deleted.
  • Count(): return the total number of items currently in the filter
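The four operations above can be sketched with a toy filter built only on the standard library. This is a simplified illustration under stated assumptions, not cfilter's implementation: the toyFilter type, its helpers, the fixed table size, and the 16-bit fingerprints are all hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

const (
	numBuckets = 1 << 10 // power of two, so the XOR step below stays in range
	bucketSize = 4       // fingerprints per bucket
	maxKicks   = 500     // relocation attempts before giving up
)

// toyFilter is a hypothetical, simplified filter; slot value 0 means "empty".
type toyFilter struct {
	buckets [numBuckets][bucketSize]uint16
	count   uint
}

func hash64(b []byte) uint64 {
	h := fnv.New64()
	h.Write(b)
	return h.Sum64()
}

// alt returns the partner bucket of i for fingerprint fp.
func alt(i uint64, fp uint16) uint64 {
	return (i ^ hash64([]byte{byte(fp >> 8), byte(fp)})) & (numBuckets - 1)
}

// locate returns the item's non-zero fingerprint and its two candidate buckets.
func locate(item []byte) (fp uint16, i1, i2 uint64) {
	h := hash64(item)
	if fp = uint16(h >> 48); fp == 0 {
		fp = 1
	}
	i1 = h & (numBuckets - 1)
	return fp, i1, alt(i1, fp)
}

// replace overwrites the first slot in bucket i that holds old with repl.
func (f *toyFilter) replace(i uint64, old, repl uint16) bool {
	for s := range f.buckets[i] {
		if f.buckets[i][s] == old {
			f.buckets[i][s] = repl
			return true
		}
	}
	return false
}

// Insert stores the item's fingerprint, relocating residents if needed.
func (f *toyFilter) Insert(item []byte) bool {
	fp, i1, i2 := locate(item)
	if f.replace(i1, 0, fp) || f.replace(i2, 0, fp) {
		f.count++
		return true
	}
	i := i1 // both buckets full: evict a random resident and relocate it
	for k := 0; k < maxKicks; k++ {
		s := rand.Intn(bucketSize)
		fp, f.buckets[i][s] = f.buckets[i][s], fp
		i = alt(i, fp)
		if f.replace(i, 0, fp) {
			f.count++
			return true
		}
	}
	return false // table considered full; the last evicted fingerprint is dropped
}

// Lookup reports whether a matching fingerprint sits in either candidate bucket.
func (f *toyFilter) Lookup(item []byte) bool {
	fp, i1, i2 := locate(item)
	// Replacing fp with itself changes nothing, so this is a pure check.
	return f.replace(i1, fp, fp) || f.replace(i2, fp, fp)
}

// Delete clears one matching fingerprint, whichever item originally stored it.
func (f *toyFilter) Delete(item []byte) bool {
	fp, i1, i2 := locate(item)
	if f.replace(i1, fp, 0) || f.replace(i2, fp, 0) {
		f.count--
		return true
	}
	return false
}

// Count returns the number of fingerprints currently stored.
func (f *toyFilter) Count() uint { return f.count }

func main() {
	f := &toyFilter{}
	f.Insert([]byte("buongiorno"))
	fmt.Println(f.Lookup([]byte("buongiorno")), f.Count()) // true 1
	fmt.Println(f.Delete([]byte("buongiorno")), f.Count()) // true 0
	fmt.Println(f.Lookup([]byte("buongiorno")))            // false
}
```

The sketch also shows why Delete needs care: it only compares fingerprints, so deleting an item that was never inserted can remove the fingerprint of a different item that maps to the same bucket with the same fingerprint.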

Example Usage

package main

import "github.com/irfansharif/cfilter"

func main() {
	cf := cfilter.New()

	// insert 'buongiorno' into the filter
	cf.Insert([]byte("buongiorno"))

	// look up 'hola' in the filter; may return a false positive
	cf.Lookup([]byte("hola"))

	// returns 1 (given only 'buongiorno' was added)
	cf.Count()

	// try deleting 'bonjour' from the filter; because 'bonjour' was never
	// inserted, this may instead delete another element that happens to
	// share its fingerprint and bucket
	cf.Delete([]byte("bonjour"))
}

This repository was featured on the front page of Hacker News. Another Go implementation can be found at seiflotfy/cuckoofilter, from which I borrowed ideas for my tests, notably TestMultipleInsertions. The original C++ implementation by the authors of the research paper can be found at efficient/cuckoofilter.

Author

Irfan Sharif: irfanmahmoudsharif@gmail.com, @irfansharifm

License

cfilter source code is available under the MIT License.

Documentation

Overview

Package cfilter is an implementation of the Cuckoo filter, a Bloom filter replacement for approximate set-membership queries. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters.

As documented in the original implementation:

Cuckoo filters provide the flexibility to add and remove items dynamically. A
cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo
filter). It is essentially a cuckoo hash table storing each key's fingerprint.
Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less
space than conventional Bloom filters, for applications that require low false
positive rates (< 3%).

For details about the algorithm and citations please refer to the original research paper, "Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf).

Functions

func BucketSize

func BucketSize(s uint8) option

BucketSize sets the size of each bucket in the filter. Defaults to 4.

func FingerprintSize

func FingerprintSize(s uint8) option

FingerprintSize sets the size of the fingerprint. Defaults to 2.

func HashFn

func HashFn(hashfn hash.Hash) option

HashFn sets the hashing function to be used for fingerprinting. Defaults to a 64-bit FNV-1 hash.Hash.
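As a small illustration of what can be passed to HashFn: in Go's standard library, fnv.New64() returns a 64-bit FNV-1 hash and fnv.New64a() returns the FNV-1a variant, and both satisfy hash.Hash. The helpers defaultHash, alternativeHash and digest below are hypothetical names used only for this sketch. Whether cfilter resets the hash between items is not stated in this documentation, so the explicit Reset call is an assumption about safe reuse rather than a description of the package's internals.

```go
package main

import (
	"fmt"
	"hash"
	"hash/fnv"
)

// defaultHash returns a 64-bit FNV-1 hash, matching the documented default.
func defaultHash() hash.Hash { return fnv.New64() }

// alternativeHash returns the FNV-1a variant as an example replacement.
func alternativeHash() hash.Hash { return fnv.New64a() }

// digest hashes item with h and returns the resulting bytes.
// Reset is called first so that one hash.Hash can be reused across items.
func digest(h hash.Hash, item []byte) []byte {
	h.Reset()
	h.Write(item) // Write on a hash.Hash never returns an error
	return h.Sum(nil)
}

func main() {
	item := []byte("buongiorno")
	h1, h2 := defaultHash(), alternativeHash()
	fmt.Printf("FNV-1:  %x (%d bytes)\n", digest(h1, item), h1.Size())
	fmt.Printf("FNV-1a: %x (%d bytes)\n", digest(h2, item), h2.Size())
}
```

Given the signature above, a value such as fnv.New64a() could be supplied as cfilter.HashFn(fnv.New64a()) when constructing the filter.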

func MaximumKicks

func MaximumKicks(k uint) option

MaximumKicks sets the maximum number of times items are kicked (displaced) from their buckets during an insertion. Defaults to 500.

func Size

func Size(s uint) option

Size sets the number of buckets in the filter. Defaults to ((1 << 18) / BucketSize).
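As a rough worked example of the default sizing, the sketch below computes the default bucket count, the total number of fingerprint slots, and an approximate false positive bound. Two assumptions are made here and are not stated by this documentation: that FingerprintSize is measured in bytes, and that the approximate upper bound 2b / 2^f from the research paper (b entries per bucket, f fingerprint bits) applies to this implementation. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// defaultBuckets mirrors the documented default, (1 << 18) / BucketSize.
func defaultBuckets(bucketSize uint) uint { return (1 << 18) / bucketSize }

// slots is the total number of fingerprints the table can hold.
func slots(buckets, bucketSize uint) uint { return buckets * bucketSize }

// fpRateBound is the paper's approximate upper bound on the false positive
// rate, 2b / 2^f, for b entries per bucket and f fingerprint bits.
func fpRateBound(bucketSize, fingerprintBits uint) float64 {
	return 2 * float64(bucketSize) / math.Pow(2, float64(fingerprintBits))
}

func main() {
	// Documented defaults; treating the fingerprint size as bytes is assumed.
	const bucketSize, fingerprintBytes = 4, 2

	n := defaultBuckets(bucketSize)
	fmt.Println(n)                                                    // 65536
	fmt.Println(slots(n, bucketSize))                                 // 262144
	fmt.Println(slots(n, bucketSize) * fingerprintBytes)              // 524288
	fmt.Printf("%.6f\n", fpRateBound(bucketSize, 8*fingerprintBytes)) // 0.000122
}
```

Under those assumptions the default table has 65536 buckets and 262144 slots, and needs about 512 KiB of fingerprint storage.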

Types

type CFilter

type CFilter struct {
	// contains filtered or unexported fields
}

CFilter represents a Cuckoo Filter, a probabilistic data store for approximated set membership queries.

func New

func New(opts ...option) *CFilter

New returns a new CFilter object. Its Insert, Lookup, Delete and Count methods behave as their names suggest. New takes zero or more of the following option functions and applies them, in order, to the filter:

- cfilter.Size(uint) sets the number of buckets in the filter
- cfilter.BucketSize(uint8) sets the size of each bucket
- cfilter.FingerprintSize(uint8) sets the size of the fingerprint
- cfilter.MaximumKicks(uint) sets the maximum number of bucket kicks
- cfilter.HashFn(hash.Hash) sets the fingerprinting hashing function
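The option functions listed above follow what is commonly called the functional options pattern. Below is a minimal, self-contained sketch of that pattern; the config type, its fields, and newConfig are hypothetical stand-ins, since cfilter's own option type is unexported and its definition is not shown in this documentation. For simplicity the sketch fixes the default size at (1 << 18) / 4 instead of deriving it from the configured bucket size.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the filter's unexported settings.
type config struct {
	size       uint  // number of buckets
	bucketSize uint8 // size of each bucket
	fpSize     uint8 // size of the fingerprint
	maxKicks   uint  // maximum number of bucket kicks
}

// option mutates a config. The real cfilter option type may differ.
type option func(*config)

// Size sets the number of buckets.
func Size(s uint) option { return func(c *config) { c.size = s } }

// BucketSize sets the size of each bucket.
func BucketSize(s uint8) option { return func(c *config) { c.bucketSize = s } }

// FingerprintSize sets the size of the fingerprint.
func FingerprintSize(s uint8) option { return func(c *config) { c.fpSize = s } }

// MaximumKicks sets the maximum number of bucket kicks.
func MaximumKicks(k uint) option { return func(c *config) { c.maxKicks = k } }

// newConfig starts from the documented defaults and applies opts in order,
// so a later option overrides an earlier one that sets the same field.
func newConfig(opts ...option) *config {
	c := &config{size: (1 << 18) / 4, bucketSize: 4, fpSize: 2, maxKicks: 500}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	c := newConfig(Size(1<<10), MaximumKicks(100), Size(1<<12))
	fmt.Printf("%+v\n", *c) // {size:4096 bucketSize:4 fpSize:2 maxKicks:100}
}
```

With the package itself, the equivalent call based on the signatures documented here would look like cfilter.New(cfilter.Size(1<<12), cfilter.MaximumKicks(100)).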

func (*CFilter) Count

func (cf *CFilter) Count() uint

Count returns the total number of elements currently in the Cuckoo Filter.

func (*CFilter) Delete

func (cf *CFilter) Delete(item []byte) bool

Delete removes an element (given as a byte slice) from the Cuckoo Filter. It returns true if a matching element was present and removed, and false otherwise.

func (*CFilter) Insert

func (cf *CFilter) Insert(item []byte) bool

Insert adds an element (given as a byte slice) to the Cuckoo Filter. It returns true if successful and false otherwise.

func (*CFilter) Lookup

func (cf *CFilter) Lookup(item []byte) bool

Lookup checks whether an element (given as a byte slice) exists in the Cuckoo Filter. It returns true if found and false otherwise; as with any lookup on this filter, a true result may be a false positive.