bloom

package module

v0.2.4 Latest Latest Go to latest Published: Jun 29, 2021 License: BSD-3-Clause Imports: 11 Imported by: 7

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/DCSO/bloom

Links

Open Source Insights

README ¶

Bloom

A highly efficient bloom filter implementation for Go

Bloom is a simple tool that provides a very efficient implementation of Bloom filters for the go language. It provides a command line tool that can be used to easily create Bloom filters with desired capacity and false positive probability. Values can be added to filters through standard input, which makes it easy to use the tool in a pipeline workflow.

Usage

NAME:
   Bloom Filter - Utility to work with bloom filters

USAGE:
   bloom [global options] command [command options] [arguments...]

VERSION:
   0.2.2

COMMANDS:
     create, cr         Create a new Bloom filter and store it in the given filename.
     insert, i          Inserts new values into an existing Bloom filter.
     join, j, merge, m  Joins two Bloom filters into one.
     check, c           Checks values against an existing Bloom filter.
     set-data, sd       Sets the data associated with the Bloom filter.
     get-data, gd       Prints the data associated with the Bloom filter.
     show, s            Shows various details about a given Bloom filter.
     help, h            Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --gzip, --gz                      compress bloom file with gzip
   --interactive, -i                 interactively add values to the filter
   --split, -s                       split the input string
   --each, -e                        print each match of a split string individually
   --delimiter value, -d value       delimiter to use for splitting (default: ",")
   --fields value, -f value          fields of split output to use in filter (a single number or a comma-separated list of numbers, zero-indexed)
   --print-fields value, --pf value  fields of split output to print for a successful match (a single number or a comma-separated list of numbers, zero-indexed).
   --help, -h                        show help
   --version, -v                     print the version

Examples

To create a new bloom filter with a desired capacity and false positive probability, you can use the create command:

#will create a gzipped Bloom filter with 100.000 capacity and a 0.1 % false positive probability
bloom --gzip create -p 0.001 -n 100000 test.bloom.gz

To insert values, you can use the insert command and pipe some input to it (each line will be treated as one value):

cat values | bloom --gzip insert test.bloom.gz

You can also interactively add values to the filter by specifying the --interactive command line option:

bloom --gzip --interactive insert test.bloom.gz

To check if a given value or a list of values is in the filter, you can use the check command:

cat values | bloom --gzip check test.bloom.gz

This will return a list of all values in the filter.

Advanced Usage

Sometimes it is useful to attach additional information to a string that we want to check against the Bloom filter, such as a timestamp or the original line content. To make passing along this additional information easier within a shell context, the Bloom tool provides an option for splitting the input string by a given delimiter and checking the filter against the resulting field values. Example:

# will check the Bloom filter for the values foo, bar and baz
cat "foo,bar,baz" | bloom -s filter.bloom

# uses a different delimiter (--magic-delimiter--)
cat "foo--ac5ba--bar--ac5ba--baz" | bloom  -d "--ac5ba--" -s filter.bloom

# will check the Bloom filter against the second field value only
cat "foo,bar,baz" | bloom -f 1 -s filter.bloom

# will check the Bloom filter against the second and third field values only
cat "foo,bar,baz" | bloom -f 1,2 -s filter.bloom

# will print one line for each field value that matched against the filter
cat "foo,bar,baz" | bloom -e -s filter.bloom

# will print the last field value for each line whose fields matched against the filter
cat "foo,bar,baz" | bloom -e -s --pf -1 filter.bloom

This functionality is especially handy when using CSV data, as it allows you to filter CSV rows by checking individual columns against the filter without having to use external tools to split and reassemble the lines.

Installation

Installation on Debian-based systems

Debian command line tool:

sudo apt install golang-github-dcso-bloom-cli

Installation via `go get`:

go get github.com/DCSO/bloom/...

Installation from source

These need to be run from within the GOPATH source directory for this project (e.g. $GOPATH/src/github.com/DCSO/bloom). To install the command line tool:

make install

To run the tests:

make test

To run the benchmarks:

make bench

Cross-Compiling

To compile a binary, simply specify the target architecture and go:

#Windows, 64 bit
env GOOS=windows GOARCH=amd64 go build -v -o bloom.exe github.com/DCSO/bloom
#Windows, 32 bit
env GOOS=windows GOARCH=i386 go build -v -o /tmp/bloom github.com/DCSO/bloom

Documentation ¶

Index ¶

func WriteFilter(filter *BloomFilter, path string, gzip bool) error
type BloomFilter

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func WriteFilter ¶

func WriteFilter(filter *BloomFilter, path string, gzip bool) error

WriteFilter writes a binary Bloom filter representation for a given struct to a file. If 'gzip' is true, then a compressed file will be written.

Types ¶

type BloomFilter ¶

type BloomFilter struct {

	//number of elements in the filter
	N uint64

	//number of 64-bit integers (generated automatically)
	M uint64

	//arbitrary data that we can attach to the filter
	Data []byte
	// contains filtered or unexported fields
}

BloomFilter represents a Bloom filter, a data structure for quickly checking for set membership, with a specific desired capacity and false positive probability.

func Initialize ¶

func Initialize(n uint64, p float64) BloomFilter

Initialize returns a new, empty Bloom filter with the given capacity (n) and FP probability (p).

func LoadFilter ¶

func LoadFilter(path string, gzip bool) (*BloomFilter, error)

LoadFilter reads a binary Bloom filter representation from a file and returns a BloomFilter struct pointer based on it. If 'gzip' is true, then compressed input will be expected.

func LoadFromBytes ¶ added in v0.2.0

func LoadFromBytes(input []byte, gzip bool) (*BloomFilter, error)

LoadFromBytes reads a binary Bloom filter representation from a byte array and returns a BloomFilter struct pointer based on it. If 'gzip' is true, then compressed input will be expected.

func LoadFromReader ¶ added in v0.2.0

func LoadFromReader(inReader io.Reader, gzip bool) (*BloomFilter, error)

LoadFromReader reads a binary Bloom filter representation from an io.Reader and returns a BloomFilter struct pointer based on it. If 'gzip' is true, then compressed input will be expected.

func (*BloomFilter) Add ¶

func (s *BloomFilter) Add(value []byte)

Add adds a byte array element to the Bloom filter.

func (*BloomFilter) Check ¶

func (s *BloomFilter) Check(value []byte) bool

Check returns true if the given value may be in the Bloom filter, false if it is definitely not in it.

func (*BloomFilter) CheckFingerprint ¶

func (s *BloomFilter) CheckFingerprint(fingerprint []uint64) bool

CheckFingerprint returns true if the given fingerprint occurs in the Bloom filter, false if it does not.

func (*BloomFilter) FalsePositiveProb ¶ added in v0.2.0

func (s *BloomFilter) FalsePositiveProb() float64

FalsePositiveProb returns the chosen false positive probability for the Bloom filter.

func (*BloomFilter) Fingerprint ¶

func (s *BloomFilter) Fingerprint(value []byte, fingerprint []uint64)

Fingerprint returns the fingerprint of a given value, as an array of index values.

func (*BloomFilter) Join ¶ added in v0.2.0

func (s *BloomFilter) Join(s2 *BloomFilter) error

Join adds the items of another Bloom filter with identical dimensions to the receiver. That is, all elements that are described in the second filter will also described by the receiver, and the number of elements of the receiver will grow by the number of elements in the added filter. Note that it is implicitly assumed that both filters are disjoint! Otherwise the number of elements in the joined filter must _only_ be considered an upper bound and not an exact value! Joining two differently dimensioned filters may yield unexpected results and hence is not allowed. An error will be returned in this case, and the receiver will be left unaltered.

func (*BloomFilter) MaxNumElements ¶ added in v0.2.0

func (s *BloomFilter) MaxNumElements() uint64

MaxNumElements returns the maximal supported number of elements in the Bloom filter (capacity).

func (*BloomFilter) NumBits ¶ added in v0.2.0

func (s *BloomFilter) NumBits() uint64

NumBits returns the number of bits used in the Bloom filter.

func (*BloomFilter) NumHashFuncs ¶ added in v0.2.0

func (s *BloomFilter) NumHashFuncs() uint64

NumHashFuncs returns the number of hash functions used in the Bloom filter.

func (*BloomFilter) Read ¶

func (s *BloomFilter) Read(input io.Reader) error

Read loads a filter from a reader object.

func (*BloomFilter) Reset ¶

func (s *BloomFilter) Reset()

Reset clears the Bloom filter of all elements.

func (*BloomFilter) Write ¶

func (s *BloomFilter) Write(output io.Writer) error

Write writes the binary representation of a Bloom filter to an io.Writer.

Source Files ¶

View all Source files

Directories ¶

Path	Synopsis
bloom

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL