hllpp

package module
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 17, 2017 License: BSD-3-Clause Imports: 7 Imported by: 22

README

hllpp

Build Status GoDoc

hllpp is an implementation of the HyperLogLog++ cardinality estimation algorithm in go. It optimizes for memory usage over CPU usage. It implements all the HyperLogLog optimizations introduced in the HyperLogLog++ paper (http://goo.gl/Z5Sqgu). Some notable features include:

  • marshaling so you can serialize to your datastore
  • extra space savings by only using 5 bits per register when possible
  • built-in non-streaming murmur3 implementation for fast hashing of input data

Usage

h := hllpp.New()

h.Add([]byte("barclay"))
h.Add([]byte("reginald"))
h.Add([]byte("barclay"))
h.Add([]byte("broccoli"))

fmt.Println(h.Count())
// Output: 3

See the godocs for documentation and more examples.

SEO

This package is a go or golang implementation of HyperLogLog or HyperLogLog++. It doesn't show up when you search for golang hyperloglog, so I am repeating the words golang hyperloglog in the README. golang hyperloglog.

Documentation

Overview

hllpp implements the HyperLogLog++ cardinality estimator as specified in the HyperLogLog++ paper http://goo.gl/Z5Sqgu. hllpp uses a built-in non-streaming implementation of murmur3 to hash data as you add it to the estimator.

Example
h := New()

h.Add([]byte("barclay"))
h.Add([]byte("reginald"))
h.Add([]byte("barclay"))
h.Add([]byte("broccoli"))

fmt.Println(h.Count())
Output:

3

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	// Precision (p). Must be in the range [4..16]. This value can be used
	// to adjust the typical relative error of the estimate. Space requirements
	// grow exponentially as this value is increased. Defaults to 14, the
	// recommended value, which gives an expected error of about 0.8%
	Precision uint8

	// Precision in sparse mode (p'). Must be in the range [p..25] for this
	// implementation. This value can be used to adjust the typical relative
	// error of the estimate when using the sparse representation (typically
	// for cardinalities below 8000 at p'=20). Lowering p' will allow the
	// estimator to remain in sparse mode longer, but will increase the relative
	// error. The HyperLogLog++ paper recommends 20 or 25. Defaults to 20 since
	// that still gives you a much lower error vs. p=14, but saves a signficant
	// amount of space vs. p'=25 (20-25% for cardinalities less than 5000).
	SparsePrecision uint8
}

Config is used to set configurable fields on a HyperLogLog++ via NewWithConfig.

type HLLPP

type HLLPP struct {
	// contains filtered or unexported fields
}

HLLPP represents a single HyperLogLog++ estimator. Create one via New(). It is not safe to interact with an HLLPP object from multiple goroutines at once.

func New

func New() *HLLPP

New creates a HyperLogLog++ estimator with p=14, p'=20.

func NewWithConfig

func NewWithConfig(c Config) (*HLLPP, error)

NewWithConfig creates a HyperLogLog++ estimator with the given Config.

Example
h, err := NewWithConfig(Config{
	Precision:       12,
	SparsePrecision: 14,
})
if err != nil {
	panic(err)
}

h.Add([]byte("qapla'"))
h.Add([]byte("qapla'"))

fmt.Println(h.Count())
Output:

1

func Unmarshal

func Unmarshal(data []byte) (*HLLPP, error)

Unmarshal deserializes a byte slice returned by Marshal back into an HLLPP object.

func (*HLLPP) Add

func (h *HLLPP) Add(v []byte)

Add will hash v and add the result to the HyperLogLog++ estimator h. hllpp uses a built-in non-streaming implementation of murmur3.

func (*HLLPP) Count

func (h *HLLPP) Count() uint64

Count returns the current cardinality estimate for h.

func (*HLLPP) Marshal

func (h *HLLPP) Marshal() []byte

Marshal serializes h into a byte slice that can be deserialized via Unmarshal. The data is naturally compressed, so don't bother trying to compress it any more.

Example
h := New()

h.Add([]byte("hobbledehoyhood"))

serialized := h.Marshal()
h, err := Unmarshal(serialized)
if err != nil {
	panic(err)
}

fmt.Println(h.Count())
Output:

1

func (*HLLPP) Merge

func (h *HLLPP) Merge(other *HLLPP) error

Merge turns h into the union of h and other. h and other must have the same p and p' values.

Example
h := New()
h.Add([]byte("picard"))
h.Add([]byte("janeway"))

other := New()
other.Add([]byte("picard"))
other.Add([]byte("kirk"))

err := h.Merge(other)
if err != nil {
	panic(err)
}

fmt.Println(h.Count())
Output:

3

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL