hyperloglog

package module
v0.0.0-...-84253e5 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 19, 2024 License: MIT Imports: 7 Imported by: 120

README

HyperLogLog - an algorithm for approximating the number of distinct elements

GoDoc Go Report Card CircleCI

An improved version of HyperLogLog for the count-distinct problem, approximating the number of distinct elements in a multiset using 33-50% less space than other usual HyperLogLog implementations.

This work is based on "Better with fewer bits: Improving the performance of cardinality estimation of large data streams - Qingjun Xiao, You Zhou, Shigang Chen".

Implementation

The core differences between this and other implementations are:

  • use metro hash instead of xxhash
  • sparse representation for lower cardinalities (like HyperLogLog++)
  • loglog-beta for dynamic bias correction medium and high cardinalities.
  • 4-bit register instead of 5 (HLL) and 6 (HLL++), but most implementations use 1-byte registers out of convenience

In general it borrows a lot from InfluxData's fork of Clark Duvall's HyperLogLog++ implementation, but uses 50% less space.

Results

A direct comparison with the HyperLogLog++ implementation used by InfluxDB yielded the following results:

Exact Axiom (8.2 KB) Influx (16.39 KB)
10 10 (0.0% off) 10 (0.0% off)
50 50 (0.0% off) 50 (0.0% off)
250 250 (0.0% off) 250 (0.0% off)
1250 1249 (0.08% off) 1249 (0.08% off)
6250 6250 (0.0% off) 6250 (0.0% off)
31250 31008 (0.7744% off) 31565 (1.0080% off)
156250 156013 (0.1517% off) 156652 (0.2573% off)
781250 782364 (0.1426% off) 775988 (0.6735% off)
3906250 3869332 (0.9451% off) 3889909 (0.4183% off)
10000000 9952682 (0.4732% off) 9889556 (1.1044% off)

Note

A big thank you to Prof. Shigang Chen and his team at the University of Florida who are actively conducting research around "Big Network Data".

Contributing

Kindly check our contributing guide on how to propose bugfixes and improvements, and submitting pull requests to the project

License

© Axiom, Inc., 2024

Distributed under MIT License (The MIT License).

See LICENSE for more information.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var ErrorTooShort = errors.New("too short binary")

ErrorTooShort is an error that UnmarshalBinary try to parse too short binary.

Functions

This section is empty.

Types

type Sketch

type Sketch struct {
	// contains filtered or unexported fields
}

Sketch is a HyperLogLog data-structure for the count-distinct problem, approximating the number of distinct elements in a multiset.

func New

func New() *Sketch

New returns a HyperLogLog Sketch with 2^14 registers (precision 14)

func New14

func New14() *Sketch

New14 returns a HyperLogLog Sketch with 2^14 registers (precision 14)

func New16

func New16() *Sketch

New16 returns a HyperLogLog Sketch with 2^16 registers (precision 16)

func New16NoSparse

func New16NoSparse() *Sketch

New16NoSparse returns a HyperLogLog Sketch with 2^16 registers (precision 16) that will not use a sparse representation

func NewNoSparse

func NewNoSparse() *Sketch

NewNoSparse returns a HyperLogLog Sketch with 2^14 registers (precision 14) that will not use a sparse representation

func (*Sketch) Clone

func (sk *Sketch) Clone() *Sketch

Clone returns a deep copy of sk.

func (*Sketch) Estimate

func (sk *Sketch) Estimate() uint64

Estimate returns the cardinality of the Sketch

func (*Sketch) Insert

func (sk *Sketch) Insert(e []byte) bool

Insert adds element e to sketch

func (*Sketch) InsertHash

func (sk *Sketch) InsertHash(x uint64) bool

InsertHash adds hash x to sketch

func (*Sketch) MarshalBinary

func (sk *Sketch) MarshalBinary() (data []byte, err error)

MarshalBinary implements the encoding.BinaryMarshaler interface.

func (*Sketch) Merge

func (sk *Sketch) Merge(other *Sketch) error

Merge takes another Sketch and combines it with Sketch h. If Sketch h is using the sparse Sketch, it will be converted to the normal Sketch.

func (*Sketch) UnmarshalBinary

func (sk *Sketch) UnmarshalBinary(data []byte) error

UnmarshalBinary implements the encoding.BinaryUnmarshaler interface.

Directories

Path Synopsis
demo module

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL