tdigest

package module
v1.1.3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 25, 2017 License: MIT Imports: 7 Imported by: 0

README

T-Digest

A fast map-reduce and parallel streaming friendly data-structure for accurate quantile approximation.

This package provides an implementation of Ted Dunning's t-digest data structure in Go.

Build Status GoDoc Go Report Card

Project Status

This project is activelly maintained. There hasn't been a need for new features or bugs in a while, so there isn't much activity but I'm more than willing to cooperate on bugs and/or requests.

Installation

The current stable version can be installed via:

go get gopkg.in/caio/go-tdigest.v1

Or if you're planning on contributing or just feeling adventurous:

go get github.com/caio/go-tdigest

Usage

package main

import (
        "fmt"
        "math/rand"

        "gopkg.in/caio/go-tdigest.v1"
        // Or if you installed the development version:
        // "github.com/caio/go-tdigest"
)

func main() {
        var t = tdigest.New(100)

        for i := 0; i < 10000; i++ {
                t.Add(rand.Float64(), 1)
        }

        fmt.Printf("p(.5) = %.6f\n", t.Quantile(0.5))
}

References

This is a port of the reference implementation with some ideas borrowed from the python version. If you wanna get a quick grasp of how it works and why it's useful, this video and companion article is pretty helpful.

Documentation

Overview

Package tdigest provides a highly accurate mergeable data-structure for quantile estimation.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type TDigest

type TDigest struct {
	// contains filtered or unexported fields
}

TDigest is a quantile approximation data structure. Typical T-Digest use cases involve accumulating metrics on several distinct nodes of a cluster and then merging them together to get a system-wide quantile overview. Things such as: sensory data from IoT devices, quantiles over enormous document datasets (think ElasticSearch), performance metrics for distributed systems, etc.

func FromBytes

func FromBytes(buf *bytes.Reader) (*TDigest, error)

FromBytes reads a byte buffer with a serialized digest (from AsBytes) and deserializes it.

func New

func New(compression float64) *TDigest

New creates a new digest. The compression parameter rules the threshold in which samples are merged together - the more often distinct samples are merged the more precision is lost. Compression should be tuned according to your data distribution, but a value of 100 is often good enough. A higher compression value means holding more centroids in memory (thus: better precision), which means a bigger serialization payload and higher memory footprint. Compression must be a value greater of equal to 1, will panic otherwise.

func (*TDigest) Add

func (t *TDigest) Add(value float64, count uint32) (err error)

Add registers a new sample in the digest. It's the main entry point for the digest and very likely the only method to be used for collecting samples. The count parameter is for when you are registering a sample that occurred multiple times - the most common value for this is 1.

func (TDigest) AsBytes

func (t TDigest) AsBytes() ([]byte, error)

AsBytes serializes the digest into a byte array so it can be saved to disk or sent over the wire.

func (*TDigest) Compress

func (t *TDigest) Compress() error

Compress tries to reduce the number of individual centroids stored in the digest. Compression trades off accuracy for performance and happens automatically after a certain amount of distinct samples have been stored.

func (*TDigest) ForEachCentroid added in v1.1.0

func (t *TDigest) ForEachCentroid(f func(mean float64, count uint32) bool)

ForEachCentroid calls the specified function for each centroid. Iteration stops when the supplied function returns false, or when all centroids have been iterated.

func (*TDigest) Len added in v1.1.0

func (t *TDigest) Len() int

Len returns the number of centroids in the TDigest.

func (*TDigest) Merge

func (t *TDigest) Merge(other *TDigest) error

Merge joins a given digest into itself. Merging is useful when you have multiple TDigest instances running in separate threads and you want to compute quantiles over all the samples. This is particularly important on a scatter-gather/map-reduce scenario.

func (*TDigest) Quantile

func (t *TDigest) Quantile(q float64) float64

Quantile returns the desired percentile estimation. Values of p must be between 0 and 1 (inclusive), will panic otherwise.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL