gohistogram

Version: v1.0.0 (Latest)
Warning

This package is not in the latest version of its module.

Published: Aug 22, 2016 License: MIT Imports: 1 Imported by: 77

README

gohistogram - Histograms in Go

This package provides Streaming Approximate Histograms for efficient quantile approximations.

The histograms in this package are based on the algorithms found in Ben-Haim & Yom-Tov's A Streaming Parallel Decision Tree Algorithm (PDF). Histogram bins do not have a preset size. As values stream into the histogram, bins are dynamically added and merged.
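The add-and-merge behaviour described above can be sketched in a few lines. This is a minimal illustration of the general idea from the paper, not the package's actual code: the names `bin`, `sketchHistogram`, `add`, and `mergeClosest` are hypothetical, and the sketch assumes `maxBins` is at least 1.

```go
package main

import "sort"

// bin is a hypothetical (value, count) pair; the name is illustrative and
// not taken from the package.
type bin struct {
	value float64
	count float64
}

// sketchHistogram is an illustrative streaming histogram that keeps at most
// maxBins bins, sorted by value. maxBins is assumed to be >= 1.
type sketchHistogram struct {
	bins    []bin
	maxBins int
}

// add inserts v as its own bin (or bumps an existing bin with the same
// value), then merges the closest pair of bins until the limit is respected.
func (h *sketchHistogram) add(v float64) {
	// Find the first bin whose value is >= v.
	i := sort.Search(len(h.bins), func(j int) bool { return h.bins[j].value >= v })
	if i < len(h.bins) && h.bins[i].value == v {
		h.bins[i].count++
		return
	}
	// Open a slot at position i and place the new single-count bin there.
	h.bins = append(h.bins, bin{})
	copy(h.bins[i+1:], h.bins[i:])
	h.bins[i] = bin{value: v, count: 1}
	for len(h.bins) > h.maxBins {
		h.mergeClosest()
	}
}

// mergeClosest replaces the two adjacent bins with the smallest gap by a
// single bin located at their count-weighted mean.
func (h *sketchHistogram) mergeClosest() {
	best := 0
	for j := 1; j < len(h.bins)-1; j++ {
		if h.bins[j+1].value-h.bins[j].value < h.bins[best+1].value-h.bins[best].value {
			best = j
		}
	}
	a, b := h.bins[best], h.bins[best+1]
	total := a.count + b.count
	h.bins[best] = bin{value: (a.value*a.count + b.value*b.count) / total, count: total}
	h.bins = append(h.bins[:best+1], h.bins[best+2:]...)
}
```

Because no bin boundaries are fixed in advance, the bins end up concentrated wherever the data is dense.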

Another implementation can be found in the Apache Hive project (see NumericHistogram).

An example:

[Image: histogram]

Calculating exact quantiles (such as percentiles) requires the data to be sorted. Streaming histograms make it possible to approximate quantiles without sorting (or even individually storing) values.
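For contrast, here is a sketch of the exact approach that a streaming histogram avoids. `exactQuantile` is a hypothetical helper, not part of this package; it assumes a non-empty input, takes `q` as a fraction between 0 and 1, and uses the nearest-rank definition of a quantile.

```go
package main

import (
	"math"
	"sort"
)

// exactQuantile returns the nearest-rank quantile of values: the smallest
// sample such that at least a fraction q of the samples are <= it.
// It copies and sorts the whole input, so it needs O(n) memory and
// O(n log n) time. values is assumed to be non-empty.
func exactQuantile(values []float64, q float64) float64 {
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(q * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}
```

Every value has to be kept until the quantile is requested, which is the cost a streaming histogram trades for an approximation.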

NumericHistogram is the more basic implementation of a streaming histogram. WeightedHistogram implements bin values as exponentially-weighted moving averages.

A maximum bin count is passed as an argument to the constructor functions. A larger bin count yields more accurate approximations at the cost of increased memory use and slower performance.

A picture of kittens:

[Image: stack of kittens]

Getting started

Using in your own code

$ go get github.com/VividCortex/gohistogram
import "github.com/VividCortex/gohistogram"

Running tests and making modifications

Get the code into your workspace:

$ cd $GOPATH
$ git clone git@github.com:VividCortex/gohistogram.git ./src/github.com/VividCortex/gohistogram

You can run the tests now:

$ cd src/github.com/VividCortex/gohistogram
$ go test .

API Documentation

Full source documentation can be found here.

Contributing

We only accept pull requests for minor fixes or improvements. This includes:

  • Small bug fixes
  • Typos
  • Documentation or comments

Please open issues to discuss new features. Pull requests for new features will be rejected, so we recommend forking the repository and making changes in your fork for your use case.

License

Copyright (c) 2013 VividCortex

Released under MIT License. Check LICENSE file for details.

Documentation

Overview

Package gohistogram contains implementations of numeric and exponentially weighted streaming histograms.

Types

type Histogram

type Histogram interface {
	// Add adds a new value, n, to the histogram. Trimming is done
	// automatically.
	Add(n float64)

	// Quantile returns an approximation of the quantile at fraction n,
	// for example n = 0.5 for the median.
	Quantile(n float64) (q float64)

	// String returns a string representation of the histogram,
	// which is useful for printing to a terminal.
	String() (str string)
}

Histogram is the interface that wraps the Add, Quantile, and String methods.
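Code written against this interface works with either histogram type. The sketch below repeats the interface locally so that it compiles on its own. `observe` and `exactStandIn` are hypothetical names, and the stand-in stores every value exactly, purely for illustration. In real code you would pass the result of `gohistogram.NewHistogram` or `gohistogram.NewWeightedHistogram` instead.

```go
package main

import (
	"fmt"
	"sort"
)

// Histogram mirrors the interface documented above so that this sketch
// compiles without the package installed.
type Histogram interface {
	Add(n float64)
	Quantile(n float64) (q float64)
	String() (str string)
}

// observe streams every value into h and returns the requested quantiles,
// each given as a fraction between 0 and 1.
func observe(h Histogram, values []float64, qs ...float64) []float64 {
	for _, v := range values {
		h.Add(v)
	}
	out := make([]float64, len(qs))
	for i, q := range qs {
		out[i] = h.Quantile(q)
	}
	return out
}

// exactStandIn is a hypothetical stand-in that keeps every value in sorted
// order. It is not a streaming histogram.
type exactStandIn struct{ sorted []float64 }

// Add inserts n at its sorted position.
func (e *exactStandIn) Add(n float64) {
	i := sort.SearchFloat64s(e.sorted, n)
	e.sorted = append(e.sorted, 0)
	copy(e.sorted[i+1:], e.sorted[i:])
	e.sorted[i] = n
}

// Quantile returns the stored value at index floor(n * len), clamped to the
// valid range, or -1 when nothing has been added.
func (e *exactStandIn) Quantile(n float64) float64 {
	if len(e.sorted) == 0 {
		return -1
	}
	i := int(n * float64(len(e.sorted)))
	if i >= len(e.sorted) {
		i = len(e.sorted) - 1
	}
	if i < 0 {
		i = 0
	}
	return e.sorted[i]
}

// String prints the stored values.
func (e *exactStandIn) String() string { return fmt.Sprint(e.sorted) }
```

Depending only on the interface lets a caller switch between NumericHistogram and WeightedHistogram without touching the code that feeds and queries it.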

type NumericHistogram

type NumericHistogram struct {
	// contains filtered or unexported fields
}

func NewHistogram

func NewHistogram(n int) *NumericHistogram

NewHistogram returns a new NumericHistogram with a maximum of n bins.

There is no "optimal" bin count, but somewhere between 20 and 80 bins should be sufficient.

func (*NumericHistogram) Add

func (h *NumericHistogram) Add(n float64)

func (*NumericHistogram) CDF

func (h *NumericHistogram) CDF(x float64) float64

CDF returns the value of the cumulative distribution function at x.

func (*NumericHistogram) Count

func (h *NumericHistogram) Count() float64

func (*NumericHistogram) Mean

func (h *NumericHistogram) Mean() float64

Mean returns the sample mean of the distribution.

func (*NumericHistogram) Quantile

func (h *NumericHistogram) Quantile(q float64) float64
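As a rough illustration of how CDF and Quantile values can be read off a set of bins, consider the sketch below. It is an assumption about the general technique rather than a copy of the package's code: `bin`, `totalCount`, `cdf`, and `quantile` are hypothetical names, bins are assumed to be sorted by value, and `q` is assumed to be a fraction between 0 and 1.

```go
package main

// bin is a hypothetical (value, count) pair standing in for the package's
// unexported bin representation.
type bin struct {
	value float64
	count float64
}

// totalCount sums the counts of all bins.
func totalCount(bins []bin) float64 {
	sum := 0.0
	for _, b := range bins {
		sum += b.count
	}
	return sum
}

// cdf returns the fraction of the total count held by bins whose value is
// less than or equal to x. It returns 0 for an empty histogram.
func cdf(bins []bin, x float64) float64 {
	total := totalCount(bins)
	if total == 0 {
		return 0
	}
	below := 0.0
	for _, b := range bins {
		if b.value <= x {
			below += b.count
		}
	}
	return below / total
}

// quantile walks the bins in ascending order and returns the value of the
// first bin at which the running count reaches q * total. ok is false when
// no bin reaches the target, as with an empty histogram.
func quantile(bins []bin, q float64) (value float64, ok bool) {
	target := q * totalCount(bins)
	running := 0.0
	for _, b := range bins {
		running += b.count
		if running >= target {
			return b.value, true
		}
	}
	return 0, false
}
```

Both functions touch each bin at most once, so their cost depends on the bin count rather than on how many values were added.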

func (*NumericHistogram) String

func (h *NumericHistogram) String() (str string)

String returns a string representation of the histogram, which is useful for printing to a terminal.

func (*NumericHistogram) Variance

func (h *NumericHistogram) Variance() float64

Variance returns the variance of the distribution.
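One way to estimate a mean and variance from bins is to treat each bin as `count` copies of its value. The sketch below does that and returns the population variance (dividing by the total count). It is an assumption about the approach, not the package's code, and `bin` and `meanAndVariance` are hypothetical names.

```go
package main

// bin is a hypothetical (value, count) pair standing in for the package's
// unexported bin representation.
type bin struct {
	value float64
	count float64
}

// meanAndVariance returns the count-weighted mean of the bin values and the
// count-weighted population variance around that mean. Both are 0 for an
// empty histogram.
func meanAndVariance(bins []bin) (mean, variance float64) {
	total, weighted := 0.0, 0.0
	for _, b := range bins {
		total += b.count
		weighted += b.value * b.count
	}
	if total == 0 {
		return 0, 0
	}
	mean = weighted / total
	for _, b := range bins {
		d := b.value - mean
		variance += b.count * d * d
	}
	return mean, variance / total
}
```

Because merged bins replace several samples with one averaged value, a variance computed this way tends to understate the true spread slightly.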

type WeightedHistogram

type WeightedHistogram struct {
	// contains filtered or unexported fields
}

A WeightedHistogram implements Histogram. Its bins hold values that are exponentially weighted moving averages. This allows you to keep inserting large amounts of data into the histogram and approximate quantiles with recency factored in.

func NewWeightedHistogram

func NewWeightedHistogram(n int, alpha float64) *WeightedHistogram

NewWeightedHistogram returns a new WeightedHistogram with a maximum of n bins and a decay factor of alpha.

There is no "optimal" bin count, but somewhere between 20 and 80 bins should be sufficient.

Alpha should be set to 2 / (N+1), where N represents the average age of the moving window. For example, a 60-second window with an average age of 30 seconds would yield an alpha of 0.064516129.
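The formula above is easy to wrap in a helper. `alphaForAverageAge` and `ewmaStep` are hypothetical names, not part of this package. `ewmaStep` is the textbook exponentially weighted moving average update, included only to show what alpha controls, and the package may apply alpha differently internally.

```go
package main

// alphaForAverageAge applies the formula above: alpha = 2 / (N + 1), where
// N is the average age of the moving window.
func alphaForAverageAge(n float64) float64 {
	return 2 / (n + 1)
}

// ewmaStep is the textbook exponentially weighted moving average update:
// the new sample gets weight alpha and the previous average gets 1 - alpha.
// This is an illustration, not necessarily how the package uses alpha.
func ewmaStep(previous, sample, alpha float64) float64 {
	return alpha*sample + (1-alpha)*previous
}
```

With an average age of 30, `alphaForAverageAge(30)` evaluates 2 / 31, which matches the 0.064516129 quoted above. Under the textbook update, a smaller alpha makes older data fade more slowly.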

func (*WeightedHistogram) Add

func (h *WeightedHistogram) Add(n float64)

func (*WeightedHistogram) CDF

func (h *WeightedHistogram) CDF(x float64) float64

CDF returns the value of the cumulative distribution function at x.

func (*WeightedHistogram) Count

func (h *WeightedHistogram) Count() float64

func (*WeightedHistogram) Mean

func (h *WeightedHistogram) Mean() float64

Mean returns the sample mean of the distribution.

func (*WeightedHistogram) Quantile

func (h *WeightedHistogram) Quantile(q float64) float64

func (*WeightedHistogram) String

func (h *WeightedHistogram) String() (str string)

String returns a string representation of the histogram, which is useful for printing to a terminal.

func (*WeightedHistogram) Variance

func (h *WeightedHistogram) Variance() float64

Variance returns the variance of the distribution.
