quantile

package module
v0.0.0-...-4246515 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 7, 2022 License: BSD-2-Clause Imports: 2 Imported by: 53

README

Streaming Quantile Estimator

Implements ideas found in Effective Computation of Biased Quantiles over Data Streams (Cormode, Korn, Muthukrishnan, Srivastava) to provide a space and time efficient estimator for streaming quantile estimation.

Build Status

Documentation

Overview

Package quantile implements a streaming quantile estimator. The implementation is based on "Effective Computation of Biased Quantiles over Data Streams" (Cormode, Korn, Muthukrishnan, Srivastava) to provide a space and time efficient estimator for online quantile estimation.

For the normal distribution of 10^9 elements, a tolerance for 0.99th percentile at 0.001 uses under 1000 bins at 32 bytes per bin.

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Estimate

type Estimate interface {
	// Delta calculates the acceptable difference in ranks between two values.
	// It is used to remove redundant values during compression.
	Delta(rank, observations float64) float64
}

func Known

func Known(quantile, tolerance float64) Estimate

Known produces a optimal space usage for estimations at the given quantile and error tolerance.

Quantiles not known ahead of time can also be queried, but at a lower accuracy.

func Unknown

func Unknown(tolerance float64) Estimate

Unknown produces estimations for all possible quantiles at this error tolerance. It uses significantly more space and time than when you know the quantiles you wish to estimate.

The Known estimation should be used when you know which quantiles you will be querying.

type Estimator

type Estimator struct {
	// contains filtered or unexported fields
}
Example
package main

import (
	"fmt"
	"time"
)

var rpcs *Estimator

func observeSeconds(est *Estimator, begin time.Time) {
	est.Add(float64(time.Now().Sub(begin)) / float64(time.Second))
}

func Work() {
	defer observeSeconds(rpcs, time.Now())

	// Dance your cares away,
	// Worry's for another day.
	// Let the music play,
}

func main() {
	// We know we want to query the 95th and 99th, with the 95th a little less accurately.
	rpcs = New(Known(0.95, 0.005), Known(0.99, 0.001))

	Work()
	Work()

	// Report the percentiles
	fmt.Println("95th: ", rpcs.Get(0.95))
	fmt.Println("99th: ", rpcs.Get(0.99))
}
Output:

func New

func New(invariants ...Estimate) *Estimator

New allocates a new estimator tolerating the minimum of the invariants provided.

When you know how much error you can tolerate in the quantiles you will query, use a Known estimation for each quantile you will query. For example:

quantile.New(quantile.Known(0.50, 0.01), quantile.Known(0.95, 0.001), quantile.Known(0.99, 0.0005))

When you will query for multiple different quantiles, and know the error tolerance, use the Bias invariant. For example:

quantile.New(quantile.Unknown(0.01))

Targeted estimators consume significantly less resources than Biased estimators.

Passing no parameters will create an estimator that has a tolerance of 0.1, equivalent to:

quantile.New(quantile.Unknown(0.1))

Estimators are not safe to use from multiple goroutines.

func (*Estimator) Add

func (est *Estimator) Add(value float64)

Add buffers a new sample, committing and compressing the data structure when the buffer is full.

func (*Estimator) Get

func (est *Estimator) Get(quantile float64) float64

Get finds a value within (quantile - tolerance) * n <= value <= (quantile + tolerance) * n or 0 if no values have been observed.

func (*Estimator) Samples

func (est *Estimator) Samples() int

Samples returns the number of values this estimator has sampled.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL