Documentation
Overview ¶
Package tdigest provides a highly accurate mergeable data-structure for quantile estimation.
Typical T-Digest use cases involve accumulating metrics on several distinct nodes of a cluster and then merging them together to get a system-wide quantile overview. Examples include sensor data from IoT devices, quantiles over enormous document datasets (think ElasticSearch), and performance metrics for distributed systems.
After you create (and configure, if desired) the digest:
digest, err := tdigest.New(tdigest.Compression(100))
You can then use it for registering measurements:
digest.Add(number)
Estimating quantiles:
digest.Quantile(0.99)
And merging with another digest:
digest.Merge(otherDigest)
Index ¶
- func Compression(compression float64) tdigestOption
- func LocalRandomNumberGenerator(seed int64) tdigestOption
- func RandomNumberGenerator(rng RNG) tdigestOption
- type RNG
- type TDigest
- func (t *TDigest) Add(value float64) error
- func (t *TDigest) AddWeighted(value float64, count uint64) (err error)
- func (t TDigest) AsBytes() ([]byte, error)
- func (t *TDigest) CDF(value float64) float64
- func (t *TDigest) Clone() *TDigest
- func (t *TDigest) Compress() (err error)
- func (t *TDigest) Compression() float64
- func (t TDigest) Count() uint64
- func (t *TDigest) ForEachCentroid(f func(mean float64, count uint64) bool)
- func (t *TDigest) FromBytes(buf []byte) error
- func (t *TDigest) Merge(other *TDigest) (err error)
- func (t *TDigest) MergeDestructive(other *TDigest) (err error)
- func (t *TDigest) Quantile(q float64) float64
- func (t *TDigest) ToBytes(b []byte) []byte
- func (t *TDigest) TrimmedMean(p1, p2 float64) float64
Functions ¶
func Compression ¶
func Compression(compression float64) tdigestOption
Compression sets the digest compression.
The compression parameter controls the threshold at which samples are merged together: the more often distinct samples are merged, the more precision is lost. Compression should be tuned according to your data distribution, but a value of 100 (the default) is often good enough.
A higher compression value means holding more centroids in memory (thus: better precision), which in turn means a bigger serialization payload, a higher memory footprint and slower addition of new samples.
Compression must be greater than or equal to 1; any other value yields an error.
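To make the trade-off concrete, the sketch below evaluates the centroid size limit commonly cited for the original t-digest algorithm, 4·N·q·(1−q)/compression, where N is the total sample count and q is the quantile a centroid sits at. Whether this package applies exactly that formula is an assumption, and `maxCentroidWeight` is a hypothetical helper, not part of the API.

```go
package main

import "fmt"

// maxCentroidWeight is a hypothetical helper (not part of this package).
// It evaluates the size limit 4*N*q*(1-q)/compression described for the
// original t-digest algorithm; this package may use a different rule.
func maxCentroidWeight(n uint64, q, compression float64) float64 {
	return 4 * float64(n) * q * (1 - q) / compression
}

func main() {
	const n = 1000000 // hypothetical total sample count
	for _, compression := range []float64{20, 100, 500} {
		// A smaller limit means samples are merged less aggressively,
		// so more centroids are kept.
		fmt.Printf("compression=%v: median limit=%.0f, p99 limit=%.0f\n",
			compression,
			maxCentroidWeight(n, 0.5, compression),
			maxCentroidWeight(n, 0.99, compression))
	}
}
```

Under this formula the limit shrinks as compression grows and as q approaches 0 or 1, which is consistent with higher compression giving better precision at the cost of more centroids.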
func LocalRandomNumberGenerator ¶
func LocalRandomNumberGenerator(seed int64) tdigestOption
LocalRandomNumberGenerator makes the TDigest use the default `math/rand` functions but with an unshared source that is seeded with the given `seed` parameter.
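In standard-library terms, an unshared seeded source is one created with `rand.NewSource` and wrapped with `rand.New`. The sketch below shows only that standard-library behavior; `newLocalRand` is a hypothetical helper, and whether the package builds its generator exactly this way is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newLocalRand is a hypothetical helper (not part of this package).
// It returns a generator backed by its own source, so it does not
// touch the source shared by the top-level math/rand functions.
func newLocalRand(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

func main() {
	a, b := newLocalRand(42), newLocalRand(42)
	// Two generators with the same seed produce the same sequence,
	// which is what makes a fixed seed useful in tests.
	for i := 0; i < 3; i++ {
		fmt.Println(a.Int63() == b.Int63())
	}
}
```

A generator created this way is not safe for concurrent use by multiple goroutines, unlike the shared top-level source.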
func RandomNumberGenerator ¶
func RandomNumberGenerator(rng RNG) tdigestOption
RandomNumberGenerator sets the RNG to be used internally.
This option changes which random number source the TDigest uses. Random numbers are used when deciding which candidate centroid to merge a sample with, and when compressing or merging with another digest, because doing so increases accuracy. The option is particularly useful for testing, or when you want to disconnect your sample collection from the (default) shared random source to minimize lock contention.
Types ¶
type RNG ¶
RNG is an interface that wraps the random number generator calls that tdigest needs at runtime.
type TDigest ¶
type TDigest struct {
// contains filtered or unexported fields
}
TDigest is a quantile approximation data structure.
func FromBytes ¶
FromBytes reads a byte buffer with a serialized digest (from AsBytes) and deserializes it.
This function creates a new tdigest instance with the provided options, but ignores the compression setting since the correct value comes from the buffer.
func New ¶
New creates a new digest.
By default the digest is constructed with a configuration that should be useful for most use-cases. It comes with compression set to 100 and uses a local random number generator for performance reasons.
func (*TDigest) Add ¶ added in v0.3.0
Add is an alias for AddWeighted(x, 1). Read the documentation for AddWeighted for more details.
func (*TDigest) AddWeighted ¶
AddWeighted registers a new sample in the digest.
It's the main entry point for the digest and very likely the only method to be used for collecting samples. The count parameter is for when you are registering a sample that occurred multiple times - the most common value for this is 1.
This will emit an error if `value` is NaN or if `count` is zero.
func (TDigest) AsBytes ¶
AsBytes serializes the digest into a byte array so it can be saved to disk or sent over the wire.
func (*TDigest) CDF ¶
CDF computes the fraction of samples that are less than or equal to the given value.
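For reference, the quantity being estimated can be computed exactly when every sample is still in memory. The sketch below is that exact computation, not the digest's algorithm; `exactCDF` is a hypothetical helper introduced only for illustration.

```go
package main

import "fmt"

// exactCDF is a hypothetical helper (not part of this package).
// It returns the fraction of samples that are <= value, scanning
// every sample. A digest estimates this without storing them all.
// For an empty slice it returns 0.
func exactCDF(samples []float64, value float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	atOrBelow := 0
	for _, s := range samples {
		if s <= value {
			atOrBelow++
		}
	}
	return float64(atOrBelow) / float64(len(samples))
}

func main() {
	samples := []float64{1, 2, 3, 4}
	fmt.Println(exactCDF(samples, 2))
}
```

Comparing a digest's CDF against such an exact value on a small dataset is one way to judge whether the chosen compression is precise enough.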
func (*TDigest) Compress ¶
Compress tries to reduce the number of individual centroids stored in the digest.
Compression trades off accuracy for performance and happens automatically after a certain number of distinct samples have been stored.
You may call Compress on a digest at any time, but you can also ignore it entirely: the digest compresses itself automatically once it grows too large. If you are minimizing network traffic, it might be a good idea to compress before serializing.
func (*TDigest) Compression ¶
Compression returns the TDigest compression.
func (TDigest) Count ¶
Count returns the total number of samples this digest represents.
The result represents how many times Add() was called on a digest plus how many samples the digests it has been merged with had. This is useful mainly for two scenarios:
- Knowing if there is enough data so you can trust the quantiles
- Knowing if you've registered too many samples already and deciding what to do about it.
For the second case one approach would be to create a side empty digest and start registering samples on it as well as on the old (big) one and then discard the bigger one after a certain criterion is reached (say, minimum number of samples or a small relative error between new and old digests).
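The rotation approach for the second case can be sketched as follows, using a minimum sample count as the switch-over criterion. The `sampler` interface lists only `Add` and `Count` with the signatures documented here; `rotating`, its fields, and the `counter` stand-in are hypothetical names that keep the example runnable without the library.

```go
package main

import "fmt"

// sampler lists the documented methods this sketch relies on.
type sampler interface {
	Add(value float64) error
	Count() uint64
}

// rotating is a hypothetical wrapper (not part of this package).
type rotating struct {
	active   sampler                 // the digest that answers queries
	standby  sampler                 // nil until active reaches limit
	limit    uint64                  // start a standby at this count
	minFresh uint64                  // promote standby at this count
	newFn    func() (sampler, error) // creates an empty digest
}

// Add registers value on the active digest and, once a standby
// exists, on the standby too. The standby replaces the active
// digest as soon as it holds minFresh samples.
func (r *rotating) Add(value float64) error {
	if err := r.active.Add(value); err != nil {
		return err
	}
	if r.standby == nil {
		if r.active.Count() < r.limit {
			return nil
		}
		s, err := r.newFn()
		if err != nil {
			return err
		}
		r.standby = s
		return nil
	}
	if err := r.standby.Add(value); err != nil {
		return err
	}
	if r.standby.Count() >= r.minFresh {
		r.active, r.standby = r.standby, nil
	}
	return nil
}

// counter is a stand-in that only counts samples.
type counter struct{ n uint64 }

func (c *counter) Add(float64) error { c.n++; return nil }
func (c *counter) Count() uint64     { return c.n }

func main() {
	r := &rotating{
		active: &counter{}, limit: 5, minFresh: 3,
		newFn: func() (sampler, error) { return &counter{}, nil },
	}
	for i := 0; i < 8; i++ {
		if err := r.Add(float64(i)); err != nil {
			fmt.Println("add failed:", err)
			return
		}
	}
	fmt.Println("active count after rotation:", r.active.Count())
}
```

With a real digest, `newFn` would wrap the package constructor. A criterion based on the relative error between the old and new digests would replace the `minFresh` check.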
func (*TDigest) ForEachCentroid ¶ added in v1.1.0
ForEachCentroid calls the specified function for each centroid.
Iteration stops when the supplied function returns false, or when all centroids have been iterated.
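The callback contract (return true to continue, false to stop) can be sketched as below. `centroidIterator` has the same shape as the documented ForEachCentroid method, so a method value such as `digest.ForEachCentroid` should be assignable to it. The slice-backed `iterate` function and both helpers are hypothetical stand-ins that keep the example self-contained.

```go
package main

import "fmt"

// centroidIterator mirrors the documented ForEachCentroid signature.
type centroidIterator func(f func(mean float64, count uint64) bool)

type centroid struct {
	mean  float64
	count uint64
}

// iterate is a stand-in iterator over a plain slice.
func iterate(cs []centroid) centroidIterator {
	return func(f func(mean float64, count uint64) bool) {
		for _, c := range cs {
			if !f(c.mean, c.count) {
				return // callback asked to stop
			}
		}
	}
}

// totalWeight visits every centroid by always returning true.
func totalWeight(each centroidIterator) uint64 {
	var total uint64
	each(func(_ float64, count uint64) bool {
		total += count
		return true
	})
	return total
}

// firstMeanAbove stops at the first centroid whose mean exceeds limit.
func firstMeanAbove(each centroidIterator, limit float64) (float64, bool) {
	var found float64
	ok := false
	each(func(mean float64, _ uint64) bool {
		if mean > limit {
			found, ok = mean, true
			return false
		}
		return true
	})
	return found, ok
}

func main() {
	each := iterate([]centroid{{1, 2}, {5, 3}, {9, 1}})
	fmt.Println(totalWeight(each))
	fmt.Println(firstMeanAbove(each, 4))
}
```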
func (*TDigest) FromBytes ¶
FromBytes deserializes into the supplied TDigest struct, re-using and overwriting any existing buffers.
This method reinitializes the digest from the provided buffer, discarding any previously collected data. Note that in case of errors this may leave the digest in an unusable state.
func (*TDigest) Merge ¶
Merge joins a given digest into itself.
Merging is useful when you have multiple TDigest instances running in separate threads and you want to compute quantiles over all the samples. This is particularly important in a scatter-gather/map-reduce scenario.
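A scatter-gather flow might look like the sketch below: each goroutine fills its own instance, and the results are merged sequentially once all workers finish. The `bag` type is a hypothetical stand-in that only mimics the shape of the documented `Add` and `Merge` methods so the example runs without the library. It stores raw samples and is not a digest.

```go
package main

import (
	"fmt"
	"sync"
)

// bag is a stand-in with the same Add/Merge shape as the digest.
type bag struct{ samples []float64 }

func (b *bag) Add(v float64) error {
	b.samples = append(b.samples, v)
	return nil
}

func (b *bag) Merge(other *bag) error {
	b.samples = append(b.samples, other.samples...)
	return nil
}

// gather fills one bag per shard concurrently, then merges them.
// Each goroutine touches only its own bag and its own error slot.
func gather(shards [][]float64) (*bag, error) {
	parts := make([]*bag, len(shards))
	errs := make([]error, len(shards))
	var wg sync.WaitGroup
	for i, shard := range shards {
		parts[i] = &bag{}
		wg.Add(1)
		go func(i int, shard []float64) {
			defer wg.Done()
			for _, v := range shard {
				if err := parts[i].Add(v); err != nil {
					errs[i] = err
					return
				}
			}
		}(i, shard)
	}
	wg.Wait()

	total := &bag{}
	for i, p := range parts {
		if errs[i] != nil {
			return nil, errs[i]
		}
		if err := total.Merge(p); err != nil {
			return nil, err
		}
	}
	return total, nil
}

func main() {
	total, err := gather([][]float64{{1, 2}, {3}, {4, 5, 6}})
	if err != nil {
		fmt.Println("gather failed:", err)
		return
	}
	fmt.Println("merged samples:", len(total.samples))
}
```

Keeping one instance per goroutine means no instance is accessed concurrently, so the sketch does not depend on any thread-safety guarantee.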
func (*TDigest) MergeDestructive ¶
MergeDestructive joins a given digest into itself rendering the other digest invalid.
This works like Merge above, but it's faster. Using this method requires caution because it makes 'other' useless: you must make sure you discard it without using it any further.
func (*TDigest) Quantile ¶ added in v1.0.0
Quantile returns the estimated value at the desired quantile.
Values of q must be between 0 and 1 (inclusive); the method panics otherwise.
func (*TDigest) ToBytes ¶
ToBytes serializes into the supplied slice, avoiding allocation if the slice is large enough. The result slice is returned.
func (*TDigest) TrimmedMean ¶
TrimmedMean returns the mean of the distribution between the two percentiles p1 and p2.
Values of p1 and p2 must be between 0 and 1 (inclusive) and p1 must be less than p2; the method panics otherwise.
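As a point of comparison, a trimmed mean can be computed exactly from the raw samples by sorting them and averaging those whose rank falls between the two percentiles. The sketch below does that. `exactTrimmedMean` is a hypothetical helper, and its boundary handling (floor for the lower rank, ceiling for the upper) is an assumption that may differ from how the digest interpolates.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// exactTrimmedMean is a hypothetical helper (not part of this package).
// It averages the sorted samples with ranks in [floor(p1*n), ceil(p2*n)).
// It assumes a non-empty slice and 0 <= p1 < p2 <= 1, which guarantees
// the selected range holds at least one sample.
func exactTrimmedMean(samples []float64, p1, p2 float64) float64 {
	sorted := append([]float64(nil), samples...) // leave input untouched
	sort.Float64s(sorted)

	n := float64(len(sorted))
	lo := int(math.Floor(p1 * n))
	hi := int(math.Ceil(p2 * n))

	var sum float64
	for _, v := range sorted[lo:hi] {
		sum += v
	}
	return sum / float64(hi-lo)
}

func main() {
	// The outlier 100 falls outside the [0.25, 0.75] range.
	samples := []float64{100, 1, 3, 2}
	fmt.Println(exactTrimmedMean(samples, 0.25, 0.75))
}
```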