topk

package
v0.0.0-...-113f59a Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 16, 2024 License: BSD-3-Clause Imports: 5 Imported by: 0

Documentation

Overview

Package topk defines a count-min sketch and a cheap probabilistic top-K data structure that uses the count-min sketch to track the top K items in constant memory and O(log(k)) time.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func PickParams

func PickParams(err, probability float64) (hashes, buckets int)

PickParams provides good parameters for 'hashes' and 'buckets' when constructing a CountMinSketch, given an estimated total number of counts (i.e. the sum of all counts ever stored), the error factor ϵ as a float (e.g. 1% is 0.001), and the probability factor δ.

Parameters are chosen such that with a probability of 1−δ, the error is at most ϵ∗totalCount. Or, in other words: if N is the true count of an event, E is the estimate given by a sketch and T the total count of items in the sketch, E ≤ N + T*ϵ with probability (1 - δ).

Types

type CountMinSketch

type CountMinSketch struct {
	// contains filtered or unexported fields
}

CountMinSketch implements a count-min sketch, a probabilistic data structure that tracks the frequency of events in a stream of data.

See: https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch

func NewCountMinSketch

func NewCountMinSketch(hashes, buckets int) *CountMinSketch

NewCountMinSketch creates a new CountMinSketch with the provided number of hashes and buckets. Hashes and buckets are often called "depth" and "width", or "d" and "w", respectively.

func (*CountMinSketch) Add

func (cms *CountMinSketch) Add(val []byte) uint64

Add calls AddN(val, 1).

func (*CountMinSketch) AddN

func (cms *CountMinSketch) AddN(val []byte, count uint64) uint64

AddN increments the count for the given value by the provided count, returning the new count.

func (*CountMinSketch) Get

func (cms *CountMinSketch) Get(val []byte) uint64

Get returns the count for the provided value.

type SerializeFunc

type SerializeFunc[T any] func([]byte, T) []byte

HashFunc is responsible for providing a []byte serialization of a value, appended to the provided byte slice. This is used for hashing the value when adding to a CountMinSketch.

type TopK

type TopK[T any] struct {
	// contains filtered or unexported fields
}

TopK is a probabilistic counter of the top K items, using a count-min sketch to keep track of item counts and a heap to track the top K of them.

func New

func New[T any](k int, sf SerializeFunc[T]) *TopK[T]

New creates a new TopK that stores k values. Parameters for the underlying count-min sketch are chosen for a 0.1% error rate and a 0.1% probability of error.

func NewWithParams

func NewWithParams[T any](k int, sf SerializeFunc[T], numHashes, numCols int) *TopK[T]

NewWithParams creates a new TopK that stores k values, and additionally allows customizing the parameters for the underlying count-min sketch.

func (*TopK[T]) Add

func (tk *TopK[T]) Add(val T) uint64

Add calls AddN(val, 1).

func (*TopK[T]) AddN

func (tk *TopK[T]) AddN(val T, count uint64) uint64

AddN adds the given item to the set with the provided count, returning the new estimated count.

func (*TopK[T]) AppendTop

func (tk *TopK[T]) AppendTop(sl []T) []T

AppendTop appends the estimated top K items as stored by this TopK to the provided slice, allocating only if the slice does not have enough capacity to store all items. The provided slice can be nil.

func (*TopK[T]) Top

func (tk *TopK[T]) Top() []T

Top returns the estimated top K items as stored by this TopK.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL