cml

package module
Version: v0.0.0-...-45031c0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 12, 2017 License: MIT Imports: 4 Imported by: 3

README

Count-Min-Log

GoDoc

Count-Min-Log sketch: Approximately counting with approximate counters - Guillaume Pitel & Geoffroy Fouquier

TL;DR: Count-Min-Log Sketch for improved Average Relative Error on low frequency events

Count-Min Sketch is a widely adopted algorithm for approximate event counting in large scale processing. However, the original version of the Count-Min-Sketch (CMS) suffers of some deficiences, especially if one is interested in the low-frequency items, such as in text- mining related tasks. Several variants of CMS have been proposed to compensate for the high relative error for low-frequency events, but the proposed solutions tend to correct the errors instead of preventing them. In this paper, we propose the Count-Min-Log sketch, which uses logarithm-based, approximate counters instead of linear counters to improve the average relative error of CMS at constant memory footprint.

This version implements the 16 bit register version. Will add back the 8-bit version soon.

Example Usage

import cml

...
sk, err := cml.NewDefaultSketch()
sk.IncreaseCount([]byte("scott pilgrim"))
...

sk.Frequency([]byte("scott pilgrim")) // ==> 1

Documentation

Overview

Package cml implments the Count-Min-Log datastructure.

It is based on Count-Min-Log sketch: Approximately counting with approximate counters - Guillaume Pitel & Geoffroy Fouquier

http://iswag-symposium.org/2015/pdfs/shortpaper1.pdf

TL;DR: Count-Min-Log Sketch for improved Average Relative Error on low frequency events

Count-Min Sketch is a widely adopted algorithm for approximate event counting in large scale processing. However, the original version of the Count-Min-Sketch (CMS) suffers of some deficiences, especially if one is interested in the low-frequency items, such as in text- mining related tasks. Several variants of CMS have been proposed to compensate for the high relative error for low-frequency events, but the proposed solutions tend to correct the errors instead of preventing them. In this paper, we propose the Count-Min-Log sketch, which uses logarithm-based, approximate counters instead of linear counters to improve the average relative error of CMS at constant memory footprint.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Sketch

type Sketch struct {
	// contains filtered or unexported fields
}

Sketch is a Count-Min-Log Sketch 16-bit registers

func NewForCapacity16

func NewForCapacity16(capacity uint64, e float64) (*Sketch, error)

NewForCapacity16 returns a new Count-Min-Log Sketch with 16-bit registers optimized for a given max capacity and expected error rate

func NewSketch

func NewSketch(w uint, d uint, exp float64) (*Sketch, error)

NewSketch returns a new Count-Min-Log Sketch with 16-bit registers

func NewSketchForEpsilonDelta

func NewSketchForEpsilonDelta(epsilon, delta float64) (*Sketch, error)

NewSketchForEpsilonDelta for a given error rate epsiolen with a probability of delta

func (*Sketch) BulkUpdate

func (cml *Sketch) BulkUpdate(e []byte, freq uint) bool

BulkUpdate increases the count of `s` by one, return true if added and the current count of `s`

func (*Sketch) Query

func (cml *Sketch) Query(e []byte) float64

Query returns the count of `e`

func (*Sketch) Update

func (cml *Sketch) Update(e []byte) bool

Update increases the count of `s` by one, return true if added and the current count of `s`

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL