algo

package
v25.0.0-split-vector2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 26, 2025 License: Apache-2.0 Imports: 13 Imported by: 0

Documentation

Overview

Package algo contains algorithms such as merging, intersecting sorted lists.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ApplyFilter

func ApplyFilter(u *pb.List, f func(uint64, int) bool)

ApplyFilter applies a filter to our UIDList.

func ApplyFilterPacked

func ApplyFilterPacked(u *pb.UidPack, f func(uint64, int) bool) *pb.UidPack

ApplyFilterPacked applies the filter to a list of packed uids.

func Difference

func Difference(u, v *pb.List) *pb.List

Difference returns the difference of two lists.

func DifferencePacked

func DifferencePacked(u, v *pb.UidPack) *pb.UidPack

DifferencePacked performs the difference operation between two UidPack objects.

func IndexOf

func IndexOf(u *pb.List, uid uint64) int

IndexOf performs a binary search on the uids slice and returns the index at which it finds the uid, else returns -1

func IndexOfPacked

func IndexOfPacked(u *pb.UidPack, uid uint64) int

IndexOfPacked finds the index of the given uid in the UidPack. If it doesn't find it, it returns -1.

func IntersectCompressedWith

func IntersectCompressedWith(pack *pb.UidPack, afterUID uint64, v, o *pb.List)

IntersectCompressedWith intersects a packed list of UIDs with another list and writes the output to o.

func IntersectCompressedWithBin

func IntersectCompressedWithBin(dec *codec.Decoder, q []uint64, o *[]uint64)

IntersectCompressedWithBin is based on the paper "Fast Intersection Algorithms for Sorted Sequences" https://link.springer.com/chapter/10.1007/978-3-642-12476-1_3 Call seek on dec before calling this function

func IntersectCompressedWithLinJump

func IntersectCompressedWithLinJump(dec *codec.Decoder, v []uint64, o *[]uint64)

IntersectCompressedWithLinJump performs the intersection linearly.

func IntersectSorted

func IntersectSorted(lists []*pb.List) *pb.List

IntersectSorted calculates the intersection of multiple lists and performs the intersections from the smallest to the largest list.

func IntersectSortedPacked

func IntersectSortedPacked(lists []*pb.UidPack) *pb.UidPack

IntersectSortedPacked calculates the intersection of multiple lists and performs the intersections from the smallest to the largest list.

func IntersectWith

func IntersectWith(u, v, o *pb.List)

IntersectWith intersects u with v. The update is made to o. u, v should be sorted.

func IntersectWithBin

func IntersectWithBin(d, q []uint64, o *[]uint64) int

IntersectWithBin is based on the paper "Fast Intersection Algorithms for Sorted Sequences" https://link.springer.com/chapter/10.1007/978-3-642-12476-1_3 Returns where to move the second array(q) to. O means not found

func IntersectWithJump

func IntersectWithJump(u, v []uint64, o *[]uint64) (int, int)

IntersectWithJump performs the intersection linearly but jumping jump steps between iterations.

func IntersectWithLin

func IntersectWithLin(u, v []uint64, o *[]uint64) (int, int)

IntersectWithLin performs the intersection linearly.

func IntersectWithLinPacked

func IntersectWithLinPacked(u, v *pb.UidPack) *pb.UidPack

IntersectWithLinPacked performs the liner intersection between two compressed uid lists.

func MergeSorted

func MergeSorted(lists []*pb.List) *pb.List

func MergeSortedMoreMem

func MergeSortedMoreMem(lists []*pb.List) *pb.List

func MergeSortedPacked

func MergeSortedPacked(lists []*pb.UidPack) *pb.UidPack

MergeSortedPacked merges already sorted UidPack objects into a single UidPack.

func ToUintsListForTest

func ToUintsListForTest(ul []*pb.List) [][]uint64

ToUintsListForTest converts to list of uints for testing purpose only.

Types

type CountMinSketch

type CountMinSketch struct {
	// contains filtered or unexported fields
}

This code is copied from https://www.github.com/tylertreat/BoomFilters/refs/heads/master/countmin.go CountMinSketch implements a Count-Min Sketch as described by Cormode and Muthukrishnan in An Improved Data Stream Summary: The Count-Min Sketch and its Applications:

http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf

A Count-Min Sketch (CMS) is a probabilistic data structure which approximates the frequency of events in a data stream. Unlike a hash map, a CMS uses sub-linear space at the expense of a configurable error factor. Similar to Counting Bloom filters, items are hashed to a series of buckets, which increment a counter. The frequency of an item is estimated by taking the minimum of each of the item's respective counter values.

Count-Min Sketches are useful for counting the frequency of events in massive data sets or unbounded streams online. In these situations, storing the entire data set or allocating counters for every event in memory is impractical. It may be possible for offline processing, but real-time processing requires fast, space-efficient solutions like the CMS. For approximating set cardinality, refer to the HyperLogLog.

func NewCountMinSketch

func NewCountMinSketch(epsilon, delta float64) *CountMinSketch

NewCountMinSketch creates a new Count-Min Sketch whose relative accuracy is within a factor of epsilon with probability delta. Both of these parameters affect the space and time complexity.

func (*CountMinSketch) Add

func (c *CountMinSketch) Add(data []byte) *CountMinSketch

Add will add the data to the set. Returns the CountMinSketch to allow for chaining.

func (*CountMinSketch) AddInt

func (c *CountMinSketch) AddInt(data []byte, n uint64) *CountMinSketch

AddInt will add the data to the set n times. Returns the CountMinSketch to allow for chaining.

func (*CountMinSketch) Count

func (c *CountMinSketch) Count(data []byte) uint64

Count returns the approximate count for the specified item, correct within epsilon * total count with a probability of delta.

func (*CountMinSketch) Delta

func (c *CountMinSketch) Delta() float64

Delta returns the relative-accuracy probability, delta.

func (*CountMinSketch) Epsilon

func (c *CountMinSketch) Epsilon() float64

Epsilon returns the relative-accuracy factor, epsilon.

func (*CountMinSketch) Merge

func (c *CountMinSketch) Merge(other *CountMinSketch) error

Merge combines this CountMinSketch with another. Returns an error if the matrix width and depth are not equal.

func (*CountMinSketch) ReadDataFrom

func (c *CountMinSketch) ReadDataFrom(stream io.Reader) (int, error)

ReadDataFrom reads a binary representation of the CMS data written by WriteDataTo() from io stream. It returns the number of bytes read and error If serialized CMS configuration is different it returns error with expected params

func (*CountMinSketch) Reset

func (c *CountMinSketch) Reset() *CountMinSketch

Reset restores the CountMinSketch to its original state. It returns itself to allow for chaining.

func (*CountMinSketch) SetHash

func (c *CountMinSketch) SetHash(h hash.Hash64)

SetHash sets the hashing function used.

func (*CountMinSketch) TestAndRemove

func (c *CountMinSketch) TestAndRemove(data []byte, n uint64) bool

TestAndRemove attemps to remove n counts of data from the CMS. If n is greater than the data count, TestAndRemove is a no-op and returns false. Else, return true and decrement count by n.

func (*CountMinSketch) TestAndRemoveAll

func (c *CountMinSketch) TestAndRemoveAll(data []byte) bool

TestAndRemoveAll counts data frequency, performs TestAndRemove(data, count), and returns true if count is positive. If count is 0, TestAndRemoveAll is a no-op and returns false.

func (*CountMinSketch) TotalCount

func (c *CountMinSketch) TotalCount() uint64

TotalCount returns the number of items added to the sketch.

func (*CountMinSketch) WriteDataTo

func (c *CountMinSketch) WriteDataTo(stream io.Writer) (int, error)

WriteDataTo writes a binary representation of the CMS data to an io stream. It returns the number of bytes written and error

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL