Documentation
¶
Overview ¶
Package algo contains algorithms such as merging, intersecting sorted lists.
Index ¶
- func ApplyFilter(u *pb.List, f func(uint64, int) bool)
- func ApplyFilterPacked(u *pb.UidPack, f func(uint64, int) bool) *pb.UidPack
- func Difference(u, v *pb.List) *pb.List
- func DifferencePacked(u, v *pb.UidPack) *pb.UidPack
- func IndexOf(u *pb.List, uid uint64) int
- func IndexOfPacked(u *pb.UidPack, uid uint64) int
- func IntersectCompressedWith(pack *pb.UidPack, afterUID uint64, v, o *pb.List)
- func IntersectCompressedWithBin(dec *codec.Decoder, q []uint64, o *[]uint64)
- func IntersectCompressedWithLinJump(dec *codec.Decoder, v []uint64, o *[]uint64)
- func IntersectSorted(lists []*pb.List) *pb.List
- func IntersectSortedPacked(lists []*pb.UidPack) *pb.UidPack
- func IntersectWith(u, v, o *pb.List)
- func IntersectWithBin(d, q []uint64, o *[]uint64) int
- func IntersectWithJump(u, v []uint64, o *[]uint64) (int, int)
- func IntersectWithLin(u, v []uint64, o *[]uint64) (int, int)
- func IntersectWithLinPacked(u, v *pb.UidPack) *pb.UidPack
- func MergeSorted(lists []*pb.List) *pb.List
- func MergeSortedMoreMem(lists []*pb.List) *pb.List
- func MergeSortedPacked(lists []*pb.UidPack) *pb.UidPack
- func ToUintsListForTest(ul []*pb.List) [][]uint64
- type CountMinSketch
- func (c *CountMinSketch) Add(data []byte) *CountMinSketch
- func (c *CountMinSketch) AddInt(data []byte, n uint64) *CountMinSketch
- func (c *CountMinSketch) Count(data []byte) uint64
- func (c *CountMinSketch) Delta() float64
- func (c *CountMinSketch) Epsilon() float64
- func (c *CountMinSketch) Merge(other *CountMinSketch) error
- func (c *CountMinSketch) ReadDataFrom(stream io.Reader) (int, error)
- func (c *CountMinSketch) Reset() *CountMinSketch
- func (c *CountMinSketch) SetHash(h hash.Hash64)
- func (c *CountMinSketch) TestAndRemove(data []byte, n uint64) bool
- func (c *CountMinSketch) TestAndRemoveAll(data []byte) bool
- func (c *CountMinSketch) TotalCount() uint64
- func (c *CountMinSketch) WriteDataTo(stream io.Writer) (int, error)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ApplyFilter ¶
ApplyFilter applies a filter to our UIDList.
func ApplyFilterPacked ¶
ApplyFilterPacked applies the filter to a list of packed uids.
func Difference ¶
Difference returns the difference of two lists.
func DifferencePacked ¶
DifferencePacked performs the difference operation between two UidPack objects.
func IndexOf ¶
IndexOf performs a binary search on the uids slice and returns the index at which it finds the uid, else returns -1
func IndexOfPacked ¶
IndexOfPacked finds the index of the given uid in the UidPack. If it doesn't find it, it returns -1.
func IntersectCompressedWith ¶
IntersectCompressedWith intersects a packed list of UIDs with another list and writes the output to o.
func IntersectCompressedWithBin ¶
IntersectCompressedWithBin is based on the paper "Fast Intersection Algorithms for Sorted Sequences" https://link.springer.com/chapter/10.1007/978-3-642-12476-1_3 Call seek on dec before calling this function
func IntersectCompressedWithLinJump ¶
IntersectCompressedWithLinJump performs the intersection linearly.
func IntersectSorted ¶
IntersectSorted calculates the intersection of multiple lists and performs the intersections from the smallest to the largest list.
func IntersectSortedPacked ¶
IntersectSortedPacked calculates the intersection of multiple lists and performs the intersections from the smallest to the largest list.
func IntersectWith ¶
IntersectWith intersects u with v. The update is made to o. u, v should be sorted.
func IntersectWithBin ¶
IntersectWithBin is based on the paper "Fast Intersection Algorithms for Sorted Sequences" https://link.springer.com/chapter/10.1007/978-3-642-12476-1_3 Returns where to move the second array(q) to. O means not found
func IntersectWithJump ¶
IntersectWithJump performs the intersection linearly but jumping jump steps between iterations.
func IntersectWithLin ¶
IntersectWithLin performs the intersection linearly.
func IntersectWithLinPacked ¶
IntersectWithLinPacked performs the liner intersection between two compressed uid lists.
func MergeSortedPacked ¶
MergeSortedPacked merges already sorted UidPack objects into a single UidPack.
func ToUintsListForTest ¶
ToUintsListForTest converts to list of uints for testing purpose only.
Types ¶
type CountMinSketch ¶
type CountMinSketch struct {
// contains filtered or unexported fields
}
This code is copied from https://www.github.com/tylertreat/BoomFilters/refs/heads/master/countmin.go CountMinSketch implements a Count-Min Sketch as described by Cormode and Muthukrishnan in An Improved Data Stream Summary: The Count-Min Sketch and its Applications:
http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf
A Count-Min Sketch (CMS) is a probabilistic data structure which approximates the frequency of events in a data stream. Unlike a hash map, a CMS uses sub-linear space at the expense of a configurable error factor. Similar to Counting Bloom filters, items are hashed to a series of buckets, which increment a counter. The frequency of an item is estimated by taking the minimum of each of the item's respective counter values.
Count-Min Sketches are useful for counting the frequency of events in massive data sets or unbounded streams online. In these situations, storing the entire data set or allocating counters for every event in memory is impractical. It may be possible for offline processing, but real-time processing requires fast, space-efficient solutions like the CMS. For approximating set cardinality, refer to the HyperLogLog.
func NewCountMinSketch ¶
func NewCountMinSketch(epsilon, delta float64) *CountMinSketch
NewCountMinSketch creates a new Count-Min Sketch whose relative accuracy is within a factor of epsilon with probability delta. Both of these parameters affect the space and time complexity.
func (*CountMinSketch) Add ¶
func (c *CountMinSketch) Add(data []byte) *CountMinSketch
Add will add the data to the set. Returns the CountMinSketch to allow for chaining.
func (*CountMinSketch) AddInt ¶
func (c *CountMinSketch) AddInt(data []byte, n uint64) *CountMinSketch
AddInt will add the data to the set n times. Returns the CountMinSketch to allow for chaining.
func (*CountMinSketch) Count ¶
func (c *CountMinSketch) Count(data []byte) uint64
Count returns the approximate count for the specified item, correct within epsilon * total count with a probability of delta.
func (*CountMinSketch) Delta ¶
func (c *CountMinSketch) Delta() float64
Delta returns the relative-accuracy probability, delta.
func (*CountMinSketch) Epsilon ¶
func (c *CountMinSketch) Epsilon() float64
Epsilon returns the relative-accuracy factor, epsilon.
func (*CountMinSketch) Merge ¶
func (c *CountMinSketch) Merge(other *CountMinSketch) error
Merge combines this CountMinSketch with another. Returns an error if the matrix width and depth are not equal.
func (*CountMinSketch) ReadDataFrom ¶
func (c *CountMinSketch) ReadDataFrom(stream io.Reader) (int, error)
ReadDataFrom reads a binary representation of the CMS data written by WriteDataTo() from io stream. It returns the number of bytes read and error If serialized CMS configuration is different it returns error with expected params
func (*CountMinSketch) Reset ¶
func (c *CountMinSketch) Reset() *CountMinSketch
Reset restores the CountMinSketch to its original state. It returns itself to allow for chaining.
func (*CountMinSketch) SetHash ¶
func (c *CountMinSketch) SetHash(h hash.Hash64)
SetHash sets the hashing function used.
func (*CountMinSketch) TestAndRemove ¶
func (c *CountMinSketch) TestAndRemove(data []byte, n uint64) bool
TestAndRemove attemps to remove n counts of data from the CMS. If n is greater than the data count, TestAndRemove is a no-op and returns false. Else, return true and decrement count by n.
func (*CountMinSketch) TestAndRemoveAll ¶
func (c *CountMinSketch) TestAndRemoveAll(data []byte) bool
TestAndRemoveAll counts data frequency, performs TestAndRemove(data, count), and returns true if count is positive. If count is 0, TestAndRemoveAll is a no-op and returns false.
func (*CountMinSketch) TotalCount ¶
func (c *CountMinSketch) TotalCount() uint64
TotalCount returns the number of items added to the sketch.
func (*CountMinSketch) WriteDataTo ¶
func (c *CountMinSketch) WriteDataTo(stream io.Writer) (int, error)
WriteDataTo writes a binary representation of the CMS data to an io stream. It returns the number of bytes written and error