Documentation
Overview
Package hll provides a HyperLogLog (HLL) data type and operations on it. Its serialized form can be read and written by any code implementing the Aggregate Knowledge Storage Spec.
Index
- Constants
- Variables
- func Defaults(settings Settings) error
- type Hll
- func (h *Hll) AddRaw(value uint64)
- func (h *Hll) Cardinality() uint64
- func (h *Hll) Clear()
- func (h *Hll) GetStorageTypeAndSizeInBytes() (storageType, int)
- func (h *Hll) Settings() Settings
- func (h *Hll) StrictUnion(other Hll) error
- func (h *Hll) ToBytes() []byte
- func (h *Hll) ToBytesInPlace(inputBytes []byte) []byte
- func (h *Hll) Union(other Hll)
- type Settings
Constants
const (
	// AutoExplicitThreshold indicates that the threshold at which an Hll goes
	// from using an explicit to a probabilistic representation should be
	// calculated based on the configuration. Using the calculated threshold is
	// generally preferable. One exception would be working with a pre-existing
	// data set that uses a particular explicit threshold setting, in which case
	// it may be desirable to use the same explicit threshold.
	AutoExplicitThreshold = -1
)
Variables
var ErrIncompatible = errors.New("cannot StrictUnion Hlls with different regwidth or log2m settings")
ErrIncompatible is returned by StrictUnion in cases where the two Hlls have incompatible settings that prevent the operation from occurring.
var ErrInsufficientBytes = errors.New("insufficient bytes to deserialize Hll")
ErrInsufficientBytes is returned by FromBytes in cases where the provided byte slice is truncated.
Functions
Types
type Hll
type Hll struct {
// contains filtered or unexported fields
}
Hll is a probabilistic set of hashed elements. It supports add and union operations in addition to estimating the cardinality. The zero value is an empty set, provided that Defaults has been invoked with default settings. Otherwise, operations on the zero value will panic, since attempting operations without first configuring the library is a coding error.
func FromBytes
FromBytes deserializes the provided byte slice into an Hll. It will return an error if the version is anything other than 1, if the leading bytes specify an invalid configuration, or if the byte slice is truncated.
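The version and configuration checks that FromBytes describes start with the header. The sketch below decodes only the first two header bytes and rejects short input, a version other than 1, or an out-of-range log2m. The bit layout (version in the high nibble of byte 0 and storage type in the low nibble; regwidth-1 in the top 3 bits of byte 1 and log2m in the low 5 bits) is my reading of the storage spec and should be checked against STORAGE.md. `header` and `parseHeader` are hypothetical names, not part of this package, and unlike FromBytes the sketch does not check the body for truncation.

```go
package main

import (
	"errors"
	"fmt"
)

// header is a hypothetical holder for the decoded header fields.
type header struct {
	version  int
	typ      int // assumed: 1=empty, 2=explicit, 3=sparse, 4=full
	regwidth int
	log2m    int
}

var (
	errShort   = errors.New("insufficient bytes to deserialize Hll")
	errVersion = errors.New("unsupported storage spec version")
	errConfig  = errors.New("invalid Hll configuration")
)

// parseHeader decodes the header under the assumed bit layout and applies
// header-level validation similar to what FromBytes documents.
func parseHeader(b []byte) (header, error) {
	if len(b) < 3 {
		return header{}, errShort
	}
	h := header{
		version:  int(b[0] >> 4),   // high nibble of byte 0
		typ:      int(b[0] & 0x0f), // low nibble of byte 0
		regwidth: int(b[1]>>5) + 1, // top 3 bits of byte 1 store regwidth-1
		log2m:    int(b[1] & 0x1f), // low 5 bits of byte 1
	}
	if h.version != 1 {
		return header{}, errVersion
	}
	// regwidth is always 1..8 and log2m at most 31 by construction; only the
	// lower bound on log2m and the type code still need checking.
	if h.log2m < 4 || h.typ < 1 || h.typ > 4 {
		return header{}, errConfig
	}
	return h, nil
}

func main() {
	// 0x14: version 1, type 4. 0x8b: regwidth 5, log2m 11.
	h, err := parseHeader([]byte{0x14, 0x8b, 0x00})
	fmt.Println(h, err)
}
```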
func NewHll
NewHll creates a new Hll with the provided settings. It will return an error if the settings are invalid. Since an application usually deals with homogeneous Hlls, it's preferable to install default settings and use the zero value. This function is provided in case an application must juggle different configurations.
func (*Hll) AddRaw
AddRaw adds the observed value into the Hll. The value is expected to have been hashed with a good hash function such as Murmur3 or xxHash. If the value does not have sufficient entropy, then the resulting cardinality estimations will not be accurate.
There is one edge case: a raw value of 0 is never added to the Hll. In the sparse or dense representation, a zero value would not affect the cardinality calculations because it has no set bits to observe. For consistency, the explicit representation ignores a 0 value as well.
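To see why 0 contributes nothing to the probabilistic representations, it helps to look at a typical register update. In this sketch, loosely modeled on the Aggregate Knowledge scheme but not taken from this package, the low log2m bits of the hashed value select a register, and the number of trailing zeros in the remaining bits, plus one, becomes the candidate register value, capped at 2^regwidth - 1. When the remaining bits are all zero (which includes the value 0), there is nothing to observe and the update is skipped. `registers`, `newRegisters`, and `addRaw` are hypothetical names.

```go
package main

import (
	"fmt"
	"math/bits"
)

// registers is a hypothetical dense HLL with one unpacked byte per register.
type registers struct {
	log2m    uint
	regwidth uint
	m        []uint8
}

func newRegisters(log2m, regwidth uint) *registers {
	return &registers{log2m: log2m, regwidth: regwidth, m: make([]uint8, 1<<log2m)}
}

// addRaw folds one already-hashed 64-bit value into the registers.
func (r *registers) addRaw(value uint64) {
	idx := value & (1<<r.log2m - 1) // low log2m bits pick the register
	rest := value >> r.log2m        // remaining bits form the observed substream
	if rest == 0 {
		return // no set bit to observe; this covers value == 0
	}
	maxVal := uint64(1)<<r.regwidth - 1 // largest value a register can hold
	v := uint64(bits.TrailingZeros64(rest)) + 1
	if v > maxVal {
		v = maxVal
	}
	if uint8(v) > r.m[idx] {
		r.m[idx] = uint8(v) // registers only ever grow
	}
}

func main() {
	r := newRegisters(11, 5)
	r.addRaw(0)         // skipped: no set bits
	r.addRaw(1<<15 | 3) // register 3; substream 16 has four trailing zeros
	fmt.Println(r.m[3]) // prints 5
}
```

Real inputs would come from a strong 64-bit hash such as Murmur3 or xxHash, as the AddRaw description recommends; the hand-built value here only makes the arithmetic easy to follow.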
func (*Hll) Cardinality
Cardinality estimates the number of values that have been added to this Hll.
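The documentation does not say which estimator Cardinality uses. As an illustration only, here is the classic HyperLogLog estimator from Flajolet et al.: a bias-corrected harmonic mean of 2^-register over all registers, switching to linear counting when the estimate is small and some registers are still zero. Whether this package applies exactly these corrections is an assumption; `alpha` and `estimate` are hypothetical helpers operating on unpacked register values.

```go
package main

import (
	"fmt"
	"math"
)

// alpha returns the bias-correction constant for m registers.
func alpha(m int) float64 {
	switch m {
	case 16:
		return 0.673
	case 32:
		return 0.697
	case 64:
		return 0.709
	default:
		return 0.7213 / (1 + 1.079/float64(m))
	}
}

// estimate computes the raw HLL estimate and falls back to linear counting
// for small cardinalities when empty registers remain.
func estimate(regs []uint8) float64 {
	m := float64(len(regs))
	sum := 0.0
	zeros := 0
	for _, v := range regs {
		sum += math.Ldexp(1, -int(v)) // adds 2^-v
		if v == 0 {
			zeros++
		}
	}
	e := alpha(len(regs)) * m * m / sum
	if e <= 2.5*m && zeros > 0 {
		e = m * math.Log(m/float64(zeros)) // linear counting
	}
	return e
}

func main() {
	regs := make([]uint8, 16) // log2m = 4, the documented minimum
	fmt.Println(estimate(regs)) // an empty set estimates to 0
	regs[0] = 1
	fmt.Println(estimate(regs))
}
```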
func (*Hll) Clear
func (h *Hll) Clear()
Clear resets this Hll. Unlike other implementations that leave the backing storage in place, this resets the Hll to the empty, zero value.
func (*Hll) GetStorageTypeAndSizeInBytes
GetStorageTypeAndSizeInBytes returns the storage type and the number of bytes needed to serialize this Hll. This includes the 3 header bytes and the bytes needed for the storage.
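A rough way to reason about the sizes this method reports is to add the 3 header bytes to the payload size of each representation. This sketch assumes the storage spec's layouts as I understand them: 8 bytes per value in the explicit representation, and 2^log2m registers of regwidth bits each, bit-packed and padded to a whole byte, in the full (dense) representation. The sparse representation is omitted. `explicitSize` and `fullSize` are hypothetical helpers.

```go
package main

import "fmt"

// headerBytes is the 3-byte header included in every serialized Hll.
const headerBytes = 3

// explicitSize returns the serialized size of n explicit values, assuming
// each value is stored as a 64-bit integer.
func explicitSize(n int) int {
	return headerBytes + 8*n
}

// fullSize returns the serialized size of the dense representation, assuming
// bit-packed registers rounded up to a whole number of bytes.
func fullSize(log2m, regwidth uint) int {
	bitsNeeded := (1 << log2m) * int(regwidth)
	return headerBytes + (bitsNeeded+7)/8
}

func main() {
	fmt.Println(explicitSize(10), fullSize(11, 5))
}
```

Under these assumptions, ten explicit values take 83 bytes while a dense Hll with log2m=11 and regwidth=5 takes 1283 bytes, which is why small sets are cheaper to keep in explicit form.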
func (*Hll) StrictUnion
StrictUnion calculates the union of this Hll and the other Hll and stores the result in the receiver. It returns ErrIncompatible if the two Hlls are not compatible, where compatibility is defined as having the same register width and log2m. The explicit and sparse thresholds do not factor into compatibility.
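For two dense HLLs with the same log2m and register width, the union reduces to taking the maximum of each pair of registers. The sketch below combines that step with the compatibility rule described above. It works on a hypothetical unpacked `denseHll` type and ignores the explicit and sparse representations, which the real type would also have to handle; `errIncompatible` is a local stand-in that reuses the message of ErrIncompatible.

```go
package main

import (
	"errors"
	"fmt"
)

// errIncompatible is a local stand-in for the package's ErrIncompatible.
var errIncompatible = errors.New("cannot StrictUnion Hlls with different regwidth or log2m settings")

// denseHll is a hypothetical dense Hll with one unpacked byte per register.
type denseHll struct {
	log2m, regwidth int
	regs            []uint8
}

// strictUnion merges other into h by register-wise maximum. Thresholds are
// not compared, matching the documented compatibility rule.
func (h *denseHll) strictUnion(other denseHll) error {
	if h.log2m != other.log2m || h.regwidth != other.regwidth {
		return errIncompatible
	}
	for i, v := range other.regs {
		if v > h.regs[i] {
			h.regs[i] = v
		}
	}
	return nil
}

func main() {
	a := denseHll{log2m: 4, regwidth: 5, regs: make([]uint8, 16)}
	b := denseHll{log2m: 4, regwidth: 5, regs: make([]uint8, 16)}
	a.regs[0], b.regs[0], b.regs[1] = 2, 3, 1
	err := a.strictUnion(b)
	fmt.Println(err, a.regs[:2])

	c := denseHll{log2m: 5, regwidth: 5, regs: make([]uint8, 32)}
	fmt.Println(a.strictUnion(c)) // different log2m: rejected
}
```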
func (*Hll) ToBytes
ToBytes returns a byte slice with the serialized Hll value per the storage spec https://github.com/aggregateknowledge/hll-storage-spec/blob/master/STORAGE.md.
func (*Hll) ToBytesInPlace
ToBytesInPlace serializes the Hll value into the provided byte slice. If the provided byte slice is nil or not large enough, a new byte slice will be allocated and returned.
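The contract described for ToBytesInPlace is the common Go buffer-reuse idiom: write into the caller's slice when it can hold the result, otherwise allocate. A minimal sketch, assuming "large enough" refers to the slice's capacity; `serializeInto` is hypothetical and copies a precomputed payload instead of encoding an Hll.

```go
package main

import "fmt"

// serializeInto writes payload into buf, reusing buf's backing array when its
// capacity suffices and allocating a new slice otherwise (including nil buf).
func serializeInto(buf, payload []byte) []byte {
	if cap(buf) < len(payload) {
		buf = make([]byte, len(payload))
	}
	buf = buf[:len(payload)]
	copy(buf, payload)
	return buf
}

func main() {
	buf := make([]byte, 0, 64) // one buffer reused across many serializations
	for _, p := range [][]byte{{1, 2, 3}, {4, 5, 6, 7}} {
		buf = serializeInto(buf, p)
		fmt.Println(len(buf), cap(buf))
	}
}
```

Assigning the result back to the same variable, as in the loop, keeps the larger buffer whenever one had to be allocated.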
func (*Hll) Union
Union will calculate the union of this Hll and the other Hll and store the results into the receiver.
Unlike StrictUnion, it allows Hlls with different settings to be combined, though doing so is not recommended because it results in a loss of accuracy.
As long as your application uses a single set of settings, it is safe to use this function. If there is a possibility that you may union two Hlls with incompatible settings, then it's safer to use StrictUnion and check for errors.
type Settings
type Settings struct {
	// Log2m determines the number of registers in the Hll. The minimum value
	// is 4 and the maximum value is 31. The number of registers in the Hll
	// will be calculated as 2^Log2m.
	Log2m int

	// Regwidth is the number of bits dedicated to each register value. The
	// minimum value is 1 and the maximum value is 8.
	Regwidth int

	// ExplicitThreshold is the cardinality at which the Hll will go from
	// storing explicit values to using a probabilistic model. A value of 0
	// disables explicit storage entirely. The value AutoExplicitThreshold can
	// be used to signal the library to calculate an appropriate threshold
	// (recommended). The maximum allowed value is 131,072.
	ExplicitThreshold int

	// SparseEnabled controls whether the Hll will use the sparse
	// representation. The thresholds for conversion are automatically
	// calculated by the library when this field is set to true (recommended).
	SparseEnabled bool
}
Settings are used to configure the Hll and how it transitions between the backing storage types.
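The ranges documented on Settings can be checked before the settings reach NewHll or Defaults. This sketch redeclares Settings and AutoExplicitThreshold locally so it compiles on its own. `validate` is a hypothetical helper that checks only the documented bounds; the package itself may enforce further constraints, and NewHll is documented to return an error for invalid settings.

```go
package main

import (
	"errors"
	"fmt"
)

// AutoExplicitThreshold mirrors the package constant of the same name.
const AutoExplicitThreshold = -1

// Settings mirrors the documented exported fields.
type Settings struct {
	Log2m             int
	Regwidth          int
	ExplicitThreshold int
	SparseEnabled     bool
}

// validate checks only the ranges stated in the field documentation.
func validate(s Settings) error {
	switch {
	case s.Log2m < 4 || s.Log2m > 31:
		return errors.New("log2m must be between 4 and 31")
	case s.Regwidth < 1 || s.Regwidth > 8:
		return errors.New("regwidth must be between 1 and 8")
	case s.ExplicitThreshold < AutoExplicitThreshold || s.ExplicitThreshold > 131072:
		return errors.New("explicit threshold must be -1 (auto), 0 (disabled), or at most 131072")
	}
	return nil
}

func main() {
	s := Settings{Log2m: 11, Regwidth: 5, ExplicitThreshold: AutoExplicitThreshold, SparseEnabled: true}
	// The register count is 2^Log2m.
	fmt.Println(validate(s), 1<<s.Log2m)
}
```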