hashing

package
v18.4.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 14, 2025 License: Apache-2.0, BSD-3-Clause Imports: 9 Imported by: 0

Documentation

Overview

Package hashing provides utilities for and an implementation of a hash table which is more performant than the default go map implementation by leveraging xxh3 and some custom hash functions.

Index

Constants

View Source
const KeyNotFound = -1

KeyNotFound is the constant returned by memo table functions when a key isn't found in the table

Variables

This section is empty.

Functions

func Hash

func Hash(b []byte, alg uint64) uint64

for smaller amounts of bytes this is faster than even calling into xxh3 to do the Hash, so we specialize in order to get the benefits of that performance.

Types

type BinaryBuilderIFace

type BinaryBuilderIFace interface {
	Reserve(int)
	ReserveData(int)
	Retain()
	Resize(int)
	ResizeData(int)
	Release()
	DataLen() int
	Value(int) []byte
	Len() int
	AppendNull()
	AppendString(string)
	Append([]byte)
}

type BinaryMemoTable

type BinaryMemoTable struct {
	// contains filtered or unexported fields
}

BinaryMemoTable is our hashtable for binary data using the BinaryBuilder to construct the actual data in an easy to pass around way with minimal copies while using a hash table to keep track of the indexes into the dictionary that is created as we go.

func NewBinaryMemoTable

func NewBinaryMemoTable(initial, valuesize int, bldr BinaryBuilderIFace) *BinaryMemoTable

NewBinaryMemoTable returns a hash table for Binary data, the passed in allocator will be utilized for the BinaryBuilder, if nil then memory.DefaultAllocator will be used. initial and valuesize can be used to pre-allocate the table to reduce allocations. With initial being the initial number of entries to allocate for and valuesize being the starting amount of space allocated for writing the actual binary data.

func (*BinaryMemoTable) CopyFixedWidthValues

func (b *BinaryMemoTable) CopyFixedWidthValues(start, width int, out []byte)

CopyFixedWidthValues exists to cope with the fact that the table doesn't keep track of the fixed width when inserting the null value the databuffer holds a zero length byte slice for the null value (if found)

func (*BinaryMemoTable) CopyLargeOffsets

func (b *BinaryMemoTable) CopyLargeOffsets(out []int64)

CopyLargeOffsets copies the list of offsets into the passed in slice, the offsets being the start and end values of the underlying allocated bytes in the builder for the individual values of the table. out should be at least sized to Size()+1

func (*BinaryMemoTable) CopyLargeOffsetsSubset

func (b *BinaryMemoTable) CopyLargeOffsetsSubset(start int, out []int64)

CopyLargeOffsetsSubset is like CopyOffsets but instead of copying all of the offsets, it gets a subset of the offsets in the table starting at the index provided by "start".

func (*BinaryMemoTable) CopyOffsets

func (b *BinaryMemoTable) CopyOffsets(out []int32)

CopyOffsets copies the list of offsets into the passed in slice, the offsets being the start and end values of the underlying allocated bytes in the builder for the individual values of the table. out should be at least sized to Size()+1

func (*BinaryMemoTable) CopyOffsetsSubset

func (b *BinaryMemoTable) CopyOffsetsSubset(start int, out []int32)

CopyOffsetsSubset is like CopyOffsets but instead of copying all of the offsets, it gets a subset of the offsets in the table starting at the index provided by "start".

func (*BinaryMemoTable) CopyValues

func (b *BinaryMemoTable) CopyValues(out interface{})

CopyValues copies the raw binary data bytes out, out should be a []byte with at least ValuesSize bytes allocated to copy into.

func (*BinaryMemoTable) CopyValuesSubset

func (b *BinaryMemoTable) CopyValuesSubset(start int, out interface{})

CopyValuesSubset copies the raw binary data bytes out starting with the value at the index start, out should be a []byte with at least ValuesSize bytes allocated

func (*BinaryMemoTable) Exists added in v18.3.0

func (b *BinaryMemoTable) Exists(val []byte) bool

func (*BinaryMemoTable) Get

func (b *BinaryMemoTable) Get(val interface{}) (int, bool)

Get returns the index of the specified value in the table or KeyNotFound, and a boolean indicating whether it was found in the table.

func (*BinaryMemoTable) GetNull

func (s *BinaryMemoTable) GetNull() (int, bool)

GetNull returns the index of a null that has been inserted into the table or KeyNotFound. The bool returned will be true if there was a null inserted into the table, and false otherwise.

func (*BinaryMemoTable) GetOrInsert

func (b *BinaryMemoTable) GetOrInsert(val interface{}) (idx int, found bool, err error)

func (*BinaryMemoTable) GetOrInsertBytes

func (b *BinaryMemoTable) GetOrInsertBytes(val []byte) (idx int, found bool, err error)

GetOrInsertBytes returns the index of the given value in the table, if not found it is inserted into the table. The return value 'found' indicates whether the value was found in the table (true) or inserted (false) along with any possible error.

func (*BinaryMemoTable) GetOrInsertNull

func (b *BinaryMemoTable) GetOrInsertNull() (idx int, found bool)

GetOrInsertNull retrieves the index of a null in the table or inserts null into the table, returning the index and a boolean indicating if it was found in the table (true) or was inserted (false).

func (*BinaryMemoTable) InsertOrGet added in v18.3.0

func (b *BinaryMemoTable) InsertOrGet(val []byte) (idx int, found bool, err error)

GetOrInsert returns the index of the given value in the table, if not found it is inserted into the table. The return value 'found' indicates whether the value was found in the table (true) or inserted (false) along with any possible error.

func (*BinaryMemoTable) Release

func (b *BinaryMemoTable) Release()

Release is used to tell the underlying builder that it can release the memory allocated when the reference count reaches 0, this is safe to be called from multiple goroutines simultaneously

func (*BinaryMemoTable) Reset

func (s *BinaryMemoTable) Reset()

Reset dumps all of the data in the table allowing it to be reutilized.

func (*BinaryMemoTable) Retain

func (b *BinaryMemoTable) Retain()

Retain increases the ref count, it is safe to call it from multiple goroutines simultaneously.

func (*BinaryMemoTable) Size

func (s *BinaryMemoTable) Size() int

Size returns the current size of the memo table including the null value if one has been inserted.

func (BinaryMemoTable) TypeTraits

func (BinaryMemoTable) TypeTraits() TypeTraits

func (*BinaryMemoTable) Value

func (b *BinaryMemoTable) Value(i int) []byte

func (*BinaryMemoTable) ValuesSize

func (b *BinaryMemoTable) ValuesSize() int

ValuesSize returns the current total size of all the raw bytes that have been inserted into the memotable so far.

func (*BinaryMemoTable) VisitValues

func (b *BinaryMemoTable) VisitValues(start int, visitFn func([]byte))

VisitValues exists to run the visitFn on each value currently in the hash table.

func (*BinaryMemoTable) WriteOut

func (b *BinaryMemoTable) WriteOut(out []byte)

func (*BinaryMemoTable) WriteOutSubset

func (b *BinaryMemoTable) WriteOutSubset(start int, out []byte)

type ByteSlice

type ByteSlice interface {
	Bytes() []byte
}

type Float32MemoTable

type Float32MemoTable = Table[float32]

type Float64MemoTable

type Float64MemoTable = Table[float64]

type HashTable added in v18.4.0

type HashTable[T fixedLenMemoTypes] struct {
	// contains filtered or unexported fields
}

func NewHashTable added in v18.4.0

func NewHashTable[T fixedLenMemoTypes](cap uint64) *HashTable[T]

func (*HashTable[T]) CopyValues added in v18.4.0

func (h *HashTable[T]) CopyValues(out []T)

func (*HashTable[T]) CopyValuesSubset added in v18.4.0

func (h *HashTable[T]) CopyValuesSubset(start int, out []T)

func (*HashTable[T]) Insert added in v18.4.0

func (h *HashTable[T]) Insert(e *entry[T], v uint64, val T, memoIdx int32) error

func (*HashTable[T]) Lookup added in v18.4.0

func (h *HashTable[T]) Lookup(v uint64, cmp func(T) bool) (*entry[T], bool)

func (*HashTable[T]) Reset added in v18.4.0

func (h *HashTable[T]) Reset(cap uint64)

func (*HashTable[T]) VisitEntries added in v18.4.0

func (h *HashTable[T]) VisitEntries(visit func(*entry[T]))

func (*HashTable[T]) WriteOut added in v18.4.0

func (h *HashTable[T]) WriteOut(out []byte)

func (*HashTable[T]) WriteOutSubset added in v18.4.0

func (h *HashTable[T]) WriteOutSubset(start int, out []byte)

type Int16MemoTable

type Int16MemoTable = Table[int16]

type Int32MemoTable

type Int32MemoTable = Table[int32]

type Int64MemoTable

type Int64MemoTable = Table[int64]

type Int8MemoTable

type Int8MemoTable = Table[int8]

type MemoTable

type MemoTable interface {
	TypeTraits() TypeTraits
	// Reset drops everything in the table allowing it to be reused
	Reset()
	// Size returns the current number of unique values stored in
	// the table, including whether or not a null value has been
	// inserted via GetOrInsertNull.
	Size() int
	// CopyValues populates out with the values currently in the table, out must
	// be a slice of the appropriate type for the table type.
	CopyValues(out any)
	// CopyValuesSubset is like CopyValues but only copies a subset of values starting
	// at the indicated index.
	CopyValuesSubset(start int, out any)
	// Get returns the index of the table the specified value is, and a boolean indicating
	// whether or not the value was found in the table. Will panic if val is not the appropriate
	// type for the underlying table.
	Get(val interface{}) (int, bool)
	// GetOrInsert returns the index of the table the specified value is,
	// and a boolean indicating whether or not the value was found in
	// the table (if false, the value was inserted). An error is returned
	// if val is not the appropriate type for the table.
	GetOrInsert(val interface{}) (idx int, existed bool, err error)
	// GetOrInsertBytes returns the index of the table the specified value is,
	// and a boolean indicating whether or not the value was found in
	// the table (if false, the value was inserted). An error is returned
	// if val is not the appropriate type for the table. This function is intended to be used by
	// the BinaryMemoTable to prevent unnecessary allocations of the data when converting from a []byte to interface{}.
	GetOrInsertBytes(val []byte) (idx int, existed bool, err error)
	// GetOrInsertNull returns the index of the null value in the table,
	// inserting one if it hasn't already been inserted. It returns a boolean
	// indicating if the null value already existed or not in the table.
	GetOrInsertNull() (idx int, existed bool)
	// GetNull returns the index of the null value in the table, but does not
	// insert one if it doesn't already exist. Will return -1 if it doesn't exist
	// indicated by a false value for the boolean.
	GetNull() (idx int, exists bool)
	// WriteOut copies the unique values of the memotable out to the byte slice
	// provided. Must have allocated enough bytes for all the values.
	WriteOut(out []byte)
	// WriteOutSubset is like WriteOut, but only writes a subset of values
	// starting with the index offset.
	WriteOutSubset(offset int, out []byte)
}

MemoTable interface for hash tables and dictionary encoding.

Values will remember the order they are inserted to generate a valid dictionary.

type MemoTypes added in v18.3.0

type MemoTypes interface {
	int8 | int16 | int32 | int64 |
		uint8 | uint16 | uint32 | uint64 |
		float32 | float64 | []byte
}

type NumericMemoTable

type NumericMemoTable interface {
	MemoTable
	WriteOutLE(out []byte)
	WriteOutSubsetLE(offset int, out []byte)
}

type Table added in v18.4.0

type Table[T fixedLenMemoTypes] struct {
	// contains filtered or unexported fields
}

func NewMemoTable added in v18.4.0

func NewMemoTable[T fixedLenMemoTypes](num int64) *Table[T]

func (*Table[T]) CopyValues added in v18.4.0

func (t *Table[T]) CopyValues(out any)

func (*Table[T]) CopyValuesSubset added in v18.4.0

func (t *Table[T]) CopyValuesSubset(start int, out any)

func (*Table[T]) Exists added in v18.4.0

func (t *Table[T]) Exists(val T) bool

func (*Table[T]) Get added in v18.4.0

func (t *Table[T]) Get(val any) (int, bool)

func (*Table[T]) GetNull added in v18.4.0

func (t *Table[T]) GetNull() (int, bool)

func (*Table[T]) GetOrInsert added in v18.4.0

func (t *Table[T]) GetOrInsert(val any) (idx int, found bool, err error)

func (*Table[T]) GetOrInsertBytes added in v18.4.0

func (t *Table[T]) GetOrInsertBytes(val []byte) (idx int, found bool, err error)

func (*Table[T]) GetOrInsertNull added in v18.4.0

func (t *Table[T]) GetOrInsertNull() (idx int, found bool)

func (*Table[T]) InsertOrGet added in v18.4.0

func (t *Table[T]) InsertOrGet(val T) (idx int, found bool, err error)

func (*Table[T]) Reset added in v18.4.0

func (t *Table[T]) Reset()

func (*Table[T]) Size added in v18.4.0

func (t *Table[T]) Size() int

func (*Table[T]) TypeTraits added in v18.4.0

func (t *Table[T]) TypeTraits() TypeTraits

func (*Table[T]) WriteOut added in v18.4.0

func (t *Table[T]) WriteOut(out []byte)

func (*Table[T]) WriteOutLE added in v18.4.0

func (t *Table[T]) WriteOutLE(out []byte)

func (*Table[T]) WriteOutSubset added in v18.4.0

func (t *Table[T]) WriteOutSubset(start int, out []byte)

func (*Table[T]) WriteOutSubsetLE added in v18.4.0

func (t *Table[T]) WriteOutSubsetLE(start int, out []byte)

type TypeTraits

type TypeTraits interface {
	BytesRequired(n int) int
}

type TypedMemoTable added in v18.3.0

type TypedMemoTable[T MemoTypes] interface {
	MemoTable
	Exists(T) bool
	InsertOrGet(val T) (idx int, found bool, err error)
}

type Uint16MemoTable

type Uint16MemoTable = Table[uint16]

type Uint32MemoTable

type Uint32MemoTable = Table[uint32]

type Uint64MemoTable

type Uint64MemoTable = Table[uint64]

type Uint8MemoTable

type Uint8MemoTable = Table[uint8]

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL