Documentation ¶
Overview ¶
Package turboquant implements the TurboQuant online vector quantization algorithm from "TurboQuant: Online Vector Quantization" (arXiv:2504.19874) by Google Research.
TurboQuant compresses float32 vectors into compact low-bit representations using a two-step approach: random orthogonal rotation followed by Lloyd-Max scalar quantization on a Beta distribution. The algorithm is data-oblivious, meaning no training data is needed — codebooks are derived analytically from the statistical properties of uniformly distributed unit-sphere vectors.
Algorithm ¶
After rotating a normalized vector by a random orthogonal matrix, each coordinate x lies in (-1, 1), and its rescaled value (x+1)/2 is approximately distributed as Beta((d-1)/2, (d-1)/2), where d is the vector dimension. A Lloyd-Max optimal scalar quantizer is pre-computed on this distribution, and each rotated coordinate is independently quantized by looking up the nearest centroid. Dequantization reverses the process: centroid lookup, inverse rotation (transpose), and norm rescaling.
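The Beta claim above can be checked with a short derivation. This is a sketch based on the standard density of one coordinate of a point drawn uniformly from the unit sphere; the symbols x and u are introduced here for illustration and do not appear in the package.

```latex
% Density of a single coordinate x of a uniform point on the unit sphere S^{d-1}
f(x) = \frac{\Gamma(d/2)}{\sqrt{\pi}\,\Gamma\!\left(\frac{d-1}{2}\right)}
       \left(1 - x^2\right)^{\frac{d-3}{2}}, \qquad x \in (-1, 1)

% Substituting x = 2u - 1 gives 1 - x^2 = 4u(1-u), so
f_U(u) \propto u^{\frac{d-3}{2}} (1-u)^{\frac{d-3}{2}}
\quad\Longrightarrow\quad
U = \frac{x+1}{2} \sim \mathrm{Beta}\!\left(\tfrac{d-1}{2}, \tfrac{d-1}{2}\right)

% By symmetry the coordinates share the unit norm equally
\mathbb{E}[x] = 0, \qquad \mathbb{E}[x^2] = \frac{1}{d}
```

The variance 1/d means the coordinates concentrate near zero as d grows, so the codebook centroids for a high-dimensional quantizer sit close to the origin.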
Supported Bit Widths ¶
The SDK supports 2-bit, 3-bit, and 4-bit quantization. Higher bit widths yield better reconstruction quality (higher cosine similarity) at the cost of larger compressed size. For vectors with dimension ≥ 64, typical cosine similarities are ≥ 0.90 (2-bit), ≥ 0.96 (3-bit), and ≥ 0.99 (4-bit).
Usage ¶
The main entry point is NewTurboQuant, which builds (or retrieves from cache) the Lloyd-Max codebook and generates the rotation matrix:
tq, err := turboquant.NewTurboQuant(128, 4, 42) // dim=128, 4-bit, seed=42
if err != nil {
	log.Fatal(err)
}
// Quantize a vector (vec is a []float32 of length 128).
// Error checks on the calls below are elided for brevity.
qv, err := tq.Quantize(vec)
// Serialize to compact binary
data, err := tq.Serialize(qv)
// Deserialize and dequantize
qv2, err := tq.Deserialize(data)
restored, err := tq.Dequantize(qv2)
Batch operations (TurboQuant.QuantizeBatch, TurboQuant.DequantizeBatch) process multiple vectors concurrently using goroutines.
Codebooks are cached globally by (dimension, bitWidth), so creating multiple TurboQuant instances with the same parameters reuses the codebook.
Index ¶
- Constants
- func BetaPDF(x, alpha, beta float64) float64
- func BytesToFloat32s(src []byte) []float32
- func CompressionRatio(dimension, bitWidth int) float64
- func CosineSimilarity(a, b []float32) (float64, error)
- func Float32sToBytes(src []float32) []byte
- func Float32sToFloat64s(src []float32) []float64
- func Float32sToInts(src []float32) []int
- func Float32sToString(src []float32) string
- func Float64sToFloat32s(src []float64) []float32
- func IntsToFloat32s(src []int) []float32
- func ResetCodebookCache()
- func SerializeQuantizedVector(qv *QuantizedVector, bitWidth int) ([]byte, error)
- func SerializeQuantizedVectorTo(qv *QuantizedVector, bitWidth int, w io.Writer) error
- func StringToFloat32s(s string) []float32
- func ValidateBitWidth(bitWidth int) error
- func ValidateDimension(dimension int) error
- type Codebook
- type CodebookBuilder
- type Matrix
- type Option
- type QuantizedVector
- type TurboQuant
- func (tq *TurboQuant) BitWidth() int
- func (tq *TurboQuant) CompressionRatio() float64
- func (tq *TurboQuant) Concurrency() int
- func (tq *TurboQuant) Dequantize(qv *QuantizedVector) ([]float32, error)
- func (tq *TurboQuant) DequantizeBatch(qvs []*QuantizedVector) ([][]float32, error)
- func (tq *TurboQuant) DequantizeBatchFloat64(qvs []*QuantizedVector) ([][]float64, error)
- func (tq *TurboQuant) DequantizeFloat64(qv *QuantizedVector) ([]float64, error)
- func (tq *TurboQuant) Deserialize(data []byte) (*QuantizedVector, error)
- func (tq *TurboQuant) DeserializeBatchFrom(r io.Reader) ([]*QuantizedVector, error)
- func (tq *TurboQuant) DeserializeFrom(r io.Reader) (*QuantizedVector, error)
- func (tq *TurboQuant) Dimension() int
- func (tq *TurboQuant) Quantize(vec []float32) (*QuantizedVector, error)
- func (tq *TurboQuant) QuantizeBatch(vecs [][]float32) ([]*QuantizedVector, error)
- func (tq *TurboQuant) QuantizeBatchFloat64(vecs [][]float64) ([]*QuantizedVector, error)
- func (tq *TurboQuant) QuantizeFloat64(vec []float64) (*QuantizedVector, error)
- func (tq *TurboQuant) Serialize(qv *QuantizedVector) ([]byte, error)
- func (tq *TurboQuant) SerializeBatchTo(qvs []*QuantizedVector, w io.Writer) error
- func (tq *TurboQuant) SerializeTo(qv *QuantizedVector, w io.Writer) error
Constants ¶
const (
	Bit2 = 2
	Bit3 = 3
	Bit4 = 4
)
BitWidth constants for supported quantization bit widths.
Variables ¶
This section is empty.
Functions ¶
func BetaPDF ¶
BetaPDF computes the probability density of the Beta(alpha, beta) distribution at x. Uses log-space computation via math.Lgamma to avoid numerical overflow. Returns 0.0 for x outside the open interval (0, 1).
func BytesToFloat32s ¶
BytesToFloat32s converts a []byte slice to []float32. Each byte value (0-255) is stored as a float32.
func CompressionRatio ¶
CompressionRatio computes the theoretical compression ratio for a given dimension and bit width.
Formula: (dimension * 32) / (32 + dimension * bitWidth)
- Original size: dimension * 32 bits (one float32 per element).
- Compressed size: 32 bits (float32 norm) + dimension * bitWidth bits.
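As a quick check of the formula, the following sketch reimplements it under the hypothetical name compressionRatio. For dimension 8 at 4 bits it gives 256 / 64 = 4.00, which agrees with the NewTurboQuant example output in this document.

```go
package main

import "fmt"

// compressionRatio is a hypothetical reimplementation of the documented
// formula: original bits divided by compressed bits.
func compressionRatio(dimension, bitWidth int) float64 {
	originalBits := float64(dimension * 32)            // one float32 per element
	compressedBits := float64(32 + dimension*bitWidth) // float32 norm + packed indices
	return originalBits / compressedBits
}

func main() {
	for _, bits := range []int{2, 3, 4} {
		fmt.Printf("dim=128, %d-bit: %.2f\n", bits, compressionRatio(128, bits))
	}
}
```

Because of the fixed 32-bit norm, the ratio approaches 32 / bitWidth only as the dimension grows.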
func CosineSimilarity ¶
CosineSimilarity computes the cosine similarity between two float32 vectors. Returns a float64 value in [-1, 1]. Returns an error if the vectors have different dimensions. Returns 0.0 if either vector is a zero vector.
Example ¶
ExampleCosineSimilarity demonstrates computing cosine similarity between an original vector and its quantized-then-dequantized version.
package main

import (
	"fmt"
	"log"

	"github.com/mredencom/turboquant"
)

func main() {
	tq, err := turboquant.NewTurboQuant(8, 4, 42)
	if err != nil {
		log.Fatal(err)
	}
	vec := []float32{1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0}
	qv, err := tq.Quantize(vec)
	if err != nil {
		log.Fatal(err)
	}
	restored, err := tq.Dequantize(qv)
	if err != nil {
		log.Fatal(err)
	}
	sim, err := turboquant.CosineSimilarity(vec, restored)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Cosine similarity: %.4f\n", sim)
}

Output:
Cosine similarity: 0.9979
func Float32sToBytes ¶
Float32sToBytes converts a []float32 slice back to []byte by rounding and clamping to [0, 255].
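The round-and-clamp behaviour can be sketched as follows. The name float32sToBytes is hypothetical, and the rounding mode (half away from zero, via math.Round) and the NaN handling are assumptions, since the documentation does not specify either.

```go
package main

import (
	"fmt"
	"math"
)

// float32sToBytes is a hypothetical sketch of the documented behaviour:
// round each value, then clamp it to the byte range [0, 255].
// Assumption: rounding is half away from zero and NaN maps to 0.
func float32sToBytes(src []float32) []byte {
	dst := make([]byte, len(src))
	for i, v := range src {
		r := math.Round(float64(v))
		switch {
		case math.IsNaN(r) || r < 0:
			dst[i] = 0
		case r > 255:
			dst[i] = 255
		default:
			dst[i] = byte(r)
		}
	}
	return dst
}

func main() {
	fmt.Println(float32sToBytes([]float32{-3, 0.4, 0.6, 254.5, 300}))
}
```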
func Float32sToFloat64s ¶
Float32sToFloat64s converts a []float32 slice to []float64.
func Float32sToInts ¶
Float32sToInts converts a []float32 slice back to []int by rounding.
func Float32sToString ¶
Float32sToString converts a []float32 slice back to a string by rounding each value to a byte.
func Float64sToFloat32s ¶
Float64sToFloat32s converts a []float64 slice to []float32.
func IntsToFloat32s ¶
IntsToFloat32s converts a []int slice to []float32.
func ResetCodebookCache ¶
func ResetCodebookCache()
ResetCodebookCache clears the global codebook cache. Intended for testing.
func SerializeQuantizedVector ¶
func SerializeQuantizedVector(qv *QuantizedVector, bitWidth int) ([]byte, error)
SerializeQuantizedVector serializes a QuantizedVector into a compact binary format.
Format: [4 bytes float32 norm (little-endian)][bit-packed indices]
Packing rules:
- 2-bit: 4 indices per byte, low bits first
- 3-bit: bitstream, indices packed continuously across byte boundaries
- 4-bit: 2 indices per byte, low nibble first
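The layout above can be sketched with a single least-significant-bit-first bitstream. The function name serialize is hypothetical, and the bit order used for the 3-bit case is an assumption: the documentation states "low bits first" only for 2-bit and 4-bit packing.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// serialize is a hypothetical sketch of the documented layout:
// a 4-byte little-endian float32 norm followed by bit-packed indices.
// Indices are written LSB-first into a continuous bitstream, which yields
// "low bits first" for 2-bit and "low nibble first" for 4-bit packing.
// Assumption: the 3-bit bitstream uses the same LSB-first order.
func serialize(norm float32, indices []uint8, bitWidth int) []byte {
	payload := (len(indices)*bitWidth + 7) / 8 // round up to whole bytes
	out := make([]byte, 4+payload)
	binary.LittleEndian.PutUint32(out[:4], math.Float32bits(norm))
	bitPos := 0
	for _, idx := range indices {
		for b := 0; b < bitWidth; b++ {
			if (idx>>uint(b))&1 == 1 {
				out[4+bitPos/8] |= 1 << uint(bitPos%8)
			}
			bitPos++
		}
	}
	return out
}

func main() {
	// Norm 1.0 is 0x3f800000; indices 1 and 2 share one byte as 0x21.
	fmt.Printf("% x\n", serialize(1.0, []uint8{1, 2}, 4)) // 00 00 80 3f 21
	// Eight 4-bit indices need 4 + 4 = 8 bytes.
	fmt.Println(len(serialize(1.0, make([]uint8, 8), 4)))
}
```

The 8-byte result for eight 4-bit indices is consistent with the "Serialized bytes: 8" output of the Serialize example in this document.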
func SerializeQuantizedVectorTo ¶ added in v0.0.2
func SerializeQuantizedVectorTo(qv *QuantizedVector, bitWidth int, w io.Writer) error
SerializeQuantizedVectorTo writes a QuantizedVector directly to an io.Writer using the same binary format as SerializeQuantizedVector.
func StringToFloat32s ¶
StringToFloat32s converts a string to []float32 by treating each byte as a float32 value. This is a raw byte-level conversion, not a semantic embedding.
func ValidateBitWidth ¶
ValidateBitWidth returns an error if bitWidth is not one of the supported values (2, 3, or 4).
func ValidateDimension ¶
ValidateDimension returns an error if dimension is less than 2.
Types ¶
type Codebook ¶
type Codebook struct {
	Centroids  []float64 // 2^BitWidth centroids, sorted ascending
	Boundaries []float64 // 2^BitWidth - 1 partition boundaries
	BitWidth   int
}
Codebook contains the centroids and partition boundaries of a Lloyd-Max quantizer.
func GetOrBuildCodebook ¶
GetOrBuildCodebook returns a cached Codebook for the given parameters, or builds a new one using NewCodebookBuilder().Build and caches it. Thread-safe via sync.Map.
func (*Codebook) FindNearestIndex ¶
FindNearestIndex finds the index of the nearest centroid for the given value using binary search on the partition boundaries. The boundaries divide the real line into intervals, each mapped to a centroid index.
Interval mapping, where b[0..n-1] are the n boundaries: (-inf, b[0]) -> 0, [b[0], b[1]) -> 1, ..., [b[n-1], +inf) -> n
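The interval mapping amounts to counting how many boundaries are less than or equal to the value. A minimal sketch using sort.Search, with the hypothetical free function findNearestIndex standing in for the method:

```go
package main

import (
	"fmt"
	"sort"
)

// findNearestIndex is a hypothetical sketch of the documented mapping.
// It returns the first position whose boundary is strictly greater than v,
// which equals the number of boundaries <= v and therefore the index of
// the half-open interval [b[i-1], b[i]) containing v.
func findNearestIndex(boundaries []float64, v float64) int {
	return sort.Search(len(boundaries), func(i int) bool {
		return boundaries[i] > v
	})
}

func main() {
	boundaries := []float64{-0.5, 0, 0.5} // 3 boundaries, 4 centroids (2-bit)
	for _, v := range []float64{-1, -0.5, 0.2, 0.5, 9} {
		fmt.Printf("%5.2f -> %d\n", v, findNearestIndex(boundaries, v))
	}
}
```

Since the boundaries are the midpoints between adjacent centroids, the interval that contains a value also identifies its nearest centroid, so no distance computation is needed.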
type CodebookBuilder ¶
type CodebookBuilder struct {
	// contains filtered or unexported fields
}
CodebookBuilder constructs a Codebook by running Lloyd-Max optimization on the Beta distribution derived from the vector dimension.
func NewCodebookBuilder ¶
func NewCodebookBuilder() *CodebookBuilder
NewCodebookBuilder returns a CodebookBuilder with default parameters (gridPoints=50000, iterations=300).
func (*CodebookBuilder) Build ¶
func (cb *CodebookBuilder) Build(dimension, bitWidth int) (*Codebook, error)
Build constructs a Codebook for the given dimension and bitWidth using Lloyd-Max optimization on the Beta((d-1)/2, (d-1)/2) distribution.
The Beta distribution is defined on (0,1) and mapped to (-1,1) via x_mapped = 2*x - 1. Returns an error if bitWidth is not 2, 3, or 4, or if dimension < 2.
type Matrix ¶
type Matrix struct {
	// contains filtered or unexported fields
}
Matrix represents a dense matrix, internally using gonum/mat.Dense.
func NewRandomOrthogonalMatrix ¶
NewRandomOrthogonalMatrix generates a random orthogonal matrix by QR decomposition of a random Gaussian matrix. The same seed produces the same matrix. Returns an error if dimension < 2.
func (*Matrix) ApplyInto ¶ added in v0.0.2
ApplyInto multiplies the matrix by a vector, writing the result into dst. dst must have length >= m.dim.
func (*Matrix) ApplyTranspose ¶
ApplyTranspose multiplies the matrix transpose by a vector: result = M^T * vec.
func (*Matrix) ApplyTransposeInto ¶ added in v0.0.2
ApplyTransposeInto multiplies the matrix transpose by a vector, writing the result into dst. dst must have length >= m.dim.
type Option ¶ added in v0.0.2
type Option func(*options)
Option is a functional option for configuring NewTurboQuant.
func WithConcurrency ¶ added in v0.0.2
WithConcurrency sets the maximum number of concurrent goroutines used by QuantizeBatch and DequantizeBatch. The default (0), like any other value less than 1, resolves to runtime.NumCPU().
func WithGridPoints ¶ added in v0.0.2
WithGridPoints sets the number of grid points for numerical integration in the Lloyd-Max codebook builder. Default is 50000.
func WithIterations ¶ added in v0.0.2
WithIterations sets the number of Lloyd-Max iterations for codebook construction. Default is 300.
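The three options above follow Go's functional-options pattern. The sketch below shows how such options could be resolved; the options field names and the resolve helper are hypothetical, since the real struct is unexported, and only the documented defaults (50000 grid points, 300 iterations, runtime.NumCPU() workers) are taken from this page.

```go
package main

import (
	"fmt"
	"runtime"
)

// options holds the tunables; field names are hypothetical.
type options struct {
	gridPoints  int
	iterations  int
	concurrency int
}

// Option mutates an options value, matching the documented type.
type Option func(*options)

func WithGridPoints(n int) Option  { return func(o *options) { o.gridPoints = n } }
func WithIterations(n int) Option  { return func(o *options) { o.iterations = n } }
func WithConcurrency(n int) Option { return func(o *options) { o.concurrency = n } }

// resolve is a hypothetical helper: start from the documented defaults,
// apply each option in order, then normalize the concurrency value.
func resolve(opts ...Option) options {
	o := options{gridPoints: 50000, iterations: 300}
	for _, apply := range opts {
		apply(&o)
	}
	if o.concurrency < 1 {
		o.concurrency = runtime.NumCPU()
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", resolve())
	fmt.Printf("%+v\n", resolve(WithIterations(100), WithConcurrency(2)))
}
```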
type QuantizedVector ¶
type QuantizedVector struct {
	Norm    float32 // L2 norm of the original vector
	Indices []uint8 // Quantization index for each coordinate
}
QuantizedVector represents a quantized vector, containing the original L2 norm and an array of quantization indices.
func DeserializeQuantizedVector ¶
func DeserializeQuantizedVector(data []byte, bitWidth, dimension int) (*QuantizedVector, error)
DeserializeQuantizedVector deserializes a compact binary byte slice back into a QuantizedVector.
Format: [4 bytes float32 norm (little-endian)][bit-packed indices]
Returns a format error if the byte slice length does not match the expected size.
func DeserializeQuantizedVectorFrom ¶ added in v0.0.2
func DeserializeQuantizedVectorFrom(r io.Reader, bitWidth, dimension int) (*QuantizedVector, error)
DeserializeQuantizedVectorFrom reads and deserializes a QuantizedVector from an io.Reader. It uses the same binary format as DeserializeQuantizedVector.
type TurboQuant ¶
type TurboQuant struct {
	// contains filtered or unexported fields
}
TurboQuant is the core entry point of the SDK, encapsulating all quantization functionality.
func NewTurboQuant ¶
func NewTurboQuant(dimension, bitWidth int, seed int64, opts ...Option) (*TurboQuant, error)
NewTurboQuant creates and initializes a quantizer instance.
- dimension: vector dimension, must be >= 2
- bitWidth: quantization bit width, must be 2, 3, or 4
- seed: random seed for rotation matrix generation; the same seed produces the same matrix
- opts: optional functional options (WithGridPoints, WithIterations, WithConcurrency)
Example ¶
ExampleNewTurboQuant demonstrates creating a TurboQuant quantizer instance.
package main

import (
	"fmt"
	"log"

	"github.com/mredencom/turboquant"
)

func main() {
	tq, err := turboquant.NewTurboQuant(8, 4, 42) // dim=8, 4-bit, seed=42
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Dimension: %d\n", tq.Dimension())
	fmt.Printf("BitWidth: %d\n", tq.BitWidth())
	fmt.Printf("CompressionRatio: %.2f\n", tq.CompressionRatio())
}

Output:
Dimension: 8
BitWidth: 4
CompressionRatio: 4.00
func (*TurboQuant) BitWidth ¶
func (tq *TurboQuant) BitWidth() int
BitWidth returns the quantization bit width of this quantizer.
func (*TurboQuant) CompressionRatio ¶
func (tq *TurboQuant) CompressionRatio() float64
CompressionRatio returns the theoretical compression ratio for the current configuration.
func (*TurboQuant) Concurrency ¶ added in v0.0.2
func (tq *TurboQuant) Concurrency() int
Concurrency returns the maximum number of concurrent goroutines used by batch operations.
func (*TurboQuant) Dequantize ¶
func (tq *TurboQuant) Dequantize(qv *QuantizedVector) ([]float32, error)
Dequantize reconstructs a float32 vector from a QuantizedVector.
func (*TurboQuant) DequantizeBatch ¶
func (tq *TurboQuant) DequantizeBatch(qvs []*QuantizedVector) ([][]float32, error)
DequantizeBatch dequantizes multiple QuantizedVectors concurrently using a worker pool. Concurrency is controlled by the WithConcurrency option (default: runtime.NumCPU()).
func (*TurboQuant) DequantizeBatchFloat64 ¶ added in v0.0.2
func (tq *TurboQuant) DequantizeBatchFloat64(qvs []*QuantizedVector) ([][]float64, error)
DequantizeBatchFloat64 batch-dequantizes multiple QuantizedVectors, returning float64 vectors. It delegates to DequantizeBatch, then converts each result to float64.
func (*TurboQuant) DequantizeFloat64 ¶ added in v0.0.2
func (tq *TurboQuant) DequantizeFloat64(qv *QuantizedVector) ([]float64, error)
DequantizeFloat64 reconstructs a float64 vector from a QuantizedVector. It delegates to Dequantize, then converts the result to float64 using Float32sToFloat64s.
func (*TurboQuant) Deserialize ¶
func (tq *TurboQuant) Deserialize(data []byte) (*QuantizedVector, error)
Deserialize deserializes a binary byte slice into a QuantizedVector.
func (*TurboQuant) DeserializeBatchFrom ¶ added in v0.0.2
func (tq *TurboQuant) DeserializeBatchFrom(r io.Reader) ([]*QuantizedVector, error)
DeserializeBatchFrom reads multiple QuantizedVectors from an io.Reader. Expects a 4-byte uint32 count header followed by that many serialized vectors.
func (*TurboQuant) DeserializeFrom ¶ added in v0.0.2
func (tq *TurboQuant) DeserializeFrom(r io.Reader) (*QuantizedVector, error)
DeserializeFrom reads and deserializes a QuantizedVector from an io.Reader.
func (*TurboQuant) Dimension ¶
func (tq *TurboQuant) Dimension() int
Dimension returns the vector dimension of this quantizer.
func (*TurboQuant) Quantize ¶
func (tq *TurboQuant) Quantize(vec []float32) (*QuantizedVector, error)
Quantize quantizes a single float32 vector into a QuantizedVector.
Example ¶
ExampleTurboQuant_Quantize demonstrates quantizing a float32 vector.
package main

import (
	"fmt"
	"log"

	"github.com/mredencom/turboquant"
)

func main() {
	tq, err := turboquant.NewTurboQuant(8, 4, 42)
	if err != nil {
		log.Fatal(err)
	}
	vec := []float32{1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0}
	qv, err := tq.Quantize(vec)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Norm: %.4f\n", qv.Norm)
	fmt.Printf("Indices length: %d\n", len(qv.Indices))
}

Output:
Norm: 14.2829
Indices length: 8
func (*TurboQuant) QuantizeBatch ¶
func (tq *TurboQuant) QuantizeBatch(vecs [][]float32) ([]*QuantizedVector, error)
QuantizeBatch quantizes multiple vectors concurrently using a worker pool. Concurrency is controlled by the WithConcurrency option (default: runtime.NumCPU()). All vectors must have the same dimension as the TurboQuant instance. If any vector has a mismatched dimension, returns an error indicating the first such index.
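A bounded worker pool with up-front dimension validation could look like the sketch below. It is an illustration of the pattern, not the package's code: processBatch is a hypothetical name, and sumAll stands in for the per-vector quantization step.

```go
package main

import (
	"fmt"
	"sync"
)

// sumAll is a stand-in for the per-vector work (quantization in the package).
func sumAll(v []float32) float64 {
	s := 0.0
	for _, x := range v {
		s += float64(x)
	}
	return s
}

// processBatch is a hypothetical sketch of a bounded worker pool.
// It rejects the batch at the first vector whose length differs from dim,
// then fans the indices out to at most `workers` goroutines.
func processBatch(vecs [][]float32, dim, workers int, fn func([]float32) float64) ([]float64, error) {
	for i, v := range vecs {
		if len(v) != dim {
			return nil, fmt.Errorf("vector %d: dimension %d, want %d", i, len(v), dim)
		}
	}
	if workers < 1 {
		workers = 1
	}
	out := make([]float64, len(vecs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = fn(vecs[i]) // each index is written by exactly one worker
			}
		}()
	}
	for i := range vecs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out, nil
}

func main() {
	out, err := processBatch([][]float32{{1, 2}, {3, 4}, {5, 6}}, 2, 2, sumAll)
	fmt.Println(out, err)
}
```

Writing results by index keeps the output order identical to the input order regardless of which worker finishes first.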
func (*TurboQuant) QuantizeBatchFloat64 ¶ added in v0.0.2
func (tq *TurboQuant) QuantizeBatchFloat64(vecs [][]float64) ([]*QuantizedVector, error)
QuantizeBatchFloat64 batch-quantizes multiple float64 vectors with concurrent execution. Each vector is converted to float32 before quantization.
func (*TurboQuant) QuantizeFloat64 ¶ added in v0.0.2
func (tq *TurboQuant) QuantizeFloat64(vec []float64) (*QuantizedVector, error)
QuantizeFloat64 quantizes a single float64 vector into a QuantizedVector. It converts the input to float32 using Float64sToFloat32s, then delegates to Quantize.
func (*TurboQuant) Serialize ¶
func (tq *TurboQuant) Serialize(qv *QuantizedVector) ([]byte, error)
Serialize serializes a QuantizedVector into a compact binary byte slice.
Example ¶
ExampleTurboQuant_Serialize demonstrates the full serialize/deserialize round-trip.
package main

import (
	"fmt"
	"log"

	"github.com/mredencom/turboquant"
)

func main() {
	tq, err := turboquant.NewTurboQuant(8, 4, 42)
	if err != nil {
		log.Fatal(err)
	}
	vec := []float32{1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0}
	qv, err := tq.Quantize(vec)
	if err != nil {
		log.Fatal(err)
	}
	// Serialize to compact binary
	data, err := tq.Serialize(qv)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Serialized bytes: %d\n", len(data))
	// Deserialize back
	qv2, err := tq.Deserialize(data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Deserialized norm: %.4f\n", qv2.Norm)
	fmt.Printf("Indices match: %v\n", indicesEqual(qv.Indices, qv2.Indices))
}

func indicesEqual(a, b []uint8) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

Output:
Serialized bytes: 8
Deserialized norm: 14.2829
Indices match: true
func (*TurboQuant) SerializeBatchTo ¶ added in v0.0.2
func (tq *TurboQuant) SerializeBatchTo(qvs []*QuantizedVector, w io.Writer) error
SerializeBatchTo writes multiple QuantizedVectors sequentially to an io.Writer. Format: 4-byte uint32 count (little-endian) followed by count serialized vectors.
func (*TurboQuant) SerializeTo ¶ added in v0.0.2
func (tq *TurboQuant) SerializeTo(qv *QuantizedVector, w io.Writer) error
SerializeTo writes a QuantizedVector directly to an io.Writer using the compact binary format.