Documentation
Overview ¶
Package hit provides hash-based utilities for dsco's deduplication and caching systems.
The hit package offers efficient hash-based operations that support dsco's need for detecting duplicate configurations, caching computed values, and ensuring consistent behavior across multiple configuration processing cycles. It provides fast, collision-resistant hashing suitable for configuration-scale data.
Core Functions ¶
The package provides hash computation and comparison utilities optimized for dsco's data structures and usage patterns.
## Hash Computation
Generate consistent hash values for various Go types:
// Basic types
hashValue := hit.Hash("my-string")
hashValue = hit.Hash(42)
hashValue = hit.Hash(true)
// Complex types
config := MyConfig{Host: "localhost", Port: 8080}
hashValue = hit.Hash(config)
// Slices and maps
hashValue = hit.Hash([]string{"a", "b", "c"})
hashValue = hit.Hash(map[string]int{"key": 123})
## Content Addressing
Create content-addressable identifiers for configuration objects:
configID := hit.ContentID(configStruct)
// Use configID as a stable identifier for this configuration
if hit.ContentID(newConfig) == configID {
	// Configuration unchanged, can reuse cached results
}
Deduplication Support ¶
The package enables efficient deduplication of configuration sources:
## Layer Deduplication
Prevent duplicate layers from being registered:
type layerRegistry struct {
	registered map[uint64]bool // Using hit.Hash values
}

func (r *layerRegistry) Register(layer Layer) error {
	layerHash := hit.Hash(layer.Identifier())
	if r.registered[layerHash] {
		return fmt.Errorf("layer already registered: %s", layer.Identifier())
	}
	r.registered[layerHash] = true
	return nil
}
## Value Deduplication
Detect when the same configuration values are provided multiple times:
func detectDuplicateValues(values []ConfigValue) []ConfigValue {
	seen := make(map[uint64]bool)
	unique := make([]ConfigValue, 0)
	for _, value := range values {
		hash := hit.Hash(value.Content)
		if !seen[hash] {
			seen[hash] = true
			unique = append(unique, value)
		}
	}
	return unique
}
Caching Integration ¶
The package supports dsco's caching mechanisms:
## Model Caching
Cache computed struct models based on type signatures:
type ModelCache struct {
	cache map[uint64]Model
}

func (c *ModelCache) GetOrBuild(t reflect.Type) Model {
	typeHash := hit.Hash(t.String()) // Type signature hash
	if model, exists := c.cache[typeHash]; exists {
		return model
	}
	model := buildModel(t)
	c.cache[typeHash] = model
	return model
}
## Value Conversion Caching
Cache expensive type conversions:
type ConversionCache struct {
	conversions map[uint64]interface{}
}

func (c *ConversionCache) Convert(value string, targetType reflect.Type) (interface{}, error) {
	cacheKey := hit.Hash(struct {
		Value string
		Type  string
	}{
		Value: value,
		Type:  targetType.String(),
	})
	if cached, exists := c.conversions[cacheKey]; exists {
		return cached, nil
	}
	converted, err := performConversion(value, targetType)
	if err != nil {
		return nil, err
	}
	c.conversions[cacheKey] = converted
	return converted, nil
}
Hash Algorithm ¶
The package uses a fast, collision-resistant hash algorithm suitable for configuration processing:
## Algorithm Choice
- **Speed**: Optimized for frequent hash computations during configuration processing
- **Distribution**: Good distribution characteristics for typical configuration data
- **Stability**: Hash values remain consistent across program runs
- **Collision Resistance**: Sufficient resistance for configuration-scale datasets
## Data Serialization
Complex Go types are serialized consistently before hashing:
// Struct serialization preserves field order and types
type Config struct {
Host string `json:"host"`
Port int `json:"port"`
}
// These produce the same hash:
config1 := Config{Host: "localhost", Port: 8080}
config2 := Config{Host: "localhost", Port: 8080}
// These produce different hashes:
config3 := Config{Host: "localhost", Port: 8081}
## Type Handling
Different Go types are handled appropriately:
- **Basic types**: Direct value hashing
- **Strings**: UTF-8 byte sequence hashing
- **Structs**: Field-by-field hashing with type information
- **Slices/Arrays**: Element hashing with length and type
- **Maps**: Key-value pair hashing with deterministic ordering
- **Pointers**: Dereference and hash pointed-to value
- **Interfaces**: Hash concrete type and value
Performance Characteristics ¶
The package is optimized for dsco's usage patterns:
## Speed Optimization
- Fast hash computation for small to medium configuration objects
- Minimal memory allocation during hashing process
- Efficient serialization of Go data structures
- Batch processing support for multiple values
## Memory Usage
- Minimal memory overhead per hash operation
- No persistent memory usage (stateless operations)
- Efficient handling of large configuration objects
- Garbage-collector-friendly allocation patterns
## Scalability
The package scales well with:

- Configuration object size (up to several MB)
- Number of concurrent hash operations
- Frequency of hash computations
- Variety of Go types being hashed
Integration with dsco Components ¶
## Layer System
Layers use hit package for deduplication:
func (l *EnvLayer) register(to *layerBuilder) error {
	layerID := hit.Hash(struct {
		Type   string
		Prefix string
	}{
		Type:   "env",
		Prefix: l.prefix,
	})
	if to.hasLayer(layerID) {
		return fmt.Errorf("env layer with prefix '%s' already registered", l.prefix)
	}
	to.registerLayer(layerID, l)
	return nil
}
## Model System
Models use hit package for caching:
func buildModelWithCache(t reflect.Type) Model {
	modelKey := hit.Hash(struct {
		Package string
		Name    string
		Fields  []string
	}{
		Package: t.PkgPath(),
		Name:    t.Name(),
		Fields:  extractFieldNames(t),
	})
	if cached := getFromCache(modelKey); cached != nil {
		return cached
	}
	model := buildModelFromType(t)
	saveToCache(modelKey, model)
	return model
}
## Value Processing
Values use hit package for change detection:
// previousHash and cachedResult are assumed to be package-level state.
func processConfigurationValues(values []Value) ProcessingResult {
	currentHash := hit.Hash(values)
	if currentHash == previousHash {
		// Configuration unchanged, return cached result
		return cachedResult
	}
	result := performProcessing(values)
	previousHash = currentHash
	cachedResult = result
	return result
}
Testing Coverage ¶
This package maintains 100% test coverage, including:

- Hash computation for all supported Go types
- Hash consistency across multiple computations
- Hash distribution quality for typical configuration data
- Collision resistance testing with large datasets
- Performance benchmarking for various data sizes
- Memory usage validation for large objects
- Concurrent access safety testing
The test suite covers edge cases:

- Empty values and nil pointers
- Very large configuration objects
- Deeply nested data structures
- Hash collision probability estimation
- Cross-platform hash consistency
Thread Safety ¶
All functions in the hit package are thread-safe:

- No shared mutable state
- Stateless hash computations
- Safe for concurrent use from multiple goroutines
- No coordination or locking required
This makes the package safe for use in dsco's concurrent configuration processing scenarios.
Error Handling ¶
The package is designed for robust operation:

- Hash functions never panic on valid Go values
- Graceful handling of complex or unusual data structures
- Consistent behavior for edge cases (nil, empty values)
- Deterministic error modes for invalid inputs
Future Extensions ¶
The package design allows for enhancements:
## Algorithm Selection
Support for different hash algorithms based on use case:
hashValue := hit.HashWith(data, hit.AlgorithmFast)     // Speed priority
hashValue = hit.HashWith(data, hit.AlgorithmSecure)    // Security priority
hashValue = hit.HashWith(data, hit.AlgorithmBalanced)  // Default
## Custom Serialization
Support for custom serialization of specific types:
hit.RegisterSerializer(MyCustomType{}, customSerializer)
## Hash Validation
Support for hash validation and integrity checking:
isValid := hit.ValidateHash(data, expectedHash)
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type HashProvider ¶
type HashProvider[T hash.Hash] interface {
	// Get retrieves a hash instance from the pool or creates a new one.
	// The returned instance is ready for use and should be returned via
	// PutBack when done.
	Get() T

	// PutBack returns a hash instance to the pool for reuse.
	// This enables efficient resource management and reduces garbage
	// collection pressure.
	PutBack(h T)
}
HashProvider defines an interface for providing hash instances with pooling support. This generic interface allows efficient reuse of hash instances, reducing allocation overhead during repeated hash computations.
type IntNode ¶
type IntNode struct {
// contains filtered or unexported fields
}
IntNode represents a hash tree node that contains an integer value. It implements the MerkelNode interface and provides content-addressable hashing for integer data.
func NewIntNode ¶
func NewIntNode(
	hashProvider HashProvider[hash.Hash],
	salt []byte,
	id string,
	value int,
) *IntNode
NewIntNode creates a new IntNode with the specified value and computes its hash. The hash is computed using the provided salt for security and collision resistance.
Parameters:
- hashProvider: Provider for hash instances to avoid allocation overhead
- salt: Random bytes to prevent hash collision attacks
- id: Unique identifier for this node
- value: The integer value to store and hash
Returns a new IntNode with computed hash value.
type MerkelNode ¶
type MerkelNode interface {
	// GetID returns the unique identifier for this node.
	// The ID can be used to reference and locate the node within the tree
	// structure.
	GetID() string

	// GetHash returns the computed hash value for this node.
	// The hash represents the content and structure of the node and its
	// children.
	GetHash() []byte
}
MerkelNode defines the interface for nodes in a Merkle-like hash tree structure. Each node can provide its unique identifier and computed hash value, enabling efficient tree traversal and verification.
type NodeID ¶
type NodeID string
NodeID represents a unique identifier for hash nodes in the Merkle-like tree structure. It provides a string-based identifier that can be used to reference and locate specific nodes within the hash tree.
type StringNode ¶
type StringNode struct {
// contains filtered or unexported fields
}
StringNode represents a hash tree node that contains a string value. It implements the MerkelNode interface and provides content-addressable hashing for string data.
func NewStringNode ¶
func NewStringNode(
	hashProvider HashProvider[hash.Hash],
	salt []byte,
	id string,
	value string,
) *StringNode
NewStringNode creates a new StringNode with the specified value and computes its hash. The hash is computed using the provided salt for security and collision resistance.
Parameters:
- hashProvider: Provider for hash instances to avoid allocation overhead
- salt: Random bytes to prevent hash collision attacks
- id: Unique identifier for this node
- value: The string value to store and hash
Returns a new StringNode with computed hash value.