Documentation ¶
Index ¶
- func HashBytes(b []byte) [32]byte
- func HashFile(path string) (string, error)
- func HashHex(b []byte) string
- func HashReader(r io.Reader) (string, error)
- func HasherName() string
- func ResetDefault()
- func SetContentHasher(h ContentHasher)
- type ContentHasher
- type Memo
- func (m *Memo) Clear()
- func (m *Memo) HashContent(path string, content []byte) string
- func (m *Memo) HashFile(path string, provider func() ([]byte, error)) (string, error)
- func (m *Memo) HashFileRaw(path string, provider func() ([]byte, error)) ([32]byte, error)
- func (m *Memo) Len() int
- func (m *Memo) Stats() (hits, misses uint64)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func HashFile ¶
func HashFile(path string) (string, error)
HashFile is a convenience wrapper that opens path, streams it through HashReader, and returns the hex digest. On open failure it returns the underlying os error as-is. Do not wrap it: oracle.ContentHash's current callers check os.IsNotExist on the returned error.
func HashHex ¶
func HashHex(b []byte) string
HashHex returns the digest of b as a lowercase hex string. Use this for on-disk cache keys, fingerprints, and any user-visible identifier.
func HashReader ¶
func HashReader(r io.Reader) (string, error)
HashReader streams r into the active hasher and returns the lowercase hex digest. Used by the oracle content-hash path to avoid reading whole files into memory.
func HasherName ¶
func HasherName() string
HasherName returns the Name() of the active ContentHasher. Embedded in cache version tokens so an algorithm swap auto-invalidates every cache that keys on content hash.
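The invalidation effect can be sketched as follows. The token format and the cacheVersionToken helper are hypothetical, chosen only for illustration; the hasher names passed in are examples, not values confirmed by the package.

```go
package main

import "fmt"

// cacheVersionToken is a hypothetical helper: it combines a cache schema
// version with the hasher name, so changing either yields a new token.
func cacheVersionToken(schema int, hasherName string) string {
	return fmt.Sprintf("v%d-%s", schema, hasherName)
}

func main() {
	before := cacheVersionToken(3, "xxh3-256")
	after := cacheVersionToken(3, "blake3")

	// Entries written under one token are never looked up under the
	// other, so a hasher swap leaves the old entries unused.
	fmt.Println(before, after, before == after)
}
```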
func ResetDefault ¶
func ResetDefault()
ResetDefault clears the shared Memo. The CLI calls this at the start of each scan invocation so memoized hashes do not bleed into a subsequent run where files may have changed.
func SetContentHasher ¶
func SetContentHasher(h ContentHasher)
SetContentHasher swaps the active hasher. It is intended for benchmarks and future algorithm experiments; production code should leave the default (xxh3-256) in place. The swap is NOT safe for concurrent use: call it during process init or from a test's setup code, before any hashing runs.
Types ¶
type ContentHasher ¶
type ContentHasher interface {
// Name is a stable identifier embedded in cache version tokens so
// an algorithm swap automatically invalidates prior entries.
Name() string
// Sum returns the 32-byte digest of b.
Sum(b []byte) [32]byte
// New returns a streaming hash.Hash whose Sum(nil) matches
// Sum(b)[:] when fed the same bytes. Size() is 32.
New() hash.Hash
}
ContentHasher abstracts the hash function used for content fingerprints (parse cache, cross-file cache, oracle cache, incremental cache). Implementations must return a 32-byte digest so existing store.Key FileHash [32]byte layouts keep working.
The default is xxh3-128 widened to 256 bits via two distinct seeds. It is non-cryptographic but SIMD-accelerated on both amd64 (AVX / AVX-512) and arm64 (NEON), so it stays ahead of hardware SHA-256 on Apple Silicon (~5×) and beats software SHA-256 on typical Linux CI hardware by ~10×. The interface leaves room for swapping in blake3, wyhash, or a successor via SetContentHasher without disturbing any callers.
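As an illustration of the interface contract, here is a sketch of an alternative implementation backed by crypto/sha256, whose digest is already 32 bytes wide. sha256Hasher and the consistent helper are hypothetical names; the interface is repeated locally only so the example compiles on its own.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"hash"
)

// ContentHasher mirrors the interface documented above so the sketch
// is self-contained.
type ContentHasher interface {
	Name() string
	Sum(b []byte) [32]byte
	New() hash.Hash
}

// sha256Hasher is a hypothetical implementation backed by crypto/sha256.
type sha256Hasher struct{}

func (sha256Hasher) Name() string          { return "sha256" }
func (sha256Hasher) Sum(b []byte) [32]byte { return sha256.Sum256(b) }
func (sha256Hasher) New() hash.Hash        { return sha256.New() }

// consistent reports whether the streaming and one-shot paths of h agree
// on data and whether the streaming hash reports a 32-byte size, as the
// interface comments require.
func consistent(h ContentHasher, data []byte) bool {
	oneShot := h.Sum(data)
	s := h.New()
	s.Write(data) // hash.Hash documents that Write never returns an error
	return s.Size() == 32 && bytes.Equal(s.Sum(nil), oneShot[:])
}

func main() {
	var h ContentHasher = sha256Hasher{}
	fmt.Println(h.Name(), consistent(h, []byte("hello")))
}
```

A check like consistent is a cheap test to run against any candidate hasher before installing it with SetContentHasher.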
func Hasher ¶
func Hasher() ContentHasher
Hasher returns the currently installed ContentHasher. Subsystems that need streaming semantics (e.g. oracle closureFingerprint) call Hasher().New() instead of importing a specific algorithm.
type Memo ¶
type Memo struct {
// contains filtered or unexported fields
}
Memo memoizes file content hashes for the duration of a single run. The cache is keyed by (path, size, mtime); a file whose stat fingerprint changes is re-hashed on the next lookup. A Memo is safe for concurrent use.
Memo is deliberately scoped to a single invocation — callers build one, pass it to every subsystem that would otherwise hash the same file independently, and discard it when the run completes. Persisting a Memo across runs would return stale hashes if a file was modified between invocations under the same (path, size, mtime) triple.
The nil *Memo is a valid disabled memo: every method falls through to the unmemoized hashutil helpers and no entries are retained. This keeps callers that haven't yet been wired to a shared Memo working unchanged.
func Default ¶
func Default() *Memo
Default returns the process-scoped shared Memo. Subsystems that need to cooperate on file hashing should use Default() rather than instantiating a private Memo, so all redundant content-hash computations within a single run collapse to one per unique file.
func (*Memo) HashContent ¶
func (m *Memo) HashContent(path string, content []byte) string
HashContent returns the hex digest of content under the active ContentHasher and, if path is non-empty and a stat succeeds, memoizes the result so subsequent HashFile(path) calls within this Memo return the same digest without re-reading or re-hashing. Use this from callers that already hold the file bytes (e.g. after reading once into memory for parsing).
func (*Memo) HashFile ¶
func (m *Memo) HashFile(path string, provider func() ([]byte, error)) (string, error)
HashFile returns the lowercase hex digest of the file at path using the active ContentHasher. The returned digest is memoized against the file's (size, mtime); a later call for the same unchanged file hits the cache. If provider is non-nil and a hash actually needs to be computed, it is invoked to obtain the bytes instead of re-reading from disk — useful for callers that already have the file content in memory (e.g. the parse / cross-file caches).
A nil *Memo falls through to an unmemoized hash.
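The provider contract (consulted only when a hash must actually be computed) can be sketched with a deliberately reduced cache. Everything here is hypothetical: the cache type is keyed by path alone rather than by (path, size, mtime), it is not safe for concurrent use, and SHA-256 stands in for the active hasher.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
)

// cache is a simplified memo keyed by path alone; the Memo described
// above also includes size and mtime in its key.
type cache map[string]string

// hashFile returns the cached digest when present. On a miss it obtains
// the bytes from provider if one was supplied, and from disk otherwise.
func (c cache) hashFile(path string, provider func() ([]byte, error)) (string, error) {
	if d, ok := c[path]; ok {
		return d, nil // hit: provider is not called
	}
	read := provider
	if read == nil {
		read = func() ([]byte, error) { return os.ReadFile(path) }
	}
	b, err := read()
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b) // SHA-256 stands in for the active hasher
	d := hex.EncodeToString(sum[:])
	c[path] = d
	return d, nil
}

func main() {
	calls := 0
	inMemory := func() ([]byte, error) {
		calls++
		return []byte("package main\n"), nil
	}

	c := cache{}
	c.hashFile("main.go", inMemory) // miss: provider supplies the bytes
	c.hashFile("main.go", inMemory) // hit: provider is skipped
	fmt.Println("provider calls:", calls)
}
```

Because the provider is a closure, a caller that already parsed the file can hand over its buffer without copying it or reading the file a second time.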
func (*Memo) HashFileRaw ¶
func (m *Memo) HashFileRaw(path string, provider func() ([]byte, error)) ([32]byte, error)
HashFileRaw is like HashFile but returns the raw 32-byte digest.