Documentation ¶
Overview ¶
Package blobfs provides a content-addressable blob storage system that stores files using SHA-256 hashing for deduplication and integrity.
Design ¶
The storage uses a two-level directory structure based on the SHA-256 hash of the blob key, which avoids filesystem performance problems caused by placing too many files in a single directory while keeping lookups fast.
Why content-addressable: Keys are hashed to create storage paths, which provides:
- Uniform distribution of files across directories
- Protection against path traversal attacks
- Predictable storage locations
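The key-to-path derivation behind these properties can be sketched in plain Go. The helper name shardPath is illustrative only, not part of the package API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// shardPath derives a deterministic storage path from a key:
// SHA-256 the key, then use the first two hash bytes as nested
// directory names, with the full hex digest as the filename.
func shardPath(key string) string {
	sum := sha256.Sum256([]byte(key))
	h := hex.EncodeToString(sum[:])
	return filepath.Join(h[0:2], h[2:4], h)
}

func main() {
	// The same key always maps to the same path, and a hex digest
	// can never contain ".." or path separators, so a hostile key
	// cannot escape the storage root.
	fmt.Println(shardPath("documents/invoice.pdf"))
}
```

Because the path is derived from a cryptographic hash rather than the raw key, distribution across the two directory levels is effectively uniform.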
Why separate metadata: Metadata is stored in a separate JSON file to allow atomic updates and queries without reading the blob content.
Usage ¶
Basic operations:
	storage, err := NewStorage("/data/blobs")
	if err != nil {
		return err
	}

	// Store a blob
	if err := storage.Put(ctx, "documents/invoice.pdf", reader); err != nil {
		return err
	}

	// Retrieve a blob
	rc, err := storage.Get(ctx, "documents/invoice.pdf")
	if err != nil {
		return err
	}
	defer rc.Close()

	// Enumerate blobs with a key prefix
	err = storage.Walk(ctx, "documents/", func(key string, meta *Meta, err error) error {
		if err != nil {
			return err
		}
		// process metadata...
		return nil
	})
Concurrency ¶
All operations are safe for concurrent use. Multiple goroutines may call methods on the same BlobStorage instance simultaneously.
Error Handling ¶
All methods return errors that can be unwrapped using errors.Is and errors.As for standard error types like os.ErrNotExist.
Index ¶
- Variables
- func BucketShardFunc(key string) string
- func DefaultShardFunc(key string) string
- type Blob
- type BlobResult deprecated
- type ID
- type Meta
- type OptionFunc
- type Options
- type ShardFunc
- type Storage
- func (bs *Storage) Delete(ctx context.Context, key string) error
- func (bs *Storage) Exists(ctx context.Context, key string) (bool, error)
- func (bs *Storage) Get(ctx context.Context, key string) (io.ReadCloser, error)
- func (bs *Storage) List(ctx context.Context, prefix string) *BlobResult deprecated
- func (bs *Storage) NewBlob() (*Blob, error)
- func (bs *Storage) Put(ctx context.Context, key string, r io.Reader) error
- func (bs *Storage) Stat(ctx context.Context, key string) (*Meta, error)
- func (bs *Storage) Walk(ctx context.Context, prefix string, fn WalkFn) error
- type WalkFn
Constants ¶
This section is empty.
Variables ¶
var (
	ErrBlobClosed   = errors.New("blob is closed")
	ErrBlobNotReady = errors.New("blob not ready for finalization")
)
Functions ¶
func BucketShardFunc ¶
BucketShardFunc organizes blobs into buckets based on key prefix. Extracts the first path segment as a bucket name, then applies two-level hash sharding within that bucket: "bucket/a3/f2/a3f29d4e8c..."
Keys without "/" are placed in the "misc" bucket. Example: "users/avatar.jpg" → "users/a3/f2/a3f29d4e8c..."
"document.pdf" → "misc/b7/e4/b7e4c2a1f8..."
func DefaultShardFunc ¶
DefaultShardFunc is the default two-level sharding strategy. Uses first 2 bytes of SHA-256 hash for two-level directory structure: "a3/f2/a3f29d4e8c..."
Why two levels: Creates 256 * 256 = 65,536 possible directories, preventing filesystem performance degradation with large blob counts.
Types ¶
type Blob ¶
type Blob struct {
	// contains filtered or unexported fields
}
Blob represents a writable blob in the storage. It implements io.Writer and io.Closer, allowing it to be used with io.Copy and other standard Go interfaces.
Why separate temp file: Using a temporary file during writes allows atomic creation - the blob either fully exists or doesn't, preventing partial writes from being visible. This is critical for data consistency.
Why hash during write: Computing the hash while writing avoids reading the entire file again after writing, which would be inefficient for large files.
Example usage:
	blob, err := storage.NewBlob()
	if err != nil {
		return err
	}
	defer blob.Discard() // Safety: cleanup if not committed

	if _, err = io.Copy(blob, sourceReader); err != nil {
		return err
	}

	// Commit with specific key (e.g., content hash for CAS)
	return blob.CommitAs(blob.Hash())
func (*Blob) Close ¶
Close is an alias for Discard(). It closes the blob and removes the temporary file without committing. To persist the blob, use CommitAs() instead.
This allows Blob to satisfy io.Closer for compatibility with defer patterns, but does NOT commit the blob to storage.
func (*Blob) CommitAs ¶
CommitAs finalizes the blob by committing it with the specified key. The blob is atomically moved from the temporary location to the final storage location.
This method:
- Validates the key
- Detects content type from buffered data
- Creates metadata
- Atomically moves temp file to final location
Returns ErrEmptyKey if key is empty, or ErrBlobClosed if already closed/committed. After successful commit, the blob is closed and cannot be reused.
Why atomic move: os.Rename is atomic on most filesystems, ensuring the blob either fully exists with metadata or doesn't exist at all. This prevents other processes from reading partial or inconsistent data.
func (*Blob) Discard ¶
Discard closes the blob and removes the temporary file without committing. This is safe to call even if the blob has already been closed or committed. Idempotent - safe to call multiple times.
func (*Blob) Hash ¶
Hash returns the computed SHA-256 hash of the blob content as a hex string. This can be called before committing to determine the content hash. Returns empty string if no data has been written yet.
func (*Blob) Meta ¶ added in v0.1.2
Meta returns the metadata of the committed blob. Returns nil if the blob has not been successfully committed yet. This allows access to metadata without re-reading the meta file.
func (*Blob) Size ¶
Size returns the current size of the blob in bytes. This can be called before committing to get the current written size.
func (*Blob) Write ¶
Write implements io.Writer, writing data to the blob. The first 512 bytes are buffered for content type detection.
Why buffer first 512 bytes: http.DetectContentType needs up to 512 bytes to accurately determine MIME type from file signatures, but we don't want to buffer the entire file in memory for large files.
type BlobResult deprecated ¶
type BlobResult struct {
	// contains filtered or unexported fields
}
BlobResult provides an iterator over blob storage entries. It follows the standard Go iterator pattern for streaming results.
Why streaming: Loading all blobs into memory at once would be inefficient for large storages. The iterator pattern allows processing blobs one at a time.
Why context in struct: While storing context in a struct is generally an anti-pattern, here it's necessary for the iterator to respect cancellation throughout its lifetime, not just at creation time.
Deprecated: Use Storage.Walk instead. BlobResult and List will be removed in a future version.
func (*BlobResult) Close ¶
func (br *BlobResult) Close() error
Close stops the iteration and releases resources. It's safe to call multiple times.
func (*BlobResult) Err ¶
func (br *BlobResult) Err() error
Err returns any error that occurred during iteration. Should be checked after Next() returns false.
func (*BlobResult) Key ¶
func (br *BlobResult) Key() string
Key returns the current blob's key. Only valid after Next() returns true.
func (*BlobResult) Meta ¶
func (br *BlobResult) Meta() *Meta
Meta returns the current blob's metadata. Only valid after Next() returns true.
func (*BlobResult) Next ¶
func (br *BlobResult) Next() bool
Next advances the iterator to the next blob. Returns true if there is a blob available, false if iteration is complete or an error occurred.
type Meta ¶
type Meta struct {
	Key         string    `json:"key"`         // Original user-provided key
	Size        int64     `json:"size"`        // Size in bytes
	Sha256      string    `json:"sha256"`      // SHA-256 hash of content
	ContentType string    `json:"contentType"` // MIME type detected from content
	CreatedAt   time.Time `json:"createdAt"`   // Original creation timestamp (preserved on update)
	ModifiedAt  time.Time `json:"modifiedAt"`  // Last modification timestamp
}
Meta contains metadata about a stored blob. This is stored separately from the blob data to enable efficient metadata queries without reading the blob content.
type OptionFunc ¶
type OptionFunc func(opts *Options)
OptionFunc is a functional option for configuring BlobStorage.
func WithBlobDir ¶ added in v0.2.0
func WithBlobDir(dir string) OptionFunc
WithBlobDir sets the subdirectory name for blob storage within the root directory. An empty string uses the root directory directly without a subdirectory. Default is "blobs".
Examples:
// Use custom subdirectory
WithBlobDir("objects")
// Store blobs directly in root (no subdirectory)
WithBlobDir("")
func WithDirMode ¶
func WithDirMode(mode os.FileMode) OptionFunc
WithDirMode sets the directory permission mode for storage directories. Default is 0755 (owner read/write/execute, group and others read/execute).
func WithFileMode ¶
func WithFileMode(mode os.FileMode) OptionFunc
WithFileMode sets the file permission mode for blob data files. Default is 0644 (owner read/write, group and others read-only).
func WithShardFunc ¶
func WithShardFunc(fn ShardFunc) OptionFunc
WithShardFunc sets a custom sharding function for generating storage paths. The function receives a key and returns a relative path (without filename).
Default sharding: Two-level SHA-256 hash (e.g., "blobs/a3/f2/a3f29d4e8c...")
Example custom sharding by date:
	WithShardFunc(func(key string) string {
		now := time.Now()
		hash := sha256.Sum256([]byte(key))
		hexHash := hex.EncodeToString(hash[:])
		return filepath.Join("blobs", now.Format("2006/01/02"), hexHash)
	})
Example flat sharding (all blobs in a single directory):

	WithShardFunc(func(key string) string {
		hash := sha256.Sum256([]byte(key))
		hexHash := hex.EncodeToString(hash[:])
		return filepath.Join("blobs", hexHash)
	})
type Options ¶
type Options struct {
	FileMode  os.FileMode // Permission bits for blob data files
	DirMode   os.FileMode // Permission bits for directories
	ShardFunc ShardFunc   // Function to generate storage paths from keys
	BlobDir   string      // Subdirectory for blob storage (empty string for root)
}
Options configures BlobStorage behavior.
type ShardFunc ¶
ShardFunc is a function that generates a storage path from a key. It receives the key and returns the relative path where the blob should be stored (without the filename). The returned path will be joined with the root directory.
The function should:
- Return a path relative to root (e.g., "blobs/a3/f2/a3f29d4e8c...")
- Create a deterministic path based on the key
- Distribute keys evenly to avoid filesystem hotspots
- Be safe from path traversal attacks
Example: For key "users/avatar.jpg", might return "blobs/a3/f2/a3f29d4e8c..."
type Storage ¶ added in v0.1.2
type Storage struct {
	// contains filtered or unexported fields
}
func NewStorage ¶
func NewStorage(root string, opts ...OptionFunc) (*Storage, error)
func (*Storage) List deprecated ¶ added in v0.1.2
func (bs *Storage) List(ctx context.Context, prefix string) *BlobResult
List returns an iterator over all blobs matching the given prefix. The prefix is matched against the original blob keys, not the hashed storage paths. An empty prefix matches all blobs.
Deprecated: Use Storage.Walk instead. List will be removed in a future version.
The iterator must be closed when done to prevent resource leaks:
	iter := storage.List(ctx, "prefix/")
	defer iter.Close()
	for iter.Next() {
		meta := iter.Meta()
		// process meta...
	}
	if err := iter.Err(); err != nil {
		// handle error
	}
func (*Storage) NewBlob ¶ added in v0.1.2
NewBlob creates a new writable blob with a temporary internal ID. The blob must be explicitly committed with CommitAs(key) to persist it, or discarded with Discard() or Close() to clean up the temporary file.
The blob automatically:
- Creates a temporary file for writing
- Computes SHA-256 hash while writing
- Detects content type from the first 512 bytes
Example usage:
	blob, err := storage.NewBlob()
	if err != nil {
		return err
	}
	defer blob.Discard() // Safety: cleanup if we don't commit

	if _, err := io.Copy(blob, reader); err != nil {
		return err
	}
	hash := blob.Hash()

	// Check if blob with this hash already exists
	if exists, _ := storage.Exists(ctx, hash); exists {
		return nil // Already stored
	}

	// Commit with hash as key
	return blob.CommitAs(hash)
func (*Storage) Put ¶ added in v0.1.2
Put stores a blob with the given key by reading from the provided reader. If the key already exists, it will be overwritten while preserving the original creation timestamp.
Why preserve createdAt: This maintains the original creation time even when a blob is updated, allowing tracking of when an object was first created vs. when it was last modified.
Why temp file: Writing to a temporary file first ensures atomicity - the blob either fully exists or doesn't, preventing partial writes from being visible.
func (*Storage) Walk ¶ added in v0.3.0
Walk calls fn for each blob whose key has the given prefix, in filesystem order. An empty prefix matches all blobs. Return filepath.SkipAll from fn to stop early without error.
Unlike Storage.List, Walk is synchronous and requires no cleanup.
Example:
	err := storage.Walk(ctx, "documents/", func(key string, meta *Meta, err error) error {
		if err != nil {
			return err
		}
		fmt.Println(key, meta.Size)
		return nil
	})
type WalkFn ¶ added in v0.3.0
WalkFn is the callback signature for Storage.Walk. Return filepath.SkipAll to stop iteration early without an error. Any other non-nil return value aborts the walk and is returned by Walk.