blobfs

package module
v0.3.1
Published: Mar 1, 2026 License: MIT Imports: 17 Imported by: 0

README

blobfs

A simple blob storage library for Go that stores files with keys on disk.

Note: This is a personal project for my own use. It works well for my needs, but it's not battle-tested for production environments.

What it does

This library stores files (blobs) on disk using keys you provide. The keys are hashed to create a directory structure that keeps your filesystem organized, even with thousands of files.

Important: This is not a content-addressable storage (CAS). Your keys are not based on file content. Instead, the keys you provide are hashed only for creating an organized directory structure (sharding). You can use any key you want, like "documents/invoice.pdf" or "user123/avatar.jpg".

However, this library can be used as a foundation to build:

  • A content-addressable storage (by using content hashes as keys)
  • Any other key-value blob storage you need

Installation

go get github.com/alexjoedt/blobfs

Basic Usage

Simple example
package main

import (
    "context"
    "fmt"
    "io"
    "log"
    "strings"

    "github.com/alexjoedt/blobfs"
)

func main() {
    ctx := context.Background()

    // Create storage in ./data directory
    storage, err := blobfs.NewStorage("./data")
    if err != nil {
        log.Fatal(err)
    }

    // Store a blob
    content := strings.NewReader("Hello, World!")
    err = storage.Put(ctx, "greetings/hello.txt", content)
    if err != nil {
        log.Fatal(err)
    }

    // Retrieve a blob
    reader, err := storage.Get(ctx, "greetings/hello.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer reader.Close()

    // Read the content
    data, err := io.ReadAll(reader)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(data)) // Output: Hello, World!
}
Example with options
package main

import (
    "context"
    "log"
    "strings"

    "github.com/alexjoedt/blobfs"
)

func main() {
    ctx := context.Background()
    
    // Create storage with custom options
    storage, err := blobfs.NewStorage("./data",
        blobfs.WithFileMode(0600),              // Only owner can read/write
        blobfs.WithDirMode(0700),               // Only owner can access directories
        blobfs.WithShardFunc(blobfs.BucketShardFunc), // Use bucket-style sharding
    )
    if err != nil {
        log.Fatal(err)
    }
    
    // Store multiple blobs
    blobs := map[string]string{
        "users/alice/profile.json": `{"name": "Alice"}`,
        "users/bob/profile.json":   `{"name": "Bob"}`,
        "docs/readme.md":           "# Documentation",
    }
    
    for key, content := range blobs {
        err := storage.Put(ctx, key, strings.NewReader(content))
        if err != nil {
            log.Printf("Failed to store %s: %v", key, err)
        }
    }
    
    // Walk all blobs with "users/" prefix
    err = storage.Walk(ctx, "users/", func(key string, meta *blobfs.Meta, err error) error {
        if err != nil {
            return err
        }
        log.Printf("Found: %s, Size: %d bytes", meta.Key, meta.Size)
        return nil
    })
    if err != nil {
        log.Fatal(err)
    }
}
Walk all blobs (no prefix filter)
err := storage.Walk(ctx, "", func(key string, meta *blobfs.Meta, err error) error {
    if err != nil {
        return err // or return nil to skip corrupted entries
    }
    fmt.Println(key, meta.Size)
    return nil
})

Return filepath.SkipAll from the callback to stop iteration early without an error.

API

Method Description
Put(ctx, key, reader) Store a blob
Get(ctx, key) Retrieve a blob as io.ReadCloser
Delete(ctx, key) Delete a blob
Stat(ctx, key) Read metadata without fetching content
Exists(ctx, key) Check whether a blob exists
Walk(ctx, prefix, fn) Iterate blobs matching a prefix via callback

Deprecated: List is deprecated in favour of Walk and will be removed in a future version.

Available Options

  • WithFileMode(mode) - Set file permissions (default: 0644)
  • WithDirMode(mode) - Set directory permissions (default: 0755)
  • WithShardFunc(fn) - Set custom sharding strategy (see below)

Sharding Strategies

The library includes several sharding functions to organize your files:

  • DefaultShardFunc - Two-level hash-based sharding (e.g., bb/4d/bb4de5c4...)
  • BucketShardFunc - Extracts a bucket from the key's first path segment, then hash-shards within it (e.g., users/avatar.jpg → users/a3/f2/a3f29d4e8c...)

You can also write your own ShardFunc to customize how files are organized.

License

MIT

Documentation

Overview

Package blobfs provides a key-based blob storage system. Keys are hashed with SHA-256 to derive sharded storage paths, and a SHA-256 hash of each blob's content is recorded in its metadata for integrity checks. The storage is not content-addressable by default, but it can serve as a foundation for CAS by committing blobs under their content hash.

Design

The storage uses a two-level directory structure based on the SHA-256 hash of the blob key, which prevents filesystem limitations with too many files in a single directory while maintaining fast lookups.

Why hash-based paths: Keys are hashed to create storage paths, which provides:

  • Uniform distribution of files across directories
  • Protection against path traversal attacks
  • Predictable storage locations

Why separate metadata: Metadata is stored in a separate JSON file to allow atomic updates and queries without reading the blob content.

Usage

Basic operations:

storage, err := NewStorage("/data/blobs")
if err != nil {
	return err
}

// Store a blob
err = storage.Put(ctx, "documents/invoice.pdf", reader)

// Retrieve a blob
rc, err := storage.Get(ctx, "documents/invoice.pdf")
defer rc.Close()

// Walk blobs with prefix
err = storage.Walk(ctx, "documents/", func(key string, meta *Meta, err error) error {
	if err != nil {
		return err
	}
	// process metadata...
	return nil
})

Concurrency

All operations are safe for concurrent use. Multiple goroutines may call methods on the same Storage instance simultaneously.

Error Handling

All methods return errors that can be unwrapped using errors.Is and errors.As for standard error types like os.ErrNotExist.

Index

Constants

This section is empty.

Variables

View Source
var (
	ErrBlobClosed   = errors.New("blob is closed")
	ErrBlobNotReady = errors.New("blob not ready for finalization")
)
View Source
var (
	ErrKeyLengthExceeds = errors.New("maximal key length exceeds")
	ErrNotFound         = errors.New("blob with key not found")
	ErrEmptyKey         = errors.New("key cannot be empty")
	ErrInvalidKey       = errors.New("key contains invalid characters")
)

Functions

func BucketShardFunc

func BucketShardFunc(key string) string

BucketShardFunc organizes blobs into buckets based on key prefix. Extracts the first path segment as a bucket name, then applies two-level hash sharding within that bucket: "bucket/a3/f2/a3f29d4e8c..."

Keys without "/" are placed in the "misc" bucket. Examples:

"users/avatar.jpg" → "users/a3/f2/a3f29d4e8c..."
"document.pdf"     → "misc/b7/e4/b7e4c2a1f8..."

func DefaultShardFunc

func DefaultShardFunc(key string) string

DefaultShardFunc is the default two-level sharding strategy. Uses first 2 bytes of SHA-256 hash for two-level directory structure: "a3/f2/a3f29d4e8c..."

Why two levels: Creates 256 * 256 = 65,536 possible directories, preventing filesystem performance degradation with large blob counts.
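The scheme can be sketched with the standard library (shardPath is illustrative, not the package's actual code):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// shardPath derives two directory levels from the first two bytes of
// the key's SHA-256 hash, as described above.
func shardPath(key string) string {
	sum := sha256.Sum256([]byte(key))
	h := hex.EncodeToString(sum[:])
	return filepath.Join(h[:2], h[2:4], h)
}

func main() {
	fmt.Println(shardPath("users/avatar.jpg")) // <2 hex>/<2 hex>/<64 hex>
	// Each level has 256 possible values, so 256 * 256 leaf directories.
	fmt.Println(256 * 256) // 65536
}
```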

Types

type Blob

type Blob struct {
	// contains filtered or unexported fields
}

Blob represents a writable blob in the storage. It implements io.Writer and io.Closer, allowing it to be used with io.Copy and other standard Go interfaces.

Why separate temp file: Using a temporary file during writes allows atomic creation - the blob either fully exists or doesn't, preventing partial writes from being visible. This is critical for data consistency.

Why hash during write: Computing the hash while writing avoids reading the entire file again after writing, which would be inefficient for large files.

Example usage:

blob, err := storage.NewBlob()
if err != nil {
	return err
}
defer blob.Discard() // Safety: cleanup if not committed

if _, err = io.Copy(blob, sourceReader); err != nil {
	return err
}

// Commit with specific key (e.g., content hash for CAS)
return blob.CommitAs(blob.Hash())

func (*Blob) Close

func (b *Blob) Close() error

Close is an alias for Discard(). It closes the blob and removes the temporary file without committing. To persist the blob, use CommitAs() instead.

This allows Blob to satisfy io.Closer for compatibility with defer patterns, but does NOT commit the blob to storage.

func (*Blob) Closed

func (b *Blob) Closed() bool

Closed returns true if the blob has been closed or committed.

func (*Blob) CommitAs

func (b *Blob) CommitAs(key string) error

CommitAs finalizes the blob by committing it with the specified key. The blob is atomically moved from the temporary location to the final storage location.

This method:

  1. Validates the key
  2. Detects content type from buffered data
  3. Creates metadata
  4. Atomically moves temp file to final location

Returns ErrEmptyKey if key is empty, or ErrBlobClosed if already closed/committed. After successful commit, the blob is closed and cannot be reused.

Why atomic move: os.Rename is atomic on most filesystems, ensuring the blob either fully exists with metadata or doesn't exist at all. This prevents other processes from reading partial or inconsistent data.

func (*Blob) Discard

func (b *Blob) Discard() error

Discard closes the blob and removes the temporary file without committing. This is safe to call even if the blob has already been closed or committed. Idempotent - safe to call multiple times.

func (*Blob) Hash

func (b *Blob) Hash() string

Hash returns the computed SHA-256 hash of the blob content as a hex string. This can be called before committing to determine the content hash. Returns empty string if no data has been written yet.

func (*Blob) Meta added in v0.1.2

func (b *Blob) Meta() *Meta

Meta returns the metadata of the committed blob. Returns nil if the blob has not been successfully committed yet. This allows access to metadata without re-reading the meta file.

func (*Blob) Size

func (b *Blob) Size() int64

Size returns the current size of the blob in bytes. This can be called before committing to get the current written size.

func (*Blob) Write

func (b *Blob) Write(p []byte) (n int, err error)

Write implements io.Writer, writing data to the blob. The first 512 bytes are buffered for content type detection.

Why buffer first 512 bytes: http.DetectContentType needs up to 512 bytes to accurately determine MIME type from file signatures, but we don't want to buffer the entire file in memory for large files.

type BlobResult deprecated

type BlobResult struct {
	// contains filtered or unexported fields
}

BlobResult provides an iterator over blob storage entries. It follows the standard Go iterator pattern for streaming results.

Why streaming: Loading all blobs into memory at once would be inefficient for large storages. The iterator pattern allows processing blobs one at a time.

Why context in struct: While storing context in a struct is generally an anti-pattern, here it's necessary for the iterator to respect cancellation throughout its lifetime, not just at creation time.

Deprecated: Use Storage.Walk instead. BlobResult and List will be removed in a future version.

func (*BlobResult) Close

func (br *BlobResult) Close() error

Close stops the iteration and releases resources. It's safe to call multiple times.

func (*BlobResult) Err

func (br *BlobResult) Err() error

Err returns any error that occurred during iteration. Should be checked after Next() returns false.

func (*BlobResult) Key

func (br *BlobResult) Key() string

Key returns the current blob's key. Only valid after Next() returns true.

func (*BlobResult) Meta

func (br *BlobResult) Meta() *Meta

Meta returns the current blob's metadata. Only valid after Next() returns true.

func (*BlobResult) Next

func (br *BlobResult) Next() bool

Next advances the iterator to the next blob. Returns true if there is a blob available, false if iteration is complete or an error occurred.

type ID

type ID [12]byte

ID represents a 12-byte unique identifier similar to MongoDB's ObjectID.

type Meta

type Meta struct {
	Key         string    `json:"key"`         // Original user-provided key
	Size        int64     `json:"size"`        // Size in bytes
	Sha256      string    `json:"sha256"`      // SHA-256 hash of content
	ContentType string    `json:"contentType"` // MIME type detected from content
	CreatedAt   time.Time `json:"createdAt"`   // Original creation timestamp (preserved on update)
	ModifiedAt  time.Time `json:"modifiedAt"`  // Last modification timestamp
}

Meta contains metadata about a stored blob. This is stored separately from the blob data to enable efficient metadata queries without reading the blob content.

type OptionFunc

type OptionFunc func(opts *Options)

OptionFunc is a functional option for configuring Storage.

func WithBlobDir added in v0.2.0

func WithBlobDir(dir string) OptionFunc

WithBlobDir sets the subdirectory name for blob storage within the root directory. An empty string uses the root directory directly without a subdirectory. Default is "blobs".

Examples:

// Use custom subdirectory
WithBlobDir("objects")

// Store blobs directly in root (no subdirectory)
WithBlobDir("")

func WithDirMode

func WithDirMode(mode os.FileMode) OptionFunc

WithDirMode sets the directory permission mode for storage directories. Default is 0755 (owner read/write/execute, group and others read/execute).

func WithFileMode

func WithFileMode(mode os.FileMode) OptionFunc

WithFileMode sets the file permission mode for blob data files. Default is 0644 (owner read/write, group and others read-only).

func WithShardFunc

func WithShardFunc(fn ShardFunc) OptionFunc

WithShardFunc sets a custom sharding function for generating storage paths. The function receives a key and returns a relative path (without filename).

Default sharding: Two-level SHA-256 hash (e.g., "blobs/a3/f2/a3f29d4e8c...")

Example custom sharding by date:

WithShardFunc(func(key string) string {
    now := time.Now()
    hash := sha256.Sum256([]byte(key))
    hexHash := hex.EncodeToString(hash[:])
    return filepath.Join("blobs", now.Format("2006/01/02"), hexHash)
})

Example flat sharding (single directory per hash):

WithShardFunc(func(key string) string {
    hash := sha256.Sum256([]byte(key))
    hexHash := hex.EncodeToString(hash[:])
    return filepath.Join("blobs", hexHash)
})

type Options

type Options struct {
	FileMode  os.FileMode // Permission bits for blob data files
	DirMode   os.FileMode // Permission bits for directories
	ShardFunc ShardFunc   // Function to generate storage paths from keys
	BlobDir   string      // Subdirectory for blob storage (empty string for root)
}

Options configures Storage behavior.

type ShardFunc

type ShardFunc func(key string) string

ShardFunc is a function that generates a storage path from a key. It receives the key and returns the relative path where the blob should be stored (without the filename). The returned path will be joined with the root directory.

The function should:

  • Return a path relative to root (e.g., "blobs/a3/f2/a3f29d4e8c...")
  • Create a deterministic path based on the key
  • Distribute keys evenly to avoid filesystem hotspots
  • Be safe from path traversal attacks

Example: For key "users/avatar.jpg", might return "blobs/a3/f2/a3f29d4e8c..."

type Storage added in v0.1.2

type Storage struct {
	// contains filtered or unexported fields
}

func NewStorage

func NewStorage(root string, opts ...OptionFunc) (*Storage, error)

func (*Storage) Delete added in v0.1.2

func (bs *Storage) Delete(ctx context.Context, key string) error

func (*Storage) Exists added in v0.1.2

func (bs *Storage) Exists(ctx context.Context, key string) (bool, error)

func (*Storage) Get added in v0.1.2

func (bs *Storage) Get(ctx context.Context, key string) (io.ReadCloser, error)

func (*Storage) List deprecated added in v0.1.2

func (bs *Storage) List(ctx context.Context, prefix string) *BlobResult

List returns an iterator over all blobs matching the given prefix. The prefix is matched against the original blob keys, not the hashed storage paths. An empty prefix matches all blobs.

Deprecated: Use Storage.Walk instead. List will be removed in a future version.

The iterator must be closed when done to prevent resource leaks:

iter := storage.List(ctx, "prefix/")
defer iter.Close()
for iter.Next() {
    meta := iter.Meta()
    // process meta...
}
if err := iter.Err(); err != nil {
    // handle error
}

func (*Storage) NewBlob added in v0.1.2

func (bs *Storage) NewBlob() (*Blob, error)

NewBlob creates a new writable blob with a temporary internal ID. The blob must be explicitly committed with CommitAs(key) to persist it, or discarded with Discard() or Close() to clean up the temporary file.

The blob automatically:

  • Creates a temporary file for writing
  • Computes SHA-256 hash while writing
  • Detects content type from the first 512 bytes

Example usage:

blob, err := storage.NewBlob()
if err != nil {
	return err
}
defer blob.Discard() // Safety: cleanup if we don't commit

if _, err := io.Copy(blob, reader); err != nil {
	return err
}
hash := blob.Hash()

// Check if blob with this hash already exists
if exists, _ := storage.Exists(ctx, hash); exists {
	return nil // Already stored
}

// Commit with hash as key
return blob.CommitAs(hash)

func (*Storage) Put added in v0.1.2

func (bs *Storage) Put(ctx context.Context, key string, r io.Reader) error

Put stores a blob with the given key by reading from the provided reader. If the key already exists, it will be overwritten while preserving the original creation timestamp.

Why preserve createdAt: This maintains the original creation time even when a blob is updated, allowing tracking of when an object was first created vs. when it was last modified.

Why temp file: Writing to a temporary file first ensures atomicity - the blob either fully exists or doesn't, preventing partial writes from being visible.

func (*Storage) Stat added in v0.1.2

func (bs *Storage) Stat(ctx context.Context, key string) (*Meta, error)

func (*Storage) Walk added in v0.3.0

func (bs *Storage) Walk(ctx context.Context, prefix string, fn WalkFn) error

Walk calls fn for each blob whose key has the given prefix, in filesystem order. An empty prefix matches all blobs. Return filepath.SkipAll from fn to stop early without error.

Unlike Storage.List, Walk is synchronous and requires no cleanup.

Example:

err := storage.Walk(ctx, "documents/", func(key string, meta *Meta, err error) error {
	if err != nil {
		return err
	}
	fmt.Println(key, meta.Size)
	return nil
})

type WalkFn added in v0.3.0

type WalkFn func(key string, meta *Meta, err error) error

WalkFn is the callback signature for Storage.Walk. Return filepath.SkipAll to stop iteration early without an error. Any other non-nil return value aborts the walk and is returned by Walk.
