blob

package
v1.0.0 Latest
Published: Jan 21, 2026 License: Apache-2.0, MIT Imports: 25 Imported by: 0

Documentation

Overview

Package blob provides a file archive format optimized for random access via HTTP range requests against OCI registries.

Archives consist of two OCI blobs:

  • Index blob: FlatBuffers-encoded file metadata enabling O(log n) lookups
  • Data blob: Concatenated file contents, sorted by path for efficient directory fetches

The package implements fs.FS and related interfaces for stdlib compatibility.
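
For orientation, a minimal end-to-end sketch using the higher-level helpers described below (imports are omitted; the source directory and the "index.html" entry are assumptions, not part of this package):

func quickStart(ctx context.Context, srcDir, outDir string) error {
	// Build index.blob and data.blob under outDir from the files in srcDir.
	bf, err := blob.CreateBlob(ctx, srcDir, outDir)
	if err != nil {
		return err
	}
	defer bf.Close()

	// BlobFile embeds *Blob, so fs.ReadFileFS-style access works directly.
	data, err := bf.ReadFile("index.html")
	if err != nil {
		return err
	}
	fmt.Printf("index.html: %d bytes\n", len(data))
	return nil
}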

Index

Constants

const (
	CompressionNone = blobtype.CompressionNone
	CompressionZstd = blobtype.CompressionZstd
)

Compression constants re-exported from internal/blobtype.

const (
	DefaultIndexName = "index.blob"
	DefaultDataName  = "data.blob"
)

Default file names for blob archives.

const DefaultMaxFiles = 200_000

DefaultMaxFiles is the default limit used when no MaxFiles option is set.

Variables

var (
	// ErrHashMismatch is returned when file content does not match its hash.
	ErrHashMismatch = blobtype.ErrHashMismatch

	// ErrDecompression is returned when decompression fails.
	ErrDecompression = blobtype.ErrDecompression

	// ErrSizeOverflow is returned when byte counts exceed supported limits.
	ErrSizeOverflow = blobtype.ErrSizeOverflow
)

Sentinel errors re-exported from internal/blobtype.

var (
	// ErrSymlink is returned when a symlink is encountered where not allowed.
	ErrSymlink = errors.New("blob: symlink")

	// ErrTooManyFiles is returned when the file count exceeds the configured limit.
	ErrTooManyFiles = errors.New("blob: too many files")
)

Sentinel errors specific to the blob package.

var DefaultSkipCompression = write.DefaultSkipCompression

DefaultSkipCompression returns a SkipCompressionFunc that skips small files and known already-compressed extensions.

var EntryFromViewWithPath = blobtype.EntryFromViewWithPath

EntryFromViewWithPath creates an Entry from an EntryView with the given path.

Functions

func Create

func Create(ctx context.Context, dir string, indexW, dataW io.Writer, opts ...CreateOption) error

Create builds an archive from the contents of dir.

Files are written to the data writer in path-sorted order, enabling efficient directory fetches via single range requests. The index is written as a FlatBuffers-encoded blob to the index writer.

Create builds the entire index in memory; memory use scales with entry count and path length. Rough guide: ~30-50MB for 100k files with ~60B average paths (entries plus FlatBuffers buffer).

Create walks dir recursively, including all regular files. Empty directories are not preserved. Symbolic links are not followed.

The context can be used for cancellation of long-running archive creation.
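
A sketch of wiring Create to two file writers; the directory, output paths, and option values are illustrative:

func createToFiles(ctx context.Context, dir, indexPath, dataPath string) error {
	indexF, err := os.Create(indexPath)
	if err != nil {
		return err
	}
	defer indexF.Close()

	dataF, err := os.Create(dataPath)
	if err != nil {
		return err
	}
	defer dataF.Close()

	// Store with zstd compression and cap the archive at 50k files.
	return blob.Create(ctx, dir, indexF, dataF,
		blob.CreateWithCompression(blob.CompressionZstd),
		blob.CreateWithMaxFiles(50_000),
	)
}

For durable output, check the writers' Close errors rather than deferring them silently, or use CreateBlob, which manages the output files itself.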

Types

type Blob

type Blob struct {
	// contains filtered or unexported fields
}

Blob provides random access to archive files.

Blob implements fs.FS, fs.StatFS, fs.ReadFileFS, and fs.ReadDirFS for compatibility with the standard library.
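
Because of this, standard-library helpers work unchanged. For example, a sketch that lists every file with fs.WalkDir (b is any opened *Blob):

func listFiles(b *blob.Blob) error {
	return fs.WalkDir(b, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		fmt.Printf("%s\t%d bytes\n", path, info.Size())
		return nil
	})
}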

func New

func New(indexData []byte, source ByteSource, opts ...Option) (*Blob, error)

New creates a Blob for accessing files in the archive.

The indexData is the FlatBuffers-encoded index blob and source provides access to file content. Options can be used to configure size and decoder limits.

func (*Blob) CopyDir

func (b *Blob) CopyDir(destDir, prefix string, opts ...CopyOption) error

CopyDir extracts all files under a directory prefix to a destination.

If prefix is "" or ".", all files in the archive are extracted.

Files are written atomically using temp files and renames by default. CopyWithCleanDest clears the destination prefix and writes directly to the final path; this is faster but less safe.

Parent directories are created as needed.

By default:

  • Existing files are skipped (use CopyWithOverwrite to overwrite)
  • File modes and times are not preserved (use CopyWithPreserveMode/Times)
  • Range reads are pipelined (when beneficial) with concurrency 4 (use CopyWithReadConcurrency to change)
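
For example, a sketch that extracts an assumed "static" prefix while overriding two of these defaults:

func extractStatic(b *blob.Blob, destDir string) error {
	return b.CopyDir(destDir, "static",
		blob.CopyWithOverwrite(true),
		blob.CopyWithReadConcurrency(2),
	)
}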

func (*Blob) CopyTo

func (b *Blob) CopyTo(destDir string, paths ...string) error

CopyTo extracts specific files to a destination directory.

Parent directories are created as needed.

By default:

  • Existing files are skipped (use CopyWithOverwrite to overwrite)
  • File modes and times are not preserved (use CopyWithPreserveMode/Times)
  • Range reads are pipelined (when beneficial) with concurrency 4 (use CopyWithReadConcurrency to change)
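
A short sketch with hypothetical paths, relying on the defaults above:

func extractSelected(b *blob.Blob, destDir string) error {
	return b.CopyTo(destDir, "config/app.yaml", "bin/tool")
}

Use CopyToWithOptions when the same call needs CopyOption overrides.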

func (*Blob) CopyToWithOptions

func (b *Blob) CopyToWithOptions(destDir string, paths []string, opts ...CopyOption) error

CopyToWithOptions extracts specific files with options.

func (*Blob) DataHash

func (b *Blob) DataHash() ([]byte, bool)

DataHash returns the hash of the data blob bytes from the index. The returned slice aliases the index buffer and must be treated as immutable. ok is false when the index did not record data metadata.

func (*Blob) DataSize

func (b *Blob) DataSize() (uint64, bool)

DataSize returns the size of the data blob in bytes from the index. ok is false when the index did not record data metadata.

func (*Blob) Entries

func (b *Blob) Entries() iter.Seq[EntryView]

Entries returns an iterator over all entries as read-only views.

The returned views are only valid while the Blob remains alive.

func (*Blob) EntriesWithPrefix

func (b *Blob) EntriesWithPrefix(prefix string) iter.Seq[EntryView]

EntriesWithPrefix returns an iterator over entries with the given prefix as read-only views.

The returned views are only valid while the Blob remains alive.

func (*Blob) Entry

func (b *Blob) Entry(path string) (EntryView, bool)

Entry returns a read-only view of the entry for the given path.

The returned view is only valid while the Blob remains alive.

func (*Blob) IndexData

func (b *Blob) IndexData() []byte

IndexData returns the raw FlatBuffers-encoded index data. This is useful for creating new Blobs with different data sources.

func (*Blob) Len

func (b *Blob) Len() int

Len returns the number of entries in the archive.

func (*Blob) Open

func (b *Blob) Open(name string) (fs.File, error)

Open implements fs.FS.

Open returns an fs.File for reading the named file. The returned file verifies the content hash on Close (unless disabled by WithVerifyOnClose) and returns ErrHashMismatch if verification fails. Callers must read to EOF or Close to ensure integrity; partial reads may return unverified data.

When caching is enabled (via WithCache), cached content is verified while reading and may return ErrHashMismatch if the cache was corrupted.
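
A sketch of the read-to-EOF-then-Close pattern that keeps verification meaningful (the path is hypothetical):

func readVerified(b *blob.Blob, name string) ([]byte, error) {
	f, err := b.Open(name)
	if err != nil {
		return nil, err
	}
	data, err := io.ReadAll(f)
	if err != nil {
		f.Close()
		return nil, err
	}
	// Close re-checks the content hash unless WithVerifyOnClose(false) was set.
	if cerr := f.Close(); cerr != nil {
		if errors.Is(cerr, blob.ErrHashMismatch) {
			return nil, fmt.Errorf("corrupt content for %s: %w", name, cerr)
		}
		return nil, cerr
	}
	return data, nil
}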

func (*Blob) ReadDir

func (b *Blob) ReadDir(name string) ([]fs.DirEntry, error)

ReadDir implements fs.ReadDirFS.

ReadDir returns directory entries for the named directory, sorted by name. Directory entries are synthesized from file paths—the archive does not store directories explicitly.

func (*Blob) ReadFile

func (b *Blob) ReadFile(name string) ([]byte, error)

ReadFile implements fs.ReadFileFS.

ReadFile reads and returns the entire contents of the named file. The content is decompressed if necessary and verified against its hash.

When caching is enabled, concurrent calls for the same content are deduplicated using singleflight, preventing redundant network requests.

func (*Blob) Reader

func (b *Blob) Reader() *file.Reader

Reader returns the underlying file reader. This is useful for cached readers that need to share the decompression pool.

func (*Blob) Save

func (b *Blob) Save(indexPath, dataPath string) error

Save writes the blob archive to the specified paths.

Uses atomic writes (temp file + rename) to prevent partial writes on failure. Parent directories are created as needed.

func (*Blob) Size

func (b *Blob) Size() int64

Size returns the total size of the data blob in bytes.

func (*Blob) Stat

func (b *Blob) Stat(name string) (fs.FileInfo, error)

Stat implements fs.StatFS.

Stat returns file info for the named file without reading its content. For directories (paths that are prefixes of other entries), Stat returns synthetic directory info.

func (*Blob) Stream

func (b *Blob) Stream() io.Reader

Stream returns a reader that streams the entire data blob from beginning to end. This is useful for copying or transmitting the complete data content.
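
For instance, a sketch that mirrors the raw data blob into any io.Writer:

func mirrorData(dst io.Writer, b *blob.Blob) (int64, error) {
	return io.Copy(dst, b.Stream())
}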

type BlobFile

type BlobFile struct {
	*Blob
	// contains filtered or unexported fields
}

BlobFile wraps a Blob with its underlying data file handle. Close must be called to release file resources.

func CreateBlob

func CreateBlob(ctx context.Context, srcDir, destDir string, opts ...CreateBlobOption) (*BlobFile, error)

CreateBlob creates a blob archive from srcDir and writes it to destDir.

By default, files are named "index.blob" and "data.blob". Use CreateBlobWithIndexName and CreateBlobWithDataName to override.

Returns a BlobFile that must be closed to release file handles.
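
A sketch with assumed directories and non-default blob names:

func buildArchive(ctx context.Context, srcDir, destDir string) (*blob.BlobFile, error) {
	// The caller is responsible for closing the returned BlobFile.
	return blob.CreateBlob(ctx, srcDir, destDir,
		blob.CreateBlobWithCompression(blob.CompressionZstd),
		blob.CreateBlobWithIndexName("assets-index.blob"),
		blob.CreateBlobWithDataName("assets-data.blob"),
	)
}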

func OpenFile

func OpenFile(indexPath, dataPath string, opts ...Option) (*BlobFile, error)

OpenFile opens a blob archive from index and data files.

The index file is read into memory; the data file is opened for random access. The returned BlobFile must be closed to release file resources.
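
A sketch that opens an archive from assumed paths and stats one entry:

func openExisting() error {
	bf, err := blob.OpenFile("./archive/index.blob", "./archive/data.blob")
	if err != nil {
		return err
	}
	defer bf.Close()

	info, err := bf.Stat("docs/readme.md")
	if err != nil {
		return err
	}
	fmt.Println(info.Name(), info.Size())
	return nil
}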

func (*BlobFile) Close

func (bf *BlobFile) Close() error

Close closes the underlying data file.

type ByteSource

type ByteSource interface {
	io.ReaderAt
	Size() int64
	SourceID() string
}

ByteSource provides random access to the data blob.

Implementations exist for local files (*os.File) and HTTP range requests. SourceID must return a stable identifier for the underlying content.
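
A sketch of a custom in-memory ByteSource built on bytes.Reader; memSource and openInMemory are illustrative names, not part of this package:

// memSource serves a data blob held entirely in memory.
type memSource struct {
	r  *bytes.Reader
	id string
}

func (m *memSource) ReadAt(p []byte, off int64) (int, error) { return m.r.ReadAt(p, off) }
func (m *memSource) Size() int64                             { return m.r.Size() }
func (m *memSource) SourceID() string                        { return m.id }

// openInMemory wires the source into New; indexData holds the raw index blob.
func openInMemory(indexData, dataBytes []byte) (*blob.Blob, error) {
	src := &memSource{r: bytes.NewReader(dataBytes), id: "mem:example"}
	return blob.New(indexData, src)
}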

type ChangeDetection

type ChangeDetection uint8

ChangeDetection controls how strictly file changes are detected during creation.

const (
	// ChangeDetectionNone disables file change detection during archive creation.
	ChangeDetectionNone ChangeDetection = iota
	// ChangeDetectionStrict verifies files did not change during archive creation.
	ChangeDetectionStrict
)

Change detection modes.

type Compression

type Compression = blobtype.Compression

Compression identifies the compression algorithm used for a file.

type CopyOption

type CopyOption func(*copyConfig)

CopyOption configures CopyTo and CopyDir operations.

func CopyWithCleanDest

func CopyWithCleanDest(enabled bool) CopyOption

CopyWithCleanDest clears the destination prefix before copying and writes directly to the final path (no temp files). This is only supported by CopyDir.

func CopyWithOverwrite

func CopyWithOverwrite(overwrite bool) CopyOption

CopyWithOverwrite allows overwriting existing files. By default, existing files are skipped.

func CopyWithPreserveMode

func CopyWithPreserveMode(preserve bool) CopyOption

CopyWithPreserveMode preserves file permission modes from the archive. By default, modes are not preserved (files use umask defaults).

func CopyWithPreserveTimes

func CopyWithPreserveTimes(preserve bool) CopyOption

CopyWithPreserveTimes preserves file modification times from the archive. By default, times are not preserved (files use current time).

func CopyWithReadAheadBytes

func CopyWithReadAheadBytes(limit uint64) CopyOption

CopyWithReadAheadBytes caps the total size of buffered group data. A value of 0 disables the byte budget.

func CopyWithReadConcurrency

func CopyWithReadConcurrency(n int) CopyOption

CopyWithReadConcurrency sets the number of concurrent range reads. Use 1 to force serial reads. Zero uses the default concurrency (4).

func CopyWithWorkers

func CopyWithWorkers(n int) CopyOption

CopyWithWorkers sets the number of workers for parallel processing. Values < 0 force serial processing. Zero uses automatic heuristics. Values > 0 force a specific worker count.

type CreateBlobOption

type CreateBlobOption func(*createBlobConfig)

CreateBlobOption configures CreateBlob.

func CreateBlobWithChangeDetection

func CreateBlobWithChangeDetection(cd ChangeDetection) CreateBlobOption

CreateBlobWithChangeDetection sets the change detection mode.

func CreateBlobWithCompression

func CreateBlobWithCompression(compression Compression) CreateBlobOption

CreateBlobWithCompression sets the compression algorithm.

func CreateBlobWithDataName

func CreateBlobWithDataName(name string) CreateBlobOption

CreateBlobWithDataName sets the data file name (default: "data.blob").

func CreateBlobWithIndexName

func CreateBlobWithIndexName(name string) CreateBlobOption

CreateBlobWithIndexName sets the index file name (default: "index.blob").

func CreateBlobWithMaxFiles

func CreateBlobWithMaxFiles(n int) CreateBlobOption

CreateBlobWithMaxFiles limits the number of files in the archive.

func CreateBlobWithSkipCompression

func CreateBlobWithSkipCompression(fns ...SkipCompressionFunc) CreateBlobOption

CreateBlobWithSkipCompression adds skip compression predicates.

type CreateOption

type CreateOption func(*createConfig)

CreateOption configures archive creation via the Create function.

func CreateWithChangeDetection

func CreateWithChangeDetection(cd ChangeDetection) CreateOption

CreateWithChangeDetection controls whether the writer verifies files did not change during archive creation. The zero value disables change detection to reduce syscalls; enable ChangeDetectionStrict for stronger guarantees.

func CreateWithCompression

func CreateWithCompression(c Compression) CreateOption

CreateWithCompression sets the compression algorithm to use. Use CompressionNone to store files uncompressed, CompressionZstd for zstd.

func CreateWithLogger

func CreateWithLogger(logger *slog.Logger) CreateOption

CreateWithLogger sets the logger for archive creation. If not set, logging is disabled.

func CreateWithMaxFiles

func CreateWithMaxFiles(n int) CreateOption

CreateWithMaxFiles limits the number of files included in the archive. Zero uses DefaultMaxFiles. Negative means no limit.

func CreateWithSkipCompression

func CreateWithSkipCompression(fns ...SkipCompressionFunc) CreateOption

CreateWithSkipCompression adds predicates that decide to store a file uncompressed. If any predicate returns true, compression is skipped for that file. These checks are on the hot path, so keep them cheap.

type Entry

type Entry = blobtype.Entry

Entry represents a file in the archive.

type EntryView

type EntryView = blobtype.EntryView

EntryView provides a read-only view of an index entry.

type File

type File interface {
	fs.File
	io.ReaderAt
}

File represents an archive file with optional random access. ReadAt is only supported for uncompressed entries.

type IndexView

type IndexView struct {
	// contains filtered or unexported fields
}

IndexView provides read-only access to archive file metadata.

It exposes index iteration and lookup without requiring the data blob to be available. This is useful for inspecting archive contents before deciding to download file data.

IndexView methods mirror those on Blob for consistency.

func NewIndexView

func NewIndexView(indexData []byte) (*IndexView, error)

NewIndexView creates an IndexView from raw FlatBuffers-encoded index data.

The provided data is retained by the IndexView; callers must not modify it after calling NewIndexView.
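
A sketch that inspects an index fetched on its own (indexData is assumed to hold the raw index blob):

func describeIndex(indexData []byte) error {
	v, err := blob.NewIndexView(indexData)
	if err != nil {
		return err
	}
	fmt.Printf("index format v%d, %d files\n", v.Version(), v.Len())
	if size, ok := v.DataSize(); ok {
		fmt.Printf("data blob: %d bytes\n", size)
	}
	return nil
}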

func (*IndexView) DataHash

func (v *IndexView) DataHash() ([]byte, bool)

DataHash returns the SHA256 hash of the data blob. The returned slice aliases the index buffer and must be treated as immutable. ok is false when the index did not record data metadata.

func (*IndexView) DataSize

func (v *IndexView) DataSize() (uint64, bool)

DataSize returns the size of the data blob in bytes. ok is false when the index did not record data metadata.

func (*IndexView) Entries

func (v *IndexView) Entries() iter.Seq[blobtype.EntryView]

Entries returns an iterator over all file entries.

The returned views are only valid while the IndexView remains alive.

func (*IndexView) EntriesWithPrefix

func (v *IndexView) EntriesWithPrefix(prefix string) iter.Seq[blobtype.EntryView]

EntriesWithPrefix returns an iterator over entries with the given prefix.

The returned views are only valid while the IndexView remains alive.

func (*IndexView) Entry

func (v *IndexView) Entry(path string) (blobtype.EntryView, bool)

Entry returns a read-only view of the entry for the given path.

The returned view is only valid while the IndexView remains alive.

func (*IndexView) IndexData

func (v *IndexView) IndexData() []byte

IndexData returns the raw FlatBuffers-encoded index. This is useful for caching or transmitting the index.

func (*IndexView) Len

func (v *IndexView) Len() int

Len returns the number of files in the archive.

func (*IndexView) Version

func (v *IndexView) Version() uint32

Version returns the index format version.

type Option

type Option func(*Blob)

Option configures a Blob.

func WithCache

func WithCache(c cache.Cache) Option

WithCache enables content-addressed caching.

When enabled, file content is cached after first read and served from cache on subsequent reads. Concurrent requests for the same content are deduplicated.

func WithDecoderConcurrency

func WithDecoderConcurrency(n int) Option

WithDecoderConcurrency sets the zstd decoder concurrency (default: 1). Values < 0 are treated as 0 (use GOMAXPROCS).

func WithDecoderLowmem

func WithDecoderLowmem(enabled bool) Option

WithDecoderLowmem sets whether the zstd decoder should use low-memory mode (default: false).

func WithLogger

func WithLogger(logger *slog.Logger) Option

WithLogger sets the logger for blob operations. If not set, logging is disabled.

func WithMaxDecoderMemory

func WithMaxDecoderMemory(limit uint64) Option

WithMaxDecoderMemory limits the maximum memory used by the zstd decoder. Set limit to 0 to disable the limit.

func WithMaxFileSize

func WithMaxFileSize(limit uint64) Option

WithMaxFileSize limits the maximum per-file size (compressed and uncompressed). Set limit to 0 to disable the limit.

func WithVerifyOnClose

func WithVerifyOnClose(enabled bool) Option

WithVerifyOnClose controls whether Close drains the file to verify the hash.

When false, Close returns without reading the remaining data. Integrity is only guaranteed when callers read to EOF.

type SkipCompressionFunc

type SkipCompressionFunc = write.SkipCompressionFunc

SkipCompressionFunc returns true when a file should be stored uncompressed. It is called once per file and should be inexpensive.

Directories

Path	Synopsis
cache	Package cache provides content-addressed caching for blob archives.
cache/disk	Package disk provides a disk-backed cache implementation.
http	Package http provides a ByteSource backed by HTTP range requests.
internal
internal/batch	Package batch provides batch processing for reading multiple entries from a blob archive.
internal/blobtype	Package blobtype defines shared types used across the blob package and its internal packages.
internal/fb	Package fb contains FlatBuffers-generated code for the blob index format.
internal/file	Package file provides internal file reading operations for the blob package.
internal/index	Package index provides FlatBuffers index loading and lookup for blob archives.
internal/platform	Package platform provides platform-specific file operations.
internal/sizing	Package sizing provides safe size arithmetic and conversions to prevent overflow.
internal/write	Package write provides internal file writing operations for the blob package.
testutil	Package testutil provides test utilities for the blob package.
