blob

package
v1.0.0 Latest
Published: Jan 21, 2026 License: Apache-2.0, MIT Imports: 25 Imported by: 0

Documentation

Overview

Package blob provides a file archive format optimized for random access via HTTP range requests against OCI registries.

Archives consist of two OCI blobs:

  • Index blob: FlatBuffers-encoded file metadata enabling O(log n) lookups
  • Data blob: Concatenated file contents, sorted by path for efficient directory fetches

The package implements fs.FS and related interfaces for stdlib compatibility.
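
For orientation, a minimal end-to-end sketch using the higher-level helpers described below (imports are omitted; the source directory and the "index.html" entry are assumptions, not part of this package):

func quickStart(ctx context.Context, srcDir, outDir string) error {
	// Build index.blob and data.blob under outDir from the files in srcDir.
	bf, err := blob.CreateBlob(ctx, srcDir, outDir)
	if err != nil {
		return err
	}
	defer bf.Close()

	// BlobFile embeds *Blob, so fs.ReadFileFS-style access works directly.
	data, err := bf.ReadFile("index.html")
	if err != nil {
		return err
	}
	fmt.Printf("index.html: %d bytes\n", len(data))
	return nil
}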

Index

Constants

const (
	CompressionNone = blobtype.CompressionNone
	CompressionZstd = blobtype.CompressionZstd
)

Compression constants re-exported from internal/blobtype.

const (
	DefaultIndexName = "index.blob"
	DefaultDataName  = "data.blob"
)

Default file names for blob archives.

const DefaultMaxFiles = 200_000

DefaultMaxFiles is the default limit used when no MaxFiles option is set.

Variables

var (
	// ErrHashMismatch is returned when file content does not match its hash.
	ErrHashMismatch = blobtype.ErrHashMismatch

	// ErrDecompression is returned when decompression fails.
	ErrDecompression = blobtype.ErrDecompression

	// ErrSizeOverflow is returned when byte counts exceed supported limits.
	ErrSizeOverflow = blobtype.ErrSizeOverflow
)

Sentinel errors re-exported from internal/blobtype.

var (
	// ErrSymlink is returned when a symlink is encountered where not allowed.
	ErrSymlink = errors.New("blob: symlink")

	// ErrTooManyFiles is returned when the file count exceeds the configured limit.
	ErrTooManyFiles = errors.New("blob: too many files")
)

Sentinel errors specific to the blob package.

var DefaultSkipCompression = write.DefaultSkipCompression

DefaultSkipCompression returns a SkipCompressionFunc that skips small files and known already-compressed extensions.

var EntryFromViewWithPath = blobtype.EntryFromViewWithPath

EntryFromViewWithPath creates an Entry from an EntryView with the given path.

Functions

func Create

func Create(ctx context.Context, dir string, indexW, dataW io.Writer, opts ...CreateOption) error

Create builds an archive from the contents of dir.

Files are written to the data writer in path-sorted order, enabling efficient directory fetches via single range requests. The index is written as a FlatBuffers-encoded blob to the index writer.

Create builds the entire index in memory; memory use scales with entry count and path length. Rough guide: ~30-50MB for 100k files with ~60B average paths (entries plus FlatBuffers buffer).

Create walks dir recursively, including all regular files. Empty directories are not preserved. Symbolic links are not followed.

The context can be used for cancellation of long-running archive creation.
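
A sketch of wiring Create to two file writers; the directory, output paths, and option values are illustrative:

func createToFiles(ctx context.Context, dir, indexPath, dataPath string) error {
	indexF, err := os.Create(indexPath)
	if err != nil {
		return err
	}
	defer indexF.Close()

	dataF, err := os.Create(dataPath)
	if err != nil {
		return err
	}
	defer dataF.Close()

	// Store with zstd compression and cap the archive at 50k files.
	return blob.Create(ctx, dir, indexF, dataF,
		blob.CreateWithCompression(blob.CompressionZstd),
		blob.CreateWithMaxFiles(50_000),
	)
}

For durable output, check the writers' Close errors rather than deferring them silently, or use CreateBlob, which manages the output files itself.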

Types

type Blob

type Blob struct {
	// contains filtered or unexported fields
}

Blob provides random access to archive files.

Blob implements fs.FS, fs.StatFS, fs.ReadFileFS, and fs.ReadDirFS for compatibility with the standard library.
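
Because of this, standard-library helpers work unchanged. For example, a sketch that lists every file with fs.WalkDir (b is any opened *Blob):

func listFiles(b *blob.Blob) error {
	return fs.WalkDir(b, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		fmt.Printf("%s\t%d bytes\n", path, info.Size())
		return nil
	})
}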

func New

func New(indexData []byte, source ByteSource, opts ...Option) (*Blob, error)

New creates a Blob for accessing files in the archive.

The indexData is the FlatBuffers-encoded index blob and source provides access to file content. Options can be used to configure size and decoder limits.

func (*Blob) CopyDir

func (b *Blob) CopyDir(destDir, prefix string, opts ...CopyOption) error

CopyDir extracts all files under a directory prefix to a destination.

If prefix is "" or ".", all files in the archive are extracted.

Files are written atomically using temp files and renames by default. CopyWithCleanDest clears the destination prefix and writes directly to the final path; this is faster but less safe.

Parent directories are created as needed.

By default:

  • Existing files are skipped (use CopyWithOverwrite to overwrite)
  • File modes and times are not preserved (use CopyWithPreserveMode/Times)
  • Range reads are pipelined (when beneficial) with concurrency 4 (use CopyWithReadConcurrency to change)
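
For example, a sketch that extracts an assumed "static" prefix while overriding two of these defaults:

func extractStatic(b *blob.Blob, destDir string) error {
	return b.CopyDir(destDir, "static",
		blob.CopyWithOverwrite(true),
		blob.CopyWithReadConcurrency(2),
	)
}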

func (*Blob) CopyTo

func (b *Blob) CopyTo(destDir string, paths ...string) error

CopyTo extracts specific files to a destination directory.

Parent directories are created as needed.

By default:

  • Existing files are skipped (use CopyWithOverwrite to overwrite)
  • File modes and times are not preserved (use CopyWithPreserveMode/Times)
  • Range reads are pipelined (when beneficial) with concurrency 4 (use CopyWithReadConcurrency to change)
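
A short sketch with hypothetical paths, relying on the defaults above:

func extractSelected(b *blob.Blob, destDir string) error {
	return b.CopyTo(destDir, "config/app.yaml", "bin/tool")
}

Use CopyToWithOptions when the same call needs CopyOption overrides.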

func (*Blob) CopyToWithOptions

func (b *Blob) CopyToWithOptions(destDir string, paths []string, opts ...CopyOption) error

CopyToWithOptions extracts specific files with options.

func (*Blob) DataHash

func (b *Blob) DataHash() ([]byte, bool)

DataHash returns the hash of the data blob bytes from the index. The returned slice aliases the index buffer and must be treated as immutable. ok is false when the index did not record data metadata.

func (*Blob) DataSize

func (b *Blob) DataSize() (uint64, bool)

DataSize returns the size of the data blob in bytes from the index. ok is false when the index did not record data metadata.

func (*Blob) Entries

func (b *Blob) Entries() iter.Seq[EntryView]

Entries returns an iterator over all entries as read-only views.

The returned views are only valid while the Blob remains alive.

func (*Blob) EntriesWithPrefix

func (b *Blob) EntriesWithPrefix(prefix string) iter.Seq[EntryView]

EntriesWithPrefix returns an iterator over entries with the given prefix as read-only views.

The returned views are only valid while the Blob remains alive.

func (*Blob) Entry

func (b *Blob) Entry(path string) (EntryView, bool)

Entry returns a read-only view of the entry for the given path.

The returned view is only valid while the Blob remains alive.

func (*Blob) IndexData

func (b *Blob) IndexData() []byte

IndexData returns the raw FlatBuffers-encoded index data. This is useful for creating new Blobs with different data sources.

func (*Blob) Len

func (b *Blob) Len() int

Len returns the number of entries in the archive.

func (*Blob) Open

func (b *Blob) Open(name string) (fs.File, error)

Open implements fs.FS.

Open returns an fs.File for reading the named file. The returned file verifies the content hash on Close (unless disabled by WithVerifyOnClose) and returns ErrHashMismatch if verification fails. Callers must read to EOF or Close to ensure integrity; partial reads may return unverified data.

When caching is enabled (via WithCache), cached content is verified while reading and may return ErrHashMismatch if the cache was corrupted.
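
A sketch of the read-to-EOF-then-Close pattern that keeps verification meaningful (the path is hypothetical):

func readVerified(b *blob.Blob, name string) ([]byte, error) {
	f, err := b.Open(name)
	if err != nil {
		return nil, err
	}
	data, err := io.ReadAll(f)
	if err != nil {
		f.Close()
		return nil, err
	}
	// Close re-checks the content hash unless WithVerifyOnClose(false) was set.
	if cerr := f.Close(); cerr != nil {
		if errors.Is(cerr, blob.ErrHashMismatch) {
			return nil, fmt.Errorf("corrupt content for %s: %w", name, cerr)
		}
		return nil, cerr
	}
	return data, nil
}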

func (*Blob) ReadDir

func (b *Blob) ReadDir(name string) ([]fs.DirEntry, error)

ReadDir implements fs.ReadDirFS.

ReadDir returns directory entries for the named directory, sorted by name. Directory entries are synthesized from file paths—the archive does not store directories explicitly.

func (*Blob) ReadFile

func (b *Blob) ReadFile(name string) ([]byte, error)

ReadFile implements fs.ReadFileFS.

ReadFile reads and returns the entire contents of the named file. The content is decompressed if necessary and verified against its hash.

When caching is enabled, concurrent calls for the same content are deduplicated using singleflight, preventing redundant network requests.

func (*Blob) Reader

func (b *Blob) Reader() *file.Reader

Reader returns the underlying file reader. This is useful for cached readers that need to share the decompression pool.

func (*Blob) Save

func (b *Blob) Save(indexPath, dataPath string) error

Save writes the blob archive to the specified paths.

Uses atomic writes (temp file + rename) to prevent partial writes on failure. Parent directories are created as needed.

func (*Blob) Size

func (b *Blob) Size() int64

Size returns the total size of the data blob in bytes.

func (*Blob) Stat

func (b *Blob) Stat(name string) (fs.FileInfo, error)

Stat implements fs.StatFS.

Stat returns file info for the named file without reading its content. For directories (paths that are prefixes of other entries), Stat returns synthetic directory info.

func (*Blob) Stream

func (b *Blob) Stream() io.Reader

Stream returns a reader that streams the entire data blob from beginning to end. This is useful for copying or transmitting the complete data content.
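
For instance, a sketch that mirrors the raw data blob into any io.Writer:

func mirrorData(dst io.Writer, b *blob.Blob) (int64, error) {
	return io.Copy(dst, b.Stream())
}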

type BlobFile

type BlobFile struct {
	*Blob
	// contains filtered or unexported fields
}

BlobFile wraps a Blob with its underlying data file handle. Close must be called to release file resources.

func CreateBlob

func CreateBlob(ctx context.Context, srcDir, destDir string, opts ...CreateBlobOption) (*BlobFile, error)

CreateBlob creates a blob archive from srcDir and writes it to destDir.

By default, files are named "index.blob" and "data.blob". Use CreateBlobWithIndexName and CreateBlobWithDataName to override.

Returns a BlobFile that must be closed to release file handles.
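
A sketch with assumed directories and non-default blob names:

func buildArchive(ctx context.Context, srcDir, destDir string) (*blob.BlobFile, error) {
	// The caller is responsible for closing the returned BlobFile.
	return blob.CreateBlob(ctx, srcDir, destDir,
		blob.CreateBlobWithCompression(blob.CompressionZstd),
		blob.CreateBlobWithIndexName("assets-index.blob"),
		blob.CreateBlobWithDataName("assets-data.blob"),
	)
}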

func OpenFile

func OpenFile(indexPath, dataPath string, opts ...Option) (*BlobFile, error)

OpenFile opens a blob archive from index and data files.

The index file is read into memory; the data file is opened for random access. The returned BlobFile must be closed to release file resources.
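
A sketch that opens an archive from assumed paths and stats one entry:

func openExisting() error {
	bf, err := blob.OpenFile("./archive/index.blob", "./archive/data.blob")
	if err != nil {
		return err
	}
	defer bf.Close()

	info, err := bf.Stat("docs/readme.md")
	if err != nil {
		return err
	}
	fmt.Println(info.Name(), info.Size())
	return nil
}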

func (*BlobFile) Close

func (bf *BlobFile) Close() error

Close closes the underlying data file.

type ByteSource

type ByteSource interface {
	io.ReaderAt
	Size() int64
	SourceID() string
}

ByteSource provides random access to the data blob.

Implementations exist for local files (*os.File) and HTTP range requests. SourceID must return a stable identifier for the underlying content.
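
A sketch of a custom in-memory ByteSource built on bytes.Reader; memSource and openInMemory are illustrative names, not part of this package:

// memSource serves a data blob held entirely in memory.
type memSource struct {
	r  *bytes.Reader
	id string
}

func (m *memSource) ReadAt(p []byte, off int64) (int, error) { return m.r.ReadAt(p, off) }
func (m *memSource) Size() int64                             { return m.r.Size() }
func (m *memSource) SourceID() string                        { return m.id }

// openInMemory wires the source into New; indexData holds the raw index blob.
func openInMemory(indexData, dataBytes []byte) (*blob.Blob, error) {
	src := &memSource{r: bytes.NewReader(dataBytes), id: "mem:example"}
	return blob.New(indexData, src)
}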

type ChangeDetection

type ChangeDetection uint8

ChangeDetection controls how strictly file changes are detected during creation.

const (
	// ChangeDetectionNone disables file change detection during archive creation.
	ChangeDetectionNone ChangeDetection = iota
	// ChangeDetectionStrict verifies files did not change during archive creation.
	ChangeDetectionStrict
)

Change detection modes.

type Compression

type Compression = blobtype.Compression

Compression identifies the compression algorithm used for a file.

type CopyOption

type CopyOption func(*copyConfig)

CopyOption configures CopyTo and CopyDir operations.

func CopyWithCleanDest

func CopyWithCleanDest(enabled bool) CopyOption

CopyWithCleanDest clears the destination prefix before copying and writes directly to the final path (no temp files). This is only supported by CopyDir.

func CopyWithOverwrite

func CopyWithOverwrite(overwrite bool) CopyOption

CopyWithOverwrite allows overwriting existing files. By default, existing files are skipped.

func CopyWithPreserveMode

func CopyWithPreserveMode(preserve bool) CopyOption

CopyWithPreserveMode preserves file permission modes from the archive. By default, modes are not preserved (files use umask defaults).

func CopyWithPreserveTimes

func CopyWithPreserveTimes(preserve bool) CopyOption

CopyWithPreserveTimes preserves file modification times from the archive. By default, times are not preserved (files use current time).

func CopyWithReadAheadBytes

func CopyWithReadAheadBytes(limit uint64) CopyOption

CopyWithReadAheadBytes caps the total size of buffered group data. A value of 0 disables the byte budget.

func CopyWithReadConcurrency

func CopyWithReadConcurrency(n int) CopyOption

CopyWithReadConcurrency sets the number of concurrent range reads. Use 1 to force serial reads. Zero uses the default concurrency (4).

func CopyWithWorkers

func CopyWithWorkers(n int) CopyOption

CopyWithWorkers sets the number of workers for parallel processing. Values < 0 force serial processing. Zero uses automatic heuristics. Values > 0 force a specific worker count.

type CreateBlobOption

type CreateBlobOption func(*createBlobConfig)

CreateBlobOption configures CreateBlob.

func CreateBlobWithChangeDetection

func CreateBlobWithChangeDetection(cd ChangeDetection) CreateBlobOption

CreateBlobWithChangeDetection sets the change detection mode.

func CreateBlobWithCompression

func CreateBlobWithCompression(compression Compression) CreateBlobOption

CreateBlobWithCompression sets the compression algorithm.

func CreateBlobWithDataName

func CreateBlobWithDataName(name string) CreateBlobOption

CreateBlobWithDataName sets the data file name (default: "data.blob").

func CreateBlobWithIndexName

func CreateBlobWithIndexName(name string) CreateBlobOption

CreateBlobWithIndexName sets the index file name (default: "index.blob").

func CreateBlobWithMaxFiles

func CreateBlobWithMaxFiles(n int) CreateBlobOption

CreateBlobWithMaxFiles limits the number of files in the archive.

func CreateBlobWithSkipCompression

func CreateBlobWithSkipCompression(fns ...SkipCompressionFunc) CreateBlobOption

CreateBlobWithSkipCompression adds skip compression predicates.

type CreateOption

type CreateOption func(*createConfig)

CreateOption configures archive creation via the Create function.

func CreateWithChangeDetection

func CreateWithChangeDetection(cd ChangeDetection) CreateOption

CreateWithChangeDetection controls whether the writer verifies files did not change during archive creation. The zero value disables change detection to reduce syscalls; enable ChangeDetectionStrict for stronger guarantees.

func CreateWithCompression

func CreateWithCompression(c Compression) CreateOption

CreateWithCompression sets the compression algorithm to use. Use CompressionNone to store files uncompressed, CompressionZstd for zstd.

func CreateWithLogger

func CreateWithLogger(logger *slog.Logger) CreateOption

CreateWithLogger sets the logger for archive creation. If not set, logging is disabled.

func CreateWithMaxFiles

func CreateWithMaxFiles(n int) CreateOption

CreateWithMaxFiles limits the number of files included in the archive. Zero uses DefaultMaxFiles. Negative means no limit.

func CreateWithSkipCompression

func CreateWithSkipCompression(fns ...SkipCompressionFunc) CreateOption

CreateWithSkipCompression adds predicates that decide to store a file uncompressed. If any predicate returns true, compression is skipped for that file. These checks are on the hot path, so keep them cheap.

type Entry

type Entry = blobtype.Entry

Entry represents a file in the archive.

type EntryView

type EntryView = blobtype.EntryView

EntryView provides a read-only view of an index entry.

type File

type File interface {
	fs.File
	io.ReaderAt
}

File represents an archive file with optional random access. ReadAt is only supported for uncompressed entries.

type IndexView

type IndexView struct {
	// contains filtered or unexported fields
}

IndexView provides read-only access to archive file metadata.

It exposes index iteration and lookup without requiring the data blob to be available. This is useful for inspecting archive contents before deciding to download file data.

IndexView methods mirror those on Blob for consistency.

func NewIndexView

func NewIndexView(indexData []byte) (*IndexView, error)

NewIndexView creates an IndexView from raw FlatBuffers-encoded index data.

The provided data is retained by the IndexView; callers must not modify it after calling NewIndexView.
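
A sketch that inspects an index fetched on its own (indexData is assumed to hold the raw index blob):

func describeIndex(indexData []byte) error {
	v, err := blob.NewIndexView(indexData)
	if err != nil {
		return err
	}
	fmt.Printf("index format v%d, %d files\n", v.Version(), v.Len())
	if size, ok := v.DataSize(); ok {
		fmt.Printf("data blob: %d bytes\n", size)
	}
	return nil
}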

func (*IndexView) DataHash

func (v *IndexView) DataHash() ([]byte, bool)

DataHash returns the SHA256 hash of the data blob. The returned slice aliases the index buffer and must be treated as immutable. ok is false when the index did not record data metadata.

func (*IndexView) DataSize

func (v *IndexView) DataSize() (uint64, bool)

DataSize returns the size of the data blob in bytes. ok is false when the index did not record data metadata.

func (*IndexView) Entries

func (v *IndexView) Entries() iter.Seq[blobtype.EntryView]

Entries returns an iterator over all file entries.

The returned views are only valid while the IndexView remains alive.

func (*IndexView) EntriesWithPrefix

func (v *IndexView) EntriesWithPrefix(prefix string) iter.Seq[blobtype.EntryView]

EntriesWithPrefix returns an iterator over entries with the given prefix.

The returned views are only valid while the IndexView remains alive.

func (*IndexView) Entry

func (v *IndexView) Entry(path string) (blobtype.EntryView, bool)

Entry returns a read-only view of the entry for the given path.

The returned view is only valid while the IndexView remains alive.

func (*IndexView) IndexData

func (v *IndexView) IndexData() []byte

IndexData returns the raw FlatBuffers-encoded index. This is useful for caching or transmitting the index.

func (*IndexView) Len

func (v *IndexView) Len() int

Len returns the number of files in the archive.

func (*IndexView) Version

func (v *IndexView) Version() uint32

Version returns the index format version.

type Option

type Option func(*Blob)

Option configures a Blob.

func WithCache

func WithCache(c cache.Cache) Option

WithCache enables content-addressed caching.

When enabled, file content is cached after first read and served from cache on subsequent reads. Concurrent requests for the same content are deduplicated.

func WithDecoderConcurrency

func WithDecoderConcurrency(n int) Option

WithDecoderConcurrency sets the zstd decoder concurrency (default: 1). Values < 0 are treated as 0 (use GOMAXPROCS).

func WithDecoderLowmem

func WithDecoderLowmem(enabled bool) Option

WithDecoderLowmem sets whether the zstd decoder should use low-memory mode (default: false).

func WithLogger

func WithLogger(logger *slog.Logger) Option

WithLogger sets the logger for blob operations. If not set, logging is disabled.

func WithMaxDecoderMemory

func WithMaxDecoderMemory(limit uint64) Option

WithMaxDecoderMemory limits the maximum memory used by the zstd decoder. Set limit to 0 to disable the limit.

func WithMaxFileSize

func WithMaxFileSize(limit uint64) Option

WithMaxFileSize limits the maximum per-file size (compressed and uncompressed). Set limit to 0 to disable the limit.

func WithVerifyOnClose

func WithVerifyOnClose(enabled bool) Option

WithVerifyOnClose controls whether Close drains the file to verify the hash.

When false, Close returns without reading the remaining data. Integrity is only guaranteed when callers read to EOF.

type SkipCompressionFunc

type SkipCompressionFunc = write.SkipCompressionFunc

SkipCompressionFunc returns true when a file should be stored uncompressed. It is called once per file and should be inexpensive.

Directories

Path	Synopsis
cache	Package cache provides content-addressed caching for blob archives.
cache/disk	Package disk provides a disk-backed cache implementation.
http	Package http provides a ByteSource backed by HTTP range requests.
internal
internal/batch	Package batch provides batch processing for reading multiple entries from a blob archive.
internal/blobtype	Package blobtype defines shared types used across the blob package and its internal packages.
internal/fb	Package fb contains FlatBuffers-generated code for the blob index format.
internal/file	Package file provides internal file reading operations for the blob package.
internal/index	Package index provides FlatBuffers index loading and lookup for blob archives.
internal/platform	Package platform provides platform-specific file operations.
internal/sizing	Package sizing provides safe size arithmetic and conversions to prevent overflow.
internal/write	Package write provides internal file writing operations for the blob package.
testutil	Package testutil provides test utilities for the blob package.
