datastore

package
v0.2.2 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 13, 2026 License: MIT Imports: 14 Imported by: 0

Documentation

Overview

Package datastore provides chunk storage, indexing, and backup catalog management for pxar archives.

The package implements the Proxmox Backup Server data model: backup data is split into chunks, each chunk is stored as a DataBlob (with optional zstd compression and CRC32 verification), and chunk references are tracked in dynamic or fixed index files.

Chunk Store

ChunkStore manages chunk storage on the local filesystem. Each chunk is identified by its SHA-256 digest and stored under a .chunks directory:

store, err := datastore.NewChunkStore("/backup/datastore")
if err != nil {
    log.Fatal(err)
}

// Store a chunk. The first return value reports whether the chunk already existed.
digest := sha256.Sum256(data)
exists, size, err := store.InsertChunk(digest, blobData)

// Load a chunk
loaded, err := store.LoadChunk(digest)

Data Blobs

All chunk data is wrapped in a DataBlob envelope containing a magic number and CRC32 checksum:

blob, err := datastore.EncodeBlob(rawChunk)
encoded := blob.Bytes()

// Decode
decoded, err := datastore.DecodeBlob(encoded)

Use EncodeCompressedBlob for zstd compression; it falls back to an uncompressed blob when compression does not reduce the size.

Index Files

Dynamic indexes (.didx) map variable-size chunks (from buzhash chunking) to their digests and offsets:

writer := datastore.NewDynamicIndexWriter(time.Now().Unix())
writer.Add(endOffset, digest) // endOffset: stream offset at which this chunk ends
indexData, err := writer.Finish()

// Read back
reader, err := datastore.ReadDynamicIndex(indexData)
count := reader.Count()
info, ok := reader.ChunkInfo(0)

Fixed indexes (.fidx) are used for fixed-size chunks (e.g., raw disk images).

Store Chunker

StoreChunker wires together buzhash chunking, blob encoding, and chunk storage into a single pipeline:

sc := datastore.NewStoreChunker(store, chunkCfg, true) // true = compress
results, idxWriter, err := sc.ChunkStream(archiveReader)

Backup Catalog

BackupType, BackupGroup, BackupDir, and BackupInfo model the PBS backup namespace hierarchy (type/id/timestamp). Manifest tracks all files in a backup snapshot:

manifest := &datastore.Manifest{
    BackupType: datastore.BackupHost.String(),
    BackupID:   "myhost",
    BackupTime: time.Now().Unix(),
    Files:      []datastore.FileInfo{...},
}
data, err := manifest.Marshal()

Constants

const (
	BlobHeaderSize          = 12 // magic(8) + crc32(4)
	EncryptedBlobHeaderSize = 48 // magic(8) + crc32(4) + iv(16) + tag(16)
	IndexHeaderSize         = 4096
	DynamicEntrySize        = 40 // end_offset(8) + digest(32)
	FixedDigestSize         = 32
	MaxBlobSize             = 128 * 1024 * 1024 // 128MB
)

Variables

var (
	MagicUncompressedBlob  = [8]byte{66, 171, 56, 7, 190, 131, 112, 161}
	MagicCompressedBlob    = [8]byte{49, 185, 88, 66, 111, 182, 163, 127}
	MagicEncryptedBlob     = [8]byte{123, 103, 133, 190, 34, 45, 76, 240}
	MagicEncrComprBlob     = [8]byte{230, 89, 27, 191, 11, 191, 216, 11}
	MagicFixedChunkIndex   = [8]byte{47, 127, 65, 237, 145, 253, 15, 205}
	MagicDynamicChunkIndex = [8]byte{28, 145, 78, 165, 25, 186, 179, 205}
	MagicCatalogFile       = [8]byte{145, 253, 96, 249, 196, 103, 88, 213}
)

Magic numbers from Proxmox Backup Server (file_formats.rs).

Functions

func BlobHeaderSizeFor

func BlobHeaderSizeFor(magic [8]byte) int

BlobHeaderSizeFor returns the header size for the given blob magic. Panics for unknown magic values.

func DecodeBlob

func DecodeBlob(raw []byte) ([]byte, error)

DecodeBlob decodes a raw blob, verifies CRC, and returns the payload data.

func IsCompressedMagic

func IsCompressedMagic(magic [8]byte) bool

IsCompressedMagic returns true for compressed blob types.

func IsEncryptedMagic

func IsEncryptedMagic(magic [8]byte) bool

IsEncryptedMagic returns true for encrypted blob types.

Types

type BackupDir

type BackupDir struct {
	Group     BackupGroup
	Timestamp time.Time
}

BackupDir represents a single backup snapshot.

func (BackupDir) Create

func (d BackupDir) Create() error

Create creates the snapshot directory on disk.

func (BackupDir) FullPath

func (d BackupDir) FullPath() string

FullPath returns the absolute path under the base directory.

func (BackupDir) Info

func (d BackupDir) Info() (*BackupInfo, error)

Info returns detailed information about this backup snapshot.

func (BackupDir) Path

func (d BackupDir) Path() string

Path returns the relative path (e.g., "vm/100/2023-11-14T22:13:20Z").
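
The type/id/timestamp layout can be illustrated with the standard library alone. The following is a minimal sketch, not the package's implementation: snapshotPath is a hypothetical helper, and it assumes the timestamp component is the snapshot time formatted as RFC 3339 in UTC, which is what the example path above suggests.

```go
package main

import (
	"fmt"
	"path"
	"time"
)

// snapshotPath is a hypothetical helper, not part of the package API.
// It joins backup type, ID, and an RFC 3339 UTC timestamp into a relative path.
func snapshotPath(backupType, id string, unixSec int64) string {
	ts := time.Unix(unixSec, 0).UTC().Format(time.RFC3339)
	return path.Join(backupType, id, ts)
}

func main() {
	// Unix time 1700000000 is 2023-11-14T22:13:20Z.
	fmt.Println(snapshotPath("vm", "100", 1700000000))
}
```

With these inputs the sketch reproduces the example path shown for Path.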

type BackupGroup

type BackupGroup struct {
	Type BackupType
	ID   string
	Base string // base directory (datastore root)
}

BackupGroup represents a collection of backup snapshots (e.g., vm/100).

func ListBackupGroups

func ListBackupGroups(base string) ([]BackupGroup, error)

ListBackupGroups returns all backup groups in the datastore base directory.

func (BackupGroup) Destroy

func (g BackupGroup) Destroy() error

Destroy removes the backup group directory.

func (BackupGroup) FullPath

func (g BackupGroup) FullPath() string

FullPath returns the absolute path under the base directory.

func (BackupGroup) ListSnapshots

func (g BackupGroup) ListSnapshots() ([]BackupDir, error)

ListSnapshots returns all backup snapshots in this group.

func (BackupGroup) Path

func (g BackupGroup) Path() string

Path returns the relative path for this group (e.g., "vm/100").

type BackupInfo

type BackupInfo struct {
	Dir       BackupDir
	Files     []string
	Protected bool
}

BackupInfo holds metadata about a backup snapshot.

func (*BackupInfo) Protect

func (info *BackupInfo) Protect() error

Protect marks the backup as protected by creating a .protected file.

func (*BackupInfo) Unprotect

func (info *BackupInfo) Unprotect() error

Unprotect removes the protection marker.

type BackupType

type BackupType int

BackupType identifies the kind of backup.

const (
	BackupVM BackupType = iota
	BackupCT
	BackupHost
)

func ParseBackupType

func ParseBackupType(s string) (BackupType, error)

ParseBackupType parses a backup type string.

func (BackupType) String

func (bt BackupType) String() string

type BlobHeader

type BlobHeader struct {
	Magic [8]byte
	CRC   uint32
}

BlobHeader is the 12-byte header for uncompressed and compressed blobs.

func UnmarshalBlobHeader

func UnmarshalBlobHeader(data []byte) (BlobHeader, error)

UnmarshalBlobHeader parses a BlobHeader from raw bytes.

func (*BlobHeader) MarshalTo

func (h *BlobHeader) MarshalTo(buf []byte)

MarshalTo writes the header to buf (must be at least BlobHeaderSize bytes).
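
To make the 12-byte layout concrete, here is a minimal sketch of wrapping and unwrapping an uncompressed blob by hand. It is not the package's code: wrapBlob and unwrapBlob are hypothetical helpers, and the sketch assumes the CRC field is little-endian and holds a CRC32 (IEEE) of the payload only. The magic value is copied from the Variables section.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// Magic for uncompressed blobs, copied from the Variables section.
var magicUncompressed = [8]byte{66, 171, 56, 7, 190, 131, 112, 161}

const headerSize = 12 // magic(8) + crc32(4)

// wrapBlob is a hypothetical helper that prepends a 12-byte header to payload.
// Assumptions: little-endian CRC field, CRC32 (IEEE) over the payload only.
func wrapBlob(payload []byte) []byte {
	out := make([]byte, headerSize+len(payload))
	copy(out[:8], magicUncompressed[:])
	binary.LittleEndian.PutUint32(out[8:12], crc32.ChecksumIEEE(payload))
	copy(out[headerSize:], payload)
	return out
}

// unwrapBlob checks the magic and the CRC, then returns the payload.
func unwrapBlob(raw []byte) ([]byte, error) {
	if len(raw) < headerSize {
		return nil, fmt.Errorf("blob too short: %d bytes", len(raw))
	}
	var magic [8]byte
	copy(magic[:], raw[:8])
	if magic != magicUncompressed {
		return nil, fmt.Errorf("unexpected magic %v", magic)
	}
	payload := raw[headerSize:]
	want := binary.LittleEndian.Uint32(raw[8:12])
	if got := crc32.ChecksumIEEE(payload); got != want {
		return nil, fmt.Errorf("crc mismatch: got %08x, want %08x", got, want)
	}
	return payload, nil
}

func main() {
	raw := wrapBlob([]byte("chunk data"))
	payload, err := unwrapBlob(raw)
	fmt.Println(len(raw), string(payload), err)
}
```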

type ChunkInfo

type ChunkInfo struct {
	Start  uint64
	End    uint64
	Digest [32]byte
}

ChunkInfo describes a single chunk's position and digest.

type ChunkResult

type ChunkResult struct {
	Digest [32]byte // SHA-256 of raw chunk data
	Offset uint64   // start offset in the original stream
	Size   int      // chunk data size in bytes
	Exists bool     // true if chunk was already in the store
}

ChunkResult describes a single chunk produced by the chunker pipeline.

type ChunkSource

type ChunkSource interface {
	// GetChunk retrieves a chunk by its SHA-256 digest.
	// Returns the raw chunk data (not decoded/blob-wrapped).
	GetChunk(digest [32]byte) ([]byte, error)
}

ChunkSource provides access to chunks by their digest.

type ChunkStore

type ChunkStore struct {
	// contains filtered or unexported fields
}

ChunkStore manages chunk storage on the filesystem. Chunks are stored under base/.chunks/XX/XXYY... where XX are the first two hex characters of the SHA-256 digest.
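
The directory layout can be sketched with the standard library. This is an illustration of the layout as described above, not the body of ChunkPath: chunkPath is a hypothetical helper, and it assumes the file name is the full lowercase hex digest placed in a subdirectory named after its first two hex characters.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// chunkPath is a hypothetical helper mirroring the described layout:
// base/.chunks/XX/<full hex digest>, where XX is the first two hex characters.
func chunkPath(base string, digest [32]byte) string {
	hexDigest := hex.EncodeToString(digest[:])
	return filepath.Join(base, ".chunks", hexDigest[:2], hexDigest)
}

func main() {
	digest := sha256.Sum256([]byte("hello"))
	fmt.Println(chunkPath("/backup/datastore", digest))
}
```

Fanning chunks out into prefix directories keeps any single directory from growing to millions of entries.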

func NewChunkStore

func NewChunkStore(base string) (*ChunkStore, error)

NewChunkStore creates a ChunkStore rooted at base, creating the .chunks directory if needed.

func (*ChunkStore) ChunkPath

func (cs *ChunkStore) ChunkPath(digest [32]byte) string

ChunkPath returns the filesystem path for a chunk identified by digest.

func (*ChunkStore) InsertChunk

func (cs *ChunkStore) InsertChunk(digest [32]byte, data []byte) (bool, int, error)

InsertChunk stores a chunk. Returns (exists, size, error). If the chunk already exists, returns (true, existingSize, nil).

func (*ChunkStore) LoadChunk

func (cs *ChunkStore) LoadChunk(digest [32]byte) ([]byte, error)

LoadChunk reads a chunk from disk.

func (*ChunkStore) TouchChunk

func (cs *ChunkStore) TouchChunk(digest [32]byte) error

TouchChunk updates the access time of a chunk file.

type ChunkStoreSource

type ChunkStoreSource struct {
	// contains filtered or unexported fields
}

ChunkStoreSource adapts a ChunkStore to the ChunkSource interface.

func NewChunkStoreSource

func NewChunkStoreSource(store *ChunkStore) *ChunkStoreSource

NewChunkStoreSource creates a chunk source from a local chunk store.

func (*ChunkStoreSource) GetChunk

func (s *ChunkStoreSource) GetChunk(digest [32]byte) ([]byte, error)

GetChunk retrieves a chunk from the local store.

type DataBlob

type DataBlob struct {
	// contains filtered or unexported fields
}

DataBlob represents a stored data blob with optional compression. The raw data contains the magic, CRC, and payload.

func EncodeBlob

func EncodeBlob(data []byte) (*DataBlob, error)

EncodeBlob creates an uncompressed blob from data.

func EncodeCompressedBlob

func EncodeCompressedBlob(data []byte) (*DataBlob, error)

EncodeCompressedBlob creates a compressed blob. Falls back to uncompressed if compression doesn't reduce size.
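
The fallback rule can be sketched as follows. Note the substitution: the package uses zstd, which is not in the Go standard library, so this sketch swaps in compress/flate purely to stay self-contained. encodePayload is a hypothetical helper, and the "not smaller" comparison is an assumption about how the fallback is decided.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
)

// encodePayload is a hypothetical helper. It compresses data (flate here, as a
// stand-in for zstd) and keeps the result only if it is strictly smaller than
// the input. The bool reports whether the returned payload is compressed.
func encodePayload(data []byte) ([]byte, bool, error) {
	var buf bytes.Buffer
	zw, err := flate.NewWriter(&buf, flate.DefaultCompression)
	if err != nil {
		return nil, false, err
	}
	if _, err := zw.Write(data); err != nil {
		return nil, false, err
	}
	if err := zw.Close(); err != nil {
		return nil, false, err
	}
	if buf.Len() >= len(data) {
		// Compression did not help: store the data as-is.
		return data, false, nil
	}
	return buf.Bytes(), true, nil
}

func main() {
	for _, in := range [][]byte{make([]byte, 4096), []byte("ab")} {
		out, compressed, err := encodePayload(in)
		fmt.Println(len(in), len(out), compressed, err)
	}
}
```

A caller would then pick the compressed or uncompressed magic according to the returned flag.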

func (*DataBlob) Bytes

func (b *DataBlob) Bytes() []byte

Bytes returns the raw blob bytes (header + payload).

func (*DataBlob) CRC

func (b *DataBlob) CRC() uint32

CRC returns the stored CRC32 value.

func (*DataBlob) Digest

func (b *DataBlob) Digest() [32]byte

Digest returns the SHA-256 digest of the raw blob.

func (*DataBlob) Equal

func (b *DataBlob) Equal(other *DataBlob) bool

Equal reports whether two blobs have identical raw data.

func (*DataBlob) IsCompressed

func (b *DataBlob) IsCompressed() bool

IsCompressed returns true if the blob uses compression.

func (*DataBlob) IsEncrypted

func (b *DataBlob) IsEncrypted() bool

IsEncrypted returns true if the blob uses encryption.

func (*DataBlob) Magic

func (b *DataBlob) Magic() [8]byte

Magic returns the blob magic number.

func (*DataBlob) Size

func (b *DataBlob) Size() int

Size returns the total size of the raw blob including header.

type DynamicEntry

type DynamicEntry struct {
	EndOffset uint64
	Digest    [32]byte
}

DynamicEntry is a single entry in a dynamic index (40 bytes).
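
The 40-byte entry layout, end_offset(8) followed by digest(32), can be sketched like this. The helpers marshalEntry and unmarshalEntry are hypothetical, and little-endian byte order for the offset is an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const entrySize = 40 // end_offset(8) + digest(32)

// entry mirrors the fields of DynamicEntry for this sketch.
type entry struct {
	EndOffset uint64
	Digest    [32]byte
}

// marshalEntry is a hypothetical helper; it assumes a little-endian offset.
func marshalEntry(e entry) [entrySize]byte {
	var buf [entrySize]byte
	binary.LittleEndian.PutUint64(buf[:8], e.EndOffset)
	copy(buf[8:], e.Digest[:])
	return buf
}

// unmarshalEntry reverses marshalEntry.
func unmarshalEntry(buf [entrySize]byte) entry {
	var e entry
	e.EndOffset = binary.LittleEndian.Uint64(buf[:8])
	copy(e.Digest[:], buf[8:])
	return e
}

func main() {
	e := entry{EndOffset: 4096}
	e.Digest[0] = 0xaa
	raw := marshalEntry(e)
	fmt.Println(len(raw), unmarshalEntry(raw) == e)
}
```

Because each entry stores only the end offset, a chunk's start is the previous entry's end offset (or zero for the first entry).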

type DynamicIndexHeader

type DynamicIndexHeader struct {
	Magic     [8]byte
	UUID      [16]byte
	Ctime     int64
	IndexCsum [32]byte
}

DynamicIndexHeader is the 4096-byte header for dynamic chunk index files.

func UnmarshalDynamicIndexHeader

func UnmarshalDynamicIndexHeader(data []byte) (DynamicIndexHeader, error)

UnmarshalDynamicIndexHeader parses a DynamicIndexHeader from raw bytes.

func (*DynamicIndexHeader) MarshalTo

func (h *DynamicIndexHeader) MarshalTo(buf []byte)

MarshalTo writes the header to buf (must be at least IndexHeaderSize bytes).

type DynamicIndexReader

type DynamicIndexReader struct {
	// contains filtered or unexported fields
}

DynamicIndexReader reads a dynamic chunk index.

func ReadDynamicIndex

func ReadDynamicIndex(data []byte) (*DynamicIndexReader, error)

ReadDynamicIndex parses a dynamic index from raw bytes.

func (*DynamicIndexReader) CTime

func (r *DynamicIndexReader) CTime() int64

CTime returns the creation timestamp.

func (*DynamicIndexReader) ChunkFromOffset

func (r *DynamicIndexReader) ChunkFromOffset(offset uint64) (int, bool)

ChunkFromOffset returns the chunk index containing the given byte offset. Uses binary search for O(log n) lookup.
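
The binary search can be sketched over a plain slice of end offsets. This is an illustration of the technique, not the method body: chunkFromOffset is a hypothetical helper, and it assumes chunk i covers the half-open range from the previous end offset up to, but not including, its own end offset.

```go
package main

import (
	"fmt"
	"sort"
)

// chunkFromOffset is a hypothetical helper. Given the end offsets of
// consecutive chunks, it returns the index of the chunk containing offset.
// Chunk i covers [ends[i-1], ends[i]), with the first chunk starting at 0.
func chunkFromOffset(ends []uint64, offset uint64) (int, bool) {
	// Find the first chunk whose end lies beyond offset.
	i := sort.Search(len(ends), func(i int) bool { return ends[i] > offset })
	if i == len(ends) {
		return 0, false // offset is at or past the end of the stream
	}
	return i, true
}

func main() {
	ends := []uint64{100, 250, 400}
	for _, off := range []uint64{0, 99, 100, 399, 400} {
		idx, ok := chunkFromOffset(ends, off)
		fmt.Println(off, idx, ok)
	}
}
```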

func (*DynamicIndexReader) ChunkInfo

func (r *DynamicIndexReader) ChunkInfo(pos int) (ChunkInfo, bool)

ChunkInfo returns the chunk info at position pos.

func (*DynamicIndexReader) ComputeCsum

func (r *DynamicIndexReader) ComputeCsum() ([32]byte, uint64)

ComputeCsum computes the SHA-256 checksum over all entry data.

func (*DynamicIndexReader) Count

func (r *DynamicIndexReader) Count() int

Count returns the number of entries.

func (*DynamicIndexReader) Entry

func (r *DynamicIndexReader) Entry(i int) DynamicEntry

Entry returns the entry at position i.

func (*DynamicIndexReader) IndexBytes

func (r *DynamicIndexReader) IndexBytes() uint64

IndexBytes returns the total virtual size (end offset of last entry).

func (*DynamicIndexReader) IndexDigest

func (r *DynamicIndexReader) IndexDigest(pos int) ([32]byte, bool)

IndexDigest returns the digest at position pos.

type DynamicIndexWriter

type DynamicIndexWriter struct {
	// contains filtered or unexported fields
}

DynamicIndexWriter builds a dynamic chunk index.

func NewDynamicIndexWriter

func NewDynamicIndexWriter(ctime int64) *DynamicIndexWriter

NewDynamicIndexWriter creates a new writer with the given creation time.

func (*DynamicIndexWriter) Add

func (w *DynamicIndexWriter) Add(endOffset uint64, digest [32]byte)

Add appends an entry with the given end offset and digest.

func (*DynamicIndexWriter) Finish

func (w *DynamicIndexWriter) Finish() ([]byte, error)

Finish writes the complete index and returns the raw bytes.

type EncryptedBlobHeader

type EncryptedBlobHeader struct {
	Magic [8]byte
	CRC   uint32
	IV    [16]byte
	Tag   [16]byte
}

EncryptedBlobHeader is the 48-byte header for encrypted blobs.

func UnmarshalEncryptedBlobHeader

func UnmarshalEncryptedBlobHeader(data []byte) (EncryptedBlobHeader, error)

UnmarshalEncryptedBlobHeader parses an EncryptedBlobHeader from raw bytes.

func (*EncryptedBlobHeader) MarshalTo

func (h *EncryptedBlobHeader) MarshalTo(buf []byte)

MarshalTo writes the header to buf (must be at least EncryptedBlobHeaderSize bytes).

type FileInfo

type FileInfo struct {
	Filename  string `json:"filename"`
	CryptMode string `json:"crypt-mode,omitempty"`
	Size      uint64 `json:"size"`
	CSum      string `json:"csum"`
}

FileInfo describes a file in a backup manifest.

type FixedIndexHeader

type FixedIndexHeader struct {
	Magic     [8]byte
	UUID      [16]byte
	Ctime     int64
	IndexCsum [32]byte
	Size      uint64
	ChunkSize uint64
}

FixedIndexHeader is the 4096-byte header for fixed chunk index files.

func UnmarshalFixedIndexHeader

func UnmarshalFixedIndexHeader(data []byte) (FixedIndexHeader, error)

UnmarshalFixedIndexHeader parses a FixedIndexHeader from raw bytes.

func (*FixedIndexHeader) MarshalTo

func (h *FixedIndexHeader) MarshalTo(buf []byte)

MarshalTo writes the header to buf (must be at least IndexHeaderSize bytes).

type FixedIndexReader

type FixedIndexReader struct {
	// contains filtered or unexported fields
}

FixedIndexReader reads a fixed-size chunk index.

func ReadFixedIndex

func ReadFixedIndex(data []byte) (*FixedIndexReader, error)

ReadFixedIndex parses a fixed index from raw bytes.

func (*FixedIndexReader) CTime

func (r *FixedIndexReader) CTime() int64

CTime returns the creation timestamp.

func (*FixedIndexReader) ChunkFromOffset

func (r *FixedIndexReader) ChunkFromOffset(offset uint64) (int, bool)

ChunkFromOffset returns the chunk index for the given byte offset.

func (*FixedIndexReader) ChunkInfo

func (r *FixedIndexReader) ChunkInfo(pos int) (ChunkInfo, bool)

ChunkInfo returns chunk info at position pos.

func (*FixedIndexReader) ComputeCsum

func (r *FixedIndexReader) ComputeCsum() ([32]byte, uint64)

ComputeCsum computes the SHA-256 checksum over all digests.

func (*FixedIndexReader) Count

func (r *FixedIndexReader) Count() int

Count returns the number of chunks.

func (*FixedIndexReader) IndexBytes

func (r *FixedIndexReader) IndexBytes() uint64

IndexBytes returns the total virtual size.

func (*FixedIndexReader) IndexDigest

func (r *FixedIndexReader) IndexDigest(pos int) ([32]byte, bool)

IndexDigest returns the digest at position pos.

type FixedIndexWriter

type FixedIndexWriter struct {
	// contains filtered or unexported fields
}

FixedIndexWriter builds a fixed-size chunk index.

func NewFixedIndexWriter

func NewFixedIndexWriter(ctime int64, size, chunkSize uint64) (*FixedIndexWriter, error)

NewFixedIndexWriter creates a writer. chunkSize must be a power of 2.
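
The arithmetic behind a fixed index is simple enough to sketch. The helpers below are hypothetical and only illustrate the power-of-2 constraint; rounding the chunk count up for a partial last chunk is an assumption about how the index is sized.

```go
package main

import "fmt"

// isPowerOfTwo reports whether n has exactly one bit set.
func isPowerOfTwo(n uint64) bool {
	return n != 0 && n&(n-1) == 0
}

// chunkCount returns how many chunks cover size bytes, rounding up so that a
// partial last chunk still gets an entry (an assumption of this sketch).
func chunkCount(size, chunkSize uint64) uint64 {
	return (size + chunkSize - 1) / chunkSize
}

// chunkIndex maps a byte offset to the index of the chunk containing it.
func chunkIndex(offset, chunkSize uint64) uint64 {
	return offset / chunkSize
}

func main() {
	const chunkSize = 4 << 20 // 4 MiB
	fmt.Println(isPowerOfTwo(chunkSize), chunkCount(10<<20, chunkSize), chunkIndex(5<<20, chunkSize))
}
```

With a power-of-2 chunk size, the division in chunkIndex reduces to a bit shift, which is one practical reason for the constraint.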

func (*FixedIndexWriter) Finish

func (w *FixedIndexWriter) Finish() ([]byte, error)

Finish writes the complete index and returns raw bytes.

func (*FixedIndexWriter) Set

func (w *FixedIndexWriter) Set(i int, digest [32]byte)

Set sets the digest for chunk at index i.

type IndexFile

type IndexFile interface {
	Count() int
	IndexBytes() uint64
	CTime() int64
	ChunkInfo(pos int) (ChunkInfo, bool)
	ChunkFromOffset(offset uint64) (int, bool)
	IndexDigest(pos int) ([32]byte, bool)
	ComputeCsum() ([32]byte, uint64)
}

IndexFile is the common interface for chunk index types.

type Manifest

type Manifest struct {
	BackupType string     `json:"backup-type"`
	BackupID   string     `json:"backup-id"`
	BackupTime int64      `json:"backup-time"`
	Files      []FileInfo `json:"files"`
	Signature  string     `json:"signature,omitempty"`
}

Manifest represents a backup manifest (index.json).

func UnmarshalManifest

func UnmarshalManifest(data []byte) (*Manifest, error)

UnmarshalManifest parses a manifest from JSON.

func (*Manifest) AddFile

func (m *Manifest) AddFile(filename string, size uint64, csum string)

AddFile adds a file entry to the manifest.

func (*Manifest) Marshal

func (m *Manifest) Marshal() ([]byte, error)

Marshal serializes the manifest to JSON.

func (*Manifest) VerifyFile

func (m *Manifest) VerifyFile(filename, csum string, size uint64) error

VerifyFile checks that a file's checksum and size match the manifest.
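
To show what the struct tags above produce, here is a self-contained sketch that copies the Manifest and FileInfo field layout into local types and marshals them with encoding/json. The lowercase types and methods are hypothetical stand-ins, the verification logic is an assumption about what VerifyFile checks, and the backup type string "host" is illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fileInfo and manifest copy the documented field layout and JSON tags.
type fileInfo struct {
	Filename  string `json:"filename"`
	CryptMode string `json:"crypt-mode,omitempty"`
	Size      uint64 `json:"size"`
	CSum      string `json:"csum"`
}

type manifest struct {
	BackupType string     `json:"backup-type"`
	BackupID   string     `json:"backup-id"`
	BackupTime int64      `json:"backup-time"`
	Files      []fileInfo `json:"files"`
	Signature  string     `json:"signature,omitempty"`
}

// marshal serializes the manifest to compact JSON.
func (m *manifest) marshal() ([]byte, error) { return json.Marshal(m) }

// verifyFile is a hypothetical check: the file must be listed, and both its
// checksum and size must match the manifest entry.
func (m *manifest) verifyFile(filename, csum string, size uint64) error {
	for _, f := range m.Files {
		if f.Filename != filename {
			continue
		}
		if f.CSum != csum {
			return fmt.Errorf("%s: checksum mismatch", filename)
		}
		if f.Size != size {
			return fmt.Errorf("%s: size %d, manifest says %d", filename, size, f.Size)
		}
		return nil
	}
	return fmt.Errorf("%s: not listed in manifest", filename)
}

func main() {
	m := &manifest{BackupType: "host", BackupID: "myhost", BackupTime: 1700000000}
	m.Files = append(m.Files, fileInfo{Filename: "root.pxar.didx", Size: 1024, CSum: "abc"})
	data, err := m.marshal()
	fmt.Println(string(data), err)
	fmt.Println(m.verifyFile("root.pxar.didx", "abc", 1024))
}
```

Empty CryptMode and Signature fields are dropped from the output because of their omitempty tags.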

type Restorer

type Restorer struct {
	// contains filtered or unexported fields
}

Restorer reconstructs files from dynamic indexes using a chunk source.

func NewRestorer

func NewRestorer(source ChunkSource) *Restorer

NewRestorer creates a new restorer with the given chunk source.

func (*Restorer) FileSize

func (r *Restorer) FileSize(idx *DynamicIndexReader) uint64

FileSize returns the total size of the file represented by the index.

func (*Restorer) RestoreFile

func (r *Restorer) RestoreFile(idx *DynamicIndexReader, w io.Writer) error

RestoreFile reconstructs a complete file from a dynamic index. Writes the reconstructed file content to w.

func (*Restorer) RestoreRange

func (r *Restorer) RestoreRange(idx *DynamicIndexReader, offset, length uint64, w io.Writer) error

RestoreRange reconstructs a specific byte range from a dynamic index. Useful for partial reads without downloading the entire file.
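
A range restore only needs the chunks that overlap the requested bytes. The sketch below illustrates that idea with in-memory data: restoreRange and restoreRangeBytes are hypothetical helpers, the index is reduced to a slice of end offsets, and chunks are held in a parallel slice instead of being fetched from a ChunkSource.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"sort"
)

// restoreRange writes bytes [offset, offset+length) of the stream to w.
// ends[i] is the end offset of chunk i and chunks[i] holds its data.
func restoreRange(ends []uint64, chunks [][]byte, offset, length uint64, w io.Writer) error {
	end := offset + length
	if end < offset || len(ends) == 0 || end > ends[len(ends)-1] {
		return errors.New("range out of bounds")
	}
	// Binary search for the first chunk whose end lies beyond offset.
	i := sort.Search(len(ends), func(i int) bool { return ends[i] > offset })
	for ; i < len(ends) && offset < end; i++ {
		start := uint64(0)
		if i > 0 {
			start = ends[i-1]
		}
		lo := offset - start  // first wanted byte within this chunk
		hi := ends[i] - start // default: take the chunk to its end
		if end < ends[i] {
			hi = end - start // the range stops inside this chunk
		}
		if _, err := w.Write(chunks[i][lo:hi]); err != nil {
			return err
		}
		offset = start + hi
	}
	return nil
}

// restoreRangeBytes collects the restored range in memory.
func restoreRangeBytes(ends []uint64, chunks [][]byte, offset, length uint64) ([]byte, error) {
	var buf bytes.Buffer
	err := restoreRange(ends, chunks, offset, length, &buf)
	return buf.Bytes(), err
}

func main() {
	chunks := [][]byte{[]byte("hello "), []byte("world"), []byte("!!")}
	ends := []uint64{6, 11, 13}
	got, err := restoreRangeBytes(ends, chunks, 4, 5)
	fmt.Printf("%q %v\n", got, err)
}
```

In this sketch, chunks that fall entirely outside the range are never touched, which is what makes partial reads cheaper than a full restore.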

type StoreChunker

type StoreChunker struct {
	// contains filtered or unexported fields
}

StoreChunker splits a data stream into variable-size chunks using buzhash content-defined chunking, computes digests, stores chunks via ChunkStore, and builds a DynamicIndexWriter.

func NewStoreChunker

func NewStoreChunker(store *ChunkStore, config buzhash.Config, compress bool) *StoreChunker

NewStoreChunker creates a chunker pipeline. If compress is true, chunks are stored as compressed DataBlobs; otherwise as uncompressed blobs.

func (*StoreChunker) ChunkStream

func (sc *StoreChunker) ChunkStream(r io.Reader) ([]ChunkResult, *DynamicIndexWriter, error)

ChunkStream reads all data from r, splits it into chunks, stores each chunk, and builds a dynamic index. Returns the chunk results and the completed index writer (Finish has NOT been called on it yet).

func (*StoreChunker) ChunkStreamCallback

func (sc *StoreChunker) ChunkStreamCallback(r io.Reader, fn func(ChunkResult) error) ([]ChunkResult, *DynamicIndexWriter, error)

ChunkStreamCallback is like ChunkStream but calls fn for each chunk after it is stored. If fn returns a non-nil error, chunking stops and the error is returned. If fn is nil, no callback is made.
