sqlite

Version: v0.3.0
Warning

This package is not in the latest version of its module.

Published: May 10, 2026 License: Apache-2.0 Imports: 20 Imported by: 0

Documentation

Overview

Package sqlite is the reference implementation of core.StoreIndex backed by SQLite.

Two concrete SQLite drivers are supported via build tags. Without any tag the package uses modernc.org/sqlite — a pure-Go implementation that requires no C toolchain and cross-compiles trivially. The build tag `sqlite_cgo` selects mattn/go-sqlite3, which links against the system SQLite library through cgo and is noticeably faster on write-heavy workloads. The choice happens at build time; the package's public API is identical either way.

File-based and in-memory databases are both supported. Pass ":memory:" as the path to NewStore to create a private in-memory database; this is the recommended setup for unit tests of higher layers.

Concurrency:

  • All Index methods are safe for concurrent use.
  • Mutating methods open and commit their own transactions internally; the caller never drives transactions explicitly.
  • Concurrent writers are coordinated by SQLite via busy_timeout (default 5s). Contention that outlasts the timeout is reported as errs.ErrLeaseHeld wrapping the SQLite busy error, and is emitted as the index.contention_error event.

Schema migrations are applied automatically by NewStore and are forward-only: downgrades are not supported. Schema versions live in the schema_version table. An older on-disk version is migrated up to the embedded CurrentSchemaVersion, while an on-disk version newer than CurrentSchemaVersion returns errs.ErrIndexSchemaMismatch.

DAG: this package imports core (the StoreIndex contract), driver (capabilities, read-only), and event (metric events); its exported signatures also reference index (options and the extension registry). It does not import plugin, curator, agent, or higher layers.

Index

Constants

const CurrentSchemaVersion = 2

CurrentSchemaVersion is the schema version this build of the package writes and expects to read. Bumped whenever a migration is added to migrations[].

const DefaultBusyTimeout = 5 * time.Second

DefaultBusyTimeout is the default value applied via PRAGMA busy_timeout when no WithBusyTimeout option is supplied. Five seconds covers practically every legitimate writer contention without hiding real deadlocks for too long.

Variables

This section is empty.

Functions

This section is empty.

Types

type Index

type Index struct {
	// contains filtered or unexported fields
}

Index is the SQLite-backed implementation of core.StoreIndex. Construct via NewStore; Close when done.

func NewStore

func NewStore(ctx context.Context, path string, opts ...index.IndexOption) (*Index, error)

NewStore opens (or creates) a SQLite-backed StoreIndex at the given path. Use ":memory:" for a private in-memory instance.

Accepts the umbrella index.IndexOption type — the package itself does not expose backend-specific options on its public API. Tunables like busy_timeout and journal/sync modes use safe defaults; tests inside this package may override them through internal helpers.

On a fresh database the schema is created at CurrentSchemaVersion. On an existing database the schema version is checked; missing migrations are applied forward-only. A version newer than CurrentSchemaVersion returns errs.ErrIndexSchemaMismatch.

The signature carries ctx and error even though the design docs (3. Contracts/02 §2.4.1) show a simplified form without them. Opening SQLite is real I/O: it can fail on bad paths, permission errors, mid-flight migrations, or mmap limits, and migrations are long-running and deserve cancellation. The doc amendment is tracked separately.

func (*Index) Close

func (i *Index) Close() error

Close releases the underlying database/sql handle. It is idempotent because the StoreIndex contract requires repeat Close calls to succeed: rather than depending on how the handle and the extensions behave when closed twice, a sync.Once captures the outcome of the first call and returns it on every later call.

Registered extensions are closed in the reverse order of registration. Errors from an extension's Close are swallowed (they should be logged once a logger is wired in): they must not prevent the underlying DB from being released.
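The Close behaviour described above can be sketched with sync.Once. The idx type is a hypothetical stand-in for Index reduced to what Close needs, and onceCloser is a fake that deliberately errors on a second Close so the guard is visible. Closing extensions before the DB handle is an assumption of this sketch (extensions presumably still need the handle while shutting down).

```go
package main

import (
	"errors"
	"sync"
)

// closer is anything with a Close method: the DB handle or an extension.
type closer interface{ Close() error }

// idx is a hypothetical stand-in for Index.
type idx struct {
	db         closer
	extensions []closer // in registration order
	closeOnce  sync.Once
	closeErr   error
}

func (i *idx) Close() error {
	i.closeOnce.Do(func() {
		// Reverse registration order; extension errors are ignored so they
		// cannot keep the DB handle open.
		for k := len(i.extensions) - 1; k >= 0; k-- {
			_ = i.extensions[k].Close()
		}
		// Only the DB handle's outcome is remembered and returned forever.
		i.closeErr = i.db.Close()
	})
	return i.closeErr
}

// onceCloser is a fake whose second Close fails, and which records the
// order in which things were closed.
type onceCloser struct {
	name   string
	closed bool
	log    *[]string
}

func (c *onceCloser) Close() error {
	if c.closed {
		return errors.New(c.name + " already closed")
	}
	c.closed = true
	*c.log = append(*c.log, c.name)
	return nil
}
```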

func (*Index) DeleteManifest

func (i *Index) DeleteManifest(ctx context.Context, artifactID domain.ArtifactID, blobRefs []string) error

DeleteManifest performs the logical deletion of an artifact. Single transaction:

  1. Read the (artifact_id, blob_ref) edges from manifest_blobs.
  2. Decrement ref_count for each referenced blob.
  3. Delete the manifest_blobs edges.
  4. Delete the manifests row.

blobRefs argument: the caller passes the same set of blob refs it expects to be decremented. A mismatch between manifest_blobs and blobRefs is a fatal error: the index has diverged from the caller's view, and continuing would corrupt ref_counts. RebuildIndex is the recovery tool.

Idempotency: deleting an already-deleted artifact is a no-op (returns nil) — DELETE FROM manifests with rows-affected=0 is not an error. Source-of-truth for "already deleted" is the manifests table, not manifest_blobs: Inline manifests have no edges in manifest_blobs by design (§9.2.1), so checking that table for "exists" gives the wrong answer for them.

func (*Index) DeletePacked

func (i *Index) DeletePacked(ctx context.Context, packBlobRef string) error

DeletePacked removes every packed_blobs row whose pack_blob_ref matches. It is called by the GC Agent right before tombstoning a pack volume whose ref_count has dropped to zero (every packed entry has been logically deleted, so the pack is now an orphan).

The pack's own row in `blobs` is NOT touched by this method: pack entries and pack metadata are different things, and the GC Agent removes them in separate, well-defined steps. This method owns only the `packed_blobs` cleanup.

Idempotent: a missing pack_blob_ref returns nil.

func (*Index) ExistsByContent

func (i *Index) ExistsByContent(ctx context.Context, hash domain.ContentHash, originalSize int64) (string, bool, error)

ExistsByContent is the deduplication primitive. It looks up a blob by the composite key (content_hash, original_size). The key is the pair rather than the hash alone because, in pathological inputs, two distinct files of different sizes may collide on a hash prefix; matching on size as well is a defensive choice the format makes globally.

Returns (blobRef, true, nil) when found; ("", false, nil) when absent; and ("", false, err) for unexpected failures.
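The composite-key lookup and its three-way result can be sketched with a Go map keyed by a struct. contentKey and blobIndex are hypothetical; the SQL form would be roughly SELECT blob_ref FROM blobs WHERE content_hash = ? AND original_size = ? (an assumed shape, reconstructed from the column names above). An in-memory map cannot fail, so the error result is always nil here.

```go
package main

// contentKey is the composite dedup key: hash AND original size.
type contentKey struct {
	hash string // stand-in for domain.ContentHash
	size int64
}

// blobIndex maps the composite key to a blobRef.
type blobIndex map[contentKey]string

func (b blobIndex) existsByContent(hash string, originalSize int64) (string, bool, error) {
	ref, ok := b[contentKey{hash, originalSize}]
	if !ok {
		return "", false, nil // absent is not an error
	}
	return ref, true, nil
}
```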

func (*Index) ExistsByHash

func (i *Index) ExistsByHash(ctx context.Context, hash domain.ContentHash) (domain.BlobExistStatus, error)

ExistsByHash is the chunk-deduplication primitive used by chunker.Wrapper. Unlike ExistsByContent it does not check size, because chunks are anonymous and the chunker has no manifest metadata to compare against — but it DOES distinguish a normal blob from a tombstoned one.

At the index level we don't currently track tombstones — the driver does (see localfs.MarkTombstone). The StoreIndex contract returns BlobIsTombstone when the index has a record marked as such; for now there is no schema field for it, so we always return BlobNotFound or BlobExists. The future schema migration that adds a tombstone column will make this method richer without changing its signature.

This is a deliberate gap, not a bug. The current architecture uses the index for liveness (ref_count > 0) and the driver for physical state. Until M3.2 (GC) ties them together, this method never returns BlobIsTombstone.

func (*Index) Extensions

func (i *Index) Extensions() index.ExtensionRegistry

Extensions returns the registry for installing index extensions against this backend. Method on the concrete *Index type rather than on core.StoreIndex — see ADR-49 for the rationale (avoids a core ↔ index import cycle and respects backends that don't support extensions).

Implements index.ExtensionHost.
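Because Extensions lives on the concrete type rather than on core.StoreIndex, a caller holding only the contract interface reaches it through a type assertion. The interfaces below are hypothetical, reduced stand-ins for core.StoreIndex, index.ExtensionRegistry and index.ExtensionHost; a backend without extension support simply does not implement ExtensionHost.

```go
package main

// Hypothetical stand-ins; the real interfaces have more methods.
type StoreIndex interface {
	SchemaVersion() int
}

type ExtensionRegistry interface {
	Register(name string) error
}

type ExtensionHost interface {
	Extensions() ExtensionRegistry
}

// registryFor returns the registry only if the backend supports extensions.
func registryFor(s StoreIndex) (ExtensionRegistry, bool) {
	host, ok := s.(ExtensionHost)
	if !ok {
		return nil, false
	}
	return host.Extensions(), true
}

// plainBackend implements only the contract.
type plainBackend struct{}

func (plainBackend) SchemaVersion() int { return 2 }

// extBackend additionally implements ExtensionHost.
type extBackend struct{ plainBackend }

type nopRegistry struct{}

func (nopRegistry) Register(string) error { return nil }

func (extBackend) Extensions() ExtensionRegistry { return nopRegistry{} }
```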

func (*Index) GetBySession

func (i *Index) GetBySession(ctx context.Context, sessionID string) ([]domain.ArtifactID, error)

GetBySession returns every ArtifactID with the given SessionID. Used by RollbackSession; the result set is small in practice (one user session, dozens to hundreds of artifacts), so we materialise it into a slice rather than streaming via callback.

An empty SessionID is rejected at the engine level (errs.ErrEmptySessionID). The index itself does NOT enforce that check: it would be a useful last line of defence, but consistency demands that the index honour any query the caller passes. The engine's RollbackSession is where mass-delete safety lives.

func (*Index) GetMeta

func (i *Index) GetMeta(ctx context.Context, key string) (string, error)

GetMeta reads a value from store_meta. A missing key returns errs.ErrMetaKeyNotFound.

Engine consumers (descriptor cache, last_orphan_scan_at, schema notes) treat store_meta as a typed singleton namespace; this method intentionally returns the raw string and lets the caller parse. Keeping serialisation out of the index keeps the store_meta contract trivial — encode/decode lives where the typed field lives.

func (*Index) GetRefCount

func (i *Index) GetRefCount(ctx context.Context, blobRef string) (int, error)

GetRefCount returns the current reference count of a blob. A missing blob returns errs.ErrArtifactNotFound — same rationale as Resolve: the index either has the blob or it does not.

Returning 0 on a missing blob (instead of an error) was tempting for "it's just a number, callers can treat it as no references" — but it would hide the difference between "blob is dead, GC can reap" and "blob never existed". Two very different conditions.

func (*Index) IndexManifest

func (i *Index) IndexManifest(
	ctx context.Context,
	m domain.Manifest,
	addr domain.PhysicalAddress,
	chunkRefs []string,
	packedEntries []domain.PackedEntry,
) error

IndexManifest registers an artifact in the index. Branches on manifest.Type:

  • blob: upserts the blob row, increments ref_count, inserts the manifest row, links manifest -> blob.
  • toc: same as blob plus increments ref_count for each chunkRef and links manifest -> chunks (positional).
  • pack: registers the pack itself as one blob and inserts a row into packed_blobs for each entry; manifests of packed artifacts are NOT inserted into the manifests table — packed artifacts are reachable through LookupPacked, not through Walk.

All work happens inside a single transaction; partial registration is impossible.
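The three branches can be modelled in memory as follows. Everything here is hypothetical: manifest is reduced to an id, a type string and a blob ref, packed entries are reduced to artifact IDs, and validating up front stands in for the single transaction. How a pack's own ref_count relates to its entries is not specified above, so this sketch only makes sure the pack's row exists.

```go
package main

import "fmt"

// manifest is a hypothetical reduction of domain.Manifest.
type manifest struct {
	id      string
	typ     string // "blob", "toc" or "pack"
	blobRef string // the blob itself, or the pack volume for "pack"
}

type model struct {
	refCount  map[string]int      // blobs.ref_count
	manifests map[string]bool     // manifests rows
	edges     map[string][]string // manifest -> blob/chunk refs, positional
	packed    map[string]string   // packed artifact_id -> pack blob_ref
}

func (s *model) indexManifest(m manifest, chunkRefs, packedIDs []string) error {
	switch m.typ {
	case "blob", "toc":
		refs := []string{m.blobRef}
		if m.typ == "toc" {
			refs = append(refs, chunkRefs...) // positional: order preserved
		}
		for _, r := range refs {
			s.refCount[r]++
		}
		s.manifests[m.id] = true
		s.edges[m.id] = refs
	case "pack":
		if _, ok := s.refCount[m.blobRef]; !ok {
			s.refCount[m.blobRef] = 0 // upsert the pack's own blob row
		}
		for _, id := range packedIDs {
			s.packed[id] = m.blobRef
		}
		// No manifests row: packed artifacts are reached via LookupPacked.
	default:
		return fmt.Errorf("unknown manifest type %q", m.typ)
	}
	return nil
}
```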

func (*Index) ListByNamespace

func (i *Index) ListByNamespace(
	ctx context.Context,
	ns string,
	cb func(domain.Manifest) error,
) error

ListByNamespace iterates over manifests whose namespace matches the filter. The callback is invoked once per manifest in (namespace, created_at) order; returning errs.ErrStopWalk, or any other error, from the callback stops the iteration.

Filter semantics match the contract of Walk in core.DataStore:

  • "*" — every user namespace; system.* is excluded
  • "" — only the default (empty) namespace
  • <other> — exactly that namespace

Pack manifests are NEVER included; they live in packed_blobs and are reachable through LookupPacked instead. The manifests table already excludes them by construction (indexPackManifest does not insert a row), so this method does not need an explicit type filter — but the SQL keeps one for defence in depth.
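The three filter forms can be expressed as a small predicate over one manifest's namespace. matchNamespace is hypothetical; treating "system." as the prefix that marks system namespaces, and counting the default (empty) namespace as a user namespace under "*", are both assumptions of this sketch.

```go
package main

import "strings"

// matchNamespace applies the Walk filter semantics to one namespace.
func matchNamespace(filter, ns string) bool {
	switch filter {
	case "*":
		// Every user namespace; system.* is excluded.
		return !strings.HasPrefix(ns, "system.")
	case "":
		// Only the default (empty) namespace.
		return ns == ""
	default:
		// Exactly that namespace.
		return ns == filter
	}
}
```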

func (*Index) ListExtensions

func (i *Index) ListExtensions() []index.ExtensionInfo

ListExtensions enumerates currently-registered extensions, returning each one's name and persisted schema version. Names appear in unspecified order — callers wanting deterministic listings sort the result. Useful for diagnostics and stats endpoints; not part of any contract surface.

Returns an empty slice (never nil) when no extensions are registered.

Implements index.ExtensionLister.
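Since the order is unspecified, a caller wanting a stable listing sorts it. ExtensionInfo below is a hypothetical stand-in for index.ExtensionInfo with assumed field names; the helper copies before sorting so the caller never mutates a slice it does not own, and it keeps the "empty, never nil" guarantee.

```go
package main

import "sort"

// ExtensionInfo is a hypothetical stand-in for index.ExtensionInfo.
type ExtensionInfo struct {
	Name          string
	SchemaVersion int
}

// sortedExtensions returns a name-ordered copy of a listing.
func sortedExtensions(in []ExtensionInfo) []ExtensionInfo {
	out := append([]ExtensionInfo{}, in...) // non-nil even when in is empty
	sort.Slice(out, func(a, b int) bool { return out[a].Name < out[b].Name })
	return out
}
```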

func (*Index) ListOrphanBlobs

func (i *Index) ListOrphanBlobs(
	ctx context.Context,
	cb func(blobRef string) error,
) error

ListOrphanBlobs iterates over blobs with ref_count = 0. Used by the GC Agent's Mark phase — every entry is a deletion candidate.

We rely on the partial index `blobs_orphan` (defined in schemaV1: ON blobs(ref_count) WHERE ref_count = 0) so the scan is cheap even on very large blob tables. SQLite uses the partial index automatically when the query predicate matches.

func (*Index) ListUnverified

func (i *Index) ListUnverified(ctx context.Context, before time.Time, cb func(blobRef string) error) error

ListUnverified iterates over blobs whose last_verified_at is strictly older than `before`, plus blobs that have never been scrubbed (last_verified_at IS NULL). Used by the Scrub Agent; the `before` cutoff is computed by the agent as now() - StoreConfig.MaxAge, possibly shifted upward for blobs on a CapNativeChecksum medium.

NULL last_verified_at means "never verified": those rows take priority and always come first under the ORDER BY, because SQLite sorts NULLs first in ascending order by default.

Order is by last_verified_at ascending: oldest first, which is what the scrub schedule wants. RFC 3339 second-precision strings (UTC) sort lexicographically the same as chronologically.

func (*Index) LookupPacked

func (i *Index) LookupPacked(ctx context.Context, artifactID domain.ArtifactID) (domain.PackedBlobInfo, bool, error)

LookupPacked returns the range-read information for an artifact stored inside a .pack volume. The boolean second result is the "found" flag — false (not an error) means the artifact lives outside any pack; the caller should reach for Resolve(BlobRef) instead.

On the read path, the engine consults LookupPacked first because it is the only way to know whether to open a sliced range read or a full blob. A missing packed_blobs row is the normal case: most artifacts are not packed.
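A sketch of the read-path decision. PackedBlobInfo, PhysicalAddress and indexReader are hypothetical stand-ins with assumed fields; in particular, resolving the pack volume's address through Resolve(PackBlobRef) is an assumption, since the real PackedBlobInfo may already carry an address. fakeIndex is an in-memory implementation for trying the sketch out.

```go
package main

import "fmt"

// Hypothetical stand-ins for domain.PackedBlobInfo and domain.PhysicalAddress.
type PackedBlobInfo struct {
	PackBlobRef    string
	Offset, Length int64
}

type PhysicalAddress string

// indexReader lists only the two lookups the read path needs.
type indexReader interface {
	LookupPacked(artifactID string) (PackedBlobInfo, bool, error)
	Resolve(blobRef string) (PhysicalAddress, error)
}

// planRead picks a sliced range read inside a pack or a full blob read.
// "Not packed" is the common case, not an error.
func planRead(ix indexReader, artifactID, blobRef string) (string, error) {
	info, packed, err := ix.LookupPacked(artifactID)
	if err != nil {
		return "", err
	}
	if packed {
		addr, err := ix.Resolve(info.PackBlobRef)
		if err != nil {
			return "", err
		}
		return fmt.Sprintf("range %s [%d, +%d)", addr, info.Offset, info.Length), nil
	}
	addr, err := ix.Resolve(blobRef)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("full %s", addr), nil
}

// fakeIndex is an in-memory indexReader.
type fakeIndex struct {
	packed map[string]PackedBlobInfo
	addrs  map[string]PhysicalAddress
}

func (f fakeIndex) LookupPacked(id string) (PackedBlobInfo, bool, error) {
	info, ok := f.packed[id]
	return info, ok, nil
}

func (f fakeIndex) Resolve(ref string) (PhysicalAddress, error) {
	a, ok := f.addrs[ref]
	if !ok {
		return "", fmt.Errorf("blob %s: not found", ref)
	}
	return a, nil
}
```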

func (*Index) ManifestExists

func (i *Index) ManifestExists(ctx context.Context, id domain.ArtifactID) (bool, error)

ManifestExists reports whether a manifest row with the given ArtifactID exists. Cheap point-lookup: SELECT 1 ... LIMIT 1 against the primary key. Returns (false, nil) when the row is absent — the caller distinguishes "not present" from "infrastructure error" via the boolean.

func (*Index) MarkVerified

func (i *Index) MarkVerified(ctx context.Context, blobRef string, timestamp time.Time) error

MarkVerified records that a Scrub Agent has just finished a successful checksum verification of blobRef. The timestamp is the moment the verification completed; future scrubs use it to prioritise the oldest-verified blobs first.

A missing blob is a no-op rather than an error: by the time the Scrub Agent reaches a blob, the GC may have already removed it in a parallel cycle. Failing here would create useless noise in scrub logs without helping anything.

func (*Index) RebindBlob

func (i *Index) RebindBlob(ctx context.Context, blobRef string, newAddr domain.PhysicalAddress) error

RebindBlob updates a blob's physical address after a successful Drain (HostStorage transit -> Location). ref_count and other counters are untouched. Idempotent: a missing blob_ref is a no-op (returns nil) — the same Drain may be retried after a crash once the rebind has already committed.

Wrapped in a tx so subscribed extensions can react atomically with the main update. A missing blob_ref still completes successfully (no rows affected → no event dispatched, since extensions saw nothing change).

func (*Index) Resolve

func (i *Index) Resolve(ctx context.Context, blobRef string) (domain.PhysicalAddress, error)

Resolve returns the physical address of a blob. It is the hot-path call on every Get; performance matters but correctness matters more — a stale address means reading a different file or a deleted one.

A missing blob_ref returns errs.ErrArtifactNotFound. The choice of sentinel deserves a note: ErrArtifactNotFound is the engine-level "this thing is not here" error; from the StoreIndex perspective there is no separate "blob not found" — the index either knows where to find a blob or it does not.

func (*Index) SchemaVersion

func (i *Index) SchemaVersion(ctx context.Context) (int, error)

SchemaVersion returns the version currently recorded on disk. Useful for diagnostics and tests.

func (*Index) SetMeta

func (i *Index) SetMeta(ctx context.Context, key string, value string) error

SetMeta writes (or overwrites) a value in store_meta. The whole upsert is one statement; concurrent writers go through SQLite's busy_timeout machinery without us doing anything special.

func (*Index) VacuumInto

func (i *Index) VacuumInto(ctx context.Context, destPath string) error

VacuumInto creates a snapshot copy of the database at destPath. Used by the Snapshot Agent: a snapshot is a full self-contained SQLite file that RebuildIndexAgent can later open and replay.

SQLite's `VACUUM INTO` runs in a single transaction and produces a defragmented copy. It does NOT interrupt regular reads/writes to the source database — readers proceed against the live WAL while the vacuum streams pages.

destPath must point to a non-existent file: SQLite's VACUUM INTO refuses to overwrite an existing file (it accepts only a missing or empty target). We deliberately do not pre-delete: silently overwriting a snapshot would mask an upstream bug where two SnapshotAgents fight over the same path.

:memory: source is rejected — there is no on-disk content to snapshot. The Snapshot Agent should never call this on a memory index, but the explicit error is friendlier than a confusing SQLite-level failure.
