vindex

package v0.0.0-...-c54bdec
Published: Sep 11, 2025 License: Apache-2.0 Imports: 31 Imported by: 1

README

Verifiable Index

Status: Experimental. The goal is to have an MVP by 2025Q3. See Milestones for detailed feature progress.

This idea has been distilled from years of experiments with maps, and a pressing need to have an efficient and verifiable way for an end-user to find their data in logs without needing to download the whole log.

Discussions are welcome, please join us on Transparency-Dev Slack.

Overview

The Problem: Verifiability vs. Efficiency

Logs, such as those used in Certificate Transparency or Software Supply Chains, provide a strong foundation for discoverability. You can prove that an entry exists in a log. However, they lack a critical feature: the ability to verifiably query for entries based on their content.

This forces users who need to find specific data, like a domain owner finding their certificates, or a developer finding their software packages, into a painful choice:

  1. Massive Inefficiency: Download and process the entire log, which can be terabytes of mostly irrelevant data, just to find the few entries that matter to you.
  2. Losing Verifiability: Rely on a third-party service to index the data. This breaks the chain of verifiability, as the index operator could, by accident or design, fail to show you all the results. You are forced to trust them.

Neither option is acceptable. Users should not have to sacrifice efficiency for security, or security for efficiency.

The Solution: A Verifiable "Back-of-the-Book" Index

A Verifiable Index resolves this conflict by providing a third option: an efficient, cryptographically verifiable way to query log data.

At its core it works like a familiar index, much like one would find in the back of a book. It maps search terms (like a domain or package name) to the exact locations (pointers) in the main log where that data can be found.

This provides two key guarantees:

  • Efficiency: Users can look up data by a meaningful key and receive a small, targeted list of pointers back, avoiding the need to download the entire log.
  • Verifiability: Every lookup response comes with a cryptographic proof. This proof guarantees that the list of results is complete and that the index operator has not omitted any entries for your query.

The result is a system that extends the verifiability of the underlying log to its queries, preserving the end-to-end chain of trust while providing the efficiency modern systems require.

Applications

This verifiable map can be applied to any log where users have a need to enumerate all values matching a specific query. For example:

  • CT: domain owners wish to query for all certs matching a particular domain
  • SumDB: package owners want to find all releases for a given package

Indices exist for both ecosystems at the moment, but they aren’t verifiable.

Core Idea: TL;DR

The Verifiable Index has 3 data structures involved (and is informally called a Map Sandwich, as the Map sits between two Logs):

  1. The Input Log that is to be indexed
  2. The Verifiable Index containing pointers back into the Input Log
  3. The Output Log that contains a list of all revisions of the map

The Input Log likely already exists before the Verifiable Index is added, but the Output Log is new, and is required to make the Verifiable Index historically verifiable. For example, in Certificate Transparency the Input Log could be any one of the CT Logs. To make the certificates in a log efficiently searchable by domain, an operator can spin up a Verifiable Index and a corresponding Output Log. The Index would map domain names to the indices in the Input Log containing certificates for that domain.

[!TIP] Note that the map doesn't have a "signed map root", i.e. it has no direct analog for a Log Checkpoint. Instead, the state of a Verifiable Index is committed to by including its state as a leaf in the Output Log.

[!NOTE] A Verifiable Index is constructed for a single Input Log. For ecosystems of multiple logs (e.g. CT), there will be as many Verifiable Indices as there are Input Logs.

Constructing

  1. Log data is consumed leaf by leaf
  2. Each log leaf is parsed using a MapFn that specifies all of the keys in the map to which this relates
    1. e.g. for CT, this would be all of the domains that relate to a cert.
    2. The raw domains are not output, but are hashed. If privacy is important, a VRF could be used here.
  3. The output from the MapFn stage represents a sequence of update operations to the map
    1. This output stream can be serialized for data recovery (see Write Ahead Map Transaction Log)
  4. The map is computed for these update operations and a root is calculated
  5. The root hash is written as a new leaf into the Output Log, along with the checkpoint from the Input Log that was consumed to create this revision
  6. The Output Log is witnessed, and an Output Log Checkpoint is made available with witness signatures

Reading

Users looking up values in the map need to know about the MapFn in order to derive the correct key hash for, e.g., maps.google.com. The values returned for a verifiable point lookup under this key are a list of uint64 values representing indices into the Input Log. To find the certs for these values, the original log is queried at these indices.

Given a key to read, a read operation needs to return:

  • A witnessed Output Log Checkpoint
  • The latest value in this log, with an inclusion proof to the Output Log Checkpoint
    • The value in this log commits to the Input Log state, and also contains a Verifiable Index root hash
  • The value at the given key in the Verifiable Index, and an inclusion proof

Verifying this involves verifying the following chain:

  • The Output Log Checkpoint is signed by the Map Operator, and sufficient witnesses
  • The inclusion proof in the Output Log: this ties the Map Root Hash to the Output Log Checkpoint
  • The inclusion proof in the Verifiable Index: this ties the indices returned to the key and the Map Root Hash

Verifying

The correct construction of the map can be verified by any other party. The only requirements are the compute resources to build the map, and a clear understanding of the MapFn (hence the importance of specifying it universally). The verifier builds a map at the same size as the Verifiable Index; if its root hash matches the one in the map checkpoint, both maps are equivalent and correct construction has been verified.

Sub-Problems

MapFn Specified in Universal Language

The verifiable index is constructed by taking each leaf in turn from the Input Log. Each leaf entry is "mapped" (the terminology is borrowed from MapReduce), which outputs all of the locations in the index that this leaf should be found. For example, in CT this would take a []byte, parse it as a certificate, and output the domains that this cert covers.

Being able to specify which keys are relevant to any particular entry in the log is critical to allow a verifier to check for correct construction of the map. Ideally this MapFn would be specified in a universal way, allowing the verifier to run a different technology stack from the core map operator; having the Go implementation serve as the specification is undesirable, as it forces verifiers to reproduce its behaviour identically in another environment.

The current plan is to use WASM (go docs), however other options could be considered (e.g. a formal spec, or a functional language that can be transpiled).

The current implementation in map.go takes a MapFn interface:

```go
// MapFn takes the raw leaf data from a log entry and outputs the SHA256 hashes
// of the keys under which this leaf should be indexed.
// A leaf can be recorded at any number of entries, including no entries (in
// which case an empty slice must be returned).
//
// MapFn is expected to consume any error states that it encounters in some way
// that makes sense to the particular ecosystem. This might mean outputting any
// invalid leaves at a known location (e.g. all 0s), or not outputting any
// entry. Any panics will cause the mapping process to terminate.
type MapFn func([]byte) [][sha256.Size]byte
```

Clients currently pass in implementations purely written in Go, however the Milestones tracks adding WASM support.

[!IMPORTANT] This describes the MapFn as returning key hashes. We may want to have the map return the raw key (e.g. maps.google.com) so that a prefix trie can be constructed. See https://github.com/transparency-dev/incubator/issues/33 for discussion and to provide feedback.

[!IMPORTANT] The MapFn is fixed for the life of the Verifiable Index. There are strategies that could be employed to allow updates, but these are out of scope for any early drafts.

Write Ahead Map Transaction Log

Having a compact append-only transaction log allows the map process to restart and pick up from where it last crashed efficiently. It also neatly divides the problem space: before this stage you have downloading logs and applying the MapFn, and after this stage you have the challenges of maintaining an efficient map data structure for updates and reads.

The core idea is to output at most a single record (row) for each entry in the log. A valid row has:

  1. the first token being the log index (string representation of uint64)
  2. the following (optional) space-separated values being the key hashes under which this index should appear
  3. a newline terminator

Some use cases may have lots of entries in the log that do not map to any value, and so this supports omitting a log index if it has no updates required in the map. However, an empty value can be output as a form of sentinel to provide a milepost on restarts that prevents going back over large numbers of empty entries from the log.

```
0 HEX_HASH_1 HEX_HASH_2
2 HEX_HASH_52
3 HEX_HASH_99
4
6 HEX_HASH_2
```

In the above example, there is no map update for the entries at indices 1, 4, or 5 in the log: indices 1 and 5 are omitted entirely, while index 4 is recorded as an empty sentinel. It is undetermined whether entries at index 7 and beyond are present, so anyone replaying this log must assume that processing needs to resume from index 7.

Turning Log into Efficient Map

The WAL can be transformed directly into the data structures needed for serving lookups. This is implemented using two data structures that are maintained in lockstep.

Keys in the map are hashes, produced by whatever strategy the MapFn implements. Values are ordered lists of indices into the Input Log.

[!NOTE] The current mechanism for hashing the list of indices writes all values out as a single block, and hashes this with a single SHA256. An alternative would be to build a Merkle tree of these values. This would be slightly more complex conceptually, but could allow for incremental updating of values, and more proofs.

Status

Known applications:

Milestones

| # | Step | Status |
|---|------|--------|
| 1 | Public code base and documentation for prototype | |
| 2 | Implementation of in-memory Merkle Radix Tree | |
| 3 | Incremental update | |
| 4 | Verify that mapped data matches Input Log Checkpoint | |
| 5 | Output log | |
| 6 | Proofs served on Lookup | |
| 7 | Storage backed verifiable-map | |
| 8 | MapFn defined in WASM | |
| 9 | Support reading directly from Input Log instead of Clone | |
| 10 | Example written for mapping SumDB | |
| 11 | Example written for hosting a log and VIndex together | |
| 12 | Example written for mapping CT | ⚠️ |
| 13 | Promote this from the incubator repo | |
| N | Production ready | |

Note that a storage-backed map needs to be implemented before this can be applied to larger logs, e.g. CT.

Documentation

Overview

vindex contains a prototype of an in-memory verifiable index. It reads data from an InputLog interface, applies a MapFn to every leaf in the input log, and writes the mapped information out to a Write Ahead Log. Data is read from the WAL, and the in-memory map is built from this.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func MarshalLeaf

func MarshalLeaf(vindexRootHash [sha256.Size]byte, inLogCp []byte) []byte

MarshalLeaf creates the leaf to be committed to the Output Log given the root hash of the verifiable index, and the checkpoint from the Input Log.

func UnmarshalLeaf

func UnmarshalLeaf(leaf []byte) ([sha256.Size]byte, []byte, error)

UnmarshalLeaf returns the root hash of the Verifiable Index and the checkpoint from the Input Log by unmarshalling a leaf from the Output Log, previously marshalled with MarshalLeaf.

Types

type InputLog

type InputLog interface {
	// Checkpoint returns the latest checkpoint committing to the input log state.
	Checkpoint(ctx context.Context) (checkpoint []byte, err error)
	// Parse unmarshals and verifies a checkpoint obtained from Checkpoint.
	Parse(checkpoint []byte) (*log.Checkpoint, error)
	// Leaves returns all the leaves in the range [start, end), outputting them via
	// the returned iterator.
	Leaves(ctx context.Context, start, end uint64) iter.Seq2[[]byte, error]
}

InputLog represents a connection to the input log from which map data will be built. This can be a local or remote data source.

type MapFn

type MapFn func([]byte) [][sha256.Size]byte

MapFn takes the raw leaf data from a log entry and outputs the SHA256 hashes of the keys under which this leaf should be indexed. A leaf can be recorded at any number of entries, including no entries (in which case an empty slice must be returned).

MapFn is expected to consume any error states that it encounters in some way that makes sense to the particular ecosystem. This might mean outputting any invalid leaves at a known location (e.g. all 0s), or not outputting any entry. Any panics will cause the mapping process to terminate.

type Options

type Options struct {
	PersistIndex bool
}

type OutputLog

type OutputLog interface {
	// Checkpoint returns the latest checkpoint committing to the output log state.
	Checkpoint(ctx context.Context) (checkpoint []byte, err error)
	// Parse unmarshals and verifies a checkpoint obtained from Checkpoint.
	Parse(checkpoint []byte) (*log.Checkpoint, error)
	// Append adds a new leaf and returns the checkpoint that commits to it.
	Append(ctx context.Context, data []byte) (idx uint64, checkpoint []byte, err error)
	// Lookup fetches the data, with a proof, at the given index and tree size.
	Lookup(ctx context.Context, idx, size uint64) ([]byte, [][sha256.Size]byte, error)
}

OutputLog is where map roots are written as leaves.

func NewOutputLog

func NewOutputLog(ctx context.Context, outputLogDir string, s note.Signer, v note.Verifier) (log OutputLog, closer func(), err error)

NewOutputLog returns an output log using a POSIX log in the given directory.

type VerifiableIndex

type VerifiableIndex struct {
	// contains filtered or unexported fields
}

VerifiableIndex manages reading from the input log, mapping leaves, updating the WAL, reading the WAL, and keeping the state of the in-memory index updated from the WAL.

func NewVerifiableIndex

func NewVerifiableIndex(ctx context.Context, inputLog InputLog, mapFn MapFn, outputLog OutputLog, rootDir string, opts Options) (*VerifiableIndex, error)

NewVerifiableIndex returns a VerifiableIndex that pulls entries from the given inputLog, determines indices for each one using the mapFn, and then writes the entries out to a Write Ahead Log under the given rootDir. Note that only one VerifiableIndex should exist for any given WAL at any time. The behaviour is unspecified, but likely broken, if multiple processes write to the same file at the same time.

func (*VerifiableIndex) Close

func (b *VerifiableIndex) Close() error

Close ensures that any open connections are closed before returning.

func (*VerifiableIndex) Lookup

Lookup returns the values stored for the given key.

func (*VerifiableIndex) Update

func (b *VerifiableIndex) Update(ctx context.Context) error

Update checks the input log for a new Checkpoint, and ensures that the Verifiable Index is updated to the corresponding size.

Directories

Path Synopsis
api contains the API definitions for the prototype VIndex API.
client contains a library for interacting with a Verifiable Index.
cmd
client command
client defines a binary that looks up a key in a vindex, verifying all proofs.
logandmap command
logandmap is a binary that serves as a demo of how to run a log and a map in the same process.
