aws

package
v0.1.2
Published: Apr 25, 2025 License: Apache-2.0 Imports: 29 Imported by: 1

README

AWS Design

This document describes how the storage implementation for running Tessera on Amazon Web Services is intended to work.

Overview

This design takes advantage of S3 for long-term storage and low-cost, low-complexity serving of read traffic, while leveraging a transactional database for coordinating writes.

New entries flow from the binary built with Tessera into transactional storage, where they're held temporarily so they can be batched up, and then assigned sequence numbers as each batch is flushed. This allows the Add API call to return quickly with durably assigned sequence numbers.

From there, an asynchronous process derives the entry bundles and Merkle tree structure from the sequenced batches and writes these to S3 for serving, before finally removing integrated bundles from the transactional storage.

Since entries are all sequenced by the time they're stored, and sequencing is done in "chunks", all tree derivations are idempotent.

Transactional storage

The transactional storage is implemented with Aurora MySQL, using a schema with three tables:

SeqCoord

A single-row table used to keep track of the next assignable sequence number.

Seq

This holds batches of entries keyed by the sequence number assigned to the first entry in the batch.

IntCoord

This table is used to coordinate the integration of sequenced batches in the Seq table, and to keep track of the current tree state.
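
For illustration, DDL along the following lines would produce the three tables described above; the column names and types are assumptions for this sketch, not the implementation's exact schema.

// Illustrative DDL only; the real schema may differ.
const (
	createSeqCoord = `CREATE TABLE IF NOT EXISTS SeqCoord (
		id   INT UNSIGNED NOT NULL PRIMARY KEY, -- always 0: a single-row table
		next BIGINT UNSIGNED NOT NULL           -- next assignable sequence number
	)`
	createSeq = `CREATE TABLE IF NOT EXISTS Seq (
		id BIGINT UNSIGNED NOT NULL PRIMARY KEY, -- sequence number of the batch's first entry
		v  LONGBLOB NOT NULL                     -- serialised batch of entries
	)`
	createIntCoord = `CREATE TABLE IF NOT EXISTS IntCoord (
		id       INT UNSIGNED NOT NULL PRIMARY KEY, -- always 0: a single-row table
		seq      BIGINT UNSIGNED NOT NULL,          -- all entries below this are integrated
		rootHash TINYBLOB NOT NULL                  -- current Merkle tree root hash
	)`
)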

Life of a leaf

  1. Leaves are submitted by the binary built using Tessera via a call to the storage's Add func.
  2. The storage library batches these entries up and, once a configurable period of time has elapsed or the batch reaches a configurable size threshold, writes the batch to the Seq table, effectively assigning sequence numbers to the entries. This uses the following algorithm, in a transaction (see the sketch after this list):
    1. Select next from SeqCoord with FOR UPDATE ← this blocks other frontends from writing their pools, but only for a short duration.
    2. Insert the batch of entries into Seq, keyed by SeqCoord.next.
    3. Update SeqCoord with next += len(batch).
  3. Newly sequenced entries are periodically appended to the tree. In a transaction:
    1. Select seq from IntCoord with FOR UPDATE ← this blocks other integrators from proceeding.
    2. Select one or more consecutive batches from Seq FOR UPDATE, starting at IntCoord.seq.
    3. Write leaf bundles to S3 using the batched entries.
    4. Integrate the entries into the Merkle tree and write the tiles to S3.
    5. Update the checkpoint in S3.
    6. Delete the consumed batches from Seq.
    7. Update IntCoord with seq += num_entries_integrated and the latest rootHash.
  4. Checkpoints representing the latest state of the tree are published at the configured interval.
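
The sequence-assignment transaction in step 2 might look roughly like the following Go sketch. The table and column names are the assumptions used in the schema sketch above, and error handling is simplified; this is illustrative, not the actual implementation.

package sequencing // hypothetical package, for illustration only

import (
	"context"
	"database/sql"
)

// assignSequence durably assigns contiguous sequence numbers to a serialised
// batch of n entries, and returns the sequence number assigned to the first.
func assignSequence(ctx context.Context, db *sql.DB, batch []byte, n uint64) (uint64, error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return 0, err
	}
	defer tx.Rollback() // no-op if the transaction commits

	// 2.1: read and lock the next assignable sequence number.
	var next uint64
	row := tx.QueryRowContext(ctx, "SELECT next FROM SeqCoord WHERE id = 0 FOR UPDATE")
	if err := row.Scan(&next); err != nil {
		return 0, err
	}
	// 2.2: store the batch, keyed by the sequence number of its first entry.
	if _, err := tx.ExecContext(ctx, "INSERT INTO Seq (id, v) VALUES (?, ?)", next, batch); err != nil {
		return 0, err
	}
	// 2.3: advance the counter past this batch.
	if _, err := tx.ExecContext(ctx, "UPDATE SeqCoord SET next = ? WHERE id = 0", next+n); err != nil {
		return 0, err
	}
	return next, tx.Commit()
}

The integration transaction in step 3 follows the same pattern, with the FOR UPDATE read of IntCoord serialising integrators rather than frontends.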

Dedup

Two experimental implementations have been tested which use either Aurora MySQL or a local bbolt database to store the <identity_hash> --> sequence mapping. They work well, but call for further stress testing and cost analysis.
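
As a sketch of the MySQL flavour, a lookup-or-store of that mapping might look like the following; the IDSeq table and its columns are assumptions for illustration, not the experimental schema itself.

import (
	"context"
	"database/sql"
	"errors"
)

// index returns the sequence number previously assigned to an entry with the
// given identity hash, or stores the provided one if none exists.
// Sketch only: the table/column names (IDSeq, h, seq) are assumptions.
func index(ctx context.Context, db *sql.DB, idHash []byte, seq uint64) (uint64, bool, error) {
	var existing uint64
	err := db.QueryRowContext(ctx, "SELECT seq FROM IDSeq WHERE h = ?", idHash).Scan(&existing)
	switch {
	case err == nil:
		return existing, true, nil // duplicate: return the originally assigned sequence number
	case errors.Is(err, sql.ErrNoRows):
		_, err := db.ExecContext(ctx, "INSERT IGNORE INTO IDSeq (h, seq) VALUES (?, ?)", idHash, seq)
		return seq, false, err
	default:
		return 0, false, err
	}
}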

Compatibility

This storage implementation is intended to be used with AWS services.

However, given that it's based on services which are compatible with MySQL and S3 protocols, it's possible that it will work with other non-AWS-based backends which are compatible with these protocols.

Given the vast array of combinations of backend implementations and versions, using this storage implementation outside of AWS isn't officially supported, although there may be folks who can help with issues in the Transparency-Dev Slack.

Similarly, PRs raised against it relating to its use outside of AWS are unlikely to be accepted unless it's shown that they have no detrimental effect on the implementation's performance on AWS.

Alternatives considered

Other transactional storage systems are available on AWS, e.g. Redshift, RDS or DynamoDB. Experiments were run using Aurora (MySQL, Serverless v2), RDS (MySQL), and DynamoDB.

Aurora (MySQL) worked out to be a good compromise between cost, performance, operational overhead, and code complexity, and so was selected.

The alpha implementation was tested with entries of 1KB each, at a write rate of 1500/s. This was done using the smallest Aurora instance available, db.r5.large, running 8.0.mysql_aurora.3.05.2.

Aurora (Serverless v2) worked out well, but seems less cost effective than provisioned Aurora for sustained traffic. For now, we decided not to explore this option further.

RDS (MySQL) worked out well, but requires more administrative overhead than Aurora. For now, we decided not to explore this option further.

DynamoDB worked out to be less cost-efficient than Aurora and RDS. It also has constraints that introduced a non-trivial amount of complexity: the maximum item size is 400KB; the maximum transaction size is 4MB, 25 rows for writes, or 100 rows for reads; binary values must be base64 encoded; and arrays of bytes are marshaled as sets by default (as of Dec. 2024). We decided not to explore this option further.

Documentation

Overview

Package aws contains an AWS-based storage implementation for Tessera.

TODO: decide whether to rename this package.

This storage implementation uses S3 for long-term storage and serving of entry bundles and log tiles, and MySQL for coordinating updates to the log when multiple instances of a personality binary are running.

A single S3 bucket is used to hold entry bundles and log internal tiles. The object keys for the bucket are selected so as to conform to the expected layout of a tile-based log.
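
For example, under a tlog-tiles-style layout the object keys look roughly like the following. These paths are illustrative; the tlog-tiles specification is authoritative for the exact scheme.

checkpoint                ← the latest published checkpoint
tile/0/000                ← leaf-level hash tile with index 0
tile/1/x001/234           ← level-1 hash tile with index 1234
tile/entries/000          ← entry bundle with index 0
tile/entries/000.p/8      ← partial entry bundle containing 8 entries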

A MySQL database provides a transactional mechanism to allow multiple frontends to safely update the contents of the log.

Index

Constants

const (
	DefaultPushbackMaxOutstanding = 4096
	DefaultIntegrationSizeLimit   = 5 * 4096

	// SchemaCompatibilityVersion represents the expected version (e.g. layout & serialisation) of stored data.
	//
	// A binary built with a given version of the Tessera library is compatible with stored data created by a different version
	// of the library if and only if this value is the same as the compatibilityVersion stored in the Tessera table.
	//
	// NOTE: if changing this version, you need to consider whether end-users are going to update their schema instances to be
	// compatible with the new format, and provide a means to do it if so.
	SchemaCompatibilityVersion = 1
)

Variables

This section is empty.

Functions

func New

func New(ctx context.Context, cfg Config) (tessera.Driver, error)

New creates a new instance of the AWS based Storage.

Storage instances created via this constructor will participate in integrating newly sequenced entries into the log and in periodically publishing a new checkpoint which commits to the state of the tree.
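
A hedged usage sketch follows: the import path is assumed from the module layout, the bucket name and DSN are placeholders, and the hand-off of the returned driver to the appender lifecycle may differ between Tessera versions.

import (
	"context"

	"github.com/transparency-dev/trillian-tessera/storage/aws" // assumed import path
)

// openStorage constructs the AWS driver. Sketch only.
func openStorage(ctx context.Context) error {
	driver, err := aws.New(ctx, aws.Config{
		Bucket: "example-log-bucket",                             // placeholder
		DSN:    "tessera:password@tcp(aurora-host:3306)/tessera", // placeholder
	})
	if err != nil {
		return err
	}
	// The personality binary then hands the driver to the Tessera appender
	// lifecycle (e.g. tessera.NewAppender) to begin accepting entries.
	_ = driver
	return nil
}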

Types

type Appender added in v0.1.1

type Appender struct {
	// contains filtered or unexported fields
}

Appender is an implementation of the Tessera appender lifecycle contract.

func (*Appender) Add added in v0.1.1

Add is the entrypoint for adding entries to a sequencing log.
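
A hedged sketch of a call site follows. The Add signature isn't reproduced above, so the future-style return used here is an assumption based on the wider Tessera appender API and may differ between versions.

// addEntry submits data to the log and blocks until the entry has been
// durably assigned a sequence number. Sketch only: the IndexFuture shape
// (a func returning an index) is assumed, not confirmed by the docs above.
func addEntry(ctx context.Context, a *aws.Appender, data []byte) (uint64, error) {
	future := a.Add(ctx, tessera.NewEntry(data)) // returns quickly once the entry is queued
	idx, err := future()                         // blocks until a sequence number is assigned
	if err != nil {
		return 0, err
	}
	return idx.Index, nil
}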

type Config

type Config struct {
	// SDKConfig is an optional AWS config to use when configuring service clients, e.g. to
	// use non-AWS S3 or MySQL services.
	//
	// If nil, the value from config.LoadDefaultConfig() will be used - this is the only
	// supported configuration.
	SDKConfig *aws.Config
	// S3Options is an optional function which can be used to configure the S3 library.
	// This is primarily useful when configuring the use of non-AWS S3 or MySQL services.
	//
	// If nil, the default options will be used - this is the only supported configuration.
	S3Options func(*s3.Options)
	// Bucket is the name of the S3 bucket to use for storing log state.
	Bucket string
	// BucketPrefix is an optional prefix to prepend to all log resource paths.
	// This can be used e.g. to store multiple logs in the same bucket.
	BucketPrefix string
	// DSN is the DSN of the MySQL instance to use.
	DSN string
	// Maximum connections to the MySQL database.
	MaxOpenConns int
	// Maximum idle database connections in the connection pool.
	MaxIdleConns int
}

Config holds AWS project and resource configuration for a storage instance.

type MigrationStorage added in v0.1.2

type MigrationStorage struct {
	// contains filtered or unexported fields
}

MigrationStorage implements the tessera.MigrationStorage lifecycle contract.

func (*MigrationStorage) AwaitIntegration added in v0.1.2

func (m *MigrationStorage) AwaitIntegration(ctx context.Context, sourceSize uint64) ([]byte, error)

func (*MigrationStorage) IntegratedSize added in v0.1.2

func (m *MigrationStorage) IntegratedSize(ctx context.Context) (uint64, error)

func (*MigrationStorage) SetEntryBundle added in v0.1.2

func (m *MigrationStorage) SetEntryBundle(ctx context.Context, index uint64, partial uint8, bundle []byte) error
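
Taken together, these methods support copying an existing log into AWS storage. A hedged sketch, in which fetchBundle is a hypothetical helper and the 256-entry bundle size is an assumption taken from the tile-based log layout:

// migrate copies entry bundles from some source into the AWS storage, then
// blocks until they have all been integrated, returning the resulting root hash.
func migrate(ctx context.Context, m *aws.MigrationStorage, sourceSize uint64, fetchBundle func(index uint64) ([]byte, error)) ([]byte, error) {
	const bundleSize = 256 // entries per full bundle (assumption)
	for i := uint64(0); i*bundleSize < sourceSize; i++ {
		b, err := fetchBundle(i)
		if err != nil {
			return nil, err
		}
		// partial is the entry count of a trailing partial bundle, or 0 for a full one.
		partial := uint8(0)
		if rem := sourceSize - i*bundleSize; rem < bundleSize {
			partial = uint8(rem)
		}
		if err := m.SetEntryBundle(ctx, i, partial, b); err != nil {
			return nil, err
		}
	}
	return m.AwaitIntegration(ctx, sourceSize)
}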

type Storage

type Storage struct {
	// contains filtered or unexported fields
}

Storage is an AWS based storage implementation for Tessera.

func (*Storage) Appender added in v0.1.1

func (*Storage) MigrationWriter added in v0.1.2

MigrationWriter creates a new AWS storage for the MigrationWriter lifecycle mode.

Directories

Path Synopsis
Package aws contains an AWS-based antispam implementation for Tessera.
