tatami

package module
v0.2.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 26, 2026 License: MIT Imports: 26 Imported by: 0

README

tatami

ci Release Go Reference Go Report Card License

tatami (畳) is a single-file columnar storage format for web-scale crawl and search. One .tatami file is a self-describing mat of compressed columns: it stores crawled documents compactly, reads them back fast with column projection, and doubles as a search segment. Think of it as a Parquet cousin tuned for two jobs at once, a cold document store and a hot inverted index, both behind one file layout and one reader.

The format is built for the rest of the fleet. A crawler like ami or a corpus like Common Crawl via ccrawl-cli writes billions of pages into many .tatami files, a manifest stitches them into one logical collection, and a reader serves point lookups, column scans, and keyword queries off the same bytes.

Full documentation, with guides and the complete reference, is at tatami.tamnd.com.

Status

Complete. The format and the search engine are both implemented and proven on real Common Crawl data. A .tatami file is a stable, self-describing container with an encoding cascade, blob separation, shared trained dictionaries, zone maps, bloom filters, and a sparse key index. A manifest stitches many files into one collection, the convert command brings existing Parquet shards in, and the search-segment role adds an inverted index with BM25 ranking and block-max WAND retrieval. On a real shard, 20246 documents and 1.4 million terms, keyword queries return with a p99 of 237 microseconds; served across twenty segments at once, fan-out retrieval stays at a p99 of 465 microseconds, more than twenty times under the ten-millisecond goal the format was built to hit. A search-only segment drops the document body it never serves and keeps a short snippet, 42.8 percent smaller on a real shard with byte-identical retrieval, and an aggregator tier fans a query out to many leaf brokers and merges an exact fleet-wide top-k, projecting a p99 of 1.29 milliseconds at a hundred thousand shards. The tatami serve command puts an HTTP server over a lock-free broker with admission control, a per-request deadline, and a smart segment cache sized to hold the working set: single-keyword serving holds a p99 of about 1.4 milliseconds at over 31,000 queries per second on real data, the resident memory stays bounded by the cache cap under thousands of concurrent queries, and a concurrent answer is exact, identical to the single-threaded one.

Install

brew install tamnd/tap/tatami       # macOS and Linux
go install github.com/tamnd/tatami/cmd/tatami@latest

Prebuilt binaries, .deb/.rpm/.apk packages, a multi-arch GHCR image, and checksums ship on releases. See installation for every channel.

Quick start

The CLI reads .tatami files. Point it at one to see the layout:

tatami inspect data.tatami
file:    data.tatami
version: 1.0
rows:    3
groups:  1
role:    document-store
size:    100 compressed / 85 uncompressed (0.85x)
columns:
  url      string  enc=plain codec=zstd values=3 nulls=0 pages=1
  status   int32   enc=plain codec=zstd values=3 nulls=0 pages=1
  title    string  enc=plain codec=zstd values=3 nulls=1 pages=1

Dump rows to JSONL, with optional column projection and a row limit:

tatami cat data.tatami --columns url,status --limit 10

Serve a directory of search segments over HTTP, then query it:

tatami serve ./segments --addr :8080
curl 'localhost:8080/search?q=open+source+software&k=10'

The command set is inspect, cat, convert, collection (alias col, with add/list/compact), and serve. Run tatami <command> --help for flags, or see the CLI reference.

Writing a file

The Go API takes typed columns a batch at a time and streams row groups to disk:

schema, _ := tatami.NewSchema(
	tatami.Field{Name: "url", Type: tatami.TypeString, SortKey: true},
	tatami.Field{Name: "status", Type: tatami.TypeInt32},
	tatami.Field{Name: "title", Type: tatami.TypeString, Nullable: true},
)

w, f, _ := tatami.Create("data.tatami", schema, tatami.WriterOptions{})
defer f.Close()

w.Append(tatami.Batch{Columns: []tatami.Column{
	{Data: []string{"https://a/1", "https://b/2"}},
	{Data: []int32{200, 404}},
	{Data: []string{"Alpha", ""}, Valid: []bool{true, false}},
})
w.Close()

Reading back is column-oriented, so you only pay for the columns you touch:

r, f, _ := tatami.OpenFile("data.tatami")
defer f.Close()

for g := 0; g < r.NumRowGroups(); g++ {
	col, _ := r.ReadColumn(g, 0) // the url column
	urls := col.Data.([]string)
	_ = urls
}

How it works

A .tatami file is laid out as a fixed 64-byte header, a run of row groups, an optional blob region for separated large payloads, optional dictionary and index regions, then a footer directory and a short trailer. The footer is written last and carries the full schema and the byte offsets of every column chunk, so opening a file is one read of the tail followed by seeks straight to the columns a query needs. The magic TAT1 sits at both ends, every page header is uncompressed so a reader can stride over pages without decoding them, and a CRC32C guards each page payload and the footer.

Two roles share the layout. A document-store file holds the crawled columns as written. A search-segment file (a header flag bit) adds an inverted region so the same reader can answer keyword queries. A search-only segment drops the body once the postings are built and keeps a snippet, a fraction of the size with identical retrieval.

Past one file, a Cluster broker serves a fan of cold shards behind one query: a routing index keeps it to the shards that can contribute, and global statistics make the merged top-k exact rather than approximate. An Aggregator fans a query across many brokers and merges one fleet-wide top-k, and tatami serve runs a broker over HTTP, answering thousands of concurrent queries without a shared lock behind a smart segment cache, admission control, and a per-request deadline. The design notes live in the spec; the implementation notes track each milestone as it lands.

License

MIT. See LICENSE.

Documentation

Overview

Package tatami implements a compact columnar single-file storage format for web-scale crawl and search. A tatami file holds a header, a run of row groups (one column chunk per column per group, each a sequence of pages), an optional blob region for separated large payloads, optional dictionary and index regions, and a footer directory written last so a reader learns the whole layout from one tail read.

This file pins the format-wide constants from Spec 2066: the magic, the version, the enum spaces for logical types, encodings, block codecs and checksums, and the fixed 64-byte header. Everything else in the package is built on the values declared here.

Index

Constants

View Source
const (
	VersionMajor uint16 = 1
	VersionMinor uint16 = 1
)

Format version. A reader refuses a major it does not implement; a minor bump only adds optional footer sections or new enum values, which old readers tolerate for the parts they understand.

View Source
const (
	FlagSorted         uint16 = 1 << 0
	FlagHasBlobRegion  uint16 = 1 << 1
	FlagHasDictRegion  uint16 = 1 << 2
	FlagHasIndexRegion uint16 = 1 << 3
	FlagRoleSearchSeg  uint16 = 1 << 4
)

Header flag bits.

View Source
const (
	DefaultRowGroupMaxRows  = 128 * 1024
	DefaultRowGroupMaxBytes = 256 << 20
	DefaultPageMaxValues    = 64 * 1024
)

Default row-group and page sizes from the format canon. A row group flushes at whichever of the two limits it hits first; pages cap at a value count.

View Source
const DefaultCacheSize = 64

DefaultCacheSize is the open-segment cap when ClusterOptions leaves it zero. It is the working-set size a broker keeps resident, deliberately tiny next to the shard count.

View Source
const DefaultMaxInFlight = 256

DefaultMaxInFlight is the admission cap when ServerOptions leaves it zero. It is the number of queries allowed to run at once, the knob that trades throughput against the memory and CPU a burst can claim.

View Source
const DefaultMaxK = 100

DefaultMaxK caps the k a request may ask for, so a single query cannot force an unbounded result set and the per-query memory stays bounded along with the admission cap.

View Source
const DefaultQueryTimeout = 2 * time.Second

DefaultQueryTimeout is the per-request deadline when ServerOptions leaves it zero. A query that does not finish inside it returns 504, freeing the admission slot. It is generous next to the sub-10ms warm path so it fires only on a real stall, not on normal cold-shard variance.

View Source
const DefaultSnippetRunes = 200

DefaultSnippetRunes is the snippet length a search-only builder keeps when its options leave the length zero. It is a lead excerpt long enough to render a result row, far short of the body it replaces.

View Source
const HeaderSize = 64

HeaderSize is the fixed size of the file header in bytes.

View Source
const Magic = "TAT1"

Magic is the 4-byte marker at the start and end of every tatami file.

View Source
const ManifestName = "tatami.manifest"

ManifestName is the catalog file at a collection root.

View Source
const PageHeaderSize = 32

PageHeaderSize is the fixed size of a page header in bytes. It is uncompressed so a reader can stride over pages without decoding them.

View Source
const TrailerSize = 12

TrailerSize is the fixed size of the bytes after the footer: footer length (u32), footer CRC32C (u32), and the end magic (4 bytes).

Variables

View Source
var ErrClosed = errors.New("tatami: writer is closed")

ErrClosed is returned when a Writer is used after Close.

View Source
var ErrNotFound = errors.New("tatami: not found")

ErrNotFound is returned by point-lookup paths when a key is absent. The CLI maps it to a distinct exit code so callers can tell a clean miss from a real failure, matching the fleet convention.

Functions

func MergeSegments

func MergeSegments(segs []*SearchSegment, outPath string, opts WriterOptions) error

MergeSegments merges the live documents of the given open search segments into one new search-segment file at outPath, honoring each segment's in-memory deletions. Dense doc ids are reassigned in a single ascending pass over the inputs in order, so each term's concatenated postings stay sorted; the merged segment's posting lists, skip tables, and term dictionary are rebuilt from scratch. The inputs are left untouched; the caller retires them after a successful merge.

Types

type AggStats added in v0.2.0

type AggStats struct {
	Leaves        int // leaves queried
	Candidates    int // candidate shards across all leaves
	ShardsVisited int // shards actually opened and scored across all leaves
}

AggStats reports how a fan-out query was answered: how many leaves it touched, and the totals of the per-leaf routing and pruning. ShardsVisited well below Candidates is the pruning that keeps the budget; Candidates well below the fleet shard count is the routing that keeps it.

type Aggregator added in v0.2.0

type Aggregator struct {
	// contains filtered or unexported fields
}

Aggregator fans a query out to a set of leaf Clusters and merges their results into a fleet-wide top-k. It holds the fleet statistics every leaf is scored against. It is safe for concurrent queries to the extent its leaves are: each leaf serializes internally, and a query touches each leaf once.

func OpenAggregator added in v0.2.0

func OpenAggregator(leaves []*Cluster) *Aggregator

OpenAggregator returns an aggregator over the given leaf clusters, precomputing the fleet statistics. The leaves keep their own routing and open-segment caches; the aggregator adds only the fan-out and the merge.

func (*Aggregator) Close added in v0.2.0

func (a *Aggregator) Close() error

Close closes every leaf cluster.

func (*Aggregator) NumDocs added in v0.2.0

func (a *Aggregator) NumDocs() int

NumDocs is the fleet-wide live document count.

func (*Aggregator) NumLeaves added in v0.2.0

func (a *Aggregator) NumLeaves() int

NumLeaves is how many leaf clusters the aggregator fans out to.

func (*Aggregator) NumShards added in v0.2.0

func (a *Aggregator) NumShards() int

NumShards is the total shard count across every leaf.

func (*Aggregator) Search added in v0.2.0

func (a *Aggregator) Search(query string, k int) ([]SearchResult, AggStats, error)

Search fans the query out to every leaf concurrently, each leaf routing, pruning, and scoring its own shards against the fleet statistics, then merges the leaves' partial top-k lists into one fleet-wide top-k. It dedups a page surfaced by more than one leaf after a recrawl by its stable doc_id and keeps the highest-scoring copy, the same discipline the leaf broker uses within itself. The total order is score descending then doc_id ascending, identical to a single broker over every shard, so the result is byte-identical to that broker.

func (*Aggregator) Stats added in v0.2.0

func (a *Aggregator) Stats() search.GlobalStats

Stats exposes the fleet statistics, for callers that want to score or route with the same corpus-wide IDF the aggregator uses.

type Batch

type Batch struct {
	Columns []Column
}

Batch is one Append call: a typed column per schema field, all the same length. Index i of every column is row i.

type ChecksumAlgo

type ChecksumAlgo uint8

ChecksumAlgo identifies the integrity hash used for pages and the footer.

const (
	ChecksumNone   ChecksumAlgo = 0
	ChecksumCRC32C ChecksumAlgo = 1
	ChecksumXXH64  ChecksumAlgo = 2
)

type Cluster added in v0.2.0

type Cluster struct {
	// contains filtered or unexported fields
}

Cluster serves many search-segment files as one logical index, routing each query to the shards that can contribute and keeping only a bounded working set of segments open. It is safe for concurrent queries: the open-segment working set lives in a concurrent reference-counted cache (segCache) whose lock guards only the residency bookkeeping, never the WAND loop or a column read, and the segments it serves are reentrant for read. So a server can drive one Cluster from thousands of goroutines and they route, prune, and score in parallel rather than queuing behind one lock (14-serving.md).

func OpenCluster added in v0.2.0

func OpenCluster(paths []string, opts ClusterOptions) (*Cluster, error)

OpenCluster builds a routing index over every shard in paths and returns a broker that serves them. Building the routing index opens each file once to read its dictionary, then closes it, so the steady-state open-file count is the cache cap rather than the shard count. The shard ids the routing index assigns are the indices into paths, so a routed shard maps straight back to its file.

func OpenClusterWithRouting added in v0.2.0

func OpenClusterWithRouting(paths []string, routing *search.RoutingIndex, opts ClusterOptions) *Cluster

OpenClusterWithRouting returns a broker over paths using an already-built routing index, for callers that loaded the routing sidecar instead of re-scanning every shard. The routing index must have been built over the same paths in the same order, since shard ids are path indices.

func (*Cluster) CacheLen added in v0.2.0

func (c *Cluster) CacheLen() int

CacheLen reports how many segments are currently resident, for tests and metrics.

func (*Cluster) Close added in v0.2.0

func (c *Cluster) Close() error

Close closes every segment still open in the cache.

func (*Cluster) NumDocs added in v0.2.0

func (c *Cluster) NumDocs() int

NumDocs is the live document count across every shard, from the routing index, without opening a segment.

func (*Cluster) NumShards added in v0.2.0

func (c *Cluster) NumShards() int

NumShards is how many shards the cluster serves.

func (*Cluster) Query added in v0.2.0

func (c *Cluster) Query(query string, k int) ([]ClusterHit, QueryStats, error)

Query is the retrieval-only path: the global top-k of (shard, dense id, score) across the routed shards, scored with global statistics and pruned by the bound walk, without fetching stored fields. It is what the scale latency benchmark times. The second return value reports the routing and pruning that produced it.

func (*Cluster) QueryWith added in v0.2.0

func (c *Cluster) QueryWith(query string, k int, stats search.GlobalStats) ([]ClusterHit, QueryStats, error)

QueryWith is Query with the corpus statistics supplied from outside. An aggregator passes fleet-wide stats so this leaf scores and prunes against the same IDF every other leaf uses, which is what makes the merged cross-leaf top-k exact (13-search-only-and-scale.md). The shard bounds are computed from the same stats, so the early stop stays safe.

func (*Cluster) Routing added in v0.2.0

func (c *Cluster) Routing() *search.RoutingIndex

Routing exposes the routing index, for stats and for persisting the sidecar.

func (*Cluster) Search added in v0.2.0

func (c *Cluster) Search(query string, k int) ([]SearchResult, QueryStats, error)

Search runs the routed, pruned retrieval and then fetches the url and title of each surviving hit, deduplicating a page that more than one shard carries after a recrawl by its stable doc_id and keeping the highest-scoring copy. Like the leaf Index it over-fetches per shard so duplicates collapsing cannot leave fewer than k distinct results, and it prunes against the k-th best distinct score.

func (*Cluster) SearchWith added in v0.2.0

func (c *Cluster) SearchWith(query string, k int, stats search.GlobalStats) ([]SearchResult, QueryStats, error)

SearchWith is Search with the corpus statistics supplied from outside, the path an aggregator drives so every leaf scores against fleet-wide IDF and the merged result is the exact fleet top-k. The dedup, over-fetch, and pruning are identical to Search; only the statistics differ.

type ClusterHit added in v0.2.0

type ClusterHit struct {
	Shard int
	Doc   uint32
	Score float32
}

ClusterHit is a scored hit tagged with the shard that produced it.

type ClusterOptions added in v0.2.0

type ClusterOptions struct {
	CacheSize int
}

ClusterOptions tunes a Cluster. CacheSize caps how many segments stay open at once; a query that routes to more shards than this still answers correctly, evicting the least recently used segment to stay within the cap.

type Codec

type Codec uint8

Codec identifies the block compressor applied to an encoded page payload.

const (
	CodecNone     Codec = 0
	CodecLZ4      Codec = 1
	CodecZstd     Codec = 2
	CodecZstdDict Codec = 3
)

func (Codec) String

func (c Codec) String() string

String renders a block codec for diagnostics.

type CollHit

type CollHit struct {
	Member string
	RowRef
}

CollHit locates a row across the collection: which member holds it and where inside that member.

type Collection

type Collection struct {
	// contains filtered or unexported fields
}

Collection is a dataset of many tatami files addressed through one manifest.

func OpenCollection

func OpenCollection(dir string) (*Collection, error)

OpenCollection opens (or starts) the collection rooted at dir. A missing manifest is an empty collection, not an error.

func (*Collection) AddFile

func (c *Collection) AddFile(rel string) error

AddFile catalogs the tatami file at rel (relative to the collection root): it opens the file, summarizes its footer into a member entry, and appends one ADD_FILE edit. The file itself is untouched.

func (*Collection) Compact

func (c *Collection) Compact() error

Compact rolls the manifest log into a fresh one from the live set, bounding replay time as removes accumulate.

func (*Collection) Lookup

func (c *Collection) Lookup(key any) (CollHit, bool, int, error)

Lookup finds a key across a sorted collection. It keeps only the members whose key range contains the key (one member for a disjoint, sorted dataset), opens them, and runs the in-file bounded-seek Lookup. FilesOpened reports the fan-out so a caller sees that a disjoint collection opens exactly one file.

func (*Collection) Members

func (c *Collection) Members() []manifest.Member

Members returns the live members in catalog order.

func (*Collection) Merge

func (c *Collection) Merge(inRels []string, outRel string, opts WriterOptions, createdMillis uint64) error

Merge combines a set of members into one new file under outRel and swaps the manifest: the inputs leave the live set and the output joins it, all in one atomic batch. It is the general decode-and-re-encode merge: it reads every input row, orders the combined stream by the sort key when the inputs are sorted, and writes one output through the normal writer, so the output gets fresh, well-fitted encodings, dictionaries, and clean row-group boundaries. The zero-copy concat path for already-disjoint sorted inputs is a later slice.

func (*Collection) Prune

func (c *Collection) Prune(pred *Pred) ([]manifest.Member, error)

Prune returns the members whose key range and zone rollup cannot be ruled out by pred, the cross-file pruning that lets a query skip most files without a single tail read. A nil pred keeps every member.

func (*Collection) Scan

func (c *Collection) Scan(pred *Pred, projection ...string) (*CollectionScan, error)

Scan runs pred across the collection: it prunes members on the manifest, then opens only the survivors and runs the in-file Scan on each, concatenating the projected rows in member order.

type CollectionScan

type CollectionScan struct {
	Schema       *Schema
	Columns      []string
	Rows         [][]any
	FilesTotal   int
	FilesScanned int
}

CollectionScan is the result of a dataset-wide scan: the projected rows from every surviving member, plus counters showing how many files the manifest pruning let the query skip.

type Column

type Column struct {
	Data  any
	Valid []bool
}

Column is a batch of values for one field. Data is a typed Go slice whose element type matches the field's logical type:

BOOL              -> []bool
INT8..INT64       -> []int8, []int16, []int32, []int64
UINT8..UINT64     -> []uint8, []uint16, []uint32, []uint64
FLOAT32/FLOAT64   -> []float32, []float64
STRING            -> []string
BYTES / BLOBREF   -> [][]byte
TIMESTAMP_MICROS  -> []int64

Data is always full length: it has one slot per row including null rows. When Valid is non-nil it has the same length, and a false entry marks a null. The value stored at a null slot is ignored on write and set to the type's zero on read, so callers never have to compact their own slices.

func (Column) At

func (c Column) At(i int) any

At returns the value at row i as an any, or nil when the row is null. It is a convenience for row-oriented consumers like the cat command; column-oriented code should use the typed Data slice directly.

type ColumnStat

type ColumnStat struct {
	Name              string
	Type              LogicalType
	Encoding          Encoding
	Codec             Codec
	NumValues         int64
	NullCount         int64
	NumPages          int64
	TotalUncompressed int64
	TotalCompressed   int64
	// Blob fields are non-zero only for separated BLOBREF columns.
	BlobUncompressed int64
	BlobCompressed   int64
	BlobRuns         int
	BlobDict         bool // true when the column kept a shared dictionary
	// Index fields summarize the M3 pruning structures the column carries.
	HasZone      bool // a chunk-level zone map on at least one group
	HasBloom     bool // a membership filter on at least one group
	HasPageIndex bool // a per-page index (the sort column carries one)
	IsSortKey    bool // the file's sort column
}

ColumnStat aggregates one column across all row groups, for inspect and stats. For a separated BLOBREF column the chunk totals cover only the validity pages; the value bytes are reported in the Blob fields, which come from the blob region directory.

type Encoding

type Encoding uint8

Encoding identifies how values inside a page are encoded before the block codec runs. M0 ships PLAIN only; the later milestones fill in the cascade.

const (
	EncPlain      Encoding = 0
	EncRLE        Encoding = 1
	EncDictionary Encoding = 2
	EncBitpackFOR Encoding = 3
	EncDelta      Encoding = 4
	EncGroupVar   Encoding = 5
	EncPForDelta  Encoding = 6
	EncFSST       Encoding = 7
	EncBitmap     Encoding = 8
)

func (Encoding) String

func (e Encoding) String() string

String renders an encoding for diagnostics.

type Field

type Field struct {
	Name     string
	Type     LogicalType
	Nullable bool
	// SortKey marks the single column the file is sorted on, if any. A file with
	// a sort key sets the header's sorted flag and carries the per-group key
	// bounds plus, from M3 on, a sparse key index.
	SortKey bool
	// BlobSeparated asks the writer to keep this column's payload in the blob
	// region (M2 on). For STRING and BYTES columns it is a hint; for BLOBREF it
	// is implied. M0 stores everything inline and ignores it.
	BlobSeparated bool
	// DictHint asks the sampler to prefer dictionary encoding for this column
	// (M1 on). M0 ignores it.
	DictHint bool
	// BloomFilter asks the writer to build a membership filter over this column
	// per row group (M3 on), so an equality probe skips groups that cannot hold
	// the value. It is the opt-in for the point-lookup columns (doc_id, url,
	// digest in the document-store role). A sort-key column needs none, since the
	// sparse key index answers membership exactly.
	BloomFilter bool
	// Element is the element type for a LIST column, otherwise zero.
	Element LogicalType
}

Field describes one column: its name, logical type, and how it is treated. The name matches the producer struct tag so the same schema round-trips between the Go record and the file.

type FileInfo

type FileInfo struct {
	Header            Header
	Schema            *Schema
	NumRowGroups      int
	RowCount          uint64
	UncompressedTotal uint64
	CompressedTotal   uint64
	Columns           []ColumnStat
	KeyValue          []KeyValue
	// NumDicts and DictUncompressed summarize the dict region.
	NumDicts         int
	DictUncompressed int64
	// NumBlooms is the count of membership filters in the index region; Sorted
	// and SortColumn describe the primary-key index.
	NumBlooms  int
	Sorted     bool
	SortColumn string
}

FileInfo is a read-only summary of a file for the CLI.

type Header struct {
	VersionMajor uint16
	VersionMinor uint16
	Flags        uint16
	Checksum     ChecksumAlgo
	DefaultCodec Codec
	PageSizeHint uint32
	FileUUID     [16]byte
	RowCount     uint64
	FooterOffset uint64
	CreatedMphis uint64 // created_unix_millis, supplied by the caller
	CreatorID    uint32
}

Header is the fixed 64-byte file header. It lets a reader sanity-check the file and learn the global defaults without parsing the footer.

type Index

type Index struct {
	// contains filtered or unexported fields
}

Index is a searchable set of open search segments served as one logical index. It owns the segments it opens through OpenIndex and closes them together.

func NewIndex

func NewIndex(segs []*SearchSegment) *Index

NewIndex wraps already-open segments into one index. The caller keeps ownership of the segments; Close does not close them.

func OpenIndex

func OpenIndex(paths []string) (*Index, error)

OpenIndex opens every segment file in paths and serves them as one index. Close closes all of them. A failure to open any file closes the ones already opened and returns the error.

func (*Index) Close

func (ix *Index) Close() error

Close closes the segments if this index opened them.

func (*Index) NumDocs

func (ix *Index) NumDocs() int

NumDocs sums the live documents across every segment.

func (*Index) Query

func (ix *Index) Query(query string, k int) []IndexHit

Query is the retrieval-only path across all segments: the global top-k of dense ids and scores without the stored-field fetch, tagged with the segment each hit came from. It is what the latency benchmark times.

func (*Index) Search

func (ix *Index) Search(query string, k int) ([]SearchResult, error)

Search runs the query against every segment, merges the partial results into a global top-k, and dedups by stable doc_id, keeping the highest-scoring copy of a page that appears in more than one segment. It fetches the url and title of each surviving hit from the segment that produced it. To keep dedup correct it pulls more than k candidates per segment when there are several segments, since duplicates collapse and could otherwise leave fewer than k distinct results.

func (*Index) Segments

func (ix *Index) Segments() []*SearchSegment

Segments returns the underlying segments, for stats and merge selection.

func (*Index) SelectMerge

func (ix *Index) SelectMerge(p search.MergePolicy) []int

SelectMerge applies the tiered merge policy to the index's segments and returns the segment indices the policy chose to merge, or nil when none qualify.

type IndexHit

type IndexHit struct {
	Segment int
	Doc     uint32
	Score   float32
}

IndexHit is a scored hit tagged with the segment that produced it.

type KeyValue

type KeyValue struct {
	Key, Value string
}

KeyValue is one footer key-value metadata pair.

type LogicalType

type LogicalType uint8

LogicalType is the type of the values in a column. The enum values are pinned by the format canon and must never be renumbered.

const (
	TypeBool            LogicalType = 0
	TypeInt8            LogicalType = 1
	TypeInt16           LogicalType = 2
	TypeInt32           LogicalType = 3
	TypeInt64           LogicalType = 4
	TypeUint8           LogicalType = 5
	TypeUint16          LogicalType = 6
	TypeUint32          LogicalType = 7
	TypeUint64          LogicalType = 8
	TypeFloat32         LogicalType = 9
	TypeFloat64         LogicalType = 10
	TypeString          LogicalType = 11
	TypeBytes           LogicalType = 12
	TypeTimestampMicros LogicalType = 13
	TypeList            LogicalType = 14
	TypeBlobRef         LogicalType = 15
)

func (LogicalType) String

func (t LogicalType) String() string

String renders a logical type for diagnostics.

type Op

type Op uint8

Op is a leaf comparison operator.

const (
	OpEQ Op = iota
	OpNE
	OpLT
	OpLE
	OpGT
	OpGE
	OpBetween
	OpIsNull
)

type PageKind

type PageKind uint8

PageKind tags what a page holds.

const (
	PageData PageKind = 0
	PageDict PageKind = 1
	PageIdx  PageKind = 2
	PageBlob PageKind = 3
)

type Pred

type Pred struct {
	// contains filtered or unexported fields
}

Pred is one node of a predicate tree. Build leaves with Eq, Lt, Between, and the rest; combine them with And and Or.

func And

func And(kids ...*Pred) *Pred

And combines children so the result holds only when every child holds.

func Between

func Between(col string, lo, hi any) *Pred

Between builds a lo <= column <= hi leaf (both bounds inclusive).

func Eq

func Eq(col string, val any) *Pred

Eq builds a column == value leaf.

func Ge

func Ge(col string, val any) *Pred

Ge builds a column >= value leaf.

func Gt

func Gt(col string, val any) *Pred

Gt builds a column > value leaf.

func IsNull

func IsNull(col string) *Pred

IsNull builds a column IS NULL leaf.

func Le

func Le(col string, val any) *Pred

Le builds a column <= value leaf.

func Lt

func Lt(col string, val any) *Pred

Lt builds a column < value leaf.

func Ne

func Ne(col string, val any) *Pred

Ne builds a column != value leaf. It never prunes a region (a region almost always holds some other value) but composes inside AND/OR.

func Or

func Or(kids ...*Pred) *Pred

Or combines children so the result holds when any child holds.

type QueryStats added in v0.2.0

type QueryStats struct {
	Candidates int     // shards holding at least one query term
	Visited    int     // shards actually opened and scored
	Threshold  float32 // k-th best score when the walk stopped, zero if fewer than k hits
}

QueryStats reports how a routed query was answered: how many shards held a query term, how many the broker actually visited before the bound pruned the rest, and the score threshold the prune fired against. A test asserts Visited is far below Candidates while the results stay exact.

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader opens a .tatami file and yields its columns. It reads the trailer and footer first (one tail read for a remote file), then reads only the column chunks a caller asks for.

func Open

func Open(r io.ReaderAt, size int64) (*Reader, error)

Open parses a file accessible through r whose total length is size.

func OpenFile

func OpenFile(path string) (*Reader, *os.File, error)

OpenFile opens path and returns a Reader plus the underlying file. The caller closes the file when done.

func (*Reader) Header

func (r *Reader) Header() Header

Header exposes a copy of the file header for inspection tooling.

func (*Reader) Info

func (r *Reader) Info() FileInfo

Info builds a summary of the file from the footer, touching no data pages.

func (*Reader) Lookup

func (r *Reader) Lookup(key any) (RowRef, bool, error)

Lookup finds the row whose sort key equals key, using the sparse primary-key index for a bounded-seek descent. It requires a sorted file; on an unsorted file use Scan with an Eq predicate instead. The bool is false when no row has the key.

func (*Reader) Meta

func (r *Reader) Meta(key string) (string, bool)

Meta returns the value of a footer key-value pair and whether it was present.

func (*Reader) NumRowGroups

func (r *Reader) NumRowGroups() int

NumRowGroups returns the number of row groups.

func (*Reader) NumRows

func (r *Reader) NumRows() uint64

NumRows returns the total row count.

func (*Reader) ReadColumn

func (r *Reader) ReadColumn(group, col int) (Column, error)

ReadColumn reads column col of row group group and returns its values.

func (*Reader) ReadRowGroup

func (r *Reader) ReadRowGroup(group int) ([]Column, error)

ReadRowGroup reads every column of one row group.

func (*Reader) RowGroupRows

func (r *Reader) RowGroupRows(g int) int

RowGroupRows returns the row count of group g.

func (*Reader) Scan

func (r *Reader) Scan(pred *Pred, projection ...string) (*ScanResult, error)

Scan evaluates pred against the file and returns the projected columns of every surviving row. An empty projection returns every column. A nil pred matches all rows (a pure projection scan).

func (*Reader) Schema

func (r *Reader) Schema() *Schema

Schema returns the file schema.

type RowRef

type RowRef struct {
	Group int
	Row   int
}

RowRef locates a row by its group and its index within that group.

type ScanResult

type ScanResult struct {
	Schema        *Schema
	Columns       []string
	Rows          [][]any
	GroupsTotal   int
	GroupsScanned int // groups the predicate could not prune
}

ScanResult is what Scan returns: the projected schema, one row per surviving row in row order, and counters that show how much the pruning saved.

type Schema

type Schema struct {
	Fields []Field
}

Schema is the ordered list of columns in a file.

func NewSchema

func NewSchema(fields ...Field) (*Schema, error)

NewSchema builds a schema from fields and validates it.

type SearchBuilder

type SearchBuilder struct {
	// contains filtered or unexported fields
}

SearchBuilder accumulates documents and seals them into a search segment. It assigns dense docIDs in add order, tokenizes each field, accumulates the term-to-postings map, and records the per-field length norms. The build is in-memory; a streaming builder is a later milestone (matching M4/M5, which defer streaming k-way merge).

In search-only mode the builder never retains the body: it tokenizes the body into the inverted index, keeps a short snippet for the result row, and drops the text. That is what lets a search-only segment cost a fraction of a full-document one on disk and in memory (13-search-only-and-scale.md).

func NewSearchBuilder

func NewSearchBuilder() *SearchBuilder

NewSearchBuilder returns an empty full-document builder, the shape that stores the body blob-separated.

func NewSearchBuilderWith added in v0.2.0

func NewSearchBuilderWith(opts SearchBuilderOptions) *SearchBuilder

NewSearchBuilderWith returns a builder of the shape the options select. A search-only builder indexes every document's body into the postings exactly as the full-document builder does, so the two produce identical retrieval, and differs only in what it keeps in the forward store.

func (*SearchBuilder) Add

func (b *SearchBuilder) Add(doc SearchDoc)

Add tokenizes a document, records its per-field term frequencies and length norms, and assigns it the next dense docID. A term's posting frequency is its total count across all fields; the per-field counts survive as the length norms that a BM25F re-rank reads (09-search-scale.md, section 5).

func (*SearchBuilder) NumDocs

func (b *SearchBuilder) NumDocs() int

NumDocs reports how many documents have been added.

func (*SearchBuilder) Write

func (b *SearchBuilder) Write(path string, opts WriterOptions) error

Write seals the segment to path: it writes the forward columns through the normal tatami writer, attaches the serialized inverted sub-region, and closes the file with the role bit set.

type SearchBuilderOptions added in v0.2.0

type SearchBuilderOptions struct {
	Snippet      bool
	SnippetRunes int
}

SearchBuilderOptions selects the segment shape. Snippet turns on the search-only forward store: no body is stored, only a SnippetRunes-long lead excerpt per document. SnippetRunes defaults to DefaultSnippetRunes.

type SearchDoc

type SearchDoc struct {
	DocID  string
	URL    string
	Title  string
	Body   string
	Anchor string
}

SearchDoc is one document handed to a SearchBuilder. DocID is the stable global identity; URL, Title, and Body are the stored fields and the text that is tokenized and inverted. Anchor text, when the link graph supplies it, goes in Anchor.

type SearchResult

type SearchResult struct {
	Doc     uint32
	DocID   string
	URL     string
	Title   string
	Snippet string
	Score   float32
}

SearchResult is one ranked hit: the dense docID, the stable global doc_id, the stored url, title, and snippet, and the relevance score. Snippet is empty on a full-document segment, which stores the body rather than a precomputed excerpt. DocID is the durable identity an aggregator dedups and tie-breaks on so a fleet-wide merge orders documents exactly as a single broker would.

type SearchSegment

type SearchSegment struct {
	// contains filtered or unexported fields
}

SearchSegment is a read-only view of a search-segment file: the tatami reader for the forward store plus the decoded inverted index for retrieval. It is the served handle.

It is safe for concurrent queries. The retrieval path (SearchTermsWith) is reentrant: the WAND loop allocates its posting cursors per call and the scorer is a value type, so two goroutines searching the same segment never share mutable state. The stored-field path lazily caches the small display columns per row group; those caches are guarded by mu, with the column read done outside the lock so concurrent fetches do not serialize on I/O. Delete mutates the live bitset in place and is therefore not safe to run while queries are in flight; a served segment is treated as immutable (14-serving.md).

func OpenSearch

func OpenSearch(path string) (*SearchSegment, error)

OpenSearch opens a search-segment file and decodes its inverted sub-region. It errors if the file does not carry the search-segment role.

func (*SearchSegment) Close

func (s *SearchSegment) Close() error

Close releases the underlying file.

func (*SearchSegment) Delete

func (s *SearchSegment) Delete(docID string) (bool, error)

Delete marks the document with the given global doc_id deleted, resolving it to its dense id through a lazily built index over the doc_id column. It returns false when the doc_id is not in this segment.

func (*SearchSegment) DeleteDense

func (s *SearchSegment) DeleteDense(d uint32) bool

DeleteDense marks a dense doc id deleted in the in-memory live bitset. The deletion is honored by every later query on this open segment and is materialized (the document dropped) at the next merge. It is not written back to the immutable file; durable deletes across a reopen are a tombstone-sidecar refinement (09-search-scale.md, section 7). It returns true when the call changed the state.

func (*SearchSegment) Inverted

func (s *SearchSegment) Inverted() *search.Inverted

Inverted exposes the decoded inverted index for retrieval-only callers (the latency benchmark measures this path without the stored-field fetch).

func (*SearchSegment) LiveDocs

func (s *SearchSegment) LiveDocs() int

LiveDocs returns the number of documents not yet deleted, the size class the merge policy tiers on.

func (*SearchSegment) NumDeleted

func (s *SearchSegment) NumDeleted() int

NumDeleted returns the number of deleted documents.

func (*SearchSegment) NumDocs

func (s *SearchSegment) NumDocs() int

NumDocs returns the dense doc-id space size, including deleted documents. It is the N that IDF is defined over until a merge removes the deletions.

func (*SearchSegment) NumTerms

func (s *SearchSegment) NumTerms() int

NumTerms returns the distinct term count.

func (*SearchSegment) Query

func (s *SearchSegment) Query(query string, k int) []search.Hit

Query tokenizes a query string and returns the top-k document ids and scores from the block-max WAND loop, without fetching stored fields. This is the hot retrieval path the <10ms target is measured against.

func (*SearchSegment) Search

func (s *SearchSegment) Search(query string, k int) ([]SearchResult, error)

Search runs the top-k retrieval and then fetches the url and title of each surviving document from the forward columns, the full query-to-results path.

func (*SearchSegment) SearchTermsWith added in v0.2.0

func (s *SearchSegment) SearchTermsWith(terms []string, k int, stats search.GlobalStats) []search.Hit

SearchTermsWith runs the block-max WAND loop over already-tokenized query terms with corpus-wide statistics, so a broker serving many shards scores every shard against the same global IDF and the partial top-k lists it merges are on one scale. Passing nil stats falls back to this shard's local IDF, which is the single-segment path. This is the entry the Cluster broker drives (12-distributed-serving.md).

func (*SearchSegment) SnippetOnly added in v0.2.0

func (s *SearchSegment) SnippetOnly() bool

SnippetOnly reports whether this is a search-only segment: one that stores a snippet for display and not the document body.

type Server added in v0.2.0

type Server struct {
	// contains filtered or unexported fields
}

Server serves a Cluster over HTTP. It is safe for thousands of concurrent requests: the Cluster answers each query without a shared lock, and the server adds only an admission semaphore and a per-request deadline on top. One Server drives one Cluster; build several behind a load balancer to scale past one process, or front several with an Aggregator to scale past one broker's shard reach.

func NewServer added in v0.2.0

func NewServer(c *Cluster, opts ServerOptions) *Server

NewServer wraps a Cluster in a serving layer with the given options. The Cluster must outlive the Server; closing the Cluster is the caller's job at shutdown.

func (*Server) Drain added in v0.2.0

func (s *Server) Drain()

Drain waits for every in-flight search worker to finish. A server's caller invokes it after the HTTP server has stopped accepting requests and before it closes the cluster, so a worker still reading a segment never races the close.

func (*Server) Handler added in v0.2.0

func (s *Server) Handler() http.Handler

Handler returns the http.Handler that routes the server's endpoints: GET /search for queries, GET /healthz for liveness, and GET /stats for the broker and serving counters. Mount it under a path prefix or serve it at the root.

type ServerOptions added in v0.2.0

type ServerOptions struct {
	// MaxInFlight caps concurrent queries; arrivals past it get 503. Zero uses
	// DefaultMaxInFlight.
	MaxInFlight int
	// Timeout bounds a single query; on expiry the handler returns 504. Zero uses
	// DefaultQueryTimeout.
	Timeout time.Duration
	// MaxK caps the requested result count. Zero uses DefaultMaxK.
	MaxK int
	// DefaultK is the k used when a request omits it. Zero means 10.
	DefaultK int
}

ServerOptions tunes the serving layer. The zero value is usable: every field falls back to its package default.

type Tri

type Tri uint8

Tri is the three-valued result of evaluating a predicate against a region's metadata: the region cannot match, the region definitely all-matches, or it might match and must be decoded.

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer serializes batches of columns into a .tatami file. It streams to an io.WriterAt, buffering at most one row group in memory, and patches the fixed header at the end once the footer offset and row count are known.

func Create

func Create(path string, schema *Schema, opts WriterOptions) (*Writer, *os.File, error)

Create opens path for writing and returns a Writer over it.

func NewWriter

func NewWriter(w io.WriterAt, schema *Schema, opts WriterOptions) (*Writer, error)

NewWriter creates a Writer over an io.WriterAt (an *os.File is the common case). It reserves the 64-byte header up front and patches it on Close.

func (*Writer) Append

func (w *Writer) Append(batch Batch) error

Append adds a batch of rows. All columns must match the schema and have equal length. When the buffered row group reaches a size limit it is flushed.

func (*Writer) AttachInverted

func (w *Writer) AttachInverted(termDict, postings, skips, live []byte, numTerms, numDocs uint64)

AttachInverted hands the writer the four serialized runs of an inverted sub-region (term dictionary, posting payloads, skip tables, and the live-docs bitset), turning the file it produces into a search segment (role bit 4). It must be called before Close. The runs are written into the index region and addressed by the footer's inverted descriptor. A nil live run records no deletions, which a reader treats as all-live.

func (*Writer) Close

func (w *Writer) Close() error

Close flushes the last row group, writes the footer and trailer, and patches the header. It is safe to call once; later calls are no-ops.

func (*Writer) SetMeta

func (w *Writer) SetMeta(key, value string)

SetMeta records a free-form key-value pair in the footer. Pairs are written in the order added, after the built-in ones, so output stays deterministic.

type WriterOptions

type WriterOptions struct {
	RowGroupMaxRows  int
	RowGroupMaxBytes int
	PageMaxValues    int
	PageSizeHint     int
	// BlobRunTargetBytes caps the raw size of one packed blob run. Larger runs
	// compress better; smaller runs cost less to decode for a single-value read
	// and make a shared dictionary more likely to pay off. Zero takes the default.
	BlobRunTargetBytes int
	UUID               [16]byte
	CreatedMillis      uint64
	CreatorID          uint32
}

WriterOptions tune a Writer. The zero value is valid and deterministic: no uuid, no creation timestamp, default sizes.

Directories

Path Synopsis
Package blob owns the on-disk shape of tatami's blob region: the place where large, separated column payloads (markdown bodies, raw headers, anything a schema marks BLOBREF) live apart from the row groups.
Package blob owns the on-disk shape of tatami's blob region: the place where large, separated column payloads (markdown bodies, raw headers, anything a schema marks BLOBREF) live apart from the row groups.
Package cli wires tatami's command surface: the cobra tree and the fang-rendered help and errors.
Package cli wires tatami's command surface: the cobra tree and the fang-rendered help and errors.
cmd
tatami command
Command tatami inspects and dumps .tatami files: the compact columnar single-file format for web-scale crawl and search.
Command tatami inspects and dumps .tatami files: the compact columnar single-file format for web-scale crawl and search.
Package codec wraps the block compressors a tatami page payload can use behind one interface, so the container code never cares whether a block is raw, zstd, or dictionary-trained zstd.
Package codec wraps the block compressors a tatami page payload can use behind one interface, so the container code never cares whether a block is raw, zstd, or dictionary-trained zstd.
Package convert turns a producer's Parquet shard into a tatami file.
Package convert turns a producer's Parquet shard into a tatami file.
Package encoding implements the per-page value encodings that run before the block codec, the encode stage of the page pipeline pinned in the format spec.
Package encoding implements the per-page value encodings that run before the block codec, the encode stage of the page pipeline pinned in the format spec.
Package index holds the membership filters and the page-index layout a tatami file carries to skip data it does not need to read.
Package index holds the membership filters and the page-index layout a tatami file carries to skip data it does not need to read.
Package manifest is the collection catalog for tatami: an append-only log of edits that names the live members of a dataset and carries enough per-member summary (key range and a coarse zone rollup) that a reader prunes across files before opening any of them.
Package manifest is the collection catalog for tatami: an append-only log of edits that names the live members of a dataset and carries enough per-member summary (key range and a coarse zone rollup) that a reader prunes across files before opening any of them.
Package search is tatami's search-segment engine: the posting-list codec, the term dictionary, the BM25F scorer, and the block-max WAND retrieval loop that turn a tatami file with the role bit set into a full-text search segment (Spec 2066, 09-search-scale.md).
Package search is tatami's search-segment engine: the posting-list codec, the term dictionary, the BM25F scorer, and the block-max WAND retrieval loop that turn a tatami file with the role bit set into a full-text search segment (Spec 2066, 09-search-scale.md).

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL