wildcat

package module
v1.0.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 27, 2025 License: MPL-2.0 Imports: 22 Imported by: 0

README

License

Wildcat is a high-performance embedded key-value database (or storage engine) written in Go. It incorporates modern database design principles including LSM (Log-Structured Merge) tree architecture, MVCC (Multi-Version Concurrency Control), and lock-free data structures for its critical paths, along with automatic background operations to deliver excellent read/write performance with strong consistency guarantees.

Features

  • LSM (Log-Structured Merge) tree architecture optimized for write and read throughput
  • Lock-free MVCC ensures non-blocking reads and writes
  • WAL logging captures full transaction state for recovery and rehydration
  • Version-aware skip list for fast in-memory MVCC access
  • Atomic write path, safe for multithreaded use
  • Scalable design with background flusher and compactor
  • Durable and concurrent and atomic block storage, leveraging direct, offset-based file I/O (using pread/pwrite) for optimal performance and control
  • Atomic LRU for active block manager handles
  • Memtable lifecycle management and snapshot durability
  • SSTables are immutable BTree's
  • Configurable Sync options such as None, Partial (with background interval), Full
  • Snapshot-isolated MVCC with read timestamps
  • Crash recovery restores all in-flight and committed transactions
  • Automatic multi-threaded background compaction
  • Full ACID transaction support
  • Range, prefix, and full iteration support with bidirectional traversal
  • Sustains 100K+ txns/sec writes, and hundreds of thousands of reads/sec
  • Optional Bloom filter per SSTable for fast key lookups
  • Key value separation optimization (.klog for keys, .vlog for values, klog entries point to vlog entries)
  • Tombstone-aware compaction with retention based on active transaction windows
  • Transaction recovery with incomplete transactions are preserved and accessible after crashes

Overview

Table of Contents

Version and Compatibility

  • Go 1.24+
  • Linux/macOS/Windows (64-bit)

Basic Usage

Wildcat supports opening multiple *wildcat.DB instances in parallel, each operating independently in separate directories.

Import

import (
    "github.com/guycipher/wildcat"
)

Opening a Wildcat DB instance

The only required option is the database directory path.

// Create default options
opts := &wildcat.Options{
Directory: "/path/to/db",
    // You don't need to set all options only Directory is required!
}

// Open or create a new Wildcat DB instance
db, err := wildcat.Open(opts) // Returns *wildcat.DB
if err != nil {
    // Handle error
}
defer db.Close()

Advanced Configuration

Wildcat provides several configuration options for fine-tuning.

opts := &wildcat.Options{
    Directory:                   "/path/to/database",     // Directory for database files
    WriteBufferSize:             32 * 1024 * 1024,        // 32MB memtable size
    SyncOption:                  wildcat.SyncFull,        // Full sync for maximum durability
    SyncInterval:                128 * time.Millisecond,  // Only set when using SyncPartial
    LevelCount:                  7,                       // Number of LSM levels
    LevelMultiplier:             10,                      // Size multiplier between levels
    BlockManagerLRUSize:         256,                     // Cache size for block managers
    SSTableBTreeOrder:           10,                      // BTree order for SSTable klog
    LogChannel:                  make(chan string, 1000), // Channel for real time logging
    BloomFilter:                 false,                   // Enable/disable sstable bloom filters
    MaxCompactionConcurrency:    4,                       // Maximum concurrent compactions
    CompactionCooldownPeriod:    5 * time.Second,         // Cooldown between compactions
    CompactionBatchSize:         8,                       // Max SSTables per compaction
    CompactionSizeRatio:         1.1,                     // Level size ratio trigger
    CompactionSizeThreshold:     8,                       // File count trigger
    CompactionScoreSizeWeight:   0.8,                     // Weight for size-based scoring
    CompactionScoreCountWeight:  0.2,                     // Weight for count-based scoring
    FlusherTickerInterval:       1 * time.Millisecond,    // Flusher check interval
    CompactorTickerInterval:     250 * time.Millisecond,  // Compactor check interval
    BloomFilterFPR:              0.01,                    // Bloom filter false positive rate
    WalAppendRetry:              10,                      // WAL append retry count
    WalAppendBackoff:            128 * time.Microsecond,  // WAL append retry backoff
    BlockManagerLRUEvictRatio:   0.20,                    // LRU eviction ratio
    BlockManagerLRUAccesWeight:  0.8,                     // LRU access weight
}
Configuration Options Explained
  1. Directory The path where the database files will be stored
  2. WriteBufferSize Size threshold for memtable before flushing to disk
  3. SyncOption Controls durability vs performance tradeoff
    • SyncNone Fastest, but no durability guarantees
    • SyncPartial Balances performance and durability
    • SyncFull Maximum durability, slower performance
  4. SyncInterval Time between background sync operations (only for SyncPartial)
  5. LevelCount Number of levels in the LSM tree
  6. LevelMultiplier Size ratio between adjacent levels
  7. BlockManagerLRUSize Number of block managers to cache
  8. SSTableBTreeOrder Size of SSTable klog block sets
  9. LogChannel Channel for real-time logging, useful for debugging and monitoring
  10. BloomFilter Enable or disable bloom filters for SSTables to speed up key lookups. Bloom filters use double hashing with FNV-1a and FNV hash functions. Is automatically sized based on expected items and desired false positive rate.
  11. MaxCompactionConcurrency Maximum number of concurrent compactions
  12. CompactionCooldownPeriod Cooldown period between compactions to prevent thrashing
  13. CompactionBatchSize Max number of SSTables to compact at once
  14. CompactionSizeRatio Level size ratio that triggers compaction
  15. CompactionSizeThreshold Number of files to trigger size-tiered compaction
  16. CompactionScoreSizeWeight Weight for size-based compaction scoring
  17. CompactionScoreCountWeight Weight for count-based compaction scoring
  18. FlusherTickerInterval Interval for flusher background process
  19. CompactorTickerInterval Interval for compactor background process
  20. BloomFilterFPR False positive rate for Bloom filters
  21. WalAppendRetry Number of retries for WAL append operations
  22. WalAppendBackoff Backoff duration for WAL append retries
  23. BlockManagerLRUEvictRatio Ratio for LRU eviction. Determines what percentage of the cache to evict when cleanup is needed.
  24. BlockManagerLRUAccesWeight Weight for LRU access eviction. Balances how much to prioritize access frequency vs. age when deciding what to evict.

Simple Key-Value Operations

The easiest way to interact with Wildcat is through the Update method, which handles transactions automatically. This means it runs begin, commit, and rollback for you, allowing you to focus on the operations themselves.

// Write a value
err := db.Update(func(txn *wildcat.Txn) error {
    return txn.Put([]byte("hello"), []byte("world"))
})
if err != nil {
    // Handle error
}

// Read a value
var result []byte
err = db.View(func(txn *wildcat.Txn) error {
    var err error
    result, err = txn.Get([]byte("hello"))
    return err
})

if err != nil {
    // Handle error
} else {
    fmt.Println("Value:", string(result)) // Outputs: Value: world
}

Manual Transaction Management

For more complex operations, you can manually manage transactions.

// Begin a transaction
txn := db.Begin()

// Perform operations
err := txn.Put([]byte("key1"), []byte("value1"))
if err != nil {
    txn.Rollback()
    // Handle error
}

value, err := txn.Get([]byte("key1"))
if err != nil {
    txn.Rollback()
    // Handle error
}

// Commit or rollback
err = txn.Commit()
if err != nil {
    txn.Rollback()
    // Handle commit error
}

Iterating Keys

Wildcat provides comprehensive iteration capabilities with MVCC consistency.

[!TIP] You can set ascending or descending order, and iterate over all keys, a range of keys, or keys with a specific prefix.

Full Iterator (bidirectional)
err := db.View(func(txn *wildcat.Txn) error {
    // Create ascending iterator
    iter, err := txn.NewIterator(true)
    if err != nil {
        return err
    }

    // Iterate forward
    for {
        key, value, timestamp, ok := iter.Next()
        if !ok {
            break
        }

        fmt.Printf("Key: %s, Value: %s, Timestamp: %d\n", key, value, timestamp)
    }

    // Change direction and iterate backward
    err = iter.SetDirection(false)
    if err != nil {
        return err
    }

    for {
        key, value, timestamp, ok := iter.Next()
        if !ok {
            break
        }

        fmt.Printf("Key: %s, Value: %s, Timestamp: %d\n", key, value, timestamp)
    }

    return nil
})
Range Iterator (bidirectional)
err := db.View(func(txn *wildcat.Txn) error {
    // Create range iterator
    iter, err := txn.NewRangeIterator([]byte("start"), []byte("end"), true)
    if err != nil {
        return err
    }

    // Iterate forward
    for {
        key, value, timestamp, ok := iter.Next()
        if !ok {
            break
        }

        fmt.Printf("Key: %s, Value: %s, Timestamp: %d\n", key, value, timestamp)
    }

    // Change direction and iterate backward
    err = iter.SetDirection(false)
    if err != nil {
        return err
    }

    for {
        key, value, timestamp, ok := iter.Next()
        if !ok {
            break
        }

        fmt.Printf("Key: %s, Value: %s, Timestamp: %d\n", key, value, timestamp)
    }

    return nil
})
Prefix Iterator (bidirectional)
err := db.View(func(txn *wildcat.Txn) error {
    // Create prefix iterator
    iter, err := txn.NewPrefixIterator([]byte("prefix"), true)
    if err != nil {
        return err
    }

    // Iterate forward
    for {
        key, value, timestamp, ok := iter.Next()
        if !ok {
            break
        }

        fmt.Printf("Key: %s, Value: %s, Timestamp: %d\n", key, value, timestamp)
    }

    // Change direction and iterate backward
    err = iter.SetDirection(false)
    if err != nil {
        return err
    }

    for {
        key, value, timestamp, ok := iter.Next()
        if !ok {
            break
        }

        fmt.Printf("Key: %s, Value: %s, Timestamp: %d\n", key, value, timestamp)
    }

    return nil
})

Read-Only Transactions with View

// Read a value with View
var result []byte
err = db.View(func(txn *wildcat.Txn) error {
    var err error
    result, err = txn.Get([]byte("hello"))
    return err
})

Batch Operations

You can perform multiple operations in a single transaction.

[!CAUTION] Batch operations on the Wildcat engine are slower completed inside an Update. It's better to use Begin->Put flow for batch writes.

err := db.Update(func(txn *wildcat.Txn) error {
    // Write multiple key-value pairs
    for i := 0; i < 1000; i++ {
        key := []byte(fmt.Sprintf("key%d", i))
        value := []byte(fmt.Sprintf("value%d", i))

        if err := txn.Put(key, value); err != nil {
            return err
        }
    }
    return nil
})

OR

// Begin a transaction
txn := db.Begin()

// Perform batch operations
for i := 0; i < 1000; i++ {
    key := []byte(fmt.Sprintf("key%d", i))
    value := []byte(fmt.Sprintf("value%d", i))

    if err := txn.Put(key, value); err != nil {
        txn.Rollback()
        // Handle error
        return err
    }
}

// Commit the transaction
err := txn.Commit()

Transaction Recovery

// After reopening a database, you can access recovered transactions
txn, err := db.GetTxn(transactionID)
if err != nil {
    // Transaction not found or error
    return err
}

// Inspect the recovered transaction state
fmt.Printf("Transaction %d status: committed=%v\n", txn.Id, txn.Committed)
fmt.Printf("Write set: %v\n", txn.WriteSet)
fmt.Printf("Delete set: %v\n", txn.DeleteSet)

// You can commit or rollback the recovered transaction
if !txn.Committed {
    err = txn.Commit() // or txn.Rollback()
}

Log Channel

Wildcat provides a log channel for real-time logging. You can set up a goroutine to listen for log messages.

// Create a log channel
logChannel := make(chan string, 100) // Buffer size of 100 messages

// Set up options with the log channel
opts := &wildcat.Options{
    Directory:       "/path/to/db",
    LogChannel:      logChannel,
    // Other options...
}

// Open the database
db, err := wildcat.Open(opts)
if err != nil {
    // Handle error
}

wg := &sync.WaitGroup{}

wg.Add(1)

// Start a goroutine to listen to the log channel
go func() {
    defer wg.Done()
    for msg := range logChannel {
        // Process log messages
        fmt.Println("wildcat:", msg)

        // You could also write to a file, send to a logging service, etc.
        // log.Println(msg)
    }
}()

// Use..

wg.Wait() // Wait for the goroutine to finish

// When you're done, close the database
defer db.Close()

Database Statistics

Wildcat provides comprehensive statistics about database state

stats := db.Stats()
fmt.Println(stats)

Output example

┌───────────────────────────────────────────────────────────────────────────┐
│ Wildcat DB Stats and Configuration                                        │
├───────────────────────────────────────────────────────────────────────────┤
│ Write Buffer Size          : 25                                           │
│ Sync Option                : 1                                            │
│ Level Count                : 6                                            │
│ Bloom Filter Enabled       : false                                        │
│ Max Compaction Concurrency : 4                                            │
│ Compaction Cooldown        : 5s                                           │
│ Compaction Batch Size      : 8                                            │
│ Compaction Size Ratio      : 1.1                                          │
│ Compaction Threshold       : 8                                            │
│ Score Size Weight          : 0.8                                          │
│ Score Count Weight         : 0.2                                          │
│ Flusher Interval           : 1ms                                          │
│ Compactor Interval         : 250ms                                        │
│ Bloom FPR                  : 0.01                                         │
│ WAL Retry                  : 10                                           │
│ WAL Backoff                : 128µs                                        │
│ SSTable B-Tree Order       : 10                                           │
│ LRU Size                   : 1024                                         │
│ LRU Evict Ratio            : 0.2                                          │
│ LRU Access Weight          : 0.8                                          │
│ File Version               : 1                                            │
│ Magic Number               : 1464421444                                   │
│ Directory                  : /tmp/db_merge_iterator_large_test1776741552/ │
├───────────────────────────────────────────────────────────────────────────┤
│ ID Generator State                                                        │
├───────────────────────────────────────────────────────────────────────────┤
│ Last SST ID                : 0                                            │
│ Last WAL ID                : 0                                            │
│ Last TXN ID                : 0                                            │
├───────────────────────────────────────────────────────────────────────────┤
│ Runtime Statistics                                                        │
├───────────────────────────────────────────────────────────────────────────┤
│ Active Memtable Size       : 0                                            │
│ Active Memtable Entries    : 0                                            │
│ Active Transactions        : 20                                           │
│ Oldest Read Timestamp      : 0                                            │
│ WAL Files                  : 4                                            │
│ Total SSTables             : 5                                            │
│ Total Entries              : 18                                           │
└───────────────────────────────────────────────────────────────────────────┘

This returns detailed information including

  • Configuration settings and tuning parameters
  • Active memtable size and entry count
  • Transaction counts and oldest active read timestamp
  • Level statistics and SSTable counts
  • ID generator states and WAL file counts
  • Compaction and flushing statistics

Force Flushing

You can force a flush of current and immutable memtables to disk using the Flush method.

// Force all memtables to flush to SSTables
err := db.ForceFlush()
if err != nil {
    // Handle error
}

Shared C Library

You will require the latest Go toolchain to build the shared C library for Wildcat. This allows you to use Wildcat as a C library in other languages.

go build -buildmode=c-shared -o libwildcat.so wildcat_c.go

C API

typedef enum {
    SYNC_NONE = 0,
    SYNC_ALWAYS,
    SYNC_INTERVAL
} sync_option_t;

extern void* wildcat_open(wildcat_opts_t* opts);
extern void wildcat_close(void* ptr);
extern long int wildcat_begin_txn(void* ptr);
extern int wildcat_txn_put(long int txnId, char* key, char* val);
extern char* wildcat_txn_get(long int txnId, char* key);
extern int wildcat_txn_commit(long int txnId);
extern int wildcat_txn_rollback(long int txnId);
extern char* wildcat_stats(void* ptr);
extern int wildcat_force_flush(void* ptr);
extern int wildcat_txn_delete(long int txnId, char* key);
extern void wildcat_txn_free(long int txnId);
extern long unsigned int wildcat_txn_new_iterator(long int txnId, int asc);
extern long unsigned int wildcat_txn_new_range_iterator(long int txnId, char* start, char* end, int asc);
extern long unsigned int wildcat_txn_new_prefix_iterator(long int txnId, char* prefix, int asc);
extern int wildcat_txn_iterate_next(long unsigned int id);
extern int wildcat_txn_iterate_prev(long unsigned int id);
extern int wildcat_txn_iter_valid(long unsigned int id);
extern char* wildcat_iterator_key(long unsigned int id);
extern char* wildcat_iterator_value(long unsigned int id);
extern void wildcat_iterator_free(long unsigned int id);

Overview

MVCC Model

  • Each key stores a timestamped version chain. The timestamps used are physical nanosecond timestamps (derived from time.Now().UnixNano()), providing a simple yet effective global ordering for versions.
  • Transactions read the latest version ≤ their timestamp.
  • Writes are buffered and atomically committed.
  • Delete operations are recorded as tombstones.

WAL and Durability

  • Shared WAL per memtable; transactions append full state.
  • WAL replay restores all committed and in-flight transactions.
  • WALs rotate when memtables flush.

Memtable Lifecycle

  • Active memtable is swapped atomically when full.
  • Immutable memtables are flushed in background.
  • Skip list implementation with MVCC version chains for concurrent access.

SSTables and Compaction

  • Immutable SSTables are organized into levels.
  • L1–L2 use size-tiered compaction.
  • L3+ use leveled compaction by key range.
  • Concurrent compaction with configurable maximum concurrency limits.
  • Compaction score is calculated using a hybrid formula score = (levelSize / capacity) * 0.7 + (sstableCount / threshold) * 0.3 compaction is triggered when score > 1.0
  • Cooldown period enforced between compactions to prevent resource thrashing.
  • Compaction filters out redundant tombstones based on timestamp and overlapping range.
  • A tombstone is dropped if it's older than the oldest active read and no longer needed in higher levels.

SSTable Metadata

Each SSTable tracks the following main meta details:

  • Min and Max keys for fast range filtering
  • EntryCount (total number of valid records) used for recreating filters if need be
  • Size (approximate byte size) used for when reopening levels
  • Optional BloomFilter for accelerated key lookups
  • Level (mainly tells us during reopening and compaction)

We only list the main meta data but there is more for internal use.

SSTable Format

SSTables are prefix and range optimized immutable BTree's.

Structure

  • KLog .klog Contains BTree with key metadata and pointers to values
  • VLog .vlog Contains actual value data in append-only format

During lookups

  • Range check Min/Max key range validation (skipped if outside bounds)
  • Bloom filter Consulted first if enabled (configurable false positive rate)
  • BTree search Key lookup in KLog BTree structure
  • Value retrieval Actual values retrieved from VLog using block IDs from KLog entries
  • MVCC filtering Only versions visible to read timestamp are returned

Compaction Policy / Strategy

  • L1-L2 Size-tiered compaction for similar-sized SSTables
  • L3+ Leveled compaction based on key range overlaps
  • Concurrent execution Configurable max concurrency with priority-based job scheduling
  • Cooldown mechanism Prevents thrashing with configurable cooldown periods
Compaction Scoring Formula
score = (levelSize / capacity) * sizeWeight + (sstableCount / threshold) * countWeight
  • Default weight sizeWeight=0.8, countWeight=0.2
  • Trigger threshold Compaction starts when score > 1.0
  • Priority-based Higher scores get priority in the compaction queue
Compaction Process
  • Job scheduling Background process evaluates all levels every CompactorTickerInterval
  • Priority queue Jobs sorted by compaction score (highest first)
  • Concurrent execution Up to MaxCompactionConcurrency jobs run simultaneously
  • Atomic operations Source SSTables marked as merging, target level updated atomically
  • Cleanup Old SSTable files removed after successful compaction

Concurrency Model

  • Wildcat uses lock-free structures where possible (e.g., atomic value swaps for memtables, atomic lru, queues, and more)
  • Read and write operations are designed to be non-blocking.
  • WAL appends are retried with backoff and allow concurrent writes.
  • Flusher and Compactor run as independent goroutines, handling flushing and compaction in the background.
  • Block manager uses per-file concurrency-safe(multi writer-reader) access and is integrated with LRU for lifecycle management. It leverages direct system calls (pread/pwrite) for efficient, non-blocking disk I/O.
  • Writers never block; readers always see consistent, committed snapshots.
  • No uncommitted state ever surfaces due to internal synchronization and timestamp-based visibility guarantees.

Isolation Levels

Wildcat supports ACID-compliant isolation

  • Snapshot Isolation Transactions see a consistent snapshot as of their start timestamp
  • Read Committed Readers observe only committed versions, never intermediate uncommitted writes
  • No dirty reads Timestamp-based MVCC prevents reading uncommitted data
  • Repeatable reads Within a transaction, reads are consistent to the transaction's timestamp

Recoverability Guarantee Order

Recovery process consists of several steps

  • WAL scanning All WAL files are processed in chronological order (sorted by timestamp)
  • Transaction consolidation Multiple WAL entries for same transaction ID are merged to final state
  • State restoration Final transaction states are applied to memtables and transaction lists
  • Incomplete transaction preservation Uncommitted transactions remain accessible via GetTxn(id)
Durability Order
  • WAL All writes recorded atomically with full transaction details (ReadSet, WriteSet, DeleteSet, commit status)
  • Memtable Writes reflected in memory immediately upon commit
  • SSTables Memtables flushed to SSTables asynchronously via background flusher
Recovery Guarantees
  • Complete state restoration All committed transactions are fully recovered
  • Incomplete transaction access Uncommitted transactions can be inspected, committed, or rolled back
  • Timestamp consistency WAL replay maintains correct timestamp ordering
  • Atomic recovery Only transactions with durable WAL entries are considered recoverable

Block Manager

Wildcat's block manager provides a low-level, atomic high-performance file I/O with sophisticated features.

Core
  • Direct I/O Uses pread/pwrite system calls for atomic, position-independent operations
  • Block chaining Supports multi-block data with automatic chain management
  • Free block management Atomic queue-based allocation with block reuse
  • CRC verification All blocks include CRC32 checksums for data integrity
  • Concurrent access Thread-safe operations with lock-free design where possible
Block Structure
  • Header validation Magic number and version verification
  • Block metadata Size, next block ID, and CRC checksums
  • Chain support Automatic handling of data spanning multiple blocks
  • Free space tracking Intelligent free block scanning and allocation
Sync Options
  • SyncNone No disk synchronization (fastest, least durable)
  • SyncFull Synchronous writes with fdatasync (safest, slower)
  • SyncPartial Background synchronization at configurable intervals (balanced)
Recovery Features
  • Block validation CRC verification on all block reads
  • Chain reconstruction Automatic detection and handling of multi-block data
  • Free space recovery Intelligent scanning to rebuild free block allocation table

LRU Cache

Wildcat uses a sophisticated lock-free LRU cache for block manager handles.

Advanced Features
  • Lock-free design Atomic operations with lazy eviction
  • Anti-thrashing Smart eviction with access pattern analysis
  • Load factor awareness Only evicts when near capacity (95%+)
  • Emergency recovery Automatic detection and recovery from stuck states
  • Node reuse Efficient memory management with node recycling
Eviction Algorithm
evictionScore = accessWeight * accessCount + timeWeight * age
  • Balanced approach Considers both access frequency and recency
  • Configurable weights BlockManagerLRUAccesWeight controls the balance
  • Gradual eviction Evicts BlockManagerLRUEvictRatio portion when needed
  • Emergency fallback Aggressive cleanup when cache becomes unresponsive
Performance Optimizations
  • Batched processing Processes eviction queue in controlled batches
  • Progress tracking Monitors cache operations to detect performance issues
  • Concurrent-safe Multiple threads can safely access cached resources
  • Resource cleanup Automatic resource management with eviction callbacks

Lock-Free Queue

  • Michael & Scott algorithm Industry-standard lock-free queue design
  • ABA problem prevention Proper pointer management and consistency checks
  • Memory ordering Atomic operations ensure proper synchronization
  • High throughput Supports millions of operations per second under contention

BTree

Wildcat's BTree provides the foundation for SSTable key storage with advanced features for range queries and bidirectional iteration.

Core
  • Immutable design Once written, BTrees are never modified, ensuring consistency
  • Configurable order SSTableBTreeOrder controls node size and tree depth
  • Metadata storage Supports arbitrary metadata attachment for SSTable information
  • Block-based storage Integrates seamlessly with the block manager for efficient I/O
Advanced Features
  • Bidirectional iteration Full support for forward and reverse traversal
  • Range queries Efficient RangeIterator with start/end key bounds
  • Prefix iteration Optimized PrefixIterator for prefix-based searches
  • Seek operations Direct positioning to any key with Seek, SeekToFirst, SeekToLast
  • BSON serialization Automatic serialization/deserialization of nodes and values
Iterator Capabilities
// Multiple iterator types with consistent interface
iterator, err := btree.Iterator(ascending)                      // Full tree iteration
rangeIter, err := btree.RangeIterator(start, end, ascending)    // Range-bounded
prefixIter, err := btree.PrefixIterator(prefix, ascending)      // Prefix-based

// All iterators support
iter.Next()           // Advance in configured direction
iter.Prev()           // Move opposite to configured direction
iter.Seek(key)        // Position at specific key
iter.SetDirection()   // Change iteration direction
iter.Valid()          // Check if positioned at valid entry
Performance Optimizations
  • Lazy loading Nodes loaded on-demand from block storage
  • Efficient splitting Automatic node splitting maintains balance
  • Leaf linking Direct leaf-to-leaf pointers for faster iteration
  • Memory efficient Minimal memory footprint with block-based storage
Node Structure
  • Internal nodes Store keys and child pointers for navigation
  • Leaf nodes Store actual key-value pairs with next/prev links
  • BSON encoding Consistent serialization format across all node types

SkipList

Wildcat's SkipList serves as the core data structure for memtables, providing concurrent MVCC access with lock-free operations.

MVCC Architecture
  • Version chains Each key maintains a linked list of timestamped versions
  • Timestamp ordering Physical nanosecond timestamps ensure global ordering
  • Lock-free access Atomic operations enable concurrent reads and writes
  • Snapshot isolation Readers see consistent snapshots at their read timestamp
Concurrency Features
  • Non-blocking reads Readers never block, even during concurrent writes
  • Atomic writes CAS operations ensure thread-safe modifications
  • Version visibility Timestamp-based filtering for MVCC consistency
  • Memory ordering Proper atomic synchronization prevents race conditions
Version Management
type ValueVersion struct {
    Data      []byte           // Actual value data
    Timestamp int64            // Version timestamp
    Type      ValueVersionType // Write or Delete marker
    Next      *ValueVersion    // Link to previous version
}
Advanced Operations
  • Put operations Atomic insertion with version chaining
  • Delete operations Tombstone markers with timestamp tracking
  • Get operations Version-aware lookups with timestamp filtering
  • Range scanning Efficient iteration over key ranges with MVCC consistency

Iterator Types

// Multiple specialized iterators
iterator := skiplist.NewIterator(startKey, readTimestamp)
rangeIter := skiplist.NewRangeIterator(start, end, readTimestamp)
prefixIter := skiplist.NewPrefixIterator(prefix, readTimestamp)

// All support bidirectional traversal with MVCC semantics
iter.Next()    // Forward iteration with timestamp filtering
iter.Prev()    // Backward iteration with timestamp filtering
iter.Peek()    // Non-destructive current value inspection
Performance Characteristics
  • O(log n) average case for all operations
  • Lock-free design High concurrency with minimal contention
  • Memory efficient Optimized node structure with atomic pointers
  • Skip probability Configurable level distribution (p=0.25)
  • Maximum levels Bounded height (MaxLevel=16) for predictable performance
MVCC Semantics
  • Read consistency Transactions see stable snapshots throughout their lifetime
  • Write isolation Concurrent writes don't interfere with each other
  • Timestamp ordering Later timestamps always override earlier ones
  • Tombstone handling Delete operations create versioned tombstone markers
  • Garbage collection Old versions naturally filtered by timestamp-based reads
Memory Layout
  • Node structure Atomic forward pointers array + backward pointer
  • Level generation Probabilistic level assignment with geometric distribution
  • Version chains Per-key linked lists with atomic head pointers

Motivation

My name is Alex Gaetano Padula, and I've spent the past several years fully immersed in one thing.. databases and storage engines. Not frameworks. Not trends. Just the raw, unforgiving internals of how data lives, moves, and survives.

My journey began not with textbooks or formal training, but with pure curiosity. After writing my first database, CursusDB, to solve a real-time distributed problem, I couldn't stop experimenting. I dove into everything I could get my hands on.

I didn't just want to use a database. I wanted to understand and build every layer of one, from disk to wire.

I'm not a database or storage expert - I'm just someone who loves building and tinkering. I've written everything from the simplest key-value stores to complex real-time distributed systems, learning through trial and error, late-night debugging sessions, and countless experiments.

Each engine taught me something new. Each mistake deepened my understanding. Each redesign brought me closer to fresh insights and ideas.

And that's where Wildcat came in.

I wanted to build a storage layer that didn't force you to choose between performance, safety, and usability. A system that embraced concurrency, durability, and simplicity, not as tradeoffs, but as first principles. Wildcat is my answer to the single-writer bottleneck - an MVCC, multi-writer, LSM-tree storage engine built in Go with a fully non-blocking transactional model.

Wildcat isn't just code I wrote. It's the distillation of every storage engine I've experimented with, every system I've torn down to understand, and every lesson I've learned through hands-on building.

It's a culmination of countless experiments, sleepless nights, LOTS of COFFEE, and a relentless curiosity about how things actually work under the hood.

Contributing

You can contribute, no problem! Find a bug, optimization, refactor and submit a PR. It will be reviewed. You're only helping us all :)

Documentation

Overview

Package wildcat

(C) Copyright Alex Gaetano Padula

Licensed under the Mozilla Public License, v. 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.mozilla.org/en-US/MPL/2.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Package wildcat

(C) Copyright Alex Gaetano Padula

Licensed under the Mozilla Public License, v. 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.mozilla.org/en-US/MPL/2.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Package wildcat

(C) Copyright Alex Gaetano Padula

Licensed under the Mozilla Public License, v. 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.mozilla.org/en-US/MPL/2.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Package wildcat

(C) Copyright Alex Gaetano Padula

Licensed under the Mozilla Public License, v. 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.mozilla.org/en-US/MPL/2.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Package wildcat

(C) Copyright Alex Gaetano Padula

Licensed under the Mozilla Public License, v. 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.mozilla.org/en-US/MPL/2.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Package wildcat

(C) Copyright Alex Gaetano Padula

Licensed under the Mozilla Public License, v. 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.mozilla.org/en-US/MPL/2.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Package wildcat

(C) Copyright Alex Gaetano Padula

Licensed under the Mozilla Public License, v. 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.mozilla.org/en-US/MPL/2.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Package wildcat

(C) Copyright Alex Gaetano Padula

Licensed under the Mozilla Public License, v. 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.mozilla.org/en-US/MPL/2.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Package wildcat

(C) Copyright Alex Gaetano Padula

Licensed under the Mozilla Public License, v. 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.mozilla.org/en-US/MPL/2.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Package wildcat

(C) Copyright Alex Gaetano Padula

Licensed under the Mozilla Public License, v. 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.mozilla.org/en-US/MPL/2.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Package wildcat

(C) Copyright Alex Gaetano Padula

Licensed under the Mozilla Public License, v. 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.mozilla.org/en-US/MPL/2.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Index

Constants

View Source
const (
	SSTablePrefix    = "sst_"     // Prefix for SSTable files
	LevelPrefix      = "l"        // Prefix for level directories i.e. "l1", "l2", etc.
	WALFileExtension = ".wal"     // Extension for Write Ahead Log files <timestamp>.wal
	KLogExtension    = ".klog"    // Extension for KLog files
	VLogExtension    = ".vlog"    // Extension for VLog files
	IDGSTFileName    = "idgstate" // Filename for ID generator state
)

Prefixes, filenames, extensions constants

View Source
const (
	DefaultWriteBufferSize = 64 * 1024 * 1024     // Default write buffer size
	DefaultSyncOption      = SyncNone             // Default sync option for write operations
	DefaultSyncInterval    = 16 * time.Nanosecond // Default sync interval for write operations
	DefaultLevelCount      = 6                    // Default number of levels in the LSM tree
	DefaultLevelMultiplier = 10                   // Multiplier for the number of levels
	// 64MB -> 640MB -> 6.4GB ->  64GB -> 640GB ->  6.4TB
	DefaultBlockManagerLRUSize         = 1024            // Size of the LRU cache for block managers
	DefaultBlockManagerLRUEvictRatio   = 0.20            // Eviction ratio for the LRU cache
	DefaultBlockManagerLRUAccessWeight = 0.8             // Access weight for the LRU cache
	DefaultPermission                  = 0750            // Default permission for created files
	DefaultBloomFilter                 = false           // Default Bloom filter option
	DefaultMaxCompactionConcurrency    = 4               // Default max compaction concurrency
	DefaultCompactionCooldownPeriod    = 5 * time.Second // Default cooldown period for compaction
	DefaultCompactionBatchSize         = 8               // Default max number of SSTables to compact at once
	DefaultCompactionSizeRatio         = 1.1             // Default level size ratio that triggers compaction
	DefaultCompactionSizeThreshold     = 8               // Default number of files to trigger size-tiered compaction
	DefaultCompactionScoreSizeWeight   = 0.8             // Default weight for size-based score
	DefaultCompactionScoreCountWeight  = 0.2             // Default weight for count-based score
	DefaultFlusherTickerInterval       = 1 * time.Millisecond
	DefaultCompactorTickerInterval     = 250 * time.Millisecond // Default interval for compactor ticker
	DefaultBloomFilterProbability      = 0.01                   // Default probability for Bloom filter
	DefaultWALAppendRetry              = 10                     // Default number of retries for WAL append
	DefaultWALAppendBackoff            = 128 * time.Microsecond // Default backoff duration for WAL append
	DefaultSSTableBTreeOrder           = 10                     // Default order of the B-tree for SSTables
)

Defaults

Variables

This section is empty.

Functions

This section is empty.

Types

type Compactor

type Compactor struct {
	// contains filtered or unexported fields
}

Compactor is responsible for managing compaction jobs

type DB

type DB struct {
	// contains filtered or unexported fields
}

DB represents the main Wildcat structure

func Open

func Open(opts *Options) (*DB, error)

Open initializes a new Wildcat instance with the provided options

func (*DB) Begin

func (db *DB) Begin() *Txn

Begin starts a new transaction

func (*DB) Close

func (db *DB) Close() error

Close closes the database and all open resources

func (*DB) ForceFlush

func (db *DB) ForceFlush() error

ForceFlush forces the flush of all memtables and immutable memtables

func (*DB) GetTxn

func (db *DB) GetTxn(id int64) (*Txn, error)

GetTxn retrieves a transaction by ID. Can be used on system recovery. You can recover an incomplete transaction.

func (*DB) Stats added in v1.0.0

func (db *DB) Stats() string

Stats returns a string with the current statistics of the Wildcat database

func (*DB) Update

func (db *DB) Update(fn func(txn *Txn) error) error

Update performs an atomic update using a transaction

func (*DB) View

func (db *DB) View(fn func(txn *Txn) error) error

View performs a read-only transaction

type Flusher

type Flusher struct {
	// contains filtered or unexported fields
}

Flusher is responsible for queuing and flushing memtables to disk

type IDGenerator

type IDGenerator struct {
	// contains filtered or unexported fields
}

IDGenerator is a thread-safe ID generator

type IDGeneratorState

type IDGeneratorState struct {
	// contains filtered or unexported fields
}

IDGeneratorState represents the state of the ID generator. When system shuts down the state is saved to disk and restored on next startup.

type KLogEntry

type KLogEntry struct {
	Key          []byte // Key of the entry
	Timestamp    int64  // Timestamp of the entry
	ValueBlockID int64  // Block ID of the value
}

KLogEntry represents a key-value entry in the KLog

type Level

type Level struct {
	// contains filtered or unexported fields
}

Level is a disk level within Wildcat, which contains a list of immutable SSTables

func (*Level) SSTables added in v1.0.0

func (l *Level) SSTables() []*SSTable

SSTables returns the list of SSTables in the level

type Memtable

type Memtable struct {
	// contains filtered or unexported fields
}

Memtable is a memory table structure

type MergeIterator

type MergeIterator struct {
	// contains filtered or unexported fields
}

MergeIterator combines multiple iterators into a single iterator

func NewMergeIterator

func NewMergeIterator(iterators []*iterator, ts int64, ascending bool) (*MergeIterator, error)

NewMergeIterator creates a new MergeIterator with the given iterators

func (*MergeIterator) HasNext added in v1.0.0

func (mi *MergeIterator) HasNext() bool

HasNext returns true if there are more entries in the configured direction

func (*MergeIterator) HasPrev added in v1.0.0

func (mi *MergeIterator) HasPrev() bool

HasPrev returns true if there are entries in the opposite direction

func (*MergeIterator) IsAscending added in v1.0.0

func (mi *MergeIterator) IsAscending() bool

IsAscending returns the current iteration direction

func (*MergeIterator) Next

func (mi *MergeIterator) Next() ([]byte, []byte, int64, bool)

Next returns the next key-value pair in the configured direction

func (*MergeIterator) Prev

func (mi *MergeIterator) Prev() ([]byte, []byte, int64, bool)

Prev returns the previous key-value pair (opposite of configured direction)

func (*MergeIterator) SetDirection added in v1.0.0

func (mi *MergeIterator) SetDirection(ascending bool) error

SetDirection changes the iteration direction

type Options

type Options struct {
	Directory                  string        // Directory for Wildcat
	WriteBufferSize            int64         // Size of the write buffer
	SyncOption                 SyncOption    // Sync option for write operations
	SyncInterval               time.Duration // Interval for syncing the write buffer
	LevelCount                 int           // Number of levels in the LSM tree
	LevelMultiplier            int           // Multiplier for the number of levels
	BlockManagerLRUSize        int           // Size of the LRU cache for block managers
	BlockManagerLRUEvictRatio  float64       // Eviction ratio for the LRU cache
	BlockManagerLRUAccesWeight float64       // Access weight for the LRU cache
	Permission                 os.FileMode   // Permission for created files
	LogChannel                 chan string   // Channel for logging
	BloomFilter                bool          // Enable Bloom filter for SSTables
	MaxCompactionConcurrency   int           // Maximum number of concurrent compactions
	CompactionCooldownPeriod   time.Duration // Cooldown period for compaction
	CompactionBatchSize        int           // Max number of SSTables to compact at once
	CompactionSizeRatio        float64       // Level size ratio that triggers compaction
	CompactionSizeThreshold    int           // Number of files to trigger size-tiered compaction
	CompactionScoreSizeWeight  float64       // Weight for size-based score
	CompactionScoreCountWeight float64       // Weight for count-based score
	FlusherTickerInterval      time.Duration // Interval for flusher ticker
	CompactorTickerInterval    time.Duration // Interval for compactor ticker
	BloomFilterFPR             float64       // False positive rate for Bloom filter
	WalAppendRetry             int           // Number of retries for WAL append
	WalAppendBackoff           time.Duration // Backoff duration for WAL append
	SSTableBTreeOrder          int           // Order of the B-tree for SSTables
}

Options represents the configuration options for Wildcat

type SSTable

type SSTable struct {
	Id          int64                    // SStable ID
	Min         []byte                   // The minimum key in the SSTable
	Max         []byte                   // The maximum key in the SSTable
	Size        int64                    // The size of the SSTable in bytes
	EntryCount  int                      // The number of entries in the SSTable
	Level       int                      // The level of the SSTable
	BloomFilter *bloomfilter.BloomFilter // Optional bloom filter for fast lookups
	Timestamp   int64                    // Timestamp of latest entry in the SSTable
	// contains filtered or unexported fields
}

SSTable represents a sorted string table

type SyncOption

type SyncOption int

SyncOption is a block manager sync option that can be set by the user.

const (
	SyncNone    SyncOption = iota // We don't sync
	SyncFull                      // We sync the entire block manager after each write
	SyncPartial                   // We sync based at intervals based on the SyncInterval option.. So every SyncInterval we sync the block manager.
)

type Txn

type Txn struct {
	Id        int64             // The transactions id, can be recovered
	ReadSet   map[string]int64  // Key -> Timestamp
	WriteSet  map[string][]byte // Key -> Value
	DeleteSet map[string]bool   // Key -> Deleted
	Timestamp int64             // The timestamp of the transaction
	Committed bool              // Whether the transaction is committed
	// contains filtered or unexported fields
}

Txn represents a transaction in Wildcat

func (*Txn) Commit

func (txn *Txn) Commit() error

Commit commits the transaction

func (*Txn) Delete

func (txn *Txn) Delete(key []byte) error

Delete removes a key from database

func (*Txn) Get

func (txn *Txn) Get(key []byte) ([]byte, error)

Get retrieves a value by key

func (*Txn) NewIterator

func (txn *Txn) NewIterator(asc bool) (*MergeIterator, error)

NewIterator creates a new bidirectional iterator

func (*Txn) NewPrefixIterator added in v1.0.0

func (txn *Txn) NewPrefixIterator(prefix []byte, asc bool) (*MergeIterator, error)

NewPrefixIterator creates a new prefix bidirectional iterator

func (*Txn) NewRangeIterator added in v1.0.0

func (txn *Txn) NewRangeIterator(startKey []byte, endKey []byte, asc bool) (*MergeIterator, error)

NewRangeIterator creates a new range bidirectional iterator

func (*Txn) Put

func (txn *Txn) Put(key []byte, value []byte) error

Put adds key-value pair to database

func (*Txn) Rollback

func (txn *Txn) Rollback() error

Rollback rolls back the transaction

type WAL

type WAL struct {
	// contains filtered or unexported fields
}

WAL is a write-ahead log structure

Directories

Path Synopsis
Package blockmanager
Package blockmanager
Package bloomfilter
Package bloomfilter
Package lru
Package lru
Package queue
Package queue
Package skiplist
Package skiplist
Package tree
Package tree

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL