wormdb

package module
v0.0.0-...-3f005bc Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 27, 2025 License: Apache-2.0 Imports: 11 Imported by: 0

README

WORM-DB - Write Once Read Many Database

The WORMDB format is a binary database format designed for both efficient storage and efficient lookups of sorted data. The database works off a disk binary file and an index file. The index file provides the ability to lookup which block a record should be located. The record can then be retrieved and returned. A record in the WORMDB is always a binary slice of arbitrary length. To search for an individual record, provide a unique prefix.

Note: If the query prefix match matches multiple record indexs, either the first record will be returned or an error. Make sure your index is unique.

An example of how the database works:

If one is looking for the record starting with abc123 they can use the Get() function to find this record:

An example database (one entry per line)

abc122cat
abc123bat
abc124dob
	f, err := os.Create("new.db")
	if err != nil {
		log.Fatal(err)
	}
	bs := worm.NewBinarySearch()
	db, err := worm.New(f,
		worm.WithSearch(bs))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	db.Add([]byte("hello world"))
	db.Add([]byte("hello world abc"))
	db.Add([]byte("hello world def"))
	db.Add([]byte("hello world ghi"))

	db.Finalize()
	err = db.Get([]byte("hello world ab"), func(rec []byte) error {
		fmt.Println("found:", string(rec))
		return nil
	})
	// Output:
	// found: hello world abc

Documentation

Index

Examples

Constants

This section is empty.

Variables

View Source
var Debug bool

Turn on debug logging

Functions

This section is empty.

Types

type BinarySearch

type BinarySearch struct {
	Index [][]byte
	// contains filtered or unexported fields
}

func LoadBinarySearch

func LoadBinarySearch(index [][]byte) *BinarySearch

Load a binary search from a memory 2-D byte slice.

func LoadDiskBinarySearch

func LoadDiskBinarySearch(file *os.File) (*BinarySearch, error)

Load a binary search from disk.

Example
package main

import (
	"fmt"
	"log"
	"os"

	bwdb "github.com/pschou/go-wormdb"
)

func main() {
	f, err := os.Open("disk_data.db")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	ind, err := os.Open("disk_index.db")
	if err != nil {
		log.Fatal(err)
	}
	defer ind.Close()

	bs, err := bwdb.LoadDiskBinarySearch(ind)
	if err != nil {
		log.Fatal(err)
	}

	db, err := bwdb.Open(f,
		bwdb.WithSearch(bs))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.Get([]byte("hello world de"), func(rec []byte) error {
		fmt.Println("found:", string(rec))
		return nil
	})
}
Output:

found: hello world def

func NewBinarySearch

func NewBinarySearch() *BinarySearch

Build a search index in memory for the constructed wormdb. Please note that there must be enough memory on the system for the Index when the database is being built. This is in opposed to the [NewFileBinarySearch], which uses disk instead of memory.

func NewDiskBinarySearch

func NewDiskBinarySearch(file *os.File) *BinarySearch

Build a search index on disk for the constructed wormdb. This is in opposed to the [NewMemoryBinarySearch], which uses memory instead of disk and is ready for use when Finalize() is called. One must then load the index from disk into memory with LoadIndexToMemory().

Example
package main

import (
	"fmt"
	"log"
	"os"

	bwdb "github.com/pschou/go-wormdb"
)

func main() {
	f, err := os.Create("disk_data.db")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	ind, err := os.Create("disk_index.db")
	if err != nil {
		log.Fatal(err)
	}
	defer ind.Close()

	bs := bwdb.NewDiskBinarySearch(ind)
	db, err := bwdb.New(f,
		bwdb.WithSearch(bs))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	db.Add([]byte("hello world"))
	// Making the file larger
	for i := 4; i < 1000; i++ {
		db.Add([]byte(fmt.Sprintf("hello world a%08d00000000000000000000000000000000000000000000000000000000000000000000000000000000", i)))
	}
	db.Add([]byte("hello world abc"))
	db.Add([]byte("hello world def"))
	db.Add([]byte("hello world ghi"))
	// Making the file larger
	for i := 4; i < 1000; i++ {
		db.Add([]byte(fmt.Sprintf("hello world p%08d00000000000000000000000000000000000000000000000000000000000000000000000000000000", i)))
	}

	// Flush the database to disk and switch to read mode
	err = db.Finalize()
	if err != nil {
		log.Fatal(err)
	}

	// Load indexes into memory from disk
	err = bs.LoadIndexToMemory()
	if err != nil {
		log.Fatal(err)
	}

	err = db.Get([]byte("hello world de"), func(rec []byte) error {
		fmt.Println("found:", string(rec))
		return nil
	})

}
Output:

found: hello world def

func NewDiskMemBinarySearch

func NewDiskMemBinarySearch(file *os.File) *BinarySearch

Build a search index on disk and in memory for the constructed wormdb. This is in opposed to the [NewMemoryBinarySearch], which uses memory instead of disk and is ready for use when Finalize() is called. One must then load the index from disk into memory with LoadIndexToMemory().

func (*BinarySearch) Add

func (s *BinarySearch) Add(needle []byte) error

Add a record into the searchable list. This involves an in-memory cache of the first record in each block and built using a link list so as to avoid growing memory and doing a slice copy.

When Finalize() is called the linked list is flattened into a 2-D byte slice in memory and the list is disposed of.

func (*BinarySearch) Finalize

func (s *BinarySearch) Finalize() error

Do not call this directly, but instead wormdb calls this once the database has been finalized.

func (*BinarySearch) Find

func (s *BinarySearch) Find(needle []byte) (pos int, lower []byte, exactMatch bool)

Find will search for a needle in the Index and return either the match or the lower bound where the match would be located between two entries. The purpose of the lower bound is to ensure that the match will be contained in the block retrieved from slow storage, such as a disk.

func (*BinarySearch) FindBounds

func (s *BinarySearch) FindBounds(needle []byte) (pos int, lower, upper []byte, exactMatch bool)

FindBounds will search for a needle in the Index and return either the match or the lower and upper bound matches where the match would be located between two entries. The purpose of the lower bound is to ensure that the match will be contained in the block retrieved from slow storage (such as a disk) and the upper bound is useful for segmenting data to make sure the result lies within the block.

func (*BinarySearch) LoadIndexToMemory

func (s *BinarySearch) LoadIndexToMemory() error

When the index is built on disk, one can load the index into memory for use.

type Cache

type Cache interface {
	// Either get the value or begin to store the element into a cache
	GetOrCompute(key string, value func() *Result) (actual *Result, loaded bool)

	// Element being set is now fully populated, one can use this to delete older
	// elements as it is called once a Set has taken place.
	Stored(key string)
}

type CacheMap

type CacheMap struct {

	// Set this function to handle when a cached value is hit
	CountHit func(key string)
	// contains filtered or unexported fields
}

func NewCacheMap

func NewCacheMap(size int) *CacheMap

func (*CacheMap) GetOrCompute

func (c *CacheMap) GetOrCompute(K string, V func() *Result) (*Result, bool)

func (*CacheMap) Stored

func (c *CacheMap) Stored(K string)

type CompareFunc

type CompareFunc func(a, b []byte) int

Compare returns an integer comparing two byte slices lexicographically. The result will be 0 if a == b, -1 if a < b, and +1 if a > b. A nil argument is equivalent to an empty slice.

Record removal is also possible, if -2 is provided then only `a` will be used and if +2 is provided then only `b` will be used.

This is mainly used with the WithMerge function. Note that the previous merged-to database will always be `a` and the incoming data will be `b`.

type DB

type DB struct {
	// contains filtered or unexported fields
}

func New

func New(file *os.File, options ...Option) (*DB, error)

Create a WORM db using the os.File handle to write a Write-Once-Read-Many ordered database optimized for reading based on sectors.

Example
package main

import (
	"fmt"
	"log"
	"os"

	bwdb "github.com/pschou/go-wormdb"
)

func main() {
	f, err := os.Create("new.db")
	if err != nil {
		log.Fatal(err)
	}
	bs := bwdb.NewBinarySearch()
	db, err := bwdb.New(f,
		bwdb.WithSearch(bs))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	db.Add([]byte("hello world"))
	db.Add([]byte("hello world abc"))
	db.Add([]byte("hello world def"))
	db.Add([]byte("hello world ghi"))

	db.Finalize()
	err = db.Get([]byte("hello world ab"), func(rec []byte) error {
		fmt.Println("found:", string(rec))
		return nil
	})
}
Output:

found: hello world abc

func Open

func Open(file *os.File, options ...Option) (*DB, error)

Open a wormdb for use, note that the index must be provided out of band.

Example
package main

import (
	"fmt"
	"log"
	"os"

	bwdb "github.com/pschou/go-wormdb"
)

var index [][]byte

func main() {
	f, err := os.Open("test.db")
	if err != nil {
		log.Fatal(err)
	}
	bs := bwdb.LoadBinarySearch(index)
	db, err := bwdb.Open(f,
		bwdb.WithSearch(bs))
	if err != nil {
		log.Fatal(err)
	}

	// Note that the index must be stored out of band
	db.Get([]byte("hello world qrs"), func(rec []byte) error {
		fmt.Printf("rec: %q err: %v\n", rec, err)
		return nil
	})
	db.Close()
}
Output:

rec: "hello world qrs00000000000000000000000000000000000000000000000000000000000000000000000000000000" err: <nil>

func (*DB) Add

func (d *DB) Add(rec []byte) (err error)

Add a record to a wormdb when it is in write mode.

func (*DB) Close

func (d *DB) Close() error

Close the database and the file handle at the same time.

func (*DB) Finalize

func (d *DB) Finalize() (err error)

Finalize the database, write any buffers to disk, and build search index.

func (DB) Get

func (d DB) Get(needle []byte, handler func([]byte) error) error

Search for a record in a wormdb and call func if a match is found. Only the first matching prefix will be returned, so larger matches will be ignored.

The slice MUST be copied to a local variable as the underlying byte slice will be reused in future function calls.

func (*DB) NewWalker

func (d *DB) NewWalker() *Walker

NewWalker will return all the records in a wormdb with a scanner like interface.

The slice MUST be copied to a local variable as the underlying byte slice will be reused in future function calls.

type Option

type Option func(*DB)

func WithBlockSize

func WithBlockSize(v int) Option

Define a custom block size, if left unset the value of 4096 is used.

func WithCache

func WithCache(c Cache) Option

Include an optional cache to help with speeding up repeat calls.

func WithMerge

func WithMerge(old *DB, comp CompareFunc) Option

Build from a previous wormDB and merge the records.

func WithOffset

func WithOffset(v int64) Option

Offset at the beginning of the file to ignore. This must be a step size of the blocksize.

func WithSearch

func WithSearch(s Search) Option

Include a call back for the search function to use. The built option uses the binary search to find records within the index.

type Result

type Result struct {
	// contains filtered or unexported fields
}
type Search interface {
	// When building an index, this is called upon each insertion and can be used
	// to build a more complex searching method.
	//
	// If one plans on doing complex sorting logic, it is recommended to use a
	// pre-forked-goroutine and a sized input channel to absorb the needles as
	// they are loaded so as to free up the reading method (and maintain
	// insertion order) so as to continue to write to disk additional records
	// until the next block is surpassed.  Also-- after the wormdb has been fully
	// loaded, one must make sure that this input channel has been flushed out
	// before querying the wormdb.
	Add(needle []byte) error

	// Look up a record and determine the sector on disk to read from.
	Find(needle []byte) (sectorId int, lower []byte, wasExactMatch bool)

	// Return the lower and upper bounds of a block for the given needle.
	FindBounds(needle []byte) (sectorId int, lower, upper []byte, wasExactMatch bool)

	// Called after last record has been added.
	Finalize() error
}

type Walker

type Walker struct {
	// contains filtered or unexported fields
}

func (*Walker) Bytes

func (w *Walker) Bytes() []byte

Bytes returns the most recent token generated by a call to Walker.Scan. The underlying array may point to data that will be overwritten by a subsequent call to Scan. It does no allocation.

func (*Walker) Err

func (w *Walker) Err() error

Err returns the first non-EOF error that was encountered by the Walker.

func (*Walker) Scan

func (w *Walker) Scan() bool

Scan advances the Walker to the next token, which will then be available through the Walker.Bytes or Walker.Text method. It returns false when there are no more tokens, either by reaching the end of the input or an error. After Scan returns false, the Walker.Err method will return any error that occurred during scanning, except that if it was io.EOF, Walker.Err will return nil.

func (*Walker) Text

func (s *Walker) Text() string

Text returns the most recent token generated by a call to Walker.Scan as a newly allocated string holding its bytes.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL