wormdb

package module

v0.0.0-...-3f005bc Latest Latest Go to latest Published: Feb 27, 2025 License: Apache-2.0 Imports: 11 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/pschou/go-wormdb

Links

Open Source Insights

README ¶

WORM-DB - Write Once Read Many Database

The WORMDB format is a binary database format designed for both efficient storage and efficient lookups of sorted data. The database works off a disk binary file and an index file. The index file provides the ability to lookup which block a record should be located. The record can then be retrieved and returned. A record in the WORMDB is always a binary slice of arbitrary length. To search for an individual record, provide a unique prefix.

Note: If the query prefix match matches multiple record indexs, either the first record will be returned or an error. Make sure your index is unique.

An example of how the database works:

If one is looking for the record starting with abc123 they can use the Get() function to find this record:

An example database (one entry per line)

abc122cat
abc123bat
abc124dob

	f, err := os.Create("new.db")
	if err != nil {
		log.Fatal(err)
	}
	bs := worm.NewBinarySearch()
	db, err := worm.New(f,
		worm.WithSearch(bs))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	db.Add([]byte("hello world"))
	db.Add([]byte("hello world abc"))
	db.Add([]byte("hello world def"))
	db.Add([]byte("hello world ghi"))

	db.Finalize()
	err = db.Get([]byte("hello world ab"), func(rec []byte) error {
		fmt.Println("found:", string(rec))
		return nil
	})
	// Output:
	// found: hello world abc

Documentation ¶

Index ¶

Variables
type BinarySearch
type Cache
type CacheMap
- func NewCacheMap(size int) *CacheMap
- func (c *CacheMap) GetOrCompute(K string, V func() *Result) (*Result, bool)
- func (c *CacheMap) Stored(K string)
type CompareFunc
type DB
- func New(file *os.File, options ...Option) (*DB, error)
- func Open(file *os.File, options ...Option) (*DB, error)
type Option
type Result
type Search
type Walker

Constants ¶

This section is empty.

Variables ¶

View Source

var Debug bool

Turn on debug logging

Functions ¶

This section is empty.

Types ¶

type BinarySearch ¶

type BinarySearch struct {
	Index [][]byte
	// contains filtered or unexported fields
}

func LoadBinarySearch ¶

func LoadBinarySearch(index [][]byte) *BinarySearch

Load a binary search from a memory 2-D byte slice.

func LoadDiskBinarySearch ¶

func LoadDiskBinarySearch(file *os.File) (*BinarySearch, error)

Load a binary search from disk.

Example ¶

package main

import (
	"fmt"
	"log"
	"os"

	bwdb "github.com/pschou/go-wormdb"
)

func main() {
	f, err := os.Open("disk_data.db")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	ind, err := os.Open("disk_index.db")
	if err != nil {
		log.Fatal(err)
	}
	defer ind.Close()

	bs, err := bwdb.LoadDiskBinarySearch(ind)
	if err != nil {
		log.Fatal(err)
	}

	db, err := bwdb.Open(f,
		bwdb.WithSearch(bs))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.Get([]byte("hello world de"), func(rec []byte) error {
		fmt.Println("found:", string(rec))
		return nil
	})
}

Output:

found: hello world def

func NewBinarySearch ¶

func NewBinarySearch() *BinarySearch

Build a search index in memory for the constructed wormdb. Please note that there must be enough memory on the system for the Index when the database is being built. This is in opposed to the [NewFileBinarySearch], which uses disk instead of memory.

func NewDiskBinarySearch ¶

func NewDiskBinarySearch(file *os.File) *BinarySearch

Build a search index on disk for the constructed wormdb. This is in opposed to the [NewMemoryBinarySearch], which uses memory instead of disk and is ready for use when Finalize() is called. One must then load the index from disk into memory with LoadIndexToMemory().

Example ¶

package main

import (
	"fmt"
	"log"
	"os"

	bwdb "github.com/pschou/go-wormdb"
)

func main() {
	f, err := os.Create("disk_data.db")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	ind, err := os.Create("disk_index.db")
	if err != nil {
		log.Fatal(err)
	}
	defer ind.Close()

	bs := bwdb.NewDiskBinarySearch(ind)
	db, err := bwdb.New(f,
		bwdb.WithSearch(bs))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	db.Add([]byte("hello world"))
	// Making the file larger
	for i := 4; i < 1000; i++ {
		db.Add([]byte(fmt.Sprintf("hello world a%08d00000000000000000000000000000000000000000000000000000000000000000000000000000000", i)))
	}
	db.Add([]byte("hello world abc"))
	db.Add([]byte("hello world def"))
	db.Add([]byte("hello world ghi"))
	// Making the file larger
	for i := 4; i < 1000; i++ {
		db.Add([]byte(fmt.Sprintf("hello world p%08d00000000000000000000000000000000000000000000000000000000000000000000000000000000", i)))
	}

	// Flush the database to disk and switch to read mode
	err = db.Finalize()
	if err != nil {
		log.Fatal(err)
	}

	// Load indexes into memory from disk
	err = bs.LoadIndexToMemory()
	if err != nil {
		log.Fatal(err)
	}

	err = db.Get([]byte("hello world de"), func(rec []byte) error {
		fmt.Println("found:", string(rec))
		return nil
	})

}

Output:

found: hello world def

func NewDiskMemBinarySearch ¶

func NewDiskMemBinarySearch(file *os.File) *BinarySearch

Build a search index on disk and in memory for the constructed wormdb. This is in opposed to the [NewMemoryBinarySearch], which uses memory instead of disk and is ready for use when Finalize() is called. One must then load the index from disk into memory with LoadIndexToMemory().

func (*BinarySearch) Add ¶

func (s *BinarySearch) Add(needle []byte) error

Add a record into the searchable list. This involves an in-memory cache of the first record in each block and built using a link list so as to avoid growing memory and doing a slice copy.

When Finalize() is called the linked list is flattened into a 2-D byte slice in memory and the list is disposed of.

func (*BinarySearch) Finalize ¶

func (s *BinarySearch) Finalize() error

Do not call this directly, but instead wormdb calls this once the database has been finalized.

func (*BinarySearch) Find ¶

func (s *BinarySearch) Find(needle []byte) (pos int, lower []byte, exactMatch bool)

Find will search for a needle in the Index and return either the match or the lower bound where the match would be located between two entries. The purpose of the lower bound is to ensure that the match will be contained in the block retrieved from slow storage, such as a disk.

func (*BinarySearch) FindBounds ¶

func (s *BinarySearch) FindBounds(needle []byte) (pos int, lower, upper []byte, exactMatch bool)

FindBounds will search for a needle in the Index and return either the match or the lower and upper bound matches where the match would be located between two entries. The purpose of the lower bound is to ensure that the match will be contained in the block retrieved from slow storage (such as a disk) and the upper bound is useful for segmenting data to make sure the result lies within the block.

func (*BinarySearch) LoadIndexToMemory ¶

func (s *BinarySearch) LoadIndexToMemory() error

When the index is built on disk, one can load the index into memory for use.

type Cache ¶

type Cache interface {
	// Either get the value or begin to store the element into a cache
	GetOrCompute(key string, value func() *Result) (actual *Result, loaded bool)

	// Element being set is now fully populated, one can use this to delete older
	// elements as it is called once a Set has taken place.
	Stored(key string)
}

type CacheMap ¶

type CacheMap struct {

	// Set this function to handle when a cached value is hit
	CountHit func(key string)
	// contains filtered or unexported fields
}

func NewCacheMap ¶

func NewCacheMap(size int) *CacheMap

func (*CacheMap) GetOrCompute ¶

func (c *CacheMap) GetOrCompute(K string, V func() *Result) (*Result, bool)

func (*CacheMap) Stored ¶

func (c *CacheMap) Stored(K string)

type CompareFunc ¶

type CompareFunc func(a, b []byte) int

Compare returns an integer comparing two byte slices lexicographically. The result will be 0 if a == b, -1 if a < b, and +1 if a > b. A nil argument is equivalent to an empty slice.

Record removal is also possible, if -2 is provided then only `a` will be used and if +2 is provided then only `b` will be used.

This is mainly used with the WithMerge function. Note that the previous merged-to database will always be `a` and the incoming data will be `b`.

type DB ¶

type DB struct {
	// contains filtered or unexported fields
}

func New ¶

func New(file *os.File, options ...Option) (*DB, error)

Create a WORM db using the os.File handle to write a Write-Once-Read-Many ordered database optimized for reading based on sectors.

Example ¶

package main

import (
	"fmt"
	"log"
	"os"

	bwdb "github.com/pschou/go-wormdb"
)

func main() {
	f, err := os.Create("new.db")
	if err != nil {
		log.Fatal(err)
	}
	bs := bwdb.NewBinarySearch()
	db, err := bwdb.New(f,
		bwdb.WithSearch(bs))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	db.Add([]byte("hello world"))
	db.Add([]byte("hello world abc"))
	db.Add([]byte("hello world def"))
	db.Add([]byte("hello world ghi"))

	db.Finalize()
	err = db.Get([]byte("hello world ab"), func(rec []byte) error {
		fmt.Println("found:", string(rec))
		return nil
	})
}

Output:

found: hello world abc

func Open ¶

func Open(file *os.File, options ...Option) (*DB, error)

Open a wormdb for use, note that the index must be provided out of band.

Example ¶

package main

import (
	"fmt"
	"log"
	"os"

	bwdb "github.com/pschou/go-wormdb"
)

var index [][]byte

func main() {
	f, err := os.Open("test.db")
	if err != nil {
		log.Fatal(err)
	}
	bs := bwdb.LoadBinarySearch(index)
	db, err := bwdb.Open(f,
		bwdb.WithSearch(bs))
	if err != nil {
		log.Fatal(err)
	}

	// Note that the index must be stored out of band
	db.Get([]byte("hello world qrs"), func(rec []byte) error {
		fmt.Printf("rec: %q err: %v\n", rec, err)
		return nil
	})
	db.Close()
}

Output:

rec: "hello world qrs00000000000000000000000000000000000000000000000000000000000000000000000000000000" err: <nil>

func (*DB) Add ¶

func (d *DB) Add(rec []byte) (err error)

Add a record to a wormdb when it is in write mode.

func (*DB) Close ¶

func (d *DB) Close() error

Close the database and the file handle at the same time.

func (*DB) Finalize ¶

func (d *DB) Finalize() (err error)

Finalize the database, write any buffers to disk, and build search index.

func (DB) Get ¶

func (d DB) Get(needle []byte, handler func([]byte) error) error

Search for a record in a wormdb and call func if a match is found. Only the first matching prefix will be returned, so larger matches will be ignored.

The slice MUST be copied to a local variable as the underlying byte slice will be reused in future function calls.

func (*DB) NewWalker ¶

func (d *DB) NewWalker() *Walker

NewWalker will return all the records in a wormdb with a scanner like interface.

The slice MUST be copied to a local variable as the underlying byte slice will be reused in future function calls.

type Option ¶

type Option func(*DB)

func WithBlockSize ¶

func WithBlockSize(v int) Option

Define a custom block size, if left unset the value of 4096 is used.

func WithCache ¶

func WithCache(c Cache) Option

Include an optional cache to help with speeding up repeat calls.

func WithMerge ¶

func WithMerge(old *DB, comp CompareFunc) Option

Build from a previous wormDB and merge the records.

func WithOffset ¶

func WithOffset(v int64) Option

Offset at the beginning of the file to ignore. This must be a step size of the blocksize.

func WithSearch ¶

func WithSearch(s Search) Option

Include a call back for the search function to use. The built option uses the binary search to find records within the index.

type Result ¶

type Result struct {
	// contains filtered or unexported fields
}

type Search ¶

type Search interface {
	// When building an index, this is called upon each insertion and can be used
	// to build a more complex searching method.
	//
	// If one plans on doing complex sorting logic, it is recommended to use a
	// pre-forked-goroutine and a sized input channel to absorb the needles as
	// they are loaded so as to free up the reading method (and maintain
	// insertion order) so as to continue to write to disk additional records
	// until the next block is surpassed.  Also-- after the wormdb has been fully
	// loaded, one must make sure that this input channel has been flushed out
	// before querying the wormdb.
	Add(needle []byte) error

	// Look up a record and determine the sector on disk to read from.
	Find(needle []byte) (sectorId int, lower []byte, wasExactMatch bool)

	// Return the lower and upper bounds of a block for the given needle.
	FindBounds(needle []byte) (sectorId int, lower, upper []byte, wasExactMatch bool)

	// Called after last record has been added.
	Finalize() error
}

type Walker ¶

type Walker struct {
	// contains filtered or unexported fields
}

func (*Walker) Bytes ¶

func (w *Walker) Bytes() []byte

Bytes returns the most recent token generated by a call to Walker.Scan. The underlying array may point to data that will be overwritten by a subsequent call to Scan. It does no allocation.

func (*Walker) Err ¶

func (w *Walker) Err() error

Err returns the first non-EOF error that was encountered by the Walker.

func (*Walker) Scan ¶

func (w *Walker) Scan() bool

Scan advances the Walker to the next token, which will then be available through the Walker.Bytes or Walker.Text method. It returns false when there are no more tokens, either by reaching the end of the input or an error. After Scan returns false, the Walker.Err method will return any error that occurred during scanning, except that if it was io.EOF, Walker.Err will return nil.

func (*Walker) Text ¶

func (s *Walker) Text() string

Text returns the most recent token generated by a call to Walker.Scan as a newly allocated string holding its bytes.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL