data

package
v0.0.0-...-10c6735
Published: Aug 9, 2014 | License: BSD-2-Clause | Imports: 6 | Imported by: 0

Documentation

Overview

A collection data file contains document data. Every document has a binary header and UTF-8 text content. Documents are inserted one after another, and each occupies twice its original size to leave room for future updates. Deleted documents are only marked as deleted; their space cannot be reclaimed until a "scrub" action (in DB logic) is carried out. When an update takes place, the new document overwrites the original if there is enough room; otherwise the original is marked as deleted and the updated document is inserted as a new document.
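The record layout described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the header size mirrors DOC_HEADER (one validity byte plus a 10-byte room field), while the varint encoding of the room, the flag value 1, the space padding, and the names encodeDoc and decodeDoc are assumptions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// docHeader mirrors DOC_HEADER: validity (1 byte) + room (10 bytes).
const docHeader = 1 + 10

// encodeDoc lays out one record: header, text, then space padding so that
// the record's room is twice the original document size.
func encodeDoc(doc []byte) []byte {
	room := 2 * len(doc)
	rec := make([]byte, docHeader+room)
	rec[0] = 1                                      // assumed flag: 1 = valid
	binary.PutVarint(rec[1:docHeader], int64(room)) // assumed varint encoding
	copy(rec[docHeader:], doc)
	for i := docHeader + len(doc); i < len(rec); i++ {
		rec[i] = ' ' // unused room is filled with spaces
	}
	return rec
}

// decodeDoc returns the room and the text (trailing padding trimmed) of a
// valid record. Trailing spaces that belong to the text are trimmed as well.
func decodeDoc(rec []byte) (room int, doc []byte, valid bool) {
	if len(rec) < docHeader || rec[0] != 1 {
		return 0, nil, false
	}
	r, _ := binary.Varint(rec[1:docHeader])
	room = int(r)
	if room < 0 || docHeader+room > len(rec) {
		return 0, nil, false
	}
	return room, bytes.TrimRight(rec[docHeader:docHeader+room], " "), true
}

func main() {
	rec := encodeDoc([]byte(`{"a":1}`))
	room, doc, ok := decodeDoc(rec)
	fmt.Println(len(rec), room, string(doc), ok)
}
```

Because the room is stored in the header, a reader can skip from one record to the next without parsing the text.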

Common data file features: enlarge, sync, close, etc.

A hash table file contains binary content; it implements a static hash table made of hash buckets and integer entries. Every bucket has a fixed number of entries. When a bucket becomes full, a new bucket is chained to it in order to store more entries. Every entry has an integer key and value. An entry key may have multiple values assigned to it; however, the combination of entry key and value must be unique across the entire hash table.

(Collection) Partition is a collection data file accompanied by a hash table in order to allow addressing of a document using an unchanging ID: The hash table stores the unchanging ID as entry key and the physical document location as entry value.
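The indirection can be pictured with a small in-memory sketch. A slice stands in for the collection data file and a map stands in for the hash table; the partition type and its methods here are hypothetical and always re-insert on update, purely to show that the physical location may change while the ID does not.

```go
package main

import "fmt"

type partition struct {
	docs   [][]byte    // stand-in for the data file; index = physical location
	lookup map[int]int // stand-in for the hash table: ID -> physical location
}

func newPartition() *partition {
	return &partition{lookup: make(map[int]int)}
}

// insert appends the document and records where it physically lives.
func (p *partition) insert(id int, doc []byte) {
	p.docs = append(p.docs, doc)
	p.lookup[id] = len(p.docs) - 1
}

// update moves the document to a new physical location and repoints the ID.
func (p *partition) update(id int, doc []byte) bool {
	old, ok := p.lookup[id]
	if !ok {
		return false
	}
	p.docs[old] = nil // the old copy is marked as deleted
	p.docs = append(p.docs, doc)
	p.lookup[id] = len(p.docs) - 1
	return true
}

// read resolves the ID to a physical location and returns the document.
func (p *partition) read(id int) ([]byte, bool) {
	loc, ok := p.lookup[id]
	if !ok {
		return nil, false
	}
	return p.docs[loc], true
}

func main() {
	p := newPartition()
	p.insert(42, []byte("v1"))
	p.update(42, []byte("v2, longer"))
	doc, _ := p.read(42)
	fmt.Println(string(doc), p.lookup[42])
}
```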

Constants

const (
	COL_FILE_GROWTH = 32 * 1048576 // Collection file initial size & size growth (32 MBytes)
	DOC_MAX_ROOM    = 2 * 1048576  // Max document size (2 MBytes)
	DOC_HEADER      = 1 + 10       // Document header size - validity (single byte), document room (int 10 bytes)
	// Pre-compiled document padding (128 spaces)
	PADDING     = "" /* 128-byte string literal not displayed */
	LEN_PADDING = len(PADDING)
)
const (
	HT_FILE_GROWTH  = 32 * 1048576                          // Hash table file initial size & file growth
	ENTRY_SIZE      = 1 + 10 + 10                           // Hash entry size: validity (single byte), key (int 10 bytes), value (int 10 bytes)
	BUCKET_HEADER   = 10                                    // Bucket header size: next chained bucket number (int 10 bytes)
	PER_BUCKET      = 16                                    // Entries per bucket
	HASH_BITS       = 16                                    // Number of hash key bits
	BUCKET_SIZE     = BUCKET_HEADER + PER_BUCKET*ENTRY_SIZE // Size of a bucket
	INITIAL_BUCKETS = 65536                                 // Initial number of buckets == 2 ^ HASH_BITS
)
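
As a worked check of the hash table constants, the sketch below recomputes the derived sizes. The lower-case constant names and the helper bucketsPerFile are local to this example; whether the real file uses the spare space for chained buckets is an assumption.

```go
package main

import "fmt"

const (
	htFileGrowth   = 32 * 1048576
	entrySize      = 1 + 10 + 10
	bucketHeader   = 10
	perBucket      = 16
	hashBits       = 16
	bucketSize     = bucketHeader + perBucket*entrySize
	initialBuckets = 1 << hashBits
)

// bucketsPerFile returns how many whole buckets fit into fileSize bytes.
func bucketsPerFile(fileSize int) int {
	return fileSize / bucketSize
}

func main() {
	fmt.Println(entrySize)                   // 21
	fmt.Println(bucketSize)                  // 346
	fmt.Println(initialBuckets)              // 65536
	fmt.Println(initialBuckets * bucketSize) // 22675456
	fmt.Println(htFileGrowth)                // 33554432
	fmt.Println(bucketsPerFile(htFileGrowth)) // 96978
}
```

The 65536 initial buckets need about 22.7 MB, so they fit into the 32 MB initial file with space left over for roughly 31,000 more buckets.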

Variables

This section is empty.

Functions

func GetPartitionRange

func GetPartitionRange(partNum, totalParts int) (start int, end int)

Divide the entire hash table into roughly equally sized partitions, and return the start/end key range of the chosen partition.
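
One way such a division could work is sketched below. It is an assumption, not the package's code: the range is taken over the 2^HASH_BITS head buckets, the returned range is half-open (start inclusive, end exclusive), and the last partition absorbs the remainder. The name partitionRange is hypothetical.

```go
package main

import "fmt"

const initialBuckets = 65536 // 2^HASH_BITS

// partitionRange splits [0, initialBuckets) into totalParts contiguous
// ranges and returns the half-open range [start, end) of partition partNum.
func partitionRange(partNum, totalParts int) (start, end int) {
	per := initialBuckets / totalParts
	start = partNum * per
	end = start + per
	if partNum == totalParts-1 {
		end = initialBuckets // the last partition takes the remainder
	}
	return start, end
}

func main() {
	for i := 0; i < 3; i++ {
		s, e := partitionRange(i, 3)
		fmt.Println(i, s, e)
	}
}
```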

func HashKey

func HashKey(key int) int

Smear the integer entry key and return the portion (first HASH_BITS bits) used for allocating the entry.
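
The general idea of smearing a key and keeping HASH_BITS bits of the result can be sketched as follows. The mixing step here is borrowed from the first round of the MurmurHash3 64-bit finalizer and is not the package's own smear; the name hashKey is local to this example.

```go
package main

import "fmt"

const hashBits = 16

// hashKey mixes the bits of key so that nearby keys spread across buckets,
// then keeps only hashBits bits as the bucket number.
func hashKey(key int) int {
	k := uint64(key)
	k ^= k >> 33
	k *= 0xff51afd7ed558ccd // multiplier from the MurmurHash3 finalizer
	k ^= k >> 33
	return int(k & (1<<hashBits - 1))
}

func main() {
	for _, key := range []int{1, 2, 3, 1 << 40} {
		fmt.Println(key, hashKey(key))
	}
}
```

Whatever the mixing function, the mask guarantees a result in the range 0 to 2^HASH_BITS - 1, which matches INITIAL_BUCKETS.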

func LooksEmpty

func LooksEmpty(buf gommap.MMap) bool

Return true if the buffer begins with 64 consecutive zero bytes.
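
A minimal sketch of such a check, using a plain []byte in place of gommap.MMap. Treating a buffer shorter than 64 bytes as not empty is an assumption of this example, and the name looksEmpty is local to it.

```go
package main

import "fmt"

// looksEmpty reports whether buf begins with 64 consecutive zero bytes.
// Buffers shorter than 64 bytes are reported as not empty (assumption).
func looksEmpty(buf []byte) bool {
	const probe = 64
	if len(buf) < probe {
		return false
	}
	for _, b := range buf[:probe] {
		if b != 0 {
			return false
		}
	}
	return true
}

func main() {
	fresh := make([]byte, 128)
	fmt.Println(looksEmpty(fresh))
	fresh[10] = 1
	fmt.Println(looksEmpty(fresh))
}
```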

Types

type Collection

type Collection struct {
	*DataFile
}

Collection file contains document headers and document text data.

func OpenCollection

func OpenCollection(path string) (col *Collection, err error)

Open a collection file.

func (*Collection) Delete

func (col *Collection) Delete(id int) (err error)

Delete a document by ID.

func (*Collection) ForEachDoc

func (col *Collection) ForEachDoc(fun func(id int, doc []byte) bool)

Run the function on every document; stop when the function returns false.
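
The callback contract (keep going while the function returns true) can be illustrated over an in-memory slice. The free function forEachDoc and the use of nil to represent a deleted document are assumptions of this sketch.

```go
package main

import "fmt"

// forEachDoc calls fun for every live document and stops as soon as fun
// returns false. A nil slot stands for a deleted document.
func forEachDoc(docs [][]byte, fun func(id int, doc []byte) bool) {
	for id, doc := range docs {
		if doc == nil {
			continue
		}
		if !fun(id, doc) {
			return
		}
	}
}

func main() {
	docs := [][]byte{[]byte("a"), nil, []byte("b"), []byte("c")}
	forEachDoc(docs, func(id int, doc []byte) bool {
		fmt.Println(id, string(doc))
		return string(doc) != "b" // stop after seeing "b"
	})
}
```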

func (*Collection) Insert

func (col *Collection) Insert(data []byte) (id int, err error)

Insert a new document, return the new document ID.

func (*Collection) Read

func (col *Collection) Read(id int) []byte

Find and retrieve a document by ID (physical document location). Return value is a copy of the document.

func (*Collection) Update

func (col *Collection) Update(id int, data []byte) (newID int, err error)

Overwrite or re-insert a document, return the new document ID if re-inserted.
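
The overwrite-or-re-insert decision can be sketched with an in-memory stand-in. The types doc and collection are hypothetical; the 2x room rule follows the package overview, and the error returned for an invalid ID is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

type doc struct {
	valid bool
	room  int // bytes reserved for this document
	data  []byte
}

type collection struct{ docs []doc }

// insert stores a copy of data with twice its size reserved as room.
func (c *collection) insert(data []byte) int {
	c.docs = append(c.docs, doc{valid: true, room: 2 * len(data), data: append([]byte(nil), data...)})
	return len(c.docs) - 1
}

// update overwrites in place when data fits into the reserved room;
// otherwise it marks the old document as deleted and re-inserts.
func (c *collection) update(id int, data []byte) (int, error) {
	if id < 0 || id >= len(c.docs) || !c.docs[id].valid {
		return id, errors.New("document does not exist")
	}
	if len(data) <= c.docs[id].room {
		c.docs[id].data = append([]byte(nil), data...)
		return id, nil
	}
	c.docs[id].valid = false
	return c.insert(data), nil
}

func main() {
	c := &collection{}
	id := c.insert([]byte("abc"))        // room is 6 bytes
	same, _ := c.update(id, []byte("abcdef"))  // fits: same ID
	moved, _ := c.update(id, []byte("abcdefg")) // too large: new ID
	fmt.Println(id, same, moved)
}
```

Callers that keep the returned ID must switch to the new one after a re-insert, which is the problem the partition's unchanging ID is meant to hide.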

type DataFile

type DataFile struct {
	Path               string
	Size, Used, Growth int
	Fh                 *os.File
	Buf                gommap.MMap
}

Data file keeps track of the amount of total and used space.

func OpenDataFile

func OpenDataFile(path string, growth int) (file *DataFile, err error)

Open a data file that grows by the specified size.

func (*DataFile) Clear

func (file *DataFile) Clear() (err error)

Clear the entire file and resize it to initial size.

func (*DataFile) Close

func (file *DataFile) Close() (err error)

Un-map the file buffer and close the file handle.

func (*DataFile) EnsureSize

func (file *DataFile) EnsureSize(more int) (err error)

Ensure there is enough room for that many bytes of data.
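
The bookkeeping behind such a check might look like the sketch below, which only tracks the Size, Used, and Growth numbers. Resizing the file on disk and re-mapping the buffer are omitted, and the dataFile type and the returned step count are assumptions of this example.

```go
package main

import "fmt"

type dataFile struct {
	size, used, growth int
}

// ensureSize grows size in steps of growth until used+more fits, and
// returns how many growth steps were needed.
func (f *dataFile) ensureSize(more int) (steps int) {
	if f.growth <= 0 {
		return 0 // cannot grow; avoid looping forever
	}
	for f.used+more > f.size {
		f.size += f.growth
		steps++
	}
	return steps
}

func main() {
	f := &dataFile{size: 32, used: 30, growth: 32}
	fmt.Println(f.ensureSize(2), f.size)   // fits already
	fmt.Println(f.ensureSize(3), f.size)   // one growth step
	fmt.Println(f.ensureSize(100), f.size) // several growth steps
}
```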

func (*DataFile) Reopen

func (file *DataFile) Reopen() (err error)

Open file handle and map the file buffer. Calculate size and used space.

func (*DataFile) Sync

func (file *DataFile) Sync() (err error)

Synchronize file buffer onto underlying storage device.

type HashTable

type HashTable struct {
	*DataFile

	Lock *sync.RWMutex
	// contains filtered or unexported fields
}

Hash table file is a binary file containing buckets of hash entries.

func OpenHashTable

func OpenHashTable(path string) (ht *HashTable, err error)

Open a hash table file.

func (*HashTable) Clear

func (ht *HashTable) Clear() (err error)

Clear the entire hash table.

func (*HashTable) Get

func (ht *HashTable) Get(key, limit int) (vals []int)

Look up values by key.

func (*HashTable) GetPartition

func (ht *HashTable) GetPartition(partNum, partSize int) (keys, vals []int)

Return all entries in the chosen partition.

func (*HashTable) Put

func (ht *HashTable) Put(key, val int)

Store the entry into a vacant (invalidated or empty) place in the appropriate bucket.

func (*HashTable) Remove

func (ht *HashTable) Remove(key, val int)

Flag an entry as invalid, so that Get will not return it later on.

type Partition

type Partition struct {
	Lock *sync.RWMutex
	// contains filtered or unexported fields
}

Partition associates a hash table with collection documents, allowing addressing of a document using an unchanging ID.

func OpenPartition

func OpenPartition(colPath, lookupPath string) (part *Partition, err error)

Open a collection partition.

func (*Partition) ApproxDocCount

func (part *Partition) ApproxDocCount() int

Return approximate number of documents in the partition.

func (*Partition) Clear

func (part *Partition) Clear() (err error)

Clear data file and lookup hash table.

func (*Partition) Close

func (part *Partition) Close() (err error)

Close all file handles.

func (*Partition) Delete

func (part *Partition) Delete(id int) (err error)

Delete a document.

func (*Partition) ForEachDoc

func (part *Partition) ForEachDoc(partNum, totalPart int, fun func(id int, doc []byte) bool) (moveOn bool)

Partition documents into roughly equally sized portions, and run the function on every document in the portion.

func (*Partition) Insert

func (part *Partition) Insert(id int, data []byte) (physID int, err error)

Insert a document. The ID may be used to retrieve/update/delete the document later on.

func (*Partition) LockUpdate

func (part *Partition) LockUpdate(id int) (err error)

Lock a document for exclusive update.

func (*Partition) Read

func (part *Partition) Read(id int) ([]byte, error)

Find and retrieve a document by ID.

func (*Partition) Sync

func (part *Partition) Sync() (err error)

Synchronize all file buffers.

func (*Partition) UnlockUpdate

func (part *Partition) UnlockUpdate(id int)

Unlock a document to make it ready for the next update.
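
A per-document update lock can be sketched with a mutex-protected set of IDs. Whether the real LockUpdate fails immediately or waits when the document is already locked is not stated; this example assumes it fails immediately, and the updateLocks type is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type updateLocks struct {
	mu     sync.Mutex
	locked map[int]struct{}
}

func newUpdateLocks() *updateLocks {
	return &updateLocks{locked: make(map[int]struct{})}
}

// lockUpdate claims the document for exclusive update, or returns an error
// if another update already holds it.
func (u *updateLocks) lockUpdate(id int) error {
	u.mu.Lock()
	defer u.mu.Unlock()
	if _, busy := u.locked[id]; busy {
		return errors.New("document is locked for update")
	}
	u.locked[id] = struct{}{}
	return nil
}

// unlockUpdate releases the document so the next update can proceed.
func (u *updateLocks) unlockUpdate(id int) {
	u.mu.Lock()
	defer u.mu.Unlock()
	delete(u.locked, id)
}

func main() {
	u := newUpdateLocks()
	fmt.Println(u.lockUpdate(1)) // first claim succeeds
	fmt.Println(u.lockUpdate(1)) // second claim is refused
	u.unlockUpdate(1)
	fmt.Println(u.lockUpdate(1)) // succeeds again after unlock
}
```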

func (*Partition) Update

func (part *Partition) Update(id int, data []byte) (err error)

Update a document.
