package data

Version: v0.0.0-...-10c6735 | Published: Aug 9, 2014 | License: BSD-2-Clause | Imports: 6 | Imported by: 0

Repository

github.com/tgregory/tiedot

Documentation ¶

Overview ¶

A collection data file contains document data. Every document has a binary header and UTF-8 text content. Documents are inserted one after another, and each occupies twice its original size to leave room for future updates. Deleted documents are only marked as deleted; their space is not reclaimed until a "scrub" action (in DB logic) is carried out. When an update takes place, the new document overwrites the original if there is enough room; otherwise the original is marked as deleted and the updated document is inserted as a new document.
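The room and update rules above can be sketched as follows. This is only an illustration of the described behaviour, not the package's code: docRoom and fitsInPlace are hypothetical helpers, and any cap the package applies (such as DOC_MAX_ROOM) is ignored here.

```go
package main

import "fmt"

// docRoom returns the room reserved for a newly inserted document:
// twice its size, as the overview describes. Hypothetical helper.
func docRoom(docLen int) int {
	return 2 * docLen
}

// fitsInPlace reports whether an updated document of newLen bytes can
// overwrite the original, given the room reserved at insert time.
// When it returns false, the overview says the original is marked as
// deleted and the update is inserted as a new document.
func fitsInPlace(room, newLen int) bool {
	return newLen <= room
}

func main() {
	room := docRoom(100) // a 100-byte document reserves 200 bytes
	fmt.Println(room, fitsInPlace(room, 150), fitsInPlace(room, 250))
	// prints: 200 true false
}
```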

Common data file features: enlarge, sync, close, etc.

A hash table file contains binary content; it implements a static hash table made of hash buckets and integer entries. Every bucket has a fixed number of entries. When a bucket becomes full, a new bucket is chained to it in order to store more entries. Every entry has an integer key and value. An entry key may have multiple values assigned to it, but the combination of entry key and value must be unique across the entire hash table.

A (collection) partition is a collection data file accompanied by a hash table, which allows a document to be addressed by an unchanging ID: the hash table stores the unchanging ID as the entry key and the physical document location as the entry value.
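To illustrate why the lookup table keeps IDs stable, here is a small in-memory sketch. The partition type and its methods are hypothetical stand-ins (two maps replace the data file and the hash table), and relocation is forced on every update purely for demonstration.

```go
package main

import "fmt"

// partition is a hypothetical in-memory stand-in for a collection partition.
type partition struct {
	docs   map[int][]byte // physical location -> document (stands in for the data file)
	lookup map[int]int    // unchanging ID -> physical location (stands in for the hash table)
	next   int            // next free physical location
}

func newPartition() *partition {
	return &partition{docs: map[int][]byte{}, lookup: map[int]int{}}
}

// store places doc at a fresh physical location and points id at it.
func (p *partition) store(id int, doc []byte) int {
	loc := p.next
	p.next += len(doc) + 1 // +1 so that empty documents still get distinct locations
	p.docs[loc] = doc
	p.lookup[id] = loc
	return loc
}

// relocate simulates an update that did not fit in place: the old copy is
// dropped, the new one is stored elsewhere, and the ID is re-pointed.
func (p *partition) relocate(id int, doc []byte) int {
	delete(p.docs, p.lookup[id])
	return p.store(id, doc)
}

// read resolves the unchanging ID to a physical location, then to the document.
func (p *partition) read(id int) ([]byte, bool) {
	loc, ok := p.lookup[id]
	if !ok {
		return nil, false
	}
	return p.docs[loc], true
}

func main() {
	p := newPartition()
	first := p.store(42, []byte(`{"a":1}`))
	second := p.relocate(42, []byte(`{"a":1,"b":2}`))
	doc, _ := p.read(42)
	fmt.Println(first != second, string(doc))
	// prints: true {"a":1,"b":2}
}
```

The caller keeps using ID 42 throughout, even though the physical location changed.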

Index ¶

Constants
func GetPartitionRange(partNum, totalParts int) (start int, end int)
func HashKey(key int) int
func LooksEmpty(buf gommap.MMap) bool
type Collection
- func OpenCollection(path string) (col *Collection, err error)
type DataFile
- func OpenDataFile(path string, growth int) (file *DataFile, err error)
type HashTable
- func OpenHashTable(path string) (ht *HashTable, err error)
type Partition
- func OpenPartition(colPath, lookupPath string) (part *Partition, err error)

Constants ¶

const (
	COL_FILE_GROWTH = 32 * 1048576 // Collection file initial size & size growth (32 MBytes)
	DOC_MAX_ROOM    = 2 * 1048576  // Max document size (2 MBytes)
	DOC_HEADER      = 1 + 10       // Document header size - validity (single byte), document room (int 10 bytes)
	// Pre-compiled document padding (128 spaces)
	PADDING     = "" /* 128-byte string literal not displayed */
	LEN_PADDING = len(PADDING)
)

const (
	HT_FILE_GROWTH  = 32 * 1048576                          // Hash table file initial size & file growth
	ENTRY_SIZE      = 1 + 10 + 10                           // Hash entry size: validity (single byte), key (int 10 bytes), value (int 10 bytes)
	BUCKET_HEADER   = 10                                    // Bucket header size: next chained bucket number (int 10 bytes)
	PER_BUCKET      = 16                                    // Entries per bucket
	HASH_BITS       = 16                                    // Number of hash key bits
	BUCKET_SIZE     = BUCKET_HEADER + PER_BUCKET*ENTRY_SIZE // Size of a bucket
	INITIAL_BUCKETS = 65536                                 // Initial number of buckets == 2 ^ HASH_BITS
)
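As a quick check of the arithmetic behind these constants, the sketch below recomputes the derived sizes. The lower-case names are local copies made for the example; only the values come from the constant block above.

```go
package main

import "fmt"

const (
	htFileGrowth   = 32 * 1048576                        // HT_FILE_GROWTH
	entrySize      = 1 + 10 + 10                         // ENTRY_SIZE
	bucketHeader   = 10                                  // BUCKET_HEADER
	perBucket      = 16                                  // PER_BUCKET
	hashBits       = 16                                  // HASH_BITS
	bucketSize     = bucketHeader + perBucket*entrySize // BUCKET_SIZE
	initialBuckets = 1 << hashBits                       // INITIAL_BUCKETS
)

func main() {
	initialBytes := initialBuckets * bucketSize
	fmt.Println(entrySize, bucketSize, initialBuckets, initialBytes)
	// prints: 21 346 65536 22675456
	fmt.Println(initialBytes <= htFileGrowth)
	// prints: true
}
```

The initial 65536 buckets take about 21.6 MiB, so they fit inside the 32 MiB initial file size with room left for chained buckets.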

Variables ¶

This section is empty.

Functions ¶

func GetPartitionRange ¶

func GetPartitionRange(partNum, totalParts int) (start int, end int)

Divide the entire hash table into roughly equally sized partitions, and return the start/end key range of the chosen partition.
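One way such a split could work is sketched below. This is an assumption, not the package's implementation: partitionRange is a hypothetical function, the key space is taken to be the 2^HASH_BITS hashed keys, the end of each range is treated as exclusive, and the last partition absorbs the remainder.

```go
package main

import "fmt"

const totalKeys = 1 << 16 // assumed key space: 2^HASH_BITS hashed keys

// partitionRange returns the half-open range [start, end) of hashed keys
// belonging to partition partNum out of totalParts. Hypothetical helper.
func partitionRange(partNum, totalParts int) (start, end int) {
	per := totalKeys / totalParts
	start = partNum * per
	end = start + per
	if partNum == totalParts-1 {
		end = totalKeys // the last partition takes whatever is left over
	}
	return start, end
}

func main() {
	for i := 0; i < 3; i++ {
		start, end := partitionRange(i, 3)
		fmt.Println(i, start, end)
	}
	// prints:
	// 0 0 21845
	// 1 21845 43690
	// 2 43690 65536
}
```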

func HashKey ¶

func HashKey(key int) int

Smear the integer entry key and return the portion (first HASH_BITS bits) used for allocating the entry.
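The general "smear, then mask" idea can be sketched as below. The mixing steps are illustrative only and are not claimed to match the package's HashKey; smearAndMask is a hypothetical name. The point is that the result always fits in HASH_BITS bits, so it can index one of the 2^HASH_BITS initial buckets.

```go
package main

import "fmt"

const hashBits = 16 // mirrors HASH_BITS

// smearAndMask mixes the bits of key with a couple of xor-shifts
// (illustrative mixing, not the package's), then keeps only the
// lowest hashBits bits.
func smearAndMask(key int) int {
	key ^= key >> 4
	key ^= key >> 11
	return key & (1<<hashBits - 1)
}

func main() {
	h := smearAndMask(123456789)
	// The masked result is always in [0, 2^hashBits).
	fmt.Println(h >= 0 && h < 1<<hashBits)
	// prints: true
}
```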

func LooksEmpty ¶

func LooksEmpty(buf gommap.MMap) bool

Return true if the buffer begins with 64 consecutive zero bytes.
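A sketch of the described check, using a plain byte slice as a stand-in for gommap.MMap. The name looksEmpty and the treatment of buffers shorter than 64 bytes (reported as not empty) are assumptions made for this example.

```go
package main

import "fmt"

// looksEmpty reports whether buf begins with 64 consecutive zero bytes.
// Assumption: a buffer shorter than 64 bytes is reported as not empty.
func looksEmpty(buf []byte) bool {
	const probe = 64
	if len(buf) < probe {
		return false
	}
	for _, b := range buf[:probe] {
		if b != 0 {
			return false
		}
	}
	return true
}

func main() {
	fresh := make([]byte, 128) // a zero-filled buffer, like a newly created file
	used := make([]byte, 128)
	used[10] = 1 // a non-zero byte within the first 64
	fmt.Println(looksEmpty(fresh), looksEmpty(used))
	// prints: true false
}
```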

Types ¶

type Collection ¶

type Collection struct {
	*DataFile
}

Collection file contains document headers and document text data.

func OpenCollection ¶

func OpenCollection(path string) (col *Collection, err error)

Open a collection file.

func (*Collection) Delete ¶

func (col *Collection) Delete(id int) (err error)

Delete a document by ID.

func (*Collection) ForEachDoc ¶

func (col *Collection) ForEachDoc(fun func(id int, doc []byte) bool)

Run the function on every document; stop when the function returns false.

func (*Collection) Insert ¶

func (col *Collection) Insert(data []byte) (id int, err error)

Insert a new document and return the new document ID.

func (*Collection) Read ¶

func (col *Collection) Read(id int) []byte

Find and retrieve a document by ID (physical document location). Return value is a copy of the document.

func (*Collection) Update ¶

func (col *Collection) Update(id int, data []byte) (newID int, err error)

Overwrite or re-insert a document; return the new document ID if it was re-inserted.

type DataFile ¶

type DataFile struct {
	Path               string
	Size, Used, Growth int
	Fh                 *os.File
	Buf                gommap.MMap
}

Data file keeps track of the amount of total and used space.

func OpenDataFile ¶

func OpenDataFile(path string, growth int) (file *DataFile, err error)

Open a data file that grows by the specified size.

func (*DataFile) Clear ¶

func (file *DataFile) Clear() (err error)

Clear the entire file and resize it to initial size.

func (*DataFile) Close ¶

func (file *DataFile) Close() (err error)

Un-map the file buffer and close the file handle.

func (*DataFile) EnsureSize ¶

func (file *DataFile) EnsureSize(more int) (err error)

Ensure there is enough room for that many bytes of data.
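The size bookkeeping behind such a check might look like the sketch below, expressed in terms of the Size, Used and Growth fields of DataFile. This is an assumption about the logic, not the package's code: grownSize is a hypothetical pure function, growth is assumed positive, and the real method would also have to resize and re-map the file.

```go
package main

import "fmt"

// grownSize returns the file size needed so that `more` additional bytes fit
// after `used`, enlarging `size` in steps of `growth`. Hypothetical helper.
func grownSize(size, used, growth, more int) int {
	for used+more > size {
		size += growth
	}
	return size
}

func main() {
	// Sizes are in arbitrary units to keep the numbers readable.
	fmt.Println(grownSize(32, 30, 32, 1))   // still fits: 32
	fmt.Println(grownSize(32, 30, 32, 5))   // one growth step: 64
	fmt.Println(grownSize(32, 30, 32, 100)) // several growth steps: 160
}
```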

func (*DataFile) Reopen ¶

func (file *DataFile) Reopen() (err error)

Open file handle and map the file buffer. Calculate size and used space.

func (*DataFile) Sync ¶

func (file *DataFile) Sync() (err error)

Synchronize file buffer onto underlying storage device.

type HashTable ¶

type HashTable struct {
	*DataFile

	Lock *sync.RWMutex
	// contains filtered or unexported fields
}

Hash table file is a binary file containing buckets of hash entries.

func OpenHashTable ¶

func OpenHashTable(path string) (ht *HashTable, err error)

Open a hash table file.

func (*HashTable) Clear ¶

func (ht *HashTable) Clear() (err error)

Clear the entire hash table.

func (*HashTable) Get ¶

func (ht *HashTable) Get(key, limit int) (vals []int)

Look up values by key.

func (*HashTable) GetPartition ¶

func (ht *HashTable) GetPartition(partNum, partSize int) (keys, vals []int)

Return all entries in the chosen partition.

func (*HashTable) Put ¶

func (ht *HashTable) Put(key, val int)

Store the entry into a vacant (invalidated or empty) place in the appropriate bucket.

func (*HashTable) Remove ¶

func (ht *HashTable) Remove(key, val int)

Flag an entry as invalid, so that Get will not return it later on.
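The interplay of Put, Remove and Get on a single bucket can be sketched as follows. The types and functions are hypothetical and in-memory only; in particular, treating a limit of 0 as "no limit" is an assumption made for this example, not documented behaviour.

```go
package main

import "fmt"

type entry struct {
	valid    bool
	key, val int
}

// put stores the entry in the first vacant (never used or invalidated) slot
// and reports whether a slot was found.
func put(bucket []entry, key, val int) bool {
	for i := range bucket {
		if !bucket[i].valid {
			bucket[i] = entry{valid: true, key: key, val: val}
			return true
		}
	}
	return false
}

// remove flags the matching entry as invalid; its slot becomes reusable.
func remove(bucket []entry, key, val int) {
	for i := range bucket {
		if bucket[i].valid && bucket[i].key == key && bucket[i].val == val {
			bucket[i].valid = false
			return
		}
	}
}

// get returns up to limit values stored under key (assumption: 0 means no limit).
func get(bucket []entry, key, limit int) []int {
	var vals []int
	for _, e := range bucket {
		if e.valid && e.key == key {
			vals = append(vals, e.val)
			if limit > 0 && len(vals) == limit {
				break
			}
		}
	}
	return vals
}

func main() {
	bucket := make([]entry, 2)
	put(bucket, 1, 10)
	put(bucket, 1, 11)
	remove(bucket, 1, 10) // slot 0 is now vacant
	put(bucket, 2, 20)    // reuses slot 0
	fmt.Println(get(bucket, 1, 0), bucket[0].key)
	// prints: [11] 2
}
```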

type Partition ¶

type Partition struct {
	Lock *sync.RWMutex
	// contains filtered or unexported fields
}

Partition associates a hash table with collection documents, allowing addressing of a document using an unchanging ID.

func OpenPartition ¶

func OpenPartition(colPath, lookupPath string) (part *Partition, err error)

Open a collection partition.

func (*Partition) ApproxDocCount ¶

func (part *Partition) ApproxDocCount() int

Return approximate number of documents in the partition.

func (*Partition) Clear ¶

func (part *Partition) Clear() (err error)

Clear data file and lookup hash table.

func (*Partition) Close ¶

func (part *Partition) Close() (err error)

Close all file handles.

func (*Partition) Delete ¶

func (part *Partition) Delete(id int) (err error)

Delete a document.

func (*Partition) ForEachDoc ¶

func (part *Partition) ForEachDoc(partNum, totalPart int, fun func(id int, doc []byte) bool) (moveOn bool)

Partition documents into roughly equally sized portions, and run the function on every document in the portion.

func (*Partition) Insert ¶

func (part *Partition) Insert(id int, data []byte) (physID int, err error)

Insert a document. The ID may be used to retrieve/update/delete the document later on.

func (*Partition) LockUpdate ¶

func (part *Partition) LockUpdate(id int) (err error)

Lock a document for exclusive update.

func (*Partition) Read ¶

func (part *Partition) Read(id int) ([]byte, error)

Find and retrieve a document by ID.

func (*Partition) Sync ¶

func (part *Partition) Sync() (err error)

Synchronize all file buffers.

func (*Partition) UnlockUpdate ¶

func (part *Partition) UnlockUpdate(id int)

Unlock a document to make it ready for the next update.
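A per-document update lock of this kind can be sketched with a mutex-guarded set of IDs. This is a guess at the shape of the mechanism, not the package's implementation: updateLocks, lock and unlock are hypothetical names, and returning an error when the document is already locked is an assumption based on the LockUpdate signature.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errLocked = errors.New("document is locked for update")

// updateLocks tracks which document IDs are currently locked for update.
type updateLocks struct {
	mu     sync.Mutex
	locked map[int]struct{}
}

func newUpdateLocks() *updateLocks {
	return &updateLocks{locked: make(map[int]struct{})}
}

// lock marks id as locked, or returns errLocked if it already is.
func (l *updateLocks) lock(id int) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, held := l.locked[id]; held {
		return errLocked
	}
	l.locked[id] = struct{}{}
	return nil
}

// unlock makes id available for the next update.
func (l *updateLocks) unlock(id int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.locked, id)
}

func main() {
	l := newUpdateLocks()
	first := l.lock(7)
	second := l.lock(7) // still held, so this attempt fails
	l.unlock(7)
	third := l.lock(7)
	fmt.Println(first == nil, second == errLocked, third == nil)
	// prints: true true true
}
```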

func (*Partition) Update ¶

func (part *Partition) Update(id int, data []byte) (err error)

Update a document.
