hash

package
v0.0.0-...-a6b59fd Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 5, 2016 License: MIT Imports: 0 Imported by: 0

README

Hash Table

A hash table is a data structure which maps keys to values. It uses a hash function which finds an index based on the key and stores the value in a "bucket" or "slot". In an ideal situation, each bucket will only contain one value but this is not always the case. In the instance of a collision two keys can reside in the same bucket which can ultimately degrade the hash table performance to O(n) at worst case.

When building a hash table it's important to maintain a good balance between the number of entries and number of buckets used. We can determine this "load factor" by dividing the number of entries by the number of buckets. A larger load factor points to a slower hash table.

Hash Functions

Having a good hash function is incredibly important to a hash table. Hash functions determine how to distribute values into the hash table and a poor distribution will lead to fast degradation.

A common pattern in hashing will follow the formula index = f(key, array_size).

Our function f() then could be:

hash = hashfunc(key)
index = hash % array_size

This concept allows the hash to be independent of the size of the underlying array.

It's incredibly hard to have a perfect hash function without knowledge of the input but in the case that all keys are known ahead of time it is possible to build a "perfect hash function" which has no collisions.

Collision Resolution

Collisions are almost guaranteed to happen in a hash table with an unknown set of keys. Luckily there are a number of strategies that can be implemented in order to mitigate collisions.

Separate Chaining

Separate chaining is the process of storing matching keys in a list-type data structure. In this case finding the bucket can be done in constant time and finding the correct key is dependent on the structure used within. It should be said that the structure you use shouldn't be one that expects to store a large number of items. If your bucket sizes are that large then there is something wrong with the hash function you're using and should evaluate there first.

A linked list is a common structure for storing these values. They can be used to traverse the bucket based on linked-list complexity times. While there is some overhead storing a pointer to the next node they still are a valid option for handling this chaining.

Balanced binary search trees can be another good option but the needs of the system should be thought about first. While this will give the bucket an O(lg n) lookup time it may take more memory for that guarantee.

Open Addressing

Open addressing stores all entries in the bucket array itself and inserts new entries based on a probing sequence. It then uses this same probing sequence when searching for entries later. Some of the well known probing sequences include:

  • Linear Probing: Interval between probes is fixes (usually 1)
  • Quadratic probing: Interval between probes is increased by adding the successive outputs of a quadratic polynomial to the starting value given by the original hash computation
  • Double hashing: Interval between probes is computed by a second hash function

Unfortunately this approach may require dynamic resizing since a growing load factor makes this almost unusable. On the other hand it decreases the time needed to allocate each new entry.

Dyanamic Resizing

Dynamic Resizing is the act of resizing the hash table itself as the load factor grows. In a lot of applications, a loadfactor of 2/3 is a good point to consider a resize so the hash table retains it's optimized state. Since inserts are the only time a table becomes larger the resizing would be run at this stage. Since changing the size of the underlying array will result in hashes changing, resizing also includes rehashing or rebuilding the existing hashes.

In many cases the entire table can be copied to a newly allocated array and still perform well enough. In other cases such as real-time systems though this penalty can be costly and an incremental approach needs to be taken. This approach may follow a pattern:

  • Allocate a new hash table and keep the old table
  • Inserts will be placed in the new hash table
  • Lookups will check both tables as they exist
  • After an insert, a set number of entries will be copied to the new hash table
  • Deallocate old hash table once all entries have been copied over

Use Cases

Because most cases of the hash table have such fast lookups, hash tables are used incredibly often. Some of the common use cases for a hash table may be an associative array, database indexing, caches, and sets.

Time Complexity

Case Average Worst Case
Space O(n) O(n)
Search O(1) O(n)
Insert O(1) O(n)
Delete O(1) O(n)

Documentation

Index

Constants

View Source
const (
	// MaxLoadFactor is the ratio of entries to buckets which will force a resize
	MaxLoadFactor float64 = .75
	// DefaultTableSize is set higher to avoid many resizes
	DefaultTableSize uint64 = 128
	// SizeIncrease is the resize multiple when determining the new size of an
	// updated hash table
	SizeIncrease uint64 = 2
	// FNVOffset is the constant used in the FNV hash function as seen on
	// https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
	FNVOffset uint64 = 14695981039346656037
	// FNVPrime is the constant prime factor used in the FNV hash function
	FNVPrime uint64 = 1099511628211
)

Variables

This section is empty.

Functions

This section is empty.

Types

type HashTable

type HashTable struct {
	Buckets                []*List
	NumEntries, NumBuckets uint64
}

HashTable is the primary data structure for storing our hash table. We keep implicit counts of buckets and entries with buckets being singly linked lists for separate chaining.

func NewHashTable

func NewHashTable() *HashTable

NewHashTable creates a hash table based on the default sizing constraints.

func (*HashTable) Del

func (t *HashTable) Del(key string)

Del attempts to delete a key/value pair from the hash table.

func (*HashTable) Get

func (t *HashTable) Get(key string) interface{}

Get attempts to find the value based on a key.

func (*HashTable) Hash

func (t *HashTable) Hash(s string) uint64

Hash handles hashing the key based on the FNV hash function. More can be read about it here: https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function

func (*HashTable) LoadFactor

func (t *HashTable) LoadFactor() float64

LoadFactor returns the current load of the table.

func (*HashTable) Put

func (t *HashTable) Put(key string, val interface{})

Put attempts to insert a new key/value pair or update an existing key with the new value.

type List

type List struct {
	// contains filtered or unexported fields
}

List is a basic singly linked-list

func NewList

func NewList() *List

NewList returns an empty List

func (*List) Append

func (l *List) Append(key string, val interface{})

Append adds a new value to the end of the list

func (*List) GetHead

func (l *List) GetHead() *Node

GetHead returns the head of the list

func (*List) GetTail

func (l *List) GetTail() *Node

GetTail returns the tail of the list

func (*List) Iter

func (l *List) Iter() chan *Node

Iter returns a range-able iterator for easy traversal

func (*List) Remove

func (l *List) Remove(key string)

Remove removes a value based on its key from the list

type Node

type Node struct {
	// contains filtered or unexported fields
}

Node is a linked-list node

func (*Node) GetKey

func (n *Node) GetKey() string

GetKey returns the node's key

func (*Node) GetVal

func (n *Node) GetVal() interface{}

GetVal returns the node's value

func (*Node) Next

func (n *Node) Next() *Node

GetHead returns the next value from the node

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL