faststringmap

package module
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 22, 2021 License: MIT Imports: 1 Imported by: 0

README

faststringmap

faststringmap is a fast read-only string keyed map for Go (golang). For our use case it is approximately 5 times faster than using Go's built-in map type with a string key. It also has the following advantages:

  • look up strings and byte slices without use of the unsafe package
  • minimal impact on GC due to lack of pointers in the data structure
  • data structure can be trivially serialized to disk or network

The code provided implements a map from string to uint32 which fits our use case, but you can easily substitute other value types.

faststringmap is a variant of a data structure called a Trie. At each level we use a slice to hold the next possible byte values. This slice is of length one plus the difference between the lowest and highest possible next bytes of strings in the map. Not all the entries in the slice are valid next bytes. faststringmap is thus more space efficient for keys using a small set of nearby runes, for example those using a lot of digits.

Example

Example usage can be found in uint32_store_example_test.go.

Motivation

I created faststringmap in order to improve the speed of parsing CSV where the fields were category codes from survey data. The majority of these were numeric ("1", "2", "3"...) plus a distinct code for "not applicable". I was struck that in the simplest possible cases (e.g. "1" ... "5") the map should be a single slice lookup.

Our fast CSV parser provides fields as byte slices into the read buffer to avoid creating string objects. So I also wanted to facilitate key lookup from a []byte rather than a string. This is not possible using a built-in Go map without use of the unsafe package.

Benchmarks

Example benchmarks from my laptop:

cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
BenchmarkUint32Store
BenchmarkUint32Store-8        	  218463	      4959 ns/op
BenchmarkGoStringToUint32
BenchmarkGoStringToUint32-8   	   49279	     24483 ns/op

Improvements

You can improve the performance further by using a slice for the next fields. This avoids a bounds check when looking up the entry for a byte. However, it comes at the cost of easy serialization and introduces a lot of pointers which will have impact on GC. It is not possible to directly construct the slice version in the same way so that the whole store is one block of memory. Either create as in this code and then derive the slice version or create distinct slice objects at each level.

Documentation

Overview

Example
package main

import (
	"fmt"
	"sort"
	"strings"

	"github.com/sensiblecodeio/faststringmap"
)

func main() {
	m := exampleSource{
		"key1": 42,
		"key2": 27644437,
		"l":    2,
	}

	fm := faststringmap.NewUint32Store(m)

	// add an entry that is not in the fast map
	m["m"] = 4

	// sort the keys so output is the same for each test run
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// lookup every key in the fast map and print the corresponding value
	for _, k := range keys {
		v, ok := fm.LookupString(k)
		fmt.Printf("%q: %d, %v\n", k, v, ok)
	}

	// Dump out the store to aid in understanding the implementation
	fmt.Println()
	dump := fmt.Sprintf("%+v", fm)
	dump = strings.ReplaceAll(dump, "}", "}\n")
	dump = strings.ReplaceAll(dump, "[", "[\n ")
	fmt.Println(dump)

}

type exampleSource map[string]uint32

func (s exampleSource) AppendKeys(a []string) []string {
	for k := range s {
		a = append(a, k)
	}
	return a
}

func (s exampleSource) Get(k string) uint32 {
	return s[k]
}
Output:


"key1": 42, true
"key2": 27644437, true
"l": 2, true
"m": 0, false

{store:[
 {nextLo:1 nextLen:2 nextOffset:107 valid:false value:0}
 {nextLo:3 nextLen:1 nextOffset:101 valid:false value:0}
 {nextLo:0 nextLen:0 nextOffset:0 valid:true value:2}
 {nextLo:4 nextLen:1 nextOffset:121 valid:false value:0}
 {nextLo:5 nextLen:2 nextOffset:49 valid:false value:0}
 {nextLo:0 nextLen:0 nextOffset:0 valid:true value:42}
 {nextLo:0 nextLen:0 nextOffset:0 valid:true value:27644437}
]}

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Uint32Source

type Uint32Source interface {
	// AppendKeys should append the keys of the maps to the supplied slice and return the resulting slice
	AppendKeys([]string) []string
	// Get should return the value for the supplied key
	Get(string) uint32
}

Uint32Source is for supplying data to initialise Uint32Store

type Uint32Store

type Uint32Store struct {
	// contains filtered or unexported fields
}

Uint32Store is a fast read only map from string to uint32 Lookups are about 5x faster than the built-in Go map type

func NewUint32Store

func NewUint32Store(srcMap Uint32Source) Uint32Store

NewUint32Store creates from the data supplied in srcMap

func (*Uint32Store) LookupBytes

func (m *Uint32Store) LookupBytes(s []byte) (uint32, bool)

LookupBytes looks up the supplied byte slice in the map

func (*Uint32Store) LookupString

func (m *Uint32Store) LookupString(s string) (uint32, bool)

LookupString looks up the supplied string in the map

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL