radix

package module
v1.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 3, 2024 License: LGPL-3.0 Imports: 11 Imported by: 0

README

GoDoc

Radix package provide radix algorithm.

https://en.wikipedia.org/wiki/Radix_tree

This implementation allow indexing key with a bit precision. Note, the wikipedia reference show a radix tree with a byte precision. It seems natural with string. This radix tree is designed to manipulate networks.

On goal of this implementation is using little amount of memory, in order to deal wiht huge trees.

  • Each tree accepts ~ formula nodes.

  • Keys are 16bit encoded, so could have 64k bits of length or 8kB.

  • Like most tree algothm, keys are stored sorted, accessing the little or greater value is very fast.

  • Lookup algortithm complexity is O(log(n)), tree depth is log(n).

  • The package provide facility to use string, uint64 and network as key.

Benchmark

go-radix is written to be fast and save memory. Below basic benchmark using CPU Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz. Using IPv4 networks as key. The benchmark code is provided in radix_test.go file with a reference dataset.

  • Index 2 649 172 entries in 1.22s, so 2.17M insert/s

  • Lookup 1 000 000 random data in 0.78s with 146 190 hit, 853 810 miss, so 1.28M lookup/s

  • Browse 2 649 172 entries in 0.47s, so 5.64M read/s

  • Delete 2 649 172 sequential data in 0.74s, so 3.58M delete/s

Nothing about memory consumation in this test beacause it hard to measure relevant data, because the garbage collector.

Example

package radix

import "net"

import "github.com/thierry-f-78/go-radix"

func Example_ipv4() {

	// Create new tree root
	r := radix.NewRadix()

	// Insert first network
	_, n1, _ := net.ParseCIDR("10.0.0.0/16") 
	r.IPv4Insert(n1, "This is the first network inserted")

	// Lookup the network
	_, n2, _ := net.ParseCIDR("10.0.0.33/32") 
	node1 := r.IPv4LookupLonguest(n2)
	if node1 != nil {
		println("network", n2.String(), "is contained in network", node1.IPv4GetNet().String())
		println("network", node1.IPv4GetNet().String(), "is associated with data", node1.Data.(string))
	}

	// Lookup too large network
	_, n3, _ := net.ParseCIDR("10.0.0.0/8") 
	node2 := r.IPv4LookupLonguest(n3)
	if node2 == nil {
		println("network", n3.String(), "has no entries in the tree")
	}
}

func Example_string() {
	// Create new tree root
	r := radix.NewRadix()

	// insert string
	r.StringInsert("home", "This is a prefix")

	// lookup word
	node1 := r.StringLookupLonguest("homemade")
	if node1 != nil {
		println("homemade has prefix", node1.StringGetKey(), "in the tree, with data", node1.Data.(string))
	}
}

func main() {
	Example_ipv4()
	Example_string()
}

Internals

The tree use two kind of nodes. Internal node type node struct were are not exported and leaf node type Node struct. These name are ambiguous, this is due to a necessary compatibility with existing programs. Node are just an interface{} associated with a node struct.

Saving memory

Because this king of tree could contains millions of entries, it is important to use the mimnimum of memory. Member structs are obviously sorted in order to respect alignment packing data without hole.

In other way, I used three means to compact memory:

1. Saving memory : use string in place of []byte

The most simpler is storing []byte with string type. The string uses 16 byte, the []byte uses 24 bytes.

2. Saving memory : use 32bit reference in place of 64bit pointers

More tricky, how use 32bit reference in place of 64bit pointers for chaining nodes. I use memory pool and 64k array of 64k array of nodes. The following explanation is not complete, but the code comment contains more information. There just an introduction.

Usage of memory pool implies while the radix tree is in use, allocated node are not garbage colected.

 type node struct {
    ...
    next uint32
 }
 const node_size = unsafe.Sizeof(node{})
 var memory_pool [][65536]node
 var memory_backref []struct {
    ptr_start uintptr
    ptr_stop uintptr
    memory_pool_index int
 }
 var memory_free uint32

memory_pool growth by block of 64k nodes. Each time there are no free node, memory pool has one more slot of 64k node.

memory_backref is dichotomic sorted list of pointer start and stop which reference the memory_pool index which contains pointer. ptr_start = &[0]node, ptr_stop = &[65535]node.

memory_free is the list of avalaible nodes.

node pointer from reference = &memory_pool[x][y], where x and y and 16bit values

node reference from pointer:

  • x = dichotomic search of memory_pool_index in memory_backref array using &node pointer reference
  • y = (&node - ptr_start) / node_size

In reality, there are two distinct pool per tree using this concept. The pool of node and the pool of leaf. the memory_backref contains also a boolean value to distinct these two kind of pool.

In reality x is not used until 64k but limited to 32k and the msb is used to differenciate the two pools.

3. Saving memory : split node types according with their kind to avoid interface{} pointer

Some node are used for internal chainaing, other nodes are used to display data. data is stored in an interface which uses 16bytes. It is garbage to let these 16bytes for each nodes. there the approximatively the same numbers of internal nodes than exposed nodes, so for 1M data indexed, 1M of internal nodes.

 type node struct {
    ... // chaining things
 }
 type Node struct {
    node node
    Data interface{}
 }

I use a property which is &Node adress = &Node.node adress. I manipulate only node to chain each data. If I need to insert a leaf, I use Node.node. When I browse the tree I need to kwnown if I encounter leaf or node. I just chack the msb of the reference.

Documentation

Overview

Radix package provide radix algorithm.

https://en.wikipedia.org/wiki/Radix_tree

Properties

• This implementation allow indexing key with a bit precision. The wikipedia reference show a radix tree with a byte precision. It seems natural with string. This radix tree is designed to manipulate networks.

• Each tree could accept ~2x10^12 nodes.

• Keys are 16bit encoded, so could have 64k bytes of length.

• Like most tree algothm, keys are stored sorted, accessing the little or greater value is very fast.

• Lookup algortithm complexity is O(log(n)), tree depth is log(n).

• The package provide facility to use string, uint64 and network as key.

Benchmark

It is written to be fast and save memory. Below basic benchmark using CPU "Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz". Using IPv4 networks as key. The benchmark code is provided in radix_test.go file with a reference dataset.

• Index 2 649 172 entries in 1.22s, so 2.17M insert/s

• Lookup 1 000 000 random data in 0.78s with 146 190 hit, 853 810 miss, so 1.28M lookup/s

• Browse 2 649 172 entries in 0.47s, so 5.64M read/s

• Delete 2 649 172 sequential data in 0.74s, so 3.58M delete/s

Nothing about memory consumation in this test beacause it hard to measure relevant data, because the garbage collector.

Internals

The tree use two kind of nodes. Internal node `type node struct` were are not exported and leaf node `type Node struct`. These name are ambiguous, this is due to a necessary compatibility with existing programs. Node are just an interface{} associated with a `node struct`.

Saving memory

Because this king of tree could contains millions of entries, it is important to use the mimnimum of memory. Member structs are obviously sorted in order to respect alignment packing data without hole.

In other way, I used three means to compact memory:

1. Saving memory : use string in place of []byte

The most simpler is storing []byte with string type. The string uses 16 byte, the []byte uses 24 bytes.

2. Saving memory : use 32bit reference in place of 64bit pointers

More tricky, how use 32bit reference in place of 64bit pointers for chaining nodes. I use memory pool and 64k array of 64k array of nodes. The following explanation is not complete, but the code comment contains more information. There just an introduction.

Usage of memory pool implies while the radix tree is in use, allocated node are not garbage colected.

type node struct {
   ...
   next uint32
}
const node_size = unsafe.Sizeof(node{})
var memory_pool [][65536]node
var memory_backref []struct {
   ptr_start uintptr
   ptr_stop uintptr
   memory_pool_index int
}
var memory_free uint32

memory_pool growth by block of 64k nodes. Each time there are no free node, memory pool has one more slot of 64k node.

memory_backref is dichotomic sorted list of pointer start and stop which reference the memory_pool index which contains pointer. ptr_start = &[0]node, ptr_stop = &[65535]node.

memory_free is the list of avalaible nodes.

node pointer from reference = &memory_pool[x][y], where x and y and 16bit values

node reference from pointer:

• x = dichotomic search of memory_pool_index in memory_backref array using &node pointer reference

• y = (&node - ptr_start) / node_size

In reality, there are two distinct pool per tree using this concept. The pool of node and the pool of leaf. the memory_backref contains also a boolean value to distinct these two kind of pool.

In reality x is not used until 64k but limited to 32k and the msb is used to differenciate the two pools.

3. Saving memory : split node types according with their kind to avoid interface{} pointer

Some node are used for internal chainaing, other nodes are used to display data. data is stored in an interface which uses 16bytes. It is garbage to let these 16bytes for each nodes. there the approximatively the same numbers of internal nodes than exposed nodes, so for 1M data indexed, 1M of internal nodes.

type node struct {
   ... // chaining things
}
type Node struct {
   node node
   Data interface{}
}

I use a property which is &Node adress = &Node.node adress. I manipulate only node to chain each data. If I need to insert a leaf, I use Node.node. When I browse the tree I need to kwnown if I encounter leaf or node. I just chack the msb of the reference.

Example (Ipv4)
// Create new tree root
r := NewRadix()

// Insert first network
_, n1, _ := net.ParseCIDR("10.0.0.0/16")
r.IPv4Insert(n1, "This is the first network inserted")

// Lookup the network
_, n2, _ := net.ParseCIDR("10.0.0.33/32")
node1 := r.IPv4LookupLonguest(n2)
if node1 != nil {
	println("network", n2.String(), "is contained in network", node1.IPv4GetNet().String())
	println("network", node1.IPv4GetNet().String(), "is associated with data", node1.Data.(string))
}

// Lookup too large network
_, n3, _ := net.ParseCIDR("10.0.0.0/8")
node2 := r.IPv4LookupLonguest(n3)
if node2 == nil {
	println("network", n3.String(), "has no entries in the tree")
}
Output:

Example (String)
// Create new tree root
r := NewRadix()

// insert string
r.StringInsert("home", "This is a prefix")

// lookup word
node1 := r.StringLookupLonguest("homemade")
if node1 != nil {
	println("homemade has prefix", node1.StringGetKey(), "in the tree, with data", node1.Data.(string))
}
Output:

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

func Equal

func Equal(n1 *Node, n2 *Node) bool

Equal return true if nodes are equal. Node are equal if there are the same prefix length and bytes.

Types

type Counters

type Counters struct {
	Length int           `json:"length"` // Number of leaf used in the tree
	Node   Node_counters `json:"node"`   // Counters relative to nodes
	Leaf   Node_counters `json:"leaf"`   // counters relative to leaf.
}

Node_counters describe tree counters

type Iter

type Iter struct {
	// contains filtered or unexported fields
}

Iter is a struct for managing iteration

func (*Iter) Get

func (i *Iter) Get() *Node

Get return the node. Many calls on this function return the same value.

func (*Iter) Next

func (i *Iter) Next() bool

Next return true if there next node avalaible. This function also perform lookup for the next node.

type Node

type Node struct {
	Data interface{} // Contains interface matching the node
	// contains filtered or unexported fields
}

Node is a struct which describe leaf of the tree.

func (*Node) IPv4GetNet

func (n *Node) IPv4GetNet() *net.IPNet

IPv4GetNet convert node key/length prefix to IPv4 network data

func (*Node) StringGetKey

func (n *Node) StringGetKey() string

StringGetKey convert node key/length prefix to string

func (*Node) TimeGetValue added in v1.0.1

func (n *Node) TimeGetValue() time.Time

TimeGetValue convert node key/length prefix to time.Time data. Note the tree precision is microsecond

func (*Node) UInt64GetValue

func (n *Node) UInt64GetValue() uint64

UInt64GetValue convert node key/length prefix to uint64 data

type Node_counters

type Node_counters struct {
	Capacity int `json:"capacity"` // the total nodes/leaf capacity
	Free     int `json:"free"`     // the number of free nodes/leaf
	Size     int `json:"size"`     // the size of a node/leaf in bytes
}

Node_counters describe tree node/leaf counters

type Radix

type Radix struct {
	Node uint32
	// contains filtered or unexported fields
}

Radix is the struct which contains the tree root.

func NewRadix

func NewRadix() *Radix

NewRadix return initialized *Radix tree.

func (*Radix) Counters

func (r *Radix) Counters() *Counters

Counters return counters useful to monitor the radix tree usage.

func (*Radix) Debug

func (r *Radix) Debug(fh io.Writer)

func (*Radix) DebugStdout

func (r *Radix) DebugStdout()

func (*Radix) Delete

func (r *Radix) Delete(n *Node)

Delete remove Node from the tree.

func (*Radix) First

func (r *Radix) First() *Node

First return first node of the tree. Return nil if the tree is empty.

func (*Radix) Get

func (r *Radix) Get(data *[]byte, length int16) *Node

Get gets a key/length prefix and return exact match of the prefix. Exact match is a node wich match the prefix bit and the length.

func (*Radix) IPv4DeleteNetwork

func (r *Radix) IPv4DeleteNetwork(network *net.IPNet)

IPv4DeleteNetwork lookup network and remove it. does nothing if the network not exists.

func (*Radix) IPv4Get

func (r *Radix) IPv4Get(network *net.IPNet) *Node

IPv4Get gets a ipv4 network and return exact match of the prefix. Exact match is a node wich match the prefix bit and the length.

func (*Radix) IPv4Insert

func (r *Radix) IPv4Insert(network *net.IPNet, data interface{}) (*Node, bool)

IPv4Insert ipv4 network in the tree. The tree accept only unique value, if the prefix already exists in the tree, return existing leaf, otherwaise return nil.

func (*Radix) IPv4LookupLonguest

func (r *Radix) IPv4LookupLonguest(network *net.IPNet) *Node

IPv4LookupLonguest get a ipv4 network and return the leaf which match the longest part of the prefix. Return nil if none match.

func (*Radix) IPv4LookupLonguestPath

func (r *Radix) IPv4LookupLonguestPath(network *net.IPNet) []*Node

IPv4LookupLonguestPath take the radix tree and a ipv4 network, return the list of all leaf matching the prefix. If none match, return nil

func (*Radix) IPv4NewIter

func (r *Radix) IPv4NewIter(network *net.IPNet) *Iter

IPv4NewIter return struct Iter for browsing all nodes there children match the ipv4 network

func (*Radix) Insert

func (r *Radix) Insert(key *[]byte, length int16, data interface{}) (*Node, bool)

Insert key/length prefix in the tree. The tree accept only unique value, if the prefix already exists in the tree, return existing leaf, otherwaise return nil.

func (*Radix) Last

func (r *Radix) Last() *Node

Last return the last node of the tree. Return nil if the tree is empty.

func (*Radix) Len

func (r *Radix) Len() int

Len return the number of leaf in the tree.

func (*Radix) LookupGe added in v1.0.1

func (r *Radix) LookupGe(data *[]byte, length int16) *Node

In the case of any entry match, LookupGe return the greater or equal closest value of the key. This is usefull for constant prefix keys like keys derivated from uint64 or time. If the tree is empty or greater value not exists, return nil

func (*Radix) LookupLe added in v1.0.1

func (r *Radix) LookupLe(data *[]byte, length int16) *Node

In the case of any entry match, LookupLe return the lesser or equal closest value of the key. This is usefull for constant prefix keys like keys derivated from uint64 or time. If the tree is empty or lesser value not exists, return nil

func (*Radix) LookupLonguest

func (r *Radix) LookupLonguest(data *[]byte, length int16) *Node

LookupLonguest get a key/length prefix and return the leaf which match the longest part of the prefix. Return nil if none match.

func (*Radix) LookupLonguestPath

func (r *Radix) LookupLonguestPath(data *[]byte, length int16) []*Node

LookupLonguestPath take the radix tree and a key/length prefix, return the list of all leaf matching the prefix. If none match, return nil

func (*Radix) NewIter

func (r *Radix) NewIter(key *[]byte, length int16) *Iter

NewIter return struct Iter for browsing all nodes there children match the given key/length prefix.

func (*Radix) Next

func (r *Radix) Next(n *Node) *Node

Next return next Node in browsing order. Return nil if we reach end of tree.

func (*Radix) Prev added in v1.0.1

func (r *Radix) Prev(n *Node) *Node

Next return next Node in browsing order. Return nil if we reach end of tree.

func (*Radix) StringDelete

func (r *Radix) StringDelete(str string)

StringDelete lookup string and remove it. does nothing if the string not exists.

func (*Radix) StringGet

func (r *Radix) StringGet(str string) *Node

Get gets a string as prefix and return exact match of the prefix. Exact match is a node wich match the prefix bit and the length.

func (*Radix) StringInsert

func (r *Radix) StringInsert(str string, data interface{}) (*Node, bool)

StringInsert string as prefix in the tree. The tree accept only unique value, if the prefix already exists in the tree, return existing leaf, otherwaise return nil.

func (*Radix) StringLookupLonguest

func (r *Radix) StringLookupLonguest(str string) *Node

StringLookupLonguest get a string as prefix and return the leaf which match the longest part of the prefix. Return nil if none match.

func (*Radix) StringLookupLonguestPath

func (r *Radix) StringLookupLonguestPath(str string) []*Node

StringLookupLonguestPath take the radix tree and a string as prefix, return the list of all leaf matching the prefix. If none match, return nil

func (*Radix) StringNewIter

func (r *Radix) StringNewIter(str string) *Iter

StringNewIter return struct Iter for browsing all nodes there children match the string prefix.

func (*Radix) TimeDelete added in v1.0.1

func (r *Radix) TimeDelete(value time.Time)

TimeDelete lookup time.Time and remove it. does nothing if the network not exists. Note the tree precision is microsecond

func (*Radix) TimeGet added in v1.0.1

func (r *Radix) TimeGet(value time.Time) *Node

TimeGet gets a time.Time prefix and return exact match of the prefix. Exact match is a node wich match the prefix bit and the length. Note the tree precision is microsecond

func (*Radix) TimeInsert added in v1.0.1

func (r *Radix) TimeInsert(value time.Time, data interface{}) (*Node, bool)

TimeInsert time.Time prefix in the tree. The tree accept only unique value, if the prefix already exists in the tree, return existing leaf, otherwise return nil. Note the tree precision is microsecond

func (*Radix) TimeLookupAfterEq added in v1.0.1

func (r *Radix) TimeLookupAfterEq(value time.Time) *Node

In the case of any entry match, TimeLookupAfterEq return the time after or equal closest value of the time key. If the tree is empty or after value not exists, return nil

func (*Radix) TimeLookupBeforeEq added in v1.0.1

func (r *Radix) TimeLookupBeforeEq(value time.Time) *Node

In the case of any entry match, TimeLookupBeforeEq return the time before or equal closest value of the time key. If the tree is empty or before value not exists, return nil

func (*Radix) TimeNewIter added in v1.0.1

func (r *Radix) TimeNewIter(value time.Time) *Iter

UInt64NewIter return struct Iter for browsing all nodes there children match the key/length prefix. Note the tree precision is microsecond

func (*Radix) UInt64Delete

func (r *Radix) UInt64Delete(value uint64)

UInt64Delete lookup uint64 and remove it. does nothing if the network not exists.

func (*Radix) UInt64Get

func (r *Radix) UInt64Get(value uint64) *Node

UInt64Get gets a uint64 prefix and return exact match of the prefix. Exact match is a node wich match the prefix bit and the length.

func (*Radix) UInt64Insert

func (r *Radix) UInt64Insert(value uint64, data interface{}) (*Node, bool)

UInt64Insert uint64 prefix in the tree. The tree accept only unique value, if the prefix already exists in the tree, return existing leaf, otherwaise return nil.

func (*Radix) UInt64LookupGe added in v1.0.1

func (r *Radix) UInt64LookupGe(value uint64) *Node

In the case of any entry match, UInt64LookupGe return the greater or equal closest value of the key. If the tree is empty or greater value not exists, return nil

func (*Radix) UInt64LookupLe added in v1.0.1

func (r *Radix) UInt64LookupLe(value uint64) *Node

In the case of any entry match, UInt64LookupGe return the lesser or equal closest value of the key. If the tree is empty or lesser value not exists, return nil

func (*Radix) UInt64NewIter

func (r *Radix) UInt64NewIter(value uint64) *Iter

UInt64NewIter return struct Iter for browsing all nodes there children match the key/length prefix.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL