ptrie

package module
v1.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 1, 2023 License: BSD-3-Clause Imports: 1 Imported by: 0

README

Pruning Radix Trie

GoDoc

A Go port of the Pruning Radix Trie, an augmented compressed trie that orders nodes based upon the maximum rank of the items in their subtree. This allows it to quickly search and find k elements with the highest rank. You decide what the rank is for each item.

The most obvious use case would be autocomplete: Finding the top 10 used words with a given prefix is cheap with this data structure.

Usage

To use it, add the library to your go.mod file by issuing

$ go get olympos.io/container/pruning-radix-trie

and import it like so:

import ptrie "olympos.io/container/pruning-radix-trie"

The library contains in practice two functions you use to interact with it:

  • FromItems, which is used to build an immutable pruning radix trie, and
  • PTrie.FindTopK to find the top k ranked items with a given prefix

Here's a small standalone example of how they are used:

package main

import (
	"fmt"

	ptrie "olympos.io/container/pruning-radix-trie"
)

type empty struct{}

func main() {
	trie := ptrie.FromItems([]ptrie.Item[empty]{
		{Term: "hello", Rank: 1000},
		{Term: "hi", Rank: 871},
		{Term: "howdy", Rank: 70},
		{Term: "goodbye", Rank: 918},
	})

	top := trie.FindTopK("h", 4)
	for _, res := range top {
		fmt.Printf("%s (rank %d)\n", res.Term, res.Rank)
	}
	// hello (rank 1000)
	// hi (rank 871)
	// howdy (rank 70)
}

Notes on Performance

I'm a bit confused about the performance of the original implementation. The benchmark image shows nanoseconds, yet the post talks about microseconds, and the numbers are off by 1000 in either direction. Either the benchmark image should show microseconds or the post should talk about microseconds. Given the presentation, it seems like the benchmark image should talk about microseconds and not nanoseconds.

I don't think there's much point in comparing this trie with an exhaustive search, which is the best a standard Patricia trie would be able to do. Instead, I wanted to highlight the difference between a Patricia/compressed trie and an uncompressed one:

Compressed vs uncompressed

The results aren't that surprising: The compressed trie will have far fewer empty nodes, and so scanning a trie for top-ranked items is gonna be cheaper when there are fewer nodes to visit. When the walk down the trie becomes more and more of the work, you'd prefer that to be efficient. The logic for that is simpler in the uncompressed case, and if you end up with very few nodes to evaluate after that, the uncompressed case could perform better in benchmarks.

However, this benchmark looks better than it probably should for the uncompressed case, as more or less all the nodes will be in the cache. In a real-world scenario, the compressed trie will likely fare better than the uncompressed because of cache invalidation etc.

License

Copyright © 2023 Jean Niklas L'orange and contributors

Distributed under the BSD 3-clause license, which is available in the file LICENSE.

Documentation

Overview

Package ptrie implements a Pruning Radix Trie.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Item

type Item[T any] struct {
	Value T
	Term  string
	Rank  uint
}

Item is an element store in a PTrie. If T is immutable, so is the item.

type Items

type Items[T any] []Item[T]

type PTrie

type PTrie[T any] struct {
	// contains filtered or unexported fields
}

PTrie is a Pruning Radix Trie.

func FromItems

func FromItems[T any](items []Item[T]) *PTrie[T]

FromItems returns an immutable PTrie from the given items. The item terms must be distinct.

func (*PTrie[T]) FindTopK

func (pt *PTrie[T]) FindTopK(prefix string, k int) []Item[T]

FindTopK returns the k items with the highest rank in pt with the prefix `term`, where the highest ranked item is the first element etc. There may be less than k elements in the result.

func (*PTrie[T]) FindTopKFast

func (pt *PTrie[T]) FindTopKFast(prefix string, result []Item[T]) []Item[T]

FindTopKFast works like PTrie.FindTopK, but is able to reuse an already allocated result slice for the items. The slice will be filled up to its capacity if there are enough elements that match the prefix.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL