The highest tagged major version is v4.

leia


v3.3.0 · Published: Nov 11, 2022 · License: GPL-3.0 · Imports: 19 · Imported by: 0

Repository

github.com/nuts-foundation/go-leia

README

go-leia

Go Lightweight Embedded Indexed (JSON) Archive

go-leia is built upon bbolt. It adds index-based search capabilities for JSON documents to the key-value store.

The goal is to provide a simple and fast way to find relevant JSON documents using an embedded Go key-value store.

Installing

Install Go and run go get:

$ go get github.com/nuts-foundation/go-leia/v3

When using Go > 1.16, Go modules will probably require you to install additional dependencies.

$ go get github.com/stretchr/testify
$ go get github.com/tidwall/gjson
$ go get go.etcd.io/bbolt

Opening a database

Opening a database only requires a file location for the bbolt db.

package main

import (
	"log"

	"github.com/nuts-foundation/go-leia/v3"
)

func main() {
	// Open the my.db data file in your current directory.
	// It will be created if it doesn't exist using filemode 0600 and default bbolt options.
	store, err := leia.NewStore("my.db")
	if err != nil {
		log.Fatal(err)
	}
	defer store.Close()

	// ... use the store here
}

Collections

Leia adds collections to bbolt. Each collection has its own bucket where documents are stored, and an index is only valid for a single collection. A store can hold both JSON collections and JSON-LD collections.

To create a collection:

func main() {
	store, err := leia.NewStore("my.db")
	...

	// if a collection doesn't exist, it'll be created for you.
	// the underlying buckets are created when a document is added.
	collection := store.JSONCollection("credentials")
}

Writing

Writing a document to a collection is straightforward:

func main() {
	store, err := leia.NewStore("my.db")
	collection := store.JSONCollection("credentials")
	...

	// a leia.Document is basically a []byte holding the raw JSON
	documents := []leia.Document{
		leia.Document("{...some json...}"),
	}

	// documents are added by slice
	err = collection.Add(documents)
}

Documents are added by slice, and each operation is done within a single bbolt transaction. Since bbolt is a key-value store, you've probably noticed that the key is missing as an argument: leia computes the SHA-1 of the document and uses that as the key.

To get the key when needed:

func main() {
	store, err := leia.NewStore("my.db")
	collection := store.JSONCollection("credentials")
	...

	// define your document
	document := leia.Document("{...some json...}")

	// retrieve a leia.Reference (also a []byte)
	reference := collection.Reference(document)
}

Documents can also be removed:

func main() {
	store, err := leia.NewStore("my.db")
	collection := store.JSONCollection("credentials")
	...

	// define your document
	document := leia.Document("{...some json...}")

	// remove a document using a leia.Document
	err = collection.Delete(document)
}

Reading

A document can be retrieved by reference:

func main() {
	store, err := leia.NewStore("my.db")
	collection := store.JSONCollection("credentials")
	...

	// get a document by reference; it returns nil when not found
	document, err := collection.Get(reference)
}

Searching

The major benefit of leia is searching. The performance of a search depends heavily on the indices available on a collection. If no suitable index can be found for the query, Find returns ErrNoIndex.

Leia supports equal, prefix, range and not-nil queries. The first argument for each matcher is a QueryPath: for JSON collections, a JSON path created with NewJSONPath using the gjson syntax. Only basic path syntax is supported; there is no support for wildcards or comparison operators. The second argument is the Scalar value to match against. Query terms can only be combined using AND logic.
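The matching rules above can be illustrated without leia. The sketch below is a minimal in-memory model, assuming string values compared byte-wise and a document flattened into a path-to-value map; `eq`, `prefix`, `rangeOf`, `term` and `matches` are hypothetical names, not leia's API. Leia itself evaluates each QueryPart against index keys stored in bbolt (see QueryPart.Condition).

```go
package main

import (
	"bytes"
	"fmt"
)

// condition is a hypothetical stand-in for a QueryPart: it reports whether an
// indexed value (as bytes) satisfies the part.
type condition func(value []byte) bool

// eq matches an exact value.
func eq(want string) condition {
	return func(v []byte) bool { return bytes.Equal(v, []byte(want)) }
}

// prefix matches values that start with the given prefix.
func prefix(p string) condition {
	return func(v []byte) bool { return bytes.HasPrefix(v, []byte(p)) }
}

// rangeOf matches values between begin and end, both inclusive, using
// byte-wise ordering.
func rangeOf(begin, end string) condition {
	return func(v []byte) bool {
		return bytes.Compare(v, []byte(begin)) >= 0 && bytes.Compare(v, []byte(end)) <= 0
	}
}

// doc maps a (flattened) JSON path to its value.
type doc map[string]string

// term pairs a path with a condition; a query is a list of terms.
type term struct {
	path string
	cond condition
}

// matches reports whether every term holds for the document (AND logic).
// A missing path never matches.
func matches(d doc, query []term) bool {
	for _, t := range query {
		v, ok := d[t.path]
		if !ok || !t.cond([]byte(v)) {
			return false
		}
	}
	return true
}

func main() {
	d := doc{"subject": "did:nuts:123", "type": "NutsOrganizationCredential"}
	q := []term{{"subject", prefix("did:nuts:")}, {"type", eq("NutsOrganizationCredential")}}
	fmt.Println(matches(d, q))
}
```

Byte-wise comparison is why a range over strings behaves lexicographically; how leia encodes numeric scalars for ordering is not covered by this sketch.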

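func main() {
	...

	// define a new query; paths are QueryPaths and values are Scalars
	query := leia.New(leia.Eq(leia.NewJSONPath("subject"), leia.MustParseScalar("some_value"))).
		And(leia.Range(leia.NewJSONPath("some.path.#.amount"), leia.MustParseScalar(1.0), leia.MustParseScalar(100.0)))
}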

Results can be retrieved with either Find or Iterate. Find returns a slice of documents and takes a context that can cancel the search or bound its duration. Iterate lets you pass a DocumentWalker, which is called for each hit.

func main() {
	...

	// get a slice of documents; the context can cancel the search
	documents, err := collection.Find(context.Background(), query)

	// use a DocumentWalker
	walker := func(ref leia.Reference, doc []byte) error {
		// do something with the document
		return nil
	}
	err = collection.Iterate(query, walker)
}

Indexing

Indexing JSON documents is where leia adds the most value. Multiple indices can be added to each collection, but every additional index slows down write operations.

An index can be added and removed:

func main() {
	...

	// define a compound index on two fields
	index := collection.NewIndex("compound",
		leia.NewFieldIndexer(leia.NewJSONPath("subject")),
		leia.NewFieldIndexer(leia.NewJSONPath("some.path.#.amount")),
	)

	// add it to the collection
	err := collection.AddIndex(index)

	// remove it from the collection
	err = collection.DropIndex("compound")
}

The path passed to NewFieldIndexer uses the same notation as query paths, also without wildcards or comparison operators. Adding an index triggers a re-index of all documents in the collection. Adding an index with a duplicate name is ignored; drop the existing index first to replace it.

Alias option

Leia supports indexing JSON paths under an alias. With an alias, different documents can be indexed on different paths and still be found with a single query.

func main() {
    ...
    
    // define the index for credentialX
    indexX := leia.NewIndex("credentialX", leia.NewFieldIndexer("credentialSubject.id", leia.AliasOption{Alias: "subject"}))
    // define the index for credentialY
    indexY := leia.NewIndex("credentialY", leia.NewFieldIndexer("credentialSubject.organization.id", leia.AliasOption{Alias: "subject"}))
    
    ...

    // define a new query
    query := leia.New(leia.Eq("subject", "some_value"))
}

The example above defines two indices on a collection, each indexing a different JSON path. Both indices are used when the query is executed, so the result contains documents that match either index.
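Querying through an alias amounts to merging the references returned by each matching index. Since Index.Iterate does not filter out duplicate values, a union step has to drop them. A minimal sketch, assuming references can be compared as raw bytes; `unionRefs` is a hypothetical helper, not leia's code.

```go
package main

import "fmt"

// unionRefs merges the reference lists produced by several index scans,
// dropping duplicates while keeping first-seen order.
func unionRefs(lists ...[][]byte) [][]byte {
	seen := make(map[string]bool)
	var out [][]byte
	for _, list := range lists {
		for _, ref := range list {
			if seen[string(ref)] {
				continue // already returned by an earlier index
			}
			seen[string(ref)] = true
			out = append(out, ref)
		}
	}
	return out
}

func main() {
	fromX := [][]byte{[]byte("ref-1"), []byte("ref-2")}
	fromY := [][]byte{[]byte("ref-2"), []byte("ref-3")}
	for _, ref := range unionRefs(fromX, fromY) {
		fmt.Println(string(ref))
	}
}
```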

Transform option

A transformer can be defined for a FieldIndexer. It transforms both the indexed value and the query parameter, which can be used for case-insensitive search or to build a soundex-style index.

func main() {
	...

	// This index transforms all values to lowercase
	index := collection.NewIndex("credential", leia.NewFieldIndexer(leia.NewJSONPath("subject"), leia.TransformerOption(leia.ToLower)))

	...

	// these queries will yield the same result
	query1 := leia.New(leia.Eq(leia.NewJSONPath("subject"), leia.MustParseScalar("VALUE")))
	query2 := leia.New(leia.Eq(leia.NewJSONPath("subject"), leia.MustParseScalar("value")))
}

Tokenizer option

Sometimes JSON fields contain free text. Leia has a tokenizer option to split the value at a JSON path into multiple keys to be indexed. For example, the sentence "The quick brown fox jumps over the lazy dog" could be tokenized so the document can easily be found when the term "fox" is used in a query. A more advanced tokenizer could also remove common words like "the".

func main() {
	...

	// This index splits the value at "text" into whitespace-separated words
	index := collection.NewIndex("credential", leia.NewFieldIndexer(leia.NewJSONPath("text"), leia.TokenizerOption(leia.WhiteSpaceTokenizer)))

	...

	// will match {"text": "The quick brown fox jumps over the lazy dog"}
	query := leia.New(leia.Eq(leia.NewJSONPath("text"), leia.MustParseScalar("fox")))
}
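The more advanced tokenizer mentioned above, one that also drops common words, could look like this. A minimal sketch, assuming whitespace splitting via strings.Fields and an illustrative stop-word list; `stopWordTokenizer` is hypothetical. Its signature, func(string) []string, matches the Tokenizer type, so it could presumably be passed to TokenizerOption in place of WhiteSpaceTokenizer.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWords is an illustrative list of common words to drop.
var stopWords = map[string]bool{"the": true, "a": true, "an": true}

// stopWordTokenizer splits on whitespace and drops stop words, compared
// case-insensitively. The remaining tokens keep their original case.
func stopWordTokenizer(text string) []string {
	var tokens []string
	for _, word := range strings.Fields(text) {
		if stopWords[strings.ToLower(word)] {
			continue
		}
		tokens = append(tokens, word)
	}
	return tokens
}

func main() {
	fmt.Println(stopWordTokenizer("The quick brown fox jumps over the lazy dog"))
}
```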

All options can be combined.

Documentation

Index

Constants
Variables
func WhiteSpaceTokenizer(text string) []string
type BoolScalar
- func (bs BoolScalar) Bytes() []byte
type Collection
type Document
type DocumentWalker
type FieldIndexer
- func NewFieldIndexer(jsonPath QueryPath, options ...IndexOption) FieldIndexer
type Float64Scalar
- func (fs Float64Scalar) Bytes() []byte
type Index
type IndexOption
- func TokenizerOption(tokenizer Tokenizer) IndexOption
- func TransformerOption(transformer Transform) IndexOption
type Key
- func ComposeKey(current Key, additional Key) Key
- func KeyOf(value interface{}) Key
- func (k Key) Split() []Key
- func (k Key) String() string
type Query
- func New(part QueryPart) Query
- func (q Query) And(part QueryPart) Query
type QueryPart
- func Eq(queryPath QueryPath, value Scalar) QueryPart
- func NotNil(queryPath QueryPath) QueryPart
- func Prefix(queryPath QueryPath, value Scalar) QueryPart
- func Range(queryPath QueryPath, begin Scalar, end Scalar) QueryPart
type QueryPath
- func NewIRIPath(IRIs ...string) QueryPath
- func NewJSONPath(path string) QueryPath
type QueryPathComparable
type Reference
- func (r Reference) ByteSize() int
- func (r Reference) EncodeToString() string
type ReferenceFunc
type ReferenceScanFn
type Scalar
- func JSONLDValueCollector(collection *collection, document Document, queryPath QueryPath) ([]Scalar, error)
- func JSONPathValueCollector(_ *collection, document Document, queryPath QueryPath) ([]Scalar, error)
- func MustParseScalar(value interface{}) Scalar
- func ParseScalar(value interface{}) (Scalar, error)
- func ToLower(scalar Scalar) Scalar
type Store
- func NewStore(dbFile string, options ...StoreOption) (Store, error)
type StoreOption
- func WithDocumentLoader(documentLoader ld.DocumentLoader) StoreOption
- func WithoutSync() StoreOption
type StringScalar
- func (ss StringScalar) Bytes() []byte
type Tokenizer
type Transform

Constants

const KeyDelimiter = 0x10

Variables

var ErrInvalidJSON = errors.New("invalid json")

ErrInvalidJSON is returned when invalid JSON is parsed.

var ErrInvalidQuery = errors.New("invalid query type")

ErrInvalidQuery is returned when a collection is queried with the wrong type.

var ErrInvalidValue = errors.New("invalid value")

ErrInvalidValue is returned when an invalid value is parsed.

var ErrNoIndex = errors.New("no index found")

ErrNoIndex is returned when no index is found to query against.

var ErrNoQuery = errors.New("no query given")

ErrNoQuery is returned when an empty query is given.

Functions

func WhiteSpaceTokenizer

func WhiteSpaceTokenizer(text string) []string

WhiteSpaceTokenizer splits the string into whitespace-separated tokens.

Types

type BoolScalar (added in v3.0.1)

type BoolScalar bool

func (BoolScalar) Bytes (added in v3.0.1)

func (bs BoolScalar) Bytes() []byte

type Collection

type Collection interface {
	// AddIndex to this collection. It doesn't matter if the index already exists.
	// If you want to override an index (by path) drop it first.
	AddIndex(index ...Index) error
	// DropIndex by path
	DropIndex(name string) error
	// NewIndex creates a new index from the context of this collection
	// If multiple field indexers are given, a compound index is created.
	NewIndex(name string, parts ...FieldIndexer) Index
	// Add a set of documents to this collection
	Add(jsonSet []Document) error
	// Get returns the data for the given key or nil if not found
	Get(ref Reference) (Document, error)
	// Delete a document
	Delete(doc Document) error
	// Find queries the collection for documents
	// returns ErrNoIndex when no suitable index can be found
	// returns context errors when the context has been cancelled or deadline has exceeded.
	// passing ctx prevents adding too many records to the result set.
	Find(ctx context.Context, query Query) ([]Document, error)
	// Reference uses the configured reference function to generate a reference of the document
	Reference(doc Document) Reference
	// Iterate over documents that match the given query
	Iterate(query Query, walker DocumentWalker) error
	// IndexIterate is used for iterating over indexed values. The query keys must match exactly with all the FieldIndexer.Name() of an index
	// returns ErrNoIndex when no suitable index can be found
	IndexIterate(query Query, fn ReferenceScanFn) error
	// ValuesAtPath returns a slice with the values found by the configured valueCollector
	ValuesAtPath(document Document, queryPath QueryPath) ([]Scalar, error)
	// DocumentCount returns the number of indexed documents
	DocumentCount() (int, error)
}

Collection defines a logical collection of documents and indices within a store.

type Document

type Document []byte

Document represents a JSON document in []byte format.

type DocumentWalker

type DocumentWalker func(key Reference, value []byte) error

DocumentWalker defines a function that is used as a callback for matching documents. The key will be the document Reference (hash) and the value will be the raw document bytes.

type FieldIndexer

type FieldIndexer interface {
	QueryPathComparable
	// Tokenize may split up Keys and search terms. For example split a sentence into words.
	Tokenize(value Scalar) []Scalar
	// Transform is a function that alters the value to be indexed as well as any search criteria.
	// For example ToLower is a Transform function that transforms the value to lower case.
	Transform(value Scalar) Scalar
}

FieldIndexer is the public interface that defines functions for a field index instruction. A FieldIndexer is used when a document is indexed.

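func NewFieldIndexer

func NewFieldIndexer(jsonPath QueryPath, options ...IndexOption) FieldIndexer

NewFieldIndexer creates a new FieldIndexer for the given path. Options such as TokenizerOption and TransformerOption customize how values are indexed.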

type Float64Scalar (added in v3.0.1)

type Float64Scalar float64

func (Float64Scalar) Bytes (added in v3.0.1)

func (fs Float64Scalar) Bytes() []byte

type Index

type Index interface {
	// Name returns the path of this index
	Name() string
	// Add indexes the document. It uses a sub-bucket of the given bucket.
	// It will only be indexed if the complete index matches.
	Add(bucket *bbolt.Bucket, ref Reference, doc Document) error
	// Delete document from the index
	Delete(bucket *bbolt.Bucket, ref Reference, doc Document) error
	// IsMatch determines if this index can be used for the given query. The higher the return value, the more likely it is useful.
	// return values lie between 0.0 and 1.0, where 1.0 is the most useful.
	IsMatch(query Query) float64
	// Iterate over the key/value pairs given a query. Entries that match the query are passed to the iteratorFn.
	// it will not filter out double values
	Iterate(bucket *bbolt.Bucket, query Query, fn iteratorFn) error
	// BucketName returns the bucket path for this index
	BucketName() []byte
	// QueryPartsOutsideIndex selects the queryParts that are not covered by the index.
	QueryPartsOutsideIndex(query Query) []QueryPart
	// Depth returns the number of indexed fields
	Depth() int
	// Keys returns the scalars found in the document at the location specified by the FieldIndexer
	Keys(fi FieldIndexer, document Document) ([]Scalar, error)
}

Index describes an index. An index is based on one or more JSON paths and has a name. The name is used for storage and also as an identifier in search options.
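IsMatch returns a usefulness score per index, and Find returns ErrNoIndex when no index is suitable. A minimal sketch of how such a selection could work, assuming the highest score wins and a score of 0.0 means the index cannot serve the query; `scoredIndex`, `pickIndex` and `errNoIndex` are hypothetical, not leia's query planner.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoIndex mirrors the documented ErrNoIndex value.
var errNoIndex = errors.New("no index found")

// scoredIndex pairs an index name with the score its IsMatch method returned
// for the current query (between 0.0 and 1.0).
type scoredIndex struct {
	name  string
	score float64
}

// pickIndex returns the name of the highest-scoring index, or errNoIndex when
// every candidate scored 0.0. On a tie the first candidate wins.
func pickIndex(candidates []scoredIndex) (string, error) {
	best, bestScore, found := "", 0.0, false
	for _, c := range candidates {
		if c.score > bestScore {
			best, bestScore, found = c.name, c.score, true
		}
	}
	if !found {
		return "", errNoIndex
	}
	return best, nil
}

func main() {
	name, err := pickIndex([]scoredIndex{{"subject", 0.5}, {"compound", 1.0}})
	fmt.Println(name, err) // compound <nil>
}
```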

type IndexOption

type IndexOption func(fieldIndexer *fieldIndexer)

IndexOption is the option function for adding options to a FieldIndexer.

func TokenizerOption

func TokenizerOption(tokenizer Tokenizer) IndexOption

TokenizerOption is the option for a FieldIndexer to split a value to be indexed into multiple parts. Each part is then indexed separately.

func TransformerOption

func TransformerOption(transformer Transform) IndexOption

TransformerOption is the option for a FieldIndexer to apply a transformation before indexing the value. The transformation is also applied to a query value that matches the indexed field.

type Key

type Key []byte

Key is used as DB key type.

func ComposeKey

func ComposeKey(current Key, additional Key) Key

ComposeKey creates a new key from two keys.

func KeyOf

func KeyOf(value interface{}) Key

KeyOf creates a key from an interface value.

func (Key) Split

func (k Key) Split() []Key

Split splits a compound key into parts.

func (Key) String

func (k Key) String() string

String returns the string representation, only useful if a Key represents readable bytes
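ComposeKey, Split and the KeyDelimiter constant suggest how compound index keys are laid out. A minimal sketch, assuming ComposeKey joins the two keys with KeyDelimiter and Split cuts on it; `composeKey` and `splitKey` are hypothetical re-implementations, and the empty-key shortcut is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// keyDelimiter mirrors the documented KeyDelimiter constant.
const keyDelimiter = 0x10

// composeKey joins two keys with the delimiter byte. An empty current key
// returns a copy of additional, so the first part has no leading delimiter.
func composeKey(current, additional []byte) []byte {
	if len(current) == 0 {
		return append([]byte(nil), additional...)
	}
	out := make([]byte, 0, len(current)+1+len(additional))
	out = append(out, current...)
	out = append(out, keyDelimiter)
	return append(out, additional...)
}

// splitKey cuts a compound key back into its parts.
func splitKey(k []byte) [][]byte {
	return bytes.Split(k, []byte{keyDelimiter})
}

func main() {
	k := composeKey(composeKey(nil, []byte("did:nuts:123")), []byte("42"))
	for _, part := range splitKey(k) {
		fmt.Println(string(part))
	}
}
```

A scheme like this only splits correctly if no key part itself contains the delimiter byte 0x10.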

type Query

type Query struct {
	// contains filtered or unexported fields
}

Query represents a query with multiple arguments.

func New

func New(part QueryPart) Query

New creates a new query with an initial query part. For Range parts, both begin and end are inclusive in the conditional check.

func (Query) And

func (q Query) And(part QueryPart) Query

And adds a query part to the query. Query parts are combined using AND logic, so all parts must match.

type QueryPart

type QueryPart interface {
	QueryPathComparable
	// Seek returns the key for cursor.Seek
	Seek() Scalar
	// Condition returns true if given key falls within this condition.
	// The optional transform fn is applied to this query part before evaluation is done.
	Condition(key Key, transform Transform) bool
}

func Eq

func Eq(queryPath QueryPath, value Scalar) QueryPart

Eq creates a query part for an exact match.

func NotNil (added in v3.1.0)

func NotNil(queryPath QueryPath) QueryPart

NotNil creates a query part where the value must exist. This is done by finding results between byte 0x0 and 0xff.

func Prefix

func Prefix(queryPath QueryPath, value Scalar) QueryPart

Prefix creates a query part for a partial match: the beginning of a value is matched against the query.

func Range

func Range(queryPath QueryPath, begin Scalar, end Scalar) QueryPart

Range creates a query part for a range query.

type QueryPath

type QueryPath interface {
	Equals(other QueryPath) bool
}

QueryPath is the interface for the query path given in queries.

func NewIRIPath

func NewIRIPath(IRIs ...string) QueryPath

NewIRIPath creates a QueryPath of JSON-LD terms.

func NewJSONPath

func NewJSONPath(path string) QueryPath

NewJSONPath creates a JSON path query such as "person.path" or "person.children.#.path", where # is used to traverse arrays.
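The # segment fans out over array elements. A minimal sketch of that traversal over JSON decoded with encoding/json, assuming only plain object keys and # segments (no other gjson syntax); `collect` and `valuesAtPath` are hypothetical and much simpler than the gjson-based JSONPathValueCollector.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// collect walks a decoded JSON value along the path segments. A "#" segment
// visits every element of an array. Only scalar leaves are returned.
func collect(node interface{}, segments []string) []interface{} {
	if len(segments) == 0 {
		switch node.(type) {
		case string, float64, bool:
			return []interface{}{node}
		default:
			return nil // objects, arrays, null and missing values are skipped
		}
	}
	seg, rest := segments[0], segments[1:]
	if seg == "#" {
		arr, ok := node.([]interface{})
		if !ok {
			return nil
		}
		var out []interface{}
		for _, el := range arr {
			out = append(out, collect(el, rest)...)
		}
		return out
	}
	obj, ok := node.(map[string]interface{})
	if !ok {
		return nil
	}
	return collect(obj[seg], rest)
}

// valuesAtPath decodes the document and collects the values at a dotted path.
func valuesAtPath(doc []byte, path string) ([]interface{}, error) {
	var root interface{}
	if err := json.Unmarshal(doc, &root); err != nil {
		return nil, err
	}
	return collect(root, strings.Split(path, ".")), nil
}

func main() {
	doc := []byte(`{"person":{"children":[{"name":"Ann"},{"name":"Bob"}]}}`)
	vals, err := valuesAtPath(doc, "person.children.#.name")
	if err != nil {
		panic(err)
	}
	fmt.Println(vals) // [Ann Bob]
}
```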

type QueryPathComparable

type QueryPathComparable interface {
	// Equals returns true if the two QueryPathComparable have the same search path.
	Equals(other QueryPathComparable) bool
	// QueryPath returns the QueryPath
	QueryPath() QueryPath
}

QueryPathComparable defines whether two structs can be compared on their query path.

type Reference

type Reference []byte

Reference equals a document hash. In an index, the values are references to documents.

func (Reference) ByteSize

func (r Reference) ByteSize() int

ByteSize returns the size of the reference, e.g. 32 bytes for a SHA-256 hash.

func (Reference) EncodeToString

func (r Reference) EncodeToString() string

EncodeToString encodes the reference as a hex-encoded string.

type ReferenceFunc

type ReferenceFunc func(doc Document) Reference

ReferenceFunc is the func type used for creating references. References are the keys under which documents are stored. A ReferenceFunc could be the SHA-256 function or something that stores documents in chronological order. The former is best for random access, the latter for chronological access.

type ReferenceScanFn

type ReferenceScanFn func(key []byte, value []byte) error

ReferenceScanFn is a function type which is called with an index key and a document Reference as value

type Scalar

type Scalar interface {
	// Bytes returns the byte value
	Bytes() []byte
	// contains filtered or unexported methods
}

Scalar represents a JSON or JSON-LD scalar (string, number, true or false).

func JSONLDValueCollector

func JSONLDValueCollector(collection *collection, document Document, queryPath QueryPath) ([]Scalar, error)

JSONLDValueCollector collects values given a list of IRIs that represent the nesting of the objects.

func JSONPathValueCollector

func JSONPathValueCollector(_ *collection, document Document, queryPath QueryPath) ([]Scalar, error)

JSONPathValueCollector collects values at a given JSON path expression. Objects are delimited by a dot and lists use an extra # in the expression: object.list.#.key

func MustParseScalar

func MustParseScalar(value interface{}) Scalar

MustParseScalar returns a Scalar based on an interface value. It panics when the value is not supported.

func ParseScalar

func ParseScalar(value interface{}) (Scalar, error)

ParseScalar returns a Scalar based on an interface value. It returns ErrInvalidValue for unsupported values.

func ToLower

func ToLower(scalar Scalar) Scalar

ToLower maps all Unicode letters to their lower case. It only transforms values that implement the Stringer interface.

type Store

type Store interface {
	// JSONCollection creates or returns a JSON Collection.
	// On the db level it's a bucket for the documents and 1 bucket per index.
	JSONCollection(name string) Collection
	// JSONLDCollection creates or returns a JSON-LD Collection.
	// On the db level it's a bucket for the documents and 1 bucket per index.
	JSONLDCollection(name string) Collection
	// Close the bbolt DB
	Close() error
}

Store is the main interface for storing and finding documents.

func NewStore

func NewStore(dbFile string, options ...StoreOption) (Store, error)

NewStore creates a new store. The WithoutSync option disables flushing to disk, which is ideal for testing and bulk loading.

type StoreOption

type StoreOption func(store *store)

StoreOption is the function type for the Store options.

func WithDocumentLoader

func WithDocumentLoader(documentLoader ld.DocumentLoader) StoreOption

WithDocumentLoader overrides the default document loader.

func WithoutSync

func WithoutSync() StoreOption

WithoutSync is a store option which signals the underlying bbolt db to skip syncing with disk.
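StoreOption follows Go's functional options pattern: each option is a function that mutates the store while it is being constructed. A minimal sketch with a hypothetical `store` struct and `newStore` and `withoutSync` helpers; leia's real store type is unexported, and how it passes the flag on to bbolt is not shown here.

```go
package main

import "fmt"

// store is a hypothetical stand-in for leia's unexported store struct.
type store struct {
	dbFile string
	noSync bool
}

// storeOption mirrors StoreOption: a function that mutates the store being built.
type storeOption func(s *store)

// withoutSync mimics the documented WithoutSync option by setting a flag.
func withoutSync() storeOption {
	return func(s *store) { s.noSync = true }
}

// newStore sets defaults first, then applies every option in order.
func newStore(dbFile string, options ...storeOption) *store {
	s := &store{dbFile: dbFile}
	for _, opt := range options {
		opt(s)
	}
	return s
}

func main() {
	fmt.Println(newStore("my.db").noSync, newStore("my.db", withoutSync()).noSync) // false true
}
```

The pattern keeps the constructor signature stable: new options can be added without breaking existing callers.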

type StringScalar (added in v3.0.1)

type StringScalar string

func (StringScalar) Bytes (added in v3.0.1)

func (ss StringScalar) Bytes() []byte

type Tokenizer

type Tokenizer func(string) []string

Tokenizer is a function definition that transforms a text into tokens.

type Transform

type Transform func(Scalar) Scalar

Transform is a function definition for transforming values and search terms.

Directories

examples
jsonld
options
vcs
