indexer

v0.2.12
Published: Apr 8, 2022 License: Apache-2.0, MIT Imports: 4 Imported by: 0

README

go-indexer-core

Storage specialized for indexing provider content

The indexer-core is a key-value store that is optimized for storing large numbers of multihashes mapping to relatively few provider data objects. A multihash (CID without codec) uniquely identifies a piece of content, and the provider data describes where and how to retrieve the content.

Content is indexed by giving a provider data object (the value) and a set of multihashes (keys) that map to that value. Typically, the provider value represents a storage deal and the multihash keys are content stored within that deal. To subsequently retrieve a provider value, the indexer-core is given a multihash key to look up.

Provider data can be updated and removed independently from the multihashes that map to it. Provider data is uniquely identified by a provider ID and a context ID. The context ID is given to the indexer core as part of the provider data value. When a provider data object is updated, all subsequent multihash queries will return the new value for that data whether the queries are cached or not.
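The relationship described above (many multihashes, few provider data objects, with updates applied independently of the keys) can be sketched with two maps. This is a minimal model using only the standard library, not the indexer-core implementation; the names `providerKey`, `index`, `put`, and `get` are hypothetical, and plain strings stand in for multihashes and peer IDs.

```go
package main

import "fmt"

// providerKey identifies a provider data object by provider ID and context ID.
// Hypothetical type: strings stand in for peer.ID and the context ID bytes.
type providerKey struct {
	providerID string
	contextID  string
}

// index is a hypothetical in-memory model of the indirection: multihashes map
// to provider keys, and provider keys map to the provider data itself.
type index struct {
	values map[providerKey][]byte          // provider data per provider/context
	keys   map[string]map[providerKey]bool // multihash -> set of provider keys
}

func newIndex() *index {
	return &index{
		values: make(map[providerKey][]byte),
		keys:   make(map[string]map[providerKey]bool),
	}
}

// put stores or updates the metadata for pk and maps each multihash to pk.
// Calling put without multihashes only updates the stored metadata.
func (ix *index) put(pk providerKey, metadata []byte, multihashes ...string) {
	ix.values[pk] = metadata
	for _, mh := range multihashes {
		if ix.keys[mh] == nil {
			ix.keys[mh] = make(map[providerKey]bool)
		}
		ix.keys[mh][pk] = true
	}
}

// get returns the current metadata of every provider data object mh maps to.
func (ix *index) get(mh string) [][]byte {
	var out [][]byte
	for pk := range ix.keys[mh] {
		if md, ok := ix.values[pk]; ok {
			out = append(out, md)
		}
	}
	return out
}

func main() {
	ix := newIndex()
	pk := providerKey{providerID: "provider-1", contextID: "deal-42"}
	ix.put(pk, []byte("v1"), "mh-a", "mh-b")

	// Update the provider data once, without listing any multihashes.
	ix.put(pk, []byte("v2"))

	// Both multihashes now resolve to the updated metadata.
	fmt.Printf("%s %s\n", ix.get("mh-a"), ix.get("mh-b"))
}
```

Because each multihash stores only a reference to the provider data, one update is visible through every key that maps to it, which is why this layout suits large key sets pointing at few values.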

This indexer-core is the component of an indexer that provides data storage and retrieval for content index data. An indexer must also supply all the service functionality necessary to create an indexing service, which is not included in the indexer-core component.

Configurable Cache

An integrated cache is included to aid in fast index lookups. By default the cache is configured as a retrieval cache: items are stored in the cache only when index data is looked up, which speeds up repeated lookups of the same data. The cache can optionally be disabled, and its size is configurable. The cache interface allows alternative cache implementations to be used if desired.
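To illustrate what "retrieval cache" means here, the following sketch populates a cache only on lookup and treats a size of zero as "cache disabled". It is a simplified stand-in, not the radixcache implementation: `retrievalCache`, `lookupFunc`, and the arbitrary-entry eviction are assumptions made for illustration.

```go
package main

import "fmt"

// lookupFunc stands in for a lookup against persistent storage.
type lookupFunc func(key string) ([]string, bool)

// retrievalCache is a hypothetical read-through cache: entries are added only
// when a lookup finds data in the backend, never when data is written.
type retrievalCache struct {
	maxSize int
	items   map[string][]string
	backend lookupFunc
	hits    int
	misses  int
}

func newRetrievalCache(maxSize int, backend lookupFunc) *retrievalCache {
	return &retrievalCache{
		maxSize: maxSize,
		items:   make(map[string][]string),
		backend: backend,
	}
}

// get serves from the cache when possible and otherwise falls through to the
// backend, caching the result. A maxSize of zero disables caching entirely.
func (c *retrievalCache) get(key string) ([]string, bool) {
	if c.maxSize > 0 {
		if v, ok := c.items[key]; ok {
			c.hits++
			return v, true
		}
	}
	c.misses++
	v, found := c.backend(key)
	if found && c.maxSize > 0 {
		if len(c.items) >= c.maxSize {
			// Placeholder eviction for illustration: drop an arbitrary entry.
			for k := range c.items {
				delete(c.items, k)
				break
			}
		}
		c.items[key] = v
	}
	return v, found
}

func main() {
	store := map[string][]string{"mh-a": {"provider-1"}}
	backend := func(key string) ([]string, bool) {
		v, ok := store[key]
		return v, ok
	}

	c := newRetrievalCache(65536, backend)
	c.get("mh-a") // miss: read from the backend, then cached
	c.get("mh-a") // hit: served from the cache
	fmt.Printf("hits=%d misses=%d\n", c.hits, c.misses)
}
```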

See Usage Example for details.

Choice of Persistent Storage

The persistent storage is provided by a choice of storage systems that include storethehash, pogreb, and an in-memory implementation. The storage interface allows any other storage system to be adapted.

See Usage Example for details.

Install

 go get github.com/filecoin-project/go-indexer-core

Usage

import "github.com/filecoin-project/go-indexer-core"

See pkg.go.dev documentation

Example
package main

import (
	"log"
	"os"

	"github.com/filecoin-project/go-indexer-core"
	"github.com/filecoin-project/go-indexer-core/cache"
	"github.com/filecoin-project/go-indexer-core/cache/radixcache"
	"github.com/filecoin-project/go-indexer-core/engine"
	"github.com/filecoin-project/go-indexer-core/store/pogreb"
	"github.com/filecoin-project/go-indexer-core/store/storethehash"
	"github.com/ipfs/go-cid"
	"github.com/libp2p/go-libp2p-core/peer"
)

func main() {
	// Configuration values.
	const valueStoreDir = "/tmp/indexvaluestore"
	const storeType = "sth"
	const cacheSize = 65536

	// Create value store of configured type.
	if err := os.MkdirAll(valueStoreDir, 0770); err != nil {
		log.Fatal(err)
	}
	var valueStore indexer.Interface
	var err error
	if storeType == "sth" {
		valueStore, err = storethehash.New(valueStoreDir)
	}
	if storeType == "pogreb" {
		valueStore, err = pogreb.New(valueStoreDir)
	}
	if err != nil {
		log.Fatal(err)
	}

	// Create result cache, or disable it.
	var resultCache cache.Interface
	if cacheSize > 0 {
		resultCache = radixcache.New(cacheSize)
	} else {
		log.Print("Result cache disabled")
	}

	// Create indexer core.
	indexerCore := engine.New(resultCache, valueStore)

	// Put some index data into indexer core.
	cid1, _ := cid.Decode("QmPNHBy5h7f19yJDt7ip9TvmMRbqmYsa6aetkrsc1ghjLB")
	cid2, _ := cid.Decode("QmUaPc2U1nUJeVj6HxBxS5fGxTWAmpvzwnhB8kavMVAotE")
	peerID, _ := peer.Decode("12D3KooWKRyzVWW6ChFjQjK4miCty85Niy48tpPV95XdKu1BcvMA")
	ctxID := []byte("someCtxID")
	value := indexer.Value{
		ProviderID:    peerID,
		ContextID:     ctxID,
		MetadataBytes: []byte("someMetadata"),
	}
	err = indexerCore.Put(value, cid1.Hash(), cid2.Hash())
	if err != nil {
		log.Fatal(err)
	}

	// Lookup provider data by multihash.
	values, found, err := indexerCore.Get(cid1.Hash())
	if err != nil {
		log.Fatal(err)
	}
	if found {
		log.Printf("Found %d values for cid1", len(values))
	}
	
	// Remove provider values by contextID, and multihashes that map to them.
	err = indexerCore.RemoveProviderContext(peerID, ctxID)
	if err != nil {
		log.Fatal(err)
	}
}

License

This project is dual-licensed under Apache 2.0 and MIT terms.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func MarshalValue

func MarshalValue(value Value) ([]byte, error)

MarshalValue serializes a single Value.

func MarshalValueKeys added in v0.2.6

func MarshalValueKeys(valKeys [][]byte) ([]byte, error)

MarshalValueKeys serializes a list of value keys for storage.

TODO: Switch from JSON to a more efficient serialization format once we figure out the right data structure?

func UnmarshalValueKeys added in v0.2.6

func UnmarshalValueKeys(b []byte) ([][]byte, error)

UnmarshalValueKeys deserializes a serialized list of value keys.
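The TODO above indicates that the key list is serialized as JSON. As a rough sketch of what such a round trip looks like, the code below uses the standard encoding/json package on a [][]byte; `marshalKeys` and `unmarshalKeys` are hypothetical stand-ins, and the package's actual encoding may differ in detail.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalKeys is a hypothetical stand-in for MarshalValueKeys, assuming plain
// JSON encoding. encoding/json writes each []byte as a base64 string.
func marshalKeys(valKeys [][]byte) ([]byte, error) {
	return json.Marshal(valKeys)
}

// unmarshalKeys is the matching stand-in for UnmarshalValueKeys.
func unmarshalKeys(b []byte) ([][]byte, error) {
	var valKeys [][]byte
	if err := json.Unmarshal(b, &valKeys); err != nil {
		return nil, err
	}
	return valKeys, nil
}

func main() {
	in := [][]byte{[]byte("provider-1/ctx-a"), []byte("provider-2/ctx-b")}

	data, err := marshalKeys(in)
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(string(data)) // a JSON array of base64 strings

	out, err := unmarshalKeys(data)
	if err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	for _, k := range out {
		fmt.Printf("%s\n", k)
	}
}
```

The base64 expansion of byte slices is one reason a more compact format could be attractive, as the TODO suggests.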

Types

type Interface

type Interface interface {
	// Get retrieves a slice of Value for a multihash.
	Get(multihash.Multihash) ([]Value, bool, error)

	// Put stores a Value and adds a mapping from each of the given multihashes
	// to that Value.  If the Value has the same ProviderID and ContextID as a
	// previously stored Value, then update the metadata in the stored Value
	// with the metadata from the provided Value.  Call Put without any
	// multihashes to only update existing values.
	Put(Value, ...multihash.Multihash) error

	// Remove removes the mapping of each multihash to the specified value.
	Remove(Value, ...multihash.Multihash) error

	// RemoveProvider removes all values for specified provider.  This is used
	// when a provider is no longer indexed by the indexer.
	RemoveProvider(peer.ID) error

	// RemoveProviderContext removes all values for specified provider that
	// have the specified contextID.  This is used when a provider no longer
	// provides values for a particular context.
	RemoveProviderContext(providerID peer.ID, contextID []byte) error

	// Size returns the total bytes of storage used to store the indexed
	// content in persistent storage.  This does not include memory used by any
	// in-memory cache that the indexer implementation may have, as that would
	// only contain a limited quantity of data and not represent the total
	// amount of data stored by the indexer.
	Size() (int64, error)

	// Flush commits any changes to the value storage.
	Flush() error

	// Close gracefully closes the store, flushing all pending data from memory.
	Close() error

	// Iter creates a new value store iterator.
	Iter() (Iterator, error)
}

type Iterator

type Iterator interface {
	// Next returns the next multihash and the values it indexes.  Returns
	// io.EOF when finished iterating.
	Next() (multihash.Multihash, []Value, error)
}

Iterator iterates multihashes and values in the value store.

It should be assumed that any write operation invalidates the iterator.
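The usual way to consume an iterator with this contract is to call Next in a loop until it returns io.EOF. The sketch below shows that loop against a hypothetical slice-backed iterator; `entry`, `sliceIterator`, and `countEntries` are illustrative names, and strings stand in for multihash.Multihash and Value.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// entry pairs a key with its values. Hypothetical type: strings stand in for
// multihash.Multihash and Value.
type entry struct {
	key    string
	values []string
}

// sliceIterator is a hypothetical iterator over a fixed slice that follows the
// same contract as Iterator: Next returns io.EOF when iteration is finished.
type sliceIterator struct {
	entries []entry
	pos     int
}

func (it *sliceIterator) Next() (string, []string, error) {
	if it.pos >= len(it.entries) {
		return "", nil, io.EOF
	}
	e := it.entries[it.pos]
	it.pos++
	return e.key, e.values, nil
}

// countEntries drains the iterator, treating io.EOF as the normal end of
// iteration and any other error as a failure.
func countEntries(it *sliceIterator) (int, error) {
	n := 0
	for {
		_, _, err := it.Next()
		if errors.Is(err, io.EOF) {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		n++
	}
}

func main() {
	it := &sliceIterator{entries: []entry{
		{key: "mh-a", values: []string{"provider-1"}},
		{key: "mh-b", values: []string{"provider-1", "provider-2"}},
	}}
	n, err := countEntries(it)
	if err != nil {
		fmt.Println("iteration error:", err)
		return
	}
	fmt.Println("entries:", n)
}
```

Since writes invalidate the iterator, a caller would typically finish iterating before performing any Put or Remove.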

type Value

type Value struct {
	// ProviderID is the peer ID of the provider of the multihash.
	ProviderID peer.ID `json:"p"`
	// ContextID identifies the metadata that is part of this value.
	ContextID []byte `json:"c"`
	// MetadataBytes is serialized metadata.  This is kept serialized, because
	// the indexer only uses the serialized form of this data.
	MetadataBytes []byte `json:"m,omitempty"`
}

Value is the value of an index entry that is stored for each multihash in the indexer.
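The struct tags above determine the field names used when a Value is encoded as JSON. The sketch below mirrors those tags on a local type to show their effect; it assumes plain encoding/json behavior, uses a string in place of peer.ID, and the names `value` and `encode` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// value mirrors the JSON tags of Value. Hypothetical type: ProviderID is a
// plain string here rather than a peer.ID.
type value struct {
	ProviderID    string `json:"p"`
	ContextID     []byte `json:"c"`
	MetadataBytes []byte `json:"m,omitempty"`
}

// encode returns the JSON form of v as a string.
func encode(v value) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

func main() {
	withMeta, err := encode(value{
		ProviderID:    "provider-1",
		ContextID:     []byte("ctx"),
		MetadataBytes: []byte("meta"),
	})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(withMeta)

	// With no metadata, the omitempty option drops the "m" field entirely.
	withoutMeta, err := encode(value{ProviderID: "provider-1", ContextID: []byte("ctx")})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(withoutMeta)
}
```

The single-letter field names keep each stored record small, which matters when the same structure is written many times.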

func UnmarshalValue

func UnmarshalValue(b []byte) (Value, error)

func (Value) Equal

func (v Value) Equal(other Value) bool

Equal returns true if two Value instances are identical.

func (Value) Match

func (v Value) Match(other Value) bool

Match returns true if both values have the same ProviderID and ContextID.

func (Value) MatchEqual

func (v Value) MatchEqual(other Value) (isMatch bool, isEqual bool)

MatchEqual returns true for the first bool if both values have the same ProviderID and ContextID, and returns true for the second bool if the metadata is also equal.
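The difference between the three comparisons can be sketched on a local type: a match compares only the identifying fields, while equality also compares the metadata. This follows the descriptions above rather than the package source; `value` and its lowercase methods are hypothetical, and a string stands in for peer.ID.

```go
package main

import (
	"bytes"
	"fmt"
)

// value is a hypothetical stand-in for Value, with a string provider ID.
type value struct {
	ProviderID    string
	ContextID     []byte
	MetadataBytes []byte
}

// match reports whether both values identify the same provider data object,
// that is, they share a ProviderID and ContextID.
func (v value) match(other value) bool {
	return v.ProviderID == other.ProviderID && bytes.Equal(v.ContextID, other.ContextID)
}

// matchEqual reports whether the values match and, if so, whether their
// metadata is equal as well.
func (v value) matchEqual(other value) (isMatch bool, isEqual bool) {
	if !v.match(other) {
		return false, false
	}
	return true, bytes.Equal(v.MetadataBytes, other.MetadataBytes)
}

// equal reports whether the two values are identical in every field.
func (v value) equal(other value) bool {
	_, isEqual := v.matchEqual(other)
	return isEqual
}

func main() {
	stored := value{ProviderID: "provider-1", ContextID: []byte("ctx"), MetadataBytes: []byte("v1")}
	update := value{ProviderID: "provider-1", ContextID: []byte("ctx"), MetadataBytes: []byte("v2")}

	isMatch, isEqual := stored.matchEqual(update)
	// A match that is not equal is the case where Put updates stored metadata.
	fmt.Println(isMatch, isEqual, stored.equal(update))
}
```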

Directories

Path          Synopsis
store/memory  Package memory defines an in-memory value store. The index data stored by the memory value store is not persisted.
store/test    Package test provides tests and benchmarks that are usable by any store that implements store.Interface.
