datalake

package
v0.9.28
Published: Aug 19, 2022 License: Apache-2.0 Imports: 18 Imported by: 6

README

Data Lake

The datalake package is responsible for storing and retrieving raw data. By adding a layer of abstraction on top of various storage providers, the package enables you to access different storages in the exact same manner, as well as easily switch between them.

The package supports the following storage providers:

  • File system
  • Amazon S3
  • Redis

Usage

Let's assume we are indexing the Oasis blockchain: we have fetched a slice of validators from a node, and we want to store that data in an S3 bucket.

Setting up a data lake

Before we can configure a data lake, we need to initialize the storage provider of our choice. As we want to store data in an S3 bucket, we need to pass a region code and the bucket name to the NewS3Storage function.

storage := datalake.NewS3Storage(
  os.Getenv("AWS_S3_REGION"),
  os.Getenv("AWS_S3_BUCKET"),
)

Now we can call the NewDataLake function, passing the network name, the chain name, and the storage object, in that order.

dl := datalake.NewDataLake("oasis", "mainnet", storage)

Serializing a resource

The next step is creating a resource object by serializing our slice of validators into the JSON format.

res, err := datalake.NewJSONResource(validators)
if err != nil {
  log.Fatal(err)
}

The package supports the following serialization formats:

  • JSON
  • Binary
  • Base64

Storing a resource

Once the resource object is created, we can store it in the data lake with the following code:

err = dl.StoreResource(res, "validators.json")
if err != nil {
  log.Fatal(err)
}

In the example above, validators.json is an arbitrary name used to reference the resource.

As a result, the resource is stored in the S3 bucket under the oasis/mainnet/validators.json key.

Retrieving a resource

To retrieve a resource from the data lake we need to run the following code:

res, err = dl.RetrieveResource("validators.json")
if err != nil {
  log.Fatal(err)
}

Once again, validators.json is a name used to reference the resource.

Parsing the resource data

Next, we need to parse the resource data using the same format it was serialized with. In our case that is JSON, so we use the ScanJSON method.

var validators []Validator

err = res.ScanJSON(&validators)
if err != nil {
  log.Fatal(err)
}

As a result, the validators slice contains the validator data retrieved from the data lake.
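
Serializing and parsing are two halves of one round trip. The sketch below reproduces that round trip with only the standard library. The resource type, newJSONResource, scanJSON, and the Validator fields are stand-ins invented for illustration, and the sketch assumes the package's JSON format is plain encoding/json output, which the README does not confirm.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Validator is a hypothetical type; the README does not define its fields.
type Validator struct {
	Address string `json:"address"`
	Power   int64  `json:"power"`
}

// resource mirrors the documented Resource struct (a single Data field).
type resource struct {
	Data []byte
}

// newJSONResource is an assumed equivalent of NewJSONResource.
func newJSONResource(obj interface{}) (*resource, error) {
	data, err := json.Marshal(obj)
	if err != nil {
		return nil, err
	}
	return &resource{Data: data}, nil
}

// scanJSON is an assumed equivalent of the ScanJSON method.
func (r *resource) scanJSON(obj interface{}) error {
	return json.Unmarshal(r.Data, obj)
}

func main() {
	in := []Validator{{Address: "val1", Power: 10}}

	res, err := newJSONResource(in)
	if err != nil {
		log.Fatal(err)
	}

	var out []Validator
	if err := res.scanJSON(&out); err != nil {
		log.Fatal(err)
	}
	fmt.Println(out[0].Address, out[0].Power) // prints "val1 10"
}
```

Parsing with a different format than the one used for serialization would fail or yield garbage, which is why the Scan method must match the constructor.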

Checking if a resource is stored

To check whether a resource has been stored in the data lake, we can pass its name to the IsResourceStored method.

stored, err := dl.IsResourceStored("validators.json")
if err != nil {
  log.Fatal(err)
}

The method returns true if the resource exists in the data lake and false otherwise.
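
One plausible use of this check is to skip an expensive fetch when the data is already in the data lake. The sketch below shows that pattern against an in-memory stand-in. The lake interface, memLake, and storeIfMissing are hypothetical names, and the interface takes raw bytes rather than a *Resource to keep the example self-contained.

```go
package main

import "fmt"

// lake is a hypothetical, narrowed view of a data lake. Unlike the real
// StoreResource method, it accepts raw bytes instead of a *Resource.
type lake interface {
	IsResourceStored(name string) (bool, error)
	StoreResource(data []byte, name string) error
}

// memLake is an in-memory stand-in used only for this example.
type memLake map[string][]byte

func (m memLake) IsResourceStored(name string) (bool, error) {
	_, ok := m[name]
	return ok, nil
}

func (m memLake) StoreResource(data []byte, name string) error {
	m[name] = data
	return nil
}

// storeIfMissing calls fetch only when the resource is not stored yet.
// It reports whether fetch was invoked.
func storeIfMissing(l lake, name string, fetch func() ([]byte, error)) (bool, error) {
	stored, err := l.IsResourceStored(name)
	if err != nil {
		return false, err
	}
	if stored {
		return false, nil
	}
	data, err := fetch()
	if err != nil {
		return false, err
	}
	return true, l.StoreResource(data, name)
}

func main() {
	l := memLake{}
	fetch := func() ([]byte, error) { return []byte("[]"), nil }

	first, _ := storeIfMissing(l, "validators.json", fetch)
	second, _ := storeIfMissing(l, "validators.json", fetch)
	fmt.Println(first, second) // prints "true false"
}
```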

Storing a resource at height

To store data associated with a specific height (such as transactions), we need to use the StoreResourceAtHeight method and pass the height number as the last parameter. For example:

res, err := datalake.NewJSONResource(transactions)
if err != nil {
  log.Fatal(err)
}

err = dl.StoreResourceAtHeight(res, "transactions.json", 3000000)
if err != nil {
  log.Fatal(err)
}

This time, the resource is stored under the oasis/mainnet/height/3000000/transactions.json key.
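
The key layout can be pictured as a join of the network, the chain, a literal height segment, the height number, and the resource name. The sketch below is an illustration only: heightKey is a hypothetical helper, not part of the package, and the real implementation may build its keys differently.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
)

// heightKey is a hypothetical helper that mirrors the layout
// <network>/<chain>/height/<height>/<name> described in the README.
func heightKey(network, chain, name string, height int64) string {
	return path.Join(network, chain, "height", strconv.FormatInt(height, 10), name)
}

func main() {
	fmt.Println(heightKey("oasis", "mainnet", "transactions.json", 3000000))
	// prints "oasis/mainnet/height/3000000/transactions.json"
}
```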

Retrieving a resource at height

To retrieve a resource stored in such a way, we need to use the RetrieveResourceAtHeight method and pass the same height number as the second parameter.

res, err = dl.RetrieveResourceAtHeight("transactions.json", 3000000)
if err != nil {
  log.Fatal(err)
}

Documentation

Constants

This section is empty.

Variables

var ErrResourceNameRequired = errors.New("resource name is required")

ErrResourceNameRequired is returned when the resource name is an empty string
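
Callers can tell a sentinel error like this one apart from other failures with errors.Is. The sketch below redeclares the error locally so that it compiles on its own. validateName is a hypothetical function, since the documentation does not say which methods perform this check.

```go
package main

import (
	"errors"
	"fmt"
)

// errResourceNameRequired mirrors the documented sentinel error.
var errResourceNameRequired = errors.New("resource name is required")

// validateName is a hypothetical check; where the package performs
// this validation is not stated in the documentation.
func validateName(name string) error {
	if name == "" {
		return errResourceNameRequired
	}
	return nil
}

func main() {
	err := validateName("")
	if errors.Is(err, errResourceNameRequired) {
		fmt.Println("missing name:", err)
	}
}
```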

Functions

This section is empty.

Types

type DataLake

type DataLake struct {
	// contains filtered or unexported fields
}

DataLake represents raw data storage

func NewDataLake

func NewDataLake(network string, chain string, storage Storage) *DataLake

NewDataLake creates a data lake with the given storage provider

func (*DataLake) IsResourceStored

func (dl *DataLake) IsResourceStored(name string) (bool, error)

IsResourceStored checks if the resource is stored

func (*DataLake) IsResourceStoredAtHeight

func (dl *DataLake) IsResourceStoredAtHeight(name string, height int64) (bool, error)

IsResourceStoredAtHeight checks if the resource is stored at the given height

func (*DataLake) RetrieveResource

func (dl *DataLake) RetrieveResource(name string) (*Resource, error)

RetrieveResource retrieves the resource data

func (*DataLake) RetrieveResourceAtHeight

func (dl *DataLake) RetrieveResourceAtHeight(name string, height int64) (*Resource, error)

RetrieveResourceAtHeight retrieves the resource data at the given height

func (*DataLake) StoreResource

func (dl *DataLake) StoreResource(res *Resource, name string) error

StoreResource stores the resource data

func (*DataLake) StoreResourceAtHeight

func (dl *DataLake) StoreResourceAtHeight(res *Resource, name string, height int64) error

StoreResourceAtHeight stores the resource data at the given height

type Resource

type Resource struct {
	Data []byte
}

Resource represents an object being stored

func NewBase64Resource

func NewBase64Resource(obj interface{}) (*Resource, error)

NewBase64Resource creates a Base64 resource

func NewBinaryResource

func NewBinaryResource(obj interface{}) (*Resource, error)

NewBinaryResource creates a binary resource

func NewJSONResource

func NewJSONResource(obj interface{}) (*Resource, error)

NewJSONResource creates a JSON resource

func NewResource

func NewResource(data []byte) *Resource

NewResource creates a resource

func (*Resource) ScanBase64

func (r *Resource) ScanBase64(obj interface{}) error

ScanBase64 parses the resource data as Base64

func (*Resource) ScanBinary

func (r *Resource) ScanBinary(obj interface{}) error

ScanBinary parses the resource data as binary

func (*Resource) ScanJSON

func (r *Resource) ScanJSON(obj interface{}) error

ScanJSON parses the resource data as JSON

type Storage

type Storage interface {
	Store(data []byte, path ...string) error
	IsStored(path ...string) (bool, error)
	Retrieve(path ...string) ([]byte, error)
}

Storage is an interface for storing and retrieving raw data
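
Because Storage is a small interface, a custom provider only needs three methods. The sketch below is a minimal in-memory provider of the kind that might be handy in tests. memoryStorage, newMemoryStorage, and key are hypothetical names, and joining path segments with a slash is an assumption about how a provider might map them to a key.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// Storage repeats the documented interface so the sketch compiles on its own.
type Storage interface {
	Store(data []byte, path ...string) error
	IsStored(path ...string) (bool, error)
	Retrieve(path ...string) ([]byte, error)
}

// memoryStorage is a hypothetical provider, not part of the package.
type memoryStorage struct {
	mu    sync.RWMutex
	items map[string][]byte
}

func newMemoryStorage() *memoryStorage {
	return &memoryStorage{items: make(map[string][]byte)}
}

// key joins the path segments into a single map key.
func key(path []string) string { return strings.Join(path, "/") }

func (s *memoryStorage) Store(data []byte, path ...string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[key(path)] = append([]byte(nil), data...) // copy to avoid aliasing
	return nil
}

func (s *memoryStorage) IsStored(path ...string) (bool, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.items[key(path)]
	return ok, nil
}

func (s *memoryStorage) Retrieve(path ...string) ([]byte, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	data, ok := s.items[key(path)]
	if !ok {
		return nil, errors.New("not found: " + key(path))
	}
	return append([]byte(nil), data...), nil
}

func main() {
	var s Storage = newMemoryStorage()
	_ = s.Store([]byte("[]"), "oasis", "mainnet", "validators.json")
	ok, _ := s.IsStored("oasis", "mainnet", "validators.json")
	fmt.Println(ok) // prints "true"
}
```

A value of this type satisfies the Storage interface, so it could in principle be passed wherever the package accepts a Storage.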

func NewFileStorage

func NewFileStorage(dir string) (Storage, error)

NewFileStorage creates a filesystem storage

func NewRedisStorage

func NewRedisStorage(addr string, exp time.Duration) Storage

NewRedisStorage creates a Redis storage

func NewS3Storage

func NewS3Storage(region string, bucket string) Storage

NewS3Storage creates an Amazon S3 storage

Directories

Path Synopsis
Package mock is a generated GoMock package.
Package mock is a generated GoMock package.
