Published: Sep 12, 2022 License: MIT


Data node

Version 0.53.0

A service exposing read-only APIs built on top of the Vega platform.

Data node provides the following core features:

  • Consumes all events from Vega core
  • Aggregates received events and stores the aggregated data
  • Serves stored data via APIs
  • Allows advanced configuration (see Configure a node)

  • For new developers, see Getting Started.
  • For major updates, see the Change log.
  • For the design of the system and its architecture, please read the documentation.
  • Please open an issue if anything is missing or unclear in this documentation.

Installation

To install see Getting Started.

Configuration

Data node is initialised with a default configuration by running the command data-node init. To override any of the defaults, edit your config.toml, typically found in the ~/.data-node directory. For example:

[Matching]
  Level = 0
  ProRataMode = false
  LogPriceLevelsDebug = false
  LogRemovedOrdersDebug = false
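
If you script your node setup, a small helper can locate the file to edit. A minimal Go sketch, assuming the ~/.data-node/config.toml location described above (your installation may differ); configPath is a hypothetical helper, not part of data node:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// configPath returns the typical location of data node's config.toml
// under the given home directory: <home>/.data-node/config.toml.
// Hypothetical helper; the location is taken from the README text.
func configPath(home string) string {
	return filepath.Join(home, ".data-node", "config.toml")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine home directory:", err)
		os.Exit(1)
	}
	path := configPath(home)
	// Report whether the file exists; data-node init should have created it.
	if _, err := os.Stat(path); err != nil {
		fmt.Println("no config found at", path, "- try running data-node init first")
		return
	}
	fmt.Println("edit your overrides in", path)
}
```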

PostgreSQL

As of version 0.53, data node uses PostgreSQL as its storage back end instead of the previous mix of in-memory and BadgerDB file stores. We also make use of a Postgres extension called TimescaleDB, which adds a number of time-series-specific features.

Postgres is not an embedded database but a separate server application that must be running before data node starts. As a result, this transition requires a little setup from the data node operator.

By default, data node attempts to connect to a database called vega on a server listening on localhost:5432, using the username and password vega. All of this is configurable in data node's config.toml file.

We develop against PostgreSQL 14.2 and TimescaleDB 2.7.1 and strongly recommend that you use the same versions. The corresponding settings in config.toml are:

[SQLStore]
 UseEmbedded = false
 [SQLStore.ConnectionConfig]
   Host = "localhost"
   Port = 5432
   Username = "vega"
   Password = "vega"
   Database = "vega"
   UseTransactions = true
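
To check by hand that these settings point at a reachable database, it can help to render them as a connection URL, which psql also accepts. A minimal Go sketch; ConnectionConfig here is a hypothetical mirror of the TOML keys above, not data node's own type, and UseTransactions is left out because it does not affect the connection string:

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// ConnectionConfig mirrors the [SQLStore.ConnectionConfig] keys shown above.
// Hypothetical stand-in for illustration only.
type ConnectionConfig struct {
	Host     string
	Port     int
	Username string
	Password string
	Database string
}

// DSN renders the settings as a PostgreSQL connection URL.
// url.URL escapes special characters in the username and password.
func (c ConnectionConfig) DSN() string {
	u := url.URL{
		Scheme: "postgres",
		User:   url.UserPassword(c.Username, c.Password),
		Host:   net.JoinHostPort(c.Host, strconv.Itoa(c.Port)),
		Path:   "/" + c.Database,
	}
	return u.String()
}

func main() {
	cfg := ConnectionConfig{Host: "localhost", Port: 5432, Username: "vega", Password: "vega", Database: "vega"}
	// The printed URL can be passed straight to psql to test the credentials.
	fmt.Println(cfg.DSN())
}
```

With the defaults above this prints postgres://vega:vega@localhost:5432/vega.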

Persistence

Currently, the database is destroyed (if it exists) and recreated at data node start-up. We expect this to change in the not-too-distant future, once the schema has settled down and we add support for starting and stopping data nodes without replaying the entire chain.

There are a few different ways to get Postgres and TimescaleDB up and running.

Using docker

This is probably the most straightforward and reliable way to get up and running.

Timescale supplies a Docker image, so assuming you already have Docker installed, it is a simple matter of running:

docker run --rm \
           -d \
           -e POSTGRES_USER=vega \
           -e POSTGRES_PASSWORD=vega \
           -e POSTGRES_DB=vega \
           -p 5432:5432 \
           timescale/timescaledb:2.7.1-pg14

Using your operating system's native packages

Timescale provides instructions for installing Postgres and TimescaleDB using .deb or .rpm packages they have built. If you follow these and get Postgres running as a system service, you will then need to create a database, user, and password for the data node to use. For example:

➜  ~ sudo -u postgres psql
psql (14.3 (Ubuntu 14.3-0ubuntu0.22.04.1))
Type "help" for help.


postgres=# create database vega;
CREATE DATABASE

postgres=# create user vega with password 'vega';
CREATE ROLE

postgres=# grant all privileges on database vega to vega;
GRANT

Using 'embedded' PostgreSQL

As mentioned above, PostgreSQL is not an embedded database. However, the good folks over at embedded-postgres-go didn't let that stop them from trying.

This Go package allows us to start a PostgreSQL server from data node. It does this by:

  • Examining your system to figure out what platform/architecture it is
  • Downloading an appropriate PostgreSQL binary installation
  • Unpacking it to a temporary location
  • Configuring and launching Postgres as a child process of data-node
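
The first two steps amount to mapping Go's view of the host platform onto a downloadable archive. A minimal sketch of that idea only; the supported-platform set and the postgresql-<os>-<arch>.txz naming are hypothetical and do not reflect the real embedded-postgres-go or Vega binaries:

```go
package main

import (
	"fmt"
	"runtime"
)

// supported lists the platforms this sketch pretends to have binaries for.
// Both the set and the archive naming below are hypothetical.
var supported = map[string]bool{
	"linux/amd64":  true,
	"linux/arm64":  true,
	"darwin/amd64": true,
	"darwin/arm64": true,
}

// archiveName picks a platform-specific PostgreSQL archive for goos/goarch,
// or reports that no prebuilt binary is available.
func archiveName(goos, goarch string) (string, error) {
	key := goos + "/" + goarch
	if !supported[key] {
		return "", fmt.Errorf("no prebuilt PostgreSQL binaries for %s", key)
	}
	return fmt.Sprintf("postgresql-%s-%s.txz", goos, goarch), nil
}

func main() {
	// runtime.GOOS and runtime.GOARCH describe the platform this binary runs on.
	name, err := archiveName(runtime.GOOS, runtime.GOARCH)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("would download", name)
}
```

Failing early with a clear error on unsupported platforms is one way to surface the limited platform coverage mentioned below.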

embedded-postgres-go doesn't support TimescaleDB, so we forked it and built our own binaries for a limited set of platforms, which we host on GitHub.

We use it for running integration tests and it works quite well. However, we haven't tested it on a wide range of platforms, and we have run into a few odd issues, usually related to linking against various system libraries or, occasionally, to Postgres not shutting down cleanly.

You can launch Postgres this way either with the command

data-node postgres run

which launches embedded Postgres in its own process, or by setting

[SQLStore]
  UseEmbedded = true

This causes data-node to launch Postgres as it starts up and stop it when it exits. While convenient, if data-node is forcefully killed and doesn't have a chance to shut down, Postgres may keep running. It then needs to be killed manually to prevent 'unable to bind to port' errors on the next start.

In both cases, the files for the database will be stored in your 'state' directory, e.g. ~/.local/state/vega/data-node/ on Linux.
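
On Linux, the example path follows the XDG Base Directory convention for state files. A minimal Go sketch of how that location is usually resolved, assuming data node honours $XDG_STATE_HOME; stateDir is a hypothetical helper, and other platforms use different locations:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// stateDir resolves <state home>/vega/data-node following the XDG Base
// Directory convention: $XDG_STATE_HOME if set to an absolute path,
// otherwise $HOME/.local/state. Relative values are ignored, as the spec asks.
func stateDir(xdgStateHome, home string) string {
	base := xdgStateHome
	if base == "" || !filepath.IsAbs(base) {
		base = filepath.Join(home, ".local", "state")
	}
	return filepath.Join(base, "vega", "data-node")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine home directory:", err)
		os.Exit(1)
	}
	fmt.Println(stateDir(os.Getenv("XDG_STATE_HOME"), home))
}
```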

Building from source

It's quite straightforward; if this is your preferred option, you probably already know how to do it. Instructions are available on the Timescale website.

Using a cloud database provider

This isn't something we've tested yet, but it's something we plan to investigate in the future. Feel very free to give it a try; our main concern is that connection latency may prevent data-node from processing blocks as fast as they are produced.

Timescale provides a hosted service; I believe AWS does as well.

Vega core streaming

Data node requires a Vega core node in order to function meaningfully. Please see Vega Getting Started. By default, data node listens on port 3002 for incoming connections from the Vega core node.

APIs

In order for clients to communicate with data nodes, we expose a set of APIs and methods for reading data.

There are currently three protocols to communicate with the data node APIs:

gRPC

gRPC is an open source remote procedure call (RPC) system initially developed at Google. In data node the gRPC API features streaming of events in addition to standard procedure calls.

The default port (configurable) for the gRPC API is 3007, and the API matches the gRPC protobuf definitions.

GraphQL

GraphQL is an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data, originally developed at Facebook. The Console uses the GraphQL API to retrieve data including streaming of events.

The GraphQL API is defined by a schema. External clients will use this schema to communicate with Vega.

Queries can be tested using the GraphQL playground app, which is bundled with a node. The default port (configurable) for the playground app is 3008; opening it in a web browser shows a web app for testing custom queries, mutations and subscriptions.

GraphQL SSL

GraphQL subscriptions do not work properly unless HTTPS is enabled.

To enable TLS on the GraphQL port, set

  [Gateway.GraphQL]
    HTTPSEnabled = true

You will need your data node to be reachable over the internet with a proper fully qualified domain name, and a matching certificate. If you already have a certificate and corresponding private key file, you can specify them as follows:

  [Gateway.GraphQL]
    CertificateFile = "/path/to/certificate/file"
    KeyFile = "/path/to/key/file"
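
Before pointing data node at these files, you may want to confirm that the key matches the certificate and that the certificate covers your domain. A minimal Go sketch using the standard library; checkCert is a hypothetical helper, not a data node command, and it does not check expiry or the chain of trust:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

// checkCert loads a certificate/key pair (which fails if the key does not
// match the certificate) and confirms the leaf certificate is valid for domain.
func checkCert(certFile, keyFile, domain string) error {
	pair, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return fmt.Errorf("loading key pair: %w", err)
	}
	leaf, err := x509.ParseCertificate(pair.Certificate[0])
	if err != nil {
		return fmt.Errorf("parsing certificate: %w", err)
	}
	return leaf.VerifyHostname(domain)
}

func main() {
	if len(os.Args) != 4 {
		fmt.Fprintln(os.Stderr, "usage: checkcert CERT_FILE KEY_FILE DOMAIN")
		os.Exit(2)
	}
	if err := checkCert(os.Args[1], os.Args[2], os.Args[3]); err != nil {
		fmt.Fprintln(os.Stderr, "certificate problem:", err)
		os.Exit(1)
	}
	fmt.Println("certificate and key match and cover", os.Args[3])
}
```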

If you prefer, the data node can manage this for you by automatically generating a certificate and using Let's Encrypt to sign it for you.

  [Gateway.GraphQL]
    HTTPSEnabled = true
    AutoCertDomain = "my.lovely.domain.com"

However, the Let's Encrypt validation process requires that the server answering its challenge runs on the standard HTTPS port (443). This means you must either

  • Forward port 443 on your machine to the GraphQL port (3008 by default) using iptables or similar
  • Directly use port 443 for the GraphQL server in data-node by specifying
  [Gateway.GraphQL]
    Port = 443

Note that Linux systems generally require processes listening on ports under 1024 to either

  • run as root, or
  • be specifically granted permission, e.g. by giving the binary the capability before launching it:
sudo setcap cap_net_bind_service=ep /path/to/data-node
data-node run

GraphQL Complexity

Currently the GraphQL complexity limit is globally set to 3750. This setting is theoretical at the moment; in the future it will be refined, with different levels for different queries and resolvers.

The intention behind this limit is to prevent the Vega system from being abused by heavy queries (DoS). The complexity level is mostly affected by the number of objects a query contains. The heaviest queries we currently have in the system (based on discussion with Matt) are:

  • SimpleMarkets (embedded candles): 1 candle: 151; 91 candles: 788
  • MarketInfo (embedded candles): 1 candle: 399; 91 candles: 1036
  • Orders (embedded orders): 1 order: 163; 80 orders: 4003
  • Trades (embedded trades): 1 trade: 118; 75 trades: 1393
  • Positions (embedded positions): 1 position: 129; 40 positions: 2500

Customers typically query approximately 40 positions.

At the moment we do not have a precise idea of what limit would be appropriate for candles and orders; establishing one will take some time and experience. Taking 40 positions (complexity 2500) as a reference, a theoretical maximum of 3000 was chosen and 25% headroom added: 3000 × 1.25 = 3750. For queries whose complexity is above the limit, the GraphQL API returns the error "GraphQL error: Query is too complex to execute" and does not execute the query.

In future, GraphQL limits will be customised for particularly expensive queries and set per GraphQL resolver method. This would also affect subscriptions and one-off queries.

REST

REST provides a standard between computer systems on the web, making it easier for systems to communicate with each other. It is arguably simpler to work with than gRPC and GraphQL. In Vega, the REST API is a reverse proxy to the gRPC API; however, it does not support streaming.

The default port (configurable) for the REST API is 3009.

Troubleshooting & debugging

The application has structured logging. The first port of call after a crash is probably the Vega and Tendermint logs, which are available on the console when running locally, or via the systemd journal and syslog when running on test networks. The default location for log files is:

  • /var/log/vega.log

Each internal Go package has a logging level that can be set at runtime by configuration. Setting the logging Level to -1 for a package will enable all debugging messages for the package which can be useful when trying to analyse a crash or issue.
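
The effect of the per-package Level setting can be illustrated with a small filter. A minimal Go sketch, assuming zap-style level numbers (-1 debug, 0 info, 1 warn, 2 error), which is consistent with -1 enabling debug messages; packageLevels and enabled are hypothetical names, not data node's API:

```go
package main

import "fmt"

// Level values assumed to follow zap-style numbering; check the logging
// package in use before relying on them.
const (
	DebugLevel = -1
	InfoLevel  = 0
	WarnLevel  = 1
	ErrorLevel = 2
)

// packageLevels is a hypothetical per-package configuration, similar in
// spirit to the Level key under [Matching] in config.toml.
var packageLevels = map[string]int{
	"Matching": DebugLevel,
	"SQLStore": InfoLevel,
}

// enabled reports whether a message at msgLevel from pkg would be logged.
// Packages without an explicit setting default to info.
func enabled(pkg string, msgLevel int) bool {
	level, ok := packageLevels[pkg]
	if !ok {
		level = InfoLevel
	}
	return msgLevel >= level
}

func main() {
	fmt.Println(enabled("Matching", DebugLevel)) // true: Matching is set to -1
	fmt.Println(enabled("SQLStore", DebugLevel)) // false: SQLStore is set to 0
}
```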

Directories

  • api: Package api contains code for running the gRPC server.
  • mocks: Package mocks is a generated GoMock package.
  • mocks: Package mocks is a generated GoMock package.
  • mocks: Package mocks is a generated GoMock package.
  • graphql: Package gql contains code for running the GraphQL-to-gRPC gateway.
  • graphql/mocks: Package mocks is a generated GoMock package.
  • rest: Package rest contains code for running the REST-to-gRPC gateway.
  • libs
  • mocks: Package mocks is a generated GoMock package.
  • mocks: Package mocks is a generated GoMock package.
