beam

v0.0.0-...-6719cd2

Warning: This package is not in the latest version of its module.
Published: Jul 18, 2019 License: Apache-2.0

README

Akutan


There's a blog post that's a good introduction to Akutan.

Akutan is a distributed knowledge graph store, sometimes called an RDF store or a triple store. Knowledge graphs are suitable for modeling data that is highly interconnected by many types of relationships, like encyclopedic information about the world. A knowledge graph store enables rich queries on its data, which can be used to power real-time interfaces, to complement machine learning applications, and to make sense of new, unstructured information in the context of the existing knowledge.

How to model your data as a knowledge graph and how to query it will feel a bit different for people coming from SQL, NoSQL, and property graph stores. In a knowledge graph, data is represented as a single table of facts, where each fact has a subject, predicate, and object. This representation enables the store to sift through the data for complex queries and to apply inference rules that raise the level of abstraction. Here's an example of a tiny graph:

subject         predicate   object
<John_Scalzi>   <born>      <Fairfield>
<John_Scalzi>   <lives>     <Bradford>
<John_Scalzi>   <wrote>     <Old_Mans_War>
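To make the single-table model concrete, here is a minimal Go sketch that stores the example facts in a slice and answers a pattern query by scanning it, treating an empty string as a wildcard. The Fact type and Match function are hypothetical illustrations, not Akutan's API; Akutan's Disk Views instead serve facts from an ordered key-value store rather than scanning a list.

```go
package main

import "fmt"

// Fact is one row of the single facts table: subject, predicate, object.
// Hypothetical illustration only; not Akutan's own fact type.
type Fact struct {
	Subject, Predicate, Object string
}

// Match returns the facts whose fields equal the given pattern.
// An empty string in any position matches every value (a wildcard).
func Match(facts []Fact, subject, predicate, object string) []Fact {
	var out []Fact
	for _, f := range facts {
		if (subject == "" || f.Subject == subject) &&
			(predicate == "" || f.Predicate == predicate) &&
			(object == "" || f.Object == object) {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	graph := []Fact{
		{"<John_Scalzi>", "<born>", "<Fairfield>"},
		{"<John_Scalzi>", "<lives>", "<Bradford>"},
		{"<John_Scalzi>", "<wrote>", "<Old_Mans_War>"},
	}
	// What did John Scalzi write?
	for _, f := range Match(graph, "<John_Scalzi>", "<wrote>", "") {
		fmt.Println(f.Object) // prints <Old_Mans_War>
	}
}
```

Because every fact has the same three-part shape, one lookup function covers queries by subject, by predicate, by object, or any combination of them.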

To learn about how to represent and query data in Akutan, see docs/query.md.

Akutan is designed to store large graphs that cannot fit on a single server. It's scalable in how much data it can store and the rate of queries it can execute. However, Akutan serializes all changes to the graph through a central log, which fundamentally limits the total rate of change. The rate of change won't improve with a larger number of servers, but a typical deployment should be able to handle tens of thousands of changes per second. In exchange for this limitation, Akutan's architecture is a relatively simple one that enables many features. For example, Akutan supports transactional updates and historical global snapshots. We believe this trade-off is suitable for most knowledge graph use cases, which accumulate large amounts of data but do so at a modest pace. To learn more about Akutan's architecture and this trade-off, see docs/central_log_arch.md.
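The trade-off above can be illustrated with a toy Go sketch, assuming every change is a simple fact insertion. Because each change is appended to one log and receives the next index, every server replaying the log sees the same order, and the state as of a given index is a consistent global snapshot. The Log, Change, and SnapshotAt names are hypothetical and do not reflect Akutan's actual log interfaces (those live in the blog package).

```go
package main

import "fmt"

// Change is a hypothetical log entry that inserts one fact.
type Change struct {
	Fact string
}

// Log is a toy central log: each change gets the next index,
// so all readers observe changes in a single total order.
type Log struct {
	entries []Change
}

// Append adds a change and returns its 1-based log index.
func (l *Log) Append(c Change) uint64 {
	l.entries = append(l.entries, c)
	return uint64(len(l.entries))
}

// SnapshotAt replays entries 1..index to rebuild the set of facts
// as of that point in the log, i.e. a historical snapshot.
func (l *Log) SnapshotAt(index uint64) map[string]bool {
	facts := make(map[string]bool)
	for i := uint64(0); i < index && i < uint64(len(l.entries)); i++ {
		facts[l.entries[i].Fact] = true
	}
	return facts
}

func main() {
	var l Log
	l.Append(Change{Fact: "<John_Scalzi> <born> <Fairfield>"})
	idx := l.Append(Change{Fact: "<John_Scalzi> <lives> <Bradford>"})
	l.Append(Change{Fact: "<John_Scalzi> <wrote> <Old_Mans_War>"})
	fmt.Println(len(l.SnapshotAt(idx))) // prints 2: facts as of index 2
	fmt.Println(len(l.SnapshotAt(3)))   // prints 3: all facts
}
```

The single append point is also what caps the rate of change: however many readers replay the log, every write still goes through the same sequence.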

Akutan isn't ready for production-critical deployments, but it's useful today for some use cases. We've run a 20-server deployment of Akutan for development purposes and off-line use cases for about a year, which we've most commonly loaded with a dataset of about 2.5 billion facts. We believe Akutan's current capabilities exceed this capacity and scale; we haven't yet pushed Akutan to its limits. The project has a good architectural foundation on which additional features can be built and higher performance could be achieved.

Akutan needs more love before it can be used for production-critical deployments. Much of Akutan's code consists of high-quality, documented, unit-tested modules, but some areas of the code base are inherited from Akutan's earlier prototype days and still need attention. In other places, functionality that a critical production data store requires is still missing, including deletion of facts, backup/restore, and automated cluster management. We have filed GitHub issues for these and a few other things. There are also areas where Akutan could be improved that wouldn't necessarily block production usage. For example, Akutan's query language is not quite compatible with SPARQL, and its inference engine is limited.

So, Akutan has a nice foundation and may be useful to some people, but it also needs additional love. If that's not for you, here are a few alternative open-source knowledge and property graph stores that you may want to consider (we have no affiliation with these projects):

  • Blazegraph: an RDF store. Supports several query languages, including SPARQL and Gremlin. Disk-based, single-master, scales out for reads only. Seems unmaintained. Powers https://query.wikidata.org/.
  • Dgraph: a triple-oriented property graph store. GraphQL-like query language, no support for SPARQL. Disk-based, scales out.
  • Neo4j: a property graph store. Cypher query language, no support for SPARQL. Single-master, scales out for reads only.
  • See also Wikipedia's Comparison of Triplestores page.

The remainder of this README describes how to get Akutan up and running. Several documents under the docs/ directory describe aspects of Akutan in more detail; see docs/README.md for an overview.

Installing dependencies and building Akutan

Akutan has the following system dependencies:

  • It's written in Go. You'll need v1.11.5 or newer.
  • Akutan uses Protocol Buffers extensively to encode messages for gRPC, the log of data changes, and storage on disk. You'll need protobuf version 3. We recommend 3.5.2 or later. Note that 3.0.x, the default in many Linux distributions, doesn't work with the Akutan build.
  • Akutan's Disk Views store their facts in RocksDB.

On Mac OS X, these can all be installed via Homebrew:

$ brew install golang protobuf rocksdb zstd

On Ubuntu, refer to the files within the docker/ directory for package names to use with apt-get.

After cloning the Akutan repository, pull down several Go libraries and additional Go tools:

$ make get

Finally, build the project:

$ make build

Running Akutan locally

The fastest way to run Akutan locally is to launch the in-memory log store:

$ bin/plank

Then open another terminal and run:

$ make run

This will bring up several Akutan servers locally. It starts an API server that listens on localhost for gRPC requests on port 9987 and for HTTP requests on port 9988, such as http://localhost:9988/stats.txt.
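As a quick check that the API server is up, a small Go program can fetch the stats page mentioned above. This is a minimal sketch that assumes the default HTTP port 9988 and uses only the standard library; fetchStats is a hypothetical helper, and io.ReadAll requires Go 1.16 or newer.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// fetchStats returns the body of <baseURL>/stats.txt from the API server.
func fetchStats(baseURL string) (string, error) {
	resp, err := http.Get(baseURL + "/stats.txt")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	// Assumes the default local HTTP port from make run.
	stats, err := fetchStats("http://localhost:9988")
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetching stats:", err)
		os.Exit(1)
	}
	fmt.Print(stats)
}
```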

The easiest way to interact with the API server is using bin/akutan-client. See docs/query.md for examples. The API server exposes the FactStore gRPC service defined in proto/api/akutan_api.proto.

Deployment concerns

The log

Earlier, we used bin/plank as a log store, but this is unsuitable for real usage! Plank is in-memory only, isn't replicated, and by default, it only keeps 1000 entries at a time. It's only meant for development.
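To illustrate why a log that keeps only recent entries is unsuitable beyond development, here is a minimal Go sketch of an in-memory log that discards its oldest entry once it holds more than a fixed number. It is not Plank's implementation; BoundedLog, Append, Read, and ErrTruncated are hypothetical names, and only the 1000-entry default comes from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTruncated reports a read of an entry that has already been discarded.
var ErrTruncated = errors.New("entry no longer retained")

// BoundedLog keeps only the most recent limit entries in memory.
type BoundedLog struct {
	limit   int
	first   uint64 // log index of entries[0]; indexes start at 1
	entries [][]byte
}

// NewBoundedLog returns an empty log that retains at most limit entries.
func NewBoundedLog(limit int) *BoundedLog {
	return &BoundedLog{limit: limit, first: 1}
}

// Append stores data and returns its index, dropping the oldest
// entry whenever more than limit entries are held.
func (l *BoundedLog) Append(data []byte) uint64 {
	l.entries = append(l.entries, data)
	index := l.first + uint64(len(l.entries)) - 1
	if len(l.entries) > l.limit {
		l.entries = l.entries[1:]
		l.first++
	}
	return index
}

// Read returns the entry at index, or ErrTruncated if it was discarded.
func (l *BoundedLog) Read(index uint64) ([]byte, error) {
	if index < l.first {
		return nil, ErrTruncated
	}
	i := index - l.first
	if i >= uint64(len(l.entries)) {
		return nil, fmt.Errorf("index %d not yet written", index)
	}
	return l.entries[i], nil
}

func main() {
	l := NewBoundedLog(1000)
	for i := 1; i <= 1500; i++ {
		l.Append([]byte(fmt.Sprintf("entry %d", i)))
	}
	_, err := l.Read(1)
	fmt.Println(err) // prints "entry no longer retained"
	e, _ := l.Read(1500)
	fmt.Println(string(e)) // prints "entry 1500"
}
```

In this sketch, a reader that falls more than limit entries behind gets ErrTruncated and cannot catch up from the log alone, which is one reason a durable, replicated log is preferable.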

Akutan also supports using Apache Kafka as its log store. This is recommended over Plank for any deployment. To use Kafka, follow the Kafka quick start guide to install Kafka, start ZooKeeper, and start Kafka. Then create a topic called "akutan" (not "test" as in the Kafka guide) with partitions set to 1. You'll want to configure Kafka to synchronously write entries to disk.

To use Kafka with Akutan, set the akutanLog's type to kafka in your Akutan configuration (default: local/config.json), and update the locator's addresses accordingly (Kafka uses port 9092 by default). You'll need to clear out Akutan's Disk Views' data before restarting the cluster. The Disk Views by default store their data in $TMPDIR/rocksdb-akutan-diskview-{space}-{partition} so you can delete them all with rm -rf $TMPDIR/rocksdb-akutan-diskview*

Docker and Kubernetes

This repository includes support for running Akutan inside Docker and Minikube. These environments can be tedious for development purposes, but they're useful as a step towards a modern and robust production deployment.

See the cluster/k8s/Minikube.md file for the steps to build and deploy Akutan services in Minikube. It also includes the steps to build the Docker images.

Distributed tracing

Akutan generates distributed OpenTracing traces for use with Jaeger. To try it, follow the Jaeger Getting Started Guide for running the all-in-one Docker image. The default make run setup sends traces there, and you can query them at http://localhost:16686. The Minikube cluster also includes a Jaeger all-in-one instance.

Development

VS Code

You can use whichever editor you'd like, but this repository contains some configuration for VS Code. We suggest the following extensions:

Override the default settings in .vscode/settings.json with ./vscode-settings.json5.

Test targets

The Makefile contains various targets related to running tests:

Target       Description
make test    Run all the Akutan unit tests.
make cover   Run all the Akutan unit tests and open the web-based coverage viewer.
make lint    Run basic code linting.
make vet     Run all static analysis tests, including linting and formatting.

License Information

Copyright 2019 eBay Inc.

Primary authors: Simon Fell, Diego Ongaro, Raymond Kroeker, Sathish Kandasamy

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Note that the project was renamed to Akutan in July 2019.

Directories

Path Synopsis
src
github.com/ebay/akutan/api
Package api contains ProtoBuf-generated types for the external gRPC API to Akutan.
github.com/ebay/akutan/api/akutan-api
Command akutan-api runs an Akutan API server daemon.
github.com/ebay/akutan/api/impl
Package impl implements the external gRPC and HTTP API servers for Akutan.
github.com/ebay/akutan/api/impl/kgstats
Package kgstats fetches and caches runtime statistics about the cluster and dataset.
github.com/ebay/akutan/blog
Package blog contains interfaces to Akutan's data log.
github.com/ebay/akutan/blog/kafka
Package kafka implements a Kafka client as a blog.AkutanLog.
github.com/ebay/akutan/blog/logspecclient
Package logspecclient implements a client for the akutan/logspec API.
github.com/ebay/akutan/blog/mockblog
Package mockblog contains an in-process, in-memory implementation of an Akutan log client and server.
github.com/ebay/akutan/config
Package config contains the configuration for an Akutan server.
github.com/ebay/akutan/discovery
Package discovery defines basic concepts around service discovery and locating endpoints.
github.com/ebay/akutan/discovery/discoveryfactory
Package discoveryfactory constructs service discovery implementations.
github.com/ebay/akutan/discovery/kubediscovery
Package kubediscovery provides an implementation of the Locator interface backed by Kubernetes service discovery.
github.com/ebay/akutan/diskview
Package diskview implements a view service that serves facts from an ordered key-value store.
github.com/ebay/akutan/diskview/akutan-diskview
Command akutan-diskview runs a DiskView daemon.
github.com/ebay/akutan/diskview/database
Package database defines an abstract ordered Key/Value store that can be used as a backing store by the Disk View.
github.com/ebay/akutan/diskview/keys
Package keys provides support for building and parsing the DiskView's binary key format that facts are encoded into.
github.com/ebay/akutan/diskview/rocksdb
Package rocksdb provides an implementation of the Database interface that is backed by a local RocksDB Key/Value store.
github.com/ebay/akutan/facts/cache
Package cache provides for caching facts that were inferred during a query and potentially reusing them for subsequent operations with the same query.
github.com/ebay/akutan/infer
Package infer implements fact inference by traversing transitive predicates.
github.com/ebay/akutan/logentry
Package logentry contains all the types generated from the protobuf files.
github.com/ebay/akutan/logentry/logencoder
Package logencoder handles serialization and deserialization of logentry.*Command to and from bytes.
github.com/ebay/akutan/logentry/logread
Package logread deals with mapping from logentry types into rpc types.
github.com/ebay/akutan/logentry/logwrite
Package logwrite contains helper functions to create instances of types in the logentry package.
github.com/ebay/akutan/logspec
Package logspec contains ProtoBuf-generated types for Akutan's log.
github.com/ebay/akutan/msg/facts
Package facts defines the well-known base set of facts that are needed to bootstrap the graph.
github.com/ebay/akutan/msg/kgobject
Package kgobject contains helper methods to construct common api KGObject instances.
github.com/ebay/akutan/partitioning
Package partitioning provides ways to describe how the set of facts have been partitioned.
github.com/ebay/akutan/query
Package query provides a high level entry point for executing AkutanQL queries.
github.com/ebay/akutan/query/exec
Package exec is used to execute a KG query that was built by the query planner.
github.com/ebay/akutan/query/internal/debug
Package debug contains functions to help track details about query processing and report them.
github.com/ebay/akutan/query/parser
Package parser implements a parser combinator for the Akutan query language.
github.com/ebay/akutan/query/planner
Package planner is the KG/Akutan-specific query optimizer.
github.com/ebay/akutan/query/planner/plandef
Package plandef defines the output of the query planner.
github.com/ebay/akutan/query/planner/search
Package search implements a generic query optimizer algorithm.
github.com/ebay/akutan/rpc
Package rpc contains ProtoBuf-generated types for the messages communicated between Akutan servers.
github.com/ebay/akutan/space
Package space defines abstract notions of points and ranges.
github.com/ebay/akutan/tools/akutan-client
Command bc provides command-line access to the Akutan gRPC API.
github.com/ebay/akutan/tools/carousel-client
Command carousel-client is a low level carousel client tool for helping investigate performance etc.
github.com/ebay/akutan/tools/db-scan
Command db-scan reads all keys from a Rocks database.
github.com/ebay/akutan/tools/dep
Command dep checks, fetches, and updates dependencies.
github.com/ebay/akutan/tools/gen-kube
Command generate writes out Kubernetes configuration for portions of the Akutan cluster.
github.com/ebay/akutan/tools/gen-local
Command gen-local writes out files used to run an Akutan cluster locally.
github.com/ebay/akutan/tools/gen-local/gen
Package gen is used in generating configurations for an entire Akutan cluster.
github.com/ebay/akutan/tools/grpcbench
Command grpcbench is a small benchmark tool for gRPC.
github.com/ebay/akutan/tools/log-client
Command log-client is a tool for low-level access to Akutan log servers.
github.com/ebay/akutan/tools/plank
Command plank implements a logspec server by storing entries in local memory only.
github.com/ebay/akutan/tools/view-client
Command view-client is a command-line tool for calling Akutan views.
github.com/ebay/akutan/txtimeoutview
Package txtimeoutview implements a view service that times out slow transactions and measures the log's latency.
github.com/ebay/akutan/txtimeoutview/akutan-txview
Command akutan-txview runs a TxTimeoutView daemon.
github.com/ebay/akutan/txtimeoutview/logping
Package logping measures the latency of Akutan's log by appending to it and reading from it.
github.com/ebay/akutan/txtimeoutview/txtimer
Package txtimer watches for slow transactions and aborts them.
github.com/ebay/akutan/update
Package update handles requests to modify the graph.
github.com/ebay/akutan/update/conv
Package conv helps convert between related types as an update request is processed.
github.com/ebay/akutan/util/bytes
Package bytes aids in manipulating byte slices and writing bytes and strings.
github.com/ebay/akutan/util/clocks
Package clocks provides a mockable way to measure time and set timers.
github.com/ebay/akutan/util/cmp
Package cmp provides common operators on a number of scalar types.
github.com/ebay/akutan/util/debuglog
Package debuglog configures Logrus.
github.com/ebay/akutan/util/errors
Package errors aids in handling errors.
github.com/ebay/akutan/util/graphviz
Package graphviz generates diagrams from dot input.
github.com/ebay/akutan/util/grpc/client
Package grpcclientutil has helpers for configuring gRPC clients.
github.com/ebay/akutan/util/grpc/server
Package grpcserverutil has helpers for configuring gRPC servers.
github.com/ebay/akutan/util/metrics
Package metrics aids in defining Prometheus metrics.
github.com/ebay/akutan/util/parallel
Package parallel is a utility package for running parallel/concurrent tasks.
github.com/ebay/akutan/util/perfbenchmarks
Package perfbenchmarks contains benchmarks for Go language and standard library features.
github.com/ebay/akutan/util/profiling
Package profiling assists in gathering CPU profiles.
github.com/ebay/akutan/util/random
Package random helps seed the math/rand pseudo-random number generator.
github.com/ebay/akutan/util/signals
Package signals aids in POSIX signal handling.
github.com/ebay/akutan/util/stats
Package stats contains a pretty-printer for statistics about the facts stored on DiskViews.
github.com/ebay/akutan/util/table
Package table formats data into a text-based table for human consumption.
github.com/ebay/akutan/util/tracing
Package tracing assists with reporting OpenTracing traces.
github.com/ebay/akutan/util/unicode
Package unicode contains Unicode text functionality for the Akutan store.
github.com/ebay/akutan/util/web
Package web aids in writing HTTP servers.
github.com/ebay/akutan/viewclient
Package viewclient provides functionality for querying view servers.
github.com/ebay/akutan/viewclient/fanout
Package fanout is useful for invoking RPCs across a bunch of servers.
github.com/ebay/akutan/viewclient/lookups
Package lookups defines Go interfaces that the various LookupXX RPC wrappers expose. This can be useful for decoupling the actual Lookup implementation from its usage, allowing for easier testing.
github.com/ebay/akutan/viewclient/lookups/mocklookups
Package mocklookups provides a mock implementation of the various Fact lookup RPCs.
github.com/ebay/akutan/viewclient/mockstore
Package mockstore provides various mocks that store facts and can execute lookups against them.
github.com/ebay/akutan/viewclient/viewreg
Package viewreg tracks all the known view servers in the cluster.
