provly

module
v0.0.0-...-c50d277 Latest
Published: Nov 25, 2019 License: Apache-2.0

Provly

Provly & Provenance

Provly is a rough implementation of the W3C's provenance standards (PROV). It maps the relationships between objects across time. Note that these "objects" may be the same object changing over time; this case is covered in the Data models section.

To do this, Provly provides the following data models:

  • Entities
  • Activities
  • Agents

These models are connected through a set of relationships defined in the Prov spec.
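To make the three models and their relationships concrete, here is a minimal sketch in Go. The type names (Node, Relation, NewRelation) are hypothetical and are not taken from Provly's code. The relation names and the node kinds they connect follow the W3C PROV data model, and only a handful of the spec's relations are included.

```go
package main

import "fmt"

// Kind distinguishes the three PROV node types listed above.
type Kind string

const (
	KindEntity   Kind = "entity"
	KindActivity Kind = "activity"
	KindAgent    Kind = "agent"
)

// Node is a hypothetical, minimal stand-in for a stored record.
type Node struct {
	ID   string
	Kind Kind
}

// Relation is a directed edge labelled with a PROV relation name.
type Relation struct {
	From Node
	Name string
	To   Node
}

// allowed maps a subset of PROV relations to the (from, to) kinds they connect.
var allowed = map[string][2]Kind{
	"used":              {KindActivity, KindEntity},
	"wasGeneratedBy":    {KindEntity, KindActivity},
	"wasDerivedFrom":    {KindEntity, KindEntity},
	"wasAssociatedWith": {KindActivity, KindAgent},
	"wasAttributedTo":   {KindEntity, KindAgent},
}

// NewRelation builds an edge and rejects unknown names or mismatched endpoints.
func NewRelation(from Node, name string, to Node) (Relation, error) {
	kinds, ok := allowed[name]
	if !ok {
		return Relation{}, fmt.Errorf("unknown relation %q", name)
	}
	if from.Kind != kinds[0] || to.Kind != kinds[1] {
		return Relation{}, fmt.Errorf("%s cannot link %s to %s", name, from.Kind, to.Kind)
	}
	return Relation{From: from, Name: name, To: to}, nil
}

func main() {
	seeds := Node{ID: "seed-packet", Kind: KindEntity}
	planting := Node{ID: "planting", Kind: KindActivity}
	researcher := Node{ID: "researcher", Kind: KindAgent}

	// The planting activity used the seeds and was carried out by the researcher.
	for _, r := range []struct {
		from Node
		name string
		to   Node
	}{
		{planting, "used", seeds},
		{planting, "wasAssociatedWith", researcher},
		{seeds, "used", planting}, // rejected: "used" must start at an activity
	} {
		rel, err := NewRelation(r.from, r.name, r.to)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("%s -[%s]-> %s\n", rel.From.ID, rel.Name, rel.To.ID)
	}
}
```

Validating the endpoint kinds when an edge is created keeps malformed relationships out of the graph; a real implementation may enforce this differently, for example in the database schema.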

The goal of this particular provenance implementation is to map activities in the research process in order to make scientific reproducibility easier. To accomplish this goal, the implementation strays from the W3C reference where necessary.

Example

graph

This example graph shows what might be stored in Provly to describe a hypothetical experiment that revolves around a packet of seeds. Most papers written about such an experiment would simply describe the packet of seeds. Linking to this prov graph instead documents the origin of the seed packet as well as the analysis that was done to create the paper. We can see who was involved in each step of the experiment and, where entities or agents represent software, we are given hashes of the document to ensure exact replication.

Getting Started

This section describes setting up Provly on your machine for development or personal use. If you are interested in using an existing Provly instance through an API, contact the repo owners.

Required software:

  • Go v1.13
  • Docker

# Start the database
make start-db

# Start tracing server
make start-zipkins

# Run migrations
make migrate

# Start server
go run ./cmd/provly-api --zipkin-reporter-uri=0.0.0.0:9411

Command default options

--web-api-host=0.0.0.0:3000
--web-debug-host=0.0.0.0:4000
--web-read-timeout=10s
--web-write-timeout=10s
--web-shutdown-timeout=5s
--db-user=root
--db-host=[http://localhost:8529]
--db-name=provly
--zipkin-local-endpoint=0.0.0.0:3000
--zipkin-reporter-uri=http://zipkins:9411/api/v2/spans
--zipkin-service-name=provly-api
--zipkin-probability=0.05

Testing

go test ./...

Loading demo data

The data used to create the diagram above can be loaded into the database by running:

make demo

Points of interest

There are now four services that you can interact with to help with development.

  • API - running on :3000
  • Monitoring & Debug - running on :4000
  • Arango Graph DB - running on :8529
  • Zipkin Tracing - running on :9411

Data models

A goal of provenance is to track relationships between entities across time, using activities as the main catalyst for change. This model is often conceptually different from the data models used in applications. Understanding these differences is key to using Provly effectively.

While most applications build up relationships between different objects at a single point in time (normally as current as possible), Provly builds up relationships between versions of a single object across a time range. As a result, each item in Provly has two identifiers: a canonical ID, which identifies the resource to the outside world, and a provenance ID, which identifies a particular version of that resource.

If this is hard to conceptualize, consider the proverb "If the blade of an axe is replaced, and then its handle, is it still the same axe?" This could be modelled in Provly as follows:

data model diagram

As you can see, as the axe goes through transformations its canonical ID does not change, but it gets a new provenance ID after each transformation.
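The axe example can also be sketched in code. The types and the ID format below are hypothetical and chosen only for illustration; Provly's actual identifiers may look quite different. The point is that every transformation yields a new provenance ID while the canonical ID stays fixed.

```go
package main

import "fmt"

// Version is one state of a resource: a shared canonical ID plus a
// provenance ID that is unique to this version.
type Version struct {
	CanonicalID  string
	ProvenanceID string
	Description  string
}

// History holds every version of a single resource, oldest first.
type History struct {
	canonicalID string
	versions    []Version
}

// NewHistory creates a resource and records its initial version.
func NewHistory(canonicalID, description string) *History {
	h := &History{canonicalID: canonicalID}
	h.Transform(description)
	return h
}

// Transform records a new version with a fresh provenance ID.
// The "<canonical>@v<n>" format is an illustrative assumption.
func (h *History) Transform(description string) Version {
	v := Version{
		CanonicalID:  h.canonicalID,
		ProvenanceID: fmt.Sprintf("%s@v%d", h.canonicalID, len(h.versions)+1),
		Description:  description,
	}
	h.versions = append(h.versions, v)
	return v
}

// Latest returns the most recent version.
func (h *History) Latest() Version {
	return h.versions[len(h.versions)-1]
}

func main() {
	axe := NewHistory("axe", "original blade and handle")
	axe.Transform("blade replaced")
	axe.Transform("handle replaced")

	for _, v := range axe.versions {
		fmt.Printf("canonical=%s provenance=%s (%s)\n", v.CanonicalID, v.ProvenanceID, v.Description)
	}
}
```

Looking a resource up by canonical ID answers "what is this axe now?", while looking it up by provenance ID answers "what was this axe at that point in its history?".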

Contributing

All contributions are welcome. Please contact the authors to get involved!
