Pachyderm: Data Versioning, Data Pipelines, and Data Lineage
Pachyderm is a tool for production data pipelines. If you need to chain together data scraping, ingestion, cleaning, munging, wrangling, processing, modeling, and analysis in a sane way, then Pachyderm is for you. If you have an existing set of scripts which do this in an ad-hoc fashion and you're looking for a way to "productionize" them, Pachyderm can make this easy for you.
- Containerized: Pachyderm is built on Docker and Kubernetes. Whatever languages or libraries your pipeline needs, they can run on Pachyderm, which can easily be deployed on any cloud provider or on-prem.
- Version Control: Pachyderm version controls your data as it's processed. You can always ask the system how data has changed, see a diff, and, if something doesn't look right, revert.
- Provenance (aka data lineage): Pachyderm tracks where data comes from. Pachyderm keeps track of all the code and data that created a result.
- Parallelization: Pachyderm can efficiently schedule massively parallel workloads.
- Incremental Processing: Pachyderm understands how your data has changed and is smart enough to only process the new data.
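As a sketch of how these pieces fit together, here is a minimal pipeline spec in the style of the "edges" example from the Pachyderm docs; the repo name `images`, the `pachyderm/opencv` image, and the script path are taken from that example and may differ in your setup:

```json
{
  "pipeline": { "name": "edges" },
  "transform": {
    "image": "pachyderm/opencv",
    "cmd": ["python3", "/edges.py"]
  },
  "input": {
    "pfs": { "repo": "images", "glob": "/*" }
  }
}
```

Every commit to the `images` repo triggers the pipeline, and the glob pattern controls how the input is split for parallel and incremental processing.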
You can also refer to our complete developer docs to see tutorials, check out example projects, and learn about advanced features of Pachyderm.
If you'd like to see some examples and learn about core use cases for Pachyderm:
- Use Cases
- Case Studies: Learn how General Fusion uses Pachyderm to power commercial fusion research.
Keep up to date and get Pachyderm support via:
- Follow us on Twitter.
- Join our community Slack Channel to get help from the Pachyderm team and other users.
To get started, sign the Contributor License Agreement.
You should also check out our contributing guide.
Send us PRs; we would love to see what you do! You can also check our GH issues for things labeled "help-wanted" as a good place to start. We're sometimes bad about keeping that label up-to-date, so if you don't see any, just let us know.
Pachyderm automatically reports anonymized usage metrics. These metrics help us understand how people are using Pachyderm and make it better. They can be disabled by setting the env variable METRICS to false in the pachd container.
Package limit provides primitives to limit concurrency.
Package backoff implements backoff algorithms for retrying operations.
Package dlock implements a distributed lock on top of etcd.
Package exec runs external commands.
Package pfsdb contains the database schema that PFS uses.
Package ppsconsts contains constants relevant to PPS that are used across Pachyderm.
Package ppsdb contains the database schema that PPS uses.
Package ppsutil contains utilities for various PPS-related tasks, which are shared by both the PPS API and the worker binary.
Package sync provides utility functions similar to `git pull/push` for PFS.
Package transactiondb contains the database schema that Pachyderm transactions use.
Package watch implements better watch semantics on top of etcd.
Package githook adds support for git-based sources in pipeline specs.
Package main implements the user logic run by the "split" loadtest (in loadtest/loadtest.go).