pachyderm

v2.7.0-nightly.20230526 (Latest)

Warning: This package is not in the latest version of its module.
Published: May 25, 2023 License: Apache-2.0 Imports: 0 Imported by: 0

README

Pachyderm – Automate data transformations with data versioning and lineage

Pachyderm is cost-effective at scale, enabling data engineering teams to automate complex pipelines with sophisticated data transformations across any type of data. Our unique approach provides parallelized processing of multi-stage, language-agnostic pipelines with data versioning and data lineage tracking. Pachyderm delivers the ultimate CI/CD engine for data.

Features

  • Data-driven pipelines automatically trigger based on detecting data changes.
  • Immutable data lineage with data versioning of any data type.
  • Autoscaling and parallel processing built on Kubernetes for resource orchestration.
  • Uses standard object stores for data storage with automatic deduplication.
  • Runs across all major cloud providers and on-premises installations.

Getting Started

To start deploying end-to-end, version-controlled data pipelines, run Pachyderm locally or deploy it on AWS, GCE, or Azure in about 5 minutes.

You can also refer to our complete documentation to see tutorials, check out example projects, and learn about advanced features of Pachyderm.

If you'd like to see some examples and learn about core use cases for Pachyderm:

Documentation

Official Documentation

Community

Keep up to date and get Pachyderm support via:

  • Twitter: Follow us on Twitter.
  • Slack: Join our community Slack channel to get help from the Pachyderm team and other users.

Contributing

To get started, sign the Contributor License Agreement.

You should also check out our contributing guide.

Send us PRs; we would love to see what you do! You can also check our GitHub issues for items labeled "help-wanted" as a good place to start. We're sometimes bad about keeping that label up to date, so if you don't see any, just let us know.

Join Us

WE'RE HIRING! Love Docker, Go, and distributed systems? Learn more about our open positions.

Usage Metrics

Pachyderm automatically reports anonymized usage metrics. These metrics help us understand how people use Pachyderm so that we can improve it. To disable them, set the environment variable METRICS to false in the pachd container.
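
As an illustration of the opt-out described above, here is a minimal Go sketch of how a process could interpret a METRICS environment variable. It is not taken from the pachd source: the helper name metricsEnabled is hypothetical, and the choice to treat unset or unparsable values as "enabled" is an assumption made to match the on-by-default behavior stated in this README.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// metricsEnabled is a hypothetical helper (not part of Pachyderm's API).
// It decides whether usage metrics should be reported, given the raw value
// of the METRICS environment variable.
//
// Assumption: metrics are on by default, so an unset or unparsable value
// leaves them enabled; only an explicit false-like value turns them off.
func metricsEnabled(raw string) bool {
	if raw == "" {
		return true // variable not set: keep the default
	}
	enabled, err := strconv.ParseBool(raw) // accepts "false", "0", "f", "true", "1", ...
	if err != nil {
		return true // unrecognized value: fall back to the default
	}
	return enabled
}

func main() {
	fmt.Println("metrics enabled:", metricsEnabled(os.Getenv("METRICS")))
}
```

Running this with METRICS=false in the environment reports that metrics are disabled; running it with the variable unset reports that they are enabled.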

License Information

Pachyderm has moved some components of Pachyderm Platform to a source-available limited license.

We remain committed to the culture of open source, developing our product transparently and collaboratively with our community, and giving our community and customers source code access and the ability to study and change the software to suit their needs.

Under the Pachyderm Community License, you can access the source code and modify or redistribute it; there is only one thing you cannot do, and that is use it to make a competing offering.

Check out our License FAQ Page for more information.

Documentation

There is no documentation for this package.

Directories

Path Synopsis
etc
examples
src
client/limit
    Package limit provides primitives to limit concurrency.
constants
    Package constants contains constants shared among packages.
internal/archiveserver
    Package archiveserver implements an HTTP server for downloading archives.
internal/backoff
    Package backoff implements backoff algorithms for retrying operations.
internal/clusterstate
    DO NOT MODIFY THIS STATE IT HAS ALREADY SHIPPED IN A RELEASE
internal/cmputil
    Package cmputil provides utilities for cmp.Diff.
internal/dlock
    Package dlock implements a distributed lock on top of etcd.
internal/dockertestenv
    package dockertestenv provides test environment where service dependencies are docker containers
internal/limit
    Package limit provides primitives to limit concurrency.
internal/log
    Package log is Pachyderm's logger.
internal/meters
    Package meters implements lightweight metrics for internal use.
internal/middleware/logging/client
    Package client contains GRPC client interceptors for logging.
internal/miscutil
    Package miscutil provides an "Island of Misfit Toys", but for helper functions
internal/pachctl
    Package pachctl contains utilities for implementing pachctl commands.
internal/pachd
    Package pachd implements the Pachyderm dæmon and its various modes.
internal/pctx
    Package pctx implements contexts for Pachyderm.
internal/pfsdb
    Package pfsdb contains the database schema that PFS uses.
internal/ppsdb
    Package ppsdb contains the database schema that PPS uses.
internal/ppsutil
    Package ppsutil contains utilities for various PPS-related tasks, which are shared by both the PPS API and the worker binary.
internal/proc
    Package proc contains utilities for monitoring the resource use of processes.
internal/profileutil
    Profileutil contains functionality to export performance information to external systems.
internal/promutil
    Package promutil contains utilities for collecting Prometheus metrics.
internal/sdata/csv
    Package csv reads and writes comma-separated values (CSV) files.
internal/serde
    Package serde contains Pachyderm-specific data structures for marshalling and unmarshalling Go structs and maps to structured text formats (currently just JSON and YAML).
internal/storage/chunk
    Package chunk provides access to data through content-addressed chunks.
internal/storage/fileset
    Package fileset provides access to files through file sets.
internal/storage/fileset/index
    Package index provides access to files through multilevel indexes.
internal/stream
    Package stream contains interfaces and helper functions for managing iterators that can block.
internal/testsnowflake
    package testsnowflake provides convenience functions for creating Snowflake databases for testing.
internal/transactiondb
    Package transactiondb contains the database schema that Pachyderm transactions use.
internal/transforms
    package transforms contains PPS Pipeline Transform implementations
internal/watch
    Package watch implements better watch semantics on top of etcd.
pfs
pps
protoextensions
    Package protoextensions is the runtime support code for protoc-gen-zap (in ../etc/proto).
server/cmd/local-download-server
    Command local-download-server runs the PFS download server locally, against your current pachctl context.
server/pfs/s3
    TODO: the s2 library checks the type of the error to decide how to handle it, which doesn't work properly with wrapped errors
