pachyderm

v2.9.4
Published: Apr 8, 2024 | License: Apache-2.0 | Imports: 0 | Imported by: 0

README

Pachyderm – Automate data transformations with data versioning and lineage

Pachyderm is cost-effective at scale, enabling data engineering teams to automate complex pipelines with sophisticated data transformations across any type of data. Our unique approach provides parallelized processing of multi-stage, language-agnostic pipelines with data versioning and data lineage tracking. Pachyderm delivers the ultimate CI/CD engine for data.

Features

  • Data-driven pipelines automatically trigger based on detecting data changes.
  • Immutable data lineage with data versioning of any data type.
  • Autoscaling and parallel processing built on Kubernetes for resource orchestration.
  • Uses standard object stores for data storage with automatic deduplication.
  • Runs across all major cloud providers and on-premises installations.
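The deduplication mentioned in the feature list can be illustrated with content addressing: if each piece of data is stored under a hash of its bytes, identical data maps to the same key and is kept only once. The following is a minimal sketch under that assumption, not Pachyderm's actual storage code; the `Store` type and its methods are hypothetical names introduced here, and an in-memory map stands in for an object store.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store is a hypothetical in-memory content-addressed store.
// A real system would keep the chunks in an object store bucket.
type Store struct {
	chunks map[string][]byte
}

// NewStore returns an empty Store.
func NewStore() *Store {
	return &Store{chunks: make(map[string][]byte)}
}

// Put stores data under the hex-encoded SHA-256 of its content and
// returns that key. Identical content yields the same key, so a
// second Put of the same bytes does not add a new entry.
func (s *Store) Put(data []byte) string {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	if _, ok := s.chunks[key]; !ok {
		// Copy the slice so later changes by the caller do not alter stored data.
		s.chunks[key] = append([]byte(nil), data...)
	}
	return key
}

// Len reports how many distinct chunks are stored.
func (s *Store) Len() int {
	return len(s.chunks)
}

func main() {
	s := NewStore()
	a := s.Put([]byte("hello"))
	b := s.Put([]byte("hello")) // duplicate content, no new chunk
	c := s.Put([]byte("world"))
	fmt.Println(a == b, a == c, s.Len()) // prints "true false 2"
}
```

Keying by content hash makes deduplication a side effect of writing: no separate comparison pass over existing data is needed.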

Getting Started

To start deploying end-to-end, version-controlled data pipelines, run Pachyderm locally or deploy it on AWS, GCE, or Azure in about 5 minutes.

You can also refer to our complete documentation to see tutorials, check out example projects, and learn about advanced features of Pachyderm.

If you'd like to see some examples and learn about core use cases for Pachyderm:

Documentation

Official Documentation

Community

Keep up to date and get Pachyderm support via:

  • Follow us on Twitter.
  • Join our community Slack channel to get help from the Pachyderm team and other users.

Contributing

To get started, sign the Contributor License Agreement.

You should also check out our contributing guide.

Send us PRs; we would love to see what you do! You can also check our GitHub issues for things labeled "help-wanted" as a good place to start. We're sometimes bad about keeping that label up to date, so if you don't see any, just let us know.

Usage Metrics

Pachyderm automatically reports anonymized usage metrics. These metrics help us understand how people are using Pachyderm and make it better. They can be disabled by setting the environment variable METRICS to false in the pachd container.
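As a rough illustration of how a Go process could interpret an on/off variable such as METRICS, here is a minimal sketch. It is not pachd's actual code: the helper name `metricsEnabled`, the default of "enabled when unset", and the handling of unparseable values are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// metricsEnabled is a hypothetical helper. It reports whether metrics
// should be sent, given a lookup function with the same shape as
// os.LookupEnv. Metrics stay enabled unless METRICS parses as false.
func metricsEnabled(lookup func(string) (string, bool)) bool {
	v, ok := lookup("METRICS")
	if !ok {
		return true // variable unset: assume the default (enabled)
	}
	enabled, err := strconv.ParseBool(v)
	if err != nil {
		return true // unparseable value: assume the default (enabled)
	}
	return enabled
}

func main() {
	fmt.Println("metrics enabled:", metricsEnabled(os.LookupEnv))
}
```

Passing the lookup function as a parameter keeps the helper easy to exercise without changing the real process environment.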

Documentation

There is no documentation for this package.

Directories

Path Synopsis
etc
examples
src
admin
Package admin is a reverse proxy.
auth
Package auth is a reverse proxy.
client/limit
Package limit provides primitives to limit concurrency.
cmd/compile-with-coverage
Binary compile-with-coverage takes a GOPATH created by the Bazel go_path rule, resolves all symlinks inside it by copying, and invokes the Go compiler.
cmd/starpach
Command starpach is a tool for developers of Pachyderm.
constants
Package constants contains constants shared among packages.
debug
Package debug is a reverse proxy.
enterprise
Package enterprise is a reverse proxy.
identity
Package identity is a reverse proxy.
internal/archiveserver
Package archiveserver implements an HTTP server for downloading archives.
internal/backoff
Package backoff implements backoff algorithms for retrying operations.
internal/clusterstate
DO NOT MODIFY THIS STATE IT HAS ALREADY SHIPPED IN A RELEASE
internal/clusterstate/migrationutils
sanitize.go is taken from https://github.com/jackc/pgx/blob/v5.5.0/internal/sanitize/sanitize.go
internal/cmputil
Package cmputil provides utilities for cmp.Diff.
internal/conditionalrequest
Package conditionalrequest handles HTTP conditional requests based on modification time and etags.
internal/dlock
Package dlock implements a distributed lock on top of etcd.
internal/dockertestenv
Package dockertestenv provides a test environment where service dependencies are Docker containers.
internal/fileserver
Package fileserver implements a server for downloading PFS files over plain HTTP (i.e.
internal/jsonschema
Package jsonschema bundles the generated JSON schemas.
internal/kindenv
Package kindenv manages Kind (github.com/kubernetes-sigs/kind) environments.
internal/limit
Package limit provides primitives to limit concurrency.
internal/log
Package log is Pachyderm's logger.
internal/meters
Package meters implements lightweight metrics for internal use.
internal/middleware/auth/httpauth
Package httpauth extracts auth information from an HTTP request.
internal/middleware/logging/client
Package client contains gRPC client interceptors for logging.
internal/miscutil
Package miscutil provides an "Island of Misfit Toys", but for helper functions
internal/pachconfig
Package pachconfig contains the configuration models for Pachyderm.
internal/pachctl
Package pachctl contains utilities for implementing pachctl commands.
internal/pachd
Package pachd implements the Pachyderm dæmon and its various modes.
internal/pctx
Package pctx implements contexts for Pachyderm.
internal/pfsdb
Package pfsdb contains the database schema that PFS uses.
internal/ppsdb
Package ppsdb contains the database schema that PPS uses.
internal/ppsutil
Package ppsutil contains utilities for various PPS-related tasks, which are shared by both the PPS API and the worker binary.
internal/preflight
Package preflight offers checks that can be run by pachd in preflight mode.
internal/proc
Package proc contains utilities for monitoring the resource use of processes.
internal/profileutil
Package profileutil contains functionality to export performance information to external systems.
internal/promutil
Package promutil contains utilities for collecting Prometheus metrics.
internal/protoutil
Package protoutil contains some utilities for interacting with protocol buffer objects.
internal/sdata/csv
Package csv reads and writes comma-separated values (CSV) files.
internal/serde
Package serde contains Pachyderm-specific data structures for marshalling and unmarshalling Go structs and maps to structured text formats (currently just JSON and YAML).
internal/setupenv
Package setupenv manages creating pachd.*Envs from pachconfig objects.
internal/signals
Package signals implements cross-platform signal-handling.
internal/starlark
Package starlark runs Pachyderm-specific starlark programs.
internal/starlark/lib/k8s
Package k8s is a Kubernetes API binding for Starlark.
internal/starlark/starcmp
Package starcmp provides utilities for running cmp.Diff on Starlark values.
internal/storage/chunk
Package chunk provides access to data through content-addressed chunks.
internal/storage/fileset
Package fileset provides access to files through file sets.
internal/storage/fileset/index
Package index provides access to files through multilevel indexes.
internal/storage/kv
Package kv provides a Key-Value Store interface and a few implementations.
internal/stream
Package stream contains interfaces and helper functions for managing iterators that can block.
internal/transactiondb
Package transactiondb contains the database schema that Pachyderm transactions use.
internal/transforms
Package transforms contains PPS Pipeline Transform implementations.
internal/watch
Package watch implements better watch semantics on top of etcd.
internal/weblinker
Package weblinker generates links to Pachyderm resources served over HTTP.
license
Package license is a reverse proxy.
pfs
Package pfs is a reverse proxy.
pjs
Package pjs is a reverse proxy.
pps
Package pps is a reverse proxy.
proto/protoc-gen-zap
Command protoc-gen-zap generates MarshalLogObject methods for protocol buffer messages, allowing them to be printed with zap.Object().
protoextensions
Package protoextensions is the runtime support code for protoc-gen-zap (in ../etc/proto).
proxy
Package proxy is a reverse proxy.
server/cmd/pachhttp
Command pachhttp runs the Pachyderm HTTP server locally, against your current pachctl context.
server/debug/server/debugstar
Package debugstar lets parts of the debug dump machinery be used by Starlark scripts.
server/http
Package http is a browser-targeted HTTP server for Pachyderm.
server/pfs/s3
TODO: the s2 library checks the type of the error to decide how to handle it, which doesn't work properly with wrapped errors
transaction
Package transaction is a reverse proxy.
version/versionpb
Package versionpb is a reverse proxy.
worker
Package worker is a reverse proxy.
