argo-dataflow

module
v0.10.3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 25, 2022 License: Apache-2.0

README

Dataflow

Build codecov

Summary

Dataflow is a Kubernetes-native platform for executing large parallel data-processing pipelines.

Each pipeline is specified as a Kubernetes custom resource which consists of one or more steps which source and sink messages from data sources such Kafka, NATS Streaming, or HTTP services.

Each step runs zero or more pods, and can scale horizontally using HPA or based on queue length using built-in scaling rules. Steps can be scaled-to-zero, in which case they periodically briefly scale-to-one to measure queue length so they can scale a back up.

Learn more about features.

Introduction to Dataflow

Use Cases

  • Real-time "click" analytics
  • Anomaly detection
  • Fraud detection
  • Operational (including IoT) analytics

Screenshot

Screenshot

Example

pip install git+https://github.com/argoproj-labs/argo-dataflow#subdirectory=dsls/python
from argo_dataflow import cron, pipeline

if __name__ == '__main__':
    (pipeline('hello')
     .namespace('argo-dataflow-system')
     .step(
        (cron('*/3 * * * * *')
         .cat()
         .log())
    )
     .run())

Documentation

Read in order:

Beginner:

Intermediate:

Advanced

Architecture Diagram

Architecture

Directories

Path Synopsis
api
v1alpha1
Package v1alpha1 contains API Schema definitions for the dataflow v1alpha1 API group +kubebuilder:object:generate=true +groupName=dataflow.argoproj.io
Package v1alpha1 contains API Schema definitions for the dataflow v1alpha1 API group +kubebuilder:object:generate=true +groupName=dataflow.argoproj.io
git
hack
kafka Module
runtimes
sdks
golang
Code generated by gen.sh.
Code generated by gen.sh.
shared

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL