argo-dataflow

module v0.0.74
Published: Jul 26, 2021 · License: Apache-2.0

Argo Dataflow

Summary

Argo Dataflow is a cloud-native platform for executing large parallel data-processing pipelines.

Each pipeline is a Kubernetes custom resource consisting of one or more steps that source and sink messages from data sources such as Kafka, NATS Streaming, or HTTP services.
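
For instance, a minimal sketch in the same Python DSL as the example below, assuming kafka() source and .kafka() sink helpers analogous to the cron() helper shown there (the topic names are illustrative):

from argo_dataflow import kafka, pipeline

if __name__ == '__main__':
    (pipeline('kafka-cat')
     .namespace('argo-dataflow-system')
     .step(
        (kafka('input-topic')      # assumed source helper: read messages from a Kafka topic
         .cat('main')              # pass each message through unchanged
         .kafka('output-topic'))   # assumed sink helper: write results to another topic
     )
     .run())                       # run the pipeline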

Each step runs zero or more pods and can scale horizontally using HPA, or based on queue length using built-in scaling rules. Steps can be scaled to zero, in which case they periodically scale up to one briefly to measure queue length so they can scale back up.

Use Cases

  • Real-time "click" analytics
  • Anomaly detection
  • Fraud detection
  • Operational (including IoT) analytics

Screenshot

[image: screenshot]

Example

pip install git+https://github.com/argoproj-labs/argo-dataflow#subdirectory=dsls/python

from argo_dataflow import cron, pipeline

if __name__ == '__main__':
    (pipeline('hello')
     .namespace('argo-dataflow-system')
     .step(
        (cron('*/3 * * * * *')  # source: six-field cron schedule, fires every 3 seconds
         .cat('main')           # pass each message through unchanged
         .log())                # sink: write each message to the step's log
     )
     .run())                    # run the pipeline
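
When run, this defines a Pipeline named hello in the argo-dataflow-system namespace with a single step: the cron source emits a message every three seconds, cat passes it through unchanged, and log writes it to the step's log.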

Documentation

Read in order:

Beginner:

Intermediate:

Advanced:

Architecture Diagram

[image: architecture diagram]

Directories

Path            Synopsis
api
    v1alpha1    Package v1alpha1 contains API Schema definitions for the dataflow v1alpha1 API group
                +kubebuilder:object:generate=true +groupName=dataflow.argoproj.io
hack
    kafka       Module
cat
map
sdks
shared
