leoflow

package module
v0.0.1 Latest
Published: Jun 3, 2026 License: Apache-2.0 Imports: 1 Imported by: 0

README

Leoflow

The workflow orchestrator that ate Apache Airflow's lunch.
Same UI. Same vocabulary. Ten times the speed. Zero of the pain.
Native map-reduce for ML/AI: fan-out + reduce as a Python list comprehension, no XCom plumbing, no broker, no special operator.


📚 Documentation

Full docs → https://neochaotic.github.io/leoflow/ (DAG authoring, deploy, API reference, architecture).

  • Operating modes · Editions: Lite · Pro · Demo, the runtime split and the packaging split
  • DAG authoring: write a DAG; the Lite → deploy lifecycle
  • Map-reduce for ML: fan-out + reduce as a Python list comprehension
  • CI/CD & deploy examples: GitHub Actions · GitLab · Cloud Build/Run · generic
  • Helm chart: Pro install (values reference, hardening, PoC recipe)
  • HTTP API (Scalar) · Go packages: API references
  • Concepts & glossary · Architecture: the model & the why

⚡ Install

Leoflow ships in two editions. Pick the track that matches your target:

🥈 Lite: local laptop, single VM, internal demo

curl -fsSL https://raw.githubusercontent.com/neochaotic/leoflow/main/install.sh | sh
leoflow lite                # hot-reload at http://localhost:8088 (LITE badge)

Installs the binaries and runs leoflow setup (ensures Python, provisions the parser, creates your workspace at ~/leoflow/), with no sudo, no system Python, and no package manager. Docker is optional and only unlocks the Kubernetes executor for higher-fidelity local runs. Linux + macOS, amd64 + arm64 (Windows via WSL2). Each DAG gets its own per-DAG venv under ~/.leoflow/dev/venvs/<dag_id>/ so conflicting dependencies between DAGs coexist out of the box.
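As a rough illustration of the per-DAG layout described above, the following Python sketch derives one venv directory per DAG and creates it on demand. Only the ~/.leoflow/dev/venvs/<dag_id>/ path comes from the text; the helper names `venv_dir` and `ensure_venv` are hypothetical and are not part of the Leoflow CLI or its Python package.

```python
from pathlib import Path
from typing import Optional
import venv


def venv_dir(dag_id: str, home: Optional[Path] = None) -> Path:
    """Return the per-DAG venv directory (hypothetical helper, path taken from the README)."""
    base = home if home is not None else Path.home()
    return base / ".leoflow" / "dev" / "venvs" / dag_id


def ensure_venv(dag_id: str, home: Optional[Path] = None) -> Path:
    """Create the DAG's venv if it is missing; illustrative only."""
    target = venv_dir(dag_id, home)
    if not (target / "pyvenv.cfg").exists():
        # with_pip=False keeps the sketch offline; a real setup would install dependencies.
        venv.create(target, with_pip=False)
    return target
```

Because every DAG resolves to its own directory, a DAG pinned to pandas==1.0 and another pinned to pandas==2.0 would never share site-packages in this model.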

🥇 Pro: Kubernetes cluster (Helm)

helm repo add leoflow https://neochaotic.github.io/leoflow   # (charts published per release)
kubectl create namespace leoflow
helm install lf leoflow/leoflow -n leoflow \
  --set image.tag=v0.0.1 \
  --set migrations.image.tag=v0.0.1 \
  --set database.url='postgres://USER:PASS@HOST:5432/leoflow?sslmode=verify-full' \
  --set redis.url='rediss://HOST:6380/0' \
  --set auth.jwtSecret="$(openssl rand -base64 64)" \
  --set secretKey="$(openssl rand -hex 16)" \
  --set bootstrap.password='change-me'

Pro deploys the control plane on a real cluster (leoflow-server Deployment + RBAC for the pod-per-task executor + a pre-install migrations Job). External Postgres 13+ and Redis 6+ are required (the chart fails the install otherwise; embedded datastores are Lite-only). Managed datastores work out of the box (Cloud SQL / RDS / Memorystore / ElastiCache / Azure Cache), with optional caConfigMap knobs for verified TLS. See the chart docs.

Full guide for both tracks → Installation.


The Five Wounds Apache Airflow Will Not Heal

Airflow is the most widely deployed workflow orchestrator on earth. It is also the one that bleeds the most in production. Every data engineer recognizes these wounds:

  • The scheduler that stalls. Three seconds between tasks. Ten when the cluster is busy. Pipelines that should run in two minutes take twenty.
  • The triggerer that suffocates. Above five hundred concurrent sensors, the Python asyncio loop chokes. Sensors stop firing. SLAs miss.
  • The DAG file that re-parses itself to death. Every scheduler loop opens every .py file in /dags. CPU spikes for nothing. Memory grows. Restarts become a ritual.
  • The worker that leaks until it dies. Long-running Celery workers accumulate file descriptors, database connections, half-loaded modules. OOMKilled at three in the morning. Always.
  • The dependency hell that has no door. pandas==1.0 for the legacy DAG, pandas==2.0 for the new one. One Airflow image. Pick a side. Cry either way.

Leoflow was built to close these five wounds, on day one, by construction.

How It Closes Them

Leoflow does not invent a new execution model. Pod-per-task is the right pattern, and Airflow's KubernetesExecutor proved it years ago. What Leoflow does is strip out the Python overhead from every layer of the orchestration stack:

| Wound | Airflow today | Leoflow |
|---|---|---|
| Scheduler latency | 3-10 seconds per decision | <200 ms: native Go, zero GIL |
| Sensor concurrency | ~500 (asyncio Triggerer) | 100,000+: each sensor is a 2 KB goroutine |
| DAG parsing cost | Re-parsed every scheduler loop | Zero: DAG is pre-compiled to immutable JSON |
| Worker lifecycle | Long-lived, leak-prone | Ephemeral pod per task: spawn, run, die |
| Worker image size | 1.5 GB+ Airflow base | 200 MB typical: each DAG is its own slim image |
| Dependency isolation | Workaround via KubernetesPodOperator | Native: every DAG is a container |
| Cold start | 15-45 s | 2-5 s target: agent is a 15 MB static binary |
| Observability | Retrofitted with effort | Native: Prometheus + OpenTelemetry + structured logs from commit one |

This is not marketing. This is what falls out of replacing a Python control plane with Go and embracing the container as the unit of isolation.

What Leoflow Is

Leoflow is a GitOps-first, container-native workflow orchestrator written in Go. Each phrase carries weight:

  • GitOps-first. Your DAG is a versioned artifact (dag.json + container image), not live source code. CI builds it. The registry stores it. Rollback is a tag change.
  • Container-native. Each DAG is its own container image, with its own dependencies, its own Python version, its own everything. Built automatically from a one-page leoflow.yaml; you never touch Docker unless you want to.
  • Airflow-UI compatible. The MVP runs the unmodified Apache Airflow 3.2.x UI. Your team's muscle memory survives the migration. No new tool to learn.
  • Go performance, Go discipline. Static binary. No GIL. Goroutines for concurrency. Test-driven from the first commit. Go Report Card A+ enforced in CI.

What It Looks Like to Use

A complete Leoflow DAG project. No Dockerfile. No requirements.txt. No CI plumbing to invent.

# leoflow.yaml
dag_id: etl_vendas
python_version: "3.11"
dependencies:
  - pandas==2.1.0
  - requests==2.31.0

# dag.py
from leoflow import DAG, task

@task
def fetch():
    import requests
    return requests.get("https://api.example.com/orders").json()

@task
def transform(orders):
    return [{"id": o["id"], "value": o["amount"] * 1.1} for o in orders]

with DAG("etl_vendas", schedule="0 5 * * *") as dag:
    raw = fetch()
    processed = transform(raw)

leoflow compile .              # generates Dockerfile, builds image, produces dag.json
leoflow push ./dag.json        # registers with the control plane

That is the entire developer surface. The CLI builds the image against an official base (leoflow/python-runtime:3.11), pushes to your registry, and registers a versioned DAG. The Airflow UI shows it at the next refresh.

Native map-reduce for ML/AI

Hyperparameter search, k-fold cross-validation, ensemble training, batch inference, sharded preprocessing, Monte Carlo: every parallel ML workload is map-reduce. Most orchestrators make you build it: an operator per fan-out, a broker for the intermediate values, shared storage for the artifacts, and a custom reducer that knows how to find them all. Leoflow expresses the whole pattern in two lines of Python:

from airflow.sdk import DAG, task

@task
def trial(lr: float) -> dict:
    return train_one(lr)                            # map

@task
def select_best(trials: list[dict]) -> dict:
    return max(trials, key=lambda r: r["score"])    # reduce

with DAG("hparam_search", schedule=None):
    select_best([trial(lr) for lr in [0.001, 0.01, 0.05, 0.1, 0.5]])

That [trial(lr) for lr in …] is the whole map. trials: list[dict] is the whole reduce. No XCom plumbing, no broker setup, no shared filesystem, no special operator: the parser captures the list shape at compile time; the runtime assembles the upstream XComs in declaration order and delivers them as a real Python list. You also get per-trial isolation (own pod / own process), per-trial retry, deterministic ordering, a 256 KB cap per upstream, and a null slot for any upstream that legitimately produced no result.
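The fan-in behaviour described above (declaration order, with a null slot for an upstream that produced nothing) can be sketched in plain Python. This is an illustration of the stated guarantees, not Leoflow's runtime code: `assemble_fan_in` and the `trial_N` task ids are hypothetical names chosen for the example.

```python
from typing import Any, Dict, List, Optional


def assemble_fan_in(upstream_ids: List[str], xcoms: Dict[str, Any]) -> List[Optional[Any]]:
    """Build the list handed to the reduce task (hypothetical helper).

    upstream_ids is the declaration order captured at compile time;
    xcoms maps task id -> returned value for the upstreams that produced one.
    """
    result: List[Optional[Any]] = []
    for task_id in upstream_ids:
        # A missing entry becomes None, i.e. the "null slot" mentioned above.
        result.append(xcoms.get(task_id))
    return result


# Results may arrive in any order; the reducer still sees declaration order.
arrived = {"trial_2": {"score": 0.9}, "trial_0": {"score": 0.5}}
trials = assemble_fan_in(["trial_0", "trial_1", "trial_2"], arrived)
```

One consequence worth noting: a reducer such as select_best should skip None entries before comparing scores, since a null slot has no "score" key.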

| ML pattern | Map | Reduce |
|---|---|---|
| Hyperparameter search | one task per (lr, batch, seed) triple | pick the best metric |
| K-fold cross-validation | one task per fold | average the metrics |
| Ensemble training | one task per base model | combine predictions / stack |
| Batch inference | one task per partition | collect predictions to a sink |
| Monte Carlo | one task per worker | average / sum results |

Runnable example: examples/ml_hparam_search/. Full reference: Map-reduce for ML (guarantees, limits, what activates fan-in vs what does not, and the on-disk dag.json shape).

Architecture

A DAG is compiled into an immutable artifact (a dag.json spec plus a container image) and pushed to the control plane. A Go control plane schedules it and, for each task, dispatches an ephemeral worker pod whose leoflow-agent runs the user code and reports back over gRPC. Postgres holds metadata; Redis holds XCom values and live-log fan-out.

flowchart LR
    subgraph dev["Author / CI"]
        src["leoflow.yaml · dag.py · Dockerfile"]
    end

    subgraph cp["Control plane (Go)"]
        direction TB
        api["HTTP API /api/v2<br/>JWT · RBAC · multi-tenant"]
        sched["Scheduler<br/>state machine · cron<br/>leader election · retries"]
        asvc["Agent gRPC service<br/>task spec · state · XCom · logs"]
        api --- sched --- asvc
    end

    src -->|"leoflow compile / push"| api
    sched --- pg[("Postgres<br/>metadata")]
    asvc --- redis[("Redis<br/>XCom · log tail")]

    sched -->|"dispatch: one pod per task"| pod
    subgraph k8s["Kubernetes"]
        pod["Worker pod = your DAG image<br/>leoflow-agent ⇄ your Python / Bash"]
    end
    pod -->|"gRPC: register · fetch spec · push XCom · stream logs · report state"| asvc

    classDef store fill:#1f2937,stroke:#4b5563,color:#e5e7eb;
    class pg,redis store;

Short-lived http_api tasks skip the pod and run inline as goroutines in the control plane (capped); everything else runs pod-per-task. Read the ADRs for the reasoning behind every decision.
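The routing rule in the previous paragraph can be pictured with a small Python sketch. It is a model of the idea only: the class name `InlineRouter`, the `max_inline` parameter, and the "deferred" outcome when the cap is reached are assumptions made for illustration; the source states that inline execution is capped but does not say what happens at the cap.

```python
import threading


class InlineRouter:
    """Decide where a task runs: inline for http_api (up to a cap), otherwise a pod."""

    def __init__(self, max_inline: int) -> None:
        # The semaphore models the cap on concurrently running inline tasks.
        self._slots = threading.BoundedSemaphore(max_inline)

    def route(self, operator: str) -> str:
        if operator != "http_api":
            return "pod"  # everything else runs pod-per-task
        if self._slots.acquire(blocking=False):
            return "inline"
        return "deferred"  # assumption: wait for a free slot rather than spawn a pod

    def release(self) -> None:
        """Call when an inline task finishes to free its slot."""
        self._slots.release()
```

A bounded semaphore is used here so that releasing more slots than were acquired raises an error instead of silently raising the cap.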

Status

🧪 Experimental, pre-1.0. The HTTP API (/api/v2), CLI, and Helm chart values may change between minor versions until v1.0.0 locks them. Production Pro deployments are supported by the Helm chart today, with the usual caveats that come with a pre-1.0 codebase: pin to a specific tag, read the upgrades guide before bumping, and exercise backup/restore before you need to.

Versioning follows ADR 0037: v0.0.1 ends the pre-alpha series; every release after is vX.Y.Z-rc.N → vX.Y.Z.

Implemented today (Phases 1–4):

  • CLI + parser: leoflow init / validate / compile / push / runs trigger / runs status / auth create-token; the Python DAG parser; compile --build / --push builds and pushes the DAG image (out-of-process).
  • Control plane: Airflow-compatible /api/v2 API, JWT auth + RBAC + multi-tenant, the scheduler state machine with cron scheduling, Postgres advisory-lock leader election, task retries, embedded Scalar API docs, and Prometheus + OpenTelemetry observability.
  • Execution: real pod-per-task execution via the leoflow-agent over gRPC (Kubernetes, ADR 0015), plus inline http_api goroutines for short calls; orphaned-pod reconciliation and completed-pod garbage collection.
  • Data flow: XCom on Redis (256 KB limit, TTL, optional schema validation) passed between tasks; log shipping to disk with a read API and live tailing over Redis pub/sub.
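To make the XCom constraints in the last bullet concrete, here is an in-memory Python stand-in that enforces a size limit and a TTL. It is a sketch under assumptions: the real store is Redis, the class name `XComStore` is hypothetical, measuring size on the JSON-encoded payload is a guess, and the one-hour default TTL is a placeholder since the source gives no TTL value.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class XComStore:
    """In-memory model of a size-capped, expiring XCom store (illustrative only)."""

    def __init__(
        self,
        limit_bytes: int = 256 * 1024,      # the 256 KB limit stated above
        ttl_seconds: float = 3600.0,        # placeholder; the real TTL is not documented here
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit_bytes = limit_bytes
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[bytes, float]] = {}

    def push(self, key: str, value: Any) -> None:
        payload = json.dumps(value).encode("utf-8")
        if len(payload) > self.limit_bytes:
            raise ValueError(f"XCom {key!r} is {len(payload)} bytes, over the limit")
        self._data[key] = (payload, self._clock() + self.ttl_seconds)

    def pull(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        payload, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave as if never written
            return None
        return json.loads(payload)
```

The practical takeaway is the usual one for XCom-style channels: pass small values or references (a path, a key, an id) between tasks and keep large artifacts in external storage.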

Not yet implemented: load tests (Phase 6) and S3/GCS log sinks. Tracked refinements live in the issue tracker.

Features in the MVP

Shipping in v0.1.0:

  • Python, Bash, and HTTP API operators
  • DAG-as-Image model with automatic image build via leoflow.yaml
  • Hybrid DAG authoring: Python source parsed at compile time, or declarative YAML
  • XCom on Redis with 256 KB limit, TTL, and optional schema validation
  • Apache Airflow 3.2.x UI compatibility (no fork required)
  • JWT authentication, RBAC, multi-tenant schema (OIDC-ready)
  • Kubernetes-native execution (no worker pool to manage)
  • Local development on Kubernetes (k3d/kind) or a dev-only subprocess executor (ADR 0015)
  • Trigger rules: all_success, all_failed, all_done, one_success, one_failed
  • Clear task instance to re-run failed tasks
  • Leader election via Postgres advisory locks
  • OpenSSF Best Practices compliance, signed releases (cosign), supply chain scanning (govulncheck + Trivy + CodeQL)
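The five trigger rules listed above can be read as predicates over the states of a task's upstreams. The Python sketch below follows the meanings Apache Airflow documents for these rule names, reduced to two terminal states; whether Leoflow matches Airflow in every edge case (skipped or upstream-failed tasks, for example) is an assumption, and `should_run` is a hypothetical helper.

```python
from typing import List

TERMINAL = {"success", "failed"}  # simplified: real schedulers track more states


def should_run(rule: str, upstream_states: List[str]) -> bool:
    """Return True when a task with the given trigger rule may start."""
    succeeded = sum(s == "success" for s in upstream_states)
    failed = sum(s == "failed" for s in upstream_states)
    finished = all(s in TERMINAL for s in upstream_states)

    if rule == "all_success":
        return finished and failed == 0
    if rule == "all_failed":
        return finished and succeeded == 0
    if rule == "all_done":
        return finished
    if rule == "one_success":
        return succeeded >= 1  # does not wait for the remaining upstreams
    if rule == "one_failed":
        return failed >= 1     # does not wait for the remaining upstreams
    raise ValueError(f"unknown trigger rule: {rule}")
```

For example, a cleanup task would typically use all_done so it runs whether its upstreams succeeded or failed, while an alerting task would use one_failed to fire as early as possible.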

On the post-MVP roadmap:

  • Optimized backfill (parallel execution with throttling)
  • UI scaling for 10,000+ DAGs (caching, server-side pagination)
  • Dynamic task mapping
  • OIDC authentication (Google, Azure AD, Keycloak, Okta)
  • Mark success/failed manually
  • Custom UI (replacing the Airflow UI)
  • Deferrable tasks (efficient dispatch + long-poll pattern, native Go implementation without a separate Triggerer process; see ADR 0016)

Getting Started

Try it locally (one command)

After the Lite install above, just run:

leoflow lite

Then open http://localhost:8088 (the LITE badge confirms you're on the Lite instance). The first run provisions a managed Postgres + admin login, drops example DAGs in ~/leoflow/examples/, and hot-reloads every save. Recover the admin password any time with leoflow lite reset-password.

Lite is the primary local path. The legacy Docker-Compose demo profile (docker compose --profile demo up --build, login admin@leoflow.local / admin) still works for CI / containerized-only environments; see docs/local-deploy.md. The pinned Airflow 3.2.x UI is a tactical MVP choice; a purpose-built Leoflow UI is the long-term direction (ADR 0018).

Local development

git clone https://github.com/neochaotic/leoflow
cd leoflow
make setup            # Go tools, Python parser, pre-commit hook
make build            # builds bin/leoflow, bin/leoflow-server, bin/leoflow-agent

# Start Postgres + Redis (Docker) and apply migrations
make dev-up           # docker compose up --wait + migrate-up; `make dev-down` to stop

# Run the control plane (bootstraps a default admin user)
LEOFLOW_AUTH_JWT_SECRET=dev LEOFLOW_BOOTSTRAP_PASSWORD=admin123 ./bin/leoflow-server &
# API docs (Scalar) at http://localhost:8080/docs ; metrics at http://localhost:9090/metrics

# Author, compile, and register a DAG
./bin/leoflow init my-dag
./bin/leoflow compile my-dag --image my-dag:dev -o my-dag/dag.json
TOKEN=$(./bin/leoflow auth create-token --username admin@leoflow.local --password admin123)
./bin/leoflow push my-dag/dag.json --token "$TOKEN"

Two dev environments. make dev-up runs Postgres + Redis as plain Docker containers on the host for a fast inner loop (control plane on the host). For full in-cluster execution (control plane and dependencies on a local Kubernetes cluster, mirroring production and exercising real task pods), the Helm chart is installable on any K8s cluster; chart-test CI gates every change with helm lint + helm-unittest (41 tests) + kind install/upgrade smoke. Task execution is on Kubernetes only (ADR 0015); the host containers are dev dependencies, not the execution path.

The Airflow 3.2.1 UI ships embedded in the server and is served at / (see the one-command demo above and docs/ui-compatibility.md). The Scalar API reference is at /docs. Load tests are the remaining Phase 6 work.

Honest Comparison

We have no patience for marketing fiction. Here is where Leoflow sits in the landscape:

| | Airflow | Argo Workflows | Prefect | Dagster | Leoflow |
|---|---|---|---|---|---|
| Language of control plane | Python | Go | Python | Python | Go |
| Pod-per-task model | Optional (KubernetesExecutor) | Yes | Optional | Optional | Yes, only mode |
| Dependency isolation per DAG | Workaround | Manual | Partial | Partial | Native |
| UI familiar to Airflow users | Yes | No | No | No | Yes (Airflow UI) |
| GitOps-first DAG model | No | Yes | Partial | Partial | Yes |
| Scheduler in Go (no GIL) | No | Yes | No | No | Yes |
| Native observability | Add-on | Partial | Partial | Yes | Built-in |
| Mental model | Celery-era | K8s-native | Python-native | Software-defined assets | K8s-native + Airflow vocabulary |

We borrow from Argo Workflows (container-native), from Prefect (modern developer experience), and from Airflow (the UI and vocabulary). We do not pretend we invented any of those. We just put them together in a way nobody had.

Engineering Discipline

Leoflow holds itself to a higher bar than most open source projects, because workflow orchestrators must be boring and reliable to be useful:

  • Strict TDD: every line of production code is preceded by a failing test (ADR 0011)
  • Go Report Card A+: enforced in CI from the first commit (ADR 0012)
  • GoDocs on every exported identifier, no exceptions
  • Supply chain security from day one: govulncheck, gosec, Trivy, CodeQL, Scorecard, signed releases (ADR 0014)
  • Per-phase coverage floors, rising from 70% to 85% across the MVP phases
  • Native observability: Prometheus, OpenTelemetry, structured logs from commit one (ADR 0010)

If you contribute, read the CONTRIBUTING guide first.

License

Apache License 2.0. See LICENSE.

Acknowledgements

Leoflow stands on the shoulders of Apache Airflow. The team behind Airflow defined the vocabulary, proved the architecture, and built the UI that Leoflow reuses without modification in the MVP. This project would not exist without their work, and we credit them at every layer of our documentation.

We also studied the source of Argo Workflows, Prefect, and Dagster carefully. Each made decisions worth borrowing, and we did.


Star the repo if you have ever waited five seconds for an Airflow task to start. Watch the repo if you want a heads-up on every release. Open an issue if you have a chronic Airflow pain we have not addressed yet; pre-1.0 is the time to shape the API.

Documentation

Overview

Package leoflow embeds the Python parser and runtime package sources so a binary-only install (no source checkout) can provision them. The dev source trees under parser/ and runtime/python/ remain the canonical copies; this embed reads them directly at build time, so there is no duplicated source.

The all: prefix is required on the package directories: Python packages contain __init__.py, and go:embed's default rules drop names beginning with "_" or ".". The patterns are scoped to the package dir plus pyproject.toml / README.md (what pip needs) so the dev-only test fixtures and dot-caches (.pytest_cache, .ruff_cache, .coverage) are not embedded. Build from a clean tree (no __pycache__) so stale bytecode is not embedded.

Constants

This section is empty.

Variables

This section is empty.

Functions

func DevCompose

func DevCompose() []byte

DevCompose returns the embedded docker-compose for Leoflow Lite's local Postgres + Redis, so a binary-only install (no source checkout) can bring the datastores up with `leoflow lite` alone; it is materialized under ~/.leoflow on first run.

func ExampleDAGs

func ExampleDAGs() embed.FS

ExampleDAGs returns the embedded DAG examples (one subdirectory per DAG, each with dag.py + leoflow.yaml). The Lite IDE's "Download examples" button materializes them into the user's workspace under examples/, so a fresh install can try every operator without a separate git checkout. Root is "examples/", the same layout as the source tree.

func PythonSources

func PythonSources() embed.FS

PythonSources returns the embedded parser and runtime package sources, rooted so that "parser/..." and "runtime/python/..." resolve.

Types

This section is empty.

Directories

Path: Synopsis
cmd/
  leoflow (command): Command leoflow is the developer CLI for authoring and compiling DAGs.
  leoflow-agent (command): Command leoflow-agent runs as PID 1 inside every task pod.
  leoflow-server (command): Command leoflow-server runs the Leoflow control plane: the HTTP API, auth, metrics, and (when enabled) the scheduler.
internal/
  agent: Package agent contains the worker-side logic that runs inside the task container: building the user process command, injecting XCom inputs, reading the return value, and retry backoff.
  agentrpc: Package agentrpc implements the control-plane side of the agent gRPC protocol: it authenticates each in-pod agent by its per-task-instance token, serves the task specification, and records the state transitions the agent reports.
  api: Package api implements the Airflow-compatible HTTP control plane (ADR 0007).
  auth: Package auth provides JWT authentication, password hashing, the RBAC permission model, and login rate limiting for the control plane (ADR 0008).
  cli: Package cli implements the leoflow command-line interface.
  config: Package config loads Leoflow configuration from defaults, an optional config file, and LEOFLOW_* environment variables, with flags taking precedence.
  dispatch: Package dispatch launches pod-path task instances: it resolves a task's execution context, mints the agent's identity token, and routes the request to the executor.
  domain: Package domain defines the core Leoflow types (DAG, Task, project config) and validates them against the canonical JSON Schemas in docs/api.
  executor: Package executor runs task instances via Kubernetes, Docker, a subprocess, or inline HTTP, selected by the Router (ADR 0002).
  logs: Package logs ships task logs from the agent to durable storage and serves them back via the API, so logs remain available after the task pod is gone.
  observability: Package observability wires structured logging, Prometheus metrics, and OpenTelemetry tracing for the Leoflow control plane (ADR 0010).
  scheduler: Package scheduler implements the Leoflow scheduling state machine and loop.
  secrets: Package secrets encrypts sensitive values (e.g.
  setup: Package setup implements host detection and bootstrap for `leoflow setup` and `leoflow doctor`: it determines the platform, which dependencies are present, and which operating tier is achievable, preferring relocatable downloads into ~/.leoflow over system package managers.
  storage: Package storage wraps the Postgres and Redis connections used by the control plane, exposing the sqlc-generated query set and health checks.
  ui: Package ui embeds and serves the pinned Apache Airflow 3.2.1 React SPA bundle.
  version: Package version exposes build metadata embedded into the binary at link time.
  workspace: Package workspace provides a filesystem confined to a single root directory.
  xcom: Package xcom implements the Leoflow XCom subsystem: small typed payloads passed between tasks, stored in Redis with a hard size limit and a TTL, with metadata indexed in Postgres for retrieval (ADR 0006).
Package migrations embeds the SQL migration files so the control plane can apply them without the source tree present, a step toward a binaries-only `leoflow dev` (no checked-out repo required).
proto
