dppctl

command module
v0.0.1
Published: Apr 4, 2023 License: MIT Imports: 6 Imported by: 0

README

Getting started with Go for Data Processing Pipeline (DPP) CLI tool

Overview

This project develops and maintains a command-line (CLI) utility, written in Go, that helps deploy data engineering pipelines on the modern data stack (MDS).

Even though the members of the GitHub organization may be employed by some companies, they speak on their personal behalf and do not represent these companies.

References

AWS SDK for Go

Getting started

  • Install the dppctl utility, replacing vx.y.z with the desired version:
$ go install github.com/data-engineering-helpers/dppctl@vx.y.z
  • Copy and edit the YAML deployment specification. For instance, for a deployment on the AWS cloud:
$ cp depl/aws-dev-sample.yaml depl/aws-dev.yaml
$ vi depl/aws-dev.yaml
  • Check the version of the dppctl utility:
$ dppctl -v
[dppctl] 0.0.x-alpha.x
  • Launch the dppctl utility in checking mode (the default):
$ dppctl -f depl/aws-dev.yaml
  • Launch the dppctl utility in deployment mode:
$ dppctl -f depl/aws-dev.yaml -c deploy
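
The options used above (-v, -f and -c) could be handled with Go's standard flag package. The following is only a minimal sketch, not the actual dppctl code: the options struct, the parseArgs() function and the "check" value of the default command are assumptions made for illustration.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options holds the command-line settings. The type and its field names are
// hypothetical; they are not taken from the dppctl source code.
type options struct {
	showVersion bool
	specFile    string
	command     string
}

// parseArgs parses dppctl-style arguments. The "check" default is an
// assumption standing in for the checking mode described in the README.
func parseArgs(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("dppctl", flag.ContinueOnError)
	fs.BoolVar(&opts.showVersion, "v", false, "print the version and exit")
	fs.StringVar(&opts.specFile, "f", "", "path to the YAML deployment specification")
	fs.StringVar(&opts.command, "c", "check", "command to run: check or deploy")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	if opts.command != "check" && opts.command != "deploy" {
		return opts, fmt.Errorf("unknown command %q", opts.command)
	}
	// A specification file is required unless only the version is requested.
	if !opts.showVersion && opts.specFile == "" {
		return opts, errors.New("missing -f <deployment specification>")
	}
	return opts, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	fmt.Printf("command=%s spec=%s version=%t\n", opts.command, opts.specFile, opts.showVersion)
}
```

Keeping the parsing in a function that takes the argument slice, rather than reading os.Args directly, makes it easy to exercise from a test.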

Publish the module

  • Recompute the dependencies:
$ go mod tidy
  • Check that the tests pass:
$ go test
  • Tag the Git repository:
$ git commit -m "[Release] v0.0.x-alpha.x"
$ git push
$ git tag -a v0.0.x-alpha.x -m "[Release] v0.0.x-alpha.x"
$ git push --tags
  • Publish the module:
$ GOPROXY=proxy.golang.org go list -m github.com/data-engineering-helpers/dppctl@v0.0.x-alpha.x
github.com/data-engineering-helpers/dppctl v0.0.x-alpha.x

Troubleshooting

AWS Airflow (MWAA)

As of early 2023, apparently for security reasons, it does not seem possible to call the Airflow API directly on the AWS managed service (MWAA). One has to go through the API backend of the MWAA CLI instead, which is why the Go code of the corresponding AWSAirflowCLI() function is not straightforward. Using the MWAA CLI API (through curl) is itself convoluted, as detailed below.
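
For illustration, the round trip performed by the curl commands in this section can be sketched in Go with the standard library only. This is not the code of AWSAirflowCLI(): the runAirflowCommand() and decodeStdout() names are hypothetical, and the sketch assumes, as the curl examples do, that the endpoint is /aws_mwaa/cli and that the JSON response carries a base64-encoded stdout field.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// decodeStdout extracts and base64-decodes the "stdout" field of the JSON
// body returned by the MWAA CLI endpoint (hypothetical helper).
func decodeStdout(body []byte) (string, error) {
	var resp struct {
		Stdout string `json:"stdout"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", fmt.Errorf("decoding JSON response: %w", err)
	}
	out, err := base64.StdEncoding.DecodeString(resp.Stdout)
	if err != nil {
		return "", fmt.Errorf("decoding base64 stdout: %w", err)
	}
	return string(out), nil
}

// runAirflowCommand posts an Airflow CLI command (e.g. "dags list -o json")
// as plain text, authenticated with the short-lived CLI token.
func runAirflowCommand(client *http.Client, hostname, token, command string) (string, error) {
	url := "https://" + hostname + "/aws_mwaa/cli"
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(command))
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "text/plain")
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected HTTP status %s", resp.Status)
	}
	return decodeStdout(body)
}

func main() {
	hostname, token := os.Getenv("WEB_SERVER_HOSTNAME"), os.Getenv("CLI_TOKEN")
	if hostname == "" || token == "" {
		fmt.Fprintln(os.Stderr, "set WEB_SERVER_HOSTNAME and CLI_TOKEN first")
		os.Exit(1)
	}
	client := &http.Client{Timeout: 30 * time.Second}
	out, err := runAirflowCommand(client, hostname, token, "dags list -o json")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```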

Listing the DAGs
  • Configuration:
$ export MWAA_ENV="<the-MWAA-environment-name>"
  export AWS_REGION="eu-west-1"
  export CLI_TOKEN
  export WEB_SERVER_HOSTNAME
  • Create a CLI (command-line) token:
$ aws mwaa --region $AWS_REGION create-cli-token --name $MWAA_ENV
{
    "CliToken": "someToken",
    "WebServerHostname": "<airflow-id>.$AWS_REGION.airflow.amazonaws.com"
}
  • Copy/paste the web server hostname and the CLI token and save them as environment variables:
$ CLI_TOKEN="someToken"
  WEB_SERVER_HOSTNAME="<airflow-id>.$AWS_REGION.airflow.amazonaws.com"
  • Note that the CLI token is very short-lived (it can be used only once or twice), so the two operations (aws mwaa create-cli-token and CLI_TOKEN="someToken") must be repeated every time before the following commands are performed.

  • Invoke an Airflow command through the API wrapping the MWAA CLI:

    • Raw (not formatted) output:
$ curl -s --request POST "https://$WEB_SERVER_HOSTNAME/aws_mwaa/cli" --header "Authorization: Bearer $CLI_TOKEN" --header "Content-Type: text/plain" --data-raw "dags list -o json"|jq -r ".stdout" | base64 -d
...
[{"dag_id": "dag_name", "filepath": "prefix/script.py", "owner": "airflow", "paused": "True"}, {"dag_id": ...}, ...]
    • CSV-formatted output (list of DAGs):
$ curl -s --request POST "https://$WEB_SERVER_HOSTNAME/aws_mwaa/cli" --header "Authorization: Bearer $CLI_TOKEN" --header "Content-Type: text/plain" --data-raw "dags list -o json"|jq -r ".stdout" | base64 -d | grep "^\[{\"dag_id\"" | jq -r ".[]|[.dag_id,.filepath,.owner,.paused]|@csv" | sed -e s/\"//g
...
...
dag_name,prefix/script.py,airflow,True
...
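
The grep/jq/sed post-processing above can also be done in Go. The sketch below is an illustration under stated assumptions rather than part of dppctl: the dag struct and the dagsToCSV() function are hypothetical names, and the input is assumed to be the decoded stdout shown above, that is, optional log lines followed by one line holding the JSON array of DAGs.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"os"
	"strings"
)

// dag mirrors one entry of the "dags list -o json" output shown above.
// Note that "paused" is a string ("True"/"False") in that output.
type dag struct {
	ID       string `json:"dag_id"`
	Filepath string `json:"filepath"`
	Owner    string `json:"owner"`
	Paused   string `json:"paused"`
}

// dagsToCSV finds the line holding the JSON array of DAGs (skipping any log
// lines, like the grep in the shell pipeline) and renders it as CSV.
func dagsToCSV(stdout string) (string, error) {
	for _, line := range strings.Split(stdout, "\n") {
		if !strings.HasPrefix(line, `[{"dag_id"`) {
			continue
		}
		var dags []dag
		if err := json.Unmarshal([]byte(line), &dags); err != nil {
			return "", fmt.Errorf("decoding DAG list: %w", err)
		}
		var sb strings.Builder
		w := csv.NewWriter(&sb)
		for _, d := range dags {
			if err := w.Write([]string{d.ID, d.Filepath, d.Owner, d.Paused}); err != nil {
				return "", err
			}
		}
		w.Flush()
		if err := w.Error(); err != nil {
			return "", err
		}
		return sb.String(), nil
	}
	return "", errors.New("no DAG list found in the output")
}

func main() {
	// Reads the decoded Airflow output from standard input.
	input, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	out, err := dagsToCSV(string(input))
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Unlike the sed step, which strips every double quote, encoding/csv quotes a field only when it contains a comma, a quote or a line break, so unusual DAG names remain valid CSV.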

Directories

Path Synopsis
File: https://github.com/data-engineering-helpers/dppctl/blob/main/service/aws.go
tests module
File: https://github.com/data-engineering-helpers/dppctl/blob/main/utilities/depl.go
File: https://github.com/data-engineering-helpers/dppctl/blob/main/workflow/workflow.go
