Version: v2.32.0+incompatible Latest Latest

This package is not in the latest version of its module.

Go to latest
Published: Aug 10, 2021 License: Apache-2.0, BSD-3-Clause, MIT Imports: 35 Imported by: 0



Package dataflowlib translates a Beam pipeline model to the Dataflow API job model, for submission to Google Cloud Dataflow.



This section is empty.


This section is empty.


func Execute

func Execute(ctx context.Context, raw *pipepb.Pipeline, opts *JobOptions, workerURL, jarURL, modelURL, endpoint string, async bool) (*dataflowPipelineResult, error)

Execute submits a pipeline as a Dataflow job.

func Fixup

func Fixup(p *pipepb.Pipeline) (*pipepb.Pipeline, error)

Fixup proto pipeline with Dataflow quirks.

func FromMetricUpdates

func FromMetricUpdates(allMetrics []*df.MetricUpdate, job *df.Job) *metrics.Results

FromMetricUpdates extracts metrics from a slice of MetricUpdate objects and groups them into counters, distributions and gauges.

Dataflow currently only reports Counter and Distribution metrics to Cloud Monitoring. Gauge metrics are not supported. The output metrics.Results will not contain any gauges.

func GetMetrics

func GetMetrics(ctx context.Context, client *df.Service, project, region, jobID string) (*df.JobMetrics, error)

GetMetrics returns a collection of metrics describing the progress of a job by making a call to Cloud Monitoring service.

func NewClient

func NewClient(ctx context.Context, endpoint string) (*df.Service, error)

NewClient creates a new dataflow client with default application credentials and CloudPlatformScope. The Dataflow endpoint is optionally overridden.

func PrintJob

func PrintJob(ctx context.Context, job *df.Job)

PrintJob logs the Dataflow job.

func ResolveXLangArtifacts

func ResolveXLangArtifacts(ctx context.Context, edges []*graph.MultiEdge, project, url string) ([]string, error)

ResolveXLangArtifacts resolves cross-language artifacts with a given GCS URL as a destination, and then stages all local artifacts to that URL. This function returns a list of staged artifact URLs.

func StageFile

func StageFile(ctx context.Context, project, url, filename string) error

StageFile uploads a file to GCS.

func StageModel

func StageModel(ctx context.Context, project, modelURL string, model []byte) error

StageModel uploads the pipeline model to GCS as a unique object.

func Submit

func Submit(ctx context.Context, client *df.Service, project, region string, job *df.Job) (*df.Job, error)

Submit submits a prepared job to Cloud Dataflow.

func Translate

func Translate(ctx context.Context, p *pipepb.Pipeline, opts *JobOptions, workerURL, jarURL, modelURL string) (*df.Job, error)

Translate translates a pipeline to a Dataflow job.

func WaitForCompletion

func WaitForCompletion(ctx context.Context, client *df.Service, project, region, jobID string) error

WaitForCompletion monitors the given job until completion. It logs any messages and state changes received.


type JobOptions

type JobOptions struct {
	// Name is the job name.
	Name string
	// Experiments are additional experiments.
	Experiments []string
	// Pipeline options
	Options runtime.RawOptions

	Project             string
	Region              string
	Zone                string
	Network             string
	Subnetwork          string
	NoUsePublicIPs      bool
	NumWorkers          int64
	DiskSizeGb          int64
	MachineType         string
	Labels              map[string]string
	ServiceAccountEmail string
	WorkerRegion        string
	WorkerZone          string
	ContainerImage      string
	ArtifactURLs        []string // Additional packages for workers.

	// Autoscaling settings
	Algorithm     string
	MaxNumWorkers int64

	TempLocation string

	// Worker is the worker binary override.
	Worker string
	// WorkerJar is a custom worker jar.
	WorkerJar string

	TeardownPolicy string

JobOptions capture the various options for submitting jobs to Dataflow.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL