Version: v2.40.0-RC2 Latest Latest

This package is not in the latest version of its module.

Go to latest
Published: Jun 23, 2022 License: Apache-2.0, BSD-3-Clause, MIT Imports: 38 Imported by: 0



Package dataflowlib translates a Beam pipeline model to the Dataflow API job model, for submission to Google Cloud Dataflow.



This section is empty.


This section is empty.


func Execute

func Execute(ctx context.Context, raw *pipepb.Pipeline, opts *JobOptions, workerURL, jarURL, modelURL, endpoint string, async bool) (*dataflowPipelineResult, error)

Execute submits a pipeline as a Dataflow job.

func Fixup

func Fixup(p *pipepb.Pipeline) (*pipepb.Pipeline, error)

Fixup proto pipeline with Dataflow quirks.

func FromMetricUpdates

func FromMetricUpdates(allMetrics []*df.MetricUpdate, p *pipepb.Pipeline) *metrics.Results

FromMetricUpdates extracts metrics from a slice of MetricUpdate objects and groups them into counters, distributions and gauges.

Dataflow currently only reports Counter and Distribution metrics to Cloud Monitoring. Gauge metrics are not supported. The output metrics.Results will not contain any gauges.

func GetMetrics

func GetMetrics(ctx context.Context, client *df.Service, project, region, jobID string) (*df.JobMetrics, error)

GetMetrics returns a collection of metrics describing the progress of a job by making a call to Cloud Monitoring service.

func GetRunningJobByName added in v2.37.0

func GetRunningJobByName(client *df.Service, project, region string, name string) (*df.Job, error)

GetRunningJobByName gets a Dataflow job running by its name and returns an error if none match.

func NewClient

func NewClient(ctx context.Context, endpoint string) (*df.Service, error)

NewClient creates a new dataflow client with default application credentials and CloudPlatformScope. The Dataflow endpoint is optionally overridden.

func PrintJob

func PrintJob(ctx context.Context, job *df.Job)

PrintJob logs the Dataflow job.

func ResolveXLangArtifacts

func ResolveXLangArtifacts(ctx context.Context, edges []*graph.MultiEdge, project, url string) ([]string, error)

ResolveXLangArtifacts resolves cross-language artifacts with a given GCS URL as a destination, and then stages all local artifacts to that URL. This function returns a list of staged artifact URLs.

func StageModel

func StageModel(ctx context.Context, project, modelURL string, model []byte) error

StageModel uploads the pipeline model to GCS as a unique object.

func Submit

func Submit(ctx context.Context, client *df.Service, project, region string, job *df.Job, updateJob bool) (*df.Job, error)

Submit submits a prepared job to Cloud Dataflow.

func Translate

func Translate(ctx context.Context, p *pipepb.Pipeline, opts *JobOptions, workerURL, jarURL, modelURL string) (*df.Job, error)

Translate translates a pipeline to a Dataflow job.

func WaitForCompletion

func WaitForCompletion(ctx context.Context, client *df.Service, project, region, jobID string) error

WaitForCompletion monitors the given job until completion. It logs any messages and state changes received.


type JobOptions

type JobOptions struct {
	// Name is the job name.
	Name string
	// Experiments are additional experiments.
	Experiments []string
	// DataflowServiceOptions are additional job modes and configurations for Dataflow
	DataflowServiceOptions []string
	// Pipeline options
	Options runtime.RawOptions

	Project             string
	Region              string
	Zone                string
	KmsKey              string
	Network             string
	Subnetwork          string
	NoUsePublicIPs      bool
	NumWorkers          int64
	DiskSizeGb          int64
	DiskType            string
	MachineType         string
	Labels              map[string]string
	ServiceAccountEmail string
	WorkerRegion        string
	WorkerZone          string
	ContainerImage      string
	ArtifactURLs        []string // Additional packages for workers.
	FlexRSGoal          string
	EnableHotKeyLogging bool

	// Streaming update settings
	Update               bool
	TransformNameMapping map[string]string

	// Autoscaling settings
	Algorithm            string
	MaxNumWorkers        int64
	WorkerHarnessThreads int64

	TempLocation     string
	TemplateLocation string

	// Worker is the worker binary override.
	Worker string
	// WorkerJar is a custom worker jar.
	WorkerJar string

	TeardownPolicy string

JobOptions capture the various options for submitting jobs to Dataflow.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL