Documentation ¶
Overview ¶
Package dataflowlib translates a Beam pipeline model to the Dataflow API job model, for submission to Google Cloud Dataflow.
Index ¶
- func Execute(ctx context.Context, raw *pipepb.Pipeline, opts *JobOptions, ...) (*dataflowPipelineResult, error)
- func FromMetricUpdates(allMetrics []*df.MetricUpdate, p *pipepb.Pipeline) *metrics.Results
- func GetMetrics(ctx context.Context, client *df.Service, project, region, jobID string) (*df.JobMetrics, error)
- func GetRunningJobByName(client *df.Service, project, region string, name string) (*df.Job, error)
- func NewClient(ctx context.Context, endpoint string) (*df.Service, error)
- func PrintJob(ctx context.Context, job *df.Job)
- func ResolveXLangArtifacts(ctx context.Context, edges []*graph.MultiEdge, project, url string) ([]string, error)
- func StageModel(ctx context.Context, project, modelURL string, model []byte) error
- func Submit(ctx context.Context, client *df.Service, project, region string, job *df.Job, ...) (*df.Job, error)
- func Translate(ctx context.Context, p *pipepb.Pipeline, opts *JobOptions, ...) (*df.Job, error)
- func WaitForCompletion(ctx context.Context, client *df.Service, project, region, jobID string) error
- type JobOptions
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Execute ¶
func Execute(ctx context.Context, raw *pipepb.Pipeline, opts *JobOptions, workerURL, jarURL, modelURL, endpoint string, async bool) (*dataflowPipelineResult, error)
Execute submits a pipeline as a Dataflow job.
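A minimal sketch of calling Execute, assuming the usual Beam Go SDK (v2) import paths; the job name, project, region, and gs:// URLs are placeholders, raw is a pipeline proto produced elsewhere, and the empty jarURL and endpoint arguments assume no custom worker jar and the default Dataflow endpoint.

package example

import (
	"context"
	"log"

	pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow/dataflowlib"
)

// run is a hypothetical helper: raw is a pipeline proto built elsewhere,
// and all string values below are placeholders.
func run(ctx context.Context, raw *pipepb.Pipeline) error {
	opts := &dataflowlib.JobOptions{
		Name:    "wordcount",
		Project: "my-project",
		Region:  "us-central1",
	}
	_, err := dataflowlib.Execute(ctx, raw, opts,
		"gs://my-bucket/staging/worker", // workerURL: staged worker binary
		"",                              // jarURL: no custom worker jar
		"gs://my-bucket/staging/model",  // modelURL: staged pipeline proto
		"",                              // endpoint: leave the default
		false,                           // async
	)
	if err != nil {
		log.Printf("dataflow job failed: %v", err)
	}
	return err
}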
func FromMetricUpdates ¶
func FromMetricUpdates(allMetrics []*df.MetricUpdate, p *pipepb.Pipeline) *metrics.Results
FromMetricUpdates extracts metrics from a slice of MetricUpdate objects and groups them into counters, distributions and gauges.
Dataflow currently only reports Counter and Distribution metrics to Cloud Monitoring. Gauge metrics are not supported. The output metrics.Results will not contain any gauges.
func GetMetrics ¶
func GetMetrics(ctx context.Context, client *df.Service, project, region, jobID string) (*df.JobMetrics, error)
GetMetrics returns a collection of metrics describing the progress of a job by making a call to the Cloud Monitoring service.
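A sketch that combines GetMetrics with FromMetricUpdates above, assuming a client from NewClient and the job's pipeline proto are at hand; the import paths are the usual Beam v2 and google.golang.org/api ones.

package example

import (
	"context"

	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/metrics"
	pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow/dataflowlib"
	df "google.golang.org/api/dataflow/v1b3"
)

// jobMetrics is a hypothetical helper that fetches a job's metric updates and
// converts them into the SDK's metrics.Results form.
func jobMetrics(ctx context.Context, client *df.Service, p *pipepb.Pipeline, project, region, jobID string) (*metrics.Results, error) {
	jm, err := dataflowlib.GetMetrics(ctx, client, project, region, jobID)
	if err != nil {
		return nil, err
	}
	// jm.Metrics holds the raw []*df.MetricUpdate; FromMetricUpdates groups it
	// into counters and distributions (gauges are not reported by Dataflow).
	return dataflowlib.FromMetricUpdates(jm.Metrics, p), nil
}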
func GetRunningJobByName ¶ added in v2.37.0
func GetRunningJobByName(client *df.Service, project, region string, name string) (*df.Job, error)
GetRunningJobByName looks up a running Dataflow job by its name and returns an error if no running job matches.
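A sketch that resolves a running job's ID by name, assuming client was created with NewClient (documented below).

package example

import (
	"github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow/dataflowlib"
	df "google.golang.org/api/dataflow/v1b3"
)

// runningJobID is a hypothetical helper: it assumes client comes from NewClient
// and returns the ID of the running job with the given name.
func runningJobID(client *df.Service, project, region, name string) (string, error) {
	job, err := dataflowlib.GetRunningJobByName(client, project, region, name)
	if err != nil {
		return "", err
	}
	return job.Id, nil
}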
func NewClient ¶
func NewClient(ctx context.Context, endpoint string) (*df.Service, error)
NewClient creates a new Dataflow client with default application credentials and CloudPlatformScope. The Dataflow endpoint is optionally overridden.
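A brief sketch, assuming Application Default Credentials are configured and that an empty endpoint string leaves the default Dataflow endpoint in place.

package example

import (
	"context"

	"github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow/dataflowlib"
	df "google.golang.org/api/dataflow/v1b3"
)

// newDefaultClient is a hypothetical helper; pass a non-empty URL instead of ""
// to override the Dataflow endpoint.
func newDefaultClient(ctx context.Context) (*df.Service, error) {
	return dataflowlib.NewClient(ctx, "")
}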
func ResolveXLangArtifacts ¶
func ResolveXLangArtifacts(ctx context.Context, edges []*graph.MultiEdge, project, url string) ([]string, error)
ResolveXLangArtifacts resolves cross-language artifacts with a given GCS URL as a destination, and then stages all local artifacts to that URL. This function returns a list of staged artifact URLs.
func StageModel ¶
func StageModel(ctx context.Context, project, modelURL string, model []byte) error
StageModel uploads the pipeline model to GCS as a unique object.
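A sketch, assuming the pipeline proto can be serialized with google.golang.org/protobuf/proto and that modelURL names a GCS object such as gs://my-bucket/staging/model.

package example

import (
	"context"

	pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow/dataflowlib"
	"google.golang.org/protobuf/proto"
)

// stagePipeline is a hypothetical helper that serializes the pipeline proto and
// uploads it to the given GCS location.
func stagePipeline(ctx context.Context, raw *pipepb.Pipeline, project, modelURL string) error {
	model, err := proto.Marshal(raw)
	if err != nil {
		return err
	}
	return dataflowlib.StageModel(ctx, project, modelURL, model)
}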
func Submit ¶
func Submit(ctx context.Context, client *df.Service, project, region string, job *df.Job, updateJob bool) (*df.Job, error)
Submit submits a prepared job to Cloud Dataflow.
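A sketch of the submit-and-wait path, assuming job was already produced by Translate (whose full parameter list is elided in the index above) and that updateJob=false submits a new job rather than updating a running one.

package example

import (
	"context"

	"github.com/apache/beam/sdks/v2/go/pkg/beam/runners/dataflow/dataflowlib"
	df "google.golang.org/api/dataflow/v1b3"
)

// submitAndWait is a hypothetical helper around PrintJob, Submit, and WaitForCompletion.
func submitAndWait(ctx context.Context, client *df.Service, project, region string, job *df.Job) error {
	dataflowlib.PrintJob(ctx, job) // log the job about to be submitted
	upd, err := dataflowlib.Submit(ctx, client, project, region, job, false)
	if err != nil {
		return err
	}
	// Wait until the submitted job finishes.
	return dataflowlib.WaitForCompletion(ctx, client, project, region, upd.Id)
}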
Types ¶
type JobOptions ¶
type JobOptions struct {
	// Name is the job name.
	Name string
	// Experiments are additional experiments.
	Experiments []string
	// DataflowServiceOptions are additional job modes and configurations for Dataflow
	DataflowServiceOptions []string
	// Pipeline options
	Options runtime.RawOptions

	Project             string
	Region              string
	Zone                string
	KmsKey              string
	Network             string
	Subnetwork          string
	NoUsePublicIPs      bool
	NumWorkers          int64
	DiskSizeGb          int64
	DiskType            string
	MachineType         string
	Labels              map[string]string
	ServiceAccountEmail string
	WorkerRegion        string
	WorkerZone          string
	ContainerImage      string
	ArtifactURLs        []string // Additional packages for workers.
	FlexRSGoal          string
	EnableHotKeyLogging bool

	// Streaming update settings
	Update               bool
	TransformNameMapping map[string]string

	// Autoscaling settings
	Algorithm     string
	MaxNumWorkers int64

	WorkerHarnessThreads int64

	TempLocation     string
	TemplateLocation string

	// Worker is the worker binary override.
	Worker string
	// WorkerJar is a custom worker jar.
	WorkerJar string

	TeardownPolicy string
}
JobOptions captures the various options for submitting jobs to Dataflow.
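A sketch of a typical configuration, assuming the dataflowlib import; every value is a placeholder, and the update fields follow the field comments above.

opts := &dataflowlib.JobOptions{
	Name:        "my-streaming-job",
	Project:     "my-project",
	Region:      "us-central1",
	NumWorkers:  3,
	MachineType: "n1-standard-2",
	Labels:      map[string]string{"team": "data"},

	// Streaming update settings: replace the running job with this name and
	// carry state across renamed transforms.
	Update:               true,
	TransformNameMapping: map[string]string{"oldName": "newName"},
}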