Documentation

Overview

Package graph is the internal representation of the Beam execution plan. This package is used by the public-facing Beam package to organize the user's pipeline into a connected graph structure. This graph is a precise, strongly-typed representation of the user's intent, and allows the runtime to verify typing of collections, and tracks the data dependency relationships to allow an optimizer to schedule the work.

Index

Constants

const (
	MainUnknown mainInputs = -1 // Number of inputs is unknown for DoFn validation.
	MainSingle  mainInputs = 1  // Number of inputs for single value elements.
	MainKv      mainInputs = 2  // Number of inputs for KV elements.
)

The following constants prefixed with "Main" represent valid numbers of DoFn main inputs for DoFn construction and validation.


const CombinePerKeyScope = "CombinePerKey"

CombinePerKeyScope is the Go SDK canonical name for the combine composite scope. With Beam Portability, "primitive" composite transforms like combine have their URNs & payloads attached to a high level scope, with a default representation beneath. The use of this const permits the translation layer to confirm the SDK expects this combine to be liftable by a runner and should set this scope's URN and Payload accordingly.


Variables

var (
	// SourceInputTag is a constant random string used when an ExternalTransform
	// expects a single unnamed input. xlangx and graphx use it to explicitly
	// bypass steps in pipeline construction meant for named inputs
	SourceInputTag string

	// SinkOutputTag is a constant random string used when an ExternalTransform
	// expects a single unnamed output. xlangx and graphx use it to explicitly
	// bypass steps in pipeline construction meant for named outputs.
	SinkOutputTag string

	// NewNamespace is a utility random string generator used by the xlang to
	// scope individual ExternalTransforms by a unique namespace
	NewNamespace func() string
)

Functions

func Bounded

func Bounded(ns []*Node) bool

Bounded returns true iff all nodes are bounded.

func CoGBKMainInput

func CoGBKMainInput(components int) func(*config)

CoGBKMainInput is an optional config to NewDoFn which specifies the number of components of a CoGBK input to the DoFn being created, allowing for more complete validation.

Example usage:

var col beam.PCollection
graph.NewDoFn(fn, graph.CoGBKMainInput(len(col.Type().Components())))

func IsLifecycleMethod

func IsLifecycleMethod(n string) bool

lifecycleMethodName returns if the passed in string is one of the lifecycle method names used by the Go SDK as DoFn or CombineFn lifecycle methods. These are the only methods that need shims generated for them.

func NewNamespaceGenerator

func NewNamespaceGenerator(n int) func() string

NewNamespaceGenerator returns a functions that generates a random string of n alphabets

func NodeTypes

func NodeTypes(list []*Node) []typex.FullType

NodeTypes returns the fulltypes of the supplied slice of nodes.

func NumMainInputs

func NumMainInputs(num mainInputs) func(*config)

NumMainInputs is an optional config to NewDoFn which specifies the number of main inputs to the DoFn being created, allowing for more complete validation. Valid inputs are the package constants of type mainInputs.

Example usage:

graph.NewDoFn(fn, graph.NumMainInputs(graph.MainKv))

Types

type CombineFn

type CombineFn Fn

CombineFn represents a CombineFn.

func AsCombineFn

func AsCombineFn(fn *Fn) (*CombineFn, error)

AsCombineFn converts a Fn to a CombineFn, if possible.

func NewCombineFn

func NewCombineFn(fn interface{}) (*CombineFn, error)

NewCombineFn constructs a CombineFn from the given value, if possible.

func (*CombineFn) AddInputFn

func (f *CombineFn) AddInputFn() *funcx.Fn

AddInputFn returns the "AddInput" function, if present.

func (*CombineFn) CompactFn

func (f *CombineFn) CompactFn() *funcx.Fn

CompactFn returns the "Compact" function, if present.

func (*CombineFn) CreateAccumulatorFn

func (f *CombineFn) CreateAccumulatorFn() *funcx.Fn

CreateAccumulatorFn returns the "CreateAccumulator" function, if present.

func (*CombineFn) ExtractOutputFn

func (f *CombineFn) ExtractOutputFn() *funcx.Fn

ExtractOutputFn returns the "ExtractOutput" function, if present.

func (*CombineFn) MergeAccumulatorsFn

func (f *CombineFn) MergeAccumulatorsFn() *funcx.Fn

MergeAccumulatorsFn returns the "MergeAccumulators" function. If it is the only method present, then InputType == AccumulatorType == OutputType.

func (*CombineFn) Name

func (f *CombineFn) Name() string

Name returns the name of the function or struct.

func (*CombineFn) SetupFn

func (f *CombineFn) SetupFn() *funcx.Fn

SetupFn returns the "Setup" function, if present.

func (*CombineFn) TeardownFn

func (f *CombineFn) TeardownFn() *funcx.Fn

TeardownFn returns the "Teardown" function, if present.

type DoFn

type DoFn Fn

DoFn represents a DoFn.

func AsDoFn

func AsDoFn(fn *Fn, numMainIn mainInputs) (*DoFn, error)

AsDoFn converts a Fn to a DoFn, if possible. numMainIn specifies how many main inputs are expected in the DoFn's method signatures. Valid inputs are the package constants of type mainInputs. If that number is MainUnknown then validation is done by best effort and may miss some edge cases.

func NewDoFn

func NewDoFn(fn interface{}, options ...func(*config)) (*DoFn, error)

NewDoFn constructs a DoFn from the given value, if possible.

func (*DoFn) FinishBundleFn

func (f *DoFn) FinishBundleFn() *funcx.Fn

FinishBundleFn returns the "FinishBundle" function, if present.

func (*DoFn) IsSplittable

func (f *DoFn) IsSplittable() bool

IsSplittable returns whether the DoFn is a valid Splittable DoFn.

func (*DoFn) Name

func (f *DoFn) Name() string

Name returns the name of the function or struct.

func (*DoFn) ProcessElementFn

func (f *DoFn) ProcessElementFn() *funcx.Fn

ProcessElementFn returns the "ProcessElement" function.

func (*DoFn) SetupFn

func (f *DoFn) SetupFn() *funcx.Fn

SetupFn returns the "Setup" function, if present.

func (*DoFn) StartBundleFn

func (f *DoFn) StartBundleFn() *funcx.Fn

StartBundleFn returns the "StartBundle" function, if present.

func (*DoFn) TeardownFn

func (f *DoFn) TeardownFn() *funcx.Fn

TeardownFn returns the "Teardown" function, if present.

type DynFn

type DynFn struct {
	// Name is the name of the function. It does not have to be a valid symbol.
	Name string
	// T is the type of the generated function
	T reflect.Type
	// Data holds the data, if any, for the generator. Each function
	// generator typically needs some configuration data, which is
	// required by the DynFn to be encoded.
	Data []byte
	// Gen is the function generator. The function generator itself must be a
	// function with a unique symbol.
	Gen func(string, reflect.Type, []byte) reflectx.Func
}

DynFn is a generator for dynamically-created functions:

gen: (name string, t reflect.Type, []byte) -> func : T

where the generated function, fn : T, is re-created at runtime. This concept allows serialization of dynamically-generated functions, which do not have a valid (unique) symbol such as one created via reflect.MakeFunc.

type ExpandedTransform

type ExpandedTransform struct {
	Components   interface{} // *pipepb.Components
	Transform    interface{} //*pipepb.PTransform
	Requirements []string
}

ExpandedTransform stores the expansion response associated to each ExternalTransform.

Components and Transform fields are purposely typed as interface{} to avoid unnecesary proto related imports into graph.

type ExternalTransform

type ExternalTransform struct {
	Namespace string

	Urn           string
	Payload       []byte
	ExpansionAddr string

	InputsMap  map[string]int
	OutputsMap map[string]int

	Expanded *ExpandedTransform
}

ExternalTransform represents the cross-language transform in and out of pipeline graph. It is associated with each MultiEdge and it's Inbound and Outbound links. It also stores the associated expansion response within the Expanded field.

func (ExternalTransform) WithNamedInputs

func (ext ExternalTransform) WithNamedInputs(inputsMap map[string]int) ExternalTransform

WithNamedInputs adds a map (tag -> index of Inbound in MultiEdge.Input) of named inputs corresponsing to ExternalTransform's InputsMap

func (ExternalTransform) WithNamedOutputs

func (ext ExternalTransform) WithNamedOutputs(outputsMap map[string]int) ExternalTransform

WithNamedOutputs adds a map (tag -> index of Outbound in MultiEdge.Output) of named outputs corresponsing to ExternalTransform's OutputsMap

type Fn

type Fn struct {
	// Fn holds the function, if present. If Fn is nil, Recv must be
	// non-nil.
	Fn *funcx.Fn
	// Recv hold the struct receiver, if present. If Recv is nil, Fn
	// must be non-nil.
	Recv interface{}
	// DynFn holds the function-generator, if dynamic. If not nil, Fn
	// holds the generated function.
	DynFn *DynFn
	// contains filtered or unexported fields
}

Fn holds either a function or struct receiver.

func NewFn

func NewFn(fn interface{}) (*Fn, error)

NewFn pre-processes a function, dynamic function or struct for graph construction.

func (*Fn) Name

func (f *Fn) Name() string

Name returns the name of the function or struct.

type Graph

type Graph struct {
	// contains filtered or unexported fields
}

Graph represents an in-progress deferred execution graph and is easily translatable to the model graph. This graph representation allows precise control over scope and connectivity.

func New

func New() *Graph

New returns an empty graph with the scope set to the root.

func (*Graph) Build

func (g *Graph) Build() ([]*MultiEdge, []*Node, error)

Build performs finalization on the graph. It verifies the correctness of the graph structure, typechecks the plan and returns a slice of the edges in the graph.

func (*Graph) NewEdge

func (g *Graph) NewEdge(parent *Scope) *MultiEdge

NewEdge creates a new edge of the graph in the supplied scope.

func (*Graph) NewNode

func (g *Graph) NewNode(t typex.FullType, w *window.WindowingStrategy, bounded bool) *Node

NewNode creates a new node in the graph of the supplied fulltype.

func (*Graph) NewScope

func (g *Graph) NewScope(parent *Scope, name string) *Scope

NewScope creates and returns a new scope that is a child of the supplied scope.

func (*Graph) Root

func (g *Graph) Root() *Scope

Root returns the root scope of the graph.

func (*Graph) String

func (g *Graph) String() string

type Inbound

type Inbound struct {
	// Kind presents the form of the data that the edge expects. Main input
	// must be processed element-wise, but side input may take several
	// convenient forms. For example, a DoFn that processes ints may choose
	// among the following parameter types:
	//
	//   * Main:      int
	//   * Singleton: int
	//   * Slice:     []int
	//   * Iter:      func(*int) bool
	//   * ReIter:    func() func(*int) bool
	//
	// If the DoFn is generic then int may be replaced by any of the type
	// variables. For example,
	//
	//   * Slice:     []typex.T
	//   * Iter:      func(*typex.X) bool
	//
	// If the input type is KV<int,string>, say, then the options are:
	//
	//   * Main:      int, string  (as two separate parameters)
	//   * Map:       map[int]string
	//   * MultiMap:  map[int][]string
	//   * Iter:      func(*int, *string) bool
	//   * ReIter:    func() func(*int, *string) bool
	//
	// As above, either int, string, or both can be replaced with type
	// variables. For example,
	//
	//   * Map:       map[typex.X]typex.Y
	//   * MultiMap:  map[typex.T][]string
	//   * Iter:      func(*typex.Z, *typex.Z) bool
	//
	// Note that in the last case the parameter type requires that both
	// the key and value types are identical. Bind enforces such constraints.
	Kind InputKind

	// From is the incoming node in the graph.
	From *Node

	// Type is the fulltype matching the actual type used by the transform.
	// Due to the loose signatures of DoFns, we can only determine the
	// inbound structure when the fulltypes of the incoming links are present.
	// For example,
	//
	//     func (ctx context.Context, key int, value typex.X) error
	//
	// is a generic DoFn that if bound to KV<int,string> would have one
	// Inbound link with type KV<int, X>.
	Type typex.FullType
}

Inbound represents an inbound data link from a Node.

func NamedInboundLinks(ins map[string]*Node) (map[string]int, []*Inbound)

NamedInboundLinks returns an array of new Inbound links and a map (tag -> index of Inbound in MultiEdge.Input) of corresponding indices with respect to their names.

func (*Inbound) String

func (i *Inbound) String() string

type InputKind

type InputKind string

InputKind represents the role of the input and its shape.

const (
	Main      InputKind = "Main"
	Singleton InputKind = "Singleton"
	Slice     InputKind = "Slice"
	Map       InputKind = "Map"      // TODO: allow?
	MultiMap  InputKind = "MultiMap" // TODO: allow?
	Iter      InputKind = "Iter"
	ReIter    InputKind = "ReIter"
)

Valid input kinds.

func Bind

func Bind(fn *funcx.Fn, typedefs map[string]reflect.Type, in ...typex.FullType) ([]typex.FullType, []InputKind, []typex.FullType, []typex.FullType, error)

Bind returns the inbound, outbound and underlying output types for a Fn, when bound to the underlying input types. The complication of bind is primarily that UserFns have loose signatures and bind must produce valid type information for the execution plan.

For example,

func (t EventTime, k typex.X, v int, emit func(string, typex.X))

or

func (context.Context, k typex.X, v int) (string, typex.X, error)

are UserFns that may take one or two incoming fulltypes: either KV<X,int> or X with a singleton side input of type int. For the purpose of the shape of data processing, the two forms are equivalent. The non-data types, context.Context and error, are not part of the data signature, but in play only at runtime.

If either was bound to the input type [KV<string,int>], bind would return:

inbound:  [Main: KV<X,int>]
outbound: [KV<string,X>]
output:   [KV<string,string>]

Note that it propagates the assignment of X to string in the output type.

If either was instead bound to the input fulltypes [float, int], the result would be:

inbound:  [Main: X, Singleton: int]
outbound: [KV<string,X>]
output:   [KV<string, float>]

Here, the inbound shape and output types are different from before.

type MultiEdge

type MultiEdge struct {
	Op               Opcode
	DoFn             *DoFn              // ParDo
	RestrictionCoder *coder.Coder       // SplittableParDo
	CombineFn        *CombineFn         // Combine
	AccumCoder       *coder.Coder       // Combine
	Value            []byte             // Impulse
	External         *ExternalTransform // Current External Transforms API
	Payload          *Payload           // Legacy External Transforms API
	WindowFn         *window.Fn         // WindowInto

	Input  []*Inbound
	Output []*Outbound
	// contains filtered or unexported fields
}

MultiEdge represents a primitive data processing operation. Each non-user code operation may be implemented by either the harness or the runner.

func NewCoGBK

func NewCoGBK(g *Graph, s *Scope, ns []*Node) (*MultiEdge, error)

NewCoGBK inserts a new CoGBK edge into the graph.

func NewCombine

func NewCombine(g *Graph, s *Scope, u *CombineFn, in *Node, ac *coder.Coder, typedefs map[string]reflect.Type) (*MultiEdge, error)

NewCombine inserts a new Combine edge into the graph. Combines cannot have side input.

func NewCrossLanguage

func NewCrossLanguage(g *Graph, s *Scope, ext *ExternalTransform, ins []*Inbound, outs []*Outbound) (*MultiEdge, func(*Node, bool))

NewCrossLanguage inserts a Cross-langugae External transform using initialized input and output nodes

func NewExternal

func NewExternal(g *Graph, s *Scope, payload *Payload, in []*Node, out []typex.FullType, bounded bool) *MultiEdge

NewExternal inserts an External transform. The system makes no assumptions about what this transform might do.

func NewFlatten

func NewFlatten(g *Graph, s *Scope, in []*Node) (*MultiEdge, error)

NewFlatten inserts a new Flatten edge in the graph. Flatten output type is the shared input type.

func NewImpulse

func NewImpulse(g *Graph, s *Scope, value []byte) *MultiEdge

NewImpulse inserts a new Impulse edge into the graph. It must use the built-in bytes coder.

func NewParDo

func NewParDo(g *Graph, s *Scope, u *DoFn, in []*Node, rc *coder.Coder, typedefs map[string]reflect.Type) (*MultiEdge, error)

NewParDo inserts a new ParDo edge into the graph.

func NewReshuffle

func NewReshuffle(g *Graph, s *Scope, in *Node) (*MultiEdge, error)

NewReshuffle inserts a new Reshuffle edge into the graph.

func NewWindowInto

func NewWindowInto(g *Graph, s *Scope, wfn *window.Fn, in *Node) *MultiEdge

NewWindowInto inserts a new WindowInto edge into the graph.

func (*MultiEdge) ID

func (e *MultiEdge) ID() int

ID returns the graph-local identifier for the edge.

func (*MultiEdge) Name

func (e *MultiEdge) Name() string

Name returns a not-necessarily-unique name for the edge.

func (*MultiEdge) Scope

func (e *MultiEdge) Scope() *Scope

Scope return the scope.

func (*MultiEdge) String

func (e *MultiEdge) String() string

type Node

type Node struct {

	// Coder defines the data encoding. It can be changed, but must be of
	// the underlying type, t.
	Coder *coder.Coder
	// contains filtered or unexported fields
}

Node is a typed connector describing the data type and encoding. A node may have multiple inbound and outbound connections. The underlying type must be a complete type, i.e., not include any type variables.

func (*Node) Bounded

func (n *Node) Bounded() bool

Bounded returns true iff the collection is bounded.

func (*Node) ID

func (n *Node) ID() int

ID returns the graph-local identifier for the node.

func (*Node) String

func (n *Node) String() string

func (*Node) Type

func (n *Node) Type() typex.FullType

Type returns the underlying full type of the data, such as KV<int,string>.

func (*Node) WindowingStrategy

func (n *Node) WindowingStrategy() *window.WindowingStrategy

WindowingStrategy returns the window applied to the data.

type Opcode

type Opcode string

Opcode represents a primitive Beam instruction kind.

const (
	Impulse    Opcode = "Impulse"
	ParDo      Opcode = "ParDo"
	CoGBK      Opcode = "CoGBK"
	Reshuffle  Opcode = "Reshuffle"
	External   Opcode = "External"
	Flatten    Opcode = "Flatten"
	Combine    Opcode = "Combine"
	WindowInto Opcode = "WindowInto"
)

Valid opcodes.

type Outbound

type Outbound struct {
	// To is the outgoing node in the graph.
	To *Node

	// Type is the fulltype matching the actual type used by the transform.
	// For DoFns, unlike inbound, the outbound types closely mimic the type
	// signature. For example,
	//
	//     func (ctx context.Context, emit func (key int, value typex.X)) error
	//
	// is a generic DoFn that produces one Outbound link of type KV<int,X>.
	Type typex.FullType // representation type of data
}

Outbound represents an outbound data link to a Node.

func NamedOutboundLinks(g *Graph, outs map[string]typex.FullType) (map[string]int, []*Outbound)

NamedOutboundLinks returns an array of new Outbound links and a map (tag -> index of Outbound in MultiEdge.Output) of corresponding indices with respect to their names.

func (*Outbound) String

func (o *Outbound) String() string

type Payload

type Payload struct {
	URN  string
	Data []byte
}

Payload represents an external payload.

type Scope

type Scope struct {

	// Label is the human-visible label for this scope.
	Label string
	// Parent is the parent scope, if nested.
	Parent *Scope
	// contains filtered or unexported fields
}

Scope is a syntactic Scope, such as arising from a composite Transform. It has no semantic meaning at execution time. Used by monitoring.

func (*Scope) ID

func (s *Scope) ID() int

ID returns the graph-local identifier for the scope.

func (*Scope) String

func (s *Scope) String() string

type SplittableDoFn

type SplittableDoFn DoFn

SplittableDoFn represents a DoFn implementing SDF methods.

func (*SplittableDoFn) CreateInitialRestrictionFn

func (f *SplittableDoFn) CreateInitialRestrictionFn() *funcx.Fn

CreateInitialRestrictionFn returns the "CreateInitialRestriction" function, if present.

func (*SplittableDoFn) CreateTrackerFn

func (f *SplittableDoFn) CreateTrackerFn() *funcx.Fn

CreateTrackerFn returns the "CreateTracker" function, if present.

func (*SplittableDoFn) Name

func (f *SplittableDoFn) Name() string

Name returns the name of the function or struct.

func (*SplittableDoFn) RestrictionSizeFn

func (f *SplittableDoFn) RestrictionSizeFn() *funcx.Fn

RestrictionSizeFn returns the "RestrictionSize" function, if present.

func (*SplittableDoFn) RestrictionT

func (f *SplittableDoFn) RestrictionT() reflect.Type

RestrictionT returns the restriction type from the SDF.

func (*SplittableDoFn) SplitRestrictionFn

func (f *SplittableDoFn) SplitRestrictionFn() *funcx.Fn

SplitRestrictionFn returns the "SplitRestriction" function, if present.

Directories

Path Synopsis
coder Package coder contains coder representation and utilities.
mtime Package mtime contains a millisecond representation of time.
window Package window contains window representation, windowing strategies and utilities.