Back to

Package morebeam

Latest Go to latest

The latest major version is .

Published: Dec 16, 2018 | License: Apache-2.0 | Module:


Package morebeam provides additional functions useful for building Apache Beam pipelines.



package main

import (


// Painting represents a single record in the csv file.
type Painting struct {
	Artist  string `csv:"artist"`
	Title   string `csv:"title"`
	Year    int    `csv:"year"`
	NotUsed string `csv:"-"` // Ignored field

func main() {

	p, s := beam.NewPipelineWithRoot()

	// Read the CSV file as a PCollection<Painting>.
	paintings := csvio.Read(s, "paintings.csv", reflect.TypeOf(Painting{}))

	// Reshuffle the CSV output to improve parallelism.
	paintings = morebeam.Reshuffle(s, paintings)

	// Return a new PCollection<KV<string, Painting>> where the key is the artist.
	paintingsByArtist := morebeam.AddKey(s, func(painting Painting) string {
		return painting.Artist
	}, paintings)

	debug.Print(s, paintingsByArtist)

	beamx.Run(context.Background(), p)




func AddKey

func AddKey(s beam.Scope, keyFn interface{}, col beam.PCollection) beam.PCollection

AddKey takes a PCollection<V> and returns a PCollection<KV<K, V>> where the key is calculated from keyFn.

func Join

func Join(elem ...string) string

Join is similar to path.Join but safe to use on URLs or filepaths.

func Reshuffle

func Reshuffle(s beam.Scope, col beam.PCollection) beam.PCollection

Reshuffle takes a PCollection<A> and shuffles the data to help increase parallelism.

Documentation was rendered with GOOS=linux and GOARCH=amd64.

Jump to identifier

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to identifier