large_wordcount

command

v2.40.0 Published: Jun 23, 2022 License: Apache-2.0, BSD-3-Clause, MIT Imports: 24 Imported by: 0

Repository

github.com/apache/beam

Documentation

Overview

large_wordcount is an example that demonstrates a more complex version of a wordcount pipeline. It uses a SplittableDoFn for reading the text files, then uses a map side input to build sorted shards.

This example, large_wordcount, is the fourth in a series of five successively more detailed 'word count' examples. You may first want to take a look at minimal_wordcount and wordcount. Then look at debugging_wordcount for some testing and validation concepts. After you've looked at this example, follow up with the windowed_wordcount pipeline, which introduces additional concepts.

Basic concepts, also in the minimal_wordcount and wordcount examples: Reading text files; counting a PCollection; executing a Pipeline both locally and using a selected runner; defining DoFns.

New Concepts:

Using a SplittableDoFn transform to read the input text files.
Using a Map Side Input to access values for specific keys.
Testing your Pipeline via passert and metrics, using Go testing tools.
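
The idea behind building sorted shards from keyed lookups can be illustrated without any Beam machinery. The sketch below is plain Go and does not use the Beam SDK: it counts words, assigns each word to a shard key, and then reads each shard's words by key in sorted order, which is roughly the access pattern a map side input provides. The names countWords, shardOf, and buildShards, the FNV hash, and the shard count are all assumptions made for illustration; none of them are taken from large_wordcount.go.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// countWords returns how often each lowercased, whitespace-separated
// word occurs in the given lines.
func countWords(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		for _, w := range strings.Fields(strings.ToLower(line)) {
			counts[w]++
		}
	}
	return counts
}

// shardOf maps a word to a shard index in [0, numShards) using FNV-1a.
// The hash choice is an assumption of this sketch.
func shardOf(word string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(word))
	return int(h.Sum32() % uint32(numShards))
}

// buildShards groups the counted words by shard key and sorts each group,
// giving a key-to-values map that can be queried one key at a time.
func buildShards(counts map[string]int, numShards int) map[int][]string {
	shards := make(map[int][]string)
	for w := range counts {
		k := shardOf(w, numShards)
		shards[k] = append(shards[k], w)
	}
	for k := range shards {
		sort.Strings(shards[k])
	}
	return shards
}

func main() {
	lines := []string{"the king the fool", "the storm"}
	counts := countWords(lines)
	const numShards = 2 // hypothetical shard count
	shards := buildShards(counts, numShards)
	// Look up each shard by its key, as a map side input would allow.
	for k := 0; k < numShards; k++ {
		for _, w := range shards[k] {
			fmt.Printf("shard %d: %s=%d\n", k, w, counts[w])
		}
	}
}
```

In the actual pipeline this work is distributed across workers by the runner; the sketch only shows the per-key lookup and sort on a single machine.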

This example does not enumerate its concepts up front; it documents them as they appear. There may be repetition from previous examples.

To change the runner, specify:

--runner=YOUR_SELECTED_RUNNER

The input file defaults to a public data set containing the text of King Lear, by William Shakespeare. You can override it and choose your own input with --input.
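
As a rough illustration of how the --runner and --input flags described above could be parsed, here is a minimal sketch using only Go's standard flag package. It is not the Beam SDK's own flag handling: the options type, the parseOptions helper, and both default values ("direct" and "kinglear.txt") are placeholders assumed for this sketch, not values taken from the example's source.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the two command-line settings discussed above.
type options struct {
	runner string
	input  string
}

// parseOptions parses --runner and --input from args.
// Both defaults are hypothetical placeholders for this sketch.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("large_wordcount", flag.ContinueOnError)
	var opts options
	fs.StringVar(&opts.runner, "runner", "direct", "Pipeline runner (placeholder default).")
	fs.StringVar(&opts.input, "input", "kinglear.txt", "File(s) to read (placeholder default).")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return opts, nil
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("runner=%s input=%s\n", opts.runner, opts.input)
}
```

Go's flag package accepts both the single-dash and double-dash spellings, so -runner=... and --runner=... are treated the same way.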

Source Files

large_wordcount.go
