debugging_wordcount

command
v2.6.0+incompatible
Warning: This package is not in the latest version of its module.

Published: Aug 3, 2018 | License: Apache-2.0, BSD-3-Clause, MIT | Imports: 11 | Imported by: 0

Documentation

Overview

debugging_wordcount is an example that verifies word counts in Shakespeare and includes Beam best practices.

This example, debugging_wordcount, is the third in a series of four successively more detailed 'word count' examples. You may want to look at minimal_wordcount and wordcount first. After working through this example, see the windowed_wordcount pipeline, which introduces additional concepts.

Basic concepts, also in the minimal_wordcount and wordcount examples: Reading text files; counting a PCollection; executing a Pipeline both locally and using a selected runner; defining DoFns.

New Concepts:

  1. Using the richer struct DoFn form and accessing optional arguments.
  2. Logging using the Beam log package, even in a distributed environment.
  3. Testing your Pipeline via passert.
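
The three concepts above can be sketched in plain Go. This is a minimal illustration and not the example's actual source: it does not import the Beam SDK, so the standard library log package stands in for the Beam log package, and the hypothetical helpers runFilter and assertEquals stand in for a runner and for a passert-style check. The type name filterFn, its field names, and the sample words and counts are likewise assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"regexp"
	"sort"
)

// filterFn mirrors the struct form of a DoFn: the exported field carries
// configuration, the unexported field holds state built during Setup.
type filterFn struct {
	Filter string // regular expression selecting the words to keep

	re *regexp.Regexp
}

// Setup compiles the pattern once, before any element is processed.
func (f *filterFn) Setup() {
	f.re = regexp.MustCompile(f.Filter)
}

// ProcessElement emits the pair when the word matches and logs the decision
// either way. The context parameter plays the role of an optional argument;
// the standard logger used in this sketch does not consume it.
func (f *filterFn) ProcessElement(ctx context.Context, word string, count int, emit func(string, int)) {
	if f.re.MatchString(word) {
		log.Printf("Matched: %v", word)
		emit(word, count)
		return
	}
	log.Printf("Did not match: %v", word)
}

// runFilter is a hypothetical stand-in for a runner: it calls Setup once,
// feeds every element through ProcessElement, and formats what was emitted.
func runFilter(pattern string, counts map[string]int) []string {
	fn := &filterFn{Filter: pattern}
	fn.Setup()

	var out []string
	emit := func(word string, count int) {
		out = append(out, fmt.Sprintf("%s: %d", word, count))
	}
	for word, count := range counts {
		fn.ProcessElement(context.Background(), word, count, emit)
	}
	return out
}

// assertEquals is a hypothetical, much simplified analogue of a passert-style
// equality check. It ignores element order and returns an error on mismatch.
func assertEquals(got, want []string) error {
	g := append([]string(nil), got...)
	w := append([]string(nil), want...)
	sort.Strings(g)
	sort.Strings(w)
	if len(g) != len(w) {
		return fmt.Errorf("got %v, want %v", g, w)
	}
	for i := range g {
		if g[i] != w[i] {
			return fmt.Errorf("got %v, want %v", g, w)
		}
	}
	return nil
}
```

To try the sketch, call runFilter from a main function or a test with a pattern such as "Flourish|stomach" and a small map of word counts, then pass the result to assertEquals. In a real pipeline the struct would be handed to the SDK rather than driven by hand, and logging calls would take the context so that messages can be attributed correctly on distributed workers.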

To change the runner, specify:

--runner=YOUR_SELECTED_RUNNER

The input file defaults to a public data set containing the text of King Lear, by William Shakespeare. You can override it and choose your own input with --input.
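
The two flags described above could be declared along these lines. This is a sketch using only the standard library flag package, not the example's actual flag handling: the options type, the parseOptions function, the "direct" runner default, and the placeholder input path are all assumptions introduced here.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the two settings discussed above.
type options struct {
	Runner string
	Input  string
}

// parseOptions parses --runner and --input from args. Both defaults are
// placeholders chosen for this sketch, not the example's real defaults.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("debugging_wordcount", flag.ContinueOnError)
	runner := fs.String("runner", "direct", "Pipeline runner to use.")
	input := fs.String("input", "path/to/kinglear.txt", "File(s) to read.")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if *input == "" {
		return options{}, fmt.Errorf("--input must not be empty")
	}
	return options{Runner: *runner, Input: *input}, nil
}
```

Calling parseOptions(os.Args[1:]) from main would pick up values given on the command line. Using a dedicated FlagSet rather than the global one keeps the function easy to exercise in tests.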
