beam-wordcount-go

Published: Dec 18, 2021 License: Apache-2.0 Imports: 11 Imported by: 0

README

Example from https://github.com/apache/beam/blob/master/sdks/go/examples/wordcount/wordcount.go

Read More

Windows Tools

  • PowerShell (cross-platform)
  • Git
  • Chocolatey (Windows Package Manager)

Install Go

choco install golang -y
refreshenv
choco list --local-only 

Verify

go version

Review the installed files at C:\Program Files\Go.

Get Beam Go SDK

go get -u github.com/apache/beam/sdks/v2/go/pkg/beam

Run WordCount Example

go install github.com/apache/beam/sdks/v2/go/examples/wordcount@latest
wordcount --input <PATH_TO_INPUT_FILE> --output counts

Examples:

wordcount --input 'C:\Users\dcase\Documents\44-517\beam-wordcount-go\data.txt' --output counts
wordcount --input data.txt --output counts.yaml

Review the local build cache at C:\Users\<username>\AppData\Local\go-build. Downloaded module dependencies are stored separately, under %GOPATH%\pkg\mod.

About Go

  • go get - updates dependencies/versions listed in go.mod and updates local cache
  • go run - compiles and runs the given file or package
  • go install - builds the given package and installs the resulting binary
  • go.mod - lives at the root of the module (the project directory), not under $GOPATH

Documentation

Overview

wordcount is an example that counts words in Shakespeare and includes Beam best practices.

This example is the second in a series of four successively more detailed 'word count' examples. You may want to look at minimal_wordcount first. After you've looked at this example, see the debugging_wordcount pipeline for an introduction to additional concepts.

For a detailed walkthrough of this example, see

https://beam.apache.org/get-started/wordcount-example/

Basic concepts, also in the minimal_wordcount example: Reading text files; counting a PCollection; writing to text files
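The three basic steps (read lines, count words, write formatted results) can be sketched in plain Go without Beam. This is only an illustration of the logic, not the Beam pipeline itself: the function names countWords and formatCounts are hypothetical, and the word pattern is an assumption modelled on the upstream example rather than copied from it.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// wordRE matches a run of ASCII letters, optionally followed by an
// apostrophe and one lowercase letter (for example "king's").
// Assumption: this pattern is illustrative, not taken from this repository.
var wordRE = regexp.MustCompile(`[a-zA-Z]+('[a-z])?`)

// countWords scans the text line by line and returns a word -> count map.
func countWords(text string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		for _, w := range wordRE.FindAllString(sc.Text(), -1) {
			counts[w]++
		}
	}
	return counts
}

// formatCounts renders one "word: count" line per word, sorted by word
// so that the output order is deterministic.
func formatCounts(counts map[string]int) []string {
	words := make([]string, 0, len(counts))
	for w := range counts {
		words = append(words, w)
	}
	sort.Strings(words)
	lines := make([]string, 0, len(words))
	for _, w := range words {
		lines = append(lines, fmt.Sprintf("%s: %d", w, counts[w]))
	}
	return lines
}

func main() {
	text := "the king and the fool\nthe storm"
	for _, line := range formatCounts(countWords(text)) {
		fmt.Println(line)
	}
}
```

In the Beam version, each of these steps becomes a transform over a PCollection, so the runner can distribute the work instead of looping over a single in-memory map.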

New Concepts:

  1. Executing a Pipeline both locally and using the selected runner
  2. Defining your own pipeline options
  3. Using ParDo with static DoFns defined out-of-line
  4. Building a composite transform

Concept #1: you can execute this pipeline either locally or by selecting another runner. Runner settings are now command-line options added by the 'beamx' package rather than hard-coded as they were in the minimal_wordcount example. The 'beamx' package also registers all included runners and filesystems as a convenience.

To change the runner, specify:

--runner=YOUR_SELECTED_RUNNER

To execute this pipeline, specify a local output file (if using the 'direct' runner) or a remote file on a supported distributed file system.

--output=[YOUR_LOCAL_FILE | YOUR_REMOTE_FILE]

The input file defaults to a public data set containing the text of King Lear, by William Shakespeare. You can override it and choose your own input with --input.
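Options such as --input and --output can be declared with Go's standard flag package. The following is a minimal sketch, assuming a defaulted input and a required output as described above. The options struct, the parseOptions helper, and the default path are assumptions for illustration; the --runner flag is left out because the text says it is added by the 'beamx' package.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// defaultInput is an assumed location for the public King Lear text.
// Check the upstream example for the actual default.
const defaultInput = "gs://apache-beam-samples/shakespeare/kinglear.txt"

// options holds the two pipeline options discussed above.
type options struct {
	input  string
	output string
}

// parseOptions parses command-line arguments into options.
// It returns an error when --output is missing, since the pipeline
// needs somewhere to write its results.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("wordcount", flag.ContinueOnError)
	fs.StringVar(&o.input, "input", defaultInput, "File(s) to read.")
	fs.StringVar(&o.output, "output", "", "Output file (required).")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.output == "" {
		return o, errors.New("no output provided")
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("input=%s output=%s\n", o.input, o.output)
}
```

Go's flag package accepts both -output and --output, and both "--output counts" and "--output=counts", so the command lines shown in the README parse the same way.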
