zample

v0.0.10
Published: Jan 14, 2024 License: BSD-2-Clause-Views Imports: 1 Imported by: 0

README

zample: streaming line filter

EXAMPLE

$ cd examples

$ zample romeo-and-juliet.txt
  Escalus, Prince of Verona.
  Friar John, Franciscan.
  Three Musicians.
  An Officer.
  Nurse to Juliet.
    In fair Verona, where we lay our scene,
                                                         [Exit.]

ABOUT

zample selects random lines from text inputs. This kind of filter has several uses:

  • Statistics
  • Random name generators
  • Text processing
  • Telemetry downsampling
  • File previews

For example, head and tail show only the start or end of a document, whereas zample shows a more representative view of the overall document body. In this way, zample behaves akin to less/more, but in a compact, lossy form.

LICENSE

BSD-2-Clause

RUNTIME REQUIREMENTS

(None)

CONTRIBUTING

See DEVELOPMENT.md.

DOWNLOAD

https://github.com/mcandre/zample/releases

API DOCUMENTATION

https://pkg.go.dev/github.com/mcandre/zample

USAGE

By default, each text line has a 0.10 (10%) chance of being selected. That is, about 10% of text lines become output, and the remaining 90% or so are stripped away.

The sampling rate can be customized with the -rate <value> flag, using values in the range [0.0, 1.0]. For example, to select about 0.05 (5%) of stellar constellations:

$ zample -rate 0.05 constellations.txt
Aquarius
Pisces

zample supports multiple file paths.

$ zample constellations.txt cities.txt colors.txt
Pisces
Yokohama
Red

zample is optimized for large data sets and does not support robust entry reordering. Any apparent reordering is an accidental artifact. This follows from how zample handles large inputs: it decides whether to catch or release each text line in a streaming fashion, and each decision resolves independently of the others. For deliberate shuffling of your data, pipe zample's output to an additional tool such as shuf.
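
The streaming behavior described above can be sketched in Go. This is a minimal illustration under stated assumptions, not zample's actual implementation: the sample function, its parameters, and the fixed seed are all hypothetical. It decides each line's fate as the line arrives and keeps no other state.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sample is a hypothetical helper, not zample's actual implementation.
// It forwards each string from in to out with probability rate, deciding
// one line at a time, then closes out once in is closed.
func sample(in <-chan string, out chan<- string, rate float64, seed int64) {
	defer close(out)
	rng := rand.New(rand.NewSource(seed))
	for line := range in {
		// Float64 returns a value in [0.0, 1.0), so a rate of 0 keeps
		// nothing and a rate of 1 keeps everything.
		if rng.Float64() < rate {
			out <- line
		}
	}
}

func main() {
	in := make(chan string)
	out := make(chan string)
	go sample(in, out, 0.5, 1) // rate and seed chosen arbitrarily

	go func() {
		defer close(in)
		for _, line := range []string{"Aquarius", "Aries", "Cancer", "Pisces", "Taurus"} {
			in <- line
		}
	}()

	// Only the current line is ever held; nothing is buffered or shuffled.
	for line := range out {
		fmt.Println(line)
	}
}
```

Because every decision depends only on the current line, memory use in this sketch stays constant regardless of input size.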

Likewise, different runs of the same zample experiment may produce different result sizes. For more consistent result sizes, generate more input data, increase the selection rate, or try the -skip option.
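
To see why result sizes vary, note that if each of n lines is kept independently with probability p, the number of lines kept follows a binomial distribution with mean n*p and standard deviation sqrt(n*p*(1-p)). The following Go sketch computes both; expectedKept is a hypothetical helper, not part of zample.

```go
package main

import (
	"fmt"
	"math"
)

// expectedKept returns the mean and standard deviation of the number of
// lines kept when each of n lines is selected independently with
// probability rate (a binomial distribution).
func expectedKept(n int, rate float64) (mean, stddev float64) {
	mean = float64(n) * rate
	stddev = math.Sqrt(float64(n) * rate * (1 - rate))
	return mean, stddev
}

func main() {
	for _, n := range []int{100, 10000, 1000000} {
		mean, sd := expectedKept(n, 0.1)
		// The relative spread sd/mean shrinks as n or rate grows.
		fmt.Printf("n=%d mean=%.0f stddev=%.1f relative=%.4f\n", n, mean, sd, sd/mean)
	}
}
```

The relative spread equals sqrt((1-p)/(n*p)), so it falls as either the input size or the rate rises.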

-skip <n> deterministically skips every nth text line. This disables probabilistic rate behavior.

$ zample -skip 2 cities.txt
Amsterdam
Casablanca
Edison
Gallipoli
Italia
Kilogramme
Madagascar
Oslo
Quebec
Santiago
Upsala
Washington
Yokohama
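
The example above is consistent with dropping every line whose 1-based line number is a multiple of n. A minimal Go sketch of that rule follows. keepWithSkip is a hypothetical helper, the treatment of non-positive n is an assumption, and the two filler entries (Baltimore, Danemark) are assumed stand-ins for lines that the output above does not show.

```go
package main

import "fmt"

// keepWithSkip is a hypothetical helper, not zample's actual code.
// lineNumber is 1-based. A line is dropped when its number is a
// multiple of n, so n = 2 drops lines 2, 4, 6, and so on.
func keepWithSkip(lineNumber, n int64) bool {
	if n <= 0 {
		return true // assumption: a non-positive n disables skipping
	}
	return lineNumber%n != 0
}

func main() {
	// Baltimore and Danemark are assumed filler entries for illustration.
	cities := []string{"Amsterdam", "Baltimore", "Casablanca", "Danemark", "Edison"}
	for i, city := range cities {
		if keepWithSkip(int64(i+1), 2) {
			fmt.Println(city) // prints Amsterdam, Casablanca, Edison
		}
	}
}
```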

By default, when no file paths are given, zample reads from stdin.

See zample -help for more information.

RESOURCES

  • awk, a complex line processor
  • head/tail, basic text truncators
  • hellcat, a portable hex dumper
  • less, an interactive text file reader
  • more, a limited interactive text file reader
  • od, a classic hex dumper
  • perl, a very complex text processor
  • sed, a simple line processor
  • shuf, a line shuffler
  • uniq, a text filter for uniqueness
  • wc, a basic text file metrics tool

🧪

Documentation

Constants

const DefaultRate = float64(0.1)

DefaultRate is the default probability of preserving each line.

const Version = "0.0.10"

Version is semver.

Variables

This section is empty.

Functions

func Zample

func Zample(rate *float64, skip *int64) (chan<- string, <-chan string, chan<- struct{})

Zample selects strings from a random source.

rate specifies the probability of preserving each string.

skip specifies deterministic skipping of every nth entry. This option disables probabilistic rate behavior.

Sampling is tuneable via the Seed function from math/rand.

Returns an input channel for submitting population strings and an output channel for receiving selected strings. The signature also includes a third, send-only chan<- struct{} channel.
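
The note on seeding can be illustrated with a small sketch showing that a fixed seed replays the same keep/drop choices. It does not call Zample itself. The decisions helper is hypothetical, and it uses a local generator from rand.New(rand.NewSource(seed)) because the package-level rand.Seed is deprecated as of Go 1.20.

```go
package main

import (
	"fmt"
	"math/rand"
)

// decisions is a hypothetical helper, not part of zample. It returns the
// keep/drop choices a seeded generator makes for n lines at the given rate.
func decisions(seed int64, rate float64, n int) []bool {
	rng := rand.New(rand.NewSource(seed))
	keep := make([]bool, n)
	for i := range keep {
		keep[i] = rng.Float64() < rate
	}
	return keep
}

func main() {
	// The same seed replays the same choices on every call.
	fmt.Println(decisions(42, 0.5, 8))
	fmt.Println(decisions(42, 0.5, 8))
	// A different seed generally yields a different pattern.
	fmt.Println(decisions(43, 0.5, 8))
}
```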

Types

This section is empty.

Directories

Path Synopsis
cmd
zample command
