zample

package module

v0.0.10 (latest). Published: Jan 14, 2024. License: BSD-2-Clause-Views. Imports: 1. Imported by: 0.

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/mcandre/zample


README

zample: streaming line filter

EXAMPLE

$ cd examples

$ zample romeo-and-juliet.txt
  Escalus, Prince of Verona.
  Friar John, Franciscan.
  Three Musicians.
  An Officer.
  Nurse to Juliet.
    In fair Verona, where we lay our scene,
                                                         [Exit.]

ABOUT

zample selects random lines from text inputs. This kind of filter has several uses:

Statistics
Random name generators
Text processing
Telemetry downsampling
File previews

For example, head and tail show only the start or the end of a document, whereas zample shows a more representative view of the overall document body. In this way, zample behaves somewhat like less or more, but in a compact, lossy form.

LICENSE

BSD-2-Clause

RUNTIME REQUIREMENTS

(None)

USAGE

By default, each text line has a 0.10 (10%) chance of being selected. That is, about 10% of the lines appear in the output, and the remaining 90% or so are discarded.

Customize the sampling rate with the -rate <value> flag, using a value in the range [0.0, 1.0]. For example, to select 0.05 (5%) of the constellation names:

$ zample -rate 0.05 constellations.txt
Aquarius
Pisces
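A sketch of how a -rate flag with a [0.0, 1.0] range check might be parsed with Go's standard flag package. The flag name and the 0.1 default come from this README; the function name parseRate and the validation logic are assumptions for illustration, not zample's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseRate reads a -rate flag from args and rejects values outside [0.0, 1.0].
// Hypothetical helper; zample's real flag handling may differ.
func parseRate(args []string) (float64, error) {
	fs := flag.NewFlagSet("zample-sketch", flag.ContinueOnError)
	rate := fs.Float64("rate", 0.1, "probability of keeping each line, in [0.0, 1.0]")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	// Written as a negated in-range test so that NaN is rejected too.
	if !(*rate >= 0.0 && *rate <= 1.0) {
		return 0, fmt.Errorf("-rate %v is outside [0.0, 1.0]", *rate)
	}
	return *rate, nil
}

func main() {
	rate, err := parseRate(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("sampling rate:", rate)
}
```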

zample supports multiple file paths.

$ zample constellations.txt cities.txt colors.txt
Pisces
Yokohama
Red

zample is optimized for large data sets and does not shuffle entries; any apparent reordering is an accidental artifact. This follows from zample's streaming design: it decides whether to keep or drop each text line as the line arrives, and each decision is independent of the others. To shuffle your data deliberately, pipe zample's output to a tool such as shuf.

Likewise, repeated runs of the same zample command may produce outputs of different sizes. For more consistent results, use more input data, increase the selection rate, or try the -skip option.
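Because each line's keep/drop decision is independent, the size of the output in rate mode can be modeled as a binomial random variable. With n input lines and selection rate p, the output size K satisfies:

```latex
K \sim \operatorname{Binomial}(n, p), \qquad
\mathbb{E}[K] = np, \qquad
\sigma_K = \sqrt{np(1-p)}, \qquad
\frac{\sigma_K}{\mathbb{E}[K]} = \sqrt{\frac{1-p}{np}}
```

For example, with n = 10,000 lines at the default p = 0.1, the expected output is 1,000 lines with a standard deviation of 30 (about 3%); with n = 100 lines, it is 10 lines with a standard deviation of 3 (about 30%). Larger inputs and higher rates both shrink the relative spread, while the deterministic -skip mode has no spread at all.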

-skip <n> deterministically skips every nth text line. This disables the probabilistic -rate behavior.

$ zample -skip 2 cities.txt
Amsterdam
Casablanca
Edison
Gallipoli
Italia
Kilogramme
Madagascar
Oslo
Quebec
Santiago
Upsala
Washington
Yokohama

By default, zample reads from stdin.

See zample -help for more information.

RESOURCES

awk, a complex line processor
head/tail, basic text truncators
hellcat, a portable hex dumper
less, an interactive text file reader
more, a limited interactive text file reader
od, a classic hex dumper
perl, a very complex text processor
sed, a simple line processor
shuf, a line shuffler
uniq, a text filter for uniqueness
wc, a basic text file metrics tool

🧪

Documentation

Index

Constants
func Zample(rate *float64, skip *int64) (chan<- string, <-chan string, chan<- struct{})

Constants


const DefaultRate = float64(0.1)

DefaultRate is the default probability of preserving each line.


const Version = "0.0.10"

Version is the semantic version (semver) of zample.

Variables

This section is empty.

Functions

func Zample

func Zample(rate *float64, skip *int64) (chan<- string, <-chan string, chan<- struct{})

Zample selects strings from a random source.

rate specifies the probability of preserving each string.

skip specifies deterministically skipping every nth entry. Setting it disables the probabilistic rate behavior.

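Sampling can be made reproducible via the Seed function from math/rand. Note that rand.Seed has been deprecated since Go 1.20, and since Go 1.24 it has no effect unless GODEBUG=randseednop=0 is set.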

Returns an input channel for submitting population strings, an output channel for receiving selected strings, and a send-only chan<- struct{} channel.

Types