snare

package module
v0.0.5
Warning

This package is not in the latest version of its module.

Published: Feb 24, 2022 License: BSD-2-Clause-Views Imports: 1 Imported by: 0

README

snare: text sampler

EXAMPLE

$ cd examples

$ snare romeo-and-juliet.txt
  Escalus, Prince of Verona.
  Friar John, Franciscan.
  Three Musicians.
  An Officer.
  Nurse to Juliet.
    In fair Verona, where we lay our scene,
                                                         [Exit.]

ABOUT

snare catches lines at random from text files and tosses the rest back onto /dev/null. This stochastic filtering has several uses:

  • Statistics
  • Random name generators
  • Text processing
  • Telemetry downsampling
  • File previews

For example, head/tail show only the start and end of a document, whereas snare shows a more representative sample of the overall document body. In this way, snare behaves akin to less/more, but in a compact, lossy form.

LICENSE

FreeBSD

DOWNLOAD

https://github.com/mcandre/snare/releases

API DOCUMENTATION

https://pkg.go.dev/github.com/mcandre/snare

USAGE

By default, the catch rate of each text line is 0.10 (10%). That is, each line independently has a 10% chance of becoming output, so on average about 10% of text lines are caught, with the remaining 90% slipping away.

The catch rate can be customized with a -rate <value> flag, using values in the range [0.0, 1.0]. For example, to sample 0.05 (5%) of stellar constellations:

$ snare -rate 0.05 constellations.txt
Aquarius
Pisces

snare supports multiple file paths.

$ snare constellations.txt cities.txt colors.txt
Pisces
Yokohama
Red

snare is optimized for large data sets and does not deliberately reorder entries; any apparent reordering is an accidental artifact. This is a consequence of snare's streaming design: it decides whether to catch or release each text line as the line arrives, and each decision resolves independently of the others. To deliberately shuffle your data, pipe snare's output to an additional tool such as shuf.

Likewise, different runs of the same snare experiment may produce different sample sizes. For more consistent results, use more input data, increase the catch rate, or try the -skip option.

-skip <n> deterministically skips every nth text line. This disables the probabilistic rate behavior.

$ snare -skip 2 cities.txt
Amsterdam
Casablanca
Edison
Gallipoli
Italia
Kilogramme
Madagascar
Oslo
Quebec
Santiago
Upsala
Washington
Yokohama
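
Judging only from the -skip 2 example above, which keeps the first city and every second one after it, a deterministic skip can be sketched as keeping lines whose zero-based index is a multiple of n. This index rule is an assumption inferred from that single example; snare's exact behavior for other values of n may differ, and keepEveryNth is a hypothetical name.

```go
package main

import "fmt"

// keepEveryNth keeps lines whose zero-based index is a multiple of n, so
// n = 2 keeps the 1st, 3rd, 5th, ... lines. The rule is an assumption that
// reproduces the -skip 2 example; it is not taken from snare's source.
func keepEveryNth(lines []string, n int) []string {
	if n <= 1 {
		return lines // assumption: treat n <= 1 as "skip nothing"
	}
	var kept []string
	for i, line := range lines {
		if i%n == 0 {
			kept = append(kept, line)
		}
	}
	return kept
}

func main() {
	letters := make([]string, 0, 26)
	for c := 'A'; c <= 'Z'; c++ {
		letters = append(letters, string(c))
	}
	fmt.Println(keepEveryNth(letters, 2)) // prints [A C E G I K M O Q S U W Y]
}
```

No random draws are involved, so the same input always yields the same output and the same sample size.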

When no file paths are given, snare reads from stdin.

See snare -help for more information.

CONTRIBUTING

See DEVELOPMENT.md.

SEE ALSO

  • awk, a complex line processor
  • head/tail, basic text truncators
  • less, an interactive text file reader
  • more, a limited interactive text file reader
  • perl, a very complex text processor
  • sed, a simple line processor
  • shuf, a line shuffler
  • uniq, a text filter for uniqueness
  • wc, a basic text file metrics tool

Documentation

Constants

const DefaultRate = float64(0.1)

DefaultRate is the default probability of preserving each line.

const Version = "0.0.5"

Version is the semantic version (semver) of snare.

Variables

This section is empty.

Functions

func Snare

func Snare(rate *float64, skip *int64) (chan<- string, <-chan string, chan<- struct{})

Snare samples strings.

rate specifies the probability of preserving each string.

skip specifies deterministically skipping every nth entry. This option disables the probabilistic rate behavior.

Sampling can be controlled by seeding with the Seed function from math/rand.

Returns an input channel for submitting population strings and an output channel for receiving sample strings, along with the send-only chan<- struct{} channel listed third in the signature.

Types

This section is empty.

Directories

Path Synopsis
cmd
snare command
