zample: streaming line filter
EXAMPLE
$ cd examples
$ zample romeo-and-juliet.txt
Escalus, Prince of Verona.
Friar John, Franciscan.
Three Musicians.
An Officer.
Nurse to Juliet.
In fair Verona, where we lay our scene,
[Exit.]
ABOUT
zample selects random lines from text inputs. This kind of filter has several uses:
- Statistics
- Random name generators
- Text processing
- Telemetry downsampling
- File previews
For example, head/tail show only the start or end of a document, whereas zample shows a more representative view of the overall document body. In this way, zample behaves like less/more, but in a compact, lossy form.
LICENSE
BSD-2-Clause
RUNTIME REQUIREMENTS
(None)
CONTRIBUTING
See DEVELOPMENT.md.
DOWNLOAD
https://github.com/mcandre/zample/releases
API DOCUMENTATION
https://pkg.go.dev/github.com/mcandre/zample
USAGE
By default, each text line has a 0.10 (10%) chance of being selected. That is, about 10% of input lines appear in the output, and the remaining 90% or so are dropped.
The sampling rate can be customized with a -rate <value> flag, using values in the range [0.0, 1.0]. For example, to select about 0.05 (5%) of stellar constellations:
$ zample -rate 0.05 constellations.txt
Aquarius
Pisces
zample supports multiple file paths.
$ zample constellations.txt cities.txt colors.txt
Pisces
Yokohama
Red
zample is optimized for large data sets: it decides each text line's catch/release chance in a streaming fashion, and each chance resolves independently of the others. As a consequence, zample does not support robust entry reordering, and any apparent reordering is an accidental artifact. For deliberate shuffling of your data, you may pipe zample's output to additional tools such as shuf.
Likewise, different runs of the same zample experiment may produce different result sizes. For more consistent result sizes, generate more input data, increase the selection rate, or try the -skip option.
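The run-to-run variation follows from the independent per-line chances. As a rough sketch, assuming N input lines and a selection rate p (symbols introduced here for illustration), the number of output lines K is binomially distributed:

```latex
K \sim \mathrm{Binomial}(N, p), \qquad
\mathbb{E}[K] = Np, \qquad
\sigma_K = \sqrt{Np(1-p)}, \qquad
\frac{\sigma_K}{\mathbb{E}[K]} = \sqrt{\frac{1-p}{Np}}
```

For instance, with N = 1000 and p = 0.10, the expected output is 100 lines with a standard deviation of about 9.5 lines. The relative spread shrinks as N or p grows, which is why more input data or a higher rate gives steadier result sizes.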
-skip <n> deterministically skips every nth text line. This disables probabilistic rate behavior.
$ zample -skip 2 cities.txt
Amsterdam
Casablanca
Edison
Gallipoli
Italia
Kilogramme
Madagascar
Oslo
Quebec
Santiago
Upsala
Washington
Yokohama
By default, zample reads from stdin.
See zample -help for more information.
RESOURCES
- awk, a complex line processor
- head/tail, basic text truncators
- hellcat, a portable hex dumper
- less, an interactive text file reader
- more, a limited interactive text file reader
- od, a classic hex dumper
- perl, a very complex text processor
- sed, a simple line processor
- shuf, a line shuffler
- uniq, a text filter for uniqueness
- wc, a basic text file metrics tool
🧪