slice

v0.0.3
Published: Feb 23, 2022 | License: BSD-2-Clause-Views | Imports: 1 | Imported by: 0

README

slice: text sampler

slice samples random lines from your texts.

EXAMPLE

$ cd examples

$ slice romeo-and-juliet.txt
  Escalus, Prince of Verona.
  Friar John, Franciscan.
  Three Musicians.
  An Officer.
  Nurse to Juliet.
    In fair Verona, where we lay our scene,
                                                         [Exit.]

ABOUT

slice extracts random lines from text. This is useful for a variety of applications.

  • Statistics
  • Random name generator
  • Text processing
  • File previewing

For example, head and tail show only the very start or end of a document, whereas slice shows a more representative cross-section of the overall content. It is akin to less/more, but in a compact, intentionally lossy form.

Usage

By default, the preservation rate of each line is 0.1 (10%).

This probability can be customized with a -rate flag, as a value in [0.0, 1.0]. For example, to sample 5% of the stellar constellations:

$ slice -rate 0.05 constellations.txt
Ara
Bootes
Canis
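The per-line rate can be pictured as an independent coin flip for each line. Below is a minimal sketch of that idea in Go. The helper name sampleLines and its next parameter are hypothetical, chosen for illustration; this is not slice's actual source.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampleLines keeps each line independently with probability rate.
// next must return values in [0.0, 1.0), like rand.Float64.
// Hypothetical helper for illustration, not slice's own code.
func sampleLines(lines []string, rate float64, next func() float64) []string {
	var kept []string
	for _, line := range lines {
		// Because next() is always below 1.0, a rate of 1.0 keeps every
		// line, and a rate of 0.0 keeps none.
		if next() < rate {
			kept = append(kept, line)
		}
	}
	return kept
}

func main() {
	lines := []string{"Ara", "Aries", "Bootes", "Canis", "Leo", "Lupus"}
	// Output varies between runs, since each line is decided at random.
	for _, line := range sampleLines(lines, 0.5, rand.Float64) {
		fmt.Println(line)
	}
}
```

Passing the random source as a function keeps the sketch easy to test with a fixed value in place of rand.Float64.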

slice supports sampling multiple text files concurrently.

$ slice constellations.txt cities.txt colors.txt
Ara
Yellow
Aries
Gallipoli
Cassiopeia
Coma
Washington
Zurich
Equuleus
Leo
Lupus
Phoenix
Vulpecula
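One common way to read several sources concurrently in Go is a fan-in: one goroutine per source, all writing to a shared channel. The sketch below illustrates that pattern and why lines from different files can interleave. It is an assumption about the general approach, not a copy of slice's implementation; mergeLines is a hypothetical name, and in-memory strings stand in for files.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"sync"
)

// mergeLines scans each text in its own goroutine and forwards the lines
// to one shared channel. Order within a text is kept; order across texts
// depends on goroutine scheduling.
func mergeLines(texts []string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for _, text := range texts {
		wg.Add(1)
		go func(t string) {
			defer wg.Done()
			scanner := bufio.NewScanner(strings.NewReader(t))
			for scanner.Scan() {
				out <- scanner.Text()
			}
		}(text)
	}
	// Close the shared channel once every reader has finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	texts := []string{
		"Ara\nAries\nCassiopeia\n", // stand-in for constellations.txt
		"Gallipoli\nZurich\n",      // stand-in for cities.txt
		"Yellow\n",                 // stand-in for colors.txt
	}
	for line := range mergeLines(texts) {
		fmt.Println(line)
	}
}
```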

slice is not primarily a reordering tool, neither for lines nor for file paths. Any apparent shuffling is a natural consequence of the input data and thread timings. Where deliberate shuffling is desired, pipe the output of slice into an additional tool such as shuf.

For small data sets, slice can produce very short output, or even no output. This artifact diminishes as the rate and/or input line count grows. To keep the sampling algorithm efficient for large data sets, slice evaluates the chance of preservation once per line, at the time that line is processed. As a result, different runs at the same rate may produce different numbers of output lines, as well as different output contents. For best effect, provide more input data, or try the -skip option.
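To put rough numbers on this, assume each line is kept independently with probability rate. Then an input of n lines yields n × rate lines on average, and the chance of an empty result is (1 − rate)^n. The following sketch computes both; the function names are made up for this example.

```go
package main

import (
	"fmt"
	"math"
)

// expectedLines is the mean number of lines kept, assuming each of the
// n lines is kept independently with probability rate.
func expectedLines(n int, rate float64) float64 {
	return float64(n) * rate
}

// probEmpty is the chance that no line at all is kept: (1 - rate)^n.
func probEmpty(n int, rate float64) float64 {
	return math.Pow(1-rate, float64(n))
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		fmt.Printf("n=%4d  expected=%6.1f  P(empty)=%.4f\n",
			n, expectedLines(n, 0.1), probEmpty(n, 0.1))
	}
}
```

Under that independence assumption, a 10-line file at the default rate of 0.1 comes back empty about 35% of the time, while a 100-line file almost never does.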

slice can deterministically skip every nth line of source text with a -skip flag. This disables probabilistic rate behavior.

$ slice -skip 2 cities.txt
Amsterdam
Casablanca
Edison
Gallipoli
Italia
Kilogramme
Madagascar
Oslo
Quebec
Santiago
Upsala
Washington
Yokohama
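A deterministic skip of this kind might look like the sketch below. It takes "skip every nth line" literally: lines whose 1-based position is a multiple of n are dropped, and the rest are kept. That reading is an assumption; for n = 2 it matches the output above, but slice's exact behavior for other values of n is not specified here. The name dropEveryNth and the short city list are made up for the example.

```go
package main

import "fmt"

// dropEveryNth removes every nth line, counting from 1, and keeps the rest.
// Hypothetical helper: one reading of "skip every nth line", not slice's code.
func dropEveryNth(lines []string, n int) []string {
	if n < 1 {
		return lines // nothing sensible to skip
	}
	var kept []string
	for i, line := range lines {
		if (i+1)%n != 0 {
			kept = append(kept, line)
		}
	}
	return kept
}

func main() {
	// Illustrative input, not the contents of cities.txt.
	cities := []string{"Amsterdam", "Berlin", "Casablanca", "Denver", "Edison"}
	for _, city := range dropEveryNth(cities, 2) {
		fmt.Println(city) // prints Amsterdam, Casablanca, Edison
	}
}
```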

By default, slice reads from stdin.

See slice -help for more information.

DOWNLOAD

https://github.com/mcandre/slice/releases

DOCUMENTATION

https://pkg.go.dev/github.com/mcandre/slice

CONTRIBUTING

See DEVELOPMENT.md.

LICENSE

FreeBSD

SEE ALSO

  • awk, a complex line processor
  • head/tail, basic text truncators
  • less, an interactive text file reader
  • more, a limited interactive text file reader
  • perl, a very complex text processor
  • sed, a simple line processor
  • shuf, a line shuffler
  • uniq, a text filter for uniqueness
  • wc, a basic text file metrics tool

🔪

Documentation

Constants

const DefaultRate = float64(0.1)

DefaultRate is the default probability of preserving each line.

const Version = "0.0.3"

Version is semver.

Functions

func Slice

func Slice(rate *float64, skip *int64) (chan<- string, <-chan string, chan<- struct{})

Slice samples text.

rate specifies the probability of preserving each line.

Sampling is tuneable via the Seed function from math/rand.

Returns an input channel for submitting population lines; an output channel for receiving sample lines; and a done channel for concluding the sampling operation.
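The three-channel shape described above can be illustrated with a self-contained stand-in. The function sketchSampler below is hypothetical: it mirrors only the channel directions from the signature of Slice, takes a plain float64 rate, and ignores skip. How the real done channel is meant to be signalled (send versus close) is not stated here, so the send used below is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sketchSampler is a hypothetical stand-in that mirrors the channel shape
// described above. It is not the package's implementation.
func sketchSampler(rate float64) (chan<- string, <-chan string, chan<- struct{}) {
	in := make(chan string)
	out := make(chan string)
	done := make(chan struct{})
	go func() {
		defer close(out) // lets the consumer's range loop finish
		for {
			select {
			case line := <-in:
				if rand.Float64() < rate {
					out <- line
				}
			case <-done:
				return
			}
		}
	}()
	return in, out, done
}

func main() {
	in, out, done := sketchSampler(0.5)
	// Feed the population from a separate goroutine, so that the main
	// goroutine is free to drain the sample.
	go func() {
		for _, line := range []string{"Ara", "Bootes", "Canis", "Leo"} {
			in <- line
		}
		done <- struct{}{} // assumed way of signalling: no more input
	}()
	for line := range out {
		fmt.Println(line)
	}
}
```

Because all channels are unbuffered, the producer and the consumer must run concurrently; submitting every line before reading any output would block.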

Directories

Path: cmd/slice (command)
