textutil

package
v0.1.8
Warning

This package is not in the latest version of its module.

Published: Mar 1, 2021 License: BSD-3-Clause Imports: 7 Imported by: 18

Documentation

Overview

Package textutil implements utilities for handling human-readable text.

This package includes a combination of low-level and high-level utilities. The main high-level utilities are:

NewUTF8WrapWriter: Text formatter with line-based word wrapping.
PrefixWriter:      Add prefix to output.
PrefixLineWriter:  Add prefix to each line in output.
ByteReplaceWriter: Replace single byte with bytes in output.


Constants

const (
	EOF                = rune(-1) // Indicates the end of a rune stream.
	LineSeparator      = '\u2028' // Unicode line separator rune.
	ParagraphSeparator = '\u2029' // Unicode paragraph separator rune.
)

Functions

func ByteReplaceWriter

func ByteReplaceWriter(w io.Writer, old byte, new string) io.Writer

ByteReplaceWriter returns an io.Writer that wraps w, where all occurrences of the old byte are replaced with the new string on Write calls.
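The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not the package's implementation; the byteReplacer type and the replaceBytes helper are hypothetical names introduced for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// byteReplacer is a hypothetical stand-in for the writer returned by
// ByteReplaceWriter. It is an illustration, not the package's code.
type byteReplacer struct {
	w    io.Writer
	old  []byte // the single byte to look for
	repl []byte // the bytes written in its place
}

// Write replaces every occurrence of the old byte, forwards the result to
// the wrapped writer, and reports len(p) on success so that callers see
// their whole input as consumed.
func (b *byteReplacer) Write(p []byte) (int, error) {
	if _, err := b.w.Write(bytes.ReplaceAll(p, b.old, b.repl)); err != nil {
		return 0, err
	}
	return len(p), nil
}

// replaceBytes runs s through a byteReplacer and returns what reached the sink.
func replaceBytes(s string, old byte, repl string) string {
	var sink strings.Builder
	w := &byteReplacer{w: &sink, old: []byte{old}, repl: []byte(repl)}
	io.WriteString(w, s)
	return sink.String()
}

func main() {
	fmt.Printf("%q\n", replaceBytes("a\tb\tc", '\t', "    ")) // prints "a    b    c"
}
```

Because the match is a single byte, a replacement can never straddle two Write calls, so the sketch needs no buffering.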

func FlushRuneChunk

func FlushRuneChunk(d RuneChunkDecoder, fn func(rune) error) error

FlushRuneChunk is a helper that repeatedly calls d.FlushRune until EOF, calling fn for every rune that is decoded. If fn returns an error, FlushRuneChunk will return with that error, without processing any more data.

This is a convenience for implementing an additional Flush() call on an implementation of io.Writer, given a RuneChunkDecoder.

func PrefixWriter

func PrefixWriter(w io.Writer, prefix string) io.Writer

PrefixWriter returns an io.Writer that wraps w, where the prefix is written out immediately before the first non-empty Write call.
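A minimal standard-library sketch of the "prefix before the first non-empty Write" behaviour follows. It is not the package's implementation; prefixOnce and prefixed are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// prefixOnce is a hypothetical stand-in for the writer returned by
// PrefixWriter. It is an illustration, not the package's code.
type prefixOnce struct {
	w       io.Writer
	prefix  string
	written bool // true once the prefix has been emitted
}

// Write emits the prefix immediately before the first non-empty write and
// passes all data through unchanged. Empty writes never trigger the prefix.
func (p *prefixOnce) Write(data []byte) (int, error) {
	if len(data) == 0 {
		return 0, nil
	}
	if !p.written {
		if _, err := io.WriteString(p.w, p.prefix); err != nil {
			return 0, err
		}
		p.written = true
	}
	return p.w.Write(data)
}

// prefixed writes each chunk through a prefixOnce and returns the output.
func prefixed(prefix string, chunks ...string) string {
	var sink strings.Builder
	w := &prefixOnce{w: &sink, prefix: prefix}
	for _, c := range chunks {
		io.WriteString(w, c)
	}
	return sink.String()
}

func main() {
	fmt.Printf("%q\n", prefixed("> ", "", "hello ", "world")) // prints "> hello world"
}
```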

func TerminalSize

func TerminalSize() (row, col int, _ error)

TerminalSize returns the dimensions of the terminal if they are available from the OS; otherwise it returns an error.

func WriteRuneChunk

func WriteRuneChunk(d RuneChunkDecoder, fn func(rune) error, chunk []byte) (int, error)

WriteRuneChunk is a helper that repeatedly calls d.DecodeRune(chunk) until EOF, calling fn for every rune that is decoded. It returns the number of bytes in chunk that were successfully processed. If fn returns an error, WriteRuneChunk will return with that error, without processing any more data.

This is a convenience for implementing io.Writer, given a RuneChunkDecoder.

Types

type RuneChunkDecoder

type RuneChunkDecoder interface {
	// DecodeRune returns the next rune in chunk, and its width in bytes.  If
	// chunk represents a partial rune, DecodeRune buffers the chunk and returns
	// EOF and the size of the chunk.  Subsequent calls to DecodeRune will
	// combine previously buffered data when decoding.
	DecodeRune(chunk []byte) (r rune, n int)
	// FlushRune returns the next buffered rune.  Returns EOF when all buffered
	// data is returned.
	FlushRune() rune
}

RuneChunkDecoder is the interface to a decoder of a stream of encoded runes that may be arbitrarily chunked.

Implementations of RuneChunkDecoder are commonly used to implement io.Writer wrappers, to handle buffering when chunk boundaries may occur in the middle of an encoded rune.

type RuneEncoder

type RuneEncoder interface {
	// Encode encodes r into buf.
	Encode(r rune, buf *bytes.Buffer)
}

RuneEncoder is the interface to an encoder of a stream of runes into bytes.Buffer.

type UTF8ChunkDecoder

type UTF8ChunkDecoder struct {
	// contains filtered or unexported fields
}

UTF8ChunkDecoder implements RuneChunkDecoder for a stream of UTF-8 data that is arbitrarily chunked.

UTF-8 is a byte-wise encoding that may use multiple bytes to encode a single rune. This decoder buffers partial runes that have been split across chunks, so that a full rune is returned when the subsequent data chunk is provided.

This is commonly used to implement an io.Writer wrapper over UTF-8 text. It is useful since the data provided to Write calls may be arbitrarily chunked.

The zero UTF8ChunkDecoder is a decoder with an empty buffer.
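To illustrate why buffering is needed, here is a deliberately simplified sketch of a chunk-aware decoder built on unicode/utf8. It is not the package's UTF8ChunkDecoder: the chunkDecoder type, its decodeRune method, and the decodeChunks helper are hypothetical, and the sketch copies every byte through a small buffer for clarity rather than speed.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const eof = rune(-1) // same value as the package's EOF constant

// chunkDecoder is an illustrative sketch, not the package's UTF8ChunkDecoder.
type chunkDecoder struct {
	pending []byte // bytes that do not yet form a full rune
}

// decodeRune moves bytes from chunk into the pending buffer until the buffer
// holds a full rune, then decodes it. It returns the rune and the number of
// chunk bytes consumed, or eof once chunk is exhausted without completing
// a rune.
func (d *chunkDecoder) decodeRune(chunk []byte) (rune, int) {
	n := 0
	for !utf8.FullRune(d.pending) {
		if n == len(chunk) {
			return eof, n
		}
		d.pending = append(d.pending, chunk[n])
		n++
	}
	r, size := utf8.DecodeRune(d.pending)
	d.pending = d.pending[size:]
	return r, n
}

// decodeChunks feeds the chunks through one decoder and collects the runes.
func decodeChunks(chunks ...[]byte) string {
	var d chunkDecoder
	var out []rune
	for _, chunk := range chunks {
		for {
			r, n := d.decodeRune(chunk)
			chunk = chunk[n:]
			if r == eof {
				break
			}
			out = append(out, r)
		}
	}
	return string(out)
}

func main() {
	// The euro sign is E2 82 AC in UTF-8; here it is split across two chunks.
	fmt.Println(decodeChunks([]byte("a\xe2"), []byte("\x82\xacb"))) // prints a€b
}
```

The zero value of the sketch type is ready to use, mirroring the zero-value guarantee stated above.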

func (*UTF8ChunkDecoder) DecodeRune

func (d *UTF8ChunkDecoder) DecodeRune(chunk []byte) (rune, int)

DecodeRune implements the RuneChunkDecoder interface method.

Invalid encodings are transformed into U+FFFD, one byte at a time. See unicode/utf8.DecodeRune for details.
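The "one byte at a time" rule comes from unicode/utf8.DecodeRune, which returns U+FFFD with a width of 1 for an invalid encoding. The short program below demonstrates that standard-library behaviour directly; decodeString is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeString decodes b one rune at a time with utf8.DecodeRune, so each
// invalid byte becomes its own U+FFFD replacement character.
func decodeString(b []byte) string {
	var out []rune
	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		out = append(out, r)
		b = b[size:]
	}
	return string(out)
}

func main() {
	// 0xFF and 0xFE can never appear in valid UTF-8.
	fmt.Printf("%+q\n", decodeString([]byte{'a', 0xff, 0xfe, 'b'})) // prints "a\ufffd\ufffdb"
}
```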

func (*UTF8ChunkDecoder) FlushRune

func (d *UTF8ChunkDecoder) FlushRune() rune

FlushRune implements the RuneChunkDecoder interface method.

Since the only data that is buffered is the final partial rune, the return value will only ever be U+FFFD or EOF. No valid runes are ever returned by this method, but multiple U+FFFD may be returned before EOF.

type UTF8Encoder

type UTF8Encoder struct{}

UTF8Encoder implements RuneEncoder for the UTF-8 encoding.

func (UTF8Encoder) Encode

func (UTF8Encoder) Encode(r rune, buf *bytes.Buffer)

Encode encodes r into buf in the UTF-8 encoding.
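A type satisfying the RuneEncoder method set can be very small when the target encoding is UTF-8, because bytes.Buffer already provides WriteRune. The sketch below redeclares the interface locally so it compiles on its own; runeEncoder, utf8Enc, and encodeHex are hypothetical names, and the package's actual implementation may differ.

```go
package main

import (
	"bytes"
	"fmt"
)

// runeEncoder is a local copy of the RuneEncoder method set.
type runeEncoder interface {
	Encode(r rune, buf *bytes.Buffer)
}

// utf8Enc is an illustrative UTF-8 encoder, not the package's UTF8Encoder.
type utf8Enc struct{}

// Encode appends the UTF-8 encoding of r to buf.
func (utf8Enc) Encode(r rune, buf *bytes.Buffer) {
	buf.WriteRune(r)
}

// encodeHex encodes every rune of s and returns the bytes as spaced hex.
func encodeHex(s string) string {
	var enc runeEncoder = utf8Enc{}
	var buf bytes.Buffer
	for _, r := range s {
		enc.Encode(r, &buf)
	}
	return fmt.Sprintf("% x", buf.Bytes())
}

func main() {
	fmt.Println(encodeHex("€")) // prints e2 82 ac
}
```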

type WrapWriter

type WrapWriter struct {
	// contains filtered or unexported fields
}

WrapWriter implements an io.Writer filter that formats input text into output lines with a given target width in runes.

Each input rune is classified into one of three kinds:

EOL:    end-of-line, consisting of \f, \n, \r, \v, U+2028 or U+2029
Space:  defined by unicode.IsSpace
Letter: everything else
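The three-way classification above can be sketched as follows. The kind type and the classify and kindName functions are hypothetical names for this example. The EOL check must come first, because unicode.IsSpace also reports true for the EOL runes.

```go
package main

import (
	"fmt"
	"unicode"
)

type kind int

const (
	kindEOL kind = iota
	kindSpace
	kindLetter
)

// classify assigns r to one of the three kinds described above.
func classify(r rune) kind {
	switch r {
	case '\f', '\n', '\r', '\v', '\u2028', '\u2029':
		return kindEOL
	}
	if unicode.IsSpace(r) {
		return kindSpace
	}
	// "Letter" here means everything else, including digits and punctuation.
	return kindLetter
}

// kindName returns a printable name for the kind of r.
func kindName(r rune) string {
	return [...]string{"EOL", "Space", "Letter"}[classify(r)]
}

func main() {
	for _, r := range []rune{'\n', '\t', ' ', 'x', '7', '\u2029'} {
		fmt.Printf("%U %s\n", r, kindName(r))
	}
}
```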

The input text is expected to consist of words, defined as sequences of letters. Sequences of words form paragraphs, where paragraphs are separated by either blank lines (that contain no letters), or an explicit U+2029 ParagraphSeparator. Input lines with leading spaces are treated verbatim.

Paragraphs are output as word-wrapped lines; line breaks only occur at word boundaries. Output lines are usually no longer than the target width. The exceptions are single words longer than the target width, which are output on their own line, and verbatim lines, which may be arbitrarily longer or shorter than the width.

Output lines never contain trailing spaces. Only verbatim output lines may contain leading spaces. Spaces separating input words are output verbatim, unless doing so would result in a line with leading or trailing spaces.

EOL runes within the input text are never written to the output; the output line terminator and paragraph separator may be configured, and some EOL may be output as a single space ' ' to maintain word separation.

The algorithm greedily fills each output line with as many words as it can, assuming that all Unicode code points have the same width. Invalid UTF-8 is silently transformed to the replacement character U+FFFD and treated as a single rune.
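The greedy strategy can be sketched in a few lines. This is a heavily simplified illustration under stated assumptions, not WrapWriter's algorithm: it handles a single paragraph, collapses every run of whitespace to one space, and ignores verbatim lines, indents, and configurable terminators. The wrap function is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// wrap greedily packs the words of text into lines of at most width runes,
// counting every rune as one column. A word longer than width is placed on
// a line of its own.
func wrap(text string, width int) string {
	var lines []string
	var line strings.Builder
	lineLen := 0 // length of the current line in runes
	for _, word := range strings.Fields(text) {
		wordLen := utf8.RuneCountInString(word)
		// Start a new line if the separator plus the word would not fit.
		if lineLen > 0 && lineLen+1+wordLen > width {
			lines = append(lines, line.String())
			line.Reset()
			lineLen = 0
		}
		if lineLen > 0 {
			line.WriteByte(' ')
			lineLen++
		}
		line.WriteString(word)
		lineLen += wordLen
	}
	if lineLen > 0 {
		lines = append(lines, line.String())
	}
	return strings.Join(lines, "\n")
}

func main() {
	fmt.Println(wrap("the quick brown fox jumps", 10))
	// Output:
	// the quick
	// brown fox
	// jumps
}
```

Counting runes rather than bytes keeps multi-byte characters from being over-counted, but wide or combining characters would still be measured as one column each, which is the same simplification the text above describes.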

Flush must be called after the last call to Write; the input is buffered.

Implementation note: line breaking is a complicated topic.  This approach
attempts to be simple and useful; a full implementation conforming to
Unicode Standard Annex #14 would be complicated, and is not implemented.
Languages that don't use spaces to separate words (e.g. CJK) won't work
well under the current approach.

http://www.unicode.org/reports/tr14 [Unicode Line Breaking Algorithm]
http://www.unicode.org/versions/Unicode4.0.0/ch05.pdf [5.8 Newline Guidelines]

func NewUTF8WrapWriter

func NewUTF8WrapWriter(w io.Writer, width int) *WrapWriter

NewUTF8WrapWriter returns a new WrapWriter filter that implements io.Writer, and decodes and encodes runes in UTF-8.

func NewWrapWriter

func NewWrapWriter(w io.Writer, width int, dec RuneChunkDecoder, enc RuneEncoder) *WrapWriter

NewWrapWriter returns a new WrapWriter with the given target width in runes, producing output on the underlying writer w. The dec and enc are used to respectively decode runes from Write calls, and encode runes to w.

func (*WrapWriter) Flush

func (w *WrapWriter) Flush() error

Flush flushes any remaining buffered text, and resets the paragraph line count back to 0, so that indents will be applied starting from the first line. It does not imply a paragraph separator; repeated calls to Flush with no intervening calls to other methods are equivalent to a single Flush.

Flush must be called after the last call to Write, and may be called an arbitrary number of times before the last Write.

func (*WrapWriter) ForceVerbatim

func (w *WrapWriter) ForceVerbatim(v bool) error

ForceVerbatim forces w to stay in verbatim mode if v is true, or lets w perform its regular line writing algorithm if v is false. This is useful if there is a sequence of lines that should be written verbatim, even if the lines don't start with spaces.

Calls Flush internally, and returns any Flush error.

func (*WrapWriter) SetIndents

func (w *WrapWriter) SetIndents(indents ...string) error

SetIndents sets the indentation for subsequent Write calls. Multiple indents may be set, corresponding to the indent to use for the corresponding paragraph line. E.g. SetIndents("AA", "BBB", "C") means the first line in each paragraph is indented with "AA", the second line in each paragraph is indented with "BBB", and all subsequent lines in each paragraph are indented with "C".

SetIndents() is equivalent to SetIndents(""), SetIndents("", ""), etc.
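The rule that the last indent repeats for all later lines can be expressed as a small lookup. The indentFor function below is a hypothetical helper written for illustration; it is not part of the package.

```go
package main

import "fmt"

// indentFor returns the indent for the given zero-based line of a paragraph.
// Lines past the end of indents reuse the last entry, and an empty list
// means no indentation at all.
func indentFor(indents []string, line int) string {
	if len(indents) == 0 {
		return ""
	}
	if line >= len(indents) {
		line = len(indents) - 1
	}
	return indents[line]
}

func main() {
	indents := []string{"AA", "BBB", "C"}
	for line := 0; line < 5; line++ {
		fmt.Printf("line %d: %q\n", line, indentFor(indents, line))
	}
}
```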

A new WrapWriter instance has no indents by default.

Calls Flush internally, and returns any Flush error.

func (*WrapWriter) SetLineTerminator

func (w *WrapWriter) SetLineTerminator(term string) error

SetLineTerminator sets the line terminator for subsequent Write calls. Every output line is terminated with term; EOL runes from the input are never written to the output. A new WrapWriter instance uses "\n" as the default line terminator.

Calls Flush internally, and returns any Flush error.

func (*WrapWriter) SetParagraphSeparator

func (w *WrapWriter) SetParagraphSeparator(sep string) error

SetParagraphSeparator sets the paragraph separator for subsequent Write calls. Every consecutive pair of non-empty paragraphs is separated with sep; EOL runes from the input are never written to the output. A new WrapWriter instance uses "\n" as the default paragraph separator.

Calls Flush internally, and returns any Flush error.

func (*WrapWriter) Width

func (w *WrapWriter) Width() int

Width returns the target width in runes. If width < 0 the width is unlimited; each paragraph is output as a single line.

func (*WrapWriter) Write

func (w *WrapWriter) Write(data []byte) (int, error)

Write implements io.Writer by buffering data into the WrapWriter w. Actual writes to the underlying writer may occur, and may include data buffered in either this Write call or previous Write calls.

Flush must be called after the last call to Write.

type WriteFlusher

type WriteFlusher interface {
	io.Writer
	Flush() error
}

WriteFlusher is the interface that groups the basic Write and Flush methods.

Flush is typically provided when Write calls perform buffering; Flush immediately outputs the buffered data. Flush must be called after the last call to Write, and may be called an arbitrary number of times before the last Write.
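The standard library's *bufio.Writer has exactly this method set, which makes it a convenient way to see why the final Flush matters. The sketch below redeclares the interface locally as writeFlusher so it compiles on its own; flushDemo is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// writeFlusher is a local copy of the WriteFlusher method set.
type writeFlusher interface {
	io.Writer
	Flush() error
}

// flushDemo writes s through a buffered writer and reports what the
// underlying sink contained before and after the call to Flush.
func flushDemo(s string) (before, after string) {
	var sink strings.Builder
	var wf writeFlusher = bufio.NewWriter(&sink)
	io.WriteString(wf, s)
	before = sink.String() // still empty while s fits in the buffer
	wf.Flush()
	return before, sink.String()
}

func main() {
	before, after := flushDemo("hello")
	fmt.Printf("before=%q after=%q\n", before, after) // prints before="" after="hello"
}
```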

func PrefixLineWriter

func PrefixLineWriter(w io.Writer, prefix string) WriteFlusher

PrefixLineWriter returns a WriteFlusher that wraps w. Each occurrence of EOL (\f, \n, \r, \v, LineSeparator or ParagraphSeparator) causes the preceding line to be written to w, with the given prefix, in a single Write call. Data without EOL is buffered until the next EOL or Flush call. Flush appends \n to buffered data that doesn't end in EOL.

A single Write call on the returned WriteFlusher may result in zero or more Write calls on the underlying w.

If w implements WriteFlusher, each Flush call on the returned WriteFlusher results in exactly one Flush call on the underlying w.
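The buffering behaviour can be sketched with the standard library. This is a simplified illustration, not the package's implementation: it treats only \n as EOL (the real writer is documented to recognise the full EOL set above), its error handling is minimal, and it does not forward Flush to the underlying writer. The prefixLines type and linesWithPrefix helper are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// prefixLines is an illustrative line-prefixing writer.
type prefixLines struct {
	w      io.Writer
	prefix string
	buf    []byte // data received since the last complete line
}

// Write buffers data and emits each complete line, with the prefix, in a
// single Write call on the underlying writer.
func (p *prefixLines) Write(data []byte) (int, error) {
	p.buf = append(p.buf, data...)
	for {
		i := bytes.IndexByte(p.buf, '\n')
		if i < 0 {
			return len(data), nil
		}
		line := append([]byte(p.prefix), p.buf[:i+1]...)
		if _, err := p.w.Write(line); err != nil {
			return 0, err
		}
		p.buf = p.buf[i+1:]
	}
}

// Flush emits any buffered partial line, appending \n to terminate it.
func (p *prefixLines) Flush() error {
	if len(p.buf) == 0 {
		return nil
	}
	line := append([]byte(p.prefix), p.buf...)
	line = append(line, '\n')
	p.buf = p.buf[:0]
	_, err := p.w.Write(line)
	return err
}

// linesWithPrefix writes the chunks, flushes, and returns the output.
func linesWithPrefix(prefix string, chunks ...string) string {
	var sink strings.Builder
	w := &prefixLines{w: &sink, prefix: prefix}
	for _, c := range chunks {
		io.WriteString(w, c)
	}
	w.Flush()
	return sink.String()
}

func main() {
	fmt.Print(linesWithPrefix("# ", "ab", "c\nde"))
	// Output:
	// # abc
	// # de
}
```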
