linewrap

v0.0.0-...-092733b
Warning

This package is not in the latest version of its module.

Published: Feb 17, 2017 License: Apache-2.0 Imports: 3 Imported by: 0

README

linewrap

Wraps either a string or a byte slice so that each line doesn't exceed the specified number of characters. A character is defined as a unicode code point, not a byte. Any \r in the input will be elided.
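The difference between counting code points and counting bytes can be seen with the standard library alone. The sketch below is only an illustration: `lineLen` is a hypothetical helper, not part of linewrap, and it assumes that dropping `\r` before counting is an acceptable stand-in for the elision described above.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// lineLen is a hypothetical helper (not part of linewrap). It drops any
// carriage returns and reports the length of s in Unicode code points.
func lineLen(s string) int {
	s = strings.ReplaceAll(s, "\r", "")
	return utf8.RuneCountInString(s)
}

func main() {
	s := "na\u00efve caf\u00e9\r"
	fmt.Println(len(s))     // 13: bytes, including the \r
	fmt.Println(lineLen(s)) // 10: code points, without the \r
}
```

A line limit expressed in code points therefore allows more bytes per line whenever the text contains multi-byte characters.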

Trailing and leading spaces on wrapped lines are elided.

Linewrap can also indent wrapped lines or format the input as comments:

#      line comment
//     line comment
/* */  block comment
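As a rough picture of what the three styles mean for already-wrapped text, the sketch below prefixes or encloses lines using only the standard library. The helper `asComment`, its style strings, and the layout of the block comment are assumptions made for illustration; linewrap's actual output may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// asComment is a hypothetical helper (not part of linewrap). It formats
// already-wrapped text as a comment. The style strings and the block comment
// layout are assumptions for illustration only.
func asComment(wrapped, style string) string {
	lines := strings.Split(wrapped, "\n")
	switch style {
	case "#", "//":
		// Line comments: every line gets the marker and one space.
		for i, l := range lines {
			lines[i] = style + " " + l
		}
		return strings.Join(lines, "\n")
	case "/*":
		// Block comment: the text is enclosed once.
		return "/*\n" + strings.Join(lines, "\n") + "\n*/"
	}
	return wrapped
}

func main() {
	fmt.Println(asComment("first line\nsecond line", "//"))
	fmt.Println(asComment("first line\nsecond line", "/*"))
}
```

Note that a line comment marker takes up columns of its own, so a wrapper that guarantees a maximum line length has to account for the marker's width when it measures each line.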

References used:

Jukka "Yucca" Korpela's unicode tables were used as the main references, especially his line break page: https://www.cs.tut.fi/~jkorpela/unicode/linebr.html. His unicode spaces and dashes pages are listed in their respective sections of this document.

In addition, some of the symbols were pulled from http://www.unicode.org/reports/tr14/#Properties.

The list of symbols handled is not exhaustive.

Hyphen and spaces

Linewrap will wrap lines on most unicode whitespace and dash characters, with some exceptions. Characters in the "not considered" lists below are never treated as points at which the input can be wrapped. If any characters are unaccounted for, please file an issue or make a pull request. Before doing so, check the docs and the code to see whether the character is already listed as an exception.

The \n and \t characters are handled separately. The width used for tabs is set by Wrapper.TabSize(int); the default is 8 spaces.
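One way to picture how a tab size feeds into a line length calculation is sketched below. It assumes each tab counts as a fixed number of columns rather than advancing to the next tab stop; that is an assumption for illustration, and `width` and `defaultTabSize` are hypothetical names, not linewrap identifiers.

```go
package main

import "fmt"

// defaultTabSize mirrors the documented default of 8. The name is hypothetical.
const defaultTabSize = 8

// width is a hypothetical helper (not part of linewrap). It returns the
// length of s in code points, counting every tab as tabSize columns.
func width(s string, tabSize int) int {
	n := 0
	for _, r := range s {
		if r == '\t' {
			n += tabSize
			continue
		}
		n++
	}
	return n
}

func main() {
	fmt.Println(width("\tfoo", defaultTabSize)) // 11
	fmt.Println(width("\tfoo", 4))              // 7
}
```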

Spaces

Whitespace tokens are mostly from https://www.cs.tut.fi/~jkorpela/chars/spaces.html

Not considered whitespace characters:

code point|name
--|--
U+00A0|no-break space
U+202F|narrow no-break space

Whitespace characters

code point|name
--|--
U+0020|space
U+1680|ogham space mark
U+180E|mongolian vowel separator
U+2000|en quad
U+2001|em quad
U+2002|en space
U+2003|em space
U+2004|three per em space
U+2005|four per em space
U+2006|six per em space
U+2007|figure space
U+2008|punctuation space
U+2009|thin space
U+200A|hair space
U+200B|zero width space
U+205F|medium mathematical space
U+3000|ideographic space
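The two whitespace tables can be read as a single classification function. The sketch below encodes them directly; `isBreakSpace` is a hypothetical name and this is not linewrap's own code, only a restatement of the tables in Go.

```go
package main

import "fmt"

// isBreakSpace is a hypothetical helper (not part of linewrap). It reports
// whether r is one of the whitespace characters listed as a wrap point.
// U+00A0 and U+202F are not in the set, so they never permit a wrap.
func isBreakSpace(r rune) bool {
	switch r {
	case '\u0020', '\u1680', '\u180E', '\u205F', '\u3000':
		return true
	}
	// U+2000 (en quad) through U+200B (zero width space) are all listed.
	return r >= '\u2000' && r <= '\u200B'
}

func main() {
	fmt.Println(isBreakSpace(' '))      // true
	fmt.Println(isBreakSpace('\u2003')) // true: em space
	fmt.Println(isBreakSpace('\u00A0')) // false: no-break space
}
```

The standard library's `unicode.IsSpace` is not a substitute here, because it also reports true for U+00A0.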

Dashes (Hyphens)

Dash tokens are mostly from https://www.cs.tut.fi/~jkorpela/dashes.html

Notes on entries in the tables: a break is permitted either before or after an em dash (U+2014), but linewrap only breaks after it.

The hyphen minus (U+002D) is not supposed to be a break point in a numeric context, but linewrap does not make that distinction.

Dash characters not considered dashes

code point|name
--|--
U+007E|tilde
U+2212|minus sign
U+301C|wavy dash
U+3939|wavy dash
U+1806|mongolian todo hyphen

Dash characters

code point|name
--|--
U+002D|hyphen minus
U+00AD|soft hyphen
U+058A|armenian hyphen
U+2010|hyphen
U+2012|figure dash
U+2013|en dash
U+2014|em dash
U+2015|horizontal bar
U+2053|swung dash
U+207B|superscript minus
U+208B|subscript minus
U+2E3A|two em dash
U+2E3B|three em dash
U+FE31|presentation form for vertical em dash
U+FE32|presentation form for vertical en dash
U+FE58|small em dash
U+FE63|small hyphen minus
U+FF0D|full width hyphen minus
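The dash rules above (break only after a listed dash, with no special case for digits) can be sketched as follows. Both `isBreakDash` and `breakPoints` are hypothetical helpers written for this illustration; they restate the tables and are not linewrap's implementation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isBreakDash is a hypothetical helper (not part of linewrap). It reports
// whether r is listed as a dash that permits a wrap.
func isBreakDash(r rune) bool {
	switch r {
	case '\u002D', '\u00AD', '\u058A', '\u2010', '\u2012', '\u2013',
		'\u2014', '\u2015', '\u2053', '\u207B', '\u208B', '\u2E3A',
		'\u2E3B', '\uFE31', '\uFE32', '\uFE58', '\uFE63', '\uFF0D':
		return true
	}
	return false
}

// breakPoints returns the byte offsets immediately after each listed dash in
// s, which are the positions at which a line could end.
func breakPoints(s string) []int {
	var pts []int
	for i, r := range s {
		if isBreakDash(r) {
			pts = append(pts, i+utf8.RuneLen(r))
		}
	}
	return pts
}

func main() {
	fmt.Println(breakPoints("well-known")) // [5]
	fmt.Println(breakPoints("a\u2014b"))   // [4]: the em dash is 3 bytes long
	fmt.Println(breakPoints("3-4"))        // [2]: digits are not treated specially
	fmt.Println(breakPoints("a~b"))        // []: tilde is an exception
}
```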

Documentation

Overview

Package linewrap wraps text so that each line is n characters or fewer in length. Wrapped lines can be indented or turned into comments; C, C++, and shell style comments are supported.

Any \r characters encountered will be elided during the wrapping process; only \n is supported for new lines.

The size of tabs is configurable.

With a few exceptions, lines can be wrapped at unicode dash and whitespace characters.

The classification of unicode tokens is drawn from Jukka "Yucca" Korpela's unicode tables on: https://www.cs.tut.fi/~jkorpela/unicode/linebr.html, https://www.cs.tut.fi/~jkorpela/chars/spaces.html, and https://www.cs.tut.fi/~jkorpela/dashes.html. Additionally, information from http://www.unicode.org/reports/tr14/#Properties was used.

The list of symbols handled is not exhaustive.

Line breaks may be inserted before or after whitespace characters. Any trailing spaces on a line will be elided. With the exception of indentation, all leading whitespace on a wrapped line will be elided.

space                      U+0020
ogham space mark           U+1680
mongolian vowel separator  U+180E
en quad                    U+2000
em quad                    U+2001
en space                   U+2002
em space                   U+2003
three per em space         U+2004
four per em space          U+2005
six per em space           U+2006
figure space               U+2007
punctuation space          U+2008
thin space                 U+2009
hair space                 U+200A
zero width space           U+200B
medium mathematical space  U+205F
ideographic space          U+3000

Exceptions to whitespace characters (no break will occur):

no-break space             U+00A0
narrow no-break space      U+202F

Line breaks may be inserted after a dash (hyphen) character. A break is permitted either before or after an em dash (U+2014), but linewrap will only break after it. A hyphen minus (U+002D) is not supposed to be a break point in a numeric context, but linewrap does not make that distinction.

hyphen minus                            U+002D
soft hyphen                             U+00AD
armenian hyphen                         U+058A
hyphen                                  U+2010
figure dash                             U+2012
en dash                                 U+2013
em dash                                 U+2014
horizontal bar                          U+2015
swung dash                              U+2053
superscript minus                       U+207B
subscript minus                         U+208B
two em dash                             U+2E3A
three em dash                           U+2E3B
presentation form for vertical em dash  U+FE31
presentation form for vertical en dash  U+FE32
small em dash                           U+FE58
small hyphen minus                      U+FE63
full width hyphen minus                 U+FF0D

Exceptions to dash characters (no break will occur):

tilde                  U+007E
minus sign             U+2212
wavy dash              U+301C
wavy dash              U+3939
mongolian todo hyphen  U+1806
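The overall behaviour described in this overview can be approximated with a small greedy wrapper. The sketch below is deliberately simplified and is not linewrap's algorithm: it is an assumption-laden illustration that breaks only at U+0020, ignores dashes, tabs, indentation and comments, and also collapses runs of spaces inside a line. The name `wrap` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// wrap is a hypothetical, simplified greedy wrapper (not linewrap's code).
// It breaks only at U+0020, measures length in code points, drops \r, and
// keeps existing \n line breaks. A word longer than limit is left on a line
// of its own.
func wrap(s string, limit int) string {
	s = strings.ReplaceAll(s, "\r", "")
	var out []string
	for _, line := range strings.Split(s, "\n") {
		cur, curLen := "", 0
		for _, word := range strings.Split(line, " ") {
			if word == "" {
				continue // skip empty pieces so no space leads or trails a line
			}
			wLen := utf8.RuneCountInString(word)
			switch {
			case curLen == 0:
				cur, curLen = word, wLen
			case curLen+1+wLen <= limit:
				cur += " " + word
				curLen += 1 + wLen
			default:
				out = append(out, cur)
				cur, curLen = word, wLen
			}
		}
		out = append(out, cur)
	}
	return strings.Join(out, "\n")
}

func main() {
	fmt.Println(wrap("the quick brown fox", 10))
}
```

With a limit of 10 the call in `main` yields two lines, "the quick" and "brown fox". Splitting on a literal space rather than with `strings.Fields` is intentional, since `strings.Fields` would also split at the no-break space U+00A0.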

Constants

const (
	LineLength = 80 // default line length
	TabSize    = 8  // default tab size
)

Variables

This section is empty.

Functions

This section is empty.

Types

type CommentStyle

type CommentStyle int
const (
	NoComment    CommentStyle = iota
	CPPComment                // C++ style line comment: //
	ShellComment              // shell style line comment: #
	CComment                  // c style block comment: /* */
)

func ParseCommentStyle

func ParseCommentStyle(s string) CommentStyle

func (CommentStyle) String

func (c CommentStyle) String() string

type Pos

type Pos int

Pos is a byte position in the original input text.

type Wrapper

type Wrapper struct {
	Length int // Max length of the line.

	CommentStyle // the type of comment
	// contains filtered or unexported fields
}

Wrapper wraps lines so that the output is lines of Length characters or less.

func New

func New() *Wrapper

New returns a new Wrapper with the default Length and tab size.

func (*Wrapper) Bytes

func (w *Wrapper) Bytes(s []byte) (b []byte, err error)

Bytes wraps s and returns the wrapped bytes.

func (*Wrapper) IndentText

func (w *Wrapper) IndentText(s string)

IndentText sets the value that should be used to indent wrapped lines.

func (*Wrapper) Reset

func (w *Wrapper) Reset()

Reset resets the non-configuration fields so that it's usable for a new input. The Wrapper's configuration is not affected.

func (*Wrapper) String

func (w *Wrapper) String(s string) (string, error)

String returns a wrapped string. The resulting string will be consistent with the Wrapper's configuration.

func (*Wrapper) TabSize

func (w *Wrapper) TabSize(i int)

TabSize sets the tab size used for line length calculations when a tab is encountered. The tab size actually rendered may vary. See the TabSize constant for the default value.
