texst

Version: v0.5.0 (latest)
Published: May 29, 2021
License: AGPL-3.0
Imports: 11
Imported by: 0

Repository

github.com/fractalqb/texst

README ¶

texst – Text Tests

Package texst checks text files against a reference text specification. The simplest reference text would be the verbatim text with each line prefixed with a 'reference text' line tag, e.g. "> ". This would match only exactly the verbatim text. To do more complex matching one can add other line types to the reference text specification.

Line types are recognised by the rune in the first column of each line in the reference text specification. Each line type serves a different purpose.

Most often one needs to mark parts of a reference line that need not match the checked “subject” text exactly. texst does not embed markers into the reference text line, because that would require very sophisticated escaping to keep arbitrary reference text feasible. Instead, each reference text line may be followed by argument lines that modify the way the reference text is matched against the checked text. Argument lines start with ' ' (U+0020). Some types of argument lines mark segments of the reference text that need not match the subject text exactly:

> This is some reference text content
 =        xxxx

The above example says that the four runes above the non-space part of the argument line, i.e. "some", are not compared to the checked text. The '=' identifies the specific type of argument line (see Types of argument lines). So the text

This is blue reference text content

would perfectly match the reference text example. Argument lines can be stacked and are applied in order to their reference text line up to the next non-argument line.

> This is some reference text content
 =        xxxx
 =                       yyyy

would be the same as

> This is some reference text content
 =        xxxx           yyyy
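
The effect of stacked exact-length argument lines can be illustrated with a minimal sketch. It is not the package's implementation: the helper names mergeMasks and matchExact are hypothetical, and the mask strings are the argument lines with their two leading columns (' ' and '=') already removed.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeMasks overlays the bodies of stacked argument lines. A column counts
// as masked if any of the lines has a non-space rune in that column.
// (Hypothetical helper, not part of texst.)
func mergeMasks(masks ...string) []bool {
	var merged []bool
	for _, m := range masks {
		for i, r := range []rune(m) {
			for len(merged) <= i {
				merged = append(merged, false)
			}
			if r != ' ' {
				merged[i] = true
			}
		}
	}
	return merged
}

// matchExact reports whether subj equals ref rune by rune, skipping the
// masked columns. Both texts must have the same number of runes.
func matchExact(ref, subj string, masked []bool) bool {
	r, s := []rune(ref), []rune(subj)
	if len(r) != len(s) {
		return false
	}
	for i := range r {
		if i < len(masked) && masked[i] {
			continue
		}
		if r[i] != s[i] {
			return false
		}
	}
	return true
}

func main() {
	ref := "This is some reference text content"
	m1 := strings.Repeat(" ", 8) + "xxxx"  // covers "some"
	m2 := strings.Repeat(" ", 23) + "yyyy" // covers "text"
	masked := mergeMasks(m1, m2)

	fmt.Println(matchExact(ref, "This is blue reference text content", masked)) // true
	fmt.Println(matchExact(ref, "This is blue reference TEXT content", masked)) // true
	fmt.Println(matchExact(ref, "That is blue reference text content", masked)) // false
}
```

Merging the masks first makes the stacked form and the single-line form trivially equivalent, which is the behaviour described above.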

For some files, e.g. log files, it would be rather tedious if one had to mark each timestamp in the reference text line:

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
…

To solve this one can set a global segment line after the preamble and between reference text specifications. For our example one would write:

*=ttt tt tt tt tt ttt
> Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
> Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
> Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
> …

Looking closely, one notices that the log lines come from different threads, so one cannot rely on the overall order of lines in the reference text specification. The lines from any one thread, however, must appear in exactly the order given in the reference.

We declare two “interleaving groups” '1' and '2' in the preamble and mark the reference text lines to be in the specific group:

\%12
*=ttt tt tt tt tt ttt
>1Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
>2Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
>1Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
> …

Now, both subjects

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
…

and

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
…

match the reference. For more details use the reference documentation.

Documentation ¶

Overview ¶

Package texst compares text files against a reference text specification. The simplest reference text would be the verbatim text with each line prefixed with a 'reference text' line tag, e.g. "> ". This would match only exactly the verbatim text. To do more complex matching one can add other line types to the reference text specification. Line types are recognised by the rune in the first column of each line in the reference text specification. Each line type serves a different purpose.

Most often one needs to mark parts of a reference line that need not match the checked “subject” text exactly. We will call these parts 'masks'. texst does not embed markers into the reference text line to identify masks, because that would require very sophisticated escaping to keep arbitrary reference text feasible. Instead, each reference text line may be followed by argument lines that define masks and the way the reference text is matched against them. Argument lines start with ' ' (U+0020). There are different types of argument lines, e.g. this one starting with " =":

> This is some reference text content
 =        xxxx

The above example says that the four runes above the non-space part of the argument line, i.e. "some", are not compared to the subject text. The second column, here '=', identifies the specific type of argument line, for details see Types of argument lines. The text

This is blue reference text content

would perfectly match the reference text example. Argument lines can be stacked and are applied in order to their reference text line up to the next non-argument line.

> This is some reference text content
 =        xxxx
 =                       yyyy

would be the same as

> This is some reference text content
 =        xxxx           yyyy

For some files, e.g. log files, it would be rather tedious if one had to mark each timestamp in the reference text line:

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
…

To solve this one can set a global mask line after the preamble and between reference text specifications. For our example one would write:

*=ttt tt tt tt tt ttt
> Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
> Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
> Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
> …
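
One way to picture a global mask is as a normalisation step applied to every line before comparison. The sketch below is an illustration only, not how texst works internally: the helper blank is hypothetical, the subject timestamp is made up, and the mask is the global line with its two leading columns ('*' and '=') removed.

```go
package main

import "fmt"

// blank replaces every rune of line that sits under a non-space rune of mask
// with '_'. Columns beyond the end of mask are left untouched.
// (Hypothetical helper, not part of texst.)
func blank(line, mask string) string {
	l, m := []rune(line), []rune(mask)
	for i := range l {
		if i < len(m) && m[i] != ' ' {
			l[i] = '_'
		}
	}
	return string(l)
}

func main() {
	mask := "ttt tt tt tt tt ttt"
	ref := "Jun 27 18:58:11.125 DEBUG [thread1] clearing maps"
	subj := "Feb 03 07:15:42.001 DEBUG [thread1] clearing maps" // made-up timestamp

	fmt.Println(blank(ref, mask)) // ___ __ __:__:__.___ DEBUG [thread1] clearing maps
	// With the timestamp digits blanked on both sides the lines are equal.
	fmt.Println(blank(ref, mask) == blank(subj, mask)) // true
}
```

Note that the separators ':' and '.' are not covered by the mask, so they still have to match.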

Looking closely, one notices that the log lines come from different threads, so one cannot rely on the overall order of lines in the reference text specification. The lines from any one thread, however, must appear in exactly the order given in the reference.

For this we declare two “interleaving groups” '1' and '2' in the preamble and mark the reference text lines to be in the specific group:

\%12
*=ttt tt tt tt tt ttt
>1Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
>2Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
>1Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
> …

Now, both subjects

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
…

and

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
…

match the reference.

Matching Reference Lines ¶

Comparing subject texts is done by scanning the subject text line by line and then matching the current subject line against the reference lines currently in question. For each interleaving group there is at most one reference line to be matched. The first successful match of the subject line with a reference line accepts the subject line. Then the matched reference text line is replaced with the next reference text line from the same interleaving group, if any. Afterward scanning continues with the next subject line. Reference lines from different interleaving groups are checked in the same order as they are declared in the preamble.

If the subject line does not match, a mismatch is reported and scanning continues with the next subject text line. One can configure a maximum number of mismatches that is processed before scanning is aborted. By default the complete subject text is scanned.
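
The scanning procedure described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's code: the type refLine and the function compare are hypothetical, matching is plain string equality without masks, there is no mismatch limit, and reference lines left over at the end are ignored.

```go
package main

import "fmt"

// refLine is a hypothetical stand-in for a parsed reference line.
type refLine struct {
	group rune
	text  string
}

// compare scans subj line by line. Every interleaving group keeps a queue of
// its reference lines, and only the head of each queue is a candidate. Groups
// are tried in the order they were declared. The first head that matches is
// consumed. compare returns the number of subject lines that matched no head.
func compare(groups []rune, refs []refLine, subj []string) int {
	queues := make(map[rune][]string)
	for _, r := range refs {
		queues[r.group] = append(queues[r.group], r.text)
	}
	mismatches := 0
	for _, line := range subj {
		matched := false
		for _, g := range groups {
			q := queues[g]
			if len(q) > 0 && q[0] == line {
				queues[g] = q[1:] // advance to the group's next reference line
				matched = true
				break
			}
		}
		if !matched {
			mismatches++ // report and continue with the next subject line
		}
	}
	return mismatches
}

func main() {
	groups := []rune{'1', '2'}
	refs := []refLine{
		{'1', "[thread1] create"},
		{'2', "[thread2] load"},
		{'1', "[thread1] clearing maps"},
	}
	inOrder := []string{"[thread1] create", "[thread2] load", "[thread1] clearing maps"}
	interleaved := []string{"[thread1] create", "[thread1] clearing maps", "[thread2] load"}
	swapped := []string{"[thread1] clearing maps", "[thread1] create", "[thread2] load"}

	fmt.Println(compare(groups, refs, inOrder))     // 0
	fmt.Println(compare(groups, refs, interleaved)) // 0
	fmt.Println(compare(groups, refs, swapped))     // 1
}
```

The third subject fails on its first line because the order within group '1' is violated, while the order between the two groups is free.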

Preamble Lines ¶

The type of a preamble line is recognized from the rune in the second column of the line, e.g.:

\%<interleaving groups>

This is a preamble line with tag '%' that sets the interleaving groups of the reference text specification. Currently there is no other preamble line type.

Interleaving Groups ¶

Interleaving groups are identified by a single rune and have to be declared upfront in the preamble. If no interleaving group is declared then the interleaving group ' ' (U+0020) is defined by default. A reference text line is assigned to an interleaving group by the rune in the second column of the line. E.g. the lines

\% a
> 1st reference text line
>a2nd reference text line

put the reference text "1st reference text line" into the interleaving group ' ' and the reference text "2nd reference text line" into the interleaving group 'a'. Because a group other than the default group ' ' is used, all groups, including ' ', have to be declared in the preamble line.
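
How the first two columns of a line carry the tag and the group can be shown with a small sketch. The functions parseGroups and splitRefLine are hypothetical and only mirror the column layout described above; the third input line with group 'b' is made up to show an undeclared group.

```go
package main

import (
	"fmt"
	"strings"
)

// parseGroups reads a preamble line such as `\% a` and returns the declared
// group runes as a string. (Hypothetical helper, not part of texst.)
func parseGroups(line string) (groups string, ok bool) {
	if !strings.HasPrefix(line, `\%`) {
		return "", false
	}
	return line[2:], true
}

// splitRefLine splits a '>' line into the group rune from the second column
// and the reference text that follows it.
func splitRefLine(line string) (group rune, text string, ok bool) {
	rs := []rune(line)
	if len(rs) < 2 || rs[0] != '>' {
		return 0, "", false
	}
	return rs[1], string(rs[2:]), true
}

func main() {
	declared, _ := parseGroups(`\% a`)
	lines := []string{
		"> 1st reference text line",
		">a2nd reference text line",
		">b3rd reference text line", // made-up line using an undeclared group
	}
	for _, l := range lines {
		g, text, _ := splitRefLine(l)
		fmt.Printf("group %q text %q declared=%v\n",
			g, text, strings.ContainsRune(declared, g))
	}
}
```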

TODO: What do these groups do (see "Matching Reference Lines")? => Ambiguities & Order of IGroups

Example ¶

cmpr := Compare{
	OnMismatch: func(sn int, s string, refs []*RefLine) bool {
		for _, ref := range refs {
			fmt.Printf("mismatch %d/%d: '%s' / '%s'\n",
				sn, ref.Line(),
				s, ref.Text())
		}
		return false
	},
}
err := cmpr.Strings(`\%12
*=ttt tt tt tt tt ttt
>1Jun 27 21:58:11.112 INFO  [thread1] create localization dir:test1/test.xCuf/l10n
 +                                                                       xxxx
>2Jun 27 21:58:11.113 INFO  [thread2] load state from file:test1/test.xCuf/bcplus.json
 +                                                                    xxxx
>1Jun 27 18:58:11.125 DEBUG [thread1] clearing maps`,
	`Jun 27 21:58:11.112 INFO  [thread1] create localization dir:test1/test.RnD/l10n
Jun 27 18:58:11.125 DEBUG [thread1] clearing MAPS
Jun 27 21:58:11.113 INFO  [thread2] load state from file:test1/test.Rnd/bcplus.json`,
)
fmt.Println(err)

Output:

mismatch 2/7: 'Jun 27 18:58:11.125 DEBUG [thread1] clearing MAPS' / 'Jun 27 18:58:11.125 DEBUG [thread1] clearing maps'
mismatch 2/5: 'Jun 27 18:58:11.125 DEBUG [thread1] clearing MAPS' / 'Jun 27 21:58:11.113 INFO  [thread2] load state from file:test1/test.xCuf/bcplus.json'
1 mismatch

Index ¶

Constants
func Prepare(prepared io.Writer, subj io.Reader) (err error)
func PrepareFile(prepared string, subj io.Reader) error
type Compare
type LineSepScanner
- func (lsc *LineSepScanner) ScanLines(data []byte, atEOF bool) (advance int, token []byte, err error)
type MismatchCount
- func (mc MismatchCount) Error() string
type MismatchFunc
type RefError
- func (e RefError) Error() string
- func (e RefError) Unwrap() error
type RefLine
type SubjError
- func (e SubjError) Error() string
- func (e SubjError) Unwrap() error

Constants ¶

const (
	// The masked part of the subject must have the same length as the reference
	// line segment.
	ArgMaskExact = '='
	// The masked part of the subject can be of any length, even zero.
	ArgMaskOpt = '*'
	// The masked part of the subject can be of any length greater than zero.
	ArgMaskVar = '+'
	// Match masked parts against a regular expression. TODO syntax of the line…
	ArgRegexp = '~'
)

Types of argument lines
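
The difference between the three mask types can be illustrated by translating a masked reference line into a regular expression. This is a swapped-in technique chosen for illustration, not how the package matches lines; the function maskPattern and its column parameters are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// maskPattern builds an anchored regular expression from ref in which the
// runes in columns [from, to) are replaced according to kind:
// '=' same length as the masked segment, '*' any length, '+' at least one.
// (Hypothetical helper, not part of texst.)
func maskPattern(ref string, from, to int, kind rune) (*regexp.Regexp, error) {
	rs := []rune(ref)
	if from < 0 || to > len(rs) || from > to {
		return nil, fmt.Errorf("mask [%d,%d) out of range", from, to)
	}
	var mid string
	switch kind {
	case '=':
		mid = fmt.Sprintf("(?:.{%d})", to-from)
	case '*':
		mid = "(?:.*)"
	case '+':
		mid = "(?:.+)"
	default:
		return nil, fmt.Errorf("unknown mask type %q", kind)
	}
	var sb strings.Builder
	sb.WriteString("^")
	sb.WriteString(regexp.QuoteMeta(string(rs[:from])))
	sb.WriteString(mid)
	sb.WriteString(regexp.QuoteMeta(string(rs[to:])))
	sb.WriteString("$")
	return regexp.Compile(sb.String())
}

func main() {
	ref := "create localization dir:test1/test.xCuf/l10n"
	from := strings.Index(ref, "xCuf") // ASCII text, so byte index == column
	to := from + len("xCuf")

	subjects := []string{
		"create localization dir:test1/test.abcd/l10n", // same length
		"create localization dir:test1/test.RnD/l10n",  // shorter
		"create localization dir:test1/test./l10n",     // empty
	}
	for _, kind := range []rune{'=', '*', '+'} {
		re, err := maskPattern(ref, from, to, kind)
		if err != nil {
			fmt.Println(err)
			return
		}
		for _, s := range subjects {
			fmt.Printf("%c %-46q %v\n", kind, s, re.MatchString(s))
		}
	}
}
```

With this reading, '=' accepts only the first subject, '*' accepts all three, and '+' accepts the first two.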

const (
	// Marks a comment line.
	TagComment = '#'
	// Preamble lines must be the first lines of a reference text specification.
	TagPreamble = '\\'
	// Global segment lines set/clear file-global tags.
	TagGlobalSeg = '*'
	// Reference lines have the text that is compared to the subject text.
	TagRefLine = '>'
	// Argument lines apply to the most recent '>' reference line up to the next
	// non-argument line.
	TagRefArgs = ' '
)

Line Tags

const (
	// Define interleaving groups in the preamble.
	PreIGroups = '%'
)

Types of preamble lines

Functions ¶

func Prepare ¶ added in v0.5.0

func Prepare(prepared io.Writer, subj io.Reader) (err error)

func PrepareFile ¶ added in v0.5.0

func PrepareFile(prepared string, subj io.Reader) error

Types ¶

type Compare ¶

type Compare struct {
	// Specifies the number of detected mismatches after which the comparison
	// is aborted. If MismatchLimit == 0, do not abort.
	MismatchLimit int
	// OnMismatch is called on each detected mismatch
	OnMismatch MismatchFunc
	// contains filtered or unexported fields
}

Compare performs the comparison of a subject text against a reference text specification. A zero value is valid for use and can be reused for more than one comparison. It must not be used concurrently.

func (*Compare) Readers ¶

func (cmpr *Compare) Readers(ref, subj io.Reader) error

Readers compares the reference text and subject text read from the io.Readers 'ref' and 'subj'. If the OnMismatch field is not nil, it is called on each detected mismatch. The number of detected mismatches is returned as a MismatchCount error; nil is returned if no mismatch and no other error occurs. Read errors or syntax errors in 'ref' or 'subj' terminate the comparison immediately and are returned as RefError or SubjError, depending on the source of the error.
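
A caller will usually want to tell these error kinds apart. The sketch below shows the errors.As pattern with local stand-in types so that it runs without the package. The stand-ins only mirror the documented shapes (an int-based MismatchCount, a RefError with a Line field and an Unwrap method); their message formats and the unexported err field are assumptions, and with the real package one would use the types from texst instead.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// MismatchCount is a local stand-in for the documented error type.
type MismatchCount int

func (mc MismatchCount) Error() string { return fmt.Sprintf("%d mismatch(es)", int(mc)) }

// RefError is a local stand-in; the field err is an assumption.
type RefError struct {
	Line int
	err  error
}

func (e RefError) Error() string { return fmt.Sprintf("reference line %d: %v", e.Line, e.err) }
func (e RefError) Unwrap() error { return e.err }

// classify turns the error returned by a comparison into a short verdict.
func classify(err error) string {
	var mc MismatchCount
	var re RefError
	switch {
	case err == nil:
		return "match"
	case errors.As(err, &mc):
		return fmt.Sprintf("%d subject line(s) did not match", int(mc))
	case errors.As(err, &re):
		return fmt.Sprintf("bad reference at line %d", re.Line)
	default:
		return "other error: " + err.Error()
	}
}

func main() {
	refErr := RefError{Line: 3, err: io.ErrUnexpectedEOF}

	fmt.Println(classify(nil))
	fmt.Println(classify(MismatchCount(1)))
	fmt.Println(classify(refErr))
	// Unwrap lets errors.Is see the underlying cause.
	fmt.Println(errors.Is(refErr, io.ErrUnexpectedEOF))
}
```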

func (*Compare) RefFile ¶ added in v0.4.0

func (cmpr *Compare) RefFile(refname string, subj io.Reader) error

func (*Compare) Strings ¶

func (cmpr *Compare) Strings(ref, subj string) error

Strings compares the reference text and subject text from the strings 'ref' and 'subj'. For more detail read Readers documentation.

type LineSepScanner ¶ added in v0.5.0

type LineSepScanner []byte

func (*LineSepScanner) ScanLines ¶ added in v0.5.0

func (lsc *LineSepScanner) ScanLines(data []byte, atEOF bool) (advance int, token []byte, err error)

type MismatchCount ¶

type MismatchCount int

MismatchCount is the error used to report the total number of mismatches detected during a Compare run.

func (MismatchCount) Error ¶

func (mc MismatchCount) Error() string

type MismatchFunc ¶

type MismatchFunc func(slineno int, sline string, refs []*RefLine) (abort bool)

MismatchFunc is called for each mismatch in the subject text during comparison. It gets the respective line number 'slineno' in the subject file, the text line 'sline' and the reference lines of each interleaving group that were matched against the subject line.

If the MismatchFunc returns 'abort' == true the comparison terminates immediately.

type RefError ¶

type RefError struct {
	Line int
	// contains filtered or unexported fields
}

RefError is returned for errors during processing of the reference file.

func (RefError) Error ¶

func (e RefError) Error() string

func (RefError) Unwrap ¶

func (e RefError) Unwrap() error

type RefLine ¶

type RefLine struct {
	// contains filtered or unexported fields
}

RefLine represents a line of reference text with its arguments. API users will get current reference lines when using the MismatchFunc.

func (*RefLine) IGroup ¶

func (rl *RefLine) IGroup() rune

IGroup returns the name of the line's interleaving group.

func (*RefLine) Line ¶ added in v0.3.1

func (rl *RefLine) Line() int

Line returns the line number in the reference text specification.

func (*RefLine) ListNext ¶

func (rl *RefLine) ListNext() islist.Node

ListNext implements part of the intrusive singly linked list interface.

func (*RefLine) SetListNext ¶

func (rl *RefLine) SetListNext(n islist.Node)

SetListNext implements part of the intrusive singly linked list interface.

func (*RefLine) Text ¶

func (rl *RefLine) Text() string

Text returns the verbatim reference text.

type SubjError ¶

type SubjError struct {
	Line int
	// contains filtered or unexported fields
}

SubjError is returned for errors during processing of the subject file.

func (SubjError) Error ¶

func (e SubjError) Error() string

func (SubjError) Unwrap ¶

func (e SubjError) Unwrap() error

Directories ¶

Path	Synopsis
cmd
texst (command)	A command line tool to use text tests
texsting	Package texsting supports the use of texst in your Go tests.
