texst

Version: v0.8.0

Note: This package is not in the latest version of its module.
Published: Jun 8, 2023 License: AGPL-3.0 Imports: 11 Imported by: 0

README

texst – Text Tests

Package texst checks text files against a reference text specification. The simplest reference text is the verbatim text with each line prefixed with a 'reference text' line tag, e.g. "> ". Such a specification matches only the verbatim text exactly. For more complex matching, one can add other line types to the reference text specification.

Line types are recognised by the rune in the first column of each line in the reference text specification. There are line types that serve different purposes.

Most often one needs to mark parts of a reference line that need not match the checked “subject” text exactly. texst does not embed markers into the reference text line, because that would require sophisticated escaping to keep arbitrary reference text feasible. Instead, each reference text line may be followed by argument lines that modify the way the reference text is matched against the checked text. Argument lines start with ' ' (U+0020). Some types of argument lines mark segments of the reference text that need not match the subject text exactly:

> This is some reference text content
 =        xxxx

The above example says that the four runes above the non-space part of the argument line, i.e. "some", are not compared to the checked text. The '=' identifies the specific type of argument line (see Types of argument lines). So the text

This is blue reference text content

would perfectly match the reference text example. Argument lines can be stacked and are applied in order to their reference text line up to the next non-argument line.

> This is some reference text content
 =        xxxx
 =                       yyyy

would be the same as

> This is some reference text content
 =        xxxx           yyyy

For some files, e.g. log files, it would be rather tedious if one had to mark each timestamp in the reference text line:

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
…

To solve this one can set a global segment line after the preamble and between reference text specifications. For our example one would write:

*=ttt tt tt tt tt ttt
> Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
> Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
> Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
> …

A closer look shows that the log lines come from different threads, so one cannot rely on the order of lines in the reference text specification. However, the lines from any one thread must appear in exactly the order given in the reference.

We declare two “interleaving groups” '1' and '2' in the preamble and mark the reference text lines to be in the specific group:

\%12
*=ttt tt tt tt tt ttt
>1Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
>2Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
>1Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
> …

Now, both subjects

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
…

and

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
…

match the reference. For more details, see the reference documentation.

Documentation

Overview

Package texst compares text files against a reference text specification. A specification consists of the reference text itself combined with options that control how the reference text is matched. Lines with reference text are marked with a prefix that has '>' in the first column. The simplest reference text is the verbatim text with each line prefixed with "> ". Such a specification matches only the verbatim text exactly. For more complex matching, one can add other line types to the reference text specification. Line types are recognized by the rune in the first column of each line in the reference text specification. Different line types serve different purposes.

Most often one needs to mark parts of a reference line that need not match the compared “subject” text exactly. We will call these parts 'masks'. Each reference text line may be followed by argument lines that define masks and the way the reference text is matched against them. Argument lines start with ' ' (U+0020). There are different types of argument lines, e.g. this one starting with " =":

> This is some reference text content
 =        xxxx

The above example says that the four runes above the non-space part of the argument line, i.e. "some", are not compared to the subject text. The second column, here '=', identifies the specific type of argument line; for details see Types of Argument Lines. The text

This is blue reference text content

would perfectly match the reference text example. Argument lines can be stacked and are applied in order to their reference text line up to the next non-argument line.

> This is some reference text content
 =        xxxx
 =                       yyyy

would be the same as

> This is some reference text content
 =        xxxx           yyyy
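
The stacking rule for '=' argument lines can be sketched in a few lines of Go. This is only an illustration of the behaviour described above, not the package's implementation: mergeMasks and matchExact are hypothetical helpers, and the mask strings are given without the two leading tag columns.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeMasks overlays several '=' mask lines. A column counts as masked if
// any of the lines has a non-space rune there. Hypothetical helper, not
// part of texst.
func mergeMasks(width int, masks ...string) []bool {
	merged := make([]bool, width)
	for _, m := range masks {
		for i, r := range []rune(m) {
			if i < width && r != ' ' {
				merged[i] = true
			}
		}
	}
	return merged
}

// matchExact compares ref and subj rune by rune and skips masked columns.
// Both texts must have the same number of runes, as the '=' mask requires.
func matchExact(ref, subj string, masked []bool) bool {
	r, s := []rune(ref), []rune(subj)
	if len(r) != len(s) {
		return false
	}
	for i := range r {
		if i < len(masked) && masked[i] {
			continue
		}
		if r[i] != s[i] {
			return false
		}
	}
	return true
}

func main() {
	ref := "This is some reference text content"
	// Two stacked mask lines: one over "some", one over "text".
	masked := mergeMasks(len([]rune(ref)),
		strings.Repeat(" ", 8)+"xxxx",
		strings.Repeat(" ", 23)+"yyyy")

	fmt.Println(matchExact(ref, "This is blue reference text content", masked))
	fmt.Println(matchExact(ref, "This is blue reference data content", masked))
	fmt.Println(matchExact(ref, "That is blue reference text content", masked))
}
```

The first two subjects differ from the reference only in masked columns and are accepted; the third differs in an unmasked column and is rejected.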

For some files, e.g. log files, it would be rather tedious if one had to mark each timestamp in the reference text line:

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
…

To solve this one can set a global mask line after the preamble and between reference text specifications. For our example one would write:

*=ttt tt tt tt tt ttt
> Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
> Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
> Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
> …

A closer look shows that the log lines come from different threads, so one cannot rely on the order of lines in the reference text specification. However, the lines from any one thread must appear in exactly the order given in the reference.

For this we declare two “interleaving groups” '1' and '2' in the preamble and mark the reference text lines to be in the specific group:

\%12
*=ttt tt tt tt tt ttt
>1Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
>2Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
>1Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
> …

Now, both subjects

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
…

and

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
…

match the reference.

Comparing Subject and Reference

Comparing subject texts is done by scanning the subject text line by line and then matching the current subject line against the reference lines currently in question. For each Interleaving Group there is at most one reference line to be matched. The first successful match of the subject line with a reference line accepts the subject line. Then the matched reference text line is replaced with the next reference text line from the same interleaving group, if any. Afterward scanning continues with the next subject line. Reference lines from different interleaving groups are checked in the same order as they are declared in the preamble.

If the subject line does not match, a mismatch is reported and scanning continues with the next subject text line. One can configure a maximum number of mismatches to process before scanning is aborted. By default, the complete subject text is scanned.
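
The scanning procedure described in this section can be sketched as follows. This is a simplified model, not the package's code: refLine and compare are hypothetical names, masks are ignored (lines are compared for plain equality), and the sketch does not say anything about reference lines that are left over at the end.

```go
package main

import "fmt"

// refLine is a hypothetical stand-in for a reference line: its interleaving
// group and its text. It is not the package's RefLine type.
type refLine struct {
	group rune
	text  string
}

// compare scans subj line by line. For each group only the first reference
// line that has not been matched yet is a candidate. Candidates are tried in
// the order the groups were declared. The result holds the 1-based numbers
// of the subject lines that matched no candidate.
func compare(groups []rune, refs []refLine, subj []string) []int {
	queues := make(map[rune][]string)
	for _, r := range refs {
		queues[r.group] = append(queues[r.group], r.text)
	}
	var mismatches []int
	for i, line := range subj {
		matched := false
		for _, g := range groups {
			q := queues[g]
			if len(q) > 0 && q[0] == line {
				queues[g] = q[1:] // next reference line of the same group
				matched = true
				break
			}
		}
		if !matched {
			mismatches = append(mismatches, i+1)
		}
	}
	return mismatches
}

func main() {
	groups := []rune{'1', '2'}
	refs := []refLine{
		{'1', "[thread1] create"},
		{'2', "[thread2] load"},
		{'1', "[thread1] clear"},
	}
	// Threads interleave differently, but each group keeps its order.
	fmt.Println(compare(groups, refs,
		[]string{"[thread1] create", "[thread1] clear", "[thread2] load"}))
	// The order inside group '1' is violated by the first subject line.
	fmt.Println(compare(groups, refs,
		[]string{"[thread1] clear", "[thread1] create", "[thread2] load"}))
}
```

In the second call the first subject line matches neither candidate and is reported, while scanning continues with the following lines.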

Types of Argument Lines

TODO: Be more descriptive

There are three types of mask lines:

= Part of subject must have same length as mask
* Part of subject may be of any length, even 0
+ Part of subject may be of any length >0

There are regexp lines to match masked subject parts:

~<n>['['<idx>']'] <regexp>
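
One way to read the three mask types is as a translation into a regular expression. The sketch below follows that reading and is not how texst itself works internally: maskPattern is a hypothetical helper, it assumes that every contiguous run of non-space runes in the mask line is one mask, and it takes the mask without the two leading tag columns.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// maskPattern turns a reference text and one mask line into an anchored
// regular expression. kind is '=', '*' or '+'. Unmasked runes are matched
// literally. Hypothetical helper, not part of texst.
func maskPattern(ref, mask string, kind rune) (*regexp.Regexp, error) {
	r, m := []rune(ref), []rune(mask)
	var sb strings.Builder
	sb.WriteString("^")
	for i := 0; i < len(r); {
		if i >= len(m) || m[i] == ' ' {
			sb.WriteString(regexp.QuoteMeta(string(r[i])))
			i++
			continue
		}
		// Find the end of the masked run that starts at column i.
		j := i
		for j < len(r) && j < len(m) && m[j] != ' ' {
			j++
		}
		switch kind {
		case '=':
			fmt.Fprintf(&sb, ".{%d}", j-i) // same length as the mask
		case '*':
			sb.WriteString(".*") // any length, even 0
		case '+':
			sb.WriteString(".+") // any length > 0
		default:
			return nil, fmt.Errorf("unknown mask type %q", kind)
		}
		i = j
	}
	sb.WriteString("$")
	return regexp.Compile(sb.String())
}

func main() {
	ref := "create dir:test1/test.xCuf/l10n"
	mask := strings.Repeat(" ", 22) + "xxxx" // marks "xCuf"
	subj := "create dir:test1/test.RnD/l10n"
	for _, kind := range []rune{'=', '*', '+'} {
		re, err := maskPattern(ref, mask, kind)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%c %v\n", kind, re.MatchString(subj))
	}
}
```

Because "RnD" is one rune shorter than the masked "xCuf", only the '*' and '+' readings accept this subject line.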

Preamble Lines

The type of a preamble line is recognized from the rune in the second column of the line, e.g.:

\%<interleaving groups>

This is a preamble line with tag '%' that sets the interleaving groups of the reference text specification. Currently there is no other preamble line type.

Interleaving Groups

Interleaving groups are identified by a single rune and have to be declared upfront in the preamble. If no interleaving group is declared then the interleaving group ' ' (U+0020) is defined by default. A reference text line is assigned to an interleaving group by the rune in the second column of the line. E.g. the lines

\% a
> 1st reference text line
>a2nd reference text line

put the reference text "1st reference text line" into the interleaving group ' ' and the reference text "2nd reference text line" into the interleaving group 'a'. Because a group other than the default group ' ' is used, the groups have to be declared in the preamble line.
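
How the group runes are read from these lines can be sketched as below. The helpers parseIGroups and splitRefLine are hypothetical and only mirror the column rules stated above; they perform no validation beyond the line tag.

```go
package main

import (
	"fmt"
	"strings"
)

// parseIGroups returns the group runes declared by a `\%<groups>` preamble
// line. ok is false if the line is not such a preamble line.
// Hypothetical helper, not part of texst.
func parseIGroups(line string) (groups []rune, ok bool) {
	const prefix = `\%`
	if !strings.HasPrefix(line, prefix) {
		return nil, false
	}
	return []rune(line[len(prefix):]), true
}

// splitRefLine splits a '>' reference line into the group rune from the
// second column and the reference text that follows it.
func splitRefLine(line string) (group rune, text string, ok bool) {
	r := []rune(line)
	if len(r) < 2 || r[0] != '>' {
		return 0, "", false
	}
	return r[1], string(r[2:]), true
}

func main() {
	groups, _ := parseIGroups(`\% a`)
	fmt.Printf("declared groups: %q\n", groups)

	for _, line := range []string{
		"> 1st reference text line",
		">a2nd reference text line",
	} {
		g, text, _ := splitRefLine(line)
		fmt.Printf("group %q: %q\n", g, text)
	}
}
```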

TODO: What do these groups do (see "Matching Reference Lines")? => Ambiguities & Order of IGroups

Example
cmpr := Compare{
	OnMismatch: func(sn int, s string, refs []*RefLine) bool {
		for _, ref := range refs {
			fmt.Printf("mismatch %d/%d: '%s' / '%s'\n",
				sn, ref.Line(),
				s, ref.Text())
		}
		return false
	},
}
err := cmpr.Strings(`\%12
*=ttt tt tt tt tt ttt
>1Jun 27 21:58:11.112 INFO  [thread1] create localization dir:test1/test.xCuf/l10n
 +                                                                       xxxx
>2Jun 27 21:58:11.113 INFO  [thread2] load state from file:test1/test.xCuf/bcplus.json
 +                                                                    xxxx
>1Jun 27 18:58:11.125 DEBUG [thread1] clearing maps`,
	`Jun 27 21:58:11.112 INFO  [thread1] create localization dir:test1/test.RnD/l10n
Jun 27 18:58:11.125 DEBUG [thread1] clearing MAPS
Jun 27 21:58:11.113 INFO  [thread2] load state from file:test1/test.Rnd/bcplus.json`,
)
fmt.Println(err)
Output:

mismatch 2/7: 'Jun 27 18:58:11.125 DEBUG [thread1] clearing MAPS' / 'Jun 27 18:58:11.125 DEBUG [thread1] clearing maps'
mismatch 2/5: 'Jun 27 18:58:11.125 DEBUG [thread1] clearing MAPS' / 'Jun 27 21:58:11.113 INFO  [thread2] load state from file:test1/test.xCuf/bcplus.json'
1 mismatch

Constants

const (
	// The masked part of the subject must have the same length as the reference
	// line segment.
	ArgMaskExact = '='
	// The masked part of the subject can be of any length, even zero.
	ArgMaskOpt = '*'
	// The masked part of the subject can be of any length greater than zero.
	ArgMaskVar = '+'
	// Match masked parts against a regular expression. TODO syntax of the line…
	ArgRegexp = '~'
)

Types of argument lines

const (
	// Marks a comment line.
	TagComment = '#'
	// Preamble lines must be the first lines of a reference text specification.
	TagPreamble = '\\'
	// Global segment lines set/clear file-global tags.
	TagGlobalSeg = '*'
	// Reference lines have the text that is compared to the subject text.
	TagRefLine = '>'
	// Argument lines apply to the most recent '>' reference line up to the next
	// non-argument line.
	TagRefArgs = ' '
)

Line Tags

const (
	// Define interleaving groups in the preamble.
	PreIGroups = '%'
)

Types of preamble lines

Functions

func Prepare added in v0.5.0

func Prepare(prepared io.Writer, subj io.Reader) (err error)

func PrepareFile added in v0.5.0

func PrepareFile(prepared string, subj io.Reader) error

Types

type Compare

type Compare struct {
	// Specifies the number of detected mismatches after which the comparison
	// is aborted. If MismatchLimit == 0, do not abort.
	MismatchLimit int
	// OnMismatch is called on each detected mismatch
	OnMismatch MismatchFunc
	// contains filtered or unexported fields
}

Compare performs the comparison of a subject text against a reference text specification. A zero value is valid for use and can be reused for more than one comparison. It must not be used concurrently.

func (*Compare) Readers

func (cmpr *Compare) Readers(ref, subj io.Reader) error

Readers compares the reference text and subject text from the io.Readers 'ref' and 'subj'. If the OnMismatch field is not nil, it is called on each detected mismatch. The number of detected mismatches is returned as a MismatchCount error, or nil is returned if no mismatch and no other error occurs. Read errors or syntax errors in 'ref' or 'subj' terminate the comparison immediately and are returned as RefError or SubjError, depending on the source of the error.
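
A caller will typically want to tell "the subject differs" apart from "the reference could not be processed". The sketch below shows one way to do that with errors.As. To stay self-contained it uses local stand-in types, mismatchCount and refError, that only mirror the shape of the documented MismatchCount and RefError; they are not the package's types, their message texts are made up, and whether texst returns its errors by value or wrapped is an assumption here.

```go
package main

import (
	"errors"
	"fmt"
)

// mismatchCount is a local stand-in shaped like the documented
// MismatchCount type. It is NOT texst's type.
type mismatchCount int

func (mc mismatchCount) Error() string {
	return fmt.Sprintf("mismatches: %d", int(mc))
}

// refError is a local stand-in shaped like the documented RefError type.
// It is NOT texst's type.
type refError struct {
	Line int
	err  error
}

func (e refError) Error() string { return fmt.Sprintf("reference line %d: %v", e.Line, e.err) }
func (e refError) Unwrap() error { return e.err }

// classify turns the error result of a comparison into a short verdict.
func classify(err error) string {
	var mc mismatchCount
	var re refError
	switch {
	case err == nil:
		return "subject matches reference"
	case errors.As(err, &mc):
		return fmt.Sprintf("subject differs in %d line(s)", int(mc))
	case errors.As(err, &re):
		return fmt.Sprintf("bad reference at line %d", re.Line)
	default:
		return "other error: " + err.Error()
	}
}

func main() {
	fmt.Println(classify(nil))
	fmt.Println(classify(mismatchCount(1)))
	fmt.Println(classify(refError{Line: 3, err: errors.New("unknown line tag")}))
}
```

In a test, the first two verdicts are the expected outcomes of a comparison, while the third points to a problem in the reference file rather than in the code under test.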

func (*Compare) RefFile added in v0.4.0

func (cmpr *Compare) RefFile(refname string, subj io.Reader) error

func (*Compare) Strings

func (cmpr *Compare) Strings(ref, subj string) error

Strings compares the reference text and subject text from the strings 'ref' and 'subj'. For more detail, see the documentation of Readers.

type LineSepScanner added in v0.5.0

type LineSepScanner []byte

func (*LineSepScanner) ScanLines added in v0.5.0

func (lsc *LineSepScanner) ScanLines(data []byte, atEOF bool) (advance int, token []byte, err error)

type MismatchCount

type MismatchCount int

MismatchCount is the error used to report the total number of mismatches detected during a Compare run.

func (MismatchCount) Error

func (mc MismatchCount) Error() string

type MismatchFunc

type MismatchFunc func(slineno int, sline string, refs []*RefLine) (abort bool)

MismatchFunc is called for each mismatch in the subject text during comparison. It gets the respective line number 'slineno' in the subject file, the text line 'sline' and the reference lines of each interleaving group that were matched against the subject line.

If the MismatchFunc returns 'abort' == true the comparison terminates immediately.

type RefError

type RefError struct {
	Line int
	// contains filtered or unexported fields
}

RefError is returned for errors during processing of the reference file.

func (RefError) Error

func (e RefError) Error() string

func (RefError) Unwrap

func (e RefError) Unwrap() error

type RefLine

type RefLine struct {
	icontainer.SListNode[*RefLine]
	// contains filtered or unexported fields
}

RefLine represents a line of reference text with its arguments. API users will get current reference lines when using the MismatchFunc.

func (*RefLine) IGroup

func (rl *RefLine) IGroup() rune

IGroup returns the name of the line's interleaving group.

func (*RefLine) Line added in v0.3.1

func (rl *RefLine) Line() int

Line returns the line number in the reference text specification.

func (*RefLine) Text

func (rl *RefLine) Text() string

Text returns the verbatim reference text.

type SubjError

type SubjError struct {
	Line int
	// contains filtered or unexported fields
}

SubjError is returned for errors during processing of the subject file.

func (SubjError) Error

func (e SubjError) Error() string

func (SubjError) Unwrap

func (e SubjError) Unwrap() error

Directories

Path Synopsis
cmd/texst: A command line tool to use text tests
texsting: Package texsting supports the use of texst in your Go tests.
