texst

package module

v0.9.3 | Published: Oct 2, 2024 | License: AGPL-3.0 | Imports: 11 | Imported by: 0

Repository

github.com/fractalqb/texst

README ¶

texst – Text Tests

Package texst checks text files against reference text specifications. The simplest reference text is the verbatim text with each line prefixed with a 'reference text' line tag, e.g. "> ". This matches only the verbatim text exactly. To do more complex matching, one can add other line types to the reference text specification.

Line types are recognised by the rune in the first column of each line in the reference text specification. There are line types that serve different purposes.

Most often one needs to mark parts of a reference line that need not match the checked “subject” text exactly. texst does not embed markers into the reference text line because that would require sophisticated escaping to keep arbitrary reference text feasible. Instead, each reference text line may be followed by argument lines that modify the way the reference text is matched against the checked text. Argument lines start with ' ' (U+0020). Some types of argument lines mark segments of the reference text that need not match the subject text exactly:

> This is some reference text content
 .        xxxx

The above example says that the four runes above the non-space part of the argument line, i.e. "some", are not compared to the checked text. The '.' identifies the specific type of argument line (see Types of argument lines). So the text

This is blue reference text content

would perfectly match the reference text example. Argument lines can be stacked and are applied in order to their reference text line up to the next non-argument line.

> This is some reference text content
 .        xxxx
 .                       yyyy

would be the same as

> This is some reference text content
 .        xxxx           yyyy

For some files, e.g. log files, it would be rather tedious if one had to mark each timestamp in the reference text line:

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
…

To solve this one can set a global mask line after the preamble and between reference text specifications. For our example one would write:

*.ttt tt tt tt tt ttt
> Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
> Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
> Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
> …

Looking closely, you notice that the log lines come from different threads, so one cannot rely on the order of lines in the reference text specification. But at least the lines from any one thread must appear in exactly the order given in the reference.

We declare two “interleaving groups” '1' and '2' in the preamble and mark the reference text lines to be in the specific group:

%%12
*.ttt tt tt tt tt ttt
>1Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
>2Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
>1Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
> …

Now, both subjects

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
…

and

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
…

match the reference. For more details use the reference documentation.

Documentation ¶

Overview ¶

Package texst compares text files against reference text specifications. A specification consists of the reference text itself combined with options for how the reference text is matched. Lines with reference text are marked with a prefix that has '>' in the first column. The simplest reference text is the verbatim text with each line prefixed with "> ". This matches only the verbatim text exactly. To do more complex matching, one can add other line types to the reference text specification. Line types are recognized by the rune in the first column of each line in the reference text specification. There are line types that serve different purposes.

Most often one needs to mark parts of a reference line that need not match the compared “subject” text exactly. We call these markers 'masks'. Each reference text line may be followed by argument lines that define masks and the way the reference text is matched against the subject text. Argument lines start with ' ' (U+0020). There are different types of argument lines, e.g. this one starting with " .":

> This is some reference text content
 .        xxxx

The above example says that the four runes above the non-space part of the argument line, i.e. "some", are not compared to the subject text. The second column, here '.', identifies the specific type of argument line; for details see Types of Argument Lines. The text

This is blue reference text content

would perfectly match the reference text example. Argument lines can be stacked and are applied in order to their reference text line up to the next non-argument line.

> This is some reference text content
 .        xxxx
 .                       yyyy

would be the same as

> This is some reference text content
 .        xxxx           yyyy

For some files, e.g. log files, it would be rather tedious if one had to mark each timestamp in the reference text line:

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
…

To solve this, one can set a global mask line in the preamble. For our example one would write:

*.ttt tt tt tt tt ttt
> Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
> Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
> Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
> …
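
The effect of such a global mask can be pictured as a column-wise comparison that skips the masked columns on every line. The sketch below only models '.' masks, where the masked part of the subject has the same length as the mask; equalOutsideMask and the shortened log lines are hypothetical and are not taken from the package.

```go
package main

import "fmt"

// equalOutsideMask is a hypothetical helper, not part of texst. It compares
// ref and subj rune by rune and ignores every column where mask holds a
// non-space rune. Columns stay aligned because only same-length ('.') masks
// are modeled here.
func equalOutsideMask(ref, subj, mask string) bool {
	r, s, m := []rune(ref), []rune(subj), []rune(mask)
	if len(r) != len(s) {
		return false
	}
	for i := range r {
		masked := i < len(m) && m[i] != ' '
		if !masked && r[i] != s[i] {
			return false
		}
	}
	return true
}

func main() {
	// The mask part of the global mask line "*.ttt tt tt tt tt ttt".
	const mask = "ttt tt tt tt tt ttt"
	ref := "Jun 27 21:58:11.112 INFO  [thread1] create"

	// Only the timestamp differs, so the masked comparison succeeds.
	fmt.Println(equalOutsideMask(ref, "Dec 03 07:15:42.900 INFO  [thread1] create", mask))
	// The log level lies outside the mask, so this comparison fails.
	fmt.Println(equalOutsideMask(ref, "Jun 27 21:58:11.112 WARN  [thread1] create", mask))
}
```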

Looking closely, you notice that the log lines come from different threads, so one cannot rely on the order of lines in the reference text specification. But at least the lines from any one thread must appear in exactly the order given in the reference.

For this we declare two “interleaving groups” '1' and '2' in the preamble and mark the reference text lines to be in the specific group:

%%12
*.ttt tt tt tt tt ttt
>1Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
>2Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
>1Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
> …

Now, both subject texts

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
…

and

Jun 27 21:58:11.112 INFO  [thread1] create `localization dir:test1/test.xCuf/l10n`
Jun 27 18:58:11.125 DEBUG [thread1] clearing maps
Jun 27 21:58:11.113 INFO  [thread2] load state from `file:test1/test.xCuf/bcplus.json`
…

match the reference text.

Comparing Subject and Reference ¶

Comparing subject texts is done by scanning the subject text line by line and then matching the current line against the reference lines currently in question. For each Interleaving Group there is at most one reference line in question. The first successful match of the subject line with a reference line accepts the subject line. Then the matched reference text line is replaced with the next reference text line from the same interleaving group, if any. Afterward scanning continues with the next subject line. Reference lines from different interleaving groups are checked in the same order as they are declared in the preamble.

If the subject line does not match any of them, a mismatch is reported and scanning continues with the next subject text line. One can configure a maximum number of mismatches to process before scanning is aborted. By default the complete subject text is scanned.
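
The scanning loop described in this section can be sketched with one queue of reference lines per interleaving group. This is an illustration under simplifying assumptions, not the package's code: refLine and check are hypothetical names, masks are left out so that lines must match exactly, a limit of 0 or less is assumed to mean "no limit", and reference lines left over at the end are ignored.

```go
package main

import "fmt"

// refLine is a simplified stand-in for a reference line: an interleaving
// group plus text that has to match exactly (masks are left out).
type refLine struct {
	group rune
	text  string
}

// check is a hypothetical sketch of the scanning loop. groups lists the
// interleaving groups in declaration order. It returns the mismatch count.
func check(groups []rune, refs []refLine, subject []string, limit int) int {
	// Queue the reference lines per group, preserving their order.
	queues := make(map[rune][]string)
	for _, r := range refs {
		queues[r.group] = append(queues[r.group], r.text)
	}
	mismatches := 0
	for _, line := range subject {
		matched := false
		for _, g := range groups { // candidates are tried in declaration order
			q := queues[g]
			if len(q) > 0 && q[0] == line {
				queues[g] = q[1:] // advance to the group's next reference line
				matched = true
				break
			}
		}
		if !matched {
			mismatches++
			if limit > 0 && mismatches >= limit {
				break // assumed meaning of a mismatch limit
			}
		}
	}
	return mismatches
}

func main() {
	groups := []rune{'1', '2'}
	refs := []refLine{
		{'1', "[thread1] create"},
		{'2', "[thread2] load state"},
		{'1', "[thread1] clearing maps"},
	}
	interleaved := []string{"[thread1] create", "[thread1] clearing maps", "[thread2] load state"}
	swapped := []string{"[thread1] clearing maps", "[thread1] create", "[thread2] load state"}
	fmt.Println(check(groups, refs, interleaved, 0)) // order within each group is kept
	fmt.Println(check(groups, refs, swapped, 0))     // group '1' is out of order
}
```

In the second call the first subject line is reported as a mismatch, because neither group currently expects it; scanning then continues and the remaining two lines match.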

Types of Argument Lines ¶

TODO: Be more descriptive

There are the following types of mask lines:

. Part of subject must have same length as mask
* Part of subject may be of any length, even 0
+ Part of subject may be of any length >0
0 Part of subject may be of any length up to the length of the mask
1 Part of subject may be of any length >0 up to the length of the mask
- Part of subject must be at least as long as the mask
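
Read as regular expression repetitions, the six types could look like the sketch below. Only the '.' case is backed by this page (the RefReader example prints (.{3}) for a three-rune '.' mask); the other mappings are an assumption derived from the descriptions above, and quantifier is a hypothetical helper that is not part of the package.

```go
package main

import "fmt"

// quantifier is a hypothetical helper, not part of texst. It renders a mask
// of the given type and length n as a regular expression group.
func quantifier(kind rune, n int) (string, error) {
	switch kind {
	case '.': // same length as the mask
		return fmt.Sprintf("(.{%d})", n), nil
	case '*': // any length, even 0
		return "(.*)", nil
	case '+': // any length > 0
		return "(.+)", nil
	case '0': // any length up to the mask length
		return fmt.Sprintf("(.{0,%d})", n), nil
	case '1': // any length > 0 up to the mask length
		return fmt.Sprintf("(.{1,%d})", n), nil
	case '-': // at least as long as the mask
		return fmt.Sprintf("(.{%d,})", n), nil
	}
	return "", fmt.Errorf("unknown mask type %q", kind)
}

func main() {
	for _, kind := range ".*+01-" {
		q, err := quantifier(kind, 4)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%c  %s\n", kind, q)
	}
}
```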

Preamble Lines ¶

The preamble ends with the first reference line. The type of a preamble line is recognized from the rune in the first column of the line, e.g.:

%%<interleaving groups>

This is a preamble line with tag '%' that sets the interleaving groups of the reference text specification.

A global mask template can be defined with the preamble line type '*', in the same way as the masks of a reference line are defined:

*.xxx yyy
*-        zzzzz

Interleaving Groups ¶

Interleaving groups are identified by a single rune and have to be declared upfront in the preamble. If no interleaving group is declared then the interleaving group ' ' (U+0020) is defined by default. A reference text line is assigned to an interleaving group by the rune in the second column of the line. E.g. the lines

%% a
> 1st reference text line
>a2nd reference text line

put the reference text "1st reference text line" into the interleaving group ' ' and the reference text "2nd reference text line" into the interleaving group 'a'. Because a group other than the default ' ' is used, the groups have to be declared in the preamble line.

Interleaving groups are defined once in the preamble in a line starting with '%%'.
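
The column layout described above can be made concrete with two small parsing helpers. Both parseIGroups and splitRefLine are hypothetical names for this sketch and are not part of the package; the sketch assumes that every rune after the "%%" prefix declares one group.

```go
package main

import (
	"fmt"
	"strings"
)

// parseIGroups is a hypothetical helper, not part of texst. It reads a
// "%%<interleaving groups>" preamble line and returns one group per rune
// that follows the "%%" prefix.
func parseIGroups(line string) ([]rune, bool) {
	if !strings.HasPrefix(line, "%%") {
		return nil, false
	}
	return []rune(line[2:]), true
}

// splitRefLine is a hypothetical helper that splits a reference line into
// its interleaving group (second column) and its reference text.
func splitRefLine(line string) (group rune, text string, ok bool) {
	r := []rune(line)
	if len(r) < 2 || r[0] != '>' {
		return 0, "", false
	}
	return r[1], string(r[2:]), true
}

func main() {
	groups, _ := parseIGroups("%% a")
	fmt.Printf("declared groups: %q\n", groups)

	for _, line := range []string{"> 1st reference text line", ">a2nd reference text line"} {
		g, text, _ := splitRefLine(line)
		fmt.Printf("group %q: %q\n", g, text)
	}
}
```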

TODO: What do these groups do (see "Matching Reference Lines")? => Ambiguities & Order of IGroups

Index ¶

Constants
type Mask
type MatchFunc
type MismatchFunc
type Prepare
- func (p Prepare) Text(ref io.Writer, subj io.Reader) (err error)
type RefDoc
type RefLine
type RefReader
type SegChecker
type Texst
- func (txs *Texst) Check(reference RefDoc, subject io.Reader) (mismatchCount int, err error)

Constants ¶

const (
	// Marks a comment line.
	TagComment = '#'

	// Preamble line regarding interleaving groups.
	TagIGroup = '%'

	// Global argument line
	TagGlobalArg = '*'

	// Reference lines have the text that is compared to the subject text.
	TagRefLine = '>'

	// Argument lines apply to the most recent '>' reference line up to the next
	// non-argument line.
	TagRefLineArg = ' '
)

Line Tags

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Mask ¶ added in v0.9.2

type Mask struct {
	// contains filtered or unexported fields
}

func (*Mask) Len ¶ added in v0.9.2

func (s *Mask) Len() int

func (*Mask) Start ¶ added in v0.9.2

func (s *Mask) Start() int

func (*Mask) String ¶ added in v0.9.2

func (s *Mask) String() string

type MatchFunc ¶ added in v0.9.2

type MatchFunc func(testedNo int, testedLine []byte, ref *RefLine, match []int)

type MismatchFunc ¶

type MismatchFunc func(testedNo int, testedLine []byte, ref []*RefLine)

type Prepare ¶ added in v0.5.0

type Prepare struct {
	DefaultIGroup rune
}

func (Prepare) Text ¶ added in v0.9.2

func (p Prepare) Text(ref io.Writer, subj io.Reader) (err error)

type RefDoc ¶ added in v0.9.2

type RefDoc interface {
	Name() string
	Line() int
	IGroups() []rune
	NextLine() (*RefLine, error)
	FreeLine(*RefLine)
}

type RefLine ¶

type RefLine struct {
	// contains filtered or unexported fields
}

func (*RefLine) IGroup ¶

func (rl *RefLine) IGroup() rune

func (*RefLine) Masks ¶ added in v0.9.2

func (rl *RefLine) Masks() []*Mask

func (*RefLine) Regexp ¶ added in v0.9.2

func (rl *RefLine) Regexp() string

func (*RefLine) SourceLine ¶ added in v0.9.2

func (rl *RefLine) SourceLine() int

func (*RefLine) SourceName ¶ added in v0.9.2

func (rl *RefLine) SourceName() string

func (*RefLine) Text ¶

func (rl *RefLine) Text() string

type RefReader ¶ added in v0.9.2

type RefReader struct {
	// contains filtered or unexported fields
}

Example ¶

ref, err := NewRefString("test text",
	`> foo bar baz
 .    xxx`)
if err != nil {
	fmt.Println(err)
	return
}
rl, err := ref.NextLine()
if err != nil && !errors.Is(err, io.EOF) {
	fmt.Println(err)
	return
}
fmt.Println(rl.regexp())
fmt.Println(ref.NextLine())

Output:

^foo (.{3}) baz$
<nil> test text:2:EOF

func NewRefReader ¶ added in v0.9.2

func NewRefReader(name string, r io.Reader) (*RefReader, error)

func NewRefString ¶ added in v0.9.2

func NewRefString(name, texts string) (*RefReader, error)

func OpenRefFile ¶ added in v0.9.2

func OpenRefFile(file string) (*RefReader, error)

func (*RefReader) Close ¶ added in v0.9.2

func (rr *RefReader) Close() error

func (*RefReader) FreeLine ¶ added in v0.9.2

func (rr *RefReader) FreeLine(rl *RefLine)

func (*RefReader) IGroups ¶ added in v0.9.2

func (rr *RefReader) IGroups() []rune

func (*RefReader) Line ¶ added in v0.9.2

func (rr *RefReader) Line() int

func (*RefReader) Name ¶ added in v0.9.2

func (rr *RefReader) Name() string

func (*RefReader) NextLine ¶ added in v0.9.2

func (rr *RefReader) NextLine() (*RefLine, error)

type SegChecker ¶ added in v0.9.2

type SegChecker interface {
	Check(seg []byte) error
}

type Texst ¶ added in v0.9.2

type Texst struct {
	MismatchLimit int
	OnMismatch    MismatchFunc
	OnMatch       MatchFunc
}

Example ¶

ref, err := NewRefString("test text", `> foo bar baz
 .    xxx`)
if err != nil {
	fmt.Println(err)
	return
}
vrf := Texst{
	OnMismatch: func(tiLNo int, tiLine []byte, _ []*RefLine) {
		fmt.Printf("input:%d [%s]\n", tiLNo, tiLine)
	},
	OnMatch: func(tiLNo int, line []byte, _ *RefLine, match []int) {
		txt := func(i int) string {
			i *= 2
			part := line[match[i]:match[i+1]]
			return string(part)
		}
		fmt.Printf("match:%d [%s]:", tiLNo, txt(0))
		for i := range (len(match) / 2) - 1 {
			fmt.Printf(" %d=[%s]", i, txt(i+1))
		}
		fmt.Println()
	},
}
mismatchCount, err := vrf.Check(ref, strings.NewReader(
	`foo bar baz`,
))
if err != nil {
	fmt.Println(err)
} else {
	fmt.Printf("%d mismatches\n", mismatchCount)
}

Output:

match:1 [foo bar baz]: 0=[bar]
0 mismatches

func (*Texst) Check ¶ added in v0.9.2

func (txs *Texst) Check(reference RefDoc, subject io.Reader) (mismatchCount int, err error)

Directories ¶

Path	Synopsis
cmd/texst (command)	A command line tool to use text tests
texsting	Package texsting supports the use of texst in your Go tests.
