regexpscanner

package module
v0.1.0
Published: Dec 4, 2024 License: GPL-3.0 Imports: 3 Imported by: 0

README

regexpscanner

import "github.com/tonymet/regexpscanner"

©️ 2024 Anthony Metzidis

regexpscanner -- stream-based scanner and regex-based tokenizer in one.

It scans io.Reader streams and returns the tokens that match a regular expression.

Index

- func MakeScanner(in io.Reader, re *regexp.Regexp) *bufio.Scanner
- func MakeSplitter(re *regexp.Regexp) func([]byte, bool) (int, []byte, error)
- func ProcessTokens(in io.Reader, re *regexp.Regexp, handler func(string))

func MakeScanner

func MakeScanner(in io.Reader, re *regexp.Regexp) *bufio.Scanner

MakeScanner creates a bufio.Scanner that you drive with scanner.Scan() and scanner.Text().

Each call to scanner.Scan() followed by scanner.Text() returns the next token in the stream that matches the regex.

Example

Use MakeScanner to create a scanner that tokenizes the input using the regex.

package main

import (
	"fmt"
	"regexp"
	"strings"

	rs "github.com/tonymet/regexpscanner"
)

func main() {
	scanner := rs.MakeScanner(strings.NewReader("<html><body><p>Welcome to My Website</p></body></html>"),
		regexp.MustCompile(`</?[a-z]+>`),
	)
	// scanner has Split function defined using the regexp passed to MakeScanner
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
}
Output
<html>
<body>
<p>
</p>
</body>
</html>
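
The example above reads from an in-memory string, so nothing can fail mid-scan. When the reader is a file or a network stream, it is worth checking scanner.Err() after the loop, as with any bufio.Scanner. A minimal sketch, using os.Stdin as a stand-in for an arbitrary stream:

package main

import (
	"log"
	"os"
	"regexp"

	rs "github.com/tonymet/regexpscanner"
)

func main() {
	// os.Stdin stands in for any stream that can fail mid-read
	scanner := rs.MakeScanner(os.Stdin, regexp.MustCompile(`</?[a-z]+>`))
	for scanner.Scan() {
		log.Println(scanner.Text())
	}
	// Scan() returns false on both EOF and error; Err() distinguishes the two
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}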

func MakeSplitter

func MakeSplitter(re *regexp.Regexp) func([]byte, bool) (int, []byte, error)

MakeSplitter(re) creates a split function ("splitter") to be passed to scanner.Split(); the regex re is used to tokenize the input read by the scanner.

Splitters can be wrapped by more elaborate splitters for further processing; see bufio.Scanner for example wrappers, and the sketch following the example below.

Example

Use MakeSplitter to create a "splitter" for scanner.Split().

package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"

	rs "github.com/tonymet/regexpscanner"
)

func main() {
	splitter := rs.MakeSplitter(regexp.MustCompile(`</?[a-z]+>`))
	scanner := bufio.NewScanner(strings.NewReader("<html><body><p>Welcome to My Website</p></body></html>"))
	// be sure to call Split()
	scanner.Split(splitter)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
}
Output
<html>
<body>
<p>
</p>
</body>
</html>
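
As noted above, a splitter returned by MakeSplitter can be wrapped for further processing. The sketch below is illustrative and not part of the package: upperSplitter is a hypothetical wrapper that upper-cases each token before the scanner sees it, relying on the fact that MakeSplitter's return type matches bufio.SplitFunc.

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"regexp"
	"strings"

	rs "github.com/tonymet/regexpscanner"
)

// upperSplitter wraps an existing split function and upper-cases every token it emits.
func upperSplitter(split bufio.SplitFunc) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (int, []byte, error) {
		advance, token, err := split(data, atEOF)
		if token != nil {
			token = bytes.ToUpper(token)
		}
		return advance, token, err
	}
}

func main() {
	splitter := upperSplitter(rs.MakeSplitter(regexp.MustCompile(`</?[a-z]+>`)))
	scanner := bufio.NewScanner(strings.NewReader("<html><p>Welcome</p></html>"))
	scanner.Split(splitter)
	for scanner.Scan() {
		fmt.Println(scanner.Text()) // <HTML>, <P>, </P>, </HTML>
	}
}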

func ProcessTokens

func ProcessTokens(in io.Reader, re *regexp.Regexp, handler func(string))

ProcessTokens reads from in and calls handler for each token that matches the regex.

Example

Use ProcessTokens when a simple callback-based stream tokenizer is all that is needed.

package main

import (
	"fmt"
	"regexp"
	"strings"

	rs "github.com/tonymet/regexpscanner"
)

func main() {
	rs.ProcessTokens(
		strings.NewReader("<html><body><p>Welcome to My Website</p></body></html>"),
		regexp.MustCompile(`</?[a-z]+>`),
		func(text string) {
			fmt.Println(text)
		})
}
Output
<html>
<body>
<p>
</p>
</body>
</html>
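
ProcessTokens accepts any io.Reader, not just strings. A minimal sketch that counts the tags in a file, assuming a hypothetical input file page.html:

package main

import (
	"log"
	"os"
	"regexp"

	rs "github.com/tonymet/regexpscanner"
)

func main() {
	// page.html is a placeholder; any io.Reader works here
	f, err := os.Open("page.html")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	count := 0
	rs.ProcessTokens(f, regexp.MustCompile(`</?[a-z]+>`), func(token string) {
		count++
	})
	log.Printf("found %d tags", count)
}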

Generated by gomarkdoc

Directories

Path: cmd/regexpscanner
Synopsis: ©️ 2024 Anthony Metzidis regexpscanner command.
