triebytesmapper

Version: v0.2.0
Published: Feb 15, 2026 License: MIT Imports: 3 Imported by: 0

README

Mapper builds a trie from a set of keywords and can be used to map a byte slice (typically file content) against these keywords.

The mapper is configured through the Options struct:

// Options for the mapper.
type Options struct {
	// NormalizeRune will be applied to both the input and the keywords before matching.
	// Defaults to nil.
	// Typically used for lower-casing and accent folding.
	NormalizeRune func(rune) rune

	// IsWordBoundary is used to determine word boundaries.
	// A default implementation is used if nil.
	IsWordBoundary func(rune) bool
}
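
As a rough illustration of what a NormalizeRune function might look like, the sketch below lower-cases a rune and then folds a handful of accented Latin letters to their base letters. It uses only the standard library. The names accentFold, normalizeRune and normalizeString are hypothetical and not part of this package, and the fold table is deliberately tiny. The only thing taken from the package is the func(rune) rune signature.

```go
package main

import (
	"fmt"
	"unicode"
)

// accentFold maps a few accented Latin letters to their base letters.
// Hypothetical and far from exhaustive; it only illustrates the idea.
var accentFold = map[rune]rune{
	'\u00e1': 'a', '\u00e0': 'a', '\u00e2': 'a', '\u00e4': 'a', // á à â ä
	'\u00e9': 'e', '\u00e8': 'e', '\u00ea': 'e', '\u00eb': 'e', // é è ê ë
	'\u00f6': 'o', // ö
	'\u00fc': 'u', '\u00fb': 'u', // ü û
	'\u00e7': 'c', // ç
	'\u00f1': 'n', // ñ
}

// normalizeRune lower-cases r and then folds known accents.
// Its signature matches the NormalizeRune field: func(rune) rune.
func normalizeRune(r rune) rune {
	r = unicode.ToLower(r)
	if base, ok := accentFold[r]; ok {
		return base
	}
	return r
}

// normalizeString applies normalizeRune to every rune of s.
// It exists only to make the effect visible in this sketch.
func normalizeString(s string) string {
	out := make([]rune, 0, len(s))
	for _, r := range s {
		out = append(out, normalizeRune(r))
	}
	return string(out)
}

func main() {
	// "Crème BRÛLÉE", written with escapes.
	fmt.Println(normalizeString("Cr\u00e8me BR\u00dbL\u00c9E")) // prints "creme brulee"
}
```

A function like normalizeRune could then be assigned to the NormalizeRune field. Because the field is documented as applying to both the input and the keywords, the same folding would be used on both sides of the comparison.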

Note that this library currently does not work for CJK languages.

Documentation

Overview

Example
package main

import (
	"fmt"
	"log"
	"os"
	"strings"
	"unicode"

	"github.com/bep/triebytesmapper"
)

func main() {
	// Testdata:
	// Dickens, Charles. A Christmas Carol. 1843.
	// 1000 random words from the MacOS dictionary.
	christmascarol, err := os.ReadFile("testdata/christmascarol.txt")
	if err != nil {
		log.Fatal(err)
	}
	thousandwords, err := os.ReadFile("testdata/thousandwords.txt")
	if err != nil {
		log.Fatal(err)
	}
	// The thousandwords file is a list of words separated by newlines.
	// We need to split it into a slice of strings.
	keywords := strings.Split(string(thousandwords), "\n")
	// Trim any Windows line endings.
	for i, k := range keywords {
		keywords[i] = strings.TrimSpace(k)
	}
	// Lower-case both the keywords and the input before matching.
	opts := &triebytesmapper.Options{NormalizeRune: unicode.ToLower}
	m := triebytesmapper.New(opts, keywords...)

	matches := m.Map(christmascarol)
	first := matches.Keyword(0, christmascarol)

	fmt.Printf("Found %d matches. First match is %q.\n", len(matches), first)
}
Output:

Found 11 matches. First match is "Dickens".

Types

type LoHi

type LoHi struct {
	Lo int
	Hi int
}

LoHi holds a low (inclusive) and a high (exclusive) slice index.
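
To make the half-open convention concrete, here is a small standalone sketch. It declares a local LoHi type that mirrors the documented struct and a hypothetical helper named slice. It does not import this package.

```go
package main

import "fmt"

// LoHi mirrors the documented struct: Lo is inclusive, Hi is exclusive.
type LoHi struct {
	Lo int
	Hi int
}

// slice is a hypothetical helper that returns the bytes of src covered by r,
// that is src[r.Lo:r.Hi]. The byte at index r.Hi is not included.
func slice(src []byte, r LoHi) []byte {
	return src[r.Lo:r.Hi]
}

func main() {
	src := []byte("Marley was dead")
	r := LoHi{Lo: 0, Hi: 6}
	// The length of a half-open range is simply Hi - Lo.
	fmt.Printf("%q has length %d\n", slice(src, r), r.Hi-r.Lo) // prints "Marley" has length 6
}
```

With this convention a LoHi value can be used directly in a Go slice expression, and adjacent ranges never overlap.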

type Mapper

type Mapper struct {
	// contains filtered or unexported fields
}

Mapper builds a trie from a set of keywords and can be used to map a byte slice against these keywords.

func New

func New(opts *Options, keywords ...string) *Mapper

New creates a new mapper with the given options and keywords.

func (*Mapper) Map

func (m *Mapper) Map(s []byte) Matches

Map maps a byte slice against the keywords in the trie and returns the matches.

func (*Mapper) MatchBytes

func (m *Mapper) MatchBytes(b []byte) (string, bool)

MatchBytes matches a byte slice against the keywords in the trie. A non-empty string is returned if the byte slice matches a keyword. The boolean return value indicates whether there could be a match with more bytes (that is, b is a prefix of a keyword in the trie).
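
The two return values can be read independently. The sketch below mimics the documented contract with a plain linear scan over a keyword list. It is not a trie and not this package's implementation, and matchBytes is a hypothetical stand-in. It assumes that the boolean is true only when some keyword is strictly longer than the input and starts with it, which is one reading of "could be a match with more bytes".

```go
package main

import (
	"fmt"
	"strings"
)

// matchBytes is a hypothetical stand-in for the documented contract.
// It returns the keyword equal to b (or "" if none), and reports whether
// b is a proper prefix of at least one keyword.
// Assumption: an exact match alone does not set the boolean.
func matchBytes(keywords []string, b []byte) (match string, more bool) {
	s := string(b)
	for _, k := range keywords {
		if k == s {
			match = k
		} else if strings.HasPrefix(k, s) {
			more = true
		}
	}
	return match, more
}

func main() {
	keywords := []string{"ghost", "ghostly", "scrooge"}
	for _, in := range []string{"gho", "ghost", "ghostly", "marley"} {
		m, more := matchBytes(keywords, []byte(in))
		fmt.Printf("input=%q match=%q more=%t\n", in, m, more)
	}
}
```

Under that reading, a caller feeding bytes incrementally could stop extending the input as soon as the boolean is false, and keep the most recent non-empty match as the longest keyword seen so far.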

type Matches

type Matches []LoHi

Matches is a slice of LoHi values, each holding a low (inclusive) and a high (exclusive) slice index.

func (Matches) Keyword

func (m Matches) Keyword(i int, src []byte) []byte

Keyword returns the keyword at the given index in the matches, or nil if the index is out of range.
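
The documented behavior can be sketched with local types. The lower-case keyword method below is a hypothetical re-implementation written for illustration only, not the package's source. It assumes the returned bytes are the src[Lo:Hi] sub-slice for the selected match.

```go
package main

import "fmt"

// Local mirrors of the documented types.
type LoHi struct {
	Lo int
	Hi int
}

type Matches []LoHi

// keyword is a hypothetical sketch of a bounds-checked lookup:
// it returns src[Lo:Hi] for match i, or nil if i is out of range.
func (m Matches) keyword(i int, src []byte) []byte {
	if i < 0 || i >= len(m) {
		return nil
	}
	return src[m[i].Lo:m[i].Hi]
}

func main() {
	src := []byte("Scrooge and Marley")
	matches := Matches{{Lo: 0, Hi: 7}, {Lo: 12, Hi: 18}}

	// Walk all matches by index.
	for i := range matches {
		fmt.Printf("%d: %s\n", i, matches.keyword(i, src))
	}

	// An out-of-range index yields nil instead of panicking.
	fmt.Println(matches.keyword(5, src) == nil) // prints true
}
```

Returning nil for a bad index lets callers probe positions without checking the length of the matches first.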

type Options

type Options struct {
	// NormalizeRune will be applied to both the input and the keywords before matching.
	// Defaults to nil.
	// Typically used for lower-casing and accent folding.
	NormalizeRune func(rune) rune

	// IsWordBoundary is used to determine word boundaries.
	// A default implementation is used if nil.
	IsWordBoundary func(rune) bool
}

Options for the mapper.
