finder

package
v0.1.4 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 3, 2020 License: MIT Imports: 7 Imported by: 2

README

Finder

godoc Go Report Card

Finder is a library that finds the best match against a list of strings, using an algorithm of choice.

Example

import "github.com/Dynom/TySug/finder"
referenceList := []string{"example", "amplifier", "ample"}
ts := finder.New(referenceList, finder.WithAlgorithm(finder.NewJaroWinklerDefault()))

alt, score, exact := ts.Find("exampel")
// alt   = example
// score = 0.9714285714285714
// exact = false (not an exact match in our reference list)

Algorithms

You're free to specify your own algorithm. By default Jaro Winkler is available, this gives you freedom around different input lengths (in contrast to Levenshtein).

Using a different algorithm

if you want to use a different algorithm, simply wrap your algorithm in a finder.Algorithm compatible type and pass it as argument to the Finder. You can find inspiration in the unit-tests / examples.

Possible considerations:

Sources:

Case-sensitivity

Finder does not normalise words. This means that words are treated in a case-sensitive matter. This is done mostly to avoid doing unnecessary work in the hot-path. Typically you'll want to make sure both your lists and your input uses the same casing.

Ordering

The reference list order is significant. The first of an equal score wins the election. So you'll want to put more common, popular, etc. words first in the list.

Documentation

Index

Constants

View Source
const (
	WorstScoreValue = -1 * math.MaxFloat32
	BestScoreValue  = math.MaxFloat32
)

These constants hold the value of the lowest and highest possible scores. Compatible with JSON serialization. It's not ideal to mix presentation with business logic but in this instance it was convenient and similarly effective as math.Inf(-1)

Variables

View Source
var (
	ErrNoAlgorithmDefined = errors.New("no algorithm defined")
)

Errors

Functions

This section is empty.

Types

type Algorithm

type Algorithm func(a, b string) float64

Algorithm the type to comply with to create your own algorithm Note that the return value must be greater than WorstScoreValue and less than BestScoreValue

func NewDamerauLevenshtein added in v0.1.3

func NewDamerauLevenshtein() Algorithm

NewDamerauLevenshtein returns the DamerauLevenshtein algorithm

func NewJaro added in v0.1.3

func NewJaro() Algorithm

NewJaro returns the default Jaro algorithm @see https://rosettacode.org/wiki/Jaro_distance#Go

func NewJaroWinkler added in v0.1.3

func NewJaroWinkler(boostThreshold float64, prefixLength int) Algorithm

NewJaroWinkler returns the JaroWinkler algorithm

func NewJaroWinklerDefaults added in v0.1.3

func NewJaroWinklerDefaults() Algorithm

NewJaroWinklerDefaults returns the Jaro Winkler algorithm with 0.7 boost threshold and a prefix length of 4

func NewWagnerFischer added in v0.1.3

func NewWagnerFischer(insert, delete, substitution int) Algorithm

NewWagnerFischer returns the NewWagnerFischer algorithm, sensible defaults are: i:1, d:3, s:1

type Finder

type Finder struct {
	Alg             Algorithm
	LengthTolerance float64 // A number between 0.0-1.0 (percentage) to allow for length miss-match, anything outside this is considered not similar. Set to 0 to disable.
	// contains filtered or unexported fields
}

Finder is the type to find the nearest reference

func New

func New(list []string, options ...Option) (*Finder, error)

New creates a new instance of Finder. The order of the list is significant

func (*Finder) Exact added in v0.1.3

func (t *Finder) Exact(input string) bool

Exact returns true if the input is an exact match.

func (*Finder) Find

func (t *Finder) Find(input string) (string, float64, bool)

Find returns the best alternative a score and if it was an exact match or not. Since algorithms can define their own upper-bound, there is no "best" value.

func (*Finder) FindCtx

func (t *Finder) FindCtx(ctx context.Context, input string) (string, float64, bool)

FindCtx is the same as Find, with context support.

func (*Finder) FindTopRankingCtx added in v0.0.3

func (t *Finder) FindTopRankingCtx(ctx context.Context, input string) ([]string, float64, bool)

FindTopRankingCtx returns a list (of at least one element) of references with the same "best" score

func (*Finder) FindTopRankingPrefixCtx added in v0.1.3

func (t *Finder) FindTopRankingPrefixCtx(ctx context.Context, input string, prefixLength uint) (list []string, exact bool, err error)

FindTopRankingPrefixCtx requires the references to have an exact prefix match on N characters of the input. prefixLength cannot exceed length of input

func (*Finder) GetMatchingPrefix added in v0.1.3

func (t *Finder) GetMatchingPrefix(ctx context.Context, prefix string, max uint) ([]string, error)

GetMatchingPrefix returns up to max ref's, that start with the prefix argument

func (*Finder) Refresh added in v0.1.3

func (t *Finder) Refresh(list []string)

Refresh replaces the internal reference list.

type Option

type Option func(sug *Finder)

Option is the type accepted by finder to set specific options

func WithAlgorithm

func WithAlgorithm(alg Algorithm) Option

WithAlgorithm allows you to set any algorithm

func WithLengthTolerance added in v0.0.2

func WithLengthTolerance(t float64) Option

WithLengthTolerance defines a percentage of length above we no longer consider a length difference a typo, but instead we consider it as "completely wrong". A value of 0.2 specifies a tolerance of at most ~20% difference in size, with a minimum of 1 character. A value of 0 (the default) disables this feature.

func WithPrefixBuckets added in v0.1.3

func WithPrefixBuckets(enable bool) Option

WithPrefixBuckets splits the reference list into buckets by their first letter. Assuming the first letter is correct this can significantly improve the performance by reducing the size of the list to scan through.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL