typeflow

package module
v0.0.0-...-f0f4659 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 31, 2017 License: MIT Imports: 5 Imported by: 0

README

typeflow-go GoDoc Build Status Coverage Status

Introduction

typeflow is a tiny package that provides a few tools around string-based searching needs. With typeflow you'll be able to search for sub-string matches and get string similarity information.

You can see it in action here: https://typeflow-web.herokuapp.com/ .

Quick start

Likely you'll want to use the WordSource type which provides the most high-level interface in this package.

Dependencies

This project currently only depends on:

So make sure you go get it :)

go get github.com/alediaferia/prefixmap
Plain Levenshtein distance computation

If you just need to compute the Levenshtein distance between 2 words this is what you need:

typeflow.LevenshteinDistance("alessandro", "alesasndro")

This will return

2
Querying for similar strings

This package can be used for querying against a source of strings. For this particular need WordSource has been designed specifically.

ws := NewWordSource()
ws.SetSource(myListOfStrings)
matches := ws.FindMatches("query", 0.4)
The similarity

0.4 represents the minimum similarity we are OK with. A value of 1.0 represents an exact match.

A straightforward example

Supposedly you have a list of words, say country names, and you have a partial string which may match one or more of them according to a certain similarity value that is suitable for you. This may be the case when, for example, providing a typeahead API for populating a dropdown of suggestions (see Google).

In the following example we will have a program that holds an hard-coded list of country names and accepts 2 args:

  • the substring to look for
  • the accepted minimum similarity for the matches
package main

import (
  . "github.com/typeflow/typeflow-go"
  "strings"
  "os"
  "fmt"
  "strconv"
  "path/filepath"
)

var country_list = []string{
"mexico",
"micronesia",
"moldova",
"monaco",
"mongolia",
"montenegro",
"morocco",
"mozambique",
"myanmar",
"namibia",
"nauru",
"nepal",
"netherlands",
"new zealand",
"nicaragua",
"niger",
"nigeria",
"norway",
}

func printHelpAndExit() {
  fmt.Printf("usage: %s substr similarity\n", filepath.Base(os.Args[0]))
  os.Exit(0)
}

func main() {
  args := os.Args[1:]
  
  if l := len(args); l != 2 {
    fmt.Printf("Unexpected number of arguments: got %d, expected 2\n", l)
    printHelpAndExit()
  }
  
  similarity, err := strconv.ParseFloat(args[1], 32)
  if err != nil {
    fmt.Printf("Please, specify similarity as a floating point number\n")
    printHelpAndExit()
  }
  
  substr := args[0]

  // let's setup our word source
  // we will use to search for matches
  ws := NewWordSource()
  
  ws.SetSource(country_list)
  
  matches, err := ws.FindMatch(substr, float32(similarity))
  if err != nil {
    panic(err)
  }
  
  if len(matches) > 0 {
    fmt.Println("Found the following matches:\n")
    for _, match := range matches {
      fmt.Printf("'%s', similarity: %f\n", match.String, match.Similarity)
    }
  } else {
    fmt.Printf("No match found for '%s'.\n", substr)
  }
}

Output:

$ go run <program name> nig 0.4
Found the following matches:

'niger', similarity: 0.600000
'nigeria', similarity: 0.428571

A note on similarity

The similarity between the given substring and the found match is computed using the following formula:

       levenshtein(match,substr)
1.0 - ---------------------------
         max(|match|,|substr|)

Docs

I tried and will keep trying my best to keep the sources well documented. You can help me improving the docs as well!

Docs can be found at: https://godoc.org/github.com/typeflow/typeflow-go

Contribute

I love contributions! Please create your own branch and push a merge request.

Feel free to open issues for anything 😄

License

I'm releasing this project with a MIT license included in this repository.

Copyright (c) Alessandro Diaferia alediaferia@gmail.com

Documentation

Overview

This package contains everything you need to work with word-based searching. Its internals are founded on the Levenshtein distance. Check https://github.com/typeflow/typeflow-go for an example.

Index

Constants

This section is empty.

Variables

View Source
var (
	// This error occurs when trying to rollback more than needed
	OutOfRangeRollbackError = errors.New("Unexpected rollback: out of range")
	EmptyStateError         = errors.New("Unexpected: current state is empty")
)

Errors

Functions

func LevenshteinDistance

func LevenshteinDistance(source, destination string) int

This implementation takes advantage of the 2-columns approach since doesn't expose any incremental update functionality You are encouraged to use this function when simply interested in the levenshtein distance between 2 words

Types

type LState

type LState struct {
	// contains filtered or unexported fields
}

An LState encapsulates the current state of a levenshtein-based distance computation. In particular, the matrix-based levenshtein computation is used. An LState instance can be updated or rolled back which is useful for iterative word computations. It will take care of managing the internal state and memory allocation for you.

func InitLState

func InitLState(target string) (ls *LState)

Initializes an empty LState. This is the first step required for working with a matrix-based levenshtein computation.

func (*LState) Distance

func (state *LState) Distance() int

Returns the newly computed distance or math.MaxInt32 if no distance has been computed yet. This method has complexity O(1) as the distance is computed upon every UpdateState call.

func (*LState) RollbackBy

func (state *LState) RollbackBy(chars int) error

Rolls back the current state. Specify the amount of characters to roll back for the source string through the chars param

func (*LState) UpdateState

func (state *LState) UpdateState(source []rune)

Updates the current state with param source.

Since LState can be rolled back, source can also represent a slice to be appended to the previous one passed through a previous call to UpdateState.

E.g. you may want to compare the words 'levenshtein' and 'einstein' in two steps after having initialized the state using InitLState('einstein')

1) the first UpdateState will be called passing in 'leven' as source.

2) the second UpdateState will be called passing in 'stein' as source.

You will then be able to call Distance() on the updated state and get 4 as result.

type Match

type Match struct {
	// The actual string which matched
	// the requested string
	String string `json:"string"`

	// The similarity value for the
	// current word.
	// Similarity is 1 when
	// the two words are equal.
	// See github.com/typeflow/typeflow-go for
	// more details on how the similarity is computed.
	Similarity float32 `json:"similarity"`
}

Represents a match found

type WordSource

type WordSource struct {
	// contains filtered or unexported fields
}

func NewWordSource

func NewWordSource() (ws *WordSource)

NewWordSource Initializes a new empty WordSource

func (*WordSource) FindMatch

func (ws *WordSource) FindMatch(substr string, minSimilarity float32) (matches []Match, err error)

FindMatch Finds a match among the current words in the source. minSimilarity is the minimum accepted similarity to use when filling the matches slice. Param substr is the string to match against.

func (*WordSource) SetSource

func (ws *WordSource) SetSource(strs []string)

SetSource sets the given strings slice as the current source after applying the given filters in order

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL