symspell

Version: v1.1.0 (not the latest version of its module)
Published: Feb 15, 2025 | License: AGPL-3.0 | Imports: 5 | Imported by: 0

README

SymSpell Package

Overview

The symspell package provides a Go implementation of SymSpell, a fast and memory-efficient algorithm for spelling correction, word segmentation, and fuzzy string matching. It supports both unigram and bigram dictionaries, so corrections can take neighboring words into account.

Features

  • Fast lookup for single-word corrections
  • Compound word corrections
  • Customizable edit distance and prefix length
  • Support for unigram and bigram dictionaries
  • Configurable thresholds for performance tuning

Installation

Install the package using go get:

go get github.com/snapp-incubator/go-symspell

Usage

  • Import the Package

import "github.com/snapp-incubator/go-symspell"

  • Initialize SymSpell

Simple Lookup
Load a unigram dictionary:
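package main

import "github.com/snapp-incubator/go-symspell"

func main() {
    // Arguments: dictionary path, term column index, count column index, then options.
    symSpell := symspell.NewSymSpellWithLoadDictionary("path/to/vocab.txt", 0, 1,
        symspell.WithCountThreshold(10),
        symspell.WithMaxDictionaryEditDistance(3),
        symspell.WithPrefixLength(5),
    )
    _ = symSpell // keeps the snippet compilable; see Perform Lookup for usage
}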
Compound Lookup
Load both unigram and bigram dictionaries:
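package main

import "github.com/snapp-incubator/go-symspell"

func main() {
    // The v1.1.0 signature listed under Types takes three paths: unigram, bigram,
    // and exact dictionary (exactDirPath). "path/to/exact.txt" is a placeholder.
    symSpell := symspell.NewSymSpellWithLoadBigramDictionary("path/to/vocab.txt", "path/to/vocab_bigram.txt", "path/to/exact.txt", 0, 1,
        symspell.WithCountThreshold(1),
        symspell.WithMaxDictionaryEditDistance(3),
        symspell.WithPrefixLength(7),
    )
    _ = symSpell // keeps the snippet compilable; see Perform Lookup for usage
}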
  • Perform Lookup

Single Word Lookup

suggestions, err := symSpell.Lookup("حیابان", symspell.Top, 3)
if err != nil {
    log.Fatal(err)
}
if len(suggestions) > 0 { // guard against an empty result before indexing
    fmt.Println(suggestions[0].Term) // Output: خیابان
}

Compound Word Lookup

suggestion := symSpell.LookupCompound("حیابان ملاصدزا", 3)
if suggestion != nil { // LookupCompound returns a pointer
    fmt.Println(suggestion.Term) // Output: خیابان ملاصدرا
}

Examples

Unit Tests

The repository includes comprehensive unit tests. Run the tests with:

go test ./...

Example test cases include single-word corrections, compound word corrections, and edge cases.

Configuration Options
  • WithMaxDictionaryEditDistance: Sets the maximum edit distance for corrections.
  • WithPrefixLength: Sets the prefix length for index optimization.
  • WithCountThreshold: Filters dictionary entries with low frequency.
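
The constructors accept a variadic opt parameter, which suggests the functional-options pattern. The sketch below illustrates that pattern in isolation; it is not the package's code. The config struct, the option type, the lower-case helper names, and the default values are all hypothetical, and the real definition of options.Options is not shown on this page.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the settings the README's options
// control. It is NOT the package's real type.
type config struct {
	maxEditDistance int
	prefixLength    int
	countThreshold  int
}

// option mutates a config; each helper below returns one.
type option func(*config)

func withMaxEditDistance(d int) option { return func(c *config) { c.maxEditDistance = d } }
func withPrefixLength(n int) option    { return func(c *config) { c.prefixLength = n } }
func withCountThreshold(n int) option  { return func(c *config) { c.countThreshold = n } }

// newConfig starts from defaults (arbitrary values chosen for this sketch)
// and applies the options in the order given, so later options win.
func newConfig(opts ...option) config {
	c := config{maxEditDistance: 2, prefixLength: 7, countThreshold: 1}
	for _, o := range opts {
		o(&c)
	}
	return c
}

func main() {
	c := newConfig(withMaxEditDistance(3), withPrefixLength(5))
	fmt.Printf("%+v\n", c)
}
```

Options that are not passed keep their defaults, which is why the README examples can set only the values they care about.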

Dictionaries

The dictionaries should be formatted as plain text files:

  • Unigram file: Each line should contain a term and its frequency, separated by a space (a custom separator can also be used).
  • Bigram file: Each line should contain two terms and their frequency, separated by a space.

Example:

Unigram (vocab.txt):

خیابان 1000
میدان 800

Bigram (vocab_bigram.txt):

خیابان کارگر 500
میدان آزادی 300
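
To make the unigram format concrete, here is a minimal sketch of how one line could be split into a term and a count using a separator and the termIndex and countIndex column positions. parseEntry is a hypothetical helper written for this illustration; it is not part of the package, and the package's own loader may handle whitespace, bigram lines, and malformed lines differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseEntry splits one dictionary line on sep and reads the term and its
// count from the given column indexes. Hypothetical helper for illustration.
func parseEntry(line, sep string, termIndex, countIndex int) (string, int64, error) {
	fields := strings.Split(strings.TrimSpace(line), sep)
	if termIndex < 0 || countIndex < 0 || termIndex >= len(fields) || countIndex >= len(fields) {
		return "", 0, fmt.Errorf("line %q has %d fields, need indexes %d and %d",
			line, len(fields), termIndex, countIndex)
	}
	count, err := strconv.ParseInt(fields[countIndex], 10, 64)
	if err != nil {
		return "", 0, fmt.Errorf("bad count in line %q: %w", line, err)
	}
	return fields[termIndex], count, nil
}

func main() {
	// Same layout as vocab.txt above: term in column 0, count in column 1.
	term, count, err := parseEntry("خیابان 1000", " ", 0, 1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(term, count)
}
```

With a custom separator, only the sep argument changes, for example parseEntry("میدان,800", ",", 0, 1).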
Performance

SymSpell is optimized for speed and memory efficiency. For large vocabularies, tune the maximum edit distance, prefix length, and count threshold (WithMaxDictionaryEditDistance, WithPrefixLength, WithCountThreshold) to balance performance and accuracy.
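
To see why these settings matter, recall that the SymSpell algorithm indexes delete-only variants of each dictionary term. The sketch below counts how many distinct variants a single term produces for a given maximum edit distance and prefix length. It illustrates the general symmetric-delete idea only; the deletes function is hypothetical, and this package's internal index may be built differently.

```go
package main

import "fmt"

// deletes returns every distinct string obtained by removing between 1 and
// maxDistance runes from term, after truncating term to prefixLength runes.
// Illustrative only; not taken from the package.
func deletes(term string, maxDistance, prefixLength int) map[string]struct{} {
	runes := []rune(term)
	if len(runes) > prefixLength {
		runes = runes[:prefixLength]
	}
	out := make(map[string]struct{})
	frontier := []string{string(runes)}
	for d := 0; d < maxDistance; d++ {
		var next []string
		for _, s := range frontier {
			r := []rune(s)
			for i := range r {
				// Drop the rune at position i.
				v := string(r[:i]) + string(r[i+1:])
				if _, seen := out[v]; !seen {
					out[v] = struct{}{}
					next = append(next, v)
				}
			}
		}
		frontier = next
	}
	return out
}

func main() {
	for d := 1; d <= 3; d++ {
		fmt.Printf("distance %d: %d delete variants\n", d, len(deletes("abcde", d, 7)))
	}
}
```

The count grows quickly with the edit distance, while a shorter prefix caps it, which is the trade-off between index size and accuracy that the tuning advice above refers to.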

Documentation

Types

type SymSpell (added in v1.0.3)

type SymSpell interface {
	Lookup(phrase string, verbosity verbosity.Verbosity, maxEditDistance int) ([]items.SuggestItem, error)
	LookupCompound(phrase string, maxEditDistance int) *items.SuggestItem
	LoadBigramDictionary(corpusPath string, termIndex, countIndex int, separator string) (bool, error)
	LoadDictionary(corpusPath string, termIndex int, countIndex int, separator string) (bool, error)
	LoadExactDictionary(corpusPath string, separator string) (bool, error)
}

func NewSymSpell

func NewSymSpell(opt ...options.Options) SymSpell

func NewSymSpellWithLoadBigramDictionary

func NewSymSpellWithLoadBigramDictionary(vocabDirPath, bigramDirPath, exactDirPath string, termIndex, countIndex int, opt ...options.Options) SymSpell

func NewSymSpellWithLoadDictionary

func NewSymSpellWithLoadDictionary(dirPath string, termIndex, countIndex int, opt ...options.Options) SymSpell

NewSymSpellWithLoadDictionary is used when only Lookup is needed.

Directories

Path Synopsis
pkg
