compare

Version: v0.0.0-...-711b54d
Warning

This package is not in the latest version of its module.

Published: Mar 19, 2023 License: MIT Imports: 4 Imported by: 0

README

Compare

A text comparison library written in Go.

You can use it to compare two texts. The comparison function returns a similarity rate in the range 0 to 1.

func ExampleCompareTexts() {
	t1 := "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed orci felis, placerat quis enim vitae, semper tempus erat."
	t2 := "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed orci felis, placerat quis enim vitae, semper tempus erat. Integer non enim pharetra, molestie nulla ut."
	result := CompareTexts(t1, t2)
	fmt.Println(result)
	// Output: 0.72
}

It can also match a text against a set of texts.

func ExampleTextMatcher() {
	matcher := NewTextMatcher()

	matcher.Feed("lorem_ipsum", `Lorem ipsum dolor sit amet, consectetur adipiscing elit.`)
	matcher.Feed("excepteur_sint", `Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.`)

	for _, match := range matcher.Match(`Lorem ipsum dolor sit amet.`) {
		fmt.Printf("Text name: %s, confidence: %.2f\n", match.TextName, match.Confidence)
	}

	// Output:
	// Text name: lorem_ipsum, confidence: 0.50
	// Text name: excepteur_sint, confidence: 0.00
}

The matching algorithm is based on a Markov chain model and reports the rate of sequential similarity between texts.
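To illustrate the general idea, here is a minimal sketch of a first-order Markov chain comparison. It assumes whitespace tokenization after lowercasing and scores two texts with a weighted Jaccard index over their transition counts. The names `transition`, `buildChain` and `chainSimilarity` are hypothetical, and the sketch is not expected to reproduce the package's own scores (such as the 0.72 above).

```go
package main

import (
	"fmt"
	"strings"
)

// transition is one edge of a first-order Markov chain: token `from`
// is directly followed by token `to`.
type transition struct {
	from, to string
}

// buildChain counts the token-to-token transitions in a text. Splitting on
// whitespace after lowercasing is an assumption made for brevity.
func buildChain(text string) map[transition]int {
	tokens := strings.Fields(strings.ToLower(text))
	chain := make(map[transition]int)
	for i := 0; i+1 < len(tokens); i++ {
		chain[transition{tokens[i], tokens[i+1]}]++
	}
	return chain
}

// chainSimilarity is a weighted Jaccard index over transition counts:
// shared counts (minimum) divided by combined counts (maximum). The result
// is in [0, 1]; this scoring rule is hypothetical.
func chainSimilarity(a, b map[transition]int) float64 {
	shared, total := 0, 0
	for t, ca := range a {
		cb := b[t]
		if ca < cb {
			shared += ca
			total += cb
		} else {
			shared += cb
			total += ca
		}
	}
	for t, cb := range b {
		if _, ok := a[t]; !ok {
			total += cb // transitions present only in b
		}
	}
	if total == 0 {
		return 0 // neither text has any transitions
	}
	return float64(shared) / float64(total)
}

func main() {
	a := buildChain("the cat sat on the mat")
	b := buildChain("the cat sat on the hat")
	fmt.Printf("%.2f\n", chainSimilarity(a, b)) // 4 of 6 transitions shared: 0.67
}
```

Because only adjacent token pairs are compared, texts that share words in a different order score low, which is what makes this kind of measure "sequential".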

The package grew out of an idea for scanning licenses in Go modules; I extracted this code because it was more complete than the other parts. This also explains why the tokenizer has some license-text specifics.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func CompareTexts

func CompareTexts(t1, t2 string) float64

CompareTexts returns the similarity rate of two texts, in the range 0 to 1.

Example
t1 := "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed orci felis, placerat quis enim vitae, semper tempus erat."
t2 := "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed orci felis, placerat quis enim vitae, semper tempus erat. Integer non enim pharetra, molestie nulla ut."
result := CompareTexts(t1, t2)
fmt.Println(result)
Output:

0.72

func Tokenize

func Tokenize(text string) []string

Tokenize cleans up the text by applying the substitutions described in the SPDX matching guidelines (https://spdx.dev/license-list/matching-guidelines/) and splits it into tokens on spaces.
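As a rough illustration of this kind of normalization, the sketch below lowercases the text and applies a handful of equivalences that the SPDX guidelines describe (case-insensitivity, copyright symbols, dashes, quotation marks, British versus American spelling). The substitution list and the name `tokenizeSketch` are hypothetical and cover only a small subset; they are not the package's actual table.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizer applies a small, hypothetical subset of SPDX-style
// equivalences; it is not the package's actual substitution table.
var normalizer = strings.NewReplacer(
	"\u00a9", "copyright", // the copyright sign
	"(c)", "copyright",
	"\u2013", "-", // en dash
	"\u2014", "-", // em dash
	"\u201c", `"`, // left double quotation mark
	"\u201d", `"`, // right double quotation mark
	"licence", "license", // British vs. American spelling
)

// tokenizeSketch lowercases the text (SPDX matching is case-insensitive),
// applies the replacements above and splits the result on whitespace.
func tokenizeSketch(text string) []string {
	return strings.Fields(normalizer.Replace(strings.ToLower(text)))
}

func main() {
	fmt.Printf("%q\n", tokenizeSketch("Copyright (c) 2023.  This Licence is MIT."))
	// ["copyright" "copyright" "2023." "this" "license" "is" "mit."]
}
```

Lowercasing before replacing means only lowercase forms such as "(c)" and "licence" need to appear in the table.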

Types

type Match

type Match struct {
	// TextName is the name of the text the match relates to.
	TextName string
	// Confidence is the similarity of the texts;
	// for the Markov chain matcher it is between 0 and 1.
	Confidence float64
}

Match describes the matching output.

type Text

type Text struct {
	Name    string
	Content string
}

Text is a structure that represents a text to be compared.

type TextMatcher

type TextMatcher struct {
	// contains filtered or unexported fields
}

TextMatcher is a matcher implementation that uses Markov chains for comparison.

Example
matcher := NewTextMatcher()

matcher.Feed("lorem_ipsum", `Lorem ipsum dolor sit amet, consectetur adipiscing elit.`)
matcher.Feed("excepteur_sint", `Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.`)

for _, match := range matcher.Match(`Lorem ipsum dolor sit amet.`) {
	fmt.Printf("Text name: %s, confidence: %.2f\n", match.TextName, match.Confidence)
}
Output:

Text name: lorem_ipsum, confidence: 0.50
Text name: excepteur_sint, confidence: 0.00

func NewTextMatcher

func NewTextMatcher(texts ...Text) *TextMatcher

NewTextMatcher creates an instance of the Markov matcher and preprocesses the given texts so they are ready for comparison.

func (*TextMatcher) Feed

func (mm *TextMatcher) Feed(name, text string)

Feed records a text to be compared with other texts. Names may be duplicated.

func (*TextMatcher) Match

func (mm *TextMatcher) Match(text string) []Match

Match compares text with the texts stored in the matcher, whether passed at creation or added with Feed. The result contains one match for each stored text.
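The create, Feed and Match workflow can be sketched as follows. This is a hypothetical stand-in for TextMatcher, not its implementation: `matcherSketch`, `newMatcherSketch` and `transitions` are illustrative names, and confidence here is simply the share of the query's adjacent token pairs found in each stored text. Texts are preprocessed once when stored, names are kept in a slice so duplicates are allowed, and Match returns one result per stored text in feed order.

```go
package main

import (
	"fmt"
	"strings"
)

// Text and Match mirror the documented types; matcherSketch itself is a
// hypothetical stand-in for TextMatcher, not its real implementation.
type Text struct {
	Name    string
	Content string
}

type Match struct {
	TextName   string
	Confidence float64
}

// transitions returns the set of adjacent token pairs in a text, after
// lowercasing and trimming surrounding punctuation from each token.
func transitions(text string) map[[2]string]bool {
	var tokens []string
	for _, f := range strings.Fields(strings.ToLower(text)) {
		if t := strings.Trim(f, `.,;:!?"'`); t != "" {
			tokens = append(tokens, t)
		}
	}
	set := make(map[[2]string]bool)
	for i := 0; i+1 < len(tokens); i++ {
		set[[2]string{tokens[i], tokens[i+1]}] = true
	}
	return set
}

// matcherSketch keeps names and preprocessed transition sets in parallel
// slices, so the same name may be fed more than once.
type matcherSketch struct {
	names  []string
	chains []map[[2]string]bool
}

func newMatcherSketch(texts ...Text) *matcherSketch {
	m := &matcherSketch{}
	for _, t := range texts {
		m.Feed(t.Name, t.Content)
	}
	return m
}

// Feed preprocesses a text once so later Match calls only compare sets.
func (m *matcherSketch) Feed(name, text string) {
	m.names = append(m.names, name)
	m.chains = append(m.chains, transitions(text))
}

// Match returns one result per stored text, in the order they were fed.
// Confidence is the share of the query's transitions found in that text.
func (m *matcherSketch) Match(text string) []Match {
	query := transitions(text)
	result := make([]Match, 0, len(m.chains))
	for i, chain := range m.chains {
		found := 0
		for t := range query {
			if chain[t] {
				found++
			}
		}
		confidence := 0.0
		if len(query) > 0 {
			confidence = float64(found) / float64(len(query))
		}
		result = append(result, Match{TextName: m.names[i], Confidence: confidence})
	}
	return result
}

func main() {
	m := newMatcherSketch(Text{Name: "lorem_ipsum", Content: "Lorem ipsum dolor sit amet, consectetur adipiscing elit."})
	m.Feed("excepteur_sint", "Excepteur sint occaecat cupidatat non proident.")
	for _, match := range m.Match("Lorem ipsum dolor sit amet.") {
		fmt.Printf("Text name: %s, confidence: %.2f\n", match.TextName, match.Confidence)
	}
}
```

Since the scoring rule is an assumption, the confidences printed here are not expected to match those in the TextMatcher example.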

Directories

Package markov provides features for measuring the similarity of sequences.
