Documentation
Overview
Package preset provides a collection of pre-built string similarity scoring functions, generated by the higher-order functions in the parent package, factory.
Index
Constants
This section is empty.
Variables
var PlainSimilarityScore = factory.PrependStringSanitizerForSimilarityScore(
	transform.LatinExtendedSanitize,
	SimpleSimilarityScore,
)
PlainSimilarityScore computes the similarity score between two input strings in the same way as SimpleSimilarityScore, except that each input string is first sanitized with transform.LatinExtendedSanitize before the two are compared.
var SimpleSimilarityScore = editdist.MakeStringSimilarityFunction(
	editdist.MakeOptimalAlignmentDistFunction(editdist.UnitPenalty, editdist.UnitPenalty),
)
SimpleSimilarityScore computes the similarity score between two input strings. The strings are compared directly, without any preprocessing, under the optimal alignment distance metric with unit substitution and transposition penalties, and the resulting distance is renormalized to a similarity score between 0 and 1 (inclusive).
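As a rough illustration, the sketch below computes an optimal string alignment (restricted Damerau-Levenshtein) distance over runes and renormalizes it to a score. It is not the editdist implementation: the penaltyFunc signature, unitPenalty, the fixed insertion/deletion cost of 1, and dividing by the longer input's rune count are all assumptions.

```go
package main

import "fmt"

// penaltyFunc is a hypothetical signature: the cost of substituting or
// transposing the runes x and y.
type penaltyFunc func(x, y rune) float64

// unitPenalty charges a flat cost of 1 for any edit (assumed behaviour).
func unitPenalty(x, y rune) float64 { return 1 }

// minOf returns the smallest of its arguments.
func minOf(first float64, rest ...float64) float64 {
	m := first
	for _, v := range rest {
		if v < m {
			m = v
		}
	}
	return m
}

// osaDistance computes the optimal string alignment distance between a and b,
// working on runes so that each Thai or accented character counts as one
// symbol. Insertions and deletions are assumed to cost 1.
func osaDistance(a, b string, subst, trans penaltyFunc) float64 {
	s, t := []rune(a), []rune(b)
	n, m := len(s), len(t)
	d := make([][]float64, n+1)
	for i := range d {
		d[i] = make([]float64, m+1)
		d[i][0] = float64(i) // delete i runes
	}
	for j := 0; j <= m; j++ {
		d[0][j] = float64(j) // insert j runes
	}
	for i := 1; i <= n; i++ {
		for j := 1; j <= m; j++ {
			cost := 0.0
			if s[i-1] != t[j-1] {
				cost = subst(s[i-1], t[j-1])
			}
			d[i][j] = minOf(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1]+cost)
			// Adjacent transposition: "...xy" aligned with "...yx".
			if i > 1 && j > 1 && s[i-1] == t[j-2] && s[i-2] == t[j-1] {
				d[i][j] = minOf(d[i][j], d[i-2][j-2]+trans(s[i-2], s[i-1]))
			}
		}
	}
	return d[n][m]
}

// similarity maps the distance into [0, 1]; dividing by the longer input's
// rune count is an assumed normalization, not necessarily editdist's.
func similarity(a, b string) float64 {
	longest := len([]rune(a))
	if lb := len([]rune(b)); lb > longest {
		longest = lb
	}
	if longest == 0 {
		return 1 // two empty strings are identical
	}
	return 1 - osaDistance(a, b, unitPenalty, unitPenalty)/float64(longest)
}

func main() {
	fmt.Println(osaDistance("abcd", "abdc", unitPenalty, unitPenalty)) // 1
	fmt.Println(similarity("abcd", "abdc"))                            // 0.75
}
```

With unit costs the distance never exceeds the longer input's length (substitute the overlap, insert the rest), so this normalization keeps the score within [0, 1].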
var ThaiNameSimilarityScore = factory.PrependStringSanitizerForSimilarityScore(
	sanitaryThai.Sanitize,
	factory.MaxFromCandidatesProduct(
		nametitle.GenerateNamesWithoutTitles,
		editdist.MakeStringSimilarityFunction(
			editdist.MakeOptimalAlignmentDistFunction(editdistThai.SubstPenalty, editdistThai.TransPenalty),
		),
	),
)
ThaiNameSimilarityScore computes the similarity score between two input strings as follows:
- Each input string is sanitized with the sanitaryThai.Sanitize function (e.g. removing diacritics from Latin script, removing repeated Thai tone marks, etc.).
- Each sanitized string is used to generate candidate bare names by attempting to remove English and Thai titles such as Mrs. or dek-chai, and the highest score over all pairs of candidates is kept.
- The optimal alignment distance uses the Thai-specific penalty functions editdistThai.SubstPenalty and editdistThai.TransPenalty for substitution and transposition, instead of the unit penalties used by SimpleSimilarityScore.
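The candidate step can be sketched as follows, assuming that factory.MaxFromCandidatesProduct keeps the best score over every pairing of the two inputs' candidates. candidatesWithoutTitles, its three-entry title list, maxFromCandidatesProduct, and the exactMatch score are all hypothetical simplifications; in ThaiNameSimilarityScore an edit-distance similarity function occupies the score position instead.

```go
package main

import (
	"fmt"
	"strings"
)

// titles is a hypothetical, deliberately tiny title list; the real
// nametitle package's titles and matching rules are not reproduced here.
var titles = []string{"Mrs. ", "Mr. ", "เด็กชาย"}

// candidatesWithoutTitles returns the input plus each variant with one
// leading title removed (a hypothetical stand-in for
// nametitle.GenerateNamesWithoutTitles).
func candidatesWithoutTitles(s string) []string {
	out := []string{s}
	for _, t := range titles {
		if strings.HasPrefix(s, t) {
			out = append(out, strings.TrimSpace(strings.TrimPrefix(s, t)))
		}
	}
	return out
}

// maxFromCandidatesProduct scores every pair in the Cartesian product of the
// two candidate lists and keeps the best score (the assumed meaning of
// factory.MaxFromCandidatesProduct).
func maxFromCandidatesProduct(
	gen func(string) []string,
	score func(a, b string) float64,
) func(a, b string) float64 {
	return func(a, b string) float64 {
		best := 0.0 // scores are assumed to lie in [0, 1]
		ys := gen(b)
		for _, x := range gen(a) {
			for _, y := range ys {
				if s := score(x, y); s > best {
					best = s
				}
			}
		}
		return best
	}
}

// exactMatch is a toy score used only to keep the example short.
func exactMatch(a, b string) float64 {
	if a == b {
		return 1
	}
	return 0
}

func main() {
	score := maxFromCandidatesProduct(candidatesWithoutTitles, exactMatch)
	fmt.Println(exactMatch("Mrs. Jane", "Jane")) // 0
	fmt.Println(score("Mrs. Jane", "Jane"))      // 1
	fmt.Println(score(titles[2]+"สมชาย", "สมชาย"))
}
```

Keeping the original string among the candidates means a name that merely looks like it starts with a title is still compared as written.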
Functions
This section is empty.
Types
This section is empty.