turkishstemmer

v1.0.0 (latest) | Published: Mar 6, 2024 | License: GPL-3.0 | Imports: 11 | Imported by: 0

Repository

github.com/cengizhancaliskan/turkishstemmer

README ¶

Turkish Stemmer for Golang

Stemmer algorithm for Turkish language.

Note: This is a Go rewrite of the elasticsearch-analysis-turkishstemmer project, and most of the documentation is taken from the original. It is also inspired by the Python version, turkish-stemmer-python.

Install

go get github.com/cengizhancaliskan/turkishstemmer

Usage

package main

import (
	"fmt"
	
	"github.com/cengizhancaliskan/turkishstemmer"
)

func main() {
	// Build a stemmer with the default word lists.
	stemmer := turkishstemmer.New()

	fmt.Println(stemmer.Stem("kalem"))
	fmt.Println(stemmer.Stem("doktorum"))
}

Introduction to Turkish language morphology

Turkish is an agglutinative language with a very rich morphological structure. In Turkish, you can form many different words from a single stem by appending a sequence of suffixes. For example, the word "doktoruymuşsunuz" means "You had been the doctor of him". The stem of the word is "doktor", and it takes three different suffixes: -sU, -ymUş, and -sUnUz.

From "Snowball Description":

Words are usually composed of a stem and of at least two or three affixes appended to it.

We can analyze noun suffixes in Turkish in two groups: noun suffixes (e.g. "doktor-um", meaning "my doctor") and nominal verb suffixes (e.g. "doktor-dur", meaning "is a doctor"). Words ending with nominal verb suffixes can be used as verbs in sentences. There are over thirty different suffixes classified in these two general groups.

In Turkish, the suffixes are affixed to the stem according to definite ordering rules.

From "An affix stripping morphological analyzer for Turkish" paper:

Turkish has a special place within the natural languages not only being a fully concatenative language but also having the suffixes as the only affix type. Another feature of the language is that, someone who knows Turkish can easily analyze a word even if he/she does not know its stem.

The phonological rules of Turkish are significant factors that influence this feature. Ex: (any word)lerim => (any word)-ler-im "ler" plural suffix, "im" 1st singular person possessive.

Rules

The only affix type in Turkish is the suffix.
A plural suffix cannot follow a possessive suffix.
A suffix in Turkish can have multiple allomorphs in order to provide sound harmony in the word to which it is affixed.
In Turkish each vowel indicates a distinct syllable.
In Turkish, single-syllable words are mostly the stem itself.
If a word has nominal verb suffixes, they always appear at the end of the word. They follow the noun suffixes, or the stem itself in the absence of noun suffixes.
In Turkish, “-lAr” suffix can be used both as a nominal verb suffix (third person plural present tense) and as a noun suffix (plural inflection).
In Turkish, words do not end with the consonants 'b', 'c', 'd', and 'ğ'. However, when a suffix starting with a vowel is affixed to a word ending with 'p', 'ç', 't' or 'k', the last consonant is transformed into 'b', 'c', 'd', or 'ğ' respectively. The postlude routine transforms the last consonants 'b', 'c', 'd', or 'ğ' back to 'p', 'ç', 't' or 'k', respectively, after stemming is complete.
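
The postlude step in the last rule can be sketched as follows. This is an illustrative, self-contained sketch rather than the package's own code: the helper name restoreLastConsonant is hypothetical, the mapping simply mirrors the rule stated above, and the exception lists that a real stemmer needs are ignored.

```go
package main

import "fmt"

// lastConsonantRules maps a softened final consonant back to its hard form.
var lastConsonantRules = map[rune]rune{'b': 'p', 'c': 'ç', 'd': 't', 'ğ': 'k'}

// restoreLastConsonant (hypothetical helper) rewrites a final b, c, d or ğ
// to p, ç, t or k. Words ending in any other letter are returned unchanged.
func restoreLastConsonant(stem string) string {
	runes := []rune(stem) // work on runes: ç and ğ are multi-byte in UTF-8
	if len(runes) == 0 {
		return stem
	}
	if hard, ok := lastConsonantRules[runes[len(runes)-1]]; ok {
		runes[len(runes)-1] = hard
	}
	return string(runes)
}

func main() {
	fmt.Println(restoreLastConsonant("kitab")) // kitap
	fmt.Println(restoreLastConsonant("çocuğ")) // çocuk
	fmt.Println(restoreLastConsonant("kale"))  // kale
}
```

Working on a rune slice instead of bytes matters here, because indexing the raw string would split 'ğ' and 'ç' in the middle of their UTF-8 encoding.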

Suffix Classes

Class	Type
Nominal verb suffixes	Inflectional
Derivational suffixes	Derivational
Noun suffixes	Inflectional
Tense & person verb suffixes	Inflectional
Verb suffixes	Inflectional

Suffix allomorphs

Suffix allomorphs are used to create a good sound harmony. They do not change the meaning of the word. If a suffix has a capital letter, then it has an allomorph. If a suffix has a letter in parentheses, then that letter can be omitted. Possible allomorphs are given below:

Letter	Allomorph
U	ı,i,u,ü
C	c,ç
A	a,e
D	d,t
I	ı,i
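
To make the notation concrete, the sketch below expands a suffix template into its surface forms. It is a hypothetical illustration, not part of the package: expandSuffix and the allomorphs map are names introduced here, and the I entry is taken as ı/i, matching the casına/cesine forms of -cAsInA. Each placeholder is substituted independently, so the result over-generates for templates with several placeholders; vowel harmony would rule out most of those combinations.

```go
package main

import "fmt"

// allomorphs lists the concrete letters each capital placeholder stands for.
var allomorphs = map[rune][]rune{
	'U': []rune("ıiuü"),
	'C': []rune("cç"),
	'A': []rune("ae"),
	'D': []rune("dt"),
	'I': []rune("ıi"),
}

// expandSuffix (hypothetical helper) returns every string obtained by
// replacing each placeholder in template with each of its allomorphs.
func expandSuffix(template string) []string {
	forms := []string{""}
	for _, r := range template {
		choices, ok := allomorphs[r]
		if !ok {
			choices = []rune{r} // ordinary letter: keep as is
		}
		next := make([]string, 0, len(forms)*len(choices))
		for _, prefix := range forms {
			for _, c := range choices {
				next = append(next, prefix+string(c))
			}
		}
		forms = next
	}
	return forms
}

func main() {
	fmt.Println(expandSuffix("lAr")) // [lar ler]
	fmt.Println(expandSuffix("DA"))  // [da de ta te]
}
```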

Nominal Verb Suffixes

No.	Suffix
1	–(y)Um
2	–sUn
3	–(y)Uz
4	–sUnUz
5	–lAr
6	–m
7	–n
8	–k
9	–nUz
10	–DUr
11	–cAsInA
12	–(y)DU
13	–(y)sA
14	–(y)mUş
15	–(y)ken

Suffix transition ordering for nominal verbs can be seen in References[5]

Noun Suffixes

No.	Suffix
1	–lAr
2	–(U)m
3	–(U)mUz
4	–(U)n
5	–(U)nUz
6	–(s)U
7	–lArI
8	–(y)U
9	–nU
10	–(n)Un
11	–(y)A
12	–nA
13	–DA
14	–nDA
15	–DAn
16	–nDAn
17	–(y)lA
18	–ki
19	–(n)cA

Suffix transition ordering for nouns can be seen in References[5]

Derivational Suffixes

No.	Suffix
1	–lUk
2	–CU
3	–CUk
4	–lAş
5	–lA
6	–lAn
7	–CA
8	–lU
9	–sUz

Initially, we will handle only a small subset of the above suffixes which are more common in our domain.

Vowel Harmony

This routine checks whether the last two vowels of the word obey vowel harmony rules. A brief description of Turkish vowel harmony follows.

Turkish vowel harmony is a two dimensional vowel harmony system, where vowels are characterised by two features named frontness and roundness. There are vowel harmony rules for each feature.

Vowel harmony rule for frontness: Vowels in Turkish are grouped into two according to where they are produced. Front vowels are formed at the front of the mouth ('e', 'i', 'ö', 'ü') and back vowels are produced nearer to the throat ('a', 'ı', 'o', 'u'). According to the vowel harmony rule, words cannot contain both front and back vowels. This is one of the reasons why suffixes containing vowels can take different forms to obey vowel harmony.
Vowel harmony rule for roundness: Vowels in Turkish are grouped into two according to whether the lips are rounded while producing them. 'o', 'ö', 'u' and 'ü' are rounded vowels, whereas 'a', 'e', 'ı' and 'i' are unrounded. According to the vowel harmony rules, if the vowel of a syllable is unrounded, the following vowel is unrounded as well. If the vowel of a syllable is rounded, the following vowels are 'a', 'e', 'u' or 'ü'.
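
The two rules above can be combined into a check on the last two vowels of a word. The following is a minimal sketch under stated assumptions, not the package's implementation: the function names are hypothetical, input is assumed to be lowercase, and exception lists (loanwords and the like) are not consulted. The vowel sets are the ones given in the rules.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	vowels                 = "üiıueöao"
	frontVowels            = "eiöü"
	backVowels             = "ıuao"
	roundedVowels          = "oöuü"
	unroundedVowels        = "iıea"
	followingRoundedVowels = "aeuü" // vowels allowed after a rounded vowel
)

// frontness reports whether v and c are both front or both back vowels.
func frontness(v, c rune) bool {
	return (strings.ContainsRune(frontVowels, v) && strings.ContainsRune(frontVowels, c)) ||
		(strings.ContainsRune(backVowels, v) && strings.ContainsRune(backVowels, c))
}

// roundness reports whether c may follow v under the roundness rule.
func roundness(v, c rune) bool {
	return (strings.ContainsRune(unroundedVowels, v) && strings.ContainsRune(unroundedVowels, c)) ||
		(strings.ContainsRune(roundedVowels, v) && strings.ContainsRune(followingRoundedVowels, c))
}

// lastTwoVowelsHarmonize (hypothetical helper) checks the last two vowels
// of word. Words with fewer than two vowels pass trivially.
func lastTwoVowelsHarmonize(word string) bool {
	var vs []rune
	for _, r := range word {
		if strings.ContainsRune(vowels, r) {
			vs = append(vs, r)
		}
	}
	if len(vs) < 2 {
		return true
	}
	v, c := vs[len(vs)-2], vs[len(vs)-1]
	return frontness(v, c) && roundness(v, c)
}

func main() {
	fmt.Println(lastTwoVowelsHarmonize("çocuk"))   // true
	fmt.Println(lastTwoVowelsHarmonize("kediler")) // true
	fmt.Println(lastTwoVowelsHarmonize("kitap"))   // false
}
```

Loanwords such as "kitap" fail the check even though they are valid words, which is one reason a stemmer needs a list of vowel harmony exceptions.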

Last consonant

Another interesting case in detecting suffixes in Turkish is that, for some suffixes, if the word ends with a vowel, a consonant is inserted between the rest of the word and the suffix. These merging consonants can be 'y', 'n' or 's'. When a merging consonant can be inserted before the suffix, the representation of the suffix starts with the optional consonant surrounded by parentheses (e.g. –(y)Um, –(n)cA). For these kinds of suffixes, when a merging consonant is present, the candidate stem is checked to confirm that it ends with a vowel.

If there is no 'y' consonant before the suffix, only the real part of the suffix (e.g. -Um) is marked for stemming. If there is a 'y' consonant and it is preceded by a vowel, 'y' is treated as a merging consonant and both 'y' and the candidate suffix (e.g. -Um) are marked for stemming. If there is a consonant just before 'y', the decision is that the consonant 'y' and the candidate suffix are really part of the stem. In such a case, the cursor is not advanced, to prevent over-stemming. The last case can occur especially when the stem originates from another language, as in 'lityum' (meaning the element lithium). If the check for vowel harmony were not made, the word would be stemmed to 'lit', because '–(y)Um' would be treated as a suffix affixed to it. But according to the morphological rules of Turkish, the final word would be 'litim', not 'lityum', if 'lit' were really the stem of the word and the suffix '–(y)Um' were affixed to it. So detecting 'lit' as the stem of the word would be over-stemming.
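
The three cases for the merging consonant 'y' can be sketched for the single suffix –(y)Um. This is an assumption-laden illustration rather than the package's logic: stripYUm is a hypothetical helper, input is assumed to be lowercase, and no vowel harmony or exception checks are performed.

```go
package main

import (
	"fmt"
	"strings"
)

const vowels = "üiıueöao"

// stripYUm (hypothetical helper) removes the -(y)Um suffix from word,
// following the three cases described above.
func stripYUm(word string) string {
	for _, suffix := range []string{"ım", "im", "um", "üm"} {
		if !strings.HasSuffix(word, suffix) {
			continue
		}
		rest := []rune(strings.TrimSuffix(word, suffix))
		n := len(rest)
		if n == 0 {
			return word // the whole word is the suffix: leave it alone
		}
		if rest[n-1] != 'y' {
			return string(rest) // case 1: no 'y', strip only -Um
		}
		if n >= 2 && strings.ContainsRune(vowels, rest[n-2]) {
			return string(rest[:n-1]) // case 2: vowel + 'y', strip -yUm
		}
		return word // case 3: consonant + 'y', treat as part of the stem
	}
	return word
}

func main() {
	fmt.Println(stripYUm("doktorum")) // doktor
	fmt.Println(stripYUm("arabayım")) // araba
	fmt.Println(stripYUm("lityum"))   // lityum
}
```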

Merging Vowel

Similar to merging consonants, there are merging vowels for some suffixes that start with a consonant. Such suffixes, like '-(U)mUz', can be preceded by a merging vowel when they are affixed to a stem ending with a consonant. In such a case, a U vowel ('ı', 'i', 'u' or 'ü', depending on vowel harmony) is inserted between the stem and the real suffix (e.g. '-mUz') for ease of pronunciation.

Some examples

Word / Analysis	Meaning / Stem
Kalelerimizdekilerden	From the ones at one of our castles
Kale-lAr-UmUz-DA-ki-lAr-DAn	Kale
Çocuğuymuşumcasına	As if I were her child
Çocuk-(s)U-(y)mUş-(y)Um-cAsInA	Çocuk
Kedileriyle	With their cats
Kedi-lAr-(s)U-(y)lA	Kedi
Çocuklarımmış	Someone told me that they were my children
Çocuk-lAr-(U)m-(y)mUş	Çocuk
Kitabımızdı	It was our book
Kitap-UmUz-(y)DU	Kitap

Future Work

Add more verb suffixes.
Add more derivational suffixes.

References

Contributing

Fork it ( https://github.com/<my-github-username>/turkishstemmer/fork )
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create new Pull Request

License

turkishstemmer is licensed under the Apache Software License, Version 2.0.

Documentation ¶

Index ¶

Constants
Variables
func Contains[T comparable](s []T, e T) bool
func CountSyllables(word string) int
func GetType(i interface{}) string
func GetVowels(word string) []rune
func HasFrontness(vowel, candidate string) bool
func HasRoundness(vowel, candidate string) bool
func HasVowelHarmony(word string) bool
func IsTurkishWord(word string) bool
func LoadWords(path string) []string
func LoadWordsFromSliceBytes(data []byte) []string
func ReplaceStringAtIndex(text string, r rune, i int) string
func ValidateOptionalLetter(word string, candidate *rune) bool
func VowelHarmony(vowel, candidate string) bool
type BaseState
- func (s BaseState) AddTransitions(word string, transitions *Transitions, startState State)
- func (s BaseState) FinalState() bool
- func (s BaseState) InitialState() bool
- func (s BaseState) Suffixes() []Suffix
type DerivationalState
- func NewDerivationalState(initialState, finalState bool, suffixes []Suffix) DerivationalState
- func (s DerivationalState) NextState(suffix string) State
type NominalVerbState
- func NewNominalVerbState(initialState, finalState bool, suffixes []Suffix) NominalVerbState
- func (s NominalVerbState) GetValues() map[string]NominalVerbState
- func (s NominalVerbState) NextState(suffix string) State
type NounState
- func NewNounState(initialState, finalState bool, suffixes []Suffix) NounState
- func (s NounState) GetValues() map[string]NounState
- func (s NounState) NextState(suffix string) State
type State
- func GetInitialNominalVerbState() State
type Stemmer
- func New() Stemmer
- func (s Stemmer) Stem(word string, tryCount ...int) string
type Stems
type Suffix
- func NewSuffix(name, pattern, optionalLetter string, checkHarmony bool) Suffix
- func (s Suffix) GetOptionalLetter(word string) *rune
- func (s Suffix) Match(word string) bool
- func (s Suffix) RemoveSuffix(word string) string
- func (s Suffix) String() string
type Transition
- func NewTransition(startState, nextState State, word string, suffix Suffix) *Transition
- func (t Transition) SimilarTransitions(transitions Transitions) *Transitions
- func (t Transition) String() string
type Transitions

Constants ¶

const (
	AverageStemmerCount = 4
	MinSyllableCount    = 2
	// Alphabet Turkish alphabet. It is used for skipping non-Turkish words.
	Alphabet = "abcçdefgğhıijklmnoöprsştuüvyz"
	// Vowels Turkish vowels.
	Vowels = "üiıueöao"
	// Consonants Turkish consonants.
	Consonants = "bcçdfgğhjklmnprsştvyz"
	// RoundedVowels Rounded vowels which are used for checking roundness harmony.
	RoundedVowels = "oöuü"
	// FollowingRoundedVowels Vowels that follow rounded vowels.
	// They are combined with RoundedVowels to check roundness harmony.
	FollowingRoundedVowels = "aeuü"
	// UnroundedVowels The unrounded vowels which are used for checking roundness harmony.
	UnroundedVowels = "iıea"
	// FrontVowels Front vowels which are used for checking frontness harmony.
	FrontVowels = "eiöü"
	// BackVowels Back vowels which are used for checking frontness harmony.
	BackVowels = "ıuao"
)

Variables ¶

var (
	//go:embed data/protected_words.txt
	// DefaultProtectedWordsFile The path of the file that contains the default set of protected words.
	DefaultProtectedWordsFile []byte

	//go:embed data/vowel_harmony_exceptions.txt
	// DefaultVowelHarmonyExceptionsFile The path of the file that contains the default set of vowel harmony exceptions.
	DefaultVowelHarmonyExceptionsFile []byte

	//go:embed data/last_consonant_exceptions.txt
	// DefaultLastConsonantExceptionsFile The path of the file that contains the default set of last consonant exceptions.
	DefaultLastConsonantExceptionsFile []byte

	//go:embed data/average_stem_size_exceptions.txt
	// DefaultAverageStemSizeExceptionsFile The path of the file that contains
	// the default set of average stem size exceptions.
	DefaultAverageStemSizeExceptionsFile []byte

	// LastConsonantRules Last consonant rules
	LastConsonantRules = map[string]string{"b": "p", "c": "ç", "d": "t", "ğ": "k"}
)

var (
	DerivationalStateA = NewDerivationalState(true, false, DerivationalSuffixValues)
	DerivationalStateB = NewDerivationalState(false, true, nil)
)

var (
	NominalVerbStateA = NewNominalVerbState(true, false, NominalVerbSuffixValues)
	NominalVerbStateB = NewNominalVerbState(false, true, []Suffix{NominalVerbSuffix14})
	NominalVerbStateC = NewNominalVerbState(false, true, []Suffix{NominalVerbSuffix10, NominalVerbSuffix12, NominalVerbSuffix13, NominalVerbSuffix14}) //nolint:lll
	NominalVerbStateD = NewNominalVerbState(false, false, []Suffix{NominalVerbSuffix12, NominalVerbSuffix13})
	NominalVerbStateE = NewNominalVerbState(false, true, []Suffix{NominalVerbSuffix1, NominalVerbSuffix2, NominalVerbSuffix3, NominalVerbSuffix4, NominalVerbSuffix5, NominalVerbSuffix14}) //nolint:lll
	NominalVerbStateF = NewNominalVerbState(false, true, nil)
	NominalVerbStateG = NewNominalVerbState(false, false, []Suffix{NominalVerbSuffix14})
	NominalVerbStateH = NewNominalVerbState(false, false, []Suffix{NominalVerbSuffix1, NominalVerbSuffix2, NominalVerbSuffix3, NominalVerbSuffix4, NominalVerbSuffix5, NominalVerbSuffix14}) //nolint:lll

	NominalVerbTFValues = map[string]NominalVerbState{
		NominalVerbSuffix1.Name:  NominalVerbStateB,
		NominalVerbSuffix2.Name:  NominalVerbStateB,
		NominalVerbSuffix3.Name:  NominalVerbStateB,
		NominalVerbSuffix4.Name:  NominalVerbStateB,
		NominalVerbSuffix5.Name:  NominalVerbStateC,
		NominalVerbSuffix6.Name:  NominalVerbStateD,
		NominalVerbSuffix7.Name:  NominalVerbStateD,
		NominalVerbSuffix8.Name:  NominalVerbStateD,
		NominalVerbSuffix9.Name:  NominalVerbStateD,
		NominalVerbSuffix10.Name: NominalVerbStateE,
		NominalVerbSuffix11.Name: NominalVerbStateH,
		NominalVerbSuffix12.Name: NominalVerbStateF,
		NominalVerbSuffix13.Name: NominalVerbStateF,
		NominalVerbSuffix14.Name: NominalVerbStateF,
		NominalVerbSuffix15.Name: NominalVerbStateF,
	}

	NominalVerbFTValues = map[string]NominalVerbState{
		NominalVerbSuffix1.Name:  NominalVerbStateG,
		NominalVerbSuffix2.Name:  NominalVerbStateG,
		NominalVerbSuffix3.Name:  NominalVerbStateG,
		NominalVerbSuffix4.Name:  NominalVerbStateG,
		NominalVerbSuffix5.Name:  NominalVerbStateG,
		NominalVerbSuffix10.Name: NominalVerbStateF,
		NominalVerbSuffix13.Name: NominalVerbStateF,
		NominalVerbSuffix14.Name: NominalVerbStateF,
	}

	NominalVerbFFValues = map[string]NominalVerbState{
		NominalVerbSuffix1.Name:  NominalVerbStateG,
		NominalVerbSuffix2.Name:  NominalVerbStateG,
		NominalVerbSuffix3.Name:  NominalVerbStateG,
		NominalVerbSuffix4.Name:  NominalVerbStateG,
		NominalVerbSuffix5.Name:  NominalVerbStateG,
		NominalVerbSuffix12.Name: NominalVerbStateF,
		NominalVerbSuffix14.Name: NominalVerbStateF,
	}
)

var (
	NounStateA = NewNounState(true, true, NounSuffixValues)
	NounStateB = NewNounState(false, true, []Suffix{NounSuffix1, NounSuffix2, NounSuffix3, NounSuffix4, NounSuffix5})
	NounStateC = NewNounState(false, false, []Suffix{NounSuffix6, NounSuffix7})
	NounStateD = NewNounState(false, false, []Suffix{NounSuffix10, NounSuffix13, NounSuffix14})
	NounStateE = NewNounState(false, true, []Suffix{NounSuffix1, NounSuffix2, NounSuffix3, NounSuffix4, NounSuffix5, NounSuffix6, NounSuffix7, NounSuffix18}) //nolint:lll
	NounStateF = NewNounState(false, false, []Suffix{NounSuffix6, NounSuffix7, NounSuffix18})
	NounStateG = NewNounState(false, true, []Suffix{NounSuffix1, NounSuffix2, NounSuffix3, NounSuffix4, NounSuffix5, NounSuffix18}) //nolint:lll
	NounStateH = NewNounState(false, true, []Suffix{NounSuffix1})
	NounStateK = NewNounState(false, true, nil)
	NounStateL = NewNounState(false, true, []Suffix{NounSuffix18})
	NounStateM = NewNounState(false, true, []Suffix{NounSuffix1, NounSuffix2, NounSuffix3, NounSuffix4, NounSuffix5, NounSuffix6, NounSuffix6, NounSuffix7}) //nolint:lll

	// TTValues InitialState = true, FinalState = true
	TTValues = map[string]NounState{
		NounSuffix1.Name:  NounStateL,
		NounSuffix2.Name:  NounStateH,
		NounSuffix3.Name:  NounStateH,
		NounSuffix4.Name:  NounStateH,
		NounSuffix5.Name:  NounStateH,
		NounSuffix6.Name:  NounStateH,
		NounSuffix7.Name:  NounStateK,
		NounSuffix8.Name:  NounStateB,
		NounSuffix9.Name:  NounStateC,
		NounSuffix10.Name: NounStateE,
		NounSuffix11.Name: NounStateB,
		NounSuffix12.Name: NounStateF,
		NounSuffix13.Name: NounStateB,
		NounSuffix14.Name: NounStateF,
		NounSuffix15.Name: NounStateG,
		NounSuffix16.Name: NounStateC,
		NounSuffix17.Name: NounStateE,
		NounSuffix18.Name: NounStateD,
	}

	// InitialState = false, FinalState = true
	FTValues = map[string]NounState{
		NounSuffix1.Name:  NounStateL,
		NounSuffix2.Name:  NounStateH,
		NounSuffix3.Name:  NounStateH,
		NounSuffix4.Name:  NounStateH,
		NounSuffix5.Name:  NounStateH,
		NounSuffix6.Name:  NounStateH,
		NounSuffix7.Name:  NounStateK,
		NounSuffix18.Name: NounStateD,
	}

	// InitialState = false, FinalState = false
	FFValues = map[string]NounState{
		NounSuffix6.Name:  NounStateH,
		NounSuffix7.Name:  NounStateL,
		NounSuffix10.Name: NounStateE,
		NounSuffix13.Name: NounStateB,
		NounSuffix14.Name: NounStateF,
		NounSuffix18.Name: NounStateD,
	}
)

var (
	DerivationalSuffix1 = NewSuffix("-lU", "lı|li|lu|lü", "", true)

	NominalVerbSuffix1  = NewSuffix("-(y)Um", "ım|im|um|üm", "y", true)
	NominalVerbSuffix2  = NewSuffix("-sUn", "sın|sin|sun|sün", "", true)
	NominalVerbSuffix3  = NewSuffix("-(y)Uz", "ız|iz|uz|üz", "y", true)
	NominalVerbSuffix4  = NewSuffix("-sUnUz", "sınız|siniz|sunuz|sünüz", "", true)
	NominalVerbSuffix5  = NewSuffix("-lAr", "lar|ler", "", true)
	NominalVerbSuffix6  = NewSuffix("-m", "m", "", true)
	NominalVerbSuffix7  = NewSuffix("-n", "n", "", true)
	NominalVerbSuffix8  = NewSuffix("-k", "k", "", true)
	NominalVerbSuffix9  = NewSuffix("-nUz", "nız|niz|nuz|nüz", "", true)
	NominalVerbSuffix10 = NewSuffix("-DUr", "tır|tir|tur|tür|dır|dir|dur|dür", "", true)
	NominalVerbSuffix11 = NewSuffix("-cAsInA", "casına|çasına|cesine|çesine", "", true)
	NominalVerbSuffix12 = NewSuffix("-(y)DU", "dı|di|du|dü|tı|ti|tu|tü", "y", true)
	NominalVerbSuffix13 = NewSuffix("-(y)sA", "sa|se", "y", true)
	NominalVerbSuffix14 = NewSuffix("-(y)mUş", "muş|miş|müş|mış", "y", true)
	NominalVerbSuffix15 = NewSuffix("-(y)ken", "ken", "y", true)

	NounSuffix1  = NewSuffix("-lAr", "lar|ler", "", true)
	NounSuffix2  = NewSuffix("-(U)m", "m", "ı|i|u|ü", true)
	NounSuffix3  = NewSuffix("-(U)mUz", "mız|miz|muz|müz", "ı|i|u|ü", true)
	NounSuffix4  = NewSuffix("-Un", "ın|in|un|ün", "", true)
	NounSuffix5  = NewSuffix("-(U)nUz", "nız|niz|nuz|nüz", "ı|i|u|ü", true)
	NounSuffix6  = NewSuffix("-(s)U", "ı|i|u|ü", "s", true)
	NounSuffix7  = NewSuffix("-lArI", "ları|leri", "", true)
	NounSuffix8  = NewSuffix("-(y)U", "ı|i|u|ü", "y", true)
	NounSuffix9  = NewSuffix("-nU", "nı|ni|nu|nü", "", true)
	NounSuffix10 = NewSuffix("-(n)Un", "ın|in|un|ün", "n", true)
	NounSuffix11 = NewSuffix("-(y)A", "a|e", "y", true)
	NounSuffix12 = NewSuffix("-nA", "na|ne", "", true)
	NounSuffix13 = NewSuffix("-DA", "da|de|ta|te", "", true)
	NounSuffix14 = NewSuffix("-nDA", "nta|nte|nda|nde", "", true)
	NounSuffix15 = NewSuffix("-DAn", "dan|tan|den|ten", "", true)
	NounSuffix16 = NewSuffix("-nDAn", "ndan|ntan|nden|nten", "", true)
	NounSuffix17 = NewSuffix("-(y)lA", "la|le", "y", true)
	NounSuffix18 = NewSuffix("-ki", "ki", "", false)
	NounSuffix19 = NewSuffix("-(n)cA", "ca|ce", "n", true)

	// The order of this slice definition determines the priority of the suffix.
	DerivationalSuffixValues = []Suffix{DerivationalSuffix1}
	NominalVerbSuffixValues  = []Suffix{NominalVerbSuffix11, NominalVerbSuffix4, NominalVerbSuffix14, NominalVerbSuffix15, NominalVerbSuffix2, NominalVerbSuffix5, NominalVerbSuffix9, NominalVerbSuffix10, NominalVerbSuffix3, NominalVerbSuffix1, NominalVerbSuffix12, NominalVerbSuffix13, NominalVerbSuffix6, NominalVerbSuffix7, NominalVerbSuffix8} //nolint:lll
	NounSuffixValues         = []Suffix{NounSuffix16, NounSuffix7, NounSuffix3, NounSuffix5, NounSuffix1, NounSuffix14, NounSuffix15, NounSuffix17, NounSuffix10, NounSuffix19, NounSuffix4, NounSuffix9, NounSuffix12, NounSuffix13, NounSuffix18, NounSuffix2, NounSuffix6, NounSuffix8, NounSuffix11}                                                  //nolint:lll
)

Functions ¶

func Contains ¶

func Contains[T comparable](s []T, e T) bool

Contains Returns whether e is within s.

func CountSyllables ¶

func CountSyllables(word string) int

CountSyllables Returns the number/count of syllables of a word.
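
Because each vowel in Turkish indicates a distinct syllable, syllable counting reduces to counting vowels. The sketch below illustrates that idea only; countSyllables is a hypothetical stand-in and may differ from what the package's CountSyllables does (for example, in how it treats uppercase input).

```go
package main

import (
	"fmt"
	"strings"
)

const vowels = "üiıueöao"

// countSyllables (hypothetical helper) counts the lowercase Turkish vowels
// in word, which equals the number of syllables.
func countSyllables(word string) int {
	n := 0
	for _, r := range word {
		if strings.ContainsRune(vowels, r) {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countSyllables("kalem"))            // 2
	fmt.Println(countSyllables("doktoruymuşsunuz")) // 6
}
```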

func GetType ¶

func GetType(i interface{}) string

GetType Returns type of interface.

func GetVowels ¶

func GetVowels(word string) []rune

GetVowels returns the vowels of a word.

func HasFrontness ¶

func HasFrontness(vowel, candidate string) bool

HasFrontness Checks the frontness harmony of two characters.

func HasRoundness ¶

func HasRoundness(vowel, candidate string) bool

HasRoundness Checks the roundness harmony of two characters.

func HasVowelHarmony ¶

func HasVowelHarmony(word string) bool

HasVowelHarmony Checks the vowel harmony of a word.

func IsTurkishWord ¶

func IsTurkishWord(word string) bool

IsTurkishWord Checks whether a word is written in Turkish alphabet or not.
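
An alphabet check of this kind could look like the sketch below. It is an assumption about the approach, not the package's code: isTurkishWord is a hypothetical name, the alphabet string is copied from the Alphabet constant, and the input is assumed to be lowercase already.

```go
package main

import (
	"fmt"
	"strings"
)

const alphabet = "abcçdefgğhıijklmnoöprsştuüvyz"

// isTurkishWord (hypothetical helper) reports whether every rune of word
// belongs to the lowercase Turkish alphabet. An empty word yields true.
func isTurkishWord(word string) bool {
	for _, r := range word {
		if !strings.ContainsRune(alphabet, r) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isTurkishWord("çocuk")) // true
	fmt.Println(isTurkishWord("queue")) // false: 'q' is not in the alphabet
}
```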

func LoadWords ¶

func LoadWords(path string) []string

LoadWords Reads a file line by line and returns a string slice.

func LoadWordsFromSliceBytes ¶

func LoadWordsFromSliceBytes(data []byte) []string

LoadWordsFromSliceBytes Reads data from a slice of bytes and splits it on newlines ('\n').

func ReplaceStringAtIndex ¶

func ReplaceStringAtIndex(text string, r rune, i int) string

ReplaceStringAtIndex Replaces the character at index i in text with r and returns the new text.
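
Replacing a character by index in Turkish text is safest on runes rather than bytes, since letters such as 'ğ' and 'ç' occupy two bytes in UTF-8. The sketch below shows one way to do this; replaceRuneAt is a hypothetical helper, and it assumes i is a rune index, which the documentation above does not state for ReplaceStringAtIndex.

```go
package main

import "fmt"

// replaceRuneAt (hypothetical helper) returns text with the rune at rune
// index i replaced by r. An out-of-range index returns text unchanged.
func replaceRuneAt(text string, r rune, i int) string {
	runes := []rune(text)
	if i < 0 || i >= len(runes) {
		return text
	}
	runes[i] = r
	return string(runes)
}

func main() {
	fmt.Println(replaceRuneAt("çocuğ", 'k', 4)) // çocuk
	fmt.Println(replaceRuneAt("kale", 'x', 10)) // kale
}
```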

func ValidateOptionalLetter ¶

func ValidateOptionalLetter(word string, candidate *rune) bool

ValidateOptionalLetter Checks whether an optional letter is valid or not.

func VowelHarmony ¶

func VowelHarmony(vowel, candidate string) bool

VowelHarmony Checks the vowel harmony of two characters.

Types ¶

type BaseState ¶

type BaseState struct {
	// contains filtered or unexported fields
}

func (BaseState) AddTransitions ¶

func (s BaseState) AddTransitions(word string, transitions *Transitions, startState State)

func (BaseState) FinalState ¶

func (s BaseState) FinalState() bool

func (BaseState) InitialState ¶

func (s BaseState) InitialState() bool

func (BaseState) Suffixes ¶

func (s BaseState) Suffixes() []Suffix

type DerivationalState ¶

type DerivationalState struct {
	BaseState
}

func NewDerivationalState ¶

func NewDerivationalState(initialState, finalState bool, suffixes []Suffix) DerivationalState

func (DerivationalState) NextState ¶

func (s DerivationalState) NextState(suffix string) State

type NominalVerbState ¶

type NominalVerbState struct {
	BaseState
}

func NewNominalVerbState ¶

func NewNominalVerbState(initialState, finalState bool, suffixes []Suffix) NominalVerbState

func (NominalVerbState) GetValues ¶

func (s NominalVerbState) GetValues() map[string]NominalVerbState

func (NominalVerbState) NextState ¶

func (s NominalVerbState) NextState(suffix string) State

type NounState ¶

type NounState struct {
	BaseState
}

func NewNounState ¶

func NewNounState(initialState, finalState bool, suffixes []Suffix) NounState

func (NounState) GetValues ¶

func (s NounState) GetValues() map[string]NounState

func (NounState) NextState ¶

func (s NounState) NextState(suffix string) State

type State ¶

type State interface {
	AddTransitions(word string, transitions *Transitions, startState State)
	NextState(suffix string) State
	InitialState() bool
	FinalState() bool
	Suffixes() []Suffix
}

func GetInitialNominalVerbState ¶

func GetInitialNominalVerbState() State

type Stemmer ¶

type Stemmer struct {
	ProtectedWords            []string
	VowelHarmonyExceptions    []string
	LastConsonantExceptions   []string
	AverageStemSizeExceptions []string
}

func New ¶

func New() Stemmer

New constructs a new Stemmer.

func (Stemmer) Stem ¶

func (s Stemmer) Stem(word string, tryCount ...int) string

Stem returns the stemmed word of a given un-stemmed word. In case it remained unstemmed it attempts to correct some mistypes such as 'u' instead of 'ü' and 'i' instead of 'ı'.

type Stems ¶

type Stems []string

type Suffix ¶

type Suffix struct {
	Name                  string
	Pattern               *regexp.Regexp
	OptionalLetterPattern *regexp.Regexp
	OptionalLetterCheck   bool
	CheckHarmony          bool
	OptionalLetter        string
}

func NewSuffix ¶

func NewSuffix(name, pattern, optionalLetter string, checkHarmony bool) Suffix

func (Suffix) GetOptionalLetter ¶

func (s Suffix) GetOptionalLetter(word string) *rune

func (Suffix) Match ¶

func (s Suffix) Match(word string) bool

func (Suffix) RemoveSuffix ¶

func (s Suffix) RemoveSuffix(word string) string

func (Suffix) String ¶

func (s Suffix) String() string

type Transition ¶

type Transition struct {
	StartState State
	NextState  State
	Word       string
	Suffix     Suffix
	Marked     bool
}

func NewTransition ¶

func NewTransition(startState, nextState State, word string, suffix Suffix) *Transition

func (Transition) SimilarTransitions ¶

func (t Transition) SimilarTransitions(transitions Transitions) *Transitions

func (Transition) String ¶

func (t Transition) String() string

type Transitions ¶

type Transitions []*Transition
