Documentation ¶
Overview ¶
Package langid provides a high-performance natural language identifier library supporting 97 languages. It is a pure Go runtime port of the langid inference stack, initially derived from langid.c and later expanded for parity with langid.js and langid.py.
The package ports the Naive Bayes/DFA inference path, not the original training pipeline. It is CGO-free, making it simple to cross-compile and safe for highly concurrent production pipelines.
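As a rough illustration of the concurrent use case mentioned above, the sketch below fans a batch of texts out to a few goroutines that share a single identifier. It assumes the identifier is not reconfigured (for example via SetLanguages) while workers are running; the text above does not spell out the exact concurrency guarantees. The `identifier` interface, `identifyAll`, and `fakeID` are hypothetical names introduced here; the type parameter `R` stands in for the package's Result type so that the documented `IdentifyString` signature fits without guessing at Result's fields.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// identifier mirrors the documented IdentifyString method. R stands in for
// the package's Result type, whose fields are not shown in this document.
type identifier[R any] interface {
	IdentifyString(text string) (R, error)
}

// identifyAll (hypothetical helper) distributes texts over a fixed number of
// workers that share one identifier and returns results in input order.
func identifyAll[R any](id identifier[R], texts []string, workers int) ([]R, []error) {
	if workers < 1 {
		workers = 1
	}
	results := make([]R, len(texts))
	errs := make([]error, len(texts))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one worker, so no lock is needed.
				results[i], errs[i] = id.IdentifyString(texts[i])
			}
		}()
	}
	for i := range texts {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results, errs
}

// fakeID is a toy, stateless stand-in used only to make the sketch runnable.
// It is not a language identifier in any meaningful sense.
type fakeID struct{}

func (fakeID) IdentifyString(text string) (string, error) {
	if strings.Contains(text, "der") {
		return "de", nil
	}
	return "en", nil
}

func main() {
	labels, _ := identifyAll[string](fakeID{}, []string{"the cat", "der Hund"}, 4)
	fmt.Println(labels)
}
```

With the real package, a `*Identifier` would be passed in place of `fakeID{}` and `R` would be the package's Result type.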
Index ¶
- func Normalize(results []Result)
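- type Identifier
- func LoadModel(path string) (*Identifier, error)
- func NewDefaultIdentifier() (*Identifier, error)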
- func (id *Identifier) Classes() []string
- func (id *Identifier) IdentifyBytes(text []byte) (Result, error)
- func (id *Identifier) IdentifyFile(path string) (Result, error)
- func (id *Identifier) IdentifyString(text string) (Result, error)
- func (id *Identifier) KeepOnly(langs ...string) error (deprecated)
- func (id *Identifier) RankBytes(text []byte) ([]Result, error)
- func (id *Identifier) RankFile(path string) ([]Result, error)
- func (id *Identifier) RankString(text string) ([]Result, error)
- func (id *Identifier) ResetLanguages()
- func (id *Identifier) SetLanguages(langs ...string) error
- type Result
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type Identifier ¶
type Identifier struct {
// contains filtered or unexported fields
}
Identifier classifies text by language using a pre-trained model.
func LoadModel ¶
func LoadModel(path string) (*Identifier, error)
LoadModel reads the .lidg model file at path and returns an Identifier backed by it.
func NewDefaultIdentifier ¶
func NewDefaultIdentifier() (*Identifier, error)
NewDefaultIdentifier loads the embedded default model (ldpy).
func (*Identifier) Classes ¶
func (id *Identifier) Classes() []string
Classes returns the active language classes supported by the identifier.
func (*Identifier) IdentifyBytes ¶
func (id *Identifier) IdentifyBytes(text []byte) (Result, error)
IdentifyBytes predicts the language of the given byte slice.
func (*Identifier) IdentifyFile ¶
func (id *Identifier) IdentifyFile(path string) (Result, error)
IdentifyFile reads the file at the specified path and predicts its language. If reading the file fails, it returns the underlying filesystem error, wrapped so that the original error context is preserved.
func (*Identifier) IdentifyString ¶
func (id *Identifier) IdentifyString(text string) (Result, error)
IdentifyString predicts the language of the given string.
func (*Identifier) KeepOnly (deprecated) ¶
func (id *Identifier) KeepOnly(langs ...string) error
KeepOnly restricts the identifier to a specific subset of languages.
Deprecated: Use SetLanguages instead. It restricts the language set in the same way, but validates language codes more strictly and supports resetting the subset.
func (*Identifier) RankBytes ¶
func (id *Identifier) RankBytes(text []byte) ([]Result, error)
RankBytes returns a sorted list of all languages and their raw log scores.
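Raw log scores are hard to compare by eye. One common way to turn them into probabilities that sum to 1 is a numerically stable softmax, sketched below. This is a general technique, not a description of what the package's Normalize function does, which is not documented here. The `scored` type and its fields are hypothetical, since the fields of Result are not shown.

```go
package main

import (
	"fmt"
	"math"
)

// scored is a hypothetical stand-in for one ranked entry; the real Result
// type's fields are not shown in this documentation.
type scored struct {
	Lang  string
	Score float64 // raw log score
}

// softmax converts log scores into probabilities that sum to 1.
// Subtracting the maximum first avoids underflow in math.Exp.
func softmax(in []scored) []float64 {
	if len(in) == 0 {
		return nil
	}
	maxScore := in[0].Score
	for _, s := range in[1:] {
		if s.Score > maxScore {
			maxScore = s.Score
		}
	}
	probs := make([]float64, len(in))
	var sum float64
	for i, s := range in {
		probs[i] = math.Exp(s.Score - maxScore)
		sum += probs[i]
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs
}

func main() {
	// Illustrative scores only; they do not come from a real model.
	ranked := []scored{
		{Lang: "en", Score: -1200.5},
		{Lang: "de", Score: -1210.0},
		{Lang: "fr", Score: -1215.25},
	}
	for i, p := range softmax(ranked) {
		fmt.Printf("%s %.4f\n", ranked[i].Lang, p)
	}
}
```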
func (*Identifier) RankFile ¶
func (id *Identifier) RankFile(path string) ([]Result, error)
RankFile reads the file at the specified path and ranks all supported languages by likelihood. If reading the file fails, it returns the underlying filesystem error, wrapped so that the original error context is preserved.
func (*Identifier) RankString ¶
func (id *Identifier) RankString(text string) ([]Result, error)
RankString returns a sorted list of all languages and their raw log scores.
func (*Identifier) ResetLanguages ¶
func (id *Identifier) ResetLanguages()
ResetLanguages restores the active language set of the identifier to include all languages present in the original loaded model.
func (*Identifier) SetLanguages ¶
func (id *Identifier) SetLanguages(langs ...string) error
SetLanguages restricts the active language set of the identifier to the specified subset. If langs is empty or nil, it resets the active languages to the original model languages. If any requested language is not supported by the model, it returns an error and leaves the active language set unmodified (atomic operation).
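The all-or-nothing behavior described above can be pictured as "validate everything first, then commit". The following is a minimal sketch of that pattern under stated assumptions: `langSet` and `newLangSet` are hypothetical stand-ins that mimic only the subset semantics, not the package's Identifier, and the error text is invented for illustration.

```go
package main

import "fmt"

// langSet is a hypothetical stand-in that mimics only the subset semantics
// described above. It is not the package's Identifier.
type langSet struct {
	all    []string // languages in the loaded model
	active []string // currently active subset
}

func newLangSet(all ...string) *langSet {
	return &langSet{all: all, active: append([]string(nil), all...)}
}

// SetLanguages validates every requested code before changing any state,
// so a failed call leaves the active set untouched.
func (s *langSet) SetLanguages(langs ...string) error {
	if len(langs) == 0 {
		// Empty or nil input resets to the full model language set.
		s.active = append([]string(nil), s.all...)
		return nil
	}
	known := make(map[string]bool, len(s.all))
	for _, l := range s.all {
		known[l] = true
	}
	for _, l := range langs {
		if !known[l] {
			return fmt.Errorf("unsupported language %q", l)
		}
	}
	// Commit only after every code has passed validation.
	s.active = append([]string(nil), langs...)
	return nil
}

func main() {
	s := newLangSet("de", "en", "fr")
	fmt.Println(s.SetLanguages("en", "fr"), s.active)
	fmt.Println(s.SetLanguages("en", "xx"), s.active) // error; subset unchanged
	fmt.Println(s.SetLanguages(), s.active)           // reset to all languages
}
```

With the real package, the practical consequence is that a caller can attempt a restriction, check the returned error, and keep using the identifier with its previous language set if the call fails.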