whatlanggo

Version: v0.0.0-...-e869148
Published: Jul 29, 2017 License: MIT Imports: 2 Imported by: 1

README

Whatlanggo

Natural language detection for Go.

Features

  • Supports 84 languages
  • 100% written in Go
  • No external dependencies
  • Fast
  • Recognizes not only the language but also the script (Latin, Cyrillic, etc.)

Getting started

Installation:

    go get -u github.com/abadojack/whatlanggo

Simple usage example:

package main

import (
	"fmt"
	"github.com/abadojack/whatlanggo"
)

func main() {
	info := whatlanggo.Detect("Foje funkcias kaj foje ne funkcias")
	fmt.Println("Language:", whatlanggo.LangToString(info.Lang), "Script:", whatlanggo.Scripts[info.Script])
}

Blacklisting and whitelisting

package main

import (
	"fmt"

	"github.com/abadojack/whatlanggo"
)

func main() {
	// Blacklist: languages to skip during detection.
	options := whatlanggo.Options{
		Blacklist: map[whatlanggo.Lang]bool{
			whatlanggo.Ydd: true,
		},
	}

	info := whatlanggo.DetectWithOptions("האקדמיה ללשון העברית", options)
	fmt.Println("Language:", whatlanggo.LangToString(info.Lang), "Script:", whatlanggo.Scripts[info.Script])

	// Whitelist: languages to restrict detection to.
	options1 := whatlanggo.Options{
		Whitelist: map[whatlanggo.Lang]bool{
			whatlanggo.Epo: true,
			whatlanggo.Ukr: true,
		},
	}

	info = whatlanggo.DetectWithOptions("Mi ne scias", options1)
	fmt.Println("Language:", whatlanggo.LangToString(info.Lang), "Script:", whatlanggo.Scripts[info.Script])
}

For more details, see the documentation.

TODO

Add reliability metrics to the Info struct.

License

MIT

Derivation

whatlanggo is a derivative of Franc (JavaScript, MIT) by Titus Wormer.

Acknowledgements

Thanks to greyblake (Potapov Sergey) for creating whatlang-rs, from which I got the idea and logic.

Documentation

Overview

Package whatlanggo detects natural languages and scripts (writing systems). Languages are represented by a fixed list of constants, while scripts are represented by *unicode.RangeTable values.

Index

Constants

This section is empty.

Variables

var Langs = map[Lang]string{
	Aka: "Akan",
	Amh: "Amharic",
	Arb: "Arabic",
	Azj: "Azerbaijani",
	Bel: "Belarusian",
	Ben: "Bengali",
	Bho: "Bhojpuri",
	Bul: "Bulgarian",
	Ceb: "Cebuano",
	Ces: "Czech",
	Cmn: "Mandarin",
	Dan: "Danish",
	Deu: "German",
	Ell: "Greek",
	Eng: "English",
	Epo: "Esperanto",
	Est: "Estonian",
	Fin: "Finnish",
	Fra: "French",
	Guj: "Gujarati",
	Hat: "Haitian Creole",
	Hau: "Hausa",
	Heb: "Hebrew",
	Hin: "Hindi",
	Hrv: "Croatian",
	Hun: "Hungarian",
	Ibo: "Igbo",
	Ilo: "Ilocano",
	Ind: "Indonesian",
	Ita: "Italian",
	Jav: "Javanese",
	Jpn: "Japanese",
	Kan: "Kannada",
	Kat: "Georgian",
	Khm: "Khmer",
	Kin: "Kinyarwanda",
	Kor: "Korean",
	Kur: "Kurdish",
	Lav: "Latvian",
	Lit: "Lithuanian",
	Mai: "Maithili",
	Mal: "Malayalam",
	Mar: "Marathi",
	Mkd: "Macedonian",
	Mlg: "Malagasy",
	Mya: "Burmese",
	Nep: "Nepali",
	Nld: "Dutch",
	Nno: "Nynorsk",
	Nob: "Bokmal",
	Nya: "Chewa",
	Ori: "Oriya",
	Orm: "Oromo",
	Pan: "Punjabi",
	Pes: "Persian",
	Pol: "Polish",
	Por: "Portuguese",
	Ron: "Romanian",
	Run: "Rundi",
	Rus: "Russian",
	Sin: "Sinhalese",
	Skr: "Saraiki",
	Slv: "Slovene",
	Sna: "Shona",
	Som: "Somali",
	Spa: "Spanish",
	Srp: "Serbian",
	Swe: "Swedish",
	Tam: "Tamil",
	Tel: "Telugu",
	Tgl: "Tagalog",
	Tha: "Thai",
	Tir: "Tigrinya",
	Tuk: "Turkmen",
	Tur: "Turkish",
	Uig: "Uyghur",
	Ukr: "Ukranian",
	Urd: "Urdu",
	Uzb: "Uzbek",
	Vie: "Vietnamese",
	Ydd: "Yiddish",
	Yor: "Yoruba",
	Zul: "Zulu",
}

Langs represents a map of Lang to language name.

var Scripts = map[*unicode.RangeTable]string{
	unicode.Latin:      "Latin",
	unicode.Cyrillic:   "Cyrillic",
	unicode.Arabic:     "Arabic",
	unicode.Devanagari: "Devanagari",
	unicode.Hiragana:   "Hiragana",
	unicode.Katakana:   "Katakana",
	unicode.Ethiopic:   "Ethiopic",
	unicode.Hebrew:     "Hebrew",
	unicode.Bengali:    "Bengali",
	unicode.Georgian:   "Georgian",
	unicode.Han:        "Han",
	unicode.Hangul:     "Hangul",
	unicode.Greek:      "Greek",
	unicode.Kannada:    "Kannada",
	unicode.Tamil:      "Tamil",
	unicode.Thai:       "Thai",
	unicode.Gujarati:   "Gujarati",
	unicode.Gurmukhi:   "Gurmukhi",
	unicode.Telugu:     "Telugu",
	unicode.Malayalam:  "Malayalam",
	unicode.Oriya:      "Oriya",
	unicode.Myanmar:    "Myanmar",
	unicode.Sinhala:    "Sinhala",
	unicode.Khmer:      "Khmer",
}

Scripts maps each supported Unicode script table to its name.

Functions

func DetectScript

func DetectScript(text string) *unicode.RangeTable

DetectScript returns only the script of the given text.

func LangToString

func LangToString(lang Lang) string

LangToString converts a Lang into its ISO 639-3 code as a string.

Types

type Info

type Info struct {
	Lang   Lang
	Script *unicode.RangeTable
}

Info represents a full outcome of language detection.

func Detect

func Detect(text string) Info

Detect detects the language and script of the given text.

func DetectWithOptions

func DetectWithOptions(text string, options Options) Info

DetectWithOptions detects the language and script of the given text with the provided options.

type Lang

type Lang int

Lang represents a language following the ISO 639-3 standard.

const (
	Aka Lang = iota
	Amh
	Arb
	Azj
	Bel
	Ben
	Bho
	Bul
	Ceb
	Ces
	Cmn
	Dan
	Deu
	Ell
	Eng
	Epo
	Est
	Fin
	Fra
	Guj
	Hat
	Hau
	Heb
	Hin
	Hrv
	Hun
	Ibo
	Ilo
	Ind
	Ita
	Jav
	Jpn
	Kan
	Kat
	Khm
	Kin
	Kor
	Kur
	Lav
	Lit
	Mai
	Mal
	Mar
	Mkd
	Mlg
	Mya
	Nep
	Nld
	Nno
	Nob
	Nya
	Ori
	Orm
	Pan
	Pes
	Pol
	Por
	Ron
	Run
	Rus
	Sin
	Skr
	Slv
	Sna
	Som
	Spa
	Srp
	Swe
	Tam
	Tel
	Tgl
	Tha
	Tir
	Tuk
	Tur
	Uig
	Ukr
	Urd
	Uzb
	Vie
	Ydd
	Yor
	Zul
)

Aka and the constants that follow identify the supported languages, each named after its ISO 639-3 code.

func CodeToLang

func CodeToLang(code string) Lang

CodeToLang returns the Lang for the given ISO 639-3 code.
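
LangToString and CodeToLang are inverse lookups over the constants listed above. The sketch below shows the general pattern with a hypothetical local type lang and three constants, using only the standard library. Two things are assumptions: the lowercase spelling of the codes, and the handling of unknown codes. This documentation does not state what CodeToLang returns for an unrecognized code, so the sketch reports that case with an explicit boolean instead of guessing.

```go
package main

import "fmt"

// lang is a hypothetical local type that mirrors the shape of Lang above.
type lang int

const (
	epo lang = iota
	heb
	ukr
)

// codes maps each constant to a three-letter code. The lowercase spelling
// is an assumption made for this sketch.
var codes = map[lang]string{epo: "epo", heb: "heb", ukr: "ukr"}

// langToCode returns the code for l, or "" if l is not a known constant.
func langToCode(l lang) string {
	return codes[l]
}

// codeToLang is the inverse lookup; ok reports whether the code was found.
// The two-value return is a choice made for this sketch, whereas CodeToLang
// above returns a single Lang.
func codeToLang(code string) (l lang, ok bool) {
	for candidate, c := range codes {
		if c == code {
			return candidate, true
		}
	}
	return 0, false
}

func main() {
	l, ok := codeToLang("heb")
	fmt.Println(langToCode(l), ok) // prints "heb true"

	_, ok = codeToLang("xxx")
	fmt.Println(ok) // prints "false"
}
```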

func DetectLang

func DetectLang(text string) Lang

DetectLang detects only the language of the given text.

func DetectLangWithOptions

func DetectLangWithOptions(text string, options Options) Lang

DetectLangWithOptions detects only the language of the given text with the provided options.

type Options

type Options struct {
	Whitelist map[Lang]bool
	Blacklist map[Lang]bool
}

Options holds the settings that can be applied when detecting a language and/or script, such as blacklisting languages to skip during detection.
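
The two maps in Options act as filters over candidate languages. The sketch below shows one plausible way such a filter can be evaluated, using hypothetical local types (lang, options) and a hypothetical helper named allowed. It assumes that a non-empty whitelist takes precedence and that the blacklist is consulted only when no whitelist is set. The library's actual precedence is not stated in this documentation.

```go
package main

import "fmt"

// lang is a hypothetical local type that mirrors the shape of Lang.
type lang int

const (
	epo lang = iota
	ukr
	ydd
)

// options mirrors the shape of the Options struct above.
type options struct {
	whitelist map[lang]bool
	blacklist map[lang]bool
}

// allowed reports whether l may be returned as a detection result.
// Assumption: a non-empty whitelist wins; otherwise the blacklist applies.
func allowed(l lang, o options) bool {
	if len(o.whitelist) > 0 {
		return o.whitelist[l]
	}
	return !o.blacklist[l] // reading a nil map is safe and yields false
}

func main() {
	onlyEpoUkr := options{whitelist: map[lang]bool{epo: true, ukr: true}}
	noYdd := options{blacklist: map[lang]bool{ydd: true}}

	fmt.Println(allowed(epo, onlyEpoUkr)) // true
	fmt.Println(allowed(ydd, onlyEpoUkr)) // false
	fmt.Println(allowed(ydd, noYdd))      // false
	fmt.Println(allowed(epo, options{}))  // true
}
```

With the zero value of options, both maps are nil and every language is allowed, which matches the expectation that empty options impose no restriction.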
