arabicgo

Version: v1.0.1
Published: Feb 5, 2026 License: MIT Imports: 1 Imported by: 0

README

ArabicGo

An Arabic text shaping library for Go, designed for Quranic text rendering with right-to-left (RTL) support. It covers the Arabic shaping pipeline (contextual joining, ligatures, tashkeel, numeral conversion, and RTL ordering), so applications can produce correctly shaped Arabic text.

Key Features

  • Complete Quranic Text Support - Render the Holy Quran with all diacritical marks perfectly preserved
  • Advanced Tashkeel Engine - Full support for all Arabic diacritical marks:
    • Harakat: Fatha (َ), Damma (ُ), Kasra (ِ), Sukun (ْ)
    • Shadda: Gemination mark (ّ), automatically combined with an accompanying vowel mark into a single ligature
    • Tanween: Fathatan (ً), Dammatan (ٌ), Kasratan (ٍ)
    • Quranic Marks: Superscript Alef (ٰ), Maddah (ٓ), Hamza Above/Below (ٔ ٕ), Subscript Alef (ٖ), Inverted Damma (ٗ), Noon Ghunna (٘)
  • Intelligent Character Joining - Automatic joining of Arabic letters in their correct contextual forms (isolated, initial, medial, final)
  • Ligature Rendering - Proper Lam-Alef (لا) and Allah (ﷲ) ligature formation
  • Arabic-Indic Numerals - Automatic conversion from Western (0-9) to Eastern Arabic-Indic (٠-٩) numerals
  • RTL Text Processing - Correct right-to-left text ordering for PDF and image generation

Screenshots

Surah Al-Fatiha - Complete Quranic Rendering

[Image: Surah Al-Fatiha]

The opening chapter of the Holy Quran rendered with complete tashkeel, demonstrating perfect Arabic typography

Arabic Letters with Numerals

[Image: Arabic Numerals]

Demonstration of Arabic-Indic numeral conversion and mixed Arabic-number text

Installation

go get github.com/AmmrFX/arabicgo

Quick Start

package main

import (
    "fmt"
    "github.com/AmmrFX/arabicgo"
)

func main() {
    // Render Quranic text with full tashkeel
    bismillah := arabicgo.ToArabic("بِسْمِ اللهِ الرَّحْمَنِ الرَّحِيمِ")
    fmt.Println(bismillah)

    // Shape any Arabic text
    greeting := arabicgo.Shape("السَّلامُ عَلَيْكُمْ")
    fmt.Println(greeting)
}

PDF Generation Example

package main

import (
    "log"
    "github.com/AmmrFX/arabicgo"
    "github.com/signintech/gopdf"
)

func main() {
    pdf := gopdf.GoPdf{}
    pdf.Start(gopdf.Config{PageSize: *gopdf.PageSizeA4})
    pdf.AddPage()

    err := pdf.AddTTFFont("Arabic", "path/to/arabic-font.ttf")
    if err != nil {
        log.Fatal(err)
    }
    if err := pdf.SetFont("Arabic", "", 24); err != nil {
        log.Fatal(err)
    }

    // Render Surah Al-Fatiha
    pdf.SetXY(50, 50)
    pdf.Cell(nil, arabicgo.ToArabic("بِسْمِ اللهِ الرَّحْمَنِ الرَّحِيمِ"))

    pdf.SetXY(50, 90)
    pdf.Cell(nil, arabicgo.ToArabic("الْحَمْدُ لله رَبِّ الْعَالَمِينَ"))

    // Numbers automatically convert to Arabic-Indic
    pdf.SetXY(50, 130)
    pdf.Cell(nil, arabicgo.ToArabic("سورة الفاتحة - 7 آيات"))

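    // WritePdf returns an error; report it instead of silently dropping it
    if err := pdf.WritePdf("quran.pdf"); err != nil {
        log.Fatal(err)
    }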
}

API Reference

Core Functions

ToArabic(text string) string

The main text processing function. Transforms Arabic text for proper visual rendering by:

  • Applying contextual character shaping
  • Forming required ligatures (Lam-Alef, Allah)
  • Preserving and positioning tashkeel marks
  • Converting Western numerals to Arabic-Indic
  • Reversing text for RTL display

Shape(text string) string

Alias for ToArabic(), kept for backward compatibility. Both names behave identically.

Utility Functions

IsTashkeel(r rune) bool

Check if a rune is an Arabic diacritical mark.

IsWesternDigit(r rune) bool

Check if a rune is a Western Arabic digit (0-9).

ToEasternDigit(r rune) rune

Convert a single Western digit to its Eastern Arabic-Indic equivalent.

GetShaddaLigature(vowel rune) rune

Get the combined Shadda+Vowel ligature character for a vowel.

Supported Characters

Arabic Alphabet

All 28 Arabic letters with full contextual forms:

ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي

Special Characters

ة (Teh Marbuta)    ى (Alef Maksura)    ء (Hamza)
أ (Alef + Hamza)   إ (Alef + Hamza Below)   آ (Alef + Maddah)
ؤ (Waw + Hamza)    ئ (Yeh + Hamza)

Persian/Urdu Extensions

پ (Peh)   چ (Tcheh)   ژ (Jeh)   گ (Gaf)   ک (Keheh)

Complete Tashkeel Set

Mark  Name              Unicode
َ     Fatha             U+064E
ُ     Damma             U+064F
ِ     Kasra             U+0650
ْ     Sukun             U+0652
ّ     Shadda            U+0651
ً     Tanween Fath      U+064B
ٌ     Tanween Damm      U+064C
ٍ     Tanween Kasr      U+064D
ٰ     Superscript Alef  U+0670
ٓ     Maddah Above      U+0653
ٔ     Hamza Above       U+0654
ٕ     Hamza Below       U+0655
ٖ     Subscript Alef    U+0656
ٗ     Inverted Damma    U+0657
٘     Noon Ghunna       U+0658

Shadda + Vowel Ligatures

The library automatically combines Shadda with vowels into single ligature characters:

Combination                Ligature
Shadda + Fatha             ﱠ (U+FC60)
Shadda + Damma             ﱡ (U+FC61)
Shadda + Kasra             ﱢ (U+FC62)
Shadda + Dammatan          ﱞ (U+FC5E)
Shadda + Kasratan          ﱟ (U+FC5F)
Shadda + Superscript Alef  ﱣ (U+FC63)

Why ArabicGo?

Many PDF libraries and image generators place glyphs left to right without shaping them, so Arabic text needs preprocessing:

  1. Contextual Shaping - Arabic letters take up to four forms (isolated, initial, medial, final) depending on position
  2. Right-to-Left Flow - Text must be reordered into visual order for renderers that draw left to right
  3. Tashkeel Complexity - Diacritical marks must stay attached to their base letters
  4. Ligature Requirements - Certain letter combinations must form ligatures

ArabicGo handles all of this automatically, allowing you to focus on your application logic while producing beautiful, correctly-rendered Arabic text - including the Holy Quran with complete tashkeel.
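Points 2 and 3 interact: a plain rune-by-rune reversal would move each diacritic in front of the wrong letter. The following is a minimal standalone sketch of one way to reverse text while keeping marks attached. It is not taken from the ArabicGo source; the name reverseClusters is hypothetical, and it assumes every diacritic is a Unicode nonspacing mark (category Mn).

```go
package main

import (
	"fmt"
	"unicode"
)

// reverseClusters reverses s for left-to-right glyph placement while keeping
// every combining mark (category Mn) directly after the base rune it modifies.
// Hypothetical helper for illustration only.
func reverseClusters(s string) string {
	runes := []rune(s)
	out := make([]rune, 0, len(runes))
	end := len(runes) // exclusive end of the cluster currently being scanned
	for i := len(runes) - 1; i >= 0; i-- {
		if unicode.Is(unicode.Mn, runes[i]) && i > 0 {
			continue // still inside a cluster; keep scanning back to its base
		}
		// runes[i:end] is one base rune plus its trailing marks.
		out = append(out, runes[i:end]...)
		end = i
	}
	return string(out)
}

func main() {
	in := "\u0628\u064E\u062A\u064F" // beh+fatha, teh+damma in logical order
	fmt.Printf("%U\n", []rune(reverseClusters(in)))
	// [U+062A U+064F U+0628 U+064E]
}
```

The cluster order is reversed, but inside each cluster the base letter still comes first, so a renderer that attaches marks to the preceding glyph keeps them on the right letter.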

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Author

AmmrFX - GitHub

Documentation

Overview

Package arabicgo provides Arabic text shaping and processing for Go applications. It handles Arabic character joining, ligatures, tashkeel (diacritical marks), and right-to-left text rendering.

Constants

const (
	FATHA  rune = '\u064E' // َ (short a)
	DAMMA  rune = '\u064F' // ُ (short u)
	KASRA  rune = '\u0650' // ِ (short i)
	SUKUN  rune = '\u0652' // ْ (no vowel)
	SHADDA rune = '\u0651' // ّ (gemination/doubling)

	// Tanween (nunation)
	TANWEEN_FATH rune = '\u064B' // ً (an)
	TANWEEN_DAMM rune = '\u064C' // ٌ (un)
	TANWEEN_KASR rune = '\u064D' // ٍ (in)

	// Quranic / Extended marks
	SUPERSCRIPT_ALEF rune = '\u0670' // ٰ (dagger alef)
	MADDAH_ABOVE     rune = '\u0653' // ٓ (maddah)
	HAMZA_ABOVE      rune = '\u0654' // ٔ (hamza above)
	HAMZA_BELOW      rune = '\u0655' // ٕ (hamza below)
	SUBSCRIPT_ALEF   rune = '\u0656' // ٖ (subscript alef)
	INVERTED_DAMMA   rune = '\u0657' // ٗ (inverted damma)
	MARK_NOON_GHUNNA rune = '\u0658' // ٘ (noon ghunna)

	// Shadda + Vowel Ligatures (Arabic Presentation Forms-A, U+FB50..U+FDFF)
	SHADDA_FATHA            rune = '\uFC60' // ﱠ
	SHADDA_DAMMA            rune = '\uFC61' // ﱡ
	SHADDA_KASRA            rune = '\uFC62' // ﱢ
	SHADDA_DAMMATAN         rune = '\uFC5E' // ﱞ (Shadda + Tanween Damm)
	SHADDA_KASRATAN         rune = '\uFC5F' // ﱟ (Shadda + Tanween Kasr)
	SHADDA_SUPERSCRIPT_ALEF rune = '\uFC63' // ﱣ

	// Eastern Arabic-Indic numerals (٠-٩)
	// Unicode range: U+0660 to U+0669
	ARABIC_INDIC_ZERO  rune = '\u0660' // ٠
	ARABIC_INDIC_ONE   rune = '\u0661' // ١
	ARABIC_INDIC_TWO   rune = '\u0662' // ٢
	ARABIC_INDIC_THREE rune = '\u0663' // ٣
	ARABIC_INDIC_FOUR  rune = '\u0664' // ٤
	ARABIC_INDIC_FIVE  rune = '\u0665' // ٥
	ARABIC_INDIC_SIX   rune = '\u0666' // ٦
	ARABIC_INDIC_SEVEN rune = '\u0667' // ٧
	ARABIC_INDIC_EIGHT rune = '\u0668' // ٨
	ARABIC_INDIC_NINE  rune = '\u0669' // ٩
)
const ALLAH_LIGATURE rune = 0xFDF2

ALLAH_LIGATURE is the Unicode character for the Allah ligature (U+FDF2 ﷲ)

Variables

var (
	ALEF_HAMZA_ABOVE = Harf{
		Unicode:   '\u0623',
		Isolated:  '\ufe83',
		Beginning: '\u0623',
		Middle:    '\ufe84',
		Final:     '\ufe84'}

	ALEF = Harf{
		Unicode:   '\u0627',
		Isolated:  '\ufe8d',
		Beginning: '\u0627',
		Middle:    '\ufe8e',
		Final:     '\ufe8e'}

	ALEF_MADDA_ABOVE = Harf{
		Unicode:   '\u0622',
		Isolated:  '\ufe81',
		Beginning: '\u0622',
		Middle:    '\ufe82',
		Final:     '\ufe82'}

	HAMZA = Harf{
		Unicode:   '\u0621',
		Isolated:  '\ufe80',
		Beginning: '\u0621',
		Middle:    '\u0621',
		Final:     '\u0621'}

	WAW_HAMZA_ABOVE = Harf{
		Unicode:   '\u0624',
		Isolated:  '\ufe85',
		Beginning: '\u0624',
		Middle:    '\ufe86',
		Final:     '\ufe86'}

	ALEF_HAMZA_BELOW = Harf{
		Unicode:   '\u0625',
		Isolated:  '\ufe87',
		Beginning: '\u0625',
		Middle:    '\ufe88',
		Final:     '\ufe88'}

	YEH_HAMZA_ABOVE = Harf{
		Unicode:   '\u0626',
		Isolated:  '\ufe89',
		Beginning: '\ufe8b',
		Middle:    '\ufe8c',
		Final:     '\ufe8a'}

	BEH = Harf{
		Unicode:   '\u0628',
		Isolated:  '\ufe8f',
		Beginning: '\ufe91',
		Middle:    '\ufe92',
		Final:     '\ufe90'}

	PEH = Harf{
		Unicode:   '\u067e',
		Isolated:  '\ufb56',
		Beginning: '\ufb58',
		Middle:    '\ufb59',
		Final:     '\ufb57'}

	TEH = Harf{
		Unicode:   '\u062A',
		Isolated:  '\ufe95',
		Beginning: '\ufe97',
		Middle:    '\ufe98',
		Final:     '\ufe96'}

	TEH_MARBUTA = Harf{
		Unicode:   '\u0629',
		Isolated:  '\ufe93',
		Beginning: '\u0629',
		Middle:    '\u0629',
		Final:     '\ufe94'}

	THEH = Harf{
		Unicode:   '\u062b',
		Isolated:  '\ufe99',
		Beginning: '\ufe9b',
		Middle:    '\ufe9c',
		Final:     '\ufe9a'}

	JEEM = Harf{
		Unicode:   '\u062c',
		Isolated:  '\ufe9d',
		Beginning: '\ufe9f',
		Middle:    '\ufea0',
		Final:     '\ufe9e'} // ـج

	TCHEH = Harf{
		Unicode:   '\u0686',
		Isolated:  '\ufb7a',
		Beginning: '\ufb7c',
		Middle:    '\ufb7d',
		Final:     '\ufb7b'}

	HAH = Harf{
		Unicode:   '\u062d',
		Isolated:  '\ufea1',
		Beginning: '\ufea3',
		Middle:    '\ufea4',
		Final:     '\ufea2'}

	KHAH = Harf{
		Unicode:   '\u062e',
		Isolated:  '\ufea5',
		Beginning: '\ufea7',
		Middle:    '\ufea8',
		Final:     '\ufea6'}

	DAL = Harf{
		Unicode:   '\u062f',
		Isolated:  '\ufea9',
		Beginning: '\u062f',
		Middle:    '\ufeaa',
		Final:     '\ufeaa'}

	THAL = Harf{
		Unicode:   '\u0630',
		Isolated:  '\ufeab',
		Beginning: '\u0630',
		Middle:    '\ufeac',
		Final:     '\ufeac'}

	REH = Harf{
		Unicode:   '\u0631',
		Isolated:  '\ufead',
		Beginning: '\u0631',
		Middle:    '\ufeae',
		Final:     '\ufeae'}

	JEH = Harf{
		Unicode:   '\u0698',
		Isolated:  '\ufb8a',
		Beginning: '\u0698',
		Middle:    '\ufb8b',
		Final:     '\ufb8b',
	}

	ZAIN = Harf{
		Unicode:   '\u0632',
		Isolated:  '\ufeaf',
		Beginning: '\u0632',
		Middle:    '\ufeb0',
		Final:     '\ufeb0'}

	SEEN = Harf{
		Unicode:   '\u0633',
		Isolated:  '\ufeb1',
		Beginning: '\ufeb3',
		Middle:    '\ufeb4',
		Final:     '\ufeb2'}

	SHEEN = Harf{
		Unicode:   '\u0634',
		Isolated:  '\ufeb5',
		Beginning: '\ufeb7',
		Middle:    '\ufeb8',
		Final:     '\ufeb6'}

	SAD = Harf{
		Unicode:   '\u0635',
		Isolated:  '\ufeb9',
		Beginning: '\ufebb',
		Middle:    '\ufebc',
		Final:     '\ufeba'}

	DAD = Harf{
		Unicode:   '\u0636',
		Isolated:  '\ufebd',
		Beginning: '\ufebf',
		Middle:    '\ufec0',
		Final:     '\ufebe'}

	TAH = Harf{
		Unicode:   '\u0637',
		Isolated:  '\ufec1',
		Beginning: '\ufec3',
		Middle:    '\ufec4',
		Final:     '\ufec2'}

	ZAH = Harf{
		Unicode:   '\u0638',
		Isolated:  '\ufec5',
		Beginning: '\ufec7',
		Middle:    '\ufec8',
		Final:     '\ufec6'}

	AIN = Harf{
		Unicode:   '\u0639',
		Isolated:  '\ufec9',
		Beginning: '\ufecb',
		Middle:    '\ufecc',
		Final:     '\ufeca'}

	GHAIN = Harf{
		Unicode:   '\u063a',
		Isolated:  '\ufecd',
		Beginning: '\ufecf',
		Middle:    '\ufed0',
		Final:     '\ufece'}

	FEH = Harf{
		Unicode:   '\u0641',
		Isolated:  '\ufed1',
		Beginning: '\ufed3',
		Middle:    '\ufed4',
		Final:     '\ufed2'}

	QAF = Harf{
		Unicode:   '\u0642',
		Isolated:  '\ufed5',
		Beginning: '\ufed7',
		Middle:    '\ufed8',
		Final:     '\ufed6'}

	KAF = Harf{
		Unicode:   '\u0643',
		Isolated:  '\ufed9',
		Beginning: '\ufedb',
		Middle:    '\ufedc',
		Final:     '\ufeda'}

	KEHEH = Harf{
		Unicode:   '\u06a9',
		Isolated:  '\ufb8e',
		Beginning: '\ufb90',
		Middle:    '\ufb91',
		Final:     '\ufb8f',
	}

	GAF = Harf{
		Unicode:   '\u06af',
		Isolated:  '\ufb92',
		Beginning: '\ufb94',
		Middle:    '\ufb95',
		Final:     '\ufb93'}

	LAM = Harf{
		Unicode:   '\u0644',
		Isolated:  '\ufedd',
		Beginning: '\ufedf',
		Middle:    '\ufee0',
		Final:     '\ufede'}

	MEEM = Harf{
		Unicode:   '\u0645',
		Isolated:  '\ufee1',
		Beginning: '\ufee3',
		Middle:    '\ufee4',
		Final:     '\ufee2'}

	NOON = Harf{
		Unicode:   '\u0646',
		Isolated:  '\ufee5',
		Beginning: '\ufee7',
		Middle:    '\ufee8',
		Final:     '\ufee6'}

	HEH = Harf{
		Unicode:   '\u0647',
		Isolated:  '\ufee9',
		Beginning: '\ufeeb',
		Middle:    '\ufeec',
		Final:     '\ufeea'}

	WAW = Harf{
		Unicode:   '\u0648',
		Isolated:  '\ufeed',
		Beginning: '\u0648',
		Middle:    '\ufeee',
		Final:     '\ufeee'}

	YEH = Harf{
		Unicode:   '\u06cc',
		Isolated:  '\ufbfc',
		Beginning: '\ufbfe',
		Middle:    '\ufbff',
		Final:     '\ufbfd'}

	ARABICYEH = Harf{
		Unicode:   '\u064a',
		Isolated:  '\ufef1',
		Beginning: '\ufef3',
		Middle:    '\ufef4',
		Final:     '\ufef2'}

	ALEF_MAKSURA = Harf{
		Unicode:   '\u0649',
		Isolated:  '\ufeef',
		Beginning: '\u0649',
		Middle:    '\ufef0',
		Final:     '\ufef0'}

	TATWEEL = Harf{
		Unicode:   '\u0640',
		Isolated:  '\u0640',
		Beginning: '\u0640',
		Middle:    '\u0640',
		Final:     '\u0640'}

	LAM_ALEF = Harf{
		Unicode:   '\ufefb',
		Isolated:  '\ufefb',
		Beginning: '\ufefb',
		Middle:    '\ufefc',
		Final:     '\ufefc'}

	LAM_ALEF_HAMZA_ABOVE = Harf{
		Unicode:   '\ufef7',
		Isolated:  '\ufef7',
		Beginning: '\ufef7',
		Middle:    '\ufef8',
		Final:     '\ufef8'}
)

Arabic Alphabet using the Harf type.

Functions

func GetShaddaLigature

func GetShaddaLigature(vowel rune) rune

GetShaddaLigature returns the combined Shadda+Vowel ligature for a given vowel. Returns 0 if no ligature exists for the vowel.
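The documented behaviour can be pictured as a table lookup. The sketch below is a standalone illustration built from the code points listed in this document. It is an assumption about how such a lookup might be written, not the package's actual implementation, and the names shaddaLigatures and shaddaLigature are hypothetical.

```go
package main

import "fmt"

// shaddaLigatures maps a vowel mark to the precomposed Shadda+vowel glyph,
// using the code points from the ligature table in this document.
var shaddaLigatures = map[rune]rune{
	0x064E: 0xFC60, // Fatha
	0x064F: 0xFC61, // Damma
	0x0650: 0xFC62, // Kasra
	0x064C: 0xFC5E, // Dammatan
	0x064D: 0xFC5F, // Kasratan
	0x0670: 0xFC63, // Superscript Alef
}

// shaddaLigature returns the combined glyph, or 0 when the vowel has none
// (a missing map key yields the zero rune).
func shaddaLigature(vowel rune) rune {
	return shaddaLigatures[vowel]
}

func main() {
	fmt.Printf("%U\n", shaddaLigature(0x064E)) // U+FC60
	fmt.Printf("%U\n", shaddaLigature(0x0652)) // U+0000 (Sukun has no ligature)
}
```

Callers would compare the result against 0 before substituting, since marks such as Sukun have no combined form in the table.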

func IsTashkeel

func IsTashkeel(r rune) bool

IsTashkeel returns true if the rune is an Arabic diacritical mark
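As a rough illustration, the marks listed in the Constants section occupy the contiguous range U+064B..U+0658 plus U+0670, so a membership test can be a range check. The sketch below is standalone and assumes that exact set; whether the package's IsTashkeel covers additional marks is not stated here. The helper names isTashkeel and stripTashkeel are hypothetical.

```go
package main

import "fmt"

// isTashkeel reports whether r is one of the marks listed in this document:
// U+064B..U+0658 (tanween, harakat, shadda, sukun, Quranic marks) or U+0670.
func isTashkeel(r rune) bool {
	return (r >= 0x064B && r <= 0x0658) || r == 0x0670
}

// stripTashkeel returns s without those marks, e.g. for search or comparison.
func stripTashkeel(s string) string {
	out := make([]rune, 0, len(s))
	for _, r := range s {
		if !isTashkeel(r) {
			out = append(out, r)
		}
	}
	return string(out)
}

func main() {
	word := "\u0628\u0650\u0633\u0652\u0645\u0650" // beh+kasra, seen+sukun, meem+kasra
	fmt.Println(len([]rune(word)), len([]rune(stripTashkeel(word)))) // 6 3
}
```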

func IsWesternDigit

func IsWesternDigit(r rune) bool

IsWesternDigit returns true if the rune is a Western Arabic digit (0-9).

func Shape

func Shape(text string) string

Shape is an alias for ToArabic for backward compatibility.

func ToArabic

func ToArabic(text string) string

ToArabic processes Arabic text for proper display. It handles character joining, ligatures (Lam-Alef, Allah), tashkeel, converts Western digits (0-9) to Eastern Arabic-Indic (٠-٩), and reverses the text for RTL rendering.

func ToEasternDigit

func ToEasternDigit(r rune) rune

ToEasternDigit converts a Western Arabic digit (0-9) to Eastern Arabic-Indic (٠-٩). Returns the original rune if not a Western digit.
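Because both digit ranges are contiguous, the conversion reduces to an offset. The following standalone sketch mirrors the documented behaviour under that assumption. It is not the package source, and convertDigits is a hypothetical helper added for the usage example.

```go
package main

import "fmt"

// isWesternDigit reports whether r is an ASCII digit '0'..'9'.
func isWesternDigit(r rune) bool {
	return r >= '0' && r <= '9'
}

// toEasternDigit maps '0'..'9' onto U+0660..U+0669 and returns any other
// rune unchanged.
func toEasternDigit(r rune) rune {
	if !isWesternDigit(r) {
		return r
	}
	return r - '0' + 0x0660
}

// convertDigits applies toEasternDigit to every rune of s.
func convertDigits(s string) string {
	out := []rune(s)
	for i, r := range out {
		out[i] = toEasternDigit(r)
	}
	return string(out)
}

func main() {
	fmt.Println(convertDigits("Ayah 255")) // Ayah ٢٥٥
}
```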

Types

type Harf

type Harf struct {
	Unicode, Isolated, Beginning, Middle, Final rune
}

Harf is the Arabic word for "letter". A Harf holds an Arabic character together with its positional glyph forms: isolated, beginning, middle, and final.
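To show how the four fields could drive contextual shaping, here is a minimal standalone sketch. It redeclares Harf locally and copies three entries from the Variables section. The joining rule is an assumption inferred from that data (a letter whose Beginning equals its base code point does not connect to the next letter), not a documented guarantee, and the function names are hypothetical.

```go
package main

import "fmt"

// Harf mirrors the struct documented above.
type Harf struct {
	Unicode, Isolated, Beginning, Middle, Final rune
}

// Values copied from the Variables section.
var (
	alef = Harf{0x0627, 0xFE8D, 0x0627, 0xFE8E, 0xFE8E}
	beh  = Harf{0x0628, 0xFE8F, 0xFE91, 0xFE92, 0xFE90}
	meem = Harf{0x0645, 0xFEE1, 0xFEE3, 0xFEE4, 0xFEE2}
)

// Assumed convention: a form equal to the base code point means "no such join".
func joinsNext(h Harf) bool { return h.Beginning != h.Unicode }
func joinsPrev(h Harf) bool { return h.Final != h.Unicode }

// shape picks one presentation form per letter, in logical (unreversed) order.
func shape(word []Harf) []rune {
	out := make([]rune, len(word))
	for i, h := range word {
		prev := i > 0 && joinsNext(word[i-1]) && joinsPrev(h)
		next := i < len(word)-1 && joinsNext(h) && joinsPrev(word[i+1])
		switch {
		case prev && next:
			out[i] = h.Middle
		case prev:
			out[i] = h.Final
		case next:
			out[i] = h.Beginning
		default:
			out[i] = h.Isolated
		}
	}
	return out
}

func main() {
	// beh, alef, beh: alef does not join forward, so the last beh is isolated.
	fmt.Printf("%U\n", shape([]Harf{beh, alef, beh})) // [U+FE91 U+FE8E U+FE8F]
}
```

The sketch ignores tashkeel and ligatures; a full pipeline would skip over marks when looking for neighbours and substitute Lam-Alef before choosing forms.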
