pronlex

module v0.4.1 (latest)
Warning: This package is not in the latest version of its module.
Published: Feb 6, 2026 License: GPL-3.0

README

pronlex

pronlex is a pronunciation lexicon database with a server behind a simple HTTP API.


Lexicon server / Installation instructions

The utility scripts below (setup, import, start_server) require a working bash installation (preferably on a Linux system).

I. Installation
  1. Prerequisites

    If you're on Linux, you may need to install gcc and build-essential for the sqlite3 Go adapter to work properly:

    sudo apt-get install gcc build-essential
    
  2. Set up Go

    Download: https://golang.org/dl/ (Go 1.25 or higher)
    Installation instructions: https://golang.org/doc/install

  3. Install database support

    Sqlite3: On Linux systems with apt, run sudo apt install sqlite3

    MariaDB: On Linux systems with apt, run sudo apt install mariadb-server or similar (it should be version 10.1.3 or higher)

    Please note that you need to install both databases if you intend to run the unit tests or other automated tests.

  4. Clone the source code

    git clone https://github.com/stts-se/pronlex.git 
    cd pronlex
    
  5. Test (optional)

    go test ./...
    
  6. Set up MariaDB (optional)

    sudo mysql -u root < scripts/mariadb_setup.sql
    cd dbapi
    go test . -mariadb # run unit tests (optional)
    
II. Server setup
  1. Setup the pronlex server

    pronlex$ bash scripts/setup.sh -a <application folder> -e <db engine> -l <db location>*

    Example:

    bash scripts/setup.sh -a ~/wikispeech/sqlite -e sqlite
    

    Usage info:

    bash scripts/setup.sh -h
    

    This sets up the pronlex server with the specified database engine and db location, and installs a set of test data. The db location (-l, marked with * above) is optional for sqlite; if it is not specified, the application folder is used as the db location.

    The application folder is where databases and other resources will be installed. It can be any folder of your choice.

    If, for some reason, you are not using the above setup script to configure your pronlex installation, you need to configure MariaDB by running the MariaDB setup script as root:

    sudo mysql -u root < scripts/mariadb_setup.sql
    
  2. Import lexicon data (optional)

    pronlex$ bash scripts/import.sh -a <application folder> -e <db engine> -l <db location>* -f <lexdata git>

    Example:

    bash scripts/import.sh -a ~/wikispeech/sqlite -e sqlite -f ~/git_repos/wikispeech-lexdata
    

    This imports lexicon databases (SQL dumps) for Swedish, Norwegian and US English, plus a small set of test data for Arabic, from the wikispeech-lexdata repository. If the <lexdata git> folder exists on disk, lexicon resources are read from that folder; otherwise, the lexicon data is downloaded from GitHub. The db location (-l) is optional for sqlite; if it is not specified, the application folder is used as the db location.

    If you want to import other lexicon data, or just a subset of the data above, you can either create your own lexicon files or use existing data in the wikispeech-lexdata repository. The lexicon file format is described here: https://godoc.org/github.com/stts-se/pronlex/line.

III. Start the lexicon server

The server is started using this script:

pronlex$ bash scripts/start_server.sh -e <db engine> -l <db location>* -a <application folder>

Example:

bash scripts/start_server.sh -e sqlite -a ~/wikispeech/sqlite/

The startup script runs some initial tests on a separate test server before starting the standard server.

When the standard (non-testing) server is started, it always creates a demo database and lexicon, containing a few simple entries for demo and testing purposes. The server can thus be started and tested even if you haven't imported the lexicon data above.

For a complete set of options, run:

bash scripts/start_server.sh -h

IV. Advanced usage: Create a lexicon database file and look up a word (for Sqlite configuration)
  1. Download an SQL lexicon dump file. In the following example, we use a Swedish lexicon: https://github.com/stts-se/wikispeech-lexdata/blob/master/sv-se/nst/swe030224NST.pron-ws.utf8.sqlite.sql.gz

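  2. Pre-compile binaries (for faster execution times)

    pronlex$ go install ./...

    This installs the command line tools, including importSql and lexlookup used below, in $(go env GOPATH)/bin (or in $GOBIN, if set); make sure that folder is in your PATH. Note that go build ./... only checks that the packages compile and does not keep the binaries when several packages are built at once.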

  3. Create a database file (this takes a while):

    pronlex$ importSql -db_engine sqlite -db_location ~/wikispeech/sqlite/ -db_name sv_db swe030224NST.pron-ws.utf8.sqlite.sql.gz

  4. Test looking up a word:

    pronlex$ lexlookup -db_engine sqlite -db_location ~/wikispeech/sqlite/ -db_name sv_db -lexicon swe_lex åsna


This work was supported by the Swedish Post and Telecom Authority (PTS) through the grant "Wikispeech – en användargenererad talsyntes på Wikipedia" (2016–2017), and the Swedish Inheritance Fund (Allmänna arvsfonden) through the grant "Wikispeech – Talsyntes och taldatainsamlare." (2024–2027).

Directories

Path: Synopsis
cmd: Package cmd contains various command line tools for this repository.
decompounder: Package decompounder contains command line tools for parsing decomps for the Swedish NST lexicon.
decompounder/lexfile2decomps (command): TODO: Remove this file.
lexio: Package lexio contains main command line tools for importing and exporting lexicon files to/from the database, and for converting some external file formats into a format readable by the pronlex package.
lexio/convert: Package convert contains main command line tools for converting between some specific lexicon file formats.
lexio/createEmptyDB (command): createEmptyDB initialises an Sqlite3 relational database from the schema defining a lexicon database, but empty of data.
lexio/createEmptyLexicon (command): createEmptyLexicon initialises a database from the schema defining a lexicon database, but empty of data.
lexio/exportLex (command): Command line tool for exporting lexicons from the database to a file.
lexio/importLex (command): Command line tool for importing lexicon files into a lexicon database.
lexio/importSql (command)
lexlookup (command)
test_validator (command)
Package dbapi contains code wrapped around an SQL(ite3) DB.
Package lex is used for general 'container' classes such as entry, transcription, lemma, etc.
Package line is used to define lexicon line formats for parsing input and printing output.
Package validation is used to define entry validators and rules.
locale: Package locale is meant to be used to validate a locale specified for a lexicon.
rules: Package rules contains a few general validation rule types.
validators: Package validators contains a validator service for caching loaded validators, and it contains language and project specific validators.
