dictutil

v0.2.1
Published: Mar 8, 2020 License: MIT

This repository contains a collection of tools and libraries to work with Kobo dictionaries, plus comprehensive documentation of Kobo's dictionary format.

Unlike previous attempts at working with Kobo dictionaries, dictutil fully supports every feature nickel does (word prefixes, Unicode, variants, images, etc.). It focuses on simplicity, correctness (prefix generation and other features are tested directly against libnickel's code and regexps, and v1 and v2 dictionaries are differentiated), and completeness (most of the research was done by reverse-engineering libnickel).

Dictutil consists of multiple tools and libraries:

  • dictutil provides commands for installing, removing, unpacking, packing, and performing low-level modifications and tests on Kobo dictionaries. All operations are intended to be correct, lossless, and deterministic.
  • dictgen simplifies creating full-featured dictionaries for Kobo eReaders, with support for images, Unicode prefixes, raw HTML, Markdown, and more.
  • dicthtml documents Kobo's dictionary format and how it works.
  • examples/gotdict-convert is a working example of using dictutil to convert GOTDict into a Kobo dictionary.
  • examples/webster1913-convert is a working example of using dictutil to convert Project Gutenberg's Webster's Unabridged Dictionary into a Kobo dictionary.
  • examples/dictzip-decompile is an experimental tool to convert a dictzip into a dictfile.
  • Library: kobodict provides support for reading, writing, encrypting, and decrypting Kobo dictionaries.
  • Library: dictgen provides the functionality of dictgen as a library.
  • Library: marisa provides self-contained CGO bindings for marisa-trie.

Dictutil implements version 2 of the Kobo dictionary format, which is supported by firmware versions 4.7.10364+.

For more information, see the documentation. If you just want a quick overview of the utilities provided, continue reading below.

Download

Usage

See the documentation for more detailed information and examples.

dictutil

Usage: dictutil command [options] [arguments]

Dictutil provides low-level utilities to manipulate Kobo dictionaries (v2).

Commands:
  install (I)          Install a dictzip file
  pack (p)             Pack a dictzip file
  prefix (x)           Calculate the prefix for a word
  uninstall (U)        Uninstall a dictzip file
  unpack (u)           Unpack a dictzip file
  help                 Show help for all commands

Options:
  -h, --help   Show this help text

Usage: dictutil install [options] dictzip

Options:
  -k, --kobo string      KOBOeReader path (default: automatically detected)
  -l, --locale string    Locale name to use (format: ALPHANUMERIC{2}; translation dictionaries are not supported) (default: detected from filename if in format dicthtml-**.zip)
  -n, --name string      Custom additional label for dictionary (ignored when replacing built-in dictionaries) (doesn't have any effect on 4.20.14601+)
  -b, --builtin string   How to handle built-in locales [replace = replace and prevent from syncing] [ignore = replace and leave syncing as-is] (default "replace")
  -h, --help             Show this help text

Note:
  If you are not replacing a built-in dictionary, the 'Enable searches on extra
  dictionaries patch' must be installed, or you will not be able to select
  your custom dictionary.
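The --locale default described above (detection from a filename of the form dicthtml-**.zip, with a two-character alphanumeric locale) can be illustrated with a short Go sketch. This is not dictutil's own code: the helper name localeFromFilename and the exact regular expression are assumptions made for illustration, based only on the help text.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// localeRe matches a dictzip filename of the form dicthtml-XX.zip, where XX
// is exactly two ASCII letters or digits. The pattern is an assumption
// derived from the help text, not taken from dictutil's source.
var localeRe = regexp.MustCompile(`^dicthtml-([A-Za-z0-9]{2})\.zip$`)

// localeFromFilename (hypothetical helper) returns the locale embedded in the
// base name of path, or false if the name does not match the pattern.
func localeFromFilename(path string) (string, bool) {
	m := localeRe.FindStringSubmatch(filepath.Base(path))
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	for _, p := range []string{"dicthtml-fr.zip", "/tmp/dicthtml-en-fr.zip", "custom.zip"} {
		if loc, ok := localeFromFilename(p); ok {
			fmt.Printf("%s: locale %q\n", p, loc)
		} else {
			fmt.Printf("%s: no locale detected, pass --locale explicitly\n", p)
		}
	}
}
```

Under these assumptions, a translation-style name such as dicthtml-en-fr.zip is rejected rather than guessed, which mirrors the note that translation dictionaries are not supported.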

Usage: dictutil uninstall [options] locale

Options:
  -k, --kobo string      KOBOeReader path (default: automatically detected)
  -b, --builtin string   How to handle built-in locales [normal = uninstall the same way as the UI] [delete = completely delete the entry (doesn't have any effect on 4.20.14601+)] [restore = download the original dictionary from Kobo again] (default "normal")
  -h, --help             Show this help text

Usage: dictutil pack [options] dictdir

Options:
  -o, --output string   The output dictzip filename (will be overwritten if it exists) (default "dicthtml.zip")
  -c, --crypt string    Encrypt the dictzip using the specified encryption method (format: method:keyhex)
  -h, --help            Show this help text

Usage: dictutil unpack [options] dictzip

Options:
  -o, --output string   The output directory (must not exist) (default: the basename of the input without the extension)
  -c, --crypt string    Decrypt the dictzip (if needed) using the specified encryption method (format: method:keyhex)
  -h, --help            Show this help text
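The --crypt option for pack and unpack takes a value in the form method:keyhex. As a rough illustration of how such a value could be split and validated, here is a minimal Go sketch. The helper parseCryptSpec is hypothetical, and "examplemethod" is a placeholder rather than a method name dictutil is known to accept.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// parseCryptSpec (hypothetical helper) splits a "method:keyhex" string into
// the method name and the key bytes decoded from hexadecimal.
func parseCryptSpec(spec string) (method string, key []byte, err error) {
	i := strings.IndexByte(spec, ':')
	if i <= 0 {
		return "", nil, errors.New("crypt spec must be in the form method:keyhex")
	}
	key, err = hex.DecodeString(spec[i+1:])
	if err != nil {
		return "", nil, fmt.Errorf("invalid key hex: %w", err)
	}
	if len(key) == 0 {
		return "", nil, errors.New("crypt spec has an empty key")
	}
	return spec[:i], key, nil
}

func main() {
	// "examplemethod" is a placeholder, not a real dictutil method name.
	method, key, err := parseCryptSpec("examplemethod:00112233")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("method=%s key=%x (%d bytes)\n", method, key, len(key))
}
```

Which methods and key lengths are actually valid is left to the documentation; the sketch only checks the overall shape of the value.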

Usage: dictutil prefix [options] word...

Options:
  -f, --format string   The output format (go-slice, go-map, csv, tsv, json-array, json-object) (default "json-array")
  -h, --help            Show this help text

dictgen

Usage: dictgen [options] dictfile...

Options:
  -o, --output string         The output filename (will be overwritten if it exists) (- is stdout) (default "dicthtml.zip")
  -c, --crypt string          Encrypt the dictzip using the specified encryption method (format: method:keyhex)
  -I, --image-method string   How to handle images (if an image path is relative, it is loaded from the current dir) (base64 - optimize and encode as base64, embed - add to dictzip, remove) (default "base64")
  -h, --help                  Show this help text

If multiple dictfiles (*.df) are provided, they will be merged (duplicate entries are fine; they will be shown in sequential order). To read from stdin, use - as the filename.

Note that the only usable image methods are currently remove and base64 (base64 requires firmware 4.20.14601+; older versions segfault in the in-book dictionary), because embedded dict:/// image URLs cause the webviews to appear blank (this is a nickel bug). See https://github.com/geek1011/dictutil/issues/1 for more details.
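To give a sense of what base64-encoding an image for inline use in HTML involves, the Go sketch below builds a data: URI from raw image bytes. It assumes the data URI form and skips the optimization step mentioned in the option description; the helper name dataURI is hypothetical, and nothing here is taken from dictgen's implementation.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

// dataURI (hypothetical helper) sniffs the MIME type of img from its leading
// bytes and returns it encoded as a base64 data: URI.
func dataURI(img []byte) string {
	mime := http.DetectContentType(img)
	return "data:" + mime + ";base64," + base64.StdEncoding.EncodeToString(img)
}

func main() {
	// Only the 8-byte PNG signature, which is enough for MIME sniffing here.
	// A real image would be read from disk instead.
	pngSig := []byte{0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}
	fmt.Println(dataURI(pngSig))
	// prints "data:image/png;base64,iVBORw0KGgo="
}
```

The resulting string can be placed directly in an img src attribute, which avoids referencing a separate file inside the dictzip.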

See https://pgaskin.net/dictutil/dictgen for more information about the dictfile format.

gotdict-convert

Usage: gotdict-convert [options]

Options:
  -g, --gotdict string   The path to the local copy of github.com/wjdp/gotdict. (default "./gotdict")
  -o, --output string    The output filename (will be overwritten if it exists) (- is stdout) (default "./gotdict.df")
  -I, --images           Include images in dictfile
  -h, --help             Show this help text

To convert the resulting dictfile into a dictzip, use dictgen.

webster1913-convert

Usage: webster1913-convert [options] gutenberg_webster1913_path

Options:
  -o, --output string   The output filename (will be overwritten if it exists) (- is stdout) (default "./webster1913.df")
      --dump            Instead of converting, dump the parsed dictionary to stdout as JSON (for debugging)
  -h, --help            Show this help text

Arguments:
  gutenberg_webster1913_path is the path to Project Gutenberg's Webster's 1913 dictionary. Use - to read from stdin.

To convert the resulting dictfile into a dictzip, use dictgen.

The original dictionary can be downloaded from Project Gutenberg (http://www.gutenberg.org/ebooks/29765.txt.utf-8).

dictzip-decompile

Usage: dictzip-decompile [options] dictzip

Options:
  -o, --output string   The output filename (will be overwritten if it exists) (- is stdout) (default "./decompiled.df")
  -r, --resources       Also extract referenced resources to the current directory (warning: any existing files will be overwritten, so it is recommended to run in an empty directory if enabled)
  -h, --help            Show this help text

Arguments:
  dictzip is the path to the dictzip to decompile.

To convert the resulting dictfile into a dictzip, use dictgen.

Note: The dictzip regenerated from the dictfile may not match the original exactly, but it will look the same, and certain bugs with prefixes and variants will be implicitly fixed by the conversion process (i.e. variant in wrong file, incorrect prefix, missing words in index file). All output is raw HTML, not Markdown.

This is an experimental tool, and the output may not be perfect on complex dictionaries.

Directories

  • cmd/dictgen: Command dictgen is a CLI wrapper around package dictgen.
  • cmd/dictutil: Command dictutil provides commands for installing, removing, unpacking, packing, and performing low-level modifications and tests on Kobo dictionaries.
  • dictgen: Package dictgen simplifies creating full-featured dictionaries for Kobo eReaders, with support for images, unicode prefixes, raw html, markdown, and more.
  • examples/dictzip-decompile: Command dictzip-decompile converts a dictzip into a dictfile.
  • examples/gotdict-convert: Command gotdict-convert converts GOTDict (https://github.com/wjdp/gotdict) to a dictgen dictfile.
  • examples/gotdict-convert/gotdict: Package gotdict parses GOTDict (https://github.com/wjdp/gotdict).
  • examples/webster1913-convert: Command webster1913-convert converts Project Gutenberg's Webster's 1913 Unabridged Dictionary to a dictgen dictfile.
  • examples/webster1913-convert/webster1913: Package webster1913 parses Project Gutenberg's Webster's 1913 Unabridged Dictionary (http://www.gutenberg.org/ebooks/29765.txt.utf-8).
  • kobodict: Package kobodict implements reading, writing, and other utilities for Kobo dictionaries (v2).
  • marisa: Package marisa provides a self-contained SWIG wrapper for marisa-trie (https://github.com/s-yata/marisa-trie).