Documentation ¶
Overview ¶
Package cedict provides a parser / tokenizer for reading entries from the CEDict Chinese dictionary project.
Tokenizing is done by creating a CEDict for an io.Reader r. It is the caller's responsibility to ensure that r provides a CEDict-formatted dictionary.
import "github.com/hermanschaaf/cedict"
...
c := cedict.New(r) // r is an io.Reader to the cedict file
Given a CEDict c, the dictionary is tokenized by repeatedly calling c.NextEntry(), which parses up to the next entry and returns an error if no more entries are found:
for {
	err := c.NextEntry()
	if err != nil {
		break
	}
	entry := c.Entry()
	fmt.Println(entry.Simplified, entry.Definitions[0])
}
To retrieve the current entry, the Entry method can be called. There is also a lower-level API available, using the bufio.Scanner Scan method. Using this lower-level API is the recommended way to read comments from the CEDict, should that be necessary.
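The scanner-based approach to reading comments can be illustrated with a plain bufio.Scanner. The following is a minimal sketch, not this package's implementation: the names parsed, parseEntry, and scanAll are hypothetical, and it assumes that comment lines start with "#" and that entries follow the "Traditional Simplified [pinyin] /definition/.../" layout.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parsed holds the fields of one dictionary line (illustrative type,
// not the package's Entry).
type parsed struct {
	Traditional string
	Simplified  string
	Pinyin      string
	Definitions []string
}

// parseEntry splits a line of the form
// "Traditional Simplified [pin1 yin1] /def1/def2/".
func parseEntry(line string) (parsed, bool) {
	open := strings.Index(line, "[")
	end := strings.Index(line, "]")
	if open < 0 || end < open {
		return parsed{}, false
	}
	words := strings.Fields(line[:open])
	if len(words) != 2 {
		return parsed{}, false
	}
	defs := strings.Trim(strings.TrimSpace(line[end+1:]), "/")
	return parsed{
		Traditional: words[0],
		Simplified:  words[1],
		Pinyin:      line[open+1 : end],
		Definitions: strings.Split(defs, "/"),
	}, true
}

// scanAll reads src line by line, collecting entries and comments
// separately. Lines that are neither are skipped.
func scanAll(src string) ([]parsed, []string, error) {
	var entries []parsed
	var comments []string
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		text := strings.TrimSpace(sc.Text())
		switch {
		case text == "":
			continue
		case strings.HasPrefix(text, "#"):
			comments = append(comments, text)
		default:
			if e, ok := parseEntry(text); ok {
				entries = append(entries, e)
			}
		}
	}
	return entries, comments, sc.Err()
}

func main() {
	src := "# sample comment\n一層 一层 [yi1 ceng2] /layer/\n"
	entries, comments, err := scanAll(src)
	fmt.Println(len(entries), len(comments), err)
}
```

Keeping comments in their own slice means a caller that only wants entries can ignore them, which is the same separation the token constants below express.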
Constants ¶
const (
	EntryToken = iota
	CommentToken
	ErrorToken
)
Variables ¶
var NoMoreEntries error = errors.New("No more entries to read")
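A sentinel error like this lets a caller tell a normal end of input apart from a real failure. The sketch below shows the pattern with local stand-ins only: errNoMore, stubReader, and countEntries are hypothetical names and do not belong to this package.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoMore is a local stand-in for a sentinel such as NoMoreEntries.
var errNoMore = errors.New("no more entries to read")

// stubReader is a hypothetical source that yields items, then the sentinel.
type stubReader struct {
	items []string
	pos   int
}

func (r *stubReader) next() (string, error) {
	if r.pos >= len(r.items) {
		return "", errNoMore
	}
	item := r.items[r.pos]
	r.pos++
	return item, nil
}

// countEntries drains r. Reaching the sentinel is a normal end of input;
// any other error is passed back to the caller.
func countEntries(r *stubReader) (int, error) {
	n := 0
	for {
		_, err := r.next()
		if errors.Is(err, errNoMore) {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		n++
	}
}

func main() {
	n, err := countEntries(&stubReader{items: []string{"一层", "一揽子"}})
	fmt.Println(n, err)
}
```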
Functions ¶
func ToPinyinTonemarks ¶
ToPinyinTonemarks takes a CEDICT pinyin representation and returns the concatenated pinyin with tone marks, e.g., yi1 lan3 zi5 => yīlǎnzi. This function is useful for customizing pinyin conversion in your own application. For example, to get the tone-marked pinyin of each character, pass each syllable of the original word separately: yi1 => yī, lan3 => lǎn, zi5 => zi.
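The conversion described above can be sketched as follows. This is an independent illustration of the common tone-placement convention, not the body of ToPinyinTonemarks: the names toToneMarks and markSyllable are hypothetical, and the sketch assumes lowercase input, tone digits 1 to 5, and "u:" as the spelling of ü.

```go
package main

import (
	"fmt"
	"strings"
)

// toneMarks maps each vowel to its marked forms for tones 1 to 4.
var toneMarks = map[rune][]rune{
	'a': []rune("āáǎà"),
	'e': []rune("ēéěè"),
	'i': []rune("īíǐì"),
	'o': []rune("ōóǒò"),
	'u': []rune("ūúǔù"),
	'ü': []rune("ǖǘǚǜ"),
}

// markSyllable converts one numbered syllable, e.g. "lan3", to "lǎn".
func markSyllable(s string) string {
	s = strings.ReplaceAll(s, "u:", "ü") // assumed spelling of ü
	r := []rune(s)
	if len(r) == 0 {
		return s
	}
	last := r[len(r)-1]
	if last < '1' || last > '5' {
		return s // no tone digit: leave unchanged
	}
	tone := int(last - '0')
	r = r[:len(r)-1]
	if tone == 5 {
		return string(r) // neutral tone carries no mark
	}
	idx := -1
	// Rule 1: 'a' or 'e' takes the mark.
	for i, c := range r {
		if c == 'a' || c == 'e' {
			idx = i
			break
		}
	}
	// Rule 2: in "ou", the 'o' takes the mark.
	if idx < 0 {
		for i, c := range r {
			if c == 'o' && i+1 < len(r) && r[i+1] == 'u' {
				idx = i
				break
			}
		}
	}
	// Rule 3: otherwise the last vowel takes the mark.
	if idx < 0 {
		for i := len(r) - 1; i >= 0; i-- {
			if _, ok := toneMarks[r[i]]; ok {
				idx = i
				break
			}
		}
	}
	if idx < 0 {
		return string(r) // no vowel found
	}
	r[idx] = toneMarks[r[idx]][tone-1]
	return string(r)
}

// toToneMarks converts space-separated syllables and concatenates them.
func toToneMarks(p string) string {
	var b strings.Builder
	for _, syl := range strings.Fields(p) {
		b.WriteString(markSyllable(syl))
	}
	return b.String()
}

func main() {
	fmt.Println(toToneMarks("yi1 lan3 zi5"))
	fmt.Println(toToneMarks("yi1 ceng2"))
}
```

Working one syllable at a time is what makes the per-character use case possible: calling markSyllable on each syllable gives the separate forms, while toToneMarks joins them.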
Types ¶
type CEDict ¶
CEDict is the basic tokenizer struct we use to read and parse new dictionary instances.
Example ¶
The following example demonstrates basic usage of the package. It uses a strings.Reader as the io.Reader, where you would normally use a file (for example, an *os.File).
dict := `一層 一层 [yi1 ceng2] /layer/
一攬子 一揽子 [yi1 lan3 zi5] /all-inclusive/undiscriminating/`
r := io.Reader(strings.NewReader(dict))
c := New(r)
for {
	err := c.NextEntry()
	if err != nil {
		// you may also compare the error to cedict.NoMoreEntries
		// to know whether the end was reached or some other problem
		// occurred.
		break
	}
	// get current entry
	entry := c.Entry()
	// print out some fields
	fmt.Printf("%s\t(%s)\t%s\n", entry.Simplified, entry.PinyinWithTones, entry.Definitions[0])
}
Output:
一层	(yīcéng)	layer
一揽子	(yīlǎnzi)	all-inclusive