mdx

package module
v0.1.15 Latest
Published: Apr 24, 2026 License: MIT Imports: 36 Imported by: 0

README

MDX/MDD Parser for Go

This is a high-performance MDict (.mdx/.mdd) file parsing library written in Go. It supports querying dictionary content and retrieving dictionary metadata, and it provides a file system wrapper that implements the io/fs.FS interface, making it easy to integrate with other libraries in the Go ecosystem (such as HTTP servers).

This library was originally based on the terasum/medict project and has undergone extensive bug fixes, performance optimizations, and code refactoring.

Features

  • High-Performance Queries: BuildIndex() builds an in-memory exact-match index, so subsequent lookups are fast and stable.
  • MDX/MDD Support: Supports both .mdx (text dictionaries) and .mdd (resource files) formats.
  • Standard Interface: Implements the io/fs.FS interface, allowing dictionaries to be easily served as a file system.
  • Robust Error Handling: Comprehensive error handling and logging.
  • Complete Metadata: Provides an API to access all dictionary metadata (such as title, description, creation date, etc.).

Installation

go get github.com/lib-x/mdx

(Note: Please replace github.com/lib-x/mdx with the actual repository path)

Usage Examples

Example 1: Querying an MDX Dictionary

Here is a simple example of how to load an MDX dictionary and query a word.

package main

import (
	"fmt"
	"log"

	"github.com/lib-x/mdx" // Assuming this is the module path
)

func main() {
	// 1. Create a new Mdict instance
	// Replace "path/to/your/dictionary.mdx" with your MDX file path
	mdict, err := mdx.New("path/to/your/dictionary.mdx")
	if err != nil {
		log.Fatalf("Failed to load dictionary file: %v", err)
	}

	// 2. Build the index (recommended to be done once at program startup)
	err = mdict.BuildIndex()
	if err != nil {
		log.Fatalf("Failed to build dictionary index: %v", err)
	}

	// 3. Print dictionary information
	fmt.Printf("Dictionary Title: %s\n", mdict.Title())
	fmt.Printf("Dictionary Description: %s\n", mdict.Description())

	// 4. Query a word
	word := "hello"
	definition, err := mdict.Lookup(word)
	if err != nil {
		log.Fatalf("Failed to look up word '%s': %v", word, err)
	}

	fmt.Printf("Definition of '%s':\n%s\n", word, string(definition))

	// 5. Query a non-existent word
	word = "nonexistentword"
	_, err = mdict.Lookup(word)
	if err != nil {
		fmt.Printf("As expected, an error occurred when querying a non-existent word '%s': %v\n", word, err)
	}
}

Example 1.1: Exporting an External Index

If you want to store the index in Redis, SQL, or another external system, export the index entries and store them yourself. Later, load one entry back and resolve it to the real definition.

package main

import (
	"fmt"
	"log"

	"github.com/lib-x/mdx"
)

func main() {
	dict, err := mdx.New("path/to/your/dictionary.mdx")
	if err != nil {
		log.Fatal(err)
	}
	if err := dict.BuildIndex(); err != nil {
		log.Fatal(err)
	}

	info := dict.DictionaryInfo()
	fmt.Printf("title=%s entries=%d\n", info.Title, info.EntryCount)

	entries, err := dict.ExportIndex()
	if err != nil {
		log.Fatal(err)
	}

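	// Store entries in Redis / DB here.
	// Guard against an empty index before taking the first entry.
	if len(entries) == 0 {
		log.Fatal("dictionary index is empty")
	}
	first := entries[0]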

	content, err := dict.Resolve(first)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("resolved %q -> %d bytes\n", first.Keyword, len(content))
}

Example 1.2: Explicitly Splitting Dictionary Entries and Resource Entries

mdxDict, _ := mdx.New("path/to/dictionary.mdx")
_ = mdxDict.BuildIndex()
entries, _ := mdxDict.ExportEntries()

mddDict, _ := mdx.New("path/to/dictionary.mdd")
_ = mddDict.BuildIndex()
resources, _ := mddDict.ExportResources()

fmt.Println(len(entries), len(resources))

Example 1.3: Storing the Exported Index in an External Store

The library now exposes a minimal IndexStore boundary. You can implement it for Redis, SQL, or another backend. A small in-memory implementation is included:

store := mdx.NewMemoryIndexStore()

info := dict.DictionaryInfo()
entries, _ := dict.ExportEntries()
_ = store.Put(info, entries)

entry, _ := store.GetExact(info.Name, "ability")
content, _ := dict.Resolve(entry)
fmt.Println(len(content))

For lifecycle-aware external indexing, use EnsureDictionaryIndex(...) with a ManagedIndexStore. It reuses unchanged indexes via a manifest/fingerprint check, supports a missing-source TTL, and keeps Redis as just one of several possible backend implementations:

store := mdx.NewRedisIndexStore(client)

result, err := mdx.EnsureDictionaryIndex(
    "/path/to/dictionary.mdx",
    store,
    mdx.WithReuseIfUnchanged(true),
    mdx.WithMissingSourceTTL(24*time.Hour),
)
if err != nil {
    log.Fatal(err)
}
fmt.Println(result.Reused, result.Rebuilt)

If you only need the export/resolve path without building in-memory exact lookup tables, call PrepareForExternalIndex() instead of BuildIndex().

Example 2: Listing MDD Resource Files

MDD files typically contain resources like audio and images. The following example shows how to list all resources in an MDD file.

package main

import (
	"fmt"
	"log"

	"github.com/lib-x/mdx" // Assuming this is the module path
)

func main() {
	// 1. Load the MDD file
	// Replace "path/to/your/resource.mdd" with your MDD file path
	mdd, err := mdx.New("path/to/your/resource.mdd")
	if err != nil {
		log.Fatalf("Failed to load MDD file: %v", err)
	}

	// 2. Build the index
	err = mdd.BuildIndex()
	if err != nil {
		log.Fatalf("Failed to build MDD index: %v", err)
	}

	// 3. Get and print all keyword entries (in MDD, this is usually the file path)
	entries, err := mdd.GetKeyWordEntries()
	if err != nil {
		log.Fatalf("Failed to get keyword entries: %v", err)
	}

	fmt.Printf("Found %d resource files in '%s':\n", len(entries), mdd.Name())
	for i, entry := range entries {
		// Print only the first 10 as an example
		if i >= 10 {
			break
		}
		fmt.Println(entry.KeyWord) // The KeyWord field stores the resource file path
	}
}

Example 3: Extracting Resource References from MDX Content

MDX entries often contain references to CSS, JavaScript, image, and audio resources that are actually stored in the companion MDD file.

definition, err := mdict.Lookup("accordion")
if err != nil {
	log.Fatal(err)
}

refs := mdx.ExtractResourceRefs(definition)
for _, ref := range refs {
	fmt.Println(ref)
}

fmt.Println(mdx.NormalizeMDDKey("accordion_concertina.jpg"))
// Output: \accordion_concertina.jpg

Example 4: Serving MDX HTML and MDD Assets over Go HTTP

package main

import (
	"log"
	"net/http"

	"github.com/lib-x/mdx"
)

func main() {
	mdxDict, err := mdx.New("path/to/dictionary.mdx")
	if err != nil {
		log.Fatal(err)
	}
	if err := mdxDict.BuildIndex(); err != nil {
		log.Fatal(err)
	}

	mddDict, err := mdx.New("path/to/dictionary.mdd")
	if err != nil {
		log.Fatal(err)
	}
	if err := mddDict.BuildIndex(); err != nil {
		log.Fatal(err)
	}

	http.Handle("/assets/", http.StripPrefix("/assets/", mdx.NewAssetHandler(mddDict)))

	http.HandleFunc("/entry", func(w http.ResponseWriter, r *http.Request) {
		word := r.URL.Query().Get("word")
		content, err := mdx.LookupAndRewriteHTMLWithEntryBase(mdxDict, word, "/assets", "/entry?word=")
		if err != nil {
			http.Error(w, err.Error(), http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		_, _ = w.Write(content)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}

LookupAndRewriteHTML rewrites resource references such as:

  • oalecd9.css -> /assets/oalecd9.css
  • thumb_accordion.jpg -> /assets/thumb_accordion.jpg
  • snd://ability__gb_1.spx -> /assets/snd:%2F%2Fability__gb_1.spx

LookupAndRewriteHTMLWithEntryBase additionally rewrites internal entry://word links into browser-servable lookup URLs such as /entry?word=word, normalizes malformed entry://entry://... links, and upgrades anchor-based sound:// / snd:// audio links into <audio controls ...> output.
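
To make the entry-link rewriting described above more concrete, here is a minimal sketch that uses only the standard library. The helper name rewriteEntryLinks and the regular expression are assumptions made for illustration; they are not the library's implementation, and the sketch covers only href attributes, not the audio upgrade.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// entryHref matches href attributes that start with one or more "entry://"
// prefixes, which also covers malformed "entry://entry://..." links.
var entryHref = regexp.MustCompile(`href="(?:entry://)+([^"]+)"`)

// rewriteEntryLinks is a hypothetical helper (not part of the library).
// It replaces each entry:// target with entryBase plus the query-escaped word.
func rewriteEntryLinks(html, entryBase string) string {
	return entryHref.ReplaceAllStringFunc(html, func(match string) string {
		word := entryHref.FindStringSubmatch(match)[1]
		return `href="` + entryBase + url.QueryEscape(word) + `"`
	})
}

func main() {
	in := `<a href="entry://entry://big word">see also</a>`
	fmt.Println(rewriteEntryLinks(in, "/entry?word="))
}
```

Escaping the word with url.QueryEscape keeps headwords that contain spaces or punctuation valid inside the query string.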

NewAssetHandler now serves resolver-backed assets through http.ServeContent, which means browsers can make Range requests against large image/audio assets.

For callers that need explicit HTTP cache semantics, NewAssetHandlerWithOptions can customize Cache-Control and enable ETag / Last-Modified headers. Conditional requests using If-None-Match and If-Modified-Since are also supported through the same ServeContent-based path.

Note on audio playback: this library can resolve and serve raw audio resources from MDX/MDD files (including real sound://... / snd://... references), but browser playback still depends on whether the client can decode the underlying audio format. In particular, .spx (Speex) assets usually require transcoding or an application-level playback backend outside this core library.

A runnable demo is available at examples/http-server:

go run ./examples/http-server \
  --mdx /path/to/dictionary.mdx \
  --mdd /path/to/dictionary.mdd \
  --listen :8080

A Redis-backed variant is also available:

go run ./examples/http-server-redis \
  --mdx /path/to/dictionary.mdx \
  --mdd /path/to/dictionary.mdd \
  --redis 127.0.0.1:6379 \
  --listen :8080

Contributing

Issues and Pull Requests are welcome.

Local fixture tests

Real .mdx / .mdd fixtures are not stored in this repository.

Set MDX_TESTDICT_DIR to a local directory containing the external fixture pair, for example:

MDX_TESTDICT_DIR="/path/to/local/dictionary-dir" go test ./... -run "TestIntegration|TestMdict|TestMdictFS" -v

Without local fixtures, external integration tests will skip automatically.

License

This project is licensed under the GNU General Public License v3.0.

An initial in-memory reference implementation of fuzzy search is available as MemoryFuzzyIndexStore. It is suitable for tests and demos; production fuzzy search should still live in an external store or service.

Dictionary Library

A multi-dictionary registry is available for directories containing many .mdx / .mdd pairs. The registry now auto-discovers companion resource chains such as demo.mdd, demo.1.mdd, demo.2.mdd, and composes resolver-backed sidecar-first resource lookup by default.

Core APIs:

  • ScanDirectory(root string)
  • DictionaryRegistry
  • OpenDictionary(id string)
  • LibrarySearch(query, limit)

Runnable example:

go run ./examples/http-library --root /path/to/dictionaries --listen :8080

Routes:

  • /dict/{id}/entry?word=...
  • /dict/{id}/assets/...
  • /library/search?q=...

Documentation

Index

Constants

const (
	// MdictTypeMdd indicates an MDD file.
	MdictTypeMdd MdictType = 1
	// MdictTypeMdx indicates an MDX file.
	MdictTypeMdx MdictType = 2

	// EncryptNoEnc indicates no encryption.
	EncryptNoEnc = 0
	// EncryptRecordEnc indicates record block encryption.
	EncryptRecordEnc = 1
	// EncryptKeyInfoEnc indicates key info block encryption.
	EncryptKeyInfoEnc = 2
	// NumfmtBe8bytesq represents big-endian 8-byte unsigned integer.
	NumfmtBe8bytesq = 0
	// NumfmtBe4bytesi represents big-endian 4-byte unsigned integer.
	NumfmtBe4bytesi = 1
	// EncodingUtf8 represents UTF-8 encoding.
	EncodingUtf8 = 0
	// EncodingUtf16 represents UTF-16 encoding.
	EncodingUtf16 = 1
	// EncodingBig5 represents Big5 encoding.
	EncodingBig5 = 2
	// EncodingGbk represents GBK encoding.
	EncodingGbk = 3
	// EncodingGb2312 represents GB2312 encoding.
	EncodingGb2312 = 4
	// EncodingGb18030 represents GB18030 encoding.
	EncodingGb18030 = 5
)

Variables

var ErrIndexMiss = errors.New("index entry not found")

ErrIndexMiss is returned when no entry exists in the external index store.

var ErrWordNotFound = errors.New("word not found")

ErrWordNotFound is returned when a word is not found in the dictionary.

Functions

func AssetLookupCandidates

func AssetLookupCandidates(ref string) []string

AssetLookupCandidates expands a raw asset reference into possible storage lookup candidates.

func AssetURL

func AssetURL(basePath, ref string) string

AssetURL returns a browser-safe URL for a raw asset reference.

func BuildRangeTree

func BuildRangeTree(list []*MdictRecordBlockInfoListItem, root *RecordBlockRangeTreeNode)

BuildRangeTree constructs a range tree from a list of record block info items. This tree allows for efficient querying of record blocks based on an offset.
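
As an illustration of the offset query such a tree answers, the sketch below finds the block containing an offset with a binary search over sorted, non-overlapping ranges. This is a swapped-in technique (sort.Search rather than a tree), and the block type with half-open [Start, End) spans is an assumption, not the library's MdictRecordBlockInfoListItem.

```go
package main

import (
	"fmt"
	"sort"
)

// block is a hypothetical record block span covering offsets [Start, End).
type block struct {
	Start, End int64
}

// findBlock returns the index of the block containing offset.
// blocks must be sorted by Start and must not overlap.
func findBlock(blocks []block, offset int64) (int, bool) {
	// Find the first block that ends after the offset.
	i := sort.Search(len(blocks), func(i int) bool { return blocks[i].End > offset })
	if i < len(blocks) && blocks[i].Start <= offset {
		return i, true
	}
	return -1, false
}

func main() {
	blocks := []block{{0, 100}, {100, 250}, {250, 400}}
	fmt.Println(findBlock(blocks, 100))
	fmt.Println(findBlock(blocks, 400))
}
```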

func ConfigureDictionaryPairAssets added in v0.1.10

func ConfigureDictionaryPairAssets(spec DictionarySpec, mdxDict *Mdict, mddDicts ...*Mdict)

ConfigureDictionaryPairAssets composes the default shared asset resolver for a dictionary pair.

func ExtractResourceRefs

func ExtractResourceRefs(content []byte) []string

ExtractResourceRefs extracts resource-like references from MDX entry content.

func IsResourceRef

func IsResourceRef(ref string) bool

IsResourceRef reports whether the reference looks like an external asset instead of an internal entry link.

func LookupAndRewriteHTML

func LookupAndRewriteHTML(dict *Mdict, word, assetBasePath string) ([]byte, error)

LookupAndRewriteHTML looks up an MDX entry and rewrites its asset URLs for web delivery.

func LookupAndRewriteHTMLWithEntryBase added in v0.1.10

func LookupAndRewriteHTMLWithEntryBase(dict *Mdict, word, assetBasePath, entryBasePath string) ([]byte, error)

LookupAndRewriteHTMLWithEntryBase looks up an MDX entry and rewrites its asset and entry URLs for web delivery.

func NewAssetHandler

func NewAssetHandler(mdd *Mdict) http.Handler

NewAssetHandler creates an HTTP handler that serves MDD-backed assets by raw reference name.

func NewAssetHandlerWithOptions added in v0.1.10

func NewAssetHandlerWithOptions(mdd *Mdict, opts AssetHandlerOptions) http.Handler

NewAssetHandlerWithOptions creates an HTTP handler that serves MDD-backed assets with configurable HTTP semantics.

func NormalizeMDDKey

func NormalizeMDDKey(name string) string

NormalizeMDDKey normalizes MDD resource names to the dictionary-internal key format.
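
The README shows one concrete case: "accordion_concertina.jpg" becomes "\accordion_concertina.jpg". The sketch below is one normalization consistent with that example. The function normalizeKey is hypothetical, and converting forward slashes to backslashes is an assumption about the key format rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeKey is a hypothetical stand-in, not the library's implementation.
// It converts "/" to "\" (assumed) and ensures a single leading backslash.
func normalizeKey(name string) string {
	key := strings.ReplaceAll(name, "/", `\`)
	if !strings.HasPrefix(key, `\`) {
		key = `\` + key
	}
	return key
}

func main() {
	fmt.Println(normalizeKey("accordion_concertina.jpg"))
	fmt.Println(normalizeKey("img/thumb.png"))
}
```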

func RewriteEntryAudioLinks

func RewriteEntryAudioLinks(content []byte, assetBasePath string) []byte

RewriteEntryAudioLinks rewrites anchor-based sound:// and snd:// links into HTML audio controls.

func RewriteEntryInternalLinks

func RewriteEntryInternalLinks(content []byte) []byte

RewriteEntryInternalLinks normalizes malformed internal entry links commonly seen in MDict HTML.

func RewriteEntryLookupLinks

func RewriteEntryLookupLinks(content []byte, entryBasePath string) []byte

RewriteEntryLookupLinks rewrites entry:// word links into browser-servable lookup URLs.

func RewriteEntryResourceURLs

func RewriteEntryResourceURLs(content []byte, assetBasePath string) []byte

RewriteEntryResourceURLs rewrites asset references inside MDX HTML into browser-servable URLs.

Types

type AssetHandlerOptions added in v0.1.10

type AssetHandlerOptions struct {
	CacheControl       string
	EnableETag         bool
	EnableLastModified bool
}

AssetHandlerOptions configures HTTP delivery behavior for resolver-backed assets.

type AssetResolver added in v0.1.10

type AssetResolver struct {
	// contains filtered or unexported fields
}

AssetResolver resolves dictionary assets from ordered sources and follows MDict-style redirects.

func NewAssetResolver added in v0.1.10

func NewAssetResolver(mdd *Mdict, opts ...AssetResolverOption) *AssetResolver

NewAssetResolver constructs an AssetResolver.

Ordered options are evaluated first. When mdd is non-nil it is appended after those sources, preserving GoldenDict-style sidecar-first precedence by default.

func (*AssetResolver) Open added in v0.1.10

func (r *AssetResolver) Open(ref string) (fs.File, error)

Open opens an asset by reference from the configured sources.

func (*AssetResolver) Read added in v0.1.10

func (r *AssetResolver) Read(ref string) ([]byte, error)

Read reads a resolved asset fully into memory.

type AssetResolverOption added in v0.1.10

type AssetResolverOption func(*AssetResolver)

AssetResolverOption configures an AssetResolver.

func WithAssetMdicts added in v0.1.10

func WithAssetMdicts(dicts ...*Mdict) AssetResolverOption

WithAssetMdicts appends ordered MDD-backed asset sources to the resolver.

func WithAssetSidecarDir added in v0.1.10

func WithAssetSidecarDir(dir string) AssetResolverOption

WithAssetSidecarDir appends an on-disk sidecar directory used as a fallback source for assets.

func WithAssetSidecarFS added in v0.1.10

func WithAssetSidecarFS(fsys fs.FS) AssetResolverOption

WithAssetSidecarFS appends a sidecar filesystem used as a fallback source for assets.

func WithAssetSource added in v0.1.10

func WithAssetSource(source AssetSource) AssetResolverOption

WithAssetSource appends an ordered asset source to the resolver.

type AssetSource added in v0.1.10

type AssetSource interface {
	ReadAsset(ref string) ([]byte, error)
}

AssetSource provides raw dictionary asset bytes for a logical reference.

type Dictionary

type Dictionary struct {
	XMLName                  xml.Name `xml:"Dictionary"`
	Text                     string   `xml:"chardata"`
	GeneratedByEngineVersion string   `xml:"GeneratedByEngineVersion,attr"`
	RequiredEngineVersion    string   `xml:"RequiredEngineVersion,attr"`
	Encrypted                string   `xml:"Encrypted,attr"`
	Encoding                 string   `xml:"Encoding,attr"`
	IsUTF16                  string   `xml:"IsUTF16,attr"`
	Format                   string   `xml:"Format,attr"`
	StripKey                 string   `xml:"StripKey,attr"`
	CreationDate             string   `xml:"CreationDate,attr"`
	Compact                  string   `xml:"Compact,attr"`
	Compat                   string   `xml:"Compat,attr"`
	KeyCaseSensitive         string   `xml:"KeyCaseSensitive,attr"`
	Description              string   `xml:"Description,attr"`
	Title                    string   `xml:"Title,attr"`
	DataSourceFormat         string   `xml:"DataSourceFormat,attr"`
	StyleSheet               string   `xml:"StyleSheet,attr"`
	Left2Right               string   `xml:"Left2Right,attr"`
	RegisterBy               string   `xml:"RegisterBy,attr"`
}

Dictionary was generated 2023-09-11 11:07:50 by https://xml-to-go.github.io/ in Ukraine.

type DictionaryInfo

type DictionaryInfo struct {
	Name                     string `json:"name"`
	Title                    string `json:"title"`
	Description              string `json:"description"`
	CreationDate             string `json:"creation_date"`
	GeneratedByEngineVersion string `json:"generated_by_engine_version"`
	Version                  string `json:"version"`
	IsMDD                    bool   `json:"is_mdd"`
	IsUTF16                  bool   `json:"is_utf16"`
	IsRecordEncrypted        bool   `json:"is_record_encrypted"`
	EntryCount               int64  `json:"entry_count"`
}

DictionaryInfo describes dictionary metadata suitable for external indexing.

type DictionaryRegistry

type DictionaryRegistry struct {
	// contains filtered or unexported fields
}

DictionaryRegistry manages multiple dictionary pairs discovered from disk.

func NewDictionaryRegistry

func NewDictionaryRegistry() *DictionaryRegistry

NewDictionaryRegistry creates an empty registry.

func (*DictionaryRegistry) LibrarySearch

func (r *DictionaryRegistry) LibrarySearch(query string, limit int) ([]LibrarySearchHit, error)

LibrarySearch performs a library-wide fuzzy search using the in-memory fuzzy store.

func (*DictionaryRegistry) List

func (r *DictionaryRegistry) List() []DictionarySpec

List returns the known dictionary specs.

func (*DictionaryRegistry) LoadDirectory

func (r *DictionaryRegistry) LoadDirectory(root string) error

LoadDirectory scans and loads specs from a root directory into the registry.

func (*DictionaryRegistry) LoadSpecs

func (r *DictionaryRegistry) LoadSpecs(specs []DictionarySpec)

LoadSpecs replaces the registry contents with the provided specs.

func (*DictionaryRegistry) OpenDictionary

func (r *DictionaryRegistry) OpenDictionary(id string) (*Mdict, *Mdict, error)

OpenDictionary opens a dictionary pair by ID, lazily building indexes as needed.

type DictionarySpec

type DictionarySpec struct {
	ID       string   `json:"id"`
	Name     string   `json:"name"`
	MDXPath  string   `json:"mdx_path"`
	MDDPath  string   `json:"mdd_path,omitzero"`
	MDDPaths []string `json:"mdd_paths,omitempty"`
}

DictionarySpec describes one discoverable dictionary pair.

func ScanDirectory

func ScanDirectory(root string) ([]DictionarySpec, error)

ScanDirectory scans a root directory for MDX/MDD pairs and returns their specs.

type EnsureIndexResult added in v0.1.12

type EnsureIndexResult struct {
	DictionaryName string        `json:"dictionary_name"`
	Reused         bool          `json:"reused"`
	Rebuilt        bool          `json:"rebuilt"`
	Manifest       IndexManifest `json:"manifest"`
}

EnsureIndexResult reports whether EnsureDictionaryIndex reused or rebuilt an index.

func EnsureDictionaryIndex added in v0.1.12

func EnsureDictionaryIndex(dictPath string, store ManagedIndexStore, opts ...IndexSyncOption) (*EnsureIndexResult, error)

EnsureDictionaryIndex ensures the external index for a dictionary is present and reusable.

type FileStatFingerprinter added in v0.1.12

type FileStatFingerprinter struct{}

FileStatFingerprinter fingerprints a dictionary source from cheap filesystem metadata.

func (FileStatFingerprinter) Fingerprint added in v0.1.12

func (FileStatFingerprinter) Fingerprint(path string) (string, error)

Fingerprint implements Fingerprinter.

type Fingerprinter added in v0.1.12

type Fingerprinter interface {
	Fingerprint(path string) (string, error)
}

Fingerprinter computes a stable fingerprint for a dictionary source.

func NewFileStatFingerprinter added in v0.1.12

func NewFileStatFingerprinter() Fingerprinter

NewFileStatFingerprinter returns the default filesystem-stat fingerprinter.

type FingerprinterFunc added in v0.1.12

type FingerprinterFunc func(path string) (string, error)

FingerprinterFunc adapts a function into a Fingerprinter.

func (FingerprinterFunc) Fingerprint added in v0.1.12

func (fn FingerprinterFunc) Fingerprint(path string) (string, error)

Fingerprint implements Fingerprinter.

type FuzzyIndexStore

type FuzzyIndexStore interface {
	Put(info DictionaryInfo, entries []IndexEntry) error
	Search(dictionaryName, query string, limit int) ([]SearchHit, error)
}

FuzzyIndexStore defines a fuzzy-search-capable external index boundary.

type IndexBuildLeaseStore added in v0.1.15

type IndexBuildLeaseStore interface {
	AcquireIndexBuildLease(dictionaryName string, ttl time.Duration) (release func() error, acquired bool, err error)
}

IndexBuildLeaseStore optionally coordinates index rebuild ownership across processes.

type IndexEntry

type IndexEntry struct {
	Keyword           string `json:"keyword"`
	NormalizedKeyword string `json:"normalized_keyword,omitempty"`
	RecordStartOffset int64  `json:"record_start_offset"`
	RecordEndOffset   int64  `json:"record_end_offset"`
	KeyBlockIdx       int64  `json:"key_block_idx"`
	IsResource        bool   `json:"is_resource"`
}

IndexEntry is the external-storage-friendly representation of a dictionary entry.

type IndexManifest added in v0.1.12

type IndexManifest struct {
	DictionaryName string     `json:"dictionary_name"`
	SourcePath     string     `json:"source_path,omitempty"`
	Fingerprint    string     `json:"fingerprint,omitempty"`
	SchemaVersion  string     `json:"schema_version,omitempty"`
	BuiltAt        time.Time  `json:"built_at"`
	ExpiresAt      *time.Time `json:"expires_at,omitempty"`
}

IndexManifest describes externally stored index metadata used for lifecycle decisions.

func BuildIndexManifest added in v0.1.12

func BuildIndexManifest(dictPath string, dictName string, opts ...IndexSyncOption) (IndexManifest, error)

BuildIndexManifest builds a lifecycle manifest for the supplied dictionary source.

type IndexStore

type IndexStore interface {
	Put(info DictionaryInfo, entries []IndexEntry) error
	GetExact(dictionaryName, keyword string) (IndexEntry, error)
	PrefixSearch(dictionaryName, prefix string, limit int) ([]IndexEntry, error)
}

IndexStore defines the minimal external index boundary for exact and prefix search.

type IndexSyncConfig added in v0.1.12

type IndexSyncConfig struct {
	ReuseIfUnchanged       bool
	MissingSourceTTL       time.Duration
	ForceRebuild           bool
	Fingerprinter          Fingerprinter
	Now                    func() time.Time
	SchemaVersion          string
	RebuildLeaseTTL        time.Duration
	RebuildLeasePollPeriod time.Duration
}

IndexSyncConfig controls external-index lifecycle behavior.

func DefaultIndexSyncConfig added in v0.1.12

func DefaultIndexSyncConfig() IndexSyncConfig

DefaultIndexSyncConfig returns the default lifecycle configuration.

func ResolveIndexSyncConfig added in v0.1.12

func ResolveIndexSyncConfig(opts ...IndexSyncOption) IndexSyncConfig

ResolveIndexSyncConfig applies options onto the default configuration.

type IndexSyncOption added in v0.1.12

type IndexSyncOption func(*IndexSyncConfig)

IndexSyncOption customizes IndexSyncConfig.

func WithClock added in v0.1.12

func WithClock(now func() time.Time) IndexSyncOption

WithClock overrides the clock used for lifecycle calculations.

func WithFingerprinter added in v0.1.12

func WithFingerprinter(fp Fingerprinter) IndexSyncOption

WithFingerprinter overrides the source fingerprint implementation.

func WithForceRebuild added in v0.1.12

func WithForceRebuild() IndexSyncOption

WithForceRebuild forces index regeneration even when a manifest matches.

func WithMissingSourceTTL added in v0.1.12

func WithMissingSourceTTL(ttl time.Duration) IndexSyncOption

WithMissingSourceTTL sets how long an index may be reused after its source disappears.

func WithRebuildLeasePollPeriod added in v0.1.15

func WithRebuildLeasePollPeriod(period time.Duration) IndexSyncOption

WithRebuildLeasePollPeriod controls how often waiters re-check manifests while another process rebuilds.

func WithRebuildLeaseTTL added in v0.1.15

func WithRebuildLeaseTTL(ttl time.Duration) IndexSyncOption

WithRebuildLeaseTTL controls cross-process rebuild lease duration for stores that support it.

func WithReuseIfUnchanged added in v0.1.12

func WithReuseIfUnchanged(enabled bool) IndexSyncOption

WithReuseIfUnchanged toggles manifest fingerprint reuse.

func WithSchemaVersion added in v0.1.12

func WithSchemaVersion(version string) IndexSyncOption

WithSchemaVersion overrides the lifecycle schema version.

type LibrarySearchHit

type LibrarySearchHit struct {
	DictID   string    `json:"dict_id"`
	DictName string    `json:"dict_name"`
	Hit      SearchHit `json:"hit"`
}

LibrarySearchHit describes a search result together with its dictionary source.

type MDictKeywordEntry

type MDictKeywordEntry struct {
	RecordStartOffset int64
	RecordEndOffset   int64
	KeyWord           string
	KeyBlockIdx       int64
}

MDictKeywordEntry represents a single keyword entry from the key block.

type MDictKeywordIndex

type MDictKeywordIndex struct {
	//encoding                            int
	//encryptType                         int
	KeywordEntry MDictKeywordEntry
	RecordBlock  MDictKeywordIndexRecordBlock
}

MDictKeywordIndex provides a detailed index for a keyword, linking it to its specific location within a record block.

type MDictKeywordIndexRecordBlock

type MDictKeywordIndexRecordBlock struct {
	DataStartOffset          int64
	CompressSize             int64
	DeCompressSize           int64
	KeyWordPartStartOffset   int64
	KeyWordPartDataEndOffset int64
}

MDictKeywordIndexRecordBlock contains information about the record block where a specific keyword's definition is stored.

type ManagedIndexStore added in v0.1.12

type ManagedIndexStore interface {
	IndexStore
	LoadManifest(dictionaryName string) (IndexManifest, error)
	SaveManifest(manifest IndexManifest) error
	DeleteDictionary(dictionaryName string) error
}

ManagedIndexStore extends IndexStore with lifecycle metadata operations.

type Mdict

type Mdict struct {
	*MdictBase
	// contains filtered or unexported fields
}

Mdict is a high-level wrapper for mdx/mdd dictionary files. It embeds MdictBase to handle the underlying parsing logic and provides a user-facing API.

func New

func New(filename string) (*Mdict, error)

New creates a new Mdict instance. It automatically determines the dictionary type based on the file extension (.mdx or .mdd).

func (*Mdict) AssetResolver added in v0.1.10

func (mdict *Mdict) AssetResolver() *AssetResolver

AssetResolver returns the shared asset resolver for this dictionary.

func (*Mdict) BuildIndex

func (mdict *Mdict) BuildIndex() error

BuildIndex builds the complete dictionary index. This process can consume significant memory and time as it needs to read all keyword and record block information. It is recommended to call this once during program initialization.

func (*Mdict) CreationDate

func (mdict *Mdict) CreationDate() string

CreationDate returns the creation date of the dictionary.

func (*Mdict) Description

func (mdict *Mdict) Description() string

Description returns the description of the dictionary.

func (*Mdict) DictionaryInfo

func (mdict *Mdict) DictionaryInfo() DictionaryInfo

DictionaryInfo returns exported metadata for the current dictionary.

func (*Mdict) Digest

func (mdict *Mdict) Digest() string

Digest returns a string containing a summary of the Mdict object's metadata and internal structures. This is useful for debugging and understanding the contents of an MDX/MDD file.

func (*Mdict) ExportEntries

func (mdict *Mdict) ExportEntries() ([]IndexEntry, error)

ExportEntries exports only MDX-style text entries.

func (*Mdict) ExportIndex

func (mdict *Mdict) ExportIndex() ([]IndexEntry, error)

ExportIndex exports the in-memory dictionary index into storage-friendly entries.

func (*Mdict) ExportResources

func (mdict *Mdict) ExportResources() ([]IndexEntry, error)

ExportResources exports only MDD-style resource entries.

func (*Mdict) FindComparableEntry added in v0.1.8

func (mdict *Mdict) FindComparableEntry(word string) (*MDictKeywordEntry, bool)

FindComparableEntry returns the normalized comparable keyword entry for the supplied word.

func (*Mdict) FindExactEntry added in v0.1.8

func (mdict *Mdict) FindExactEntry(word string) (*MDictKeywordEntry, bool)

FindExactEntry returns the exact keyword entry for the supplied word.

func (*Mdict) GeneratedByEngineVersion

func (mdict *Mdict) GeneratedByEngineVersion() string

GeneratedByEngineVersion returns the engine version that generated the dictionary.

func (*Mdict) GetKeyWordEntries

func (mdict *Mdict) GetKeyWordEntries() ([]*MDictKeywordEntry, error)

GetKeyWordEntries returns all keyword entries in the dictionary.

func (*Mdict) GetKeyWordEntriesSize

func (mdict *Mdict) GetKeyWordEntriesSize() int64

GetKeyWordEntriesSize returns the total number of keyword entries in the dictionary.

func (*Mdict) IsMDD

func (mdict *Mdict) IsMDD() bool

IsMDD checks if the dictionary is an MDD file.

func (*Mdict) IsRecordEncrypted

func (mdict *Mdict) IsRecordEncrypted() bool

IsRecordEncrypted checks if the dictionary's record blocks are encrypted.

func (*Mdict) IsUTF16

func (mdict *Mdict) IsUTF16() bool

IsUTF16 checks if the dictionary's encoding is UTF-16.

func (*Mdict) KeywordEntryToIndex

func (mdict *Mdict) KeywordEntryToIndex(item *MDictKeywordEntry) (*MDictKeywordIndex, error)

KeywordEntryToIndex converts a keyword entry to a more detailed keyword index.

func (*Mdict) LocateByKeywordEntry

func (mdict *Mdict) LocateByKeywordEntry(entry *MDictKeywordEntry) ([]byte, error)

LocateByKeywordEntry locates and returns the definition by keyword entry.

func (*Mdict) LocateByKeywordIndex

func (mdict *Mdict) LocateByKeywordIndex(index *MDictKeywordIndex) ([]byte, error)

LocateByKeywordIndex locates and returns the definition by keyword index.

func (*Mdict) Lookup

func (mdict *Mdict) Lookup(word string) ([]byte, error)

Lookup finds the definition for a given word.

func (*Mdict) Name

func (mdict *Mdict) Name() string

Name returns the name of the dictionary, usually the filename without the extension.

func (*Mdict) PrepareForExternalIndex added in v0.1.12

func (mdict *Mdict) PrepareForExternalIndex() error

PrepareForExternalIndex loads the minimum structures needed for ExportIndex and Resolve. It avoids building the in-memory exact/comparable lookup tables used by BuildIndex.

func (*Mdict) Resolve

func (mdict *Mdict) Resolve(entry IndexEntry) ([]byte, error)

Resolve resolves previously exported index data back into dictionary content.

func (*Mdict) ResolveEntry added in v0.1.8

func (mdict *Mdict) ResolveEntry(entry *MDictKeywordEntry) ([]byte, error)

ResolveEntry resolves a keyword entry into dictionary content bytes.

func (*Mdict) SetAssetResolver added in v0.1.10

func (mdict *Mdict) SetAssetResolver(resolver *AssetResolver)

SetAssetResolver overrides the shared asset resolver for this dictionary.

func (*Mdict) Title

func (mdict *Mdict) Title() string

Title returns the title of the dictionary.

func (*Mdict) Version

func (mdict *Mdict) Version() string

Version returns the version number of the dictionary.

type MdictAccessor

type MdictAccessor struct {
	Filepath          string `json:"filepath"`
	IsRecordEncrypted bool   `json:"is_record_encrypted"`
	IsMDD             bool   `json:"is_mdd"`
	IsUTF16           bool   `json:"is_utf_16"`
}

MdictAccessor provides a simplified interface for accessing Mdict data, suitable for serialization and remote access.

func NewAccessor

func NewAccessor(mdict *Mdict) *MdictAccessor

NewAccessor creates a new MdictAccessor from an Mdict instance.

func NewAccessorFromJSON

func NewAccessorFromJSON(data []byte) (*MdictAccessor, error)

NewAccessorFromJSON creates a new MdictAccessor from a JSON byte slice.

func (*MdictAccessor) RetrieveDefByIndex

func (mdi *MdictAccessor) RetrieveDefByIndex(index *MDictKeywordIndex) ([]byte, error)

RetrieveDefByIndex retrieves a definition by its keyword index.

func (*MdictAccessor) Serialize

func (mdi *MdictAccessor) Serialize() ([]byte, error)

Serialize converts the MdictAccessor to its JSON representation.
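
The JSON shape implied by the struct tags above can be sketched with the standard library alone. The accessor type below is a local mirror of the documented fields, not the library type, and roundTrip is a hypothetical helper; whether Serialize and NewAccessorFromJSON produce exactly this encoding is an assumption based on the tags.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// accessor mirrors the exported fields and JSON tags of MdictAccessor as
// documented above. It is a local stand-in, not the library type.
type accessor struct {
	Filepath          string `json:"filepath"`
	IsRecordEncrypted bool   `json:"is_record_encrypted"`
	IsMDD             bool   `json:"is_mdd"`
	IsUTF16           bool   `json:"is_utf_16"`
}

// roundTrip encodes a to JSON and decodes the result into a fresh value.
func roundTrip(a accessor) (string, accessor, error) {
	data, err := json.Marshal(a)
	if err != nil {
		return "", accessor{}, err
	}
	var out accessor
	if err := json.Unmarshal(data, &out); err != nil {
		return "", accessor{}, err
	}
	return string(data), out, nil
}

func main() {
	encoded, decoded, err := roundTrip(accessor{Filepath: "dict.mdx", IsUTF16: true})
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded)
	fmt.Printf("%+v\n", decoded)
}
```

Because only plain flags and a file path are serialized, a receiver of this JSON presumably still needs access to the dictionary file at Filepath to retrieve definitions.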

type MdictBase

type MdictBase struct {
	// contains filtered or unexported fields
}

MdictBase is the base structure for handling MDict file parsing. It contains all the necessary metadata and data structures read from the file.

func (*MdictBase) GetKeyWordEntries

func (mdict *MdictBase) GetKeyWordEntries() ([]*MDictKeywordEntry, error)

GetKeyWordEntries returns all keyword entries in the dictionary.

type MdictFS

type MdictFS struct {
	// contains filtered or unexported fields
}

MdictFS wraps an Mdict instance to implement the io/fs.FS interface. This allows an MDX/MDD file to be accessed like a regular file system, for example, for an HTTP file server.

func NewMdictFS

func NewMdictFS(mdict *Mdict) *MdictFS

NewMdictFS creates a new MdictFS instance.

func (*MdictFS) Open

func (mfs *MdictFS) Open(name string) (fs.File, error)

Open opens a file (a keyword or an MDD resource).
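
Any io/fs.FS value can be handed to net/http through http.FS and http.FileServer, which is the integration the MdictFS description refers to. The sketch below uses fstest.MapFS as a stand-in for a dictionary-backed file system so that it runs without a dictionary file; the entry name, its content, and the serve and demoFS helpers are made up for illustration. With the real type, one would presumably pass the value returned by NewMdictFS instead.

```go
package main

import (
	"fmt"
	"io/fs"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// demoFS returns an in-memory file system with one made-up entry. It stands
// in for a dictionary-backed fs.FS in this sketch.
func demoFS() fs.FS {
	return fstest.MapFS{
		"hello": &fstest.MapFile{Data: []byte("<b>hello</b>: a greeting")},
	}
}

// serve runs one GET request for path against an http.FileServer backed by
// fsys and returns the status code and the response body.
func serve(fsys fs.FS, path string) (int, string) {
	handler := http.FileServer(http.FS(fsys))
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	handler.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := serve(demoFS(), "/hello")
	fmt.Println(code, body)

	// Names the file system does not know are reported as 404 by FileServer.
	code, _ = serve(demoFS(), "/missing")
	fmt.Println(code)
}
```

http.FileServer sniffs the content type of files without an extension and then seeks back to the start, so files opened through the file system need a working Seek for this path to succeed.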

type MdictFile

type MdictFile struct {
	// contains filtered or unexported fields
}

MdictFile implements the fs.File interface.

func (*MdictFile) Close

func (mf *MdictFile) Close() error

Close closes the file.

func (*MdictFile) Read

func (mf *MdictFile) Read(b []byte) (int, error)

Read reads up to len(b) bytes from the file.

func (*MdictFile) ReadDir

func (mf *MdictFile) ReadDir(n int) ([]fs.DirEntry, error)

ReadDir reads the contents of the directory.

func (*MdictFile) Seek

func (mf *MdictFile) Seek(offset int64, whence int) (int64, error)

Seek sets the offset for the next Read on the file.

func (*MdictFile) Stat

func (mf *MdictFile) Stat() (fs.FileInfo, error)

Stat returns the FileInfo for the file.

type MdictFileInfo

type MdictFileInfo struct {
	// contains filtered or unexported fields
}

MdictFileInfo implements the fs.FileInfo interface. Together with its Info and Type methods, its method set also satisfies fs.DirEntry.

func (*MdictFileInfo) Info

func (mfi *MdictFileInfo) Info() (fs.FileInfo, error)

Info returns the FileInfo for the file.

func (*MdictFileInfo) IsDir

func (mfi *MdictFileInfo) IsDir() bool

IsDir reports whether mfi describes a directory.

func (*MdictFileInfo) ModTime

func (mfi *MdictFileInfo) ModTime() time.Time

ModTime returns the modification time.

func (*MdictFileInfo) Mode

func (mfi *MdictFileInfo) Mode() fs.FileMode

Mode returns the file mode bits.

func (*MdictFileInfo) Name

func (mfi *MdictFileInfo) Name() string

Name returns the base name of the file.

func (*MdictFileInfo) Size

func (mfi *MdictFileInfo) Size() int64

Size returns the length in bytes for regular files.

func (*MdictFileInfo) Sys

func (mfi *MdictFileInfo) Sys() interface{}

Sys returns the underlying data source (can be nil).

func (*MdictFileInfo) Type

func (mfi *MdictFileInfo) Type() fs.FileMode

Type returns the file's type.

type MdictRecordBlockInfoListItem

type MdictRecordBlockInfoListItem struct {
	// contains filtered or unexported fields
}

MdictRecordBlockInfoListItem holds information about a single record block.

func QueryRangeData

func QueryRangeData(root *RecordBlockRangeTreeNode, queryRange int64) *MdictRecordBlockInfoListItem

QueryRangeData queries the range tree to find the record block info item that contains the given queryRange offset.
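
The idea of mapping an offset to the record block that contains it can be sketched as follows. Since RecordBlockRangeTreeNode and MdictRecordBlockInfoListItem are opaque here, the block and node types, their fields, and the build and query functions are all hypothetical; the sketch assumes half-open, sorted, non-overlapping ranges and is not the library's implementation.

```go
package main

import "fmt"

// block describes one record block covering the half-open byte range
// [start, end). The fields are hypothetical; the real item type is opaque.
type block struct {
	start, end int64
	id         int
}

// node covers [start, end). Leaves carry a block; inner nodes have two children.
type node struct {
	start, end  int64
	left, right *node
	item        *block
}

// build turns blocks, sorted by start offset, into a balanced binary tree.
func build(blocks []block) *node {
	switch len(blocks) {
	case 0:
		return nil
	case 1:
		b := &blocks[0]
		return &node{start: b.start, end: b.end, item: b}
	}
	mid := len(blocks) / 2
	return &node{
		start: blocks[0].start,
		end:   blocks[len(blocks)-1].end,
		left:  build(blocks[:mid]),
		right: build(blocks[mid:]),
	}
}

// query returns the block whose range contains offset, or nil if none does.
func query(root *node, offset int64) *block {
	for n := root; n != nil; {
		if offset < n.start || offset >= n.end {
			return nil
		}
		if n.item != nil {
			return n.item
		}
		// Inner nodes always have both children, so n.right is non-nil here.
		if offset < n.right.start {
			n = n.left
		} else {
			n = n.right
		}
	}
	return nil
}

func main() {
	root := build([]block{{0, 100, 0}, {100, 250, 1}, {250, 400, 2}})
	for _, off := range []int64{0, 99, 100, 399, 400} {
		if b := query(root, off); b != nil {
			fmt.Printf("offset %d -> block %d\n", off, b.id)
		} else {
			fmt.Printf("offset %d -> no block\n", off)
		}
	}
}
```

Each lookup descends one level per step, so the cost grows with the logarithm of the number of record blocks rather than with the number itself.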

type MdictType

type MdictType int

MdictType represents the type of the dictionary file (MDX or MDD).

type MemoryFuzzyIndexStore

type MemoryFuzzyIndexStore struct {
	// contains filtered or unexported fields
}

MemoryFuzzyIndexStore is a small in-memory reference implementation of FuzzyIndexStore.

func NewMemoryFuzzyIndexStore

func NewMemoryFuzzyIndexStore() *MemoryFuzzyIndexStore

NewMemoryFuzzyIndexStore creates a new in-memory fuzzy store.

func (*MemoryFuzzyIndexStore) Put

func (s *MemoryFuzzyIndexStore) Put(info DictionaryInfo, entries []IndexEntry) error

Put stores dictionary metadata and entries.

func (*MemoryFuzzyIndexStore) Search

func (s *MemoryFuzzyIndexStore) Search(dictionaryName, query string, limit int) ([]SearchHit, error)

Search performs a simple in-memory fuzzy search suitable for demos and tests.
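
How the store scores matches is not documented here. As one possible illustration of a simple in-memory fuzzy search, the sketch below ranks keywords by Levenshtein edit distance, normalized to a score between 0 and 1. The hit type, the search and editDistance functions, and the scoring formula are assumptions made for this example, not the behavior of MemoryFuzzyIndexStore.

```go
package main

import (
	"fmt"
	"sort"
)

// hit pairs a keyword with a similarity score in [0, 1]. Hypothetical type.
type hit struct {
	Keyword string
	Score   float64
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// editDistance returns the Levenshtein distance between a and b in runes.
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// search scores every keyword against query and returns the best matches,
// highest score first. A limit of zero or less returns all keywords.
func search(keywords []string, query string, limit int) []hit {
	hits := make([]hit, 0, len(keywords))
	queryLen := len([]rune(query))
	for _, kw := range keywords {
		longest := len([]rune(kw))
		if queryLen > longest {
			longest = queryLen
		}
		score := 1.0
		if longest > 0 {
			score = 1 - float64(editDistance(query, kw))/float64(longest)
		}
		hits = append(hits, hit{Keyword: kw, Score: score})
	}
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	if limit > 0 && len(hits) > limit {
		hits = hits[:limit]
	}
	return hits
}

func main() {
	for _, h := range search([]string{"banana", "apply", "apple"}, "apple", 2) {
		fmt.Printf("%s %.2f\n", h.Keyword, h.Score)
	}
}
```

Scanning every keyword on each query is linear in the dictionary size, which fits the "demos and tests" scope stated above but would not scale to large dictionaries.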

type MemoryIndexStore

type MemoryIndexStore struct {
	// contains filtered or unexported fields
}

MemoryIndexStore is a small in-memory reference implementation of ManagedIndexStore.

func NewMemoryIndexStore

func NewMemoryIndexStore() *MemoryIndexStore

NewMemoryIndexStore creates a new in-memory store.

func (*MemoryIndexStore) DeleteDictionary added in v0.1.12

func (s *MemoryIndexStore) DeleteDictionary(dictionaryName string) error

DeleteDictionary removes one dictionary's entries and manifest.

func (*MemoryIndexStore) GetExact

func (s *MemoryIndexStore) GetExact(dictionaryName, keyword string) (IndexEntry, error)

GetExact retrieves a single exact-match entry.

func (*MemoryIndexStore) LoadManifest added in v0.1.12

func (s *MemoryIndexStore) LoadManifest(dictionaryName string) (IndexManifest, error)

LoadManifest returns lifecycle metadata for one dictionary.

func (*MemoryIndexStore) PrefixSearch

func (s *MemoryIndexStore) PrefixSearch(dictionaryName, prefix string, limit int) ([]IndexEntry, error)

PrefixSearch returns entries that start with the supplied prefix.
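
One common way to answer prefix queries in memory is a binary search over sorted keywords followed by a short forward scan. The sketch below shows that technique; it is not taken from the library, the prefixSearch function is hypothetical, and it assumes byte-wise, case-sensitive ordering and treats a limit of zero or less as "no limit".

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixSearch returns up to limit keywords from sorted that start with
// prefix. The slice must be sorted in increasing byte-wise order.
func prefixSearch(sorted []string, prefix string, limit int) []string {
	// All keywords sharing the prefix are contiguous and begin at the first
	// keyword that is >= prefix.
	start := sort.SearchStrings(sorted, prefix)
	var out []string
	for i := start; i < len(sorted) && strings.HasPrefix(sorted[i], prefix); i++ {
		if limit > 0 && len(out) >= limit {
			break
		}
		out = append(out, sorted[i])
	}
	return out
}

func main() {
	keywords := []string{"apt", "banana", "apple", "app", "apply"}
	sort.Strings(keywords)

	fmt.Println(prefixSearch(keywords, "app", 0))
	fmt.Println(prefixSearch(keywords, "app", 2))
	fmt.Println(prefixSearch(keywords, "c", 0))
}
```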

func (*MemoryIndexStore) Put

func (s *MemoryIndexStore) Put(info DictionaryInfo, entries []IndexEntry) error

Put stores dictionary metadata and entries.

func (*MemoryIndexStore) SaveManifest added in v0.1.12

func (s *MemoryIndexStore) SaveManifest(manifest IndexManifest) error

SaveManifest stores lifecycle metadata for one dictionary.

type RecordBlockRangeTreeNode

type RecordBlockRangeTreeNode struct {
	// contains filtered or unexported fields
}

RecordBlockRangeTreeNode represents a node in the record block range tree. This tree is used to efficiently find the record block corresponding to a given offset.

type RedisIndexStore

type RedisIndexStore struct {
	// contains filtered or unexported fields
}

RedisIndexStore is a Redis-backed reference implementation of ManagedIndexStore.

func NewRedisIndexStore

func NewRedisIndexStore(client *redis.Client, opts ...RedisIndexStoreOption) *RedisIndexStore

NewRedisIndexStore creates a Redis-backed store.

func (*RedisIndexStore) AcquireIndexBuildLease added in v0.1.15

func (s *RedisIndexStore) AcquireIndexBuildLease(dictionaryName string, ttl time.Duration) (func() error, bool, error)

AcquireIndexBuildLease coordinates dictionary index rebuild ownership across processes.
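
The return values are not described beyond their types. A plausible reading is a release function, a flag reporting whether the lease was obtained, and an error. The in-memory sketch below illustrates that reading with expiring leases and an injectable clock; the leaseTable type and all of its behavior are assumptions for illustration and say nothing about how the Redis-backed store implements the call.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// lease records who holds a name and until when. Hypothetical type.
type lease struct {
	token  uint64
	expiry time.Time
}

// leaseTable hands out at most one unexpired lease per name.
type leaseTable struct {
	mu     sync.Mutex
	next   uint64
	leases map[string]lease
	now    func() time.Time // injectable clock, so the sketch is deterministic
}

func newLeaseTable(now func() time.Time) *leaseTable {
	return &leaseTable{leases: make(map[string]lease), now: now}
}

// acquire returns a release function and true if the caller now owns the
// lease, or false if another unexpired lease exists for name.
func (t *leaseTable) acquire(name string, ttl time.Duration) (func() error, bool, error) {
	if ttl <= 0 {
		return nil, false, errors.New("ttl must be positive")
	}
	t.mu.Lock()
	defer t.mu.Unlock()
	now := t.now()
	if cur, held := t.leases[name]; held && now.Before(cur.expiry) {
		return nil, false, nil
	}
	t.next++
	token := t.next
	t.leases[name] = lease{token: token, expiry: now.Add(ttl)}
	release := func() error {
		t.mu.Lock()
		defer t.mu.Unlock()
		// Only remove the lease if it is still ours; a stale holder must not
		// release a lease that someone else acquired after expiry.
		if cur, ok := t.leases[name]; ok && cur.token == token {
			delete(t.leases, name)
		}
		return nil
	}
	return release, true, nil
}

// scenario replays a fixed sequence of attempts against a fake clock and
// reports whether each attempt obtained the lease.
func scenario() []bool {
	clock := time.Unix(0, 0)
	table := newLeaseTable(func() time.Time { return clock })
	var got []bool

	releaseA, ok, _ := table.acquire("dict", 30*time.Second)
	got = append(got, ok) // first caller wins
	_, ok, _ = table.acquire("dict", 30*time.Second)
	got = append(got, ok) // lease is still held

	clock = clock.Add(31 * time.Second) // the first lease expires
	releaseB, ok, _ := table.acquire("dict", 30*time.Second)
	got = append(got, ok)

	_ = releaseA() // stale release, must not free the second lease
	_, ok, _ = table.acquire("dict", 30*time.Second)
	got = append(got, ok)

	_ = releaseB()
	_, ok, _ = table.acquire("dict", 30*time.Second)
	got = append(got, ok)
	return got
}

func main() {
	fmt.Println(scenario())
}
```

The TTL bounds how long a crashed builder can block others, and the per-lease token keeps a late release from freeing a lease that has since changed hands.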

func (*RedisIndexStore) DeleteDictionary added in v0.1.12

func (s *RedisIndexStore) DeleteDictionary(dictionaryName string) error

DeleteDictionary removes one dictionary's entries and manifest.

func (*RedisIndexStore) GetExact

func (s *RedisIndexStore) GetExact(dictionaryName, keyword string) (IndexEntry, error)

GetExact returns one exact entry from Redis.

func (*RedisIndexStore) LoadManifest added in v0.1.12

func (s *RedisIndexStore) LoadManifest(dictionaryName string) (IndexManifest, error)

LoadManifest returns lifecycle metadata for one dictionary.

func (*RedisIndexStore) PrefixSearch

func (s *RedisIndexStore) PrefixSearch(dictionaryName, prefix string, limit int) ([]IndexEntry, error)

PrefixSearch returns entries that share the supplied prefix.

func (*RedisIndexStore) Put

func (s *RedisIndexStore) Put(info DictionaryInfo, entries []IndexEntry) error

Put stores dictionary metadata and index entries in Redis.

func (*RedisIndexStore) SaveManifest added in v0.1.12

func (s *RedisIndexStore) SaveManifest(manifest IndexManifest) error

SaveManifest stores lifecycle metadata for one dictionary.

type RedisIndexStoreOption

type RedisIndexStoreOption func(*RedisIndexStore)

RedisIndexStoreOption customizes RedisIndexStore construction.

func WithRedisIndexContext

func WithRedisIndexContext(ctx context.Context) RedisIndexStoreOption

WithRedisIndexContext overrides the store context.

func WithRedisKeyPrefix

func WithRedisKeyPrefix(prefix string) RedisIndexStoreOption

WithRedisKeyPrefix overrides the Redis key namespace prefix.

func WithRedisPrefixIndexMaxLen

func WithRedisPrefixIndexMaxLen(maxLen int) RedisIndexStoreOption

WithRedisPrefixIndexMaxLen overrides the maximum stored prefix length.
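
RedisIndexStoreOption follows the functional options pattern: each option is a function that mutates the store during construction. A minimal sketch of the pattern is shown below with a hypothetical store type; the field names and default values are invented for illustration and are not the library's defaults.

```go
package main

import (
	"context"
	"fmt"
)

// store is a hypothetical stand-in for a configurable index store.
type store struct {
	ctx          context.Context
	keyPrefix    string
	prefixMaxLen int
}

// option customizes a store during construction.
type option func(*store)

func withContext(ctx context.Context) option {
	return func(s *store) { s.ctx = ctx }
}

func withKeyPrefix(prefix string) option {
	return func(s *store) { s.keyPrefix = prefix }
}

func withPrefixMaxLen(maxLen int) option {
	return func(s *store) { s.prefixMaxLen = maxLen }
}

// newStore applies the options in order on top of made-up defaults, so a
// later option overrides an earlier one that sets the same field.
func newStore(opts ...option) *store {
	s := &store{ctx: context.Background(), keyPrefix: "mdx:", prefixMaxLen: 8}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

func main() {
	defaults := newStore()
	custom := newStore(withKeyPrefix("dict:"), withPrefixMaxLen(4))
	fmt.Println(defaults.keyPrefix, defaults.prefixMaxLen)
	fmt.Println(custom.keyPrefix, custom.prefixMaxLen)
}
```

The pattern lets a constructor grow new settings without changing its signature, which is why NewRedisIndexStore can take a variadic list of options after the client.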

type SearchHit

type SearchHit struct {
	Entry  IndexEntry `json:"entry"`
	Score  float64    `json:"score"`
	Source string     `json:"source"`
}

SearchHit represents a ranked search result from a fuzzy-capable store.

Directories

examples/http-library (command)
examples/http-server (command)
examples/redis-index (command)
