zim

package
v0.3.2
Published: Jun 15, 2026 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

Package zim reads and writes the ZIM offline-archive format, the open single-file container that Kiwix uses to ship offline content. kage uses it to pack a cloned mirror into one indexable, compressed file that a reader can random-access without unpacking.

The package is pure: no network, no clock, and no global state beyond a lazily built zstd codec. A ZIM file is laid out as a fixed header, a MIME-type list, three pointer lists (URL, title, cluster), a run of directory entries, a run of clusters that hold the content, and a trailing MD5 checksum. Every cross-reference is an absolute file position recorded in the header, so the writer assigns all positions in a first pass and emits the bytes in a second. All integers are little-endian.
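The two-pass idea can be illustrated with a toy layout. This is only a sketch: the header here is simply one uint64 position per section, which is not the real ZIM header, and the names layout and emit are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// layout is pass one: it assigns each section an absolute file position,
// starting right after a toy header that stores one uint64 per section.
func layout(sections [][]byte) []uint64 {
	pos := uint64(8 * len(sections)) // size of the toy header
	offsets := make([]uint64, len(sections))
	for i, s := range sections {
		offsets[i] = pos
		pos += uint64(len(s))
	}
	return offsets
}

// emit is pass two: it writes the little-endian positions, then the
// sections themselves in the same order.
func emit(sections [][]byte) []byte {
	var buf bytes.Buffer
	for _, off := range layout(sections) {
		var b [8]byte
		binary.LittleEndian.PutUint64(b[:], off)
		buf.Write(b[:])
	}
	for _, s := range sections {
		buf.Write(s)
	}
	return buf.Bytes()
}

func main() {
	sections := [][]byte{[]byte("mime"), []byte("dirents"), []byte("clusters")}
	out := emit(sections)
	fmt.Println(layout(sections), len(out))
}
```

Because every position is known before the first byte is written, the output can go to a plain io.Writer with no seeking back to patch the header.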

The writer emits the new namespace scheme (minor version 1): all content lives under the single 'C' namespace, metadata under 'M', and a 'W/mainPage' redirect points at the entry point. The reader handles redirects and both offset widths.

Constants

const (
	NamespaceContent   byte = 'C' // pages and assets
	NamespaceMetadata  byte = 'M' // M/Title, M/Date, ...
	NamespaceWellKnown byte = 'W' // W/mainPage redirect
)

Namespaces in the new (minor version 1) scheme.

const Magic uint32 = 0x044D495A // 72173914

Magic is the ZIM header magic number, the first four bytes of every file.
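As a quick illustration of the little-endian encoding, the sketch below checks the first four bytes of a buffer against the magic. The constant magic mirrors Magic so the snippet compiles on its own, and hasMagic is a hypothetical helper, not part of the package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// magic mirrors the package's Magic constant (72173914).
const magic uint32 = 0x044D495A

// hasMagic reports whether b starts with the ZIM magic number,
// decoded as a little-endian uint32.
func hasMagic(b []byte) bool {
	return len(b) >= 4 && binary.LittleEndian.Uint32(b[:4]) == magic
}

func main() {
	// On disk the magic reads as the ASCII bytes "ZIM" followed by 0x04.
	header := []byte{'Z', 'I', 'M', 0x04}
	fmt.Println(hasMagic(header))
}
```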

Variables

var ErrNotFound = errors.New("zim: not found")

ErrNotFound is returned by Get when no entry matches the namespace and url. Callers (such as the HTTP handler) test for it with errors.Is to map a miss to a 404.
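A minimal sketch of that mapping follows. To stay self-contained it uses a local errNotFound in place of ErrNotFound and a hypothetical lookup function in place of Get; statusFor is likewise an illustrative name.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errNotFound stands in for zim.ErrNotFound so the sketch compiles alone.
var errNotFound = errors.New("zim: not found")

// lookup is a hypothetical stand-in for (*Reader).Get. It wraps the
// sentinel, as an intermediate layer might, to show why errors.Is is used.
func lookup(url string) ([]byte, error) {
	if url != "index.html" {
		return nil, fmt.Errorf("lookup %q: %w", url, errNotFound)
	}
	return []byte("<h1>home</h1>"), nil
}

// statusFor maps a lookup error to an HTTP status code.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errNotFound):
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	for _, u := range []string{"index.html", "missing.html"} {
		_, err := lookup(u)
		fmt.Println(u, statusFor(err))
	}
}
```

Testing with errors.Is rather than == keeps the mapping correct even when a caller wraps the error with extra context.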

Functions

func Compress added in v0.3.0

func Compress(p []byte) []byte

Compress zstd-compresses p with the exact codec the writer uses for its clusters. It is exported so a caller can cache cluster compression across packs and feed the result back through Writer.SetCompress; a cached cluster is then byte-for-byte what a fresh compression would have produced.

Types

type Blob

type Blob struct {
	Namespace byte
	URL       string
	Title     string
	MimeType  string
	Data      []byte
}

Blob is the result of a lookup: the resolved entry's bytes and metadata.

type Entry added in v0.3.0

type Entry struct {
	Namespace         byte
	URL               string
	Title             string
	MimeType          string
	Redirect          bool
	RedirectNamespace byte
	RedirectURL       string
	Data              []byte
}

Entry is one directory entry as stored, returned by EntryAt. A redirect keeps Data nil and names its target in RedirectNamespace/RedirectURL; any other entry carries its bytes in Data and its type in MimeType. Unlike Get, EntryAt does not follow redirects, so a caller can round-trip every entry, the redirects included.

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader provides random access to a ZIM file's entries. Open one with Open or NewReader, then look entries up by namespace and url, or fetch the main page. Decompressed clusters are cached so repeated reads from one cluster are cheap.

func NewReader

func NewReader(ra io.ReaderAt, size int64) (*Reader, error)

NewReader reads the header and MIME list from ra, which must hold size bytes.

func Open

func Open(path string) (*Reader, error)

Open opens a ZIM file on disk. Close the returned reader when done.

func (*Reader) Close

func (r *Reader) Close() error

Close releases the underlying file, if Open created one.

func (*Reader) Count

func (r *Reader) Count() uint32

Count returns the number of directory entries.

func (*Reader) EntryAt added in v0.3.0

func (r *Reader) EntryAt(idx uint32) (Entry, error)

EntryAt returns the directory entry at idx, where 0 <= idx < Count, in the archive's URL order. It is the iteration counterpart to Get: it exposes every entry exactly as stored, including metadata and redirects, which is what an exporter needs to reproduce the archive.
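The iteration pattern can be sketched as below. The entry type and the source interface are local mirrors of the documented Entry fields and the Count/EntryAt signatures, and fake is an in-memory stand-in for a Reader, all assumed here so the example runs without an archive.

```go
package main

import "fmt"

// entry mirrors the fields of the documented Entry type used in this sketch.
type entry struct {
	Namespace         byte
	URL               string
	Redirect          bool
	RedirectNamespace byte
	RedirectURL       string
	Data              []byte
}

// source is the part of the Reader API that the loop needs.
type source interface {
	Count() uint32
	EntryAt(idx uint32) (entry, error)
}

// fake is an in-memory source that only exists to make the sketch runnable.
type fake []entry

func (f fake) Count() uint32 { return uint32(len(f)) }

func (f fake) EntryAt(idx uint32) (entry, error) {
	if idx >= f.Count() {
		return entry{}, fmt.Errorf("index %d out of range", idx)
	}
	return f[idx], nil
}

// describe walks every entry in stored order, redirects included,
// and renders one line per entry.
func describe(src source) ([]string, error) {
	lines := make([]string, 0, src.Count())
	for i := uint32(0); i < src.Count(); i++ {
		e, err := src.EntryAt(i)
		if err != nil {
			return nil, err
		}
		if e.Redirect {
			lines = append(lines, fmt.Sprintf("%c/%s -> %c/%s",
				e.Namespace, e.URL, e.RedirectNamespace, e.RedirectURL))
			continue
		}
		lines = append(lines, fmt.Sprintf("%c/%s (%d bytes)", e.Namespace, e.URL, len(e.Data)))
	}
	return lines, nil
}

func main() {
	src := fake{
		{Namespace: 'C', URL: "index.html", Data: []byte("<h1>home</h1>")},
		{Namespace: 'W', URL: "mainPage", Redirect: true, RedirectNamespace: 'C', RedirectURL: "index.html"},
	}
	lines, err := describe(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```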

func (*Reader) Get

func (r *Reader) Get(namespace byte, url string) (Blob, error)

Get resolves the entry at (namespace, url), following any redirects until it reaches a content entry. It returns ErrNotFound when no entry matches.
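Redirect resolution of this kind can be sketched over a plain map. The types, the maxHops limit, and the errLoop error are assumptions made for illustration; the documentation does not say how the package itself guards against redirect cycles.

```go
package main

import (
	"errors"
	"fmt"
)

// key identifies an entry by namespace and url.
type key struct {
	ns  byte
	url string
}

// entry is either a redirect to target or a content entry holding data.
type entry struct {
	redirect bool
	target   key
	data     []byte
}

var (
	errNotFound = errors.New("not found")
	errLoop     = errors.New("too many redirects")
)

// maxHops is an assumed limit that stops redirect cycles.
const maxHops = 8

// resolve follows redirects from k until it reaches a content entry.
func resolve(dir map[key]entry, k key) ([]byte, error) {
	for hop := 0; hop <= maxHops; hop++ {
		e, ok := dir[k]
		if !ok {
			return nil, errNotFound
		}
		if !e.redirect {
			return e.data, nil
		}
		k = e.target
	}
	return nil, errLoop
}

func main() {
	dir := map[key]entry{
		{'W', "mainPage"}:   {redirect: true, target: key{'C', "index.html"}},
		{'C', "index.html"}: {data: []byte("<h1>home</h1>")},
	}
	data, err := resolve(dir, key{'W', "mainPage"})
	fmt.Println(string(data), err)
}
```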

func (*Reader) MainPage

func (r *Reader) MainPage() (Blob, error)

MainPage returns the archive's entry point, or an error if none is set.

func (*Reader) MainPageRef added in v0.3.0

func (r *Reader) MainPageRef() (byte, string, bool)

MainPageRef returns the namespace and url of the archive's entry point and whether one is set, so an exporter can record which entry is the main page without following the W/mainPage redirect.

func (*Reader) MimeTypes

func (r *Reader) MimeTypes() []string

MimeTypes returns the archive's MIME-type list.

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer accumulates entries and serialises them as a ZIM file. Build it with NewWriter, add content/redirects/metadata, optionally set a main page, then call WriteTo. The writer holds entries in memory; a kage mirror comfortably fits, and packing is a one-shot batch job.

func NewWriter

func NewWriter() *Writer

NewWriter returns an empty Writer.

func (*Writer) AddContent

func (w *Writer) AddContent(namespace byte, url, title, mime string, data []byte)

AddContent adds a content entry. A later add with the same namespace and url replaces the earlier one. An empty title defaults to the url.

func (*Writer) AddMetadata

func (w *Writer) AddMetadata(name, value string)

AddMetadata adds an 'M' namespace text entry, e.g. AddMetadata("Title", "...").

func (*Writer) AddMetadataBytes added in v0.2.1

func (w *Writer) AddMetadataBytes(name, mime string, data []byte)

AddMetadataBytes adds an 'M' namespace entry with an explicit MIME type, for binary metadata such as Illustration_48x48@1, the 48x48 PNG favicon Kiwix shows as the archive's icon.

func (*Writer) AddRedirect

func (w *Writer) AddRedirect(namespace byte, url, title string, targetNamespace byte, targetURL string)

AddRedirect adds a redirect from (namespace, url) to (targetNamespace, targetURL).

func (*Writer) SetCompress added in v0.3.0

func (w *Writer) SetCompress(f func([]byte) []byte)

SetCompress replaces the cluster compressor. The function must return zstd-compressed bytes (the writer marks those clusters as zstd), so a caching wrapper can short-circuit unchanged clusters while still producing a valid, byte-identical archive. A nil function restores the built-in codec.
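One way to build such a caching wrapper is sketched below. The function cached is hypothetical, the cache lives only in memory (a cache that survives across packs would need to persist it), and the mutex is a precaution because the documentation does not say whether the writer calls the compressor concurrently.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// cached wraps a compressor so that identical inputs are compressed once.
// Results are keyed by the SHA-256 of the uncompressed bytes. The returned
// slices are shared, so callers must not modify them.
func cached(compress func([]byte) []byte) func([]byte) []byte {
	var mu sync.Mutex
	memo := make(map[[sha256.Size]byte][]byte)
	return func(p []byte) []byte {
		k := sha256.Sum256(p)
		mu.Lock()
		out, ok := memo[k]
		mu.Unlock()
		if ok {
			return out
		}
		out = compress(p)
		mu.Lock()
		memo[k] = out
		mu.Unlock()
		return out
	}
}

func main() {
	calls := 0
	// standIn copies its input; it only stands in for a real zstd compressor.
	standIn := func(p []byte) []byte {
		calls++
		return append([]byte(nil), p...)
	}
	f := cached(standIn)
	f([]byte("cluster A"))
	f([]byte("cluster A")) // served from the cache
	f([]byte("cluster B"))
	fmt.Println("compressor calls:", calls)
}
```

Given the signatures above, a caller would wire it in as w.SetCompress(cached(zim.Compress)), so that every cache miss still goes through the writer's own codec.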

func (*Writer) SetMainPage

func (w *Writer) SetMainPage(namespace byte, url string)

SetMainPage marks an entry as the archive's entry point.

func (*Writer) SetNoCompress

func (w *Writer) SetNoCompress(v bool)

SetNoCompress stores every cluster uncompressed. Useful when the input is already compressed or when a reader without zstd must open the file.

func (*Writer) WriteTo

func (w *Writer) WriteTo(out io.Writer) (int64, error)

WriteTo serialises the archive to out and returns the number of bytes written.
