Documentation
Overview
Package zim reads and writes the ZIM offline-archive format, the open single-file container that Kiwix uses to ship offline content. kage uses it to pack a cloned mirror into one indexable, compressed file that a reader can random-access without unpacking.
The package is pure: no network, no clock, and no global state beyond a lazily built zstd codec. A ZIM file is laid out as a fixed header, a MIME-type list, three pointer lists (URL, title, cluster), a run of directory entries, a run of clusters that hold the content, and a trailing MD5. Cross-references resolve to absolute file positions. The header records where each list starts, and the pointer lists map entry and cluster numbers to the positions of the directory entries and clusters. The writer therefore assigns every position in one pass and emits the bytes in a second. All integers are little-endian.
We write the new namespace scheme (minor version 1): all content lives under the single 'C' namespace, metadata under 'M', and a 'W/mainPage' redirect points at the entry point. Reading follows redirects and accepts both 32-bit and 64-bit (extended-cluster) blob offsets.
Index
- Constants
- Variables
- func Compress(p []byte) []byte
- type Blob
- type Entry
- type Reader
- func (r *Reader) Close() error
- func (r *Reader) Count() uint32
- func (r *Reader) EntryAt(idx uint32) (Entry, error)
- func (r *Reader) Get(namespace byte, url string) (Blob, error)
- func (r *Reader) MainPage() (Blob, error)
- func (r *Reader) MainPageRef() (byte, string, bool)
- func (r *Reader) MimeTypes() []string
- type Writer
- func (w *Writer) AddContent(namespace byte, url, title, mime string, data []byte)
- func (w *Writer) AddMetadata(name, value string)
- func (w *Writer) AddMetadataBytes(name, mime string, data []byte)
- func (w *Writer) AddRedirect(namespace byte, url, title string, targetNamespace byte, targetURL string)
- func (w *Writer) SetCompress(f func([]byte) []byte)
- func (w *Writer) SetMainPage(namespace byte, url string)
- func (w *Writer) SetNoCompress(v bool)
- func (w *Writer) WriteTo(out io.Writer) (int64, error)
Constants
const (
NamespaceContent byte = 'C' // pages and assets
NamespaceMetadata byte = 'M' // M/Title, M/Date, ...
NamespaceWellKnown byte = 'W' // W/mainPage redirect
)
Namespaces in the new (minor version 1) scheme.
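Lookups by namespace and url can rely on the URL pointer list being sorted by namespace byte and then by url. The sketch below assumes the Reader resolves Get with a binary search over that order; `key`, `before` and `find` are hypothetical helpers, and a real reader would compare against directory entries read from disk rather than an in-memory slice.

```go
package main

import (
	"fmt"
	"sort"
)

// key is a hypothetical stand-in for a directory entry's sort key.
type key struct {
	ns  byte
	url string
}

// before reports whether a sorts before b: namespace byte first, then the
// url compared byte by byte (Go's string < is already bytewise).
func before(a, b key) bool {
	if a.ns != b.ns {
		return a.ns < b.ns
	}
	return a.url < b.url
}

// find binary-searches keys, which must already be sorted with before.
func find(keys []key, ns byte, url string) (int, bool) {
	want := key{ns, url}
	i := sort.Search(len(keys), func(i int) bool { return !before(keys[i], want) })
	return i, i < len(keys) && keys[i] == want
}

func main() {
	keys := []key{{'C', "index.html"}, {'C', "style.css"}, {'M', "Title"}, {'W', "mainPage"}}
	fmt.Println(sort.SliceIsSorted(keys, func(i, j int) bool { return before(keys[i], keys[j]) }))
	fmt.Println(find(keys, 'M', "Title"))
	fmt.Println(find(keys, 'C', "missing.html"))
}
```

A miss in this scheme is the point where a Get-style lookup would return ErrNotFound.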
const Magic uint32 = 0x44D495A // 72173914
Magic is the ZIM header magic number, the first four bytes of every file.
Variables
var ErrNotFound = errors.New("zim: not found")
ErrNotFound is returned by Get when no entry matches the namespace and url. Callers (such as the HTTP handler) test for it with errors.Is to map a miss to a 404.
Functions
func Compress added in v0.3.0
func Compress(p []byte) []byte
Compress zstd-compresses p with the exact codec the writer uses for its clusters. It is exported so a caller can cache cluster compression across packs and feed the result back through Writer.SetCompress; a cached cluster is then byte-for-byte what a fresh compression would have produced.
Types
type Entry added in v0.3.0
type Entry struct {
Namespace byte
URL string
Title string
MimeType string
Redirect bool
RedirectNamespace byte
RedirectURL string
Data []byte
}
Entry is one directory entry as stored, returned by EntryAt. A redirect keeps Data nil and names its target in RedirectNamespace/RedirectURL; any other entry carries its bytes in Data and its type in MimeType. Unlike Get, EntryAt does not follow redirects, so a caller can round-trip every entry, the redirects included.
type Reader
type Reader struct {
// contains filtered or unexported fields
}
Reader provides random access to a ZIM file's entries. Open one with Open or NewReader, then look entries up by namespace and url, or fetch the main page. Decompressed clusters are cached so repeated reads from one cluster are cheap.
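The paragraph does not say how the cluster cache is bounded. One plausible policy is a fixed-size LRU keyed by cluster number, sketched here with container/list; `clusterCache` and its `load` callback (a stand-in for read-and-decompress) are assumptions, and the type is not safe for concurrent use without a mutex.

```go
package main

import (
	"container/list"
	"fmt"
)

// cacheItem is one decompressed cluster held by the cache.
type cacheItem struct {
	n    uint32
	data []byte
}

// clusterCache is a fixed-capacity LRU of decompressed clusters (hypothetical).
type clusterCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[uint32]*list.Element // cluster number -> element in order
	load     func(n uint32) ([]byte, error)
}

func newClusterCache(capacity int, load func(uint32) ([]byte, error)) *clusterCache {
	return &clusterCache{capacity: capacity, order: list.New(), items: map[uint32]*list.Element{}, load: load}
}

// get returns cluster n, loading it on a miss and evicting the least
// recently used cluster once the cache is over capacity.
func (c *clusterCache) get(n uint32) ([]byte, error) {
	if el, ok := c.items[n]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*cacheItem).data, nil
	}
	data, err := c.load(n)
	if err != nil {
		return nil, err
	}
	c.items[n] = c.order.PushFront(&cacheItem{n: n, data: data})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*cacheItem).n)
	}
	return data, nil
}

func main() {
	loads := 0
	c := newClusterCache(2, func(n uint32) ([]byte, error) {
		loads++
		return []byte{byte(n)}, nil // pretend this read and decompressed cluster n
	})
	for _, n := range []uint32{0, 0, 1, 2, 0} {
		c.get(n)
	}
	fmt.Println("loads:", loads)
}
```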
func (*Reader) EntryAt added in v0.3.0
func (r *Reader) EntryAt(idx uint32) (Entry, error)
EntryAt returns the directory entry at idx, where 0 <= idx < Count, in the archive's URL order. It is the iteration counterpart to Get: it exposes every entry exactly as stored, including metadata and redirects, which is what an exporter needs to reproduce the archive.
func (*Reader) MainPageRef added in v0.3.0
func (r *Reader) MainPageRef() (byte, string, bool)
MainPageRef returns the namespace and url of the archive's entry point and whether one is set, so an exporter can record which entry is the main page without following the W/mainPage redirect.
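Together, EntryAt and MainPageRef are enough to copy an archive. The sketch below declares `source` and `sink` interfaces whose methods are copied from the Index, so *zim.Reader and *zim.Writer should satisfy them, though that is unverified here; `Entry` is redeclared locally so the example runs on its own. Skipping the W namespace assumes SetMainPage makes the writer regenerate W/mainPage, which this excerpt does not state.

```go
package main

import "fmt"

// Entry mirrors the documented zim.Entry so the sketch compiles on its own.
type Entry struct {
	Namespace         byte
	URL               string
	Title             string
	MimeType          string
	Redirect          bool
	RedirectNamespace byte
	RedirectURL       string
	Data              []byte
}

// source and sink list methods copied from the package index.
type source interface {
	Count() uint32
	EntryAt(idx uint32) (Entry, error)
	MainPageRef() (byte, string, bool)
}

type sink interface {
	AddContent(namespace byte, url, title, mime string, data []byte)
	AddMetadataBytes(name, mime string, data []byte)
	AddRedirect(namespace byte, url, title string, targetNamespace byte, targetURL string)
	SetMainPage(namespace byte, url string)
}

// copyArchive replays every stored entry into dst, then restores the main page.
func copyArchive(src source, dst sink) error {
	for i := uint32(0); i < src.Count(); i++ {
		e, err := src.EntryAt(i)
		if err != nil {
			return fmt.Errorf("entry %d: %w", i, err)
		}
		switch {
		case e.Namespace == 'W':
			continue // assumption: SetMainPage regenerates W/mainPage
		case e.Redirect:
			dst.AddRedirect(e.Namespace, e.URL, e.Title, e.RedirectNamespace, e.RedirectURL)
		case e.Namespace == 'M':
			dst.AddMetadataBytes(e.URL, e.MimeType, e.Data)
		default:
			dst.AddContent(e.Namespace, e.URL, e.Title, e.MimeType, e.Data)
		}
	}
	if ns, url, ok := src.MainPageRef(); ok {
		dst.SetMainPage(ns, url)
	}
	return nil
}

// sliceSource and logSink are in-memory test doubles.
type sliceSource struct {
	entries []Entry
	mainNS  byte
	mainURL string
}

func (s sliceSource) Count() uint32 { return uint32(len(s.entries)) }
func (s sliceSource) EntryAt(i uint32) (Entry, error) {
	if int(i) >= len(s.entries) {
		return Entry{}, fmt.Errorf("index %d out of range", i)
	}
	return s.entries[i], nil
}
func (s sliceSource) MainPageRef() (byte, string, bool) { return s.mainNS, s.mainURL, s.mainURL != "" }

type logSink struct{ log []string }

func (s *logSink) AddContent(ns byte, url, _, _ string, _ []byte) {
	s.log = append(s.log, fmt.Sprintf("content %c/%s", ns, url))
}
func (s *logSink) AddMetadataBytes(name, _ string, _ []byte) { s.log = append(s.log, "meta "+name) }
func (s *logSink) AddRedirect(ns byte, url, _ string, tns byte, turl string) {
	s.log = append(s.log, fmt.Sprintf("redirect %c/%s -> %c/%s", ns, url, tns, turl))
}
func (s *logSink) SetMainPage(ns byte, url string) {
	s.log = append(s.log, fmt.Sprintf("main %c/%s", ns, url))
}

func main() {
	src := sliceSource{entries: []Entry{
		{Namespace: 'C', URL: "index.html", MimeType: "text/html", Data: []byte("hi")},
		{Namespace: 'M', URL: "Title", MimeType: "text/plain", Data: []byte("Demo")},
	}, mainNS: 'C', mainURL: "index.html"}
	dst := &logSink{}
	if err := copyArchive(src, dst); err != nil {
		panic(err)
	}
	fmt.Println(dst.log)
}
```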
type Writer
type Writer struct {
// contains filtered or unexported fields
}
Writer accumulates entries and serialises them as a ZIM file. Build it with NewWriter, add content, redirects, and metadata, optionally set a main page, then call WriteTo. The writer holds every entry in memory, which is acceptable here: a kage mirror fits comfortably, and packing is a one-shot batch job.
func (*Writer) AddContent
func (w *Writer) AddContent(namespace byte, url, title, mime string, data []byte)
AddContent adds a content entry. A later add with the same namespace and url replaces the earlier one. An empty title defaults to the url.
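The replace-on-duplicate rule and the title default can be modelled with a map keyed by namespace and url. This is a hypothetical model, not the Writer's internals: `builder`, `pending`, `entryKey` and `sorted` are illustrative names, and sorting on output is what would let a later pass assign positions in URL order.

```go
package main

import (
	"fmt"
	"sort"
)

// pending is a hypothetical in-memory record of one added entry.
type pending struct {
	ns               byte
	url, title, mime string
	data             []byte
}

// builder keys entries by (namespace, url) so a later add replaces an earlier one.
type builder struct {
	entries map[string]pending
}

// entryKey is collision-free because the namespace is always exactly one byte.
func entryKey(ns byte, url string) string { return string(ns) + "/" + url }

func (b *builder) addContent(ns byte, url, title, mime string, data []byte) {
	if b.entries == nil {
		b.entries = map[string]pending{}
	}
	if title == "" {
		title = url // an empty title defaults to the url
	}
	b.entries[entryKey(ns, url)] = pending{ns, url, title, mime, data}
}

// sorted returns the surviving entries in (namespace, url) order.
func (b *builder) sorted() []pending {
	out := make([]pending, 0, len(b.entries))
	for _, p := range b.entries {
		out = append(out, p)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].ns != out[j].ns {
			return out[i].ns < out[j].ns
		}
		return out[i].url < out[j].url
	})
	return out
}

func main() {
	var b builder
	b.addContent('C', "a.html", "", "text/html", []byte("v1"))
	b.addContent('C', "a.html", "Page A", "text/html", []byte("v2"))
	for _, p := range b.sorted() {
		fmt.Printf("%c/%s %q %s\n", p.ns, p.url, p.title, p.data)
	}
}
```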
func (*Writer) AddMetadata
func (w *Writer) AddMetadata(name, value string)
AddMetadata adds an 'M' namespace text entry, e.g. AddMetadata("Title", "...").
func (*Writer) AddMetadataBytes added in v0.2.1
func (w *Writer) AddMetadataBytes(name, mime string, data []byte)
AddMetadataBytes adds an 'M' namespace entry with an explicit MIME, for binary metadata such as Illustrator_48x48@1, the 48x48 PNG favicon Kiwix shows as the archive's icon.
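Since Kiwix expects the illustration to be a 48x48 PNG, a caller might validate the bytes before passing them to AddMetadataBytes. This check is a suggestion, not something the zim package is documented to do; `checkIllustration` and `encodePNG` are hypothetical helpers, and image.DecodeConfig reads only the PNG header, not the pixels.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/png" // also registers the "png" format with image.DecodeConfig
)

// checkIllustration verifies data is a size x size PNG before it is stored as,
// for example, AddMetadataBytes("Illustrator_48x48@1", "image/png", data).
func checkIllustration(data []byte, size int) error {
	cfg, format, err := image.DecodeConfig(bytes.NewReader(data))
	if err != nil {
		return fmt.Errorf("decode: %w", err)
	}
	if format != "png" {
		return fmt.Errorf("format %q, want png", format)
	}
	if cfg.Width != size || cfg.Height != size {
		return fmt.Errorf("%dx%d, want %dx%d", cfg.Width, cfg.Height, size, size)
	}
	return nil
}

// encodePNG produces a blank w x h PNG for demonstration.
func encodePNG(w, h int) []byte {
	var buf bytes.Buffer
	if err := png.Encode(&buf, image.NewRGBA(image.Rect(0, 0, w, h))); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	fmt.Println(checkIllustration(encodePNG(48, 48), 48))
	fmt.Println(checkIllustration(encodePNG(16, 16), 48))
}
```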
func (*Writer) AddRedirect
func (w *Writer) AddRedirect(namespace byte, url, title string, targetNamespace byte, targetURL string)
AddRedirect adds a redirect from (namespace,url) to (targetNamespace,targetURL).
func (*Writer) SetCompress added in v0.3.0
func (w *Writer) SetCompress(f func([]byte) []byte)
SetCompress replaces the cluster compressor. The function must return zstd-compressed bytes (the writer marks those clusters as zstd), so a caching wrapper can short-circuit unchanged clusters while still producing a valid, byte-identical archive. A nil function restores the built-in codec.
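A minimal caching wrapper of the kind SetCompress is meant for, assuming an in-memory map keyed by the SHA-256 of each uncompressed cluster. In real use `inner` would be zim.Compress and the returned function would be passed to Writer.SetCompress; here a stand-in compressor keeps the example dependency-free, and persisting the map between packs is left out. The cache yields a byte-identical archive only because the compressor is deterministic, which is what Compress's documentation promises.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// cachingCompressor wraps inner so identical cluster bytes are compressed once.
// Two concurrent misses on the same input may both call inner; that is
// harmless because inner is assumed to be deterministic.
func cachingCompressor(inner func([]byte) []byte) func([]byte) []byte {
	var mu sync.Mutex
	cache := map[[sha256.Size]byte][]byte{}
	return func(p []byte) []byte {
		k := sha256.Sum256(p)
		mu.Lock()
		if out, ok := cache[k]; ok {
			mu.Unlock()
			return out
		}
		mu.Unlock()
		out := inner(p)
		mu.Lock()
		cache[k] = out
		mu.Unlock()
		return out
	}
}

func main() {
	calls := 0
	stand := func(p []byte) []byte { calls++; return append([]byte("z:"), p...) } // stand-in for zim.Compress
	compress := cachingCompressor(stand)
	for _, cl := range []string{"cluster-1", "cluster-1", "cluster-2"} {
		compress([]byte(cl))
	}
	fmt.Println("inner calls:", calls)
}
```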
func (*Writer) SetMainPage
func (w *Writer) SetMainPage(namespace byte, url string)
SetMainPage marks an entry as the archive's entry point.
func (*Writer) SetNoCompress
func (w *Writer) SetNoCompress(v bool)
SetNoCompress stores every cluster uncompressed. Useful when the input is already compressed or when a reader without zstd must open the file.