zonnx

command module

v0.9.0 Latest Latest Go to latest Published: Mar 26, 2026 License: Apache-2.0 Imports: 6 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/zerfoo/zonnx

Links

Open Source Insights

README ¶

zonnx

A standalone command-line tool for converting machine learning models to GGUF format. Supports ONNX and SafeTensors inputs, with built-in HuggingFace Hub integration for downloading models.

Features

ONNX / SafeTensors → GGUF conversion: Produce portable GGUF files compatible with the zerfoo runtime and llama.cpp.
Model inspection: Introspect model metadata, IOs, nodes and tensor stats for ONNX and GGUF files. JSON output with --pretty planned.
HuggingFace integration: Download ONNX models and tokenizer files in one step.
Post-conversion quantization: Quantize weights to Q4_0 or Q8_0 during conversion.
CGO-free builds: Ships as a single static binary. Easy to distribute and run in minimal containers.
Architecture-aware mappings: Tensor name and metadata mappings tuned per model family.

Supported Models

zonnx maps tensor names and metadata to GGUF conventions for each architecture family. The --arch flag selects the mapping.

Architecture	`--arch` value	Input Formats	Tensor Mapping	Notes
Llama	`llama` (default)	ONNX	Decoder layers (`model.layers.N.*`)	Llama 3, Code Llama, etc.
Gemma	`gemma`	ONNX	Decoder layers (`model.layers.N.*`)	Gemma, Gemma 2, Gemma 3
BERT	`bert`	ONNX, SafeTensors	Encoder layers (`bert.encoder.layer.N.*`)	Classification, embeddings, pooler
RoBERTa	`roberta`	ONNX, SafeTensors	Encoder layers (`roberta.encoder.layer.N.*`)	Same layer structure as BERT

Any architecture string can be passed via --arch. The metadata mapping is generic (maps hidden_size, num_hidden_layers, etc. to {arch}.* GGUF keys). However, tensor name mapping currently covers Llama-style decoder models and BERT/RoBERTa encoder models. Unsupported tensor name patterns pass through unchanged.

Metadata Mapped

These HuggingFace config.json fields are mapped to GGUF metadata for all architectures:

config.json field	GGUF key
`hidden_size`	`{arch}.embedding_length`
`num_hidden_layers`	`{arch}.block_count`
`num_attention_heads`	`{arch}.attention.head_count`
`num_key_value_heads`	`{arch}.attention.head_count_kv`
`intermediate_size`	`{arch}.feed_forward_length`
`vocab_size`	`{arch}.vocab_size`
`max_position_embeddings`	`{arch}.context_length`
`rms_norm_eps`	`{arch}.attention.layer_norm_rms_epsilon`
`rope_theta`	`{arch}.rope.freq_base`

BERT/RoBERTa additionally map layer_norm_eps, num_labels, and pooler_type.

Usage

Installation

go install github.com/zerfoo/zonnx/cmd/zonnx@latest

Or build from source:

go build -o zonnx ./cmd/zonnx

Requires Go 1.26+. CGO is not required (CGO_ENABLED=0 works).

Quickstart

# 1) Download an ONNX model and tokenizer files from HuggingFace
zonnx download --model google/gemma-2-2b-it --output ./models

# 2) Convert ONNX → GGUF
zonnx convert --arch gemma --output ./models/model.gguf ./models/model.onnx

# 3) Convert SafeTensors → GGUF (pass directory containing config.json + model.safetensors)
zonnx convert --format safetensors --arch bert --output ./models/model.gguf ./models/bert-dir/

# 4) Convert with quantization
zonnx convert --quantize q4_0 --output ./models/model-q4.gguf ./models/model.onnx

# 5) Inspect either format
zonnx inspect --pretty ./models/model.onnx
zonnx inspect --pretty ./models/model.gguf

Commands

`convert`

Convert ONNX or SafeTensors models to GGUF.

zonnx convert [flags] <input>

Flag	Default	Description
`--output`	`<input-dir>/<input-base>.gguf`	Output GGUF file path
`--arch`	`llama`	Model architecture for metadata/tensor mapping
`--format`	`onnx`	Input format: `onnx` or `safetensors`
`--quantize`	(none)	Quantize weights: `q4_0` or `q8_0`

For ONNX input, <input> is a .onnx model file. For SafeTensors, <input> is a directory containing config.json and model.safetensors.

`download`

Download an ONNX model and tokenizer files from HuggingFace Hub.

zonnx download --model <huggingface-model-id> [--output <dir>] [--api-key <key>]

Flag	Default	Description
`--model`	(required)	HuggingFace model ID (e.g., `google/gemma-2-2b-it`)
`--output`	`.`	Output directory
`--api-key`	`$HF_API_KEY`	HuggingFace API key for authenticated downloads

The --api-key flag takes precedence over the HF_API_KEY environment variable.

`inspect`

Inspect ONNX or GGUF model files.

zonnx inspect [--type onnx|gguf] [--pretty] <input-file>

Type is inferred from file extension when not specified.

`import` / `export`

Future-friendly aliases. import is an alias for convert. export (GGUF → ONNX) is planned.

Architectural Principles

zonnx is strictly decoupled from the zerfoo runtime:

GGUF-only output: Emits only GGUF files. No runtime code.
No zerfoo imports: The zonnx codebase does not import github.com/zerfoo/zerfoo.
No ONNX in zerfoo: The zerfoo runtime consumes only GGUF models.
Explicit schema: GGUF output captures all model attributes directly, without relying on ONNX runtime semantics.

Development

make test       # go test ./...
make lint       # golangci-lint run
make lint-fix   # golangci-lint run --fix
make format     # gofmt + goimports

License

Apache 2.0

Documentation ¶

There is no documentation for this package.

Source Files ¶

View all Source files

test_convert.go

Directories ¶

Path	Synopsis
cmd
granite2gguf command Command granite2gguf converts IBM Granite Time Series SafeTensors models to GGUF format for use with zerfoo inference.	Command granite2gguf converts IBM Granite Time Series SafeTensors models to GGUF format for use with zerfoo inference.
zonnx command
internal
onnx
pkg
converter
downloader
gguf
importer
importer/layers
inspector
quantize
registry
zmf_inspector
safetensors Package safetensors implements a reader for HuggingFace's SafeTensors binary format, which stores named tensors with a JSON header followed by contiguous raw data.	Package safetensors implements a reader for HuggingFace's SafeTensors binary format, which stores named tensors with a JSON header followed by contiguous raw data.

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL

README ¶

zonnx

Features

Supported Models

Metadata Mapped

Usage

Installation

Quickstart

Commands

convert

download

inspect

import / export

Architectural Principles

Development

License

Documentation ¶

Source Files ¶

Directories ¶

`convert`

`download`

`inspect`

`import` / `export`