Recipe 12: Vision / Multimodal Inference
Analyze images using a vision-capable GGUF model. The image is passed alongside a text prompt using the inference.Message API, the same format used by the OpenAI-compatible /v1/chat/completions endpoint.
Requirements:
- A vision-capable GGUF model (e.g. LLaVA, Gemma 3 with vision encoder)
Usage:
go run ./docs/cookbook/12-vision-multimodal/ --model path/to/vision-model.gguf --image photo.jpg
go run ./docs/cookbook/12-vision-multimodal/ --model path/to/vision-model.gguf --image photo.jpg --prompt "Count the objects"