12-vision-multimodal

command
v1.11.1
Published: Mar 24, 2026 License: Apache-2.0 Imports: 5 Imported by: 0

Documentation

Overview

Recipe 12: Vision / Multimodal Inference

This recipe analyzes an image with a vision-capable GGUF model. The image is passed alongside a text prompt through the inference.Message API, which uses the same message format as the OpenAI-compatible /v1/chat/completions endpoint.

Requirements:

  • A vision-capable GGUF model (e.g. LLaVA, or Gemma 3 with its vision encoder)

Usage:

go run ./docs/cookbook/12-vision-multimodal/ --model path/to/vision-model.gguf --image photo.jpg
go run ./docs/cookbook/12-vision-multimodal/ --model path/to/vision-model.gguf --image photo.jpg --prompt "Count the objects"
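The command-line interface above could be parsed along the following lines. This is a sketch using Go's standard flag package, not the recipe's actual source. The flag names --model, --image, and --prompt come from the usage lines, while the options struct, the parseArgs helper, the default prompt text, and the rule that --model and --image are required are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options holds the three flags shown in the usage lines.
type options struct {
	Model  string
	Image  string
	Prompt string
}

// parseArgs is a hypothetical helper that parses the recipe's flags.
// The flag package accepts both -model and --model spellings.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("12-vision-multimodal", flag.ContinueOnError)
	fs.StringVar(&o.Model, "model", "", "path to a vision-capable GGUF model")
	fs.StringVar(&o.Image, "image", "", "path to the image to analyze")
	// The default prompt is an assumption; the page does not state one.
	fs.StringVar(&o.Prompt, "prompt", "Describe this image.", "text prompt sent with the image")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Assumed validation: the usage lines always pass both flags.
	if o.Model == "" || o.Image == "" {
		return o, errors.New("both --model and --image are required")
	}
	return o, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("model=%s image=%s prompt=%q\n", opts.Model, opts.Image, opts.Prompt)
}
```

Using a dedicated FlagSet rather than the package-level flags keeps parsing testable, since parseArgs can be called with any argument slice.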
