Analyze images using a vision-capable GGUF language model.
## How it works

1. Reads an image file (JPEG or PNG) from disk
2. Loads a vision-capable GGUF model via `inference.LoadFile`
3. Sends the image as part of an `inference.Message` with the `Images` field set
4. Generates a text description or analysis using `model.Chat`

This uses the same multimodal API that powers the OpenAI-compatible `/v1/chat/completions` endpoint for vision requests.
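The steps above can be sketched in Go. The names `inference.LoadFile`, `inference.Message`, the `Images` field, and `model.Chat` come from this document, but their exact signatures are assumptions and are shown only in comments; the runnable part of the sketch validates the image bytes (JPEG or PNG magic numbers) before they would be handed to the model.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// detectImageType sniffs the file's magic bytes so unsupported files
// are rejected before a chat message is built. Only JPEG and PNG are
// accepted, matching the formats listed above.
func detectImageType(data []byte) (string, error) {
	switch {
	case len(data) >= 8 && string(data[:8]) == "\x89PNG\r\n\x1a\n":
		return "png", nil
	case len(data) >= 3 && data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF:
		return "jpeg", nil
	}
	return "", fmt.Errorf("unsupported image format (want JPEG or PNG)")
}

func main() {
	if len(os.Args) < 2 {
		return // no image path given; nothing to do in this sketch
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	kind, err := detectImageType(data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("image type:", kind)

	// The validated bytes would then be attached to a message and sent
	// to a vision-capable model (hypothetical signatures, not verified):
	//
	//	model, _ := inference.LoadFile("path/to/vision-model.gguf")
	//	msg := inference.Message{
	//		Role:    "user",
	//		Content: "Describe this image.",
	//		Images:  [][]byte{data},
	//	}
	//	reply, _ := model.Chat([]inference.Message{msg})
	//	fmt.Println(reply)
}
```

Sniffing magic bytes rather than trusting the file extension means a mislabeled file fails with a clear error instead of producing garbage output from the vision encoder.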
## Prerequisites

Requires a vision-capable model (e.g., LLaVA, or Gemma 3 with a vision encoder). Text-only models will ignore the image data.
## Usage

```sh
go build -o vision-analysis ./examples/vision-analysis/

# Describe an image
./vision-analysis --model path/to/vision-model.gguf --image photo.jpg

# Ask a specific question about an image
./vision-analysis --model path/to/vision-model.gguf --image chart.png \
    --prompt "What trend does this chart show?"

# With GPU acceleration
./vision-analysis --model path/to/vision-model.gguf --device cuda --image photo.jpg
```
Command vision-analysis demonstrates multimodal inference with image input.
It loads a vision-capable GGUF model, reads an image file, and asks the model
to describe or analyze the image. This uses the same inference.Message API
that the OpenAI-compatible server uses for vision requests.
Usage:

	go build -o vision-analysis ./examples/vision-analysis/
	./vision-analysis --model path/to/vision-model.gguf --image photo.jpg
	./vision-analysis --model path/to/vision-model.gguf --image photo.jpg \
		--prompt "What objects are in this image?"