imgfeed

v0.2.0
Published: May 24, 2026 | License: MIT | Imports: 20 | Imported by: 0

README

go-imgfeed

Load an image from anywhere and feed it to a multimodal LLM in one call — from Go.

Every Go LLM SDK supports image input, but they all leave the same three chores to you: read the bytes, figure out the MIME type, and assemble the data:<mime>;base64,<...> URL — then wrap it in that SDK's content-part struct. go-imgfeed does all of it, adds optional downscaling to a token / byte budget and an image token-cost estimate, and hands the result to any SDK through a thin adapter.

img, _ := imgfeed.FromFile("photo.png",
    imgfeed.WithMaxDim(1024),          // downscale to control cost
    imgfeed.WithDetail(imgfeed.High))

fmt.Println(img.EstimateTokens("gpt-4o")) // budget before you send
url := img.DataURL()                       // ready for any image_url field

Why

  • One call, any source: FromFile, FromBytes, FromReader, FromURL, FromImage (an image.Image).
  • MIME auto-detection from the actual magic bytes (http.DetectContentType), with a file-extension fallback — no more guessing "image/jpeg".
  • Budget control: WithMaxDim downscales (aspect-preserving) and WithMaxBytes re-encodes to fit a size limit, so you don't blow your token budget or hit a provider's image-size cap.
  • Cost estimate: EstimateTokens(model) implements OpenAI's tile-based image formula (detail-aware), so you can predict input cost up front.
  • Provider-agnostic — the same loaded image drops into the official OpenAI SDK, the community sashabaranov/go-openai, langchaingo, the Anthropic SDK, or Google's genai (Gemini). Switch providers without re-writing your image plumbing.
  • No bloat — the core package imports only golang.org/x/image. Each SDK adapter lives in its own subpackage and imports only its own SDK, so you pull in just the one you use.
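
The MIME bullet above describes sniffing magic bytes first and falling back to the file extension. A minimal standard-library sketch of that idea follows; the function name detectMIME and the exact fallback rule are assumptions made for illustration, not the package's actual implementation.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"path/filepath"
	"strings"
)

// detectMIME sniffs the media type from the leading bytes and, when the
// result is not an image type, falls back to the file extension.
// The name and the fallback rule are illustrative assumptions.
func detectMIME(data []byte, filename string) string {
	sniffed := http.DetectContentType(data) // looks at the first 512 bytes at most
	if strings.HasPrefix(sniffed, "image/") {
		return sniffed
	}
	byExt := mime.TypeByExtension(filepath.Ext(filename))
	if strings.HasPrefix(byExt, "image/") {
		return byExt
	}
	return sniffed
}

func main() {
	pngMagic := []byte("\x89PNG\r\n\x1a\n")
	fmt.Println(detectMIME(pngMagic, "mislabeled.jpg"))                 // image/png: the bytes win
	fmt.Println(detectMIME([]byte{0x00, 0x01, 0x02, 0x03}, "scan.jpg")) // image/jpeg: extension fallback
}
```

Trusting the bytes before the name means a mislabeled file still gets the right type; the extension is consulted only when sniffing is inconclusive.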

Install

go get github.com/ultramcu/go-imgfeed

Requires Go 1.25+ (the floor set by the bundled SDK adapters).

Core API

img, err := imgfeed.FromFile("cat.png")        // or FromBytes/FromReader/FromURL/FromImage
img.MIME            // "image/png"
img.Width, img.Height
img.DataURL()       // "data:image/png;base64,..."
img.Base64()        // raw base64, no prefix
img.EstimateTokens("gpt-4o")

Options: WithMaxDim(px), WithMaxBytes(n), WithFormat(imgfeed.PNG|imgfeed.JPEG), WithJPEGQuality(q), WithMIME(m), WithDetail(imgfeed.Auto|Low|High), WithHTTPClient(c).

Adapters

Each adapter turns an *imgfeed.Image into a content part for one SDK.

OpenAI — official openai/openai-go

import (
    "github.com/openai/openai-go/v3"
    "github.com/ultramcu/go-imgfeed"
    "github.com/ultramcu/go-imgfeed/openaidapter"
)

img, _ := imgfeed.FromFile("photo.png", imgfeed.WithDetail(imgfeed.High))
msg := openai.UserMessage([]openai.ChatCompletionContentPartUnionParam{
    openaidapter.Text("What is in this image?"),
    openaidapter.Part(img),
})

OpenAI — community sashabaranov/go-openai

import (
    openai "github.com/sashabaranov/go-openai"
    "github.com/ultramcu/go-imgfeed"
    "github.com/ultramcu/go-imgfeed/sashadapter"
)

img, _ := imgfeed.FromFile("photo.png")
msg := openai.ChatCompletionMessage{
    Role: openai.ChatMessageRoleUser,
    MultiContent: []openai.ChatMessagePart{
        sashadapter.Text("What is in this image?"),
        sashadapter.Part(img),
    },
}

langchaingo

import (
    "github.com/tmc/langchaingo/llms"
    "github.com/ultramcu/go-imgfeed"
    "github.com/ultramcu/go-imgfeed/lcadapter"
)

img, _ := imgfeed.FromFile("photo.png")
msg := llms.MessageContent{
    Role: llms.ChatMessageTypeHuman,
    Parts: []llms.ContentPart{
        lcadapter.Text("What is in this image?"),
        lcadapter.Part(img), // raw bytes + MIME; langchaingo serializes per provider
    },
}
// lcadapter.URLPart(img) is also available (data-URL form with detail).

Anthropic — anthropics/anthropic-sdk-go

import (
    "github.com/anthropics/anthropic-sdk-go"
    "github.com/ultramcu/go-imgfeed"
    "github.com/ultramcu/go-imgfeed/anthropicadapter"
)

img, _ := imgfeed.FromFile("photo.png")
msg := anthropic.NewUserMessage(
    anthropicadapter.Text("What is in this image?"),
    anthropicadapter.Block(img), // inline base64 image block
)

Google Gemini — google.golang.org/genai

import (
    "github.com/ultramcu/go-imgfeed"
    "github.com/ultramcu/go-imgfeed/genaidapter"
    "google.golang.org/genai"
)

img, _ := imgfeed.FromFile("photo.png")
content := genai.NewContentFromParts([]*genai.Part{
    genaidapter.Text("What is in this image?"),
    genaidapter.Part(img), // inline image Blob
}, genai.RoleUser)

Notes

  • EstimateTokens is an approximation: it follows OpenAI's documented tile formula and varies by model (the mini/nano tiers scale up to match text-token pricing). Unknown models fall back to the gpt-4o cost.
  • Anthropic and Gemini have no per-image "detail" concept, so WithDetail is ignored by anthropicadapter and genaidapter.
  • Decoders for PNG, JPEG, GIF, WebP, BMP and TIFF are registered; re-encoding (for resizing/byte budgets) outputs PNG or JPEG.

License

MIT — see LICENSE.

Documentation

Overview

Package imgfeed loads images from files, byte slices, readers, URLs or image.Image values and normalizes them into a small, provider-agnostic Image that is ready to "feed" to a multimodal LLM.

It removes the three menial steps every Go LLM SDK leaves to the caller: reading the bytes, detecting the MIME type, and assembling the "data:<mime>;base64,<...>" URL. On top of that it can optionally downscale an image to a pixel or byte budget (see WithMaxDim and WithMaxBytes) and estimate the number of input tokens it will cost (see Image.EstimateTokens).

The core package has no LLM SDK dependencies. To turn an Image into a content part for a specific SDK, import one of the adapter subpackages:

  • github.com/ultramcu/go-imgfeed/sashadapter (sashabaranov/go-openai)
  • github.com/ultramcu/go-imgfeed/openaidapter (openai/openai-go)
  • github.com/ultramcu/go-imgfeed/lcadapter (tmc/langchaingo)
  • github.com/ultramcu/go-imgfeed/anthropicadapter (anthropics/anthropic-sdk-go)
  • github.com/ultramcu/go-imgfeed/genaidapter (google.golang.org/genai, Gemini)

Each adapter imports only its own SDK, so importing the core (or one adapter) never pulls in the others.

Basic usage:

img, err := imgfeed.FromFile("photo.png",
	imgfeed.WithMaxDim(1024),
	imgfeed.WithDetail(imgfeed.High))
if err != nil {
	// handle error
}
url := img.DataURL()                 // ready for any image_url field
cost := img.EstimateTokens("gpt-4o") // approximate input tokens

Variables

var (
	// ErrEmpty is returned when the source contains no data.
	ErrEmpty = errors.New("imgfeed: empty image data")
	// ErrNotImage is returned when the data is not a recognized image type.
	ErrNotImage = errors.New("imgfeed: data is not a recognized image type")
)

Errors returned by the loaders.
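
Sentinel errors like these are normally matched with errors.Is rather than by comparing strings. The sketch below uses local stand-ins (errEmpty, errNotImage) and a hypothetical check function so it compiles without the package; whether the real loaders wrap their sentinels is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// Local stand-ins for the package's sentinel errors (illustrative only).
var (
	errEmpty    = errors.New("empty image data")
	errNotImage = errors.New("data is not a recognized image type")
)

// check is a hypothetical validation step that wraps the sentinels with context.
func check(data []byte) error {
	if len(data) == 0 {
		return fmt.Errorf("load: %w", errEmpty)
	}
	if !strings.HasPrefix(http.DetectContentType(data), "image/") {
		return fmt.Errorf("load: %w", errNotImage)
	}
	return nil
}

// describe labels an error using errors.Is, which sees through %w wrapping.
func describe(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, errEmpty):
		return "empty"
	case errors.Is(err, errNotImage):
		return "not an image"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(describe(check(nil)))
	fmt.Println(describe(check([]byte("hello"))))
	fmt.Println(describe(check([]byte("GIF89a"))))
}
```

With the real package the same pattern would read errors.Is(err, imgfeed.ErrEmpty) or errors.Is(err, imgfeed.ErrNotImage).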

Types

type Detail

type Detail string

Detail mirrors the OpenAI image "detail" hint, which controls how much detail the model extracts from an image and therefore how many tokens it costs. It is carried on the resulting Image, applied by the SDK adapters that support it, and used by Image.EstimateTokens.

const (
	// Auto lets the provider decide the detail level. It is the default.
	Auto Detail = "auto"
	// Low requests a low-resolution, fixed-cost reading of the image.
	Low Detail = "low"
	// High requests a high-resolution, tile-based reading of the image.
	High Detail = "high"
)

type Format

type Format string

Format is an output image encoding used when an image must be re-encoded (because of resizing, a byte budget, or an explicit conversion) or when it is built from an image.Image. Only the lossless PNG and lossy JPEG encoders are supported.

const (
	// PNG selects lossless PNG encoding.
	PNG Format = "image/png"
	// JPEG selects lossy JPEG encoding (see [WithJPEGQuality]).
	JPEG Format = "image/jpeg"
)

type Image

type Image struct {
	// Data holds the encoded image bytes (possibly re-encoded by resizing).
	Data []byte
	// MIME is the image media type, e.g. "image/png".
	MIME string
	// Width and Height are the pixel dimensions, or 0 if they could not be
	// determined.
	Width, Height int
	// Detail is the resolved detail hint (see [WithDetail]).
	Detail Detail
}

Image is a normalized, ready-to-send image: the encoded bytes plus the detected MIME type, decoded dimensions (0 if unknown) and the chosen detail hint. Use Image.DataURL for a value to drop into any image_url field, or one of the adapter subpackages to build an SDK-specific content part.
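
One way to obtain pixel dimensions without decoding the whole image is image.DecodeConfig, which reads only the header. The sketch below is an assumption about how the "0 if unknown" behavior could be produced, using only the standard-library decoders; the helper names dimensions and samplePNG are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	_ "image/gif"  // register the GIF decoder
	_ "image/jpeg" // register the JPEG decoder
	"image/png"    // registers the PNG decoder; also used to build sample data
)

// dimensions returns the pixel size from the image header, or 0, 0 when the
// data cannot be recognized by any registered decoder.
func dimensions(data []byte) (w, h int) {
	cfg, _, err := image.DecodeConfig(bytes.NewReader(data))
	if err != nil {
		return 0, 0
	}
	return cfg.Width, cfg.Height
}

// samplePNG encodes a blank image of the given size, for demonstration.
func samplePNG(w, h int) []byte {
	var buf bytes.Buffer
	if err := png.Encode(&buf, image.NewRGBA(image.Rect(0, 0, w, h))); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	fmt.Println(dimensions(samplePNG(3, 2)))        // 3 2
	fmt.Println(dimensions([]byte("not an image"))) // 0 0
}
```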

func FromBytes

func FromBytes(b []byte, opts ...Option) (*Image, error)

FromBytes loads an image from raw bytes.

func FromFile

func FromFile(path string, opts ...Option) (*Image, error)

FromFile loads an image from a file on disk. The file name is used as a fallback for MIME detection when the bytes themselves are ambiguous.

Example

Load an image from disk, downscale it to fit a token budget, and inspect the result. The DataURL is ready to drop into any provider's image_url field; the adapter subpackages turn the Image into an SDK content part.

img, err := imgfeed.FromFile("photo.png",
	imgfeed.WithMaxDim(1024),
	imgfeed.WithDetail(imgfeed.High))
if err != nil {
	// handle error
	return
}

_ = img.DataURL() // "data:image/png;base64,..." for any image_url field
fmt.Println(img.MIME, img.EstimateTokens("gpt-4o"))

func FromImage

func FromImage(img image.Image, opts ...Option) (*Image, error)

FromImage encodes an in-memory image.Image. The output format defaults to PNG and can be set with WithFormat. Resizing and byte-budget options apply as usual.

func FromReader

func FromReader(r io.Reader, opts ...Option) (*Image, error)

FromReader loads an image by reading r to completion.

func FromURL

func FromURL(ctx context.Context, rawURL string, opts ...Option) (*Image, error)

FromURL fetches an image over HTTP(S) and loads it. The request honors ctx and the client set by WithHTTPClient (default http.DefaultClient).

Example

Fetch a remote image and estimate what it will cost as input.

img, err := imgfeed.FromURL(context.Background(),
	"https://example.com/cat.jpg",
	imgfeed.WithMaxDim(768))
if err != nil {
	return
}
fmt.Printf("%dx%d ~%d tokens\n", img.Width, img.Height, img.EstimateTokens("gpt-4o"))

func (*Image) Base64

func (im *Image) Base64() string

Base64 returns the standard base64 encoding of the image bytes.

func (*Image) DataURL

func (im *Image) DataURL() string

DataURL returns the image as an RFC 2397 data URL, "data:<mime>;base64,<...>", suitable for any image_url field.
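
For reference, the data URL shape described here can be assembled by hand with the standard library. This is a sketch of the format only; the function name dataURL is hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// dataURL builds "data:<mime>;base64,<payload>" using standard base64.
func dataURL(mimeType string, data []byte) string {
	return "data:" + mimeType + ";base64," + base64.StdEncoding.EncodeToString(data)
}

func main() {
	// Two placeholder bytes stand in for real image data.
	fmt.Println(dataURL("image/png", []byte("hi"))) // data:image/png;base64,aGk=
}
```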

func (*Image) EstimateTokens

func (im *Image) EstimateTokens(model string) int

EstimateTokens returns an approximate number of input tokens the image will cost for the given model, using OpenAI's tile-based image formula:

  • Low detail costs a flat base amount.
  • High/Auto detail scales the image to fit within a 2048x2048 box, then scales it again so its shortest side is 768px, and charges a base amount plus a per-tile amount for each 512px tile.

It is an estimate; actual usage may differ slightly and varies by model. Unknown models fall back to the gpt-4o cost, and models whose name contains "mini" or "nano" use the scaled-up tier. If the image dimensions are unknown, the base cost is returned.
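
The tile arithmetic can be sketched as follows. The constants 85 (base) and 170 (per tile) are the figures OpenAI has published for gpt-4o and are assumed here; the package's per-model tables are not shown in this documentation. The sketch also assumes images are only scaled down, never up.

```go
package main

import (
	"fmt"
	"math"
)

const (
	baseTokens = 85  // assumed flat base cost (gpt-4o figure)
	tileTokens = 170 // assumed cost per 512px tile (gpt-4o figure)
)

// estimateTokens applies the tile formula: fit within 2048x2048, shrink so
// the shortest side is at most 768, then count 512px tiles.
func estimateTokens(w, h int, lowDetail bool) int {
	if lowDetail || w <= 0 || h <= 0 {
		return baseTokens
	}
	fw, fh := float64(w), float64(h)
	if long := math.Max(fw, fh); long > 2048 {
		s := 2048 / long
		fw, fh = fw*s, fh*s
	}
	if short := math.Min(fw, fh); short > 768 {
		s := 768 / short
		fw, fh = fw*s, fh*s
	}
	tiles := int(math.Ceil(fw/512) * math.Ceil(fh/512))
	return baseTokens + tileTokens*tiles
}

func main() {
	fmt.Println(estimateTokens(1024, 1024, false)) // 765: 768x768 gives 4 tiles
	fmt.Println(estimateTokens(2048, 4096, false)) // 1105: 768x1536 gives 6 tiles
	fmt.Println(estimateTokens(4000, 3000, true))  // 85: low detail is flat
}
```

Working one case by hand: 2048x4096 is halved to 1024x2048 to fit the box, then scaled by 0.75 to 768x1536, which covers 2 x 3 = 6 tiles, so 85 + 6 x 170 = 1105.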

type Option

type Option func(*config)

Option customizes how an image is loaded and normalized.
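
Option follows Go's functional-options pattern. The sketch below shows the general shape with a hypothetical two-field config; the real config struct is unexported and its fields are not documented here, so every name in the sketch is an assumption. The quality clamp mirrors the WithJPEGQuality behavior described further down.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the package's unexported settings.
type config struct {
	maxDim      int
	jpegQuality int
}

// option mutates a config; callers pass any number of them.
type option func(*config)

func withMaxDim(px int) option {
	return func(c *config) { c.maxDim = px }
}

// withJPEGQuality resets out-of-range values to 85.
func withJPEGQuality(q int) option {
	return func(c *config) {
		if q < 1 || q > 100 {
			q = 85
		}
		c.jpegQuality = q
	}
}

// newConfig starts from defaults and applies the options in order.
func newConfig(opts ...option) config {
	c := config{jpegQuality: 85}
	for _, o := range opts {
		o(&c)
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", newConfig(withMaxDim(1024), withJPEGQuality(150)))
}
```

The pattern keeps the loader signatures stable: new settings become new With... functions instead of new parameters.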

func WithDetail

func WithDetail(d Detail) Option

WithDetail sets the OpenAI detail hint (Auto, Low or High). The default is Auto. The value is stored on the Image, forwarded by the adapters that support it, and used by Image.EstimateTokens.

func WithFormat

func WithFormat(f Format) Option

WithFormat forces the output encoding (PNG or JPEG); the image is always re-encoded to this format. When unset, original bytes are preserved unless a resize or byte budget forces a re-encode, in which case the source format is kept where possible (otherwise PNG).
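
The rules in this paragraph amount to a small decision table. The sketch below restates them as code; chooseFormat and its parameters are hypothetical names, and the real package may structure this differently.

```go
package main

import "fmt"

// chooseFormat returns the output MIME type and whether the image must be
// re-encoded. forced is "" when no explicit format option was given.
func chooseFormat(forced, sourceMIME string, mustReencode bool) (string, bool) {
	if forced != "" {
		return forced, true // an explicit format always re-encodes
	}
	if !mustReencode {
		return sourceMIME, false // original bytes are preserved
	}
	if sourceMIME == "image/png" || sourceMIME == "image/jpeg" {
		return sourceMIME, true // keep the source format where an encoder exists
	}
	return "image/png", true // otherwise fall back to PNG
}

func main() {
	fmt.Println(chooseFormat("", "image/webp", false))
	fmt.Println(chooseFormat("", "image/webp", true))
	fmt.Println(chooseFormat("image/jpeg", "image/png", false))
}
```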

func WithHTTPClient

func WithHTTPClient(h *http.Client) Option

WithHTTPClient sets the HTTP client used by FromURL. It defaults to http.DefaultClient.

func WithJPEGQuality

func WithJPEGQuality(q int) Option

WithJPEGQuality sets the JPEG quality (1-100) used when encoding to JPEG. The default is 85. Out-of-range values are reset to 85.

func WithMIME

func WithMIME(mime string) Option

WithMIME overrides MIME detection, e.g. when the bytes carry a format whose signature is not auto-detected.

func WithMaxBytes

func WithMaxBytes(n int) Option

WithMaxBytes keeps the encoded image at or below n bytes by progressively lowering JPEG quality and/or downscaling. It is best effort: if the quality and size floor is reached first, the smallest attempt is returned even though it may still exceed n. A value <= 0 (the default) disables the limit.
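
The quality-lowering half of this strategy can be sketched with the standard library alone. The quality steps (85 down to 25 in steps of 15) and the helper names are assumptions chosen for illustration; the package's actual schedule and its downscaling step are not shown.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
)

// encodeWithinBudget re-encodes img as JPEG at decreasing quality until the
// result fits maxBytes. If nothing fits, it returns the smallest attempt.
func encodeWithinBudget(img image.Image, maxBytes int) ([]byte, int, error) {
	var smallest []byte
	smallestQ := 0
	for q := 85; q >= 25; q -= 15 {
		var buf bytes.Buffer
		if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: q}); err != nil {
			return nil, 0, err
		}
		if smallest == nil || buf.Len() < len(smallest) {
			smallest, smallestQ = buf.Bytes(), q
		}
		if maxBytes <= 0 || buf.Len() <= maxBytes {
			return buf.Bytes(), q, nil
		}
	}
	return smallest, smallestQ, nil // best effort: may still exceed maxBytes
}

// gradient builds a simple test image so the example has something to encode.
func gradient(w, h int) image.Image {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.Set(x, y, color.RGBA{uint8(x * 255 / w), uint8(y * 255 / h), 128, 255})
		}
	}
	return img
}

func main() {
	out, q, err := encodeWithinBudget(gradient(64, 64), 1<<20)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Printf("%d bytes at quality %d\n", len(out), q)
}
```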

func WithMaxDim

func WithMaxDim(px int) Option

WithMaxDim downscales the image so that neither side exceeds px pixels, preserving the aspect ratio. Images already within the bound are left untouched. A value <= 0 (the default) disables resizing.
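
The target size for an aspect-preserving downscale can be computed as below. This covers only the dimension arithmetic, not the pixel resampling; the function name fitWithin and the round-to-nearest choice are assumptions.

```go
package main

import "fmt"

// fitWithin returns the size that fits w x h inside a maxDim square while
// keeping the aspect ratio. Images already within the bound are unchanged,
// and maxDim <= 0 disables resizing.
func fitWithin(w, h, maxDim int) (int, int) {
	if maxDim <= 0 || (w <= maxDim && h <= maxDim) {
		return w, h
	}
	if w >= h {
		nh := (h*maxDim + w/2) / w // scale the short side, rounding to nearest
		if nh < 1 {
			nh = 1
		}
		return maxDim, nh
	}
	nw := (w*maxDim + h/2) / h
	if nw < 1 {
		nw = 1
	}
	return nw, maxDim
}

func main() {
	fmt.Println(fitWithin(4000, 3000, 1024)) // 1024 768
	fmt.Println(fitWithin(800, 600, 1024))   // 800 600
}
```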

Directories

Path              Synopsis
anthropicadapter  Package anthropicadapter converts an imgfeed.Image into content blocks for the official SDK github.com/anthropics/anthropic-sdk-go.
genaidapter       Package genaidapter turns an imgfeed.Image into a content part for the Google Gen AI SDK (google.golang.org/genai), used by Gemini models.
lcadapter         Package lcadapter converts an imgfeed.Image into content parts for the LLM framework github.com/tmc/langchaingo.
openaidapter      Package openaidapter adapts an imgfeed.Image into content parts for the official OpenAI Go SDK (github.com/openai/openai-go/v3), targeting the Chat Completions multimodal message format.
sashadapter       Package sashadapter converts an imgfeed.Image into content parts for the community SDK github.com/sashabaranov/go-openai.
