grail

package module

Version: v0.3.0 (latest) Published: Dec 17, 2025 License: MIT Imports: 10 Imported by: 0

README ¶

grail

A lightweight Go SDK that unifies multiple AI providers behind a consistent interface for text and image generation.

Design Goals

One client for text & image generation across providers
Provider-agnostic by default, extensible when needed
Multimodal-first: ordered text + image inputs
Flexible Configuration: Client, provider and per-request options
Type-Safe Errors: Typed error codes for predictable error handling

Installation

go get github.com/montanaflynn/grail

Quick Start

// Create a provider (automatically uses OPENAI_API_KEY if not provided)
provider, _ := openai.New()

// Create a client
client := grail.NewClient(provider)

// Generate text
res, _ := client.Generate(ctx, grail.Request{
	Inputs: []grail.Input{grail.InputText("Create a haiku")},
	Output: grail.OutputText(),
})
text, _ := res.Text()
fmt.Println(text)

// Generate image
imgRes, _ := client.Generate(ctx, grail.Request{
	Inputs: []grail.Input{grail.InputText("A beautiful sunset")},
	Output: grail.OutputImage(grail.ImageSpec{Count: 1}),
})
imgs, _ := imgRes.Images()
os.WriteFile("sunset.png", imgs[0], 0644)

// Generate image with provider-specific options
// Requires this import at the top of the file:
//   import "github.com/montanaflynn/grail/providers/gemini"
imgRes2, _ := client.Generate(ctx, grail.Request{
	Inputs: []grail.Input{grail.InputText("A landscape photo")},
	Output: grail.OutputImage(grail.ImageSpec{Count: 1}),
	ProviderOptions: []grail.ProviderOption{
		gemini.WithImageAspectRatio(gemini.ImageAspectRatio16_9),
		gemini.WithImageSize(gemini.ImageSize2K),
	},
})

// Image understanding (text from image)
imgData, _ := os.ReadFile("photo.jpg")
imgInput := grail.InputImage(imgData)
textRes, _ := client.Generate(ctx, grail.Request{
	Inputs: []grail.Input{
		grail.InputText("Describe this image"),
		imgInput,
	},
	Output: grail.OutputText(),
})
text, _ = textRes.Text() // reuse text declared above; := would not compile here
fmt.Println(text)

// Multimodal image generation (image from text + image)
imgRes3, _ := client.Generate(ctx, grail.Request{
	Inputs: []grail.Input{
		grail.InputText("Create a variation of this image"),
		imgInput,
		grail.InputText("but make it more colorful"),
	},
	Output: grail.OutputImage(grail.ImageSpec{Count: 1}),
})
imgs, _ = imgRes3.Images() // reuse imgs declared above
os.WriteFile("variation.png", imgs[0], 0644)

// PDF understanding (text from PDF)
pdfData, _ := os.ReadFile("document.pdf")
pdfRes, _ := client.Generate(ctx, grail.Request{
	Inputs: []grail.Input{
		grail.InputText("Summarize this document"),
		grail.InputPDF(pdfData),
	},
	Output: grail.OutputText(),
})
text, _ = pdfRes.Text()
fmt.Println(text)

// Model selection: explicit model name
res, _ = client.Generate(ctx, grail.Request{
	Inputs: []grail.Input{grail.InputText("Hello")},
	Output: grail.OutputText(),
	Model:  "gpt-4o",  // Use this specific model
})

// Model selection: tier-based (provider picks the right model)
res, _ = client.Generate(ctx, grail.Request{
	Inputs: []grail.Input{grail.InputText("Hello")},
	Output: grail.OutputText(),
	Tier:   grail.ModelTierFast,  // Let provider pick the fast text model
})

// Query available models
models, _ := client.ListModels(ctx)
for _, m := range models {
	fmt.Printf("%s: role=%s tier=%s\n", m.Name, m.Role, m.Tier)
}

// Get specific model by role and tier
model, _ := client.GetModel(ctx, grail.ModelRoleText, grail.ModelTierBest)
fmt.Printf("Best text model: %s\n", model.Name)

Examples

See the examples/ directory for complete, runnable examples:

Simple Text: Minimal text generation
Text Generation: Text generation with provider selection
Text to Image: Image generation from text prompts
Image Understanding: Text generation from images
PDF Understanding: Text generation from PDF documents
PDF to Image: Image generation from PDF documents (e.g., infographics)
OpenAI Image Options: Provider-specific image options (format, background, size, moderation, compression)
Gemini Image Options: Provider-specific image options (aspect ratio, size)

Providers

OpenAI

import "github.com/montanaflynn/grail/providers/openai"

// Basic usage (uses OPENAI_API_KEY env var)
provider, err := openai.New()

// With options
provider, err = openai.New(
    openai.WithAPIKey("sk-..."),
    openai.WithTextModel("gpt-4"),
    openai.WithImageModel("gpt-image-1"),
    openai.WithLogger(logger),
)

Options:

WithAPIKey(key string) - Set API key explicitly
WithAPIKeyFromEnv(env string) - Read API key from environment variable
WithTextModel(model string) - Override default text model (default: gpt-5.2)
WithImageModel(model string) - Override default image model (default: gpt-image-1)
WithLogger(logger *slog.Logger) - Set custom logger

Image Options:

WithImageFormat(format ImageFormat) - Set output format (png, jpeg, webp)
WithImageBackground(bg ImageBackground) - Set background (auto, transparent, opaque)
WithImageSize(size ImageSize) - Set image size (auto, 1024x1024, 1536x1024, 1024x1536, 256x256, 512x512, 1792x1024, 1024x1792)
WithImageModeration(moderation ImageModeration) - Set moderation level (auto, low)
WithImageOutputCompression(compression int) - Set output compression quality (0-100)

Text Options:

TextOptions{Model, MaxTokens, Temperature, TopP, SystemPrompt} - Provider-specific text generation options

Gemini

import "github.com/montanaflynn/grail/providers/gemini"

// Basic usage (uses GEMINI_API_KEY env var)
provider, err := gemini.New(ctx)

// With options
provider, err = gemini.New(ctx,
    gemini.WithAPIKey("..."),
    gemini.WithTextModel("gemini-3-flash-preview"),
    gemini.WithImageModel("gemini-2.5-flash-image"),
    gemini.WithLogger(logger),
)

Options:

WithAPIKey(key string) - Set API key explicitly
WithAPIKeyFromEnv(env string) - Read API key from environment variable
WithTextModel(model string) - Override default text model (default: gemini-3-flash-preview)
WithImageModel(model string) - Override default image model (default: gemini-2.5-flash-image)
WithLogger(logger *slog.Logger) - Set custom logger

Image Options:

WithImageAspectRatio(ratio ImageAspectRatio) - Set aspect ratio (1:1, 16:9, etc.)
WithImageSize(size ImageSize) - Set image size (1K, 2K, 4K)

Text Options:

TextOptions{Model, MaxTokens, Temperature, TopP, SystemPrompt} - Provider-specific text generation options

Development

# Run tests
go test ./...

# Format code
go fmt ./...

# Run linter
go vet ./...

# Or use make
make format
make lint
make test
make # runs all

Contributing

See CONTRIBUTING.md for contribution guidelines.

Changelog

See CHANGELOG.md for a detailed list of changes.

License

MIT License - see LICENSE for details.

Documentation ¶

Overview ¶

Package grail provides a unified interface for AI text and image generation across multiple providers (OpenAI, Gemini, etc.). It supports multimodal inputs (ordered sequences of text, images, and PDFs) and provides type-safe error handling, structured logging, and flexible configuration options.

Example usage:

provider, _ := openai.New()
client := grail.NewClient(provider)
res, _ := client.Generate(ctx, grail.Request{
	Inputs: []grail.Input{grail.InputText("Hello, world!")},
	Output: grail.OutputText(),
})

Sub-packages:

This package provides the core client and interfaces. Provider implementations are available in sub-packages:

providers - All providers (https://pkg.go.dev/github.com/montanaflynn/grail/providers)
providers/openai - OpenAI provider (https://pkg.go.dev/github.com/montanaflynn/grail/providers/openai)
providers/gemini - Google Gemini provider (https://pkg.go.dev/github.com/montanaflynn/grail/providers/gemini)
providers/mock - Mock provider (https://pkg.go.dev/github.com/montanaflynn/grail/providers/mock)

Index ¶

Constants
Variables
func AsFileInput(input Input) ([]byte, string, string, bool)
func AsFileReaderInput(input Input) (io.Reader, int64, string, string, bool)
func AsTextInput(input Input) (string, bool)
func GetJSONOutput(output Output) (schema any, strict bool, ok bool)
func IsRateLimited(err error) bool
func IsRefused(err error) bool
func IsRetryable(err error) bool
func IsTextOutput(output Output) bool
func NewGrailError(code ErrorCode, message string) *grailError
func Pointer[T any](v T) *T
func SniffImageMIME(data []byte) string
type Client
- func NewClient(p Provider, opts ...ClientOption) Client
type ClientOption
- func WithDownloadLimits(maxBytes int64, timeout time.Duration) ClientOption
- func WithHTTPClient(hc *http.Client) ClientOption
- func WithLogger(l *slog.Logger) ClientOption
- func WithLoggerFormat(format string, level LoggerLevel) ClientOption
type ErrorCode
- func GetErrorCode(err error) ErrorCode
type FileOpt
- func WithFileName(name string) FileOpt
type GrailError
type ImageOutputInfo
type ImageSpec
- func GetImageSpec(output Output) (ImageSpec, bool)
type Input
- func InputFile(data []byte, mime string, opts ...FileOpt) Input
- func InputFileFromPath(path string, opts ...FileOpt) (Input, error)
- func InputFileReader(r io.Reader, size int64, mime string, opts ...FileOpt) Input
- func InputImage(data []byte, opts ...FileOpt) Input
- func InputImageFromPath(path string, opts ...FileOpt) (Input, error)
- func InputPDF(data []byte, opts ...FileOpt) Input
- func InputPDFFromPath(path string, opts ...FileOpt) (Input, error)
- func InputText(s string) Input
- func InputTextFile(text string, mime string, opts ...FileOpt) Input
type JSONOpt
- func WithStrictJSON(strict bool) JSONOpt
type LoggerAware
type LoggerLevel
type Model
- func (m Model) String() string
type ModelCapabilities
type ModelCatalog
type ModelDescriber
type ModelLister
type ModelResolver
type ModelRole
type ModelTier
type ModelUse
type Output
- func OutputImage(spec ImageSpec) Output
- func OutputJSON(schema any, opts ...JSONOpt) Output
- func OutputText() Output
type OutputPart
- func NewImageOutputPart(data []byte, mime, name string) OutputPart
- func NewJSONOutputPart(jsonData []byte) OutputPart
- func NewTextOutputPart(text string) OutputPart
type Provider
type ProviderExecutor
type ProviderInfo
type ProviderOption
type Request
type Response
- func (r Response) DecodeJSON(dst any) error
- func (r Response) ImageOutputs() []ImageOutputInfo
- func (r Response) Images() ([][]byte, bool)
- func (r Response) Text() (string, bool)
type Usage
type Warning

Constants ¶

const (
	MaxPDFSize  = 50 * 1024 * 1024  // 50 MB
	MaxFileSize = 100 * 1024 * 1024 // 100 MB
)

Variables ¶

var LoggerLevels = map[string]LoggerLevel{
	"debug": LoggerLevelDebug,
	"info":  LoggerLevelInfo,
	"warn":  LoggerLevelWarn,
	"error": LoggerLevelError,
}

Functions ¶

func AsFileInput ¶ added in v0.2.0

func AsFileInput(input Input) ([]byte, string, string, bool)

func AsFileReaderInput ¶ added in v0.2.0

func AsFileReaderInput(input Input) (io.Reader, int64, string, string, bool)

func AsTextInput ¶ added in v0.2.0

func AsTextInput(input Input) (string, bool)

Type assertion helpers for providers

func GetJSONOutput ¶ added in v0.2.0

func GetJSONOutput(output Output) (schema any, strict bool, ok bool)

func IsRateLimited ¶ added in v0.2.0

func IsRateLimited(err error) bool

func IsRefused ¶ added in v0.2.0

func IsRefused(err error) bool

func IsRetryable ¶ added in v0.2.0

func IsRetryable(err error) bool

func IsTextOutput ¶ added in v0.2.0

func IsTextOutput(output Output) bool

Output type checking helpers for providers

func NewGrailError ¶ added in v0.2.0

func NewGrailError(code ErrorCode, message string) *grailError

func Pointer ¶

func Pointer[T any](v T) *T

Pointer is a helper to take the address of a literal value (e.g., grail.Pointer(0.0)).
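
A helper with this signature is usually a one-liner. The sketch below is an assumption about its shape, not grail's source: `pointer`, the `options` struct, and its `Temperature` field are hypothetical names used only to show why pointing at a literal helps distinguish "unset" from "explicitly zero".

```go
package main

import "fmt"

// pointer returns the address of a copy of v. Go does not allow &0.0,
// so a generic helper like this is a common way to point at a literal.
func pointer[T any](v T) *T {
	return &v
}

// options is a hypothetical struct: a nil Temperature means "use the
// default", while a pointer to 0.0 means "explicitly zero".
type options struct {
	Temperature *float64
}

// describe reports whether the optional field was set.
func describe(o options) string {
	if o.Temperature == nil {
		return "temperature: default"
	}
	return fmt.Sprintf("temperature: %.1f", *o.Temperature)
}

func main() {
	fmt.Println(describe(options{}))
	fmt.Println(describe(options{Temperature: pointer(0.0)}))
}
```

With a plain `float64` field the two cases above would be indistinguishable, which is why optional numeric settings are often modeled as pointers.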

func SniffImageMIME ¶ added in v0.2.1

func SniffImageMIME(data []byte) string

SniffImageMIME detects image MIME type from magic bytes. It supports PNG, JPEG, GIF, and WebP formats.
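
Magic-byte detection for these four formats can be sketched as follows. This is an illustration of the technique, not grail's implementation: `sniffImageMIME` is a hypothetical stand-in, and returning an empty string for unrecognized data is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffImageMIME is a hypothetical stand-in that inspects leading magic bytes.
// Assumption: it returns "" when the data matches none of the four formats.
func sniffImageMIME(data []byte) string {
	switch {
	case bytes.HasPrefix(data, []byte("\x89PNG\r\n\x1a\n")):
		return "image/png"
	case bytes.HasPrefix(data, []byte{0xFF, 0xD8, 0xFF}):
		return "image/jpeg"
	case bytes.HasPrefix(data, []byte("GIF87a")), bytes.HasPrefix(data, []byte("GIF89a")):
		return "image/gif"
	case len(data) >= 12 &&
		bytes.Equal(data[0:4], []byte("RIFF")) &&
		bytes.Equal(data[8:12], []byte("WEBP")):
		// WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
		return "image/webp"
	}
	return ""
}

func main() {
	fmt.Println(sniffImageMIME([]byte("\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR")))
	fmt.Println(sniffImageMIME([]byte("RIFF\x00\x00\x00\x00WEBPVP8 ")))
}
```

The standard library's `http.DetectContentType` performs a similar check over a wider set of types, if a dependency on `net/http` is acceptable.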

Types ¶

type Client ¶

type Client interface {
	Generate(ctx context.Context, req Request) (Response, error)

	// Explicit helpers for loading remote content (HTTP/S only).
	// These helpers perform network I/O using the client's HTTP client
	// and return concrete Inputs (bytes + MIME).
	InputFileFromURI(ctx context.Context, uri string, opts ...FileOpt) (Input, error)
	InputImageFromURI(ctx context.Context, uri string, opts ...FileOpt) (Input, error)
	InputPDFFromURI(ctx context.Context, uri string, opts ...FileOpt) (Input, error)

	// ListModels returns all available models for the provider and their capabilities.
	// Returns an error if the provider doesn't support model listing.
	ListModels(ctx context.Context) ([]Model, error)

	// GetModel returns the model matching the given role and tier.
	// Returns an error if no matching model is found.
	GetModel(ctx context.Context, role ModelRole, tier ModelTier) (Model, error)
}
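
To illustrate what an HTTP/S-only loader with a size cap and timeout might look like (compare WithDownloadLimits and WithHTTPClient), here is a minimal sketch. It is not grail's code: `fetchLimited`, `errTooLarge`, and `stubTransport` are hypothetical, and the stub transport replaces real network I/O so the example runs offline.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
	"time"
)

var errTooLarge = errors.New("download exceeds size limit")

// fetchLimited downloads uri with hc, gives up after timeout, and rejects
// bodies larger than maxBytes. It returns the bytes and the Content-Type.
func fetchLimited(ctx context.Context, hc *http.Client, uri string, maxBytes int64, timeout time.Duration) ([]byte, string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return nil, "", err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, uri, nil)
	if err != nil {
		return nil, "", err
	}
	resp, err := hc.Do(req)
	if err != nil {
		return nil, "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	// Read one byte past the limit so an oversized body is detectable.
	data, err := io.ReadAll(io.LimitReader(resp.Body, maxBytes+1))
	if err != nil {
		return nil, "", err
	}
	if int64(len(data)) > maxBytes {
		return nil, "", errTooLarge
	}
	return data, resp.Header.Get("Content-Type"), nil
}

// stubTransport answers every request with a fixed 10-byte body, so the
// sketch runs without touching the network.
type stubTransport struct{}

func (stubTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     http.Header{"Content-Type": []string{"text/plain"}},
		Body:       io.NopCloser(strings.NewReader("0123456789")),
		Request:    req,
	}, nil
}

// demo fetches uri through the stub and returns the number of bytes read.
func demo(uri string, maxBytes int64) (int, error) {
	hc := &http.Client{Transport: stubTransport{}}
	data, _, err := fetchLimited(context.Background(), hc, uri, maxBytes, time.Second)
	return len(data), err
}

func main() {
	fmt.Println(demo("https://example.invalid/a.txt", 64))
	fmt.Println(demo("https://example.invalid/a.txt", 4))
	fmt.Println(demo("file:///tmp/a.txt", 64))
}
```

Reading `maxBytes+1` through `io.LimitReader` bounds memory use even when the server omits or misreports Content-Length.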

func NewClient ¶

func NewClient(p Provider, opts ...ClientOption) Client

type ClientOption ¶

type ClientOption interface {
	// contains filtered or unexported methods
}

func WithDownloadLimits ¶ added in v0.2.0

func WithDownloadLimits(maxBytes int64, timeout time.Duration) ClientOption

func WithHTTPClient ¶ added in v0.2.0

func WithHTTPClient(hc *http.Client) ClientOption

func WithLogger ¶

func WithLogger(l *slog.Logger) ClientOption

WithLogger sets a custom logger for client-level logs.

func WithLoggerFormat ¶

func WithLoggerFormat(format string, level LoggerLevel) ClientOption

WithLoggerFormat builds a default logger at the given level and format ("text" or "json"). This is a convenience if you don't want to construct a slog.Logger yourself.
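
For readers who do want to build the logger themselves, the following sketch shows one way a format string and level name could map onto the standard `log/slog` handlers. The names `newLogger`, `levels`, `render`, and `countLines` are hypothetical, and the fallbacks (text for unknown formats, info for unknown levels) are assumptions rather than documented grail behavior.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
	"strings"
)

// levels mirrors the idea of a name-to-level lookup table.
var levels = map[string]slog.Level{
	"debug": slog.LevelDebug,
	"info":  slog.LevelInfo,
	"warn":  slog.LevelWarn,
	"error": slog.LevelError,
}

// newLogger selects the JSON handler for "json" and the text handler
// otherwise. Unknown level names resolve to the zero value, slog.LevelInfo.
func newLogger(w io.Writer, format, level string) *slog.Logger {
	opts := &slog.HandlerOptions{Level: levels[level]}
	if format == "json" {
		return slog.New(slog.NewJSONHandler(w, opts))
	}
	return slog.New(slog.NewTextHandler(w, opts))
}

// render logs one debug and one warn record and returns what was written.
func render(format, level string) string {
	var buf bytes.Buffer
	l := newLogger(&buf, format, level)
	l.Debug("request started")
	l.Warn("slow response", "provider", "mock")
	return buf.String()
}

// countLines counts emitted records (each handler writes one line per record).
func countLines(s string) int {
	return strings.Count(s, "\n")
}

func main() {
	fmt.Print(render("text", "warn"))
	fmt.Print(render("json", "debug"))
}
```

A logger built this way can be passed to WithLogger when more control is needed, for example to write to a file instead of standard error.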

type ErrorCode ¶

type ErrorCode string

const (
	InvalidArgument ErrorCode = "invalid_argument"
	Unauthorized    ErrorCode = "unauthorized"
	RateLimited     ErrorCode = "rate_limited"
	Timeout         ErrorCode = "timeout"
	Unavailable     ErrorCode = "unavailable"
	Unsupported     ErrorCode = "unsupported"
	Refused         ErrorCode = "refused"
	OutputInvalid   ErrorCode = "output_invalid"
	Internal        ErrorCode = "internal"
)

func GetErrorCode ¶

func GetErrorCode(err error) ErrorCode

type FileOpt ¶ added in v0.2.0

type FileOpt interface {
	// contains filtered or unexported methods
}

func WithFileName ¶ added in v0.2.0

func WithFileName(name string) FileOpt

type GrailError ¶ added in v0.2.0

type GrailError interface {
	error
	Code() ErrorCode
	Retryable() bool
	ProviderName() string
	RequestID() string
}
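
An interface like this is typically inspected with `errors.As`, which also sees through wrapped errors. The sketch below uses local stand-in types rather than the grail package: `coded`, `apiError`, `codeOf`, `isRetryable`, and `wrap` are hypothetical, and returning an empty code for untyped errors is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

type errorCode string

const (
	rateLimited     errorCode = "rate_limited"
	invalidArgument errorCode = "invalid_argument"
)

// coded mirrors part of the GrailError method set shown above.
type coded interface {
	error
	Code() errorCode
	Retryable() bool
}

// apiError is a hypothetical concrete error used only for this sketch.
type apiError struct {
	code      errorCode
	retryable bool
	msg       string
}

func (e *apiError) Error() string   { return string(e.code) + ": " + e.msg }
func (e *apiError) Code() errorCode { return e.code }
func (e *apiError) Retryable() bool { return e.retryable }

// codeOf walks the wrap chain and returns the first typed code it finds,
// or "" when the error carries none (an assumption for this sketch).
func codeOf(err error) errorCode {
	var c coded
	if errors.As(err, &c) {
		return c.Code()
	}
	return ""
}

// isRetryable reports whether an error in the chain marks itself retryable.
func isRetryable(err error) bool {
	var c coded
	return errors.As(err, &c) && c.Retryable()
}

// wrap adds call-site context while keeping the typed error reachable.
func wrap(err error) error {
	return fmt.Errorf("generate: %w", err)
}

func main() {
	err := wrap(&apiError{code: rateLimited, retryable: true, msg: "slow down"})
	fmt.Println(codeOf(err), isRetryable(err))
	fmt.Println(codeOf(errors.New("plain")) == "", isRetryable(errors.New("plain")))
}
```

Callers can branch on the code (for example, back off on a rate limit and fail fast on an invalid argument) without matching on error strings.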

type ImageOutputInfo ¶ added in v0.2.0

type ImageOutputInfo struct {
	Data []byte
	MIME string
	Name string
}

ImageOutputInfo contains image data with MIME and optional name.
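
Because the struct carries a MIME type and an optional name, a caller can choose a file name that matches the actual format instead of hard-coding an extension. The sketch below uses a local copy of the struct; `fileName` and its extension table are hypothetical.

```go
package main

import "fmt"

// imageOutputInfo is a local copy of the fields shown above.
type imageOutputInfo struct {
	Data []byte
	MIME string
	Name string
}

// fileName prefers the supplied name and otherwise derives one from the
// index and MIME type. The extension table is an assumption of this sketch.
func fileName(info imageOutputInfo, index int) string {
	if info.Name != "" {
		return info.Name
	}
	ext := ".bin" // fallback for MIME types the table does not know
	switch info.MIME {
	case "image/png":
		ext = ".png"
	case "image/jpeg":
		ext = ".jpg"
	case "image/webp":
		ext = ".webp"
	case "image/gif":
		ext = ".gif"
	}
	return fmt.Sprintf("image-%d%s", index, ext)
}

func main() {
	outputs := []imageOutputInfo{
		{MIME: "image/jpeg"},
		{MIME: "image/png", Name: "cat.png"},
	}
	for i, o := range outputs {
		fmt.Println(fileName(o, i))
	}
}
```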

type ImageSpec ¶ added in v0.2.0

type ImageSpec struct {
	Count int // default 1
}

func GetImageSpec ¶ added in v0.2.0

func GetImageSpec(output Output) (ImageSpec, bool)

type Input ¶ added in v0.2.0

type Input interface {
	// contains filtered or unexported methods
}

func InputFile ¶ added in v0.2.0

func InputFile(data []byte, mime string, opts ...FileOpt) Input

func InputFileFromPath ¶ added in v0.2.0

func InputFileFromPath(path string, opts ...FileOpt) (Input, error)

func InputFileReader ¶ added in v0.2.0

func InputFileReader(r io.Reader, size int64, mime string, opts ...FileOpt) Input

func InputImage ¶ added in v0.2.0

func InputImage(data []byte, opts ...FileOpt) Input

func InputImageFromPath ¶ added in v0.2.0

func InputImageFromPath(path string, opts ...FileOpt) (Input, error)

func InputPDF ¶ added in v0.2.0

func InputPDF(data []byte, opts ...FileOpt) Input

func InputPDFFromPath ¶ added in v0.2.0

func InputPDFFromPath(path string, opts ...FileOpt) (Input, error)

func InputText ¶ added in v0.2.0

func InputText(s string) Input

func InputTextFile ¶ added in v0.2.0

func InputTextFile(text string, mime string, opts ...FileOpt) Input

type JSONOpt ¶ added in v0.2.0

type JSONOpt interface {
	// contains filtered or unexported methods
}

func WithStrictJSON ¶ added in v0.2.0

func WithStrictJSON(strict bool) JSONOpt

type LoggerAware ¶

type LoggerAware interface {
	SetLogger(*slog.Logger)
}

LoggerAware is an optional interface for providers to accept a logger from the client.
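
Optional interfaces of this kind are usually detected with a type assertion at the point where the client receives the provider. The following sketch shows the pattern with local stand-in types; `attachLogger`, `plainProvider`, `loggingProvider`, and `demo` are hypothetical and do not describe how grail wires this internally.

```go
package main

import (
	"fmt"
	"io"
	"log/slog"
)

// provider and loggerAware mirror the shapes of the interfaces above.
type provider interface{ Name() string }

type loggerAware interface{ SetLogger(*slog.Logger) }

// plainProvider does not opt in to logging.
type plainProvider struct{}

func (plainProvider) Name() string { return "plain" }

// loggingProvider opts in by implementing SetLogger.
type loggingProvider struct{ logger *slog.Logger }

func (*loggingProvider) Name() string               { return "logging" }
func (p *loggingProvider) SetLogger(l *slog.Logger) { p.logger = l }

// attachLogger hands l to p only if p implements loggerAware,
// and reports whether it did.
func attachLogger(p provider, l *slog.Logger) bool {
	la, ok := p.(loggerAware)
	if ok {
		la.SetLogger(l)
	}
	return ok
}

// demo attaches a logger that discards its output.
func demo(p provider) bool {
	return attachLogger(p, slog.New(slog.NewTextHandler(io.Discard, nil)))
}

func main() {
	fmt.Println(demo(plainProvider{}))
	fmt.Println(demo(&loggingProvider{}))
}
```

The benefit of this design is that the core provider interface stays small, and providers that do not log need no extra code.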

type LoggerLevel ¶

type LoggerLevel slog.Level

LoggerLevel is a small enum for convenience logger construction.

const (
	LoggerLevelDebug LoggerLevel = LoggerLevel(slog.LevelDebug)
	LoggerLevelInfo  LoggerLevel = LoggerLevel(slog.LevelInfo)
	LoggerLevelWarn  LoggerLevel = LoggerLevel(slog.LevelWarn)
	LoggerLevelError LoggerLevel = LoggerLevel(slog.LevelError)
)

type Model ¶ added in v0.3.0

type Model struct {
	Name         string            // Model identifier (e.g., "gpt-5.2", "gemini-3-flash-preview")
	Role         ModelRole         // text or image
	Tier         ModelTier         // best or fast
	Capabilities ModelCapabilities // What the model can do
}

Model describes a model and its capabilities. Providers export these as package-level variables for easy reference.

func (Model) String ¶ added in v0.3.0

func (m Model) String() string

String returns the model name for use in requests.

type ModelCapabilities ¶ added in v0.3.0

type ModelCapabilities struct {
	TextGeneration     bool // Can generate text from text input
	ImageGeneration    bool // Can generate images from text input
	ImageUnderstanding bool // Can understand/describe images
	PDFUnderstanding   bool // Can understand/extract from PDFs
	JSONOutput         bool // Can output structured JSON
}

ModelCapabilities describes what a model can do.

type ModelCatalog ¶ added in v0.3.0

type ModelCatalog interface {
	SetBestTextModel(model Model)
	SetFastTextModel(model Model)
	SetBestImageModel(model Model)
	SetFastImageModel(model Model)

	BestTextModel() Model
	FastTextModel() Model
	BestImageModel() Model
	FastImageModel() Model

	AllModels() []Model
}

ModelCatalog is an optional interface for providers to manage model selection. Providers implement this to allow users to override default models.

type ModelDescriber ¶ added in v0.3.0

type ModelDescriber interface {
	DescribeModels(req Request) string
}

ModelDescriber describes what models will be used for a request. Providers implement this to provide accurate logging when req.Model doesn't fully describe the models (e.g., OpenAI image generation uses both a text model and an image model).

type ModelLister ¶ added in v0.3.0

type ModelLister interface {
	ListModels(ctx context.Context) ([]Model, error)
}

ModelLister is an optional interface for providers to list available models.

type ModelResolver ¶ added in v0.3.0

type ModelResolver interface {
	ResolveModel(role ModelRole, tier ModelTier) (string, error)
}

ModelResolver resolves a role+tier to a model name. Providers implement this to support tier-based selection.
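
A resolver with this method can be as simple as a lookup table keyed by role and tier. The sketch below is illustrative only: `tableResolver`, `key`, `newResolver`, and the model names are placeholders, not values any provider ships.

```go
package main

import "fmt"

type modelRole string
type modelTier string

const (
	roleText  modelRole = "text"
	roleImage modelRole = "image"
	tierBest  modelTier = "best"
	tierFast  modelTier = "fast"
)

// key combines role and tier so the pair can index a map.
type key struct {
	role modelRole
	tier modelTier
}

// tableResolver maps role+tier pairs to model names.
type tableResolver map[key]string

// ResolveModel returns the configured name or an error for a missing pair.
func (t tableResolver) ResolveModel(role modelRole, tier modelTier) (string, error) {
	name, ok := t[key{role: role, tier: tier}]
	if !ok {
		return "", fmt.Errorf("no model for role=%s tier=%s", role, tier)
	}
	return name, nil
}

// newResolver builds a table with placeholder names and no fast image model.
func newResolver() tableResolver {
	return tableResolver{
		{role: roleText, tier: tierBest}:  "example-text-best",
		{role: roleText, tier: tierFast}:  "example-text-fast",
		{role: roleImage, tier: tierBest}: "example-image-best",
	}
}

func main() {
	r := newResolver()
	fmt.Println(r.ResolveModel(roleText, tierFast))
	fmt.Println(r.ResolveModel(roleImage, tierFast))
}
```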

type ModelRole ¶ added in v0.3.0

type ModelRole string

ModelRole describes the primary function of a model.

const (
	ModelRoleText  ModelRole = "text"  // Text/language generation
	ModelRoleImage ModelRole = "image" // Image generation
)

type ModelTier ¶ added in v0.3.0

type ModelTier string

ModelTier describes the quality/speed trade-off of a model.

const (
	ModelTierBest ModelTier = "best" // Highest quality, may be slower/costlier
	ModelTierFast ModelTier = "fast" // Speed/cost optimized
)

type ModelUse ¶ added in v0.2.0

type ModelUse struct {
	Role string // "language", "image_generation", "moderation", etc.
	Name string // provider-native model identifier
}

type Output ¶ added in v0.2.0

type Output interface {
	// contains filtered or unexported methods
}

func OutputImage ¶ added in v0.2.0

func OutputImage(spec ImageSpec) Output

func OutputJSON ¶ added in v0.2.0

func OutputJSON(schema any, opts ...JSONOpt) Output

func OutputText ¶ added in v0.2.0

func OutputText() Output

type OutputPart ¶ added in v0.2.0

type OutputPart interface {
	// contains filtered or unexported methods
}

func NewImageOutputPart ¶ added in v0.2.0

func NewImageOutputPart(data []byte, mime, name string) OutputPart

func NewJSONOutputPart ¶ added in v0.2.0

func NewJSONOutputPart(jsonData []byte) OutputPart

func NewTextOutputPart ¶ added in v0.2.0

func NewTextOutputPart(text string) OutputPart

OutputPart construction helpers for providers

type Provider ¶

type Provider interface {
	Name() string
}

type ProviderExecutor ¶ added in v0.2.0

type ProviderExecutor interface {
	Provider
	DoGenerate(ctx context.Context, req Request) (Response, error)
}

ProviderExecutor is the internal execution seam (implemented by provider packages). This is exported so provider packages can implement it, but it's not part of the public API contract - users should not implement this directly.

type ProviderInfo ¶ added in v0.2.0

type ProviderInfo struct {
	Name   string
	Route  string // provider-defined (e.g. "responses", "images")
	Models []ModelUse
}

type ProviderOption ¶

type ProviderOption interface {
	ApplyProviderOption() // marker method - must be exported for provider packages
}

type Request ¶ added in v0.2.0

type Request struct {
	Inputs          []Input
	Output          Output
	Model           string    // Optional: explicit model name (highest priority)
	Tier            ModelTier // Optional: tier-based selection (if Model not set)
	ProviderOptions []ProviderOption
	Metadata        map[string]string
}
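
The field comments state a precedence: an explicit Model wins, and Tier applies only when Model is empty. A minimal sketch of that rule follows. The `request` struct keeps only the two relevant fields, `selectModel` and `exampleResolve` are hypothetical, and falling back to a default name when neither field is set is an assumption rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// request keeps only the two selection fields from the struct above.
type request struct {
	Model string
	Tier  string
}

// resolveFunc stands in for a provider's tier lookup.
type resolveFunc func(tier string) (string, error)

// selectModel applies the precedence: explicit Model, then Tier,
// then the fallback name (the fallback step is an assumption).
func selectModel(req request, resolve resolveFunc, fallback string) (string, error) {
	switch {
	case req.Model != "":
		return req.Model, nil
	case req.Tier != "":
		return resolve(req.Tier)
	default:
		return fallback, nil
	}
}

// exampleResolve maps tiers to placeholder model names.
func exampleResolve(tier string) (string, error) {
	switch tier {
	case "best":
		return "example-best", nil
	case "fast":
		return "example-fast", nil
	}
	return "", errors.New("unknown tier: " + tier)
}

func main() {
	fmt.Println(selectModel(request{Model: "custom", Tier: "fast"}, exampleResolve, "example-default"))
	fmt.Println(selectModel(request{Tier: "fast"}, exampleResolve, "example-default"))
	fmt.Println(selectModel(request{}, exampleResolve, "example-default"))
}
```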

type Response ¶ added in v0.2.0

type Response struct {
	Outputs   []OutputPart
	Usage     Usage
	Provider  ProviderInfo
	RequestID string
	Warnings  []Warning
}

func (Response) DecodeJSON ¶ added in v0.2.0

func (r Response) DecodeJSON(dst any) error

func (Response) ImageOutputs ¶ added in v0.2.0

func (r Response) ImageOutputs() []ImageOutputInfo

ImageOutputs returns image output parts with MIME and name information.

func (Response) Images ¶ added in v0.2.0

func (r Response) Images() ([][]byte, bool)

func (Response) Text ¶ added in v0.2.0

func (r Response) Text() (string, bool)
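
Text and Images both return a second boolean value. The sketch below shows how accessors of this shape can be built over a mixed list of output parts, using local stand-in types. Returning the first text part and treating an empty result as `false` are assumptions about the semantics, not confirmed grail behavior.

```go
package main

import "fmt"

// outputPart is a sealed interface: only types in this package implement it.
type outputPart interface{ isOutputPart() }

type textPart struct {
	text string
}

type imagePart struct {
	data []byte
	mime string
}

func (textPart) isOutputPart()  {}
func (imagePart) isOutputPart() {}

type response struct {
	outputs []outputPart
}

// text returns the first text part; ok is false when there is none.
func (r response) text() (string, bool) {
	for _, p := range r.outputs {
		if t, ok := p.(textPart); ok {
			return t.text, true
		}
	}
	return "", false
}

// images collects every image part's bytes; ok is false when there are none.
func (r response) images() ([][]byte, bool) {
	var out [][]byte
	for _, p := range r.outputs {
		if img, ok := p.(imagePart); ok {
			out = append(out, img.data)
		}
	}
	return out, len(out) > 0
}

func main() {
	r := response{outputs: []outputPart{
		textPart{text: "hello"},
		imagePart{data: []byte{1, 2}, mime: "image/png"},
	}}
	if s, ok := r.text(); ok {
		fmt.Println(s)
	}
	if imgs, ok := r.images(); ok {
		fmt.Println(len(imgs))
	}
}
```

Checking the boolean before indexing matters in practice: an expression such as `imgs[0]` panics when the slice is empty.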

type Usage ¶ added in v0.2.0

type Usage struct {
	InputTokens  int
	OutputTokens int
	TotalTokens  int
}

type Warning ¶ added in v0.2.0

type Warning struct {
	Code    string
	Message string
}

Source Files ¶

grail.go

Directories ¶

Path	Synopsis
examples
gemini-image-options (command)	Gemini-image-options demonstrates Gemini-specific image generation options.
image-understanding (command)	Image-understanding demonstrates text generation from image inputs.
model-selection (command)	Model-selection demonstrates choosing between best and fast model tiers.
openai-image-options (command)	Openai-image-options demonstrates OpenAI-specific image generation options.
pdf-to-image (command)	Pdf-to-image demonstrates image generation from PDF documents.
pdf-understanding (command)	Pdf-understanding demonstrates text generation from PDF documents.
simple-text (command)	Simple-text demonstrates minimal text generation with default settings.
text-generation (command)	Text-generation demonstrates text generation with provider selection.
text-to-image (command)	Text-to-image demonstrates image generation from text prompts.
providers
gemini	Package gemini provides a Google Gemini implementation of the grail.Provider interface.
mock	Package mock provides a test double implementation of the grail.Provider interface.
openai	Package openai provides an OpenAI implementation of the grail.Provider interface.
