aigateway

Version: v1.1.0 (latest) | Published: May 24, 2026 | License: Apache-2.0 | Imports: 31 | Imported by: 0

Repository

github.com/ferro-labs/ai-gateway

README ¶

Ferro Labs AI Gateway

High-performance AI gateway in Go. Route LLM requests across 30 providers via a single OpenAI-compatible API.

🔀 30 providers, 2,500+ models — one API
⚡ 13,925 RPS at 1,000 concurrent users
📦 Single binary, zero dependencies, 32 MB base memory

Quick Start

Get from zero to first request in under 2 minutes.

Option A — Binary (fastest)

curl -fsSL https://github.com/ferro-labs/ai-gateway/releases/download/v1.0.6/ferrogw_1.0.6_linux_amd64.tar.gz | tar xz
chmod +x ferrogw
./ferrogw init          # generates config.yaml + MASTER_KEY
./ferrogw               # starts the server

Option B — Docker

docker pull ghcr.io/ferro-labs/ai-gateway:latest
docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-your-key \
  -e MASTER_KEY=fgw_your-master-key \
  ghcr.io/ferro-labs/ai-gateway:latest

Option C — Go

go install github.com/ferro-labs/ai-gateway/cmd/ferrogw@latest
ferrogw init            # first-run setup
ferrogw                 # start the server

First-time setup

ferrogw init generates a master key and writes a minimal config.yaml:

$ ferrogw init

  Master key (set as MASTER_KEY env var):
  fgw_a3f2e1d4c5b6a7f8e9d0c1b2a3f4e5d6

  Config written to: ./config.yaml

  Next steps:
    export MASTER_KEY=fgw_a3f2e1d4c5b6a7f8e9d0c1b2a3f4e5d6
    export OPENAI_API_KEY=sk-...
    ferrogw

The master key is shown only once, so store it in your .env file or secret manager. The gateway itself never writes it to disk.

Minimal config

Create config.yaml (or use ferrogw init):

strategy:
  mode: fallback

targets:
  - virtual_key: openai
    retry:
      attempts: 3
      on_status_codes: [429, 502, 503]
  - virtual_key: anthropic

aliases:
  fast: gpt-4o-mini
  smart: claude-3-5-sonnet-20241022

First request

export OPENAI_API_KEY=sk-your-key
export MASTER_KEY=fgw_your-master-key   # set by ferrogw init

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello from Ferro Labs AI Gateway"}]
  }' | jq
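
The same request can be issued from Go with only the standard library. This is a minimal sketch that mirrors the curl call above; the helper name newChatRequest and the struct names are illustrative, not part of the gateway's API, and it assumes the gateway is listening on localhost:8080 with MASTER_KEY exported.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// chatMessage and chatRequest mirror the OpenAI-style JSON body from the curl example.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// newChatRequest (hypothetical helper) builds the POST request for /v1/chat/completions.
func newChatRequest(baseURL, key, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+key)
	return req, nil
}

func main() {
	req, err := newChatRequest("http://localhost:8080", os.Getenv("MASTER_KEY"),
		"gpt-4o-mini", "Hello from Ferro Labs AI Gateway")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request failed (is the gateway running?):", err)
		return
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```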

Why Ferro Labs

Most AI gateways are Python proxies that crack under load or JavaScript services that eat memory. Ferro Labs AI Gateway is written in Go from the ground up for real-world throughput — a single binary that routes LLM requests with predictable latency and minimal resource usage.

Feature	Ferro Labs	LiteLLM	Bifrost	Kong AI
Language	Go	Python	Go	Go/Lua
Single binary	✅	❌	✅	❌
Providers	30	100+	20+	10+
MCP support	✅	❌	✅	❌
Response cache	✅	✅	✅	❌ (paid)
Guardrails	✅	✅	❌	❌ (paid)
OSS license	Apache 2.0	MIT	Apache 2.0	Apache 2.0
Managed cloud	Coming Soon	✅	✅	✅

Performance

Benchmarked against Kong OSS, Bifrost, LiteLLM, and Portkey on GCP n2-standard-8 (8 vCPU, 32 GB RAM) using a 60ms fixed-latency mock upstream — results reflect gateway overhead only.

Ferro Labs Latency Profile

VU	RPS	p50	p99	Memory
50	813	61.3ms	64.1ms	36 MB
150	2,447	61.2ms	63.4ms	47 MB
300	4,890	61.2ms	64.4ms	72 MB
500	8,014	61.5ms	72.9ms	89 MB
1,000	13,925	68.1ms	111.9ms	135 MB

At 1,000 VU: 13,925 RPS, p50 overhead 8.1ms (68.1ms measured minus the 60ms mock latency), memory 135 MB. No connection pool failures and no throughput ceiling were observed up to 1,000 VU.

Live Upstream Overhead (OpenAI API)

Measured against live OpenAI API (gpt-4o-mini) using two independent methods: the gateway's X-Gateway-Overhead-Ms response header (precise internal timing) and paired direct-vs-gateway requests (external black-box validation).

Configuration	Overhead p50	Overhead p99
No plugins (bare proxy)	0.002ms (2 microseconds)	0.03ms
With plugins (word-filter, max-token, logger, rate-limit)	0.025ms (25 microseconds)	0.074ms

The gateway adds 25 microseconds of processing overhead per request in a typical production configuration (p50, with plugins). LLM API calls take 500ms-2s, so the provider call is 20,000x to 80,000x longer than the gateway's own processing time.

How to Reproduce

git clone https://github.com/ferro-labs/ai-gateway-performance-benchmarks
cd ai-gateway-performance-benchmarks
make setup && make bench

Full methodology, raw results, and flamegraph analysis: ferro-labs/ai-gateway-performance-benchmarks

Features

🔀 Routing

8 routing strategies: single, fallback, load balance, least latency, cost-optimized, content-based, A/B test, conditional
Provider failover with configurable retry policies and status code filters
Per-request model aliases (fast → gpt-4o-mini, smart → claude-3-5-sonnet)

🔌 Providers (30)

OpenAI & Compatible	Anthropic & Google	Cloud & Enterprise	Open Source & Inference
OpenAI	Anthropic	AWS Bedrock	Ollama, Ollama Cloud
Azure OpenAI	Google Gemini	Azure Foundry	Hugging Face
OpenRouter	Vertex AI	Databricks	Replicate
DeepSeek		Cloudflare Workers AI	Together AI
Perplexity			Fireworks
xAI (Grok)			DeepInfra
Mistral			NVIDIA NIM
Groq			SambaNova
Cohere			Novita AI
AI21			Cerebras
Moonshot / Kimi			Qwen / DashScope

🛡️ Guardrails & Plugins

Word/phrase filtering — block sensitive terms before they reach providers
Token and message limits — enforce max_tokens and max_messages per request
Response caching — in-memory cache with configurable TTL and entry limits
Rate limiting — global RPS plus per-API-key and per-user RPM limits
Budget controls — per-API-key USD spend tracking with configurable token pricing
Request logging — structured logs with optional SQLite/PostgreSQL persistence
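
To make the word/phrase filter concrete, here is a minimal sketch of a blocked-word check with an optional case-sensitive mode. It is an illustration of the idea, not the plugin's implementation; the function name firstBlocked is hypothetical, and plain substring matching is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// firstBlocked returns the first blocked word found in text, or "" when none match.
// Matching is by substring; whether the real plugin matches whole words is not stated.
func firstBlocked(text string, blocked []string, caseSensitive bool) string {
	if !caseSensitive {
		text = strings.ToLower(text)
	}
	for _, w := range blocked {
		needle := w
		if !caseSensitive {
			needle = strings.ToLower(w)
		}
		if needle != "" && strings.Contains(text, needle) {
			return w
		}
	}
	return ""
}

func main() {
	blocked := []string{"password", "secret"}

	fmt.Printf("%q\n", firstBlocked("My PASSWORD is hunter2", blocked, false)) // "password"
	fmt.Printf("%q\n", firstBlocked("My PASSWORD is hunter2", blocked, true))  // ""
}
```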

⚡ Performance

Per-provider HTTP connection pools with optimized settings
sync.Pool for JSON marshaling buffers and streaming I/O
Zero-allocation stream detection, async hook dispatch batching
Single binary, ~32 MB base memory, linear scaling to 1,000+ VUs

🤖 MCP (Model Context Protocol)

Agentic tool-call loop — the gateway drives tool_calls automatically
Streamable HTTP transport (MCP 2025-11-25 spec)
Tool filtering with allowed_tools and bounded max_call_depth
Multiple MCP servers with cross-server tool deduplication

📊 Observability

OpenTelemetry tracing (v1.1.0+) — OTLP gRPC/HTTP exporter, W3C traceparent propagation, GenAI semantic conventions (gen_ai.*) plus ferro.* extensions for cost, routing, MCP, and stream timings; privacy_level enforced on error recording; configurable shutdown_grace
Prometheus metrics at /metrics
Deep health checks at /health with per-provider status
Structured JSON request logging with SQLite/PostgreSQL persistence (trace ID unified across logs, OTel spans, and X-Request-ID response header)
Admin API with usage stats, request logs, and config history/rollback
Built-in dashboard UI at /dashboard
HTTP-level connection tracing with DNS, TLS, and first-byte latency

Examples

Integration examples for common use cases are in ferro-labs/ai-gateway-examples:

Example	Description
basic	Single chat completion to the first configured provider
fallback	Fallback strategy — try providers in order with retries
loadbalance	Weighted load balancing across targets (70/30 split)
with-guardrails	Built-in word-filter and max-token guardrail plugins
with-mcp	Local MCP server with tool-calling integration
embedded	Embed the gateway as an HTTP handler inside an existing server

Configuration

Full annotated example — copy to config.yaml and customize:

# Routing strategy
strategy:
  mode: fallback  # single | fallback | loadbalance | conditional
                  # least-latency | cost-optimized | content-based | ab-test

# Provider targets (tried in order for fallback mode)
targets:
  - virtual_key: openai
    retry:
      attempts: 3
      on_status_codes: [429, 502, 503]
      initial_backoff_ms: 100
  - virtual_key: anthropic
    retry:
      attempts: 2
  - virtual_key: gemini

# Model aliases — resolved before routing
aliases:
  fast: gpt-4o-mini
  smart: claude-3-5-sonnet-20241022
  cheap: gemini-1.5-flash

# Plugins — executed in order at the configured stage
plugins:
  - name: word-filter
    type: guardrail
    stage: before_request
    enabled: true
    config:
      blocked_words: ["password", "secret"]
      case_sensitive: false

  - name: max-token
    type: guardrail
    stage: before_request
    enabled: true
    config:
      max_tokens: 4096
      max_messages: 50

  - name: rate-limit
    type: guardrail
    stage: before_request
    enabled: true
    config:
      requests_per_second: 100
      key_rpm: 60

  - name: request-logger
    type: logging
    stage: before_request
    enabled: true
    config:
      level: info
      persist: true
      backend: sqlite
      dsn: ferrogw-requests.db

# MCP tool servers (optional)
mcp_servers:
  - name: my-tools
    url: https://mcp.example.com/mcp
    headers:
      Authorization: Bearer ${MY_TOOLS_TOKEN}
    allowed_tools: [search, get_weather]
    max_call_depth: 5
    timeout_seconds: 30

See config.example.yaml and config.example.json for the full template with all options.

Observability

Ferro Labs AI Gateway ships first-class OpenTelemetry support in v1.1.0+. When OTel is disabled (the default) the gateway runs with a zero-allocation no-op provider — there is no cost to leaving it off. When you set an OTLP endpoint, every request emits a gateway.request root span with rich GenAI semantic conventions plus Ferro-specific extensions for cost, routing, and stream timings.

Enable in one step

Either set the standard OTel environment variable:

export OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317
ferrogw serve

…or add an observability block to config.yaml:

observability:
  tracing:
    enabled: true
    endpoint: localhost:4317   # or leave blank to read OTEL_EXPORTER_OTLP_ENDPOINT
    protocol: grpc             # grpc | http/protobuf
    service_name: ferrogw
    sample_ratio: 1.0
    privacy_level: metadata    # none | metadata | full  (see below)
    shutdown_grace: 10s        # max time to drain OTel exports on shutdown
    # headers:                        # OTLP export headers for authenticated backends
    #   dd-api-key: "${DATADOG_API_KEY}"  # values support ${ENV_VAR} interpolation

  # exporters wires plugin observability exporters (see "Plugin exporters" below).
  # exporters:
  #   - name: langsmith
  #     enabled: true
  #     config:
  #       api_key: "${LANGSMITH_API_KEY}"

Standard OTEL_* environment variables (e.g. OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_TRACES_SAMPLER) always take precedence over the config file — this matches the OTel SDK convention and is required for predictable container deployments.

observability.tracing.headers lets you send OTLP traces to authenticated managed backends (Datadog, New Relic, Honeycomb, Grafana Cloud) by setting vendor-specific headers such as API keys. Values support ${ENV_VAR} interpolation so secrets are never stored literally in the config file. The standard OTEL_EXPORTER_OTLP_HEADERS environment variable also applies per OTel convention.

The endpoint scheme selects transport security: an https:// endpoint uses TLS, while an http:// endpoint or a bare host:port (e.g. localhost:4317) connects in plaintext. Managed backends require the https:// form.
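
The scheme rule above is simple enough to express in a few lines. This sketch only restates the rule as described; the function name usesTLS is hypothetical and the gateway's actual parsing may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// usesTLS reports whether an OTLP endpoint string selects TLS under the rule above:
// https:// means TLS, while http:// or a bare host:port means plaintext.
func usesTLS(endpoint string) bool {
	return strings.HasPrefix(strings.ToLower(endpoint), "https://")
}

func main() {
	for _, ep := range []string{
		"https://otlp.example.com:4317", // example hostname, not a real backend
		"http://localhost:4317",
		"localhost:4317",
	} {
		fmt.Printf("%-32s tls=%v\n", ep, usesTLS(ep))
	}
}
```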

What gets emitted

The following attributes are currently emitted on the gateway.request root span. Attributes marked "Planned" are reserved but not yet wired.

gateway.request root span per request (SERVER kind) with gen_ai.system, gen_ai.operation.name, gen_ai.request.model, gen_ai.response.model, gen_ai.usage.{input,output}_tokens
HTTP {GET,POST} child span per outbound provider call (CLIENT kind, via otelhttp transport wrapping) — propagates traceparent to upstream providers
ferro.* emitted attributes: ferro.cost.{usd,input_usd,output_usd,cache_read_usd,cache_write_usd,reasoning_usd,model_found}, ferro.routing.{strategy,target_key}, ferro.stream.time_to_{first,last}_token_ms, ferro.gateway.trace_id, ferro.plugin.{name,kind,stage,outcome,reason}, ferro.mcp.{server,tool,latency_ms}
W3C TraceContext + Baggage propagation: inbound traceparent is honoured; outbound requests carry it forward
Unified trace ID: the OTel trace_id, the X-Request-ID response header, and the trace_id field on every log line are guaranteed equal per request for all requests served through the gateway's HTTP stack. (Embedders that bypass logging.Middleware receive a consistent-but-independent span trace ID.)
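
For clients that want to correlate their own logs with the gateway's, the trace ID is the second field of the W3C traceparent header (version-traceid-parentid-flags). The sketch below extracts it; the function name is hypothetical, and it is slightly more lenient than the spec in that it accepts uppercase hex.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// traceIDFromTraceparent extracts the 32-hex-character trace ID from a
// version 00 traceparent header value. It returns false for malformed input.
func traceIDFromTraceparent(h string) (string, bool) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 {
		return "", false
	}
	wantLen := []int{2, 32, 16, 2} // version, trace-id, parent-id, flags
	for i, p := range parts {
		if len(p) != wantLen[i] {
			return "", false
		}
		if _, err := hex.DecodeString(p); err != nil {
			return "", false
		}
	}
	if parts[1] == strings.Repeat("0", 32) {
		return "", false // an all-zero trace ID is invalid per the spec
	}
	return parts[1], true
}

func main() {
	id, ok := traceIDFromTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(id, ok) // 4bf92f3577b34da6a3ce929d0e0e4736 true
}
```

Per the guarantee above, this value should equal the X-Request-ID response header for requests served through the gateway's HTTP stack.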

Try it locally with Jaeger

docker run --rm -p 16686:16686 -p 4317:4317 jaegertracing/all-in-one
OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317 ferrogw serve
# fire a request, then open http://localhost:16686

Privacy levels

privacy_level controls how error messages are recorded on spans. No prompt or response content is exported at any level — that requires a future L3 exporter plugin.

Level	Error recording on spans	Default
`none`	Status and exception carry only the static string `"redacted"` — no content or internal type exposed	—
`metadata`	Error message is redacted (email / JWT / AWS keys replaced by tokens) before being attached	✅
`full`	Raw error text recorded without redaction — for trusted self-hosted debugging only	—

Invalid values are rejected at startup by config validation.
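
The three levels can be pictured as a small function over the error message. This is a sketch under stated assumptions, not the gateway's redactor: the replacement tokens ([EMAIL], [AWS_KEY]) and the regular expressions are invented for illustration, JWT redaction is omitted for brevity, and unknown levels fall back to metadata here whereas the gateway rejects them at startup.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Illustrative patterns only; the gateway's real patterns are not documented here.
	emailRe  = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)
	awsKeyRe = regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)
)

// spanErrorText returns the text that would be attached to a span for a given privacy level.
func spanErrorText(level, msg string) string {
	switch level {
	case "none":
		return "redacted"
	case "full":
		return msg
	default: // "metadata"
		msg = emailRe.ReplaceAllString(msg, "[EMAIL]")
		msg = awsKeyRe.ReplaceAllString(msg, "[AWS_KEY]")
		return msg
	}
}

func main() {
	msg := "auth failed for bob@example.com using AKIAIOSFODNN7EXAMPLE"
	for _, level := range []string{"none", "metadata", "full"} {
		fmt.Printf("%-8s %s\n", level, spanErrorText(level, msg))
	}
}
```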

Plugin exporters

The observability.exporters config block wires plugin exporters that receive gateway.request.completed and gateway.request.failed events on every request. Exporters operate independently of whether an OTLP tracing endpoint is configured.

No built-in exporter plugins ship in this repo. They are provided by the ai-gateway-plugins repository and self-register via observability.RegisterExporter in their init(). The observability.Exporter contract is stable as of v1.1.0. Unrecognised or failing exporters emit a warning and are skipped — the gateway still starts.

CLI

ferrogw is a single binary — no separate CLI tool required.

Command	Description
`ferrogw`	Start the gateway server (default)
`ferrogw serve`	Start the gateway server (explicit)
`ferrogw init`	First-run setup — generate master key and config
`ferrogw validate`	Validate a config file without starting
`ferrogw doctor`	Check environment (API keys, config, connectivity)
`ferrogw status`	Show gateway health and provider status
`ferrogw version`	Print version, commit, and build info
`ferrogw admin keys list`	List API keys
`ferrogw admin keys create <name>`	Create an API key
`ferrogw admin logs stats`	Show request log statistics
`ferrogw plugins`	List registered plugins

Global flags available on all subcommands: --gateway-url, --api-key, --format (table/json/yaml).

Deployment

Local development

export OPENAI_API_KEY=sk-your-key
export MASTER_KEY=fgw_your-master-key
export GATEWAY_CONFIG=./config.yaml
make build && ./bin/ferrogw

Railway (SQLite)

For a fast Railway deploy with persistent SQLite storage, attach a Railway Volume at /data and set:

MASTER_KEY=fgw_your-master-key
OPENAI_API_KEY=sk-your-key
PORT=8080
API_KEY_STORE_BACKEND=sqlite
API_KEY_STORE_DSN=/data/keys.db
CONFIG_STORE_BACKEND=sqlite
CONFIG_STORE_DSN=/data/config.db
REQUEST_LOG_STORE_BACKEND=sqlite
REQUEST_LOG_STORE_DSN=/data/logs.db
RAILWAY_RUN_UID=0

Render (PostgreSQL)

The repo includes a render.yaml Blueprint for a one-click Render deploy with a Docker web service and managed Postgres database. It generates MASTER_KEY, asks the user for OPENAI_API_KEY, and wires the three store DSNs to the database's internal connection string automatically.

Use the button at the top of this README, or deploy directly from:

https://render.com/deploy?repo=https://github.com/ferro-labs/ai-gateway

Docker Compose (dev & prod)

The repo ships three Compose files that follow the standard override pattern:

File	Purpose
`docker-compose.yml`	Base — shared image, port mapping, all provider env var stubs
`docker-compose.dev.yml`	Dev — builds from source, debug logging, live config mount, Ollama host access
`docker-compose.prod.yml`	Prod — pinned image tag, restart policy, health check, resource limits, log rotation

Dev (builds from source):

docker compose -f docker-compose.yml -f docker-compose.dev.yml up

Prod (pin to a release tag — never use latest in production):

IMAGE_TAG=v1.0.6 CORS_ORIGINS=https://your-domain.com \
  docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d

Provider API keys are commented out in docker-compose.yml. Uncomment and set the ones you need, or supply them via a .env file in the same directory.

Docker Compose (with PostgreSQL)

services:
  ferrogw:
    image: ghcr.io/ferro-labs/ai-gateway:latest
    ports:
      - "8080:8080"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - GATEWAY_CONFIG=/etc/ferrogw/config.yaml
      - CONFIG_STORE_BACKEND=postgres
      - CONFIG_STORE_DSN=postgresql://ferrogw:ferrogw@db:5432/ferrogw?sslmode=disable
      - API_KEY_STORE_BACKEND=postgres
      - API_KEY_STORE_DSN=postgresql://ferrogw:ferrogw@db:5432/ferrogw?sslmode=disable
      - REQUEST_LOG_STORE_BACKEND=postgres
      - REQUEST_LOG_STORE_DSN=postgresql://ferrogw:ferrogw@db:5432/ferrogw?sslmode=disable
    volumes:
      - ./config.yaml:/etc/ferrogw/config.yaml:ro
    depends_on:
      - db

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ferrogw
      POSTGRES_PASSWORD: ferrogw
      POSTGRES_DB: ferrogw
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Kubernetes via Helm

helm repo add ferro-labs https://ferro-labs.github.io/helm-charts
helm repo update
helm install ferro-gw ferro-labs/ai-gateway \
  --set env.OPENAI_API_KEY=sk-your-key

Helm charts: github.com/ferro-labs/helm-charts | ArtifactHub

Migrate to Ferro Labs AI Gateway

From LiteLLM

LiteLLM users can migrate in one step. Ferro Labs AI Gateway is OpenAI-compatible, so the litellm call is replaced by the standard OpenAI SDK pointed at the gateway's base URL:

Python (before — LiteLLM):

from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

Python (after — Ferro Labs AI Gateway):

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-ferro-api-key",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

Node.js (after — Ferro Labs AI Gateway):

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "your-ferro-api-key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});

Why migrate from LiteLLM:

14x higher throughput at 150 concurrent users (2,447 vs 175 RPS)
23x less memory at peak load (47 MB vs 1,124 MB under streaming)
Single binary — no Python environment, no pip, no virtualenv
Predictable latency — p99 stays under 65 ms at 150 VU vs LiteLLM's timeouts at the same concurrency

Config migration:

# LiteLLM config.yaml               # Ferro Labs config.yaml
model_list:                          strategy:
  - model_name: gpt-4o                mode: fallback
    litellm_params:
      model: gpt-4o                  targets:
      api_key: sk-...                  - virtual_key: openai
  - model_name: claude-3-5-sonnet     - virtual_key: anthropic
    litellm_params:
      model: claude-3-5-sonnet       aliases:
      api_key: sk-ant-...              fast: gpt-4o
                                       smart: claude-3-5-sonnet-20241022

Provider API keys are set via environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.) — not in the config file.

From Portkey

Portkey users: Ferro Labs AI Gateway uses the standard OpenAI SDK — no custom headers required in self-hosted mode.

Before (Portkey hosted):

from portkey_ai import Portkey

client = Portkey(api_key="portkey-key")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

After (Ferro Labs AI Gateway self-hosted):

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-ferro-api-key",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

Why migrate from Portkey:

Fully open source — no per-request pricing, no log limits
Self-hosted — your data never leaves your infrastructure
No vendor lock-in — Apache 2.0 license
MCP support — Portkey self-hosted lacks native MCP
FerroCloud (coming soon) for teams that want a managed service

From OpenAI SDK directly

No gateway yet? Add Ferro Labs AI Gateway in front of your existing code with a single base_url change. No other code changes required.

# Before — calling OpenAI directly
client = OpenAI(api_key="sk-...")

# After — routing through Ferro Labs AI Gateway
# Gains: failover, caching, rate limiting, cost tracking
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-ferro-api-key",
)

Ferro Labs AI Gateway handles provider failover automatically — if OpenAI is down, your requests fall through to Anthropic or Gemini with zero application code changes.

FerroCloud

FerroCloud — the managed version of Ferro Labs AI Gateway with multi-tenancy, analytics, and cost governance — is coming soon.

👉 Join the waitlist at ferrolabs.ai

SDKs

Official client libraries for the Ferro Labs AI Gateway:

SDK	Install	Repository
Python	`pip install ferrolabs`	ferro-labs/ferrolabs-python-sdk
TypeScript	`npm install ferrolabs`	ferro-labs/ferrolabs-typescript-sdk

Python

from ferrolabs import FerroClient

client = FerroClient(
    base_url="http://localhost:8080/v1",
    api_key="your-ferro-api-key",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

TypeScript

import { FerroClient } from "ferrolabs";

const client = new FerroClient({
  baseURL: "http://localhost:8080/v1",
  apiKey: "your-ferro-api-key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});

OpenAI SDK Compatible

You can also use the standard OpenAI SDK directly — just change the base URL:

Python:

from openai import OpenAI

client = OpenAI(
    api_key="sk-ferro-...",
    base_url="http://localhost:8080/v1",
)

TypeScript:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-ferro-...",
  baseURL: "http://localhost:8080/v1",
});

Contributing

We welcome contributions. New providers go in this OSS repo only — never in FerroCloud. See CONTRIBUTING.md for branch strategy, commit conventions, and PR guidelines.

Community

GitHub Discussions
Discord
Built with Ferro Labs AI Gateway? Open a PR to add to our showcase.

License

Apache 2.0 — see LICENSE.

Documentation ¶

Overview ¶

Package aigateway provides a high-performance, zero-dependency AI gateway for routing requests to large language model (LLM) providers.

The Gateway type is the main entry point: create one with New, register providers with RegisterProvider, load plugins from config with LoadPlugins, and route requests with Route or RouteStream.

Plugins and routing strategies (single, fallback, loadbalance, conditional, least-latency, cost-optimized, content-based, ab-test) are configured via Config, which can be loaded from a YAML or JSON file using LoadConfig.

Index ¶

Constants
func ValidateConfig(cfg Config) error
type ABVariantConfig
type CircuitBreakerConfig
type Condition
type Config
- func LoadConfig(path string) (*Config, error)
type ContentCondition
type EventHookFunc
type ExporterConfig
type Gateway
- func New(cfg Config) (*Gateway, error)
- func (g *Gateway) AddHook(fn EventHookFunc)
- func (g *Gateway) AllModels() []providers.ModelInfo
- func (g *Gateway) Catalog() models.Catalog
- func (g *Gateway) Close() error
- func (g *Gateway) Embed(ctx context.Context, req providers.EmbeddingRequest) (*providers.EmbeddingResponse, error)
- func (g *Gateway) FindByModel(model string) (providers.Provider, bool)
- func (g *Gateway) FindStreamingByModel(model string) (providers.StreamProvider, bool)
- func (g *Gateway) GenerateImage(ctx context.Context, req providers.ImageRequest) (*providers.ImageResponse, error)
- func (g *Gateway) Get(name string) (providers.Provider, bool)
- func (g *Gateway) GetConfig() Config
- func (g *Gateway) GetProvider(name string) (providers.Provider, bool)
- func (g *Gateway) List() []string
- func (g *Gateway) ListProviders() []string
- func (g *Gateway) LoadPlugins() error
- func (g *Gateway) MCPInitDone() <-chan struct{}
- func (g *Gateway) Observability() observability.Provider
- func (g *Gateway) RegisterPlugin(stage plugin.Stage, p plugin.Plugin) error
- func (g *Gateway) RegisterProvider(p providers.Provider)
- func (g *Gateway) ReloadConfig(cfg Config) error
- func (g *Gateway) Route(ctx context.Context, req providers.Request) (*providers.Response, error)
- func (g *Gateway) RouteStream(ctx context.Context, req providers.Request) (<-chan providers.StreamChunk, error)
- func (g *Gateway) SetObservability(p observability.Provider)
- func (g *Gateway) StartDiscovery(ctx context.Context, interval time.Duration) error
type ObservabilityConfig
type PluginConfig
type RetryConfig
type StrategyConfig
type StrategyMode
type Target
type TracingConfig

Constants ¶

const (
	SubjectRequestCompleted = "gateway.request.completed"
	SubjectRequestFailed    = "gateway.request.failed"
)

Event subject constants used when invoking gateway hooks.

Variables ¶

This section is empty.

Functions ¶

func ValidateConfig ¶

func ValidateConfig(cfg Config) error

ValidateConfig validates a Config for correctness.

Types ¶

type ABVariantConfig ¶ added in v0.8.5

type ABVariantConfig struct {
	// TargetKey is the virtual_key of the provider for this variant.
	TargetKey string `json:"target_key" yaml:"target_key"`
	// Weight is the relative traffic share for this variant.
	// All weights are summed; each variant's fraction is Weight/Total.
	// Zero is treated as 1 (equal distribution).
	Weight float64 `json:"weight" yaml:"weight"`
	// Label is a short human-readable identifier (e.g. "control", "challenger").
	// It is logged with every routed request for observability.
	Label string `json:"label" yaml:"label"`
}

ABVariantConfig defines a single traffic variant for the "ab-test" strategy.
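
The weight rule in the field comments (sum all weights, treat zero as 1, fraction is Weight/Total) can be worked through with a short sketch. The variant type below is a local stand-in for ABVariantConfig, and the fractions and pick helpers are hypothetical; how the gateway draws its random number is not documented here.

```go
package main

import "fmt"

// variant is a local stand-in for ABVariantConfig.
type variant struct {
	TargetKey string
	Weight    float64
	Label     string
}

// fractions returns each variant's traffic share, treating a zero weight as 1.
func fractions(vs []variant) []float64 {
	eff := make([]float64, len(vs))
	total := 0.0
	for i, v := range vs {
		w := v.Weight
		if w == 0 {
			w = 1
		}
		eff[i] = w
		total += w
	}
	out := make([]float64, len(vs))
	for i, w := range eff {
		out[i] = w / total
	}
	return out
}

// pick maps r in [0, 1) to a variant by walking cumulative shares. vs must be non-empty.
func pick(vs []variant, r float64) variant {
	cum := 0.0
	for i, f := range fractions(vs) {
		cum += f
		if r < cum {
			return vs[i]
		}
	}
	return vs[len(vs)-1] // guards against rounding when r is close to 1
}

func main() {
	vs := []variant{
		{TargetKey: "openai", Weight: 3, Label: "control"},
		{TargetKey: "anthropic", Weight: 1, Label: "challenger"},
	}
	fmt.Println(fractions(vs))      // [0.75 0.25]
	fmt.Println(pick(vs, 0.8).Label) // challenger
}
```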

type CircuitBreakerConfig ¶ added in v0.2.0

type CircuitBreakerConfig struct {
	// FailureThreshold is the number of consecutive failures before the circuit
	// opens. Defaults to 5.
	FailureThreshold int `json:"failure_threshold" yaml:"failure_threshold"`
	// SuccessThreshold is the number of consecutive successes in half-open state
	// required to close the circuit. Defaults to 1.
	SuccessThreshold int `json:"success_threshold" yaml:"success_threshold"`
	// Timeout is the duration the circuit stays open before transitioning to
	// half-open (e.g. "30s"). Defaults to "30s".
	Timeout string `json:"timeout" yaml:"timeout"`
}

CircuitBreakerConfig configures the per-provider circuit breaker.

type Condition ¶

type Condition struct {
	Key       string `json:"key" yaml:"key"`
	Value     string `json:"value" yaml:"value"`
	TargetKey string `json:"target_key" yaml:"target_key"`
}

Condition represents a condition for conditional routing.

type Config ¶

type Config struct {
	// Strategy defines how requests are routed (e.g., single, fallback, loadbalance).
	Strategy StrategyConfig `json:"strategy" yaml:"strategy"`
	// Targets is a list of provider targets to route requests to.
	Targets []Target `json:"targets" yaml:"targets"`
	// Plugins configuration (optional).
	Plugins []PluginConfig `json:"plugins,omitempty" yaml:"plugins,omitempty"`
	// Aliases maps friendly model names (e.g. "fast", "smart") to concrete model IDs.
	// Aliases are resolved before routing — they must not reference other aliases.
	Aliases map[string]string `json:"aliases,omitempty" yaml:"aliases,omitempty"`
	// MCPServers configures external MCP tool servers for agentic tool calling.
	// When set, the gateway injects discovered tools into every chat completion
	// request and executes an agentic loop when the LLM returns tool_calls.
	// FerroCloud populates this field from the tenant's mcp_servers table at
	// gateway.New() time — no separate MCPRegistry() public method is exposed.
	MCPServers []mcp.ServerConfig `json:"mcp_servers,omitempty" yaml:"mcp_servers,omitempty"`
	// MCPToolCallAuditFn, if non-nil, is called after every MCP tool invocation.
	// This field cannot be set via JSON or YAML — set it programmatically before
	// calling New. FerroCloud uses it to write async audit entries to the
	// mcp_tool_call_logs table.
	MCPToolCallAuditFn mcp.ToolCallAuditFn `json:"-" yaml:"-"`
	// Observability configures OpenTelemetry tracing. When omitted the
	// gateway runs with a NoOp provider (zero allocations on the hot
	// path). See internal/otel.
	Observability ObservabilityConfig `json:"observability,omitempty" yaml:"observability,omitempty"`
}

Config holds the configuration for the AI Gateway.
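
The alias rule in the Aliases comment (resolved before routing, no alias may reference another alias) can be sketched as a single-step lookup plus a validation pass. Both helper names are hypothetical, and whether ValidateConfig performs this exact check is not stated in this documentation.

```go
package main

import "fmt"

// resolveAlias performs one lookup step: an alias maps to its model ID,
// and any other name is returned unchanged.
func resolveAlias(aliases map[string]string, model string) string {
	if target, ok := aliases[model]; ok {
		return target
	}
	return model
}

// checkAliases rejects alias maps in which a target is itself an alias name.
func checkAliases(aliases map[string]string) error {
	for name, target := range aliases {
		if _, chained := aliases[target]; chained {
			return fmt.Errorf("alias %q points to another alias %q", name, target)
		}
	}
	return nil
}

func main() {
	aliases := map[string]string{
		"fast":  "gpt-4o-mini",
		"smart": "claude-3-5-sonnet-20241022",
	}
	fmt.Println(checkAliases(aliases))           // <nil>
	fmt.Println(resolveAlias(aliases, "fast"))   // gpt-4o-mini
	fmt.Println(resolveAlias(aliases, "gpt-4o")) // gpt-4o
}
```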

func LoadConfig ¶

func LoadConfig(path string) (*Config, error)

LoadConfig reads and parses a config file from the given path. Supported formats: JSON (.json), YAML (.yaml, .yml).

type ContentCondition ¶ added in v0.8.5

type ContentCondition struct {
	// Type is the matching rule type.
	Type string `json:"type" yaml:"type"`
	// Value is the substring or regex pattern to match against.
	Value string `json:"value" yaml:"value"`
	// TargetKey is the virtual_key of the provider to route to when this rule matches.
	TargetKey string `json:"target_key" yaml:"target_key"`
}

ContentCondition maps a prompt-content matching rule to a routing target. Used with the "content-based" strategy mode.

Supported types:

"prompt_contains" — case-insensitive substring match on user messages
"prompt_not_contains" — true when NO user message contains the value
"prompt_regex" — Go regular expression match on user messages

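The three rule types listed above can be sketched as one matching function over the user messages of a request. The contentCondition type is a local stand-in for ContentCondition and the helper names are hypothetical. The documentation states that prompt_contains is case-insensitive; applying the same case-insensitivity to prompt_not_contains is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// contentCondition is a local stand-in for ContentCondition.
type contentCondition struct {
	Type, Value, TargetKey string
}

// containsAny reports whether any message contains value, ignoring case.
func containsAny(messages []string, value string) bool {
	v := strings.ToLower(value)
	for _, m := range messages {
		if strings.Contains(strings.ToLower(m), v) {
			return true
		}
	}
	return false
}

// matches evaluates one condition against the user messages of a request.
func matches(c contentCondition, userMessages []string) (bool, error) {
	switch c.Type {
	case "prompt_contains":
		return containsAny(userMessages, c.Value), nil
	case "prompt_not_contains":
		return !containsAny(userMessages, c.Value), nil
	case "prompt_regex":
		re, err := regexp.Compile(c.Value)
		if err != nil {
			return false, err
		}
		for _, m := range userMessages {
			if re.MatchString(m) {
				return true, nil
			}
		}
		return false, nil
	default:
		return false, fmt.Errorf("unknown condition type %q", c.Type)
	}
}

func main() {
	msgs := []string{"Please help me optimise this SQL query"}
	c := contentCondition{Type: "prompt_contains", Value: "sql", TargetKey: "openai"}
	ok, err := matches(c, msgs)
	fmt.Println(ok, err) // true <nil>
}
```

A production version would compile each regular expression once at config load rather than per request.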
type EventHookFunc ¶ added in v0.2.0

type EventHookFunc func(ctx context.Context, subject string, data map[string]interface{})

EventHookFunc is called asynchronously after a gateway event (request completed or failed). It replaces the old EventPublisher interface with a simpler function-based hook pattern.

type ExporterConfig ¶ added in v1.1.0

type ExporterConfig struct {
	// Name is the canonical exporter name, e.g. "langsmith".
	// Must match the name passed to observability.RegisterExporter.
	Name string `json:"name" yaml:"name"`
	// Enabled gates the exporter. Set to false to temporarily disable
	// without removing the config block.
	Enabled bool `json:"enabled" yaml:"enabled"`
	// Config is the exporter-specific configuration map. Passed
	// verbatim to Exporter.Init at gateway startup.
	Config map[string]any `json:"config,omitempty" yaml:"config,omitempty"`
}

ExporterConfig configures a single observability plugin exporter. Plugin authors register their factory via observability.RegisterExporter in their package init(); gateway operators then reference the name here.

Example (YAML):

exporters:
  - name: langsmith
    enabled: true
    config:
      api_key: "${LANGSMITH_API_KEY}"

type Gateway ¶

type Gateway struct {
	// contains filtered or unexported fields
}

Gateway is the main entry point for routing LLM requests.

func New ¶

func New(cfg Config) (*Gateway, error)

New creates a new Gateway instance with the given configuration.

func (*Gateway) AddHook ¶ added in v0.2.0

func (g *Gateway) AddHook(fn EventHookFunc)

AddHook registers an EventHookFunc that is called asynchronously on each completed or failed request. Multiple hooks may be registered; all are invoked for every event on the shared bounded hook worker pool, so hook implementations should return promptly and avoid indefinite blocking.

func (*Gateway) AllModels ¶ added in v0.2.0

func (g *Gateway) AllModels() []providers.ModelInfo

AllModels returns ModelInfo from all registered providers. If auto-discovery has run for a provider, discovered models take precedence over the provider's static model list.

func (*Gateway) Catalog ¶ added in v0.4.5

func (g *Gateway) Catalog() models.Catalog

Catalog returns a shallow copy of the loaded model catalog. A copy is returned so callers cannot mutate the gateway's internal catalog.

func (*Gateway) Close ¶

func (g *Gateway) Close() error

Close cleans up resources.

func (*Gateway) Embed ¶ added in v0.3.0

func (g *Gateway) Embed(ctx context.Context, req providers.EmbeddingRequest) (*providers.EmbeddingResponse, error)

Embed routes an embedding request to the first registered EmbeddingProvider that supports the requested model.

func (*Gateway) FindByModel ¶ added in v0.2.0

func (g *Gateway) FindByModel(model string) (providers.Provider, bool)

FindByModel returns the first registered provider that supports the given model.

func (*Gateway) FindStreamingByModel ¶ added in v1.0.0

func (g *Gateway) FindStreamingByModel(model string) (providers.StreamProvider, bool)

FindStreamingByModel returns the first registered streaming-capable provider that supports the given model.

func (*Gateway) GenerateImage ¶ added in v0.3.0

func (g *Gateway) GenerateImage(ctx context.Context, req providers.ImageRequest) (*providers.ImageResponse, error)

GenerateImage routes an image generation request to the first registered ImageProvider that supports the requested model.

func (*Gateway) Get ¶ added in v0.2.0

func (g *Gateway) Get(name string) (providers.Provider, bool)

Get satisfies providers.ProviderSource (alias for GetProvider).

func (*Gateway) GetConfig ¶

func (g *Gateway) GetConfig() Config

GetConfig returns a copy of the current configuration.

func (*Gateway) GetProvider ¶ added in v0.2.0

func (g *Gateway) GetProvider(name string) (providers.Provider, bool)

GetProvider returns a registered provider by name.

func (*Gateway) List ¶ added in v0.2.0

func (g *Gateway) List() []string

List satisfies providers.ProviderSource (alias for ListProviders).

func (*Gateway) ListProviders ¶ added in v0.2.0

func (g *Gateway) ListProviders() []string

ListProviders returns the names of all registered providers.

func (*Gateway) LoadPlugins ¶

func (g *Gateway) LoadPlugins() error

LoadPlugins initializes and registers plugins from the gateway configuration.

func (*Gateway) MCPInitDone ¶ added in v0.8.0

func (g *Gateway) MCPInitDone() <-chan struct{}

MCPInitDone returns a channel that is closed once all MCP servers have completed their initialization handshake. The channel is pre-closed when no MCP servers are configured.

func (*Gateway) Observability ¶ added in v1.1.0

func (g *Gateway) Observability() observability.Provider

Observability returns the current observability.Provider. Always non-nil; defaults to NoOp.

func (*Gateway) RegisterPlugin ¶

func (g *Gateway) RegisterPlugin(stage plugin.Stage, p plugin.Plugin) error

RegisterPlugin registers a plugin at the given lifecycle stage.

func (*Gateway) RegisterProvider ¶

func (g *Gateway) RegisterProvider(p providers.Provider)

RegisterProvider registers a provider with the gateway.

func (*Gateway) ReloadConfig ¶

func (g *Gateway) ReloadConfig(cfg Config) error

ReloadConfig validates and applies a new configuration, forcing a strategy rebuild on the next request.

func (*Gateway) Route ¶

func (g *Gateway) Route(ctx context.Context, req providers.Request) (*providers.Response, error)

Route routes a request to the appropriate provider based on the configuration.

func (*Gateway) RouteStream ¶

func (g *Gateway) RouteStream(ctx context.Context, req providers.Request) (<-chan providers.StreamChunk, error)

RouteStream runs before-request plugins then returns a metered streaming response channel. Provider resolution follows the configured strategy mode, then falls back to any registered provider that supports the requested model and streaming. Prometheus metrics and event hooks are emitted when the returned channel drains (matching the behaviour of Route for non-streaming).

When MCP servers are configured, the request is routed through Route instead so that the full agentic tool-call loop can run. The final response is then wrapped into a single-chunk stream and returned to the caller (Phase 1 behaviour; true final-response streaming is Phase 1.5).
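
The single-chunk wrapping described above can be pictured as a buffered channel that carries one element and is then closed. The sketch below illustrates that pattern only; the chunk type and both function names are hypothetical stand-ins, since the fields of providers.StreamChunk are not shown on this page.

```go
package main

import (
	"fmt"
	"strings"
)

// chunk is a hypothetical stand-in for providers.StreamChunk.
type chunk struct {
	Content string
}

// singleChunkStream wraps an already complete response in a channel that
// yields exactly one chunk and is then closed.
func singleChunkStream(content string) <-chan chunk {
	ch := make(chan chunk, 1) // buffered so the send does not block
	ch <- chunk{Content: content}
	close(ch)
	return ch
}

// drain reads until the channel is closed, returning the concatenated
// content and the number of chunks received.
func drain(ch <-chan chunk) (string, int) {
	var sb strings.Builder
	n := 0
	for c := range ch {
		sb.WriteString(c.Content)
		n++
	}
	return sb.String(), n
}

func main() {
	text, n := drain(singleChunkStream("final answer"))
	fmt.Println(n, text)
}
```

A consumer written as a range loop over the channel works the same way whether it receives one chunk or many, which is why the caller does not need to know which path was taken.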

func (*Gateway) SetObservability ¶ added in v1.1.0

func (g *Gateway) SetObservability(p observability.Provider)

SetObservability installs an observability.Provider on the gateway. Pass observability.NoOp() to disable. The provider's StartRequestSpan is called at the top of Route and RouteStream; span attributes are populated incrementally as the request progresses through routing, provider execution, plugins, and final cost/usage calculation.

Safe to call only at startup, before serving traffic. The cmd/ferrogw wire-up constructs the provider via internal/otel.Init.

func (*Gateway) StartDiscovery ¶ added in v0.3.0

func (g *Gateway) StartDiscovery(ctx context.Context, interval time.Duration) error

StartDiscovery periodically refreshes model lists from providers that implement DiscoveryProvider. It runs in a background goroutine until ctx is cancelled. interval must be greater than zero; an error is returned otherwise.

type ObservabilityConfig ¶ added in v1.1.0

type ObservabilityConfig struct {
	// Tracing holds the OTLP tracing configuration. v1.1.0 ships
	// tracing only; metrics and logs exporters arrive in later
	// releases (see docs/OSS-ECOSYSTEM-ROADMAP.md).
	Tracing TracingConfig `json:"tracing,omitempty" yaml:"tracing,omitempty"`
	// Exporters lists the plugin observability exporters that should
	// receive gateway events (request completed / request failed).
	// Each entry names an exporter registered via
	// observability.RegisterExporter and carries its own Config block.
	// Exporters that are not registered at startup emit a warning and
	// are skipped — they do not prevent the gateway from starting.
	Exporters []ExporterConfig `json:"exporters,omitempty" yaml:"exporters,omitempty"`
}

ObservabilityConfig is the user-facing observability section of gateway config. It mirrors internal/otel.Config but lives here so the public Config schema does not pull in internal packages.

Standard OTEL_* environment variables (notably OTEL_EXPORTER_OTLP_ENDPOINT) always take precedence — this matches the OTel SDK convention required for predictable container deployments.

type PluginConfig ¶

type PluginConfig struct {
	Name    string                 `json:"name" yaml:"name"`
	Type    string                 `json:"type" yaml:"type"`
	Stage   string                 `json:"stage" yaml:"stage"`
	Enabled bool                   `json:"enabled" yaml:"enabled"`
	Config  map[string]interface{} `json:"config" yaml:"config"`
}

PluginConfig holds plugin configuration.

type RetryConfig ¶

type RetryConfig struct {
	// Attempts is the maximum number of attempts per target (1 = no retries).
	Attempts int `json:"attempts" yaml:"attempts"`
	// OnStatusCodes, when non-empty, limits retries to the listed HTTP status
	// codes. A retry is skipped when the provider returns a code not in the
	// list, and the strategy moves on to the next target immediately.
	// Leave empty to retry on any error (default behaviour).
	// Example: [429, 502, 503]
	OnStatusCodes []int `json:"on_status_codes,omitempty" yaml:"on_status_codes,omitempty"`
	// InitialBackoffMs is the base backoff in milliseconds for the exponential
	// back-off formula: delay = InitialBackoffMs * 2^(attempt-1).
	// Defaults to 100 ms when unset or zero.
	InitialBackoffMs int `json:"initial_backoff_ms,omitempty" yaml:"initial_backoff_ms,omitempty"`
}

RetryConfig defines retry behavior for the fallback strategy.

type StrategyConfig ¶

type StrategyConfig struct {
	Mode       StrategyMode `json:"mode" yaml:"mode"`
	Conditions []Condition  `json:"conditions,omitempty" yaml:"conditions,omitempty"` // For conditional routing
	// ContentConditions defines rules for the content-based routing strategy.
	// Rules are evaluated in order; the first match wins.
	ContentConditions []ContentCondition `json:"content_conditions,omitempty" yaml:"content_conditions,omitempty"`
	// ABVariants defines the weighted variants for the ab-test strategy.
	ABVariants []ABVariantConfig `json:"ab_variants,omitempty" yaml:"ab_variants,omitempty"`
}

StrategyConfig defines the routing strategy.

type StrategyMode ¶

type StrategyMode string

StrategyMode represents the routing strategy mode.

const (
	ModeSingle        StrategyMode = "single"
	ModeFallback      StrategyMode = "fallback"
	ModeLoadBalance   StrategyMode = "loadbalance"
	ModeConditional   StrategyMode = "conditional"
	ModeLatency       StrategyMode = "least-latency"
	ModeCostOptimized StrategyMode = "cost-optimized"
	ModeContentBased  StrategyMode = "content-based"
	ModeABTest        StrategyMode = "ab-test"
)

StrategyMode constants define the supported routing strategies.

type Target ¶

type Target struct {
	// VirtualKey is the unique identifier for the provider (or a virtual key in the vault).
	VirtualKey string `json:"virtual_key" yaml:"virtual_key"`
	// Weight is used for load balancing.
	Weight float64 `json:"weight,omitempty" yaml:"weight,omitempty"`
	// Retry configuration for this target.
	Retry *RetryConfig `json:"retry,omitempty" yaml:"retry,omitempty"`
	// CircuitBreaker configuration for this target (optional).
	CircuitBreaker *CircuitBreakerConfig `json:"circuit_breaker,omitempty" yaml:"circuit_breaker,omitempty"`
}

Target represents a specific provider target.

type TracingConfig ¶ added in v1.1.0

type TracingConfig struct {
	// Enabled is the master switch. Defaults to true; the pipeline
	// still short-circuits to NoOp when no OTLP endpoint is configured.
	Enabled bool `json:"enabled" yaml:"enabled"`
	// Endpoint overrides OTEL_EXPORTER_OTLP_ENDPOINT (host:port form).
	Endpoint string `json:"endpoint,omitempty" yaml:"endpoint,omitempty"`
	// Protocol selects the OTLP transport: "grpc" (default) or "http/protobuf".
	Protocol string `json:"protocol,omitempty" yaml:"protocol,omitempty"`
	// ServiceName populates the OTel service.name resource attribute.
	ServiceName string `json:"service_name,omitempty" yaml:"service_name,omitempty"`
	// SampleRatio is the head sampler ratio (0.0–1.0). Pointer so an
	// explicit 0.0 (sample nothing) is distinguishable from an omitted
	// field; nil falls back to the default of 1.0 (sample everything).
	SampleRatio *float64 `json:"sample_ratio,omitempty" yaml:"sample_ratio,omitempty"`
	// PrivacyLevel controls whether prompt/response content is exported.
	// One of: "none", "metadata" (default), "full".
	PrivacyLevel string `json:"privacy_level,omitempty" yaml:"privacy_level,omitempty"`
	// ShutdownGrace is the maximum time the gateway waits for in-flight
	// OTel exports to drain during graceful shutdown. Accepts any Go
	// duration string, e.g. "10s", "500ms". Defaults to 10s when empty
	// or unparseable.
	ShutdownGrace string `json:"shutdown_grace,omitempty" yaml:"shutdown_grace,omitempty"`
	// Headers are additional HTTP/gRPC metadata headers sent with every OTLP
	// export request. Use this to authenticate with managed backends such as
	// Datadog, New Relic, Honeycomb, or Grafana Cloud.
	//
	// SECURITY: prefer ${ENV_VAR} references for secret values — only the
	// template (e.g. "${DATADOG_API_KEY}") is persisted in config and returned
	// by the admin config API; the secret is resolved from the environment at
	// export time and never stored. A literal value IS persisted verbatim and
	// exposed via /admin/config, so do not hard-code raw secrets here. The
	// standard OTEL_EXPORTER_OTLP_HEADERS environment variable also applies per
	// OTel convention.
	Headers map[string]string `json:"headers,omitempty" yaml:"headers,omitempty"`
}

TracingConfig configures the OTLP tracing pipeline. All fields are optional; sensible defaults apply when omitted (see internal/otel.DefaultConfig).

Directories ¶

Path	Synopsis
cmd
ferrogw command	Package main is the entry point for the ferrogw gateway server and CLI.
internal
admin	Package admin provides HTTP handlers for the gateway administration API.
apierror	Package apierror provides OpenAI-compatible JSON error response helpers.
bootstrap	Package bootstrap provides env-driven factory functions for persistence backends.
cache	Package cache provides the CacheEntry and Cache interface used by the response-cache plugin.
circuitbreaker	Package circuitbreaker implements the circuit-breaker pattern for provider calls.
cli	Package cli provides shared types and helpers for the ferrogw CLI commands.
dashboard	Package dashboard provides template rendering and asset helpers for the gateway web dashboard.
discovery	Package discovery provides shared helpers for providers that support live model enumeration via OpenAI-compatible GET /v1/models (or similar) endpoints.
events	Package events defines compact internal hook event payloads for the gateway hot path and converts them to the public map form only at dispatch time.
handler	Package handler provides HTTP handler functions for the OpenAI-compatible API.
httpclient	Package httpclient provides the shared process-wide HTTP client used by providers so connection pooling is reused consistently under load.
httpserver	Package httpserver provides HTTP server construction helpers for the gateway.
latency	Package latency provides a thread-safe rolling-window latency tracker used by the least-latency routing strategy to pick the fastest provider.
logging	Package logging provides structured JSON logging with trace ID propagation.
mcp	Package mcp implements the Model Context Protocol (MCP) 2025-11-25 Streamable HTTP transport for the Ferro Labs AI Gateway.
metrics	Package metrics registers the Prometheus metrics used by the gateway.
middleware	Package middleware provides HTTP middleware for the gateway server.
otel	Package otel wires the gateway core to OpenTelemetry.
plugins/budget	Package budget provides a gateway plugin that enforces per-API-key USD spend limits using in-memory accumulation.
plugins/cache	Package cache provides a response-cache plugin that stores LLM responses in memory and serves them on exact-match cache hits, reducing provider cost and latency for repeated requests.
plugins/logger	Package logger provides a request-logger plugin that records each LLM request and response to standard output.
plugins/maxtoken	Package maxtoken provides a max-token guardrail plugin that caps the max_tokens and message count on outgoing requests.
plugins/ratelimit	Package ratelimit provides a gateway plugin that enforces per-request rate limits using an in-memory token bucket.
plugins/wordfilter	Package wordfilter provides a word-filter guardrail plugin that rejects requests containing blocked words.
proxy	Package proxy provides a transparent pass-through HTTP reverse proxy that forwards unhandled /v1/* requests to the matching upstream provider.
ratelimit	Package ratelimit provides a simple in-memory token-bucket rate limiter.
redact	Package redact strips sensitive substrings from text before it is emitted to logs or observability backends.
requestlog	Package requestlog provides persistent storage primitives for request/response logs.
sse	Package sse provides Server-Sent Events streaming for OpenAI-compatible responses.
strategies	Package strategies implements the routing strategies used by the gateway.
streamwrap	Package streamwrap provides a metering wrapper for streaming LLM responses.
testutil	Package testutil provides shared test helpers.
transport	Package transport owns all HTTP transports used for upstream provider calls.
version	Package version holds build-time version information for Ferro Labs AI Gateway binaries.
mcp	Package mcp exposes the public configuration types for Ferro Labs AI Gateway's MCP (Model Context Protocol) integration.
models	Package models provides the model catalog — a structured map of every supported model's pricing, capabilities, and lifecycle metadata.
observability	Package observability is the public, semver-stable surface for the Ferro Labs AI Gateway observability subsystem.
plugin	Package plugin defines the Plugin interface and the lifecycle stages used to hook into the gateway request pipeline.
providers	Package providers re-exports all contracts and types from providers/core as type aliases so that existing code importing this package continues to compile without any changes.
ai21	Package ai21 provides a client for the AI21 Labs API (Jamba and Jurassic models).
anthropic	Package anthropic provides a client for the Anthropic API (Claude models).
azure_foundry	Package azurefoundry provides a client for the Azure AI Foundry API.
azure_openai	Package azureopenai provides a client for the Azure OpenAI API.
bedrock	Package bedrock provides a client for AWS Bedrock.
cerebras	Package cerebras provides a client for the Cerebras inference API.
cloudflare	Package cloudflare provides a client for Cloudflare Workers AI.
cohere	Package cohere provides a client for the Cohere API.
core	Package core defines the stable public contracts for the providers layer: interfaces, shared data types, and supporting helpers.
databricks	Package databricks provides a client for the Databricks model serving API.
deepinfra	Package deepinfra provides a client for the DeepInfra OpenAI-compatible API.
deepseek	Package deepseek provides a client for the DeepSeek API.
fireworks	Package fireworks provides a client for the Fireworks AI API.
gemini	Package gemini provides a client for the Google Gemini API.
groq	Package groq provides a client for the Groq API.
hugging_face	Package huggingface provides a client for the Hugging Face Inference API.
mistral	Package mistral provides a client for the Mistral AI API.
moonshot	Package moonshot provides a client for the Moonshot AI OpenAI-compatible API.
novita	Package novita provides a client for the Novita OpenAI-compatible API.
nvidia_nim	Package nvidianim provides a client for the NVIDIA NIM OpenAI-compatible API.
ollama	Package ollama provides a client for the Ollama local LLM server.
ollama_cloud	Package ollamacloud provides a client for the Ollama Cloud API.
openai	Package openai provides a client for the OpenAI API using the official Go SDK.
openrouter	Package openrouter provides a client for the OpenRouter API.
perplexity	Package perplexity provides a client for the Perplexity AI API.
qwen	Package qwen provides a client for the Alibaba Cloud DashScope OpenAI-compatible API.
replicate	Package replicate provides a client for the Replicate API.
sambanova	Package sambanova provides a client for the SambaNova OpenAI-compatible API.
together	Package together provides a client for the Together AI API.
vertex_ai	Package vertexai provides a client for Google Vertex AI.
xai	Package xai provides a client for the xAI (Grok) API.
scripts
catalog-check command	catalog-check reads every "source" URL from models/catalog.json and performs a HEAD request against each one.
web	Package web contains embedded web UI template assets.