gateway

v1.1.3 (latest). Published: May 13, 2026. License: MIT. Imports: 86. Imported by: 0

Repository

github.com/hanzoai/gateway

README ¶

Hanzo Gateway

High-performance API gateway for Hanzo AI services. Routes 147+ API endpoints across production clusters with rate limiting, authentication forwarding, CORS, circuit breakers, and telemetry -- all driven by declarative JSON configuration.

Overview

Hanzo Gateway is the unified API entry point for all Hanzo platform traffic. It sits behind Hanzo Ingress (L7 reverse proxy) and routes requests to internal services with per-endpoint rate limiting, header forwarding, and circuit breaker protection.

Cluster	Domain	Endpoints	Rate Limit (global)	Rate Limit (per IP)
hanzo-k8s	`api.hanzo.ai`	133	5,000 req/s	100 req/s

The gateway can also be deployed independently by other organizations with their own configuration.

For full documentation, see docs.hanzo.ai/docs/services/gateway.

Architecture

                    Internet
                       |
              +--------+--------+
              |  Cloudflare CDN |
              +--------+--------+
                       |
              +--------+---------+
              | Hanzo Ingress    |
              | (L7 TLS/routing) |
              +--------+---------+
                       |
              +--------+---------+
              | Hanzo Gateway    |
              | 133 endpoints    |
              +---+----+----+----+
                  |    |    |
               Cloud  IAM  Commerce
               API         API

API Endpoints

OpenAI-Compatible LLM Routes (`api.hanzo.ai`)

These endpoints are fully compatible with the OpenAI API format. Point any OpenAI SDK client at https://api.hanzo.ai and it works out of the box.

Method	Path	Backend	Description
`POST`	`/v1/chat/completions`	cloud-api:8000	Chat completions (streaming and non-streaming)
`POST`	`/v1/completions`	cloud-api:8000	Text completions
`POST`	`/v1/messages`	cloud-api:8000	Anthropic Messages API compatibility
`GET`	`/v1/models`	cloud-api:8000	List available models
`POST`	`/v1/embeddings`	cloud-api:8000	Text embedding generation
`POST`	`/v1/images/generations`	cloud-api:8000	Image generation
`POST`	`/v1/audio/transcriptions`	cloud-api:8000	Audio transcription (Whisper)
`POST`	`/v1/audio/speech`	cloud-api:8000	Text-to-speech synthesis
`POST`	`/v1/zap`	cloud-api:8000	Hanzo Zap (structured extraction)
`POST`	`/v1/async-invoke`	cloud-api:8000	Async inference (long-running jobs)
`GET`	`/v1/async-invoke/{id}/status`	cloud-api:8000	Poll async job status
`GET`	`/v1/async-invoke/{id}`	cloud-api:8000	Retrieve async job result

Platform Service Routes (`api.hanzo.ai`)

All platform routes are available at both /{service}/* and /v1/{service}/*.

Path prefix	Backend	Description
`/auth/*`	iam:8000	IAM, authentication, OAuth
`/cloud/*`	cloud-api:8000	Cloud API (projects, deployments)
`/commerce/*`	commerce:8001	Commerce (orders, payments, products)
`/analytics/*`	analytics	Unified analytics and events
`/billing/*`	billing	Usage metering and invoicing
`/console/*`	console	Admin console API
`/agents/*`	agents	Agent orchestration
`/search/*`	search	AI-powered search
`/vector/*`	vector	Vector database operations
`/operative/*`	operative	Computer-use automation
`/bot/*`	bot	Bot framework (REST + WebSocket)
`/kms/*`	kms	Key management service
`/platform/*`	platform	PaaS deployment API
`/functions/*`	functions	Serverless functions
`/web3/*`	web3	Web3 and blockchain APIs
`/pricing/*`	pricing	Model pricing and rate cards
`/pricing/model/{name}`	pricing	Single model price lookup

Monitoring Endpoints

Path	Description
`/__health`	Gateway health check (port 8080)
`/health`	Application health check
`/pubsub/healthz`	PubSub health
`/pubsub/varz`	PubSub variables / metrics
`/pubsub/connz`	PubSub connections
`/pubsub/subsz`	PubSub subscriptions
`/pubsub/jsz`	PubSub JetStream

Model Routing

Hanzo Gateway proxies all LLM requests through the Hanzo Cloud API (cloud-api), which handles model routing, load balancing, and provider selection. The gateway itself is provider-agnostic -- it forwards authenticated requests and streams responses back to the client.

How It Works

Client                Gateway              Cloud API            Provider
  |                      |                     |                    |
  |-- POST /v1/chat ---->|                     |                    |
  |   model: "zen4"      |-- forward --------->|                    |
  |                      |                     |-- route to tier -->|
  |                      |                     |   (Fireworks)      |
  |<---- streaming ------|----- streaming -----|<--- streaming -----|

1. The client sends a request to api.hanzo.ai/v1/chat/completions with a model field.
2. The gateway forwards the request (with all auth headers) to the Cloud API backend.
3. The Cloud API resolves the model name to a provider and endpoint based on the model's tier and availability.
4. Responses stream back through the gateway to the client with no buffering.

Model Tiers

Tier	Models (examples)	Provider	Notes
Free	`zen3-nano`, `zen4-mini`	Hanzo DO cluster	Best-effort, rate-limited
Standard	`zen4-pro`, `zen3-vl`, `zen4-coder-flash`	Fireworks, Together	Low latency, high availability
Premium	`zen4`, `zen4-max`, `zen4-ultra`	Fireworks	Dedicated capacity, highest throughput
Third-party	`gpt-4o`, `claude-sonnet-4-20250514`, `gemini-2.5-pro`	OpenAI, Anthropic, Google	Pass-through with unified billing

The gateway does not need to know about model tiers -- it passes all requests to the Cloud API, which handles routing logic, fallback, and retries. Model availability is returned by GET /v1/models.

Authentication

All requests to api.hanzo.ai require a valid API key. Keys are issued through the Hanzo Console and scoped to a project.

API Key Authentication

Pass your API key in the Authorization header using the Bearer scheme:

curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer hk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen4-pro",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
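
The same call can be made from Go with only the standard library. This is a minimal sketch, not an official client: the helper name `newChatRequest` and the `HANZO_API_KEY` environment variable are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// chatMessage and chatRequest mirror the JSON body of the curl example.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// newChatRequest (hypothetical helper) builds the POST request and sets
// the Bearer Authorization header.
func newChatRequest(baseURL, apiKey, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// HANZO_API_KEY is an assumed environment variable name.
	req, err := newChatRequest("https://api.hanzo.ai", os.Getenv("HANZO_API_KEY"), "zen4-pro", "Hello")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("send request:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```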

Auth Flow

Client --> Gateway --> Cloud API --> IAM (hanzo.id)
                                        |
                                    Validate key
                                    Resolve org/project
                                    Check rate limits
                                    Return user context

The gateway validates bearer JWTs against IAM (hanzo.id) using JWKS and re-emits the three canonical identity headers. Opaque API keys (hk-, sk-, fw_, hz_, pk-) pass through to the backend services for validation.

Header Forwarding

After JWT validation, the gateway emits exactly three canonical identity headers to downstream services and strips every other vendor/legacy variant on ingress:

X-User-Id -- user ID from JWT sub claim
X-Org-Id -- org slug from JWT owner claim
X-Roles -- comma-joined role names from JWT roles claim

Auxiliary headers emitted by the gateway (derivatives of the JWT):

X-User-Email -- email from JWT email claim
X-Phone-Number -- phone from JWT phone_number/phone claim
X-User-IsAdmin -- "true" when the JWT asserts isAdmin

Standard passthrough headers:

Authorization -- Bearer token or API key
Content-Type -- Request body encoding
Accept -- Response format preference
X-Request-ID -- Client-provided request tracing ID

Headers stripped unconditionally on ingress (never trusted from clients): X-User-Id, X-Org-Id, X-Roles, X-User-Email, X-Phone-Number, X-User-IsAdmin, X-User-Role, X-User-Roles, X-User-Name, X-Tenant-Id, X-Org, and every X-IAM-* / X-HANZO-* variant.
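
The stripping rule can be sketched in a few lines of Go. This is an illustration of the idea, not the gateway's own code: the function name `stripClientIdentity` is hypothetical, and the case-insensitive prefix match for the `X-IAM-*` / `X-HANZO-*` families is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// strippedExact lists the exact header names named in the text above.
var strippedExact = []string{
	"X-User-Id", "X-Org-Id", "X-Roles", "X-User-Email", "X-Phone-Number",
	"X-User-IsAdmin", "X-User-Role", "X-User-Roles", "X-User-Name",
	"X-Tenant-Id", "X-Org",
}

// stripClientIdentity (hypothetical name) removes every identity header a
// client may have supplied, so only values derived from a validated JWT
// can reach downstream services.
func stripClientIdentity(h http.Header) {
	for _, name := range strippedExact {
		h.Del(name) // Del canonicalizes the name before deleting
	}
	for name := range h {
		upper := strings.ToUpper(name)
		if strings.HasPrefix(upper, "X-IAM-") || strings.HasPrefix(upper, "X-HANZO-") {
			delete(h, name) // deleting during range is safe in Go
		}
	}
}

func main() {
	h := http.Header{}
	h.Set("X-User-Id", "spoofed")
	h.Set("X-Hanzo-Org", "spoofed")
	h.Set("Authorization", "Bearer token")
	stripClientIdentity(h)
	fmt.Println(len(h), h.Get("Authorization"))
}
```

Running the strip step before JWT validation, and setting the canonical headers only afterwards, keeps the trust boundary in one place.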

Rate Limiting

The gateway enforces rate limits at two levels: global (across all clients) and per-client (by IP address).

Global Configuration

{
  "extra_config": {
    "qos/ratelimit/router": {
      "max_rate": 5000,
      "client_max_rate": 100,
      "strategy": "ip"
    }
  }
}

Parameter	Description	Default
`max_rate`	Total requests/second across all clients	5,000
`client_max_rate`	Requests/second per client IP	100
`strategy`	Client identification method	`ip`

Per-Endpoint Overrides

Individual endpoints can override the global limits. This is useful for high-traffic inference routes or sensitive administrative endpoints:

{
  "endpoint": "/v1/chat/completions",
  "method": "POST",
  "extra_config": {
    "qos/ratelimit/router": {
      "max_rate": 10000,
      "client_max_rate": 50,
      "strategy": "ip",
      "every": "1s"
    }
  }
}

The every field sets the time window for the rate counter. Default is "1s" (per second). Set to "1m" for per-minute limits.
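
To make the three parameters concrete, here is a minimal fixed-window limiter in Go. It is a sketch under stated assumptions: the type `windowLimiter` is hypothetical, and the gateway's actual limiter may use a different algorithm (for example a token bucket), so treat this only as a model of how `max_rate`, `client_max_rate`, and `every` interact.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// windowLimiter is a hypothetical fixed-window limiter for illustration.
type windowLimiter struct {
	mu          sync.Mutex
	maxRate     int           // plays the role of max_rate
	clientMax   int           // plays the role of client_max_rate
	every       time.Duration // plays the role of every
	windowStart time.Time
	total       int
	perClient   map[string]int
}

func newWindowLimiter(maxRate, clientMax int, every time.Duration) *windowLimiter {
	return &windowLimiter{
		maxRate:   maxRate,
		clientMax: clientMax,
		every:     every,
		perClient: map[string]int{},
	}
}

// allow reports whether a request from clientIP fits in the current window.
func (l *windowLimiter) allow(clientIP string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	if now.Sub(l.windowStart) >= l.every {
		// A new window begins: reset the global and per-client counters.
		l.windowStart = now
		l.total = 0
		l.perClient = map[string]int{}
	}
	if l.total >= l.maxRate || l.perClient[clientIP] >= l.clientMax {
		return false // the caller would answer 429 Too Many Requests
	}
	l.total++
	l.perClient[clientIP]++
	return true
}

func main() {
	l := newWindowLimiter(5000, 100, time.Second)
	fmt.Println(l.allow("203.0.113.7"))
}
```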

Rate Limit Responses

When a client exceeds their rate limit, the gateway returns:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{"message": "rate limit exceeded"}

Observability

Logging

Structured logging is enabled by default with the [GATEWAY] prefix:

{
  "extra_config": {
    "telemetry/logging": {
      "level": "INFO",
      "prefix": "[GATEWAY]",
      "syslog": false,
      "stdout": true
    }
  }
}

Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL.

Health Check

# Gateway health (always returns 200 when the process is up)
curl http://localhost:8080/__health

# Application health (checks backend connectivity)
curl https://api.hanzo.ai/health

Metrics

The gateway exposes Prometheus-compatible metrics for scraping. Key metrics include:

Request count by endpoint and status code
Response latency histograms
Backend connection pool utilization
Circuit breaker state transitions
Rate limiter rejection counts

Circuit Breakers

Backend failures are automatically isolated. When a backend exceeds the error threshold, the circuit opens and requests are rejected immediately until the backend recovers. This prevents cascade failures across services.
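
The open/closed behaviour described above can be sketched as a small consecutive-failure breaker. This is an illustrative model only: the `breaker` type, its threshold, and its cooldown are hypothetical and do not reflect the gateway's actual configuration keys or implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var (
	errCircuitOpen = errors.New("circuit open")
	errBackend     = errors.New("backend failure") // stand-in backend error
)

// breaker is a hypothetical consecutive-failure circuit breaker.
type breaker struct {
	mu        sync.Mutex
	failures  int
	threshold int
	cooldown  time.Duration
	openedAt  time.Time
}

func newBreaker(threshold int, cooldown time.Duration) *breaker {
	return &breaker{threshold: threshold, cooldown: cooldown}
}

// call runs fn unless the circuit is open. After the cooldown, one trial
// call is let through; success closes the circuit, failure reopens it.
func (b *breaker) call(fn func() error) error {
	b.mu.Lock()
	if b.failures >= b.threshold && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return errCircuitOpen // reject immediately, backend is not touched
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.threshold {
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0
	return nil
}

func main() {
	cb := newBreaker(2, 5*time.Second)
	for i := 0; i < 3; i++ {
		fmt.Println(cb.call(func() error { return errBackend }))
	}
}
```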

Quick Start

Build from Source

# Build gateway binary
make build

# Build ingress sidecar binary
make build-ingress

# Run tests
make test

# Validate all configs
make validate

Run Locally

# Run with default config
./gateway run -c configs/hanzo/gateway.json

Docker

# Pull and run the latest image
docker run -p 8080:8080 ghcr.io/hanzoai/gateway:latest

# Build from source
make docker

Docker Compose

services:
  gateway:
    image: ghcr.io/hanzoai/gateway:latest
    ports:
      - "8080:8080"
    volumes:
      - ./configs/hanzo/gateway.json:/etc/gateway/gateway.json:ro
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:8080/__health"]
      interval: 10s
      timeout: 3s
      retries: 3
    restart: unless-stopped

Save as compose.yml and run:

docker compose up -d

Production Deployment

Hanzo Gateway runs on the hanzo-k8s DOKS cluster (do-sfo3-hanzo-k8s) in the hanzo namespace. Continuous deployment is handled by GitHub Actions -- every push to main builds a new image, applies the ConfigMap, and performs a rolling restart.

Deploy

# Apply config and restart pods
make deploy-hanzo

# Check status
make status

# Tail logs
make logs-hanzo

Infrastructure Details

Property	Value
Image	`ghcr.io/hanzoai/gateway:latest`
Replicas	2
Service type	ClusterIP (behind Ingress)
Namespace	`hanzo`
K8s context	`do-sfo3-hanzo-k8s`
Health check	`GET /__health` :8080
CI/CD	GitHub Actions (deploy.yml)

K8s Manifests

k8s/
  hanzo/
    deployment.yaml     # Gateway deployment (2 replicas)
    service.yaml        # ClusterIP service
    ingress.yaml        # Ingress resource for api.hanzo.ai

Configuration

All routing is defined in JSON configuration files. Each cluster has its own config.

Editing Routes

1. Edit the config file:
```
$EDITOR configs/hanzo/gateway.json
```
2. Validate the config:
```
make validate
```
3. Deploy:
```
make deploy-hanzo
```

The Makefile creates a ConfigMap from the JSON file and triggers a rolling restart.

Config Structure

{
  "version": 3,
  "name": "Hanzo API Gateway",
  "port": 8080,
  "timeout": "120s",
  "extra_config": {
    "router": {
      "return_error_msg": true
    },
    "qos/ratelimit/router": {
      "max_rate": 5000,
      "client_max_rate": 100,
      "strategy": "ip"
    },
    "telemetry/logging": {
      "level": "INFO",
      "prefix": "[GATEWAY]",
      "stdout": true
    }
  },
  "endpoints": [
    {
      "endpoint": "/v1/chat/completions",
      "method": "POST",
      "input_headers": ["*"],
      "output_encoding": "no-op",
      "backend": [{
        "url_pattern": "/api/chat/completions",
        "host": ["http://cloud-api.hanzo.svc.cluster.local:8000"],
        "encoding": "no-op"
      }]
    }
  ]
}
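
For tooling that needs to inspect a config (for example, listing routes in CI), the structure above decodes cleanly into Go structs. The sketch below is not the gateway's loader; the type names and the "every endpoint needs a backend" check are assumptions chosen for illustration, and only the fields shown in the example are modelled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// backend, endpoint and gatewayConfig are hypothetical types covering
// only the fields that appear in the example config.
type backend struct {
	URLPattern string   `json:"url_pattern"`
	Host       []string `json:"host"`
	Encoding   string   `json:"encoding"`
}

type endpoint struct {
	Endpoint       string    `json:"endpoint"`
	Method         string    `json:"method"`
	InputHeaders   []string  `json:"input_headers"`
	OutputEncoding string    `json:"output_encoding"`
	Backend        []backend `json:"backend"`
}

type gatewayConfig struct {
	Version   int        `json:"version"`
	Name      string     `json:"name"`
	Port      int        `json:"port"`
	Timeout   string     `json:"timeout"`
	Endpoints []endpoint `json:"endpoints"`
}

// parseConfig decodes the JSON and applies one simple sanity check.
func parseConfig(data []byte) (*gatewayConfig, error) {
	var cfg gatewayConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("decode config: %w", err)
	}
	for _, ep := range cfg.Endpoints {
		if len(ep.Backend) == 0 {
			return nil, fmt.Errorf("endpoint %s %s has no backend", ep.Method, ep.Endpoint)
		}
	}
	return &cfg, nil
}

func main() {
	sample := []byte(`{"version": 3, "port": 8080, "endpoints": [
	  {"endpoint": "/v1/chat/completions", "method": "POST",
	   "backend": [{"url_pattern": "/api/chat/completions",
	                "host": ["http://cloud-api.hanzo.svc.cluster.local:8000"]}]}]}`)
	cfg, err := parseConfig(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, ep := range cfg.Endpoints {
		fmt.Println(ep.Method, ep.Endpoint, "->", ep.Backend[0].Host[0]+ep.Backend[0].URLPattern)
	}
}
```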

Repository Structure

configs/
  hanzo/
    gateway.json        # Hanzo API Gateway config (133 endpoints)
    ingress.json        # Hanzo Ingress sidecar config
k8s/
  hanzo/                # K8s manifests for hanzo-k8s cluster
cmd/
  gateway/              # Gateway binary entry point
  ingress/              # Ingress sidecar binary entry point
tests/                  # Integration tests
Dockerfile              # Multi-stage build (Go 1.25 + Alpine 3.23)
Makefile                # Build, test, validate, deploy commands

DNS

Domain	Path	Target
`*.hanzo.ai`	Cloudflare	hanzo-k8s LB -> Ingress -> Gateway

Hanzo Gateway is one of four products in the Hanzo AI infrastructure stack:

Product	Role	Repository
Hanzo Ingress	L7 reverse proxy, TLS termination, load balancing	`hanzoai/ingress`
Hanzo Gateway	API gateway, rate limiting, endpoint routing	`hanzoai/gateway`
Hanzo Engine	GPU inference engine, model serving	`hanzoai/engine`
Hanzo Edge	On-device inference runtime (mobile, web, embedded)	`hanzoai/edge`

Internet -> Ingress (TLS/L7) -> Gateway (API routing) -> Engine (inference) / Cloud API / Services
                                                          Edge (on-device, client-side)

License

MIT -- see LICENSE.

Documentation ¶

Overview ¶

Package gateway — base_ha backend upstream.

Implements the `base_ha` upstream kind for Hanzo Base HA clusters (hanzoai/base-ha). One writer, N replicas. The gateway is the leader tracker — clients never see a 307.

Per-backend `extra_config`:

"github.com/hanzoai/gateway/base_ha": {
  "service_dns":           "foo-hs.hanzo.svc.cluster.local",
  "port":                  8090,
  "leader_poll_interval":  "1s",
  "write_methods":         ["POST","PUT","PATCH","DELETE"],
  "read_your_writes_ttl":  "5s"
}

The enclosing backend's single Host is the ClusterIP service (round-robin for reads). The base_ha factory overrides the transport:

write methods (or X-Base-Writer: required header) → writer pod
reads → round-robin via the ClusterIP service
for 5s after a write, same client (X-Forwarded-For + X-Org-Id) pins to the writer for read-your-writes consistency

Leader discovery: a single goroutine per service_dns polls GET http://{service_dns}:{port}/_ha/leader every leader_poll_interval. The response is stored in an atomic.Pointer so the hot path is lock-free. On writer 5xx/connect-refused, the poller is force-refreshed and one retry is issued. Two consecutive 5xx = hard fail (no retry storm on OOM).
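
The lock-free hot path can be pictured with the sketch below. It is not the package's implementation: `leaderTracker`, `setLeader`, and `target` are hypothetical names, read-your-writes pinning and the retry logic are omitted, and falling back to the service host when no leader is known yet is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// leaderTracker is a hypothetical model of the leader cache.
type leaderTracker struct {
	leader       atomic.Pointer[string] // writer address, replaced atomically
	writeMethods map[string]bool
}

func newLeaderTracker(writeMethods ...string) *leaderTracker {
	lt := &leaderTracker{writeMethods: map[string]bool{}}
	for _, m := range writeMethods {
		lt.writeMethods[m] = true
	}
	return lt
}

// setLeader is what a poller goroutine would call after each successful
// GET /_ha/leader; readers never take a lock.
func (lt *leaderTracker) setLeader(addr string) {
	lt.leader.Store(&addr)
}

// target picks the upstream for one request: the writer pod for write
// methods or an explicit X-Base-Writer: required header, otherwise the
// ClusterIP service host (round-robin reads).
func (lt *leaderTracker) target(method string, h http.Header, serviceHost string) string {
	needsWriter := lt.writeMethods[method] || h.Get("X-Base-Writer") == "required"
	if needsWriter {
		if l := lt.leader.Load(); l != nil {
			return *l
		}
	}
	return serviceHost // assumption: fall back when no leader is known
}

func main() {
	lt := newLeaderTracker("POST", "PUT", "PATCH", "DELETE")
	lt.setLeader("foo-0.foo-hs.hanzo.svc.cluster.local:8090")
	fmt.Println(lt.target("POST", http.Header{}, "foo.hanzo.svc.cluster.local:8090"))
	fmt.Println(lt.target("GET", http.Header{}, "foo.hanzo.svc.cluster.local:8090"))
}
```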

Package gateway — base-network backend upstream.

Implements `base-network://<service>` routing for base/network-enabled services (ATS, BD, TA, IAM, KMS, AML). See ~/work/hanzo/base/docs/NETWORK.md.

Per-backend `extra_config`:

"github.com/hanzoai/gateway/base-network": {
  "shard_key":        "user_id",
  "shard_key_source": "jwt.sub"    // | jwt.owner | header:X-Shard | cookie:sid
}

The enclosing backend's `host` names the headless Service whose members are polled at GET /-/base/members every 5 s. Writes route to a single rendezvous-hash owner pod; reads spread across the shard's member subset. A 307 from a pod we picked means the target isn't caught up — follow once.
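
Rendezvous (highest-random-weight) hashing, which the text names for write routing, can be sketched as follows. The FNV-1a hash, the function names, and the tie-break rule are assumptions for illustration; the package may hash differently.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// score hashes a (member, shard key) pair. FNV-1a is an arbitrary choice
// for this sketch.
func score(member, key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(member))
	h.Write([]byte{0}) // separator so "ab"+"c" differs from "a"+"bc"
	h.Write([]byte(key))
	return h.Sum64()
}

// owner returns the member with the highest score for key. Every gateway
// replica that sees the same member list picks the same pod, and removing
// a non-owner pod never moves the key.
func owner(members []string, key string) string {
	var best string
	var bestScore uint64
	for i, m := range members {
		s := score(m, key)
		if i == 0 || s > bestScore || (s == bestScore && m < best) {
			best, bestScore = m, s
		}
	}
	return best
}

func main() {
	members := []string{"pod-0", "pod-1", "pod-2"}
	fmt.Println(owner(members, "user-42"))
}
```

The stability property is why this scheme suits a member list that is re-polled every few seconds: a pod joining or leaving only moves the keys that pod owns.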

Package gateway rebrands every user-visible Lura/KrakenD identifier so that nothing a client or backend sees contains the string "krakend" or "KrakenD".

Lura (the upstream SDK we build on) emits several hard-coded brand strings:

response header "X-KRAKEND"
response header "X-Krakend-Completed"
outbound backend User-Agent "KrakenD Version x.y"
core.KrakendHeaderValue "Version x.y"

The response header name "X-KRAKEND" is a const in lura/core and cannot be reassigned at runtime; it is stripped and replaced by the BrandingMiddleware below. Everything else is a package-level var and is reassigned in init().

Index ¶

Constants
Variables
func AddCheck(f func(context.Context, *cobra.Command, string, string) bool)
func BaseHABackendFactory(logger logging.Logger, next proxy.BackendFactory) proxy.BackendFactory
func BaseNetworkBackendFactory(logger logging.Logger, next proxy.BackendFactory) proxy.BackendFactory
func BrandingMiddleware() gin.HandlerFunc
func InitZapListenerFromEnv()
func LoadPlugins(folder, pattern string, logger logging.Logger)
func LoadPluginsWithContext(ctx context.Context, folder, pattern string, logger logging.Logger)
func LoadRoutes(cfg *RoutesConfig) error
func LoadRoutesFromFile(path string) error
func NewAuthMiddleware(cfg AuthConfig) gin.HandlerFunc
func NewBackendFactory(logger logging.Logger, metricCollector *metrics.Metrics) proxy.BackendFactory
func NewBackendFactoryWithContext(ctx context.Context, logger logging.Logger, metricCollector *metrics.Metrics) proxy.BackendFactory
func NewEngine(cfg config.ServiceConfig, opt luragin.EngineOptions) *gin.Engine
func NewExecutor(ctx context.Context) cmd.Executor
func NewHandlerFactory(logger logging.Logger, metricCollector *metrics.Metrics, ...) router.HandlerFactory
func NewProxyFactory(logger logging.Logger, backendFactory proxy.BackendFactory, ...) proxy.Factory
func NewTestPluginCmd() cmd.Command
func NewTestPluginCmdWithArgs(flags ...cmd.FlagBuilder) cmd.Command
func NewWidgetSecurityMiddleware(cfg WidgetSecurityConfig) gin.HandlerFunc
func RegisterEncoders()
func RegisterSubscriberFactories(_ context.Context, _ config.ServiceConfig, _ logging.Logger) func(n string, p int)
func StartZapListener(cfg ZapListenerConfig) error
func StopZapListener()
func ZapBackendFactory(logger logging.Logger, next proxy.BackendFactory) proxy.BackendFactory
type AgentStarter
type AuthConfig
- func DefaultAuthConfig() AuthConfig
type BackendFactory
type BaseHAConfig
type BaseNetworkConfig
type BloomFilterJWT
- func (BloomFilterJWT) NewTokenRejecter(ctx context.Context, cfg config.ServiceConfig, l logging.Logger, ...) (jose.ChainedRejecterFactory, error)
type DefaultRunServerFactory
- func (*DefaultRunServerFactory) NewRunServer(l logging.Logger, next router.RunServerFunc) RunServer
type EngineFactory
type ExecutorBuilder
- func (e *ExecutorBuilder) NewCmdExecutor(ctx context.Context) cmd.Executor
type HandlerFactory
type LoggerBuilder
- func (LoggerBuilder) NewLogger(cfg config.ServiceConfig) (logging.Logger, io.Writer, error)
type LoggerFactory
type MetricsAndTraces
- func (m *MetricsAndTraces) Close()
- func (m *MetricsAndTraces) Register(ctx context.Context, cfg config.ServiceConfig, l logging.Logger) *metrics.Metrics
type MetricsAndTracesRegister
type PluginLoader
type PluginLoaderWithContext
type ProxyFactory
type RouteEntry
type RoutesConfig
type RunServer
type RunServerFactory
type SubscriberFactoriesRegister
type TokenRejecterFactory
type WidgetSecurityConfig
- func DefaultWidgetSecurityConfig() WidgetSecurityConfig
type ZapConfig
type ZapListenerConfig

Constants ¶

const (
	// ZapNamespace is the config key for ZAP backend configuration.
	ZapNamespace = "github.com/hanzoai/gateway/zap"

	// ZAP message types for gateway RPC
	MsgTypeHTTPRequest  uint16 = 200
	MsgTypeHTTPResponse uint16 = 201
)

const BaseHANamespace = "github.com/hanzoai/gateway/base_ha"

BaseHANamespace is the extra_config key for base_ha upstreams.

const BaseNetworkNamespace = "github.com/hanzoai/gateway/base-network"

const CompletedHeader = "X-Gateway-Completed"

CompletedHeader is the user-visible response header that replaces "X-Krakend-Completed". It keeps the same "true" / "false" semantic so any client depending on the completion flag only needs to change the header key.

const PoweredByHeader = "X-Powered-By"

PoweredByHeader is the user-visible response header that replaces "X-KRAKEND".

Variables ¶

var BrandName = "hanzoai/gateway"

BrandName is the canonical rebrand string emitted on responses and outbound backend calls. Overridable at build time via:

-ldflags "-X github.com/hanzoai/gateway.BrandName=..."

Functions ¶

func AddCheck ¶

func AddCheck(f func(context.Context, *cobra.Command, string, string) bool)

func BaseHABackendFactory ¶ added in v1.0.0

func BaseHABackendFactory(logger logging.Logger, next proxy.BackendFactory) proxy.BackendFactory

BaseHABackendFactory wraps the next BackendFactory with base_ha routing. Backends without BaseHANamespace in their extra_config fall through.

func BaseNetworkBackendFactory ¶ added in v1.0.0

func BaseNetworkBackendFactory(logger logging.Logger, next proxy.BackendFactory) proxy.BackendFactory

BaseNetworkBackendFactory wraps `next` and intercepts backends carrying a BaseNetworkNamespace extra_config block. All others fall through.

func BrandingMiddleware ¶ added in v1.0.0

func BrandingMiddleware() gin.HandlerFunc

BrandingMiddleware strips any residual lura-emitted branding headers that leak through because their names are compile-time consts in lura/core, and replaces them with canonical Hanzo equivalents. Must run before the lura endpoint handler so the wrapped writer is in place by the time lura calls c.Header().
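
The wrapped-writer idea can be shown with plain net/http. This is a sketch of the technique, not the gin middleware itself: `brandingWriter`, `withBranding`, and `brandedHeaders` are hypothetical names, and using the brand string as the X-Powered-By value is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

const brand = "hanzoai/gateway" // assumed value for X-Powered-By

// brandingWriter rewrites headers just before they are flushed.
type brandingWriter struct {
	http.ResponseWriter
	wroteHeader bool
}

// rebrand swaps the upstream header for the rebranded one.
func (w *brandingWriter) rebrand() {
	h := w.Header()
	if h.Get("X-KRAKEND") != "" {
		h.Del("X-KRAKEND")
		h.Set("X-Powered-By", brand)
	}
}

func (w *brandingWriter) WriteHeader(code int) {
	if !w.wroteHeader {
		w.wroteHeader = true
		w.rebrand()
	}
	w.ResponseWriter.WriteHeader(code)
}

func (w *brandingWriter) Write(p []byte) (int, error) {
	if !w.wroteHeader {
		w.WriteHeader(http.StatusOK)
	}
	return w.ResponseWriter.Write(p)
}

// withBranding installs the wrapper before the inner handler runs, so the
// wrapper is already in place when the handler sets its headers.
func withBranding(next http.Handler) http.Handler {
	return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		bw := &brandingWriter{ResponseWriter: rw}
		next.ServeHTTP(bw, r)
		if !bw.wroteHeader {
			bw.rebrand()
		}
	})
}

// brandedHeaders runs a stand-in upstream handler through the wrapper and
// returns the response headers a client would see.
func brandedHeaders() http.Header {
	upstream := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-KRAKEND", "Version x.y")
		w.Write([]byte("ok"))
	})
	rec := httptest.NewRecorder()
	withBranding(upstream).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Result().Header
}

func main() {
	fmt.Println(brandedHeaders())
}
```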

func InitZapListenerFromEnv ¶

func InitZapListenerFromEnv()

InitZapListenerFromEnv initializes the ZAP listener from environment variables. Set ZAP_LISTENER_ENABLED=true to enable.

func LoadPlugins ¶

func LoadPlugins(folder, pattern string, logger logging.Logger)

LoadPlugins loads and registers the plugins so they can be used if enabled in the configuration.

func LoadPluginsWithContext ¶

func LoadPluginsWithContext(ctx context.Context, folder, pattern string, logger logging.Logger)

func LoadRoutes ¶

func LoadRoutes(cfg *RoutesConfig) error

LoadRoutes loads routing config from YAML. Called at startup and on hot-reload.

func LoadRoutesFromFile ¶

func LoadRoutesFromFile(path string) error

LoadRoutesFromFile loads routes from a YAML file.

func NewAuthMiddleware ¶

func NewAuthMiddleware(cfg AuthConfig) gin.HandlerFunc

NewAuthMiddleware creates a gin middleware that validates IAM JWT tokens, checks billing status, and injects identity headers for downstream services.

Canonical identity headers (the only ones downstream services should rely on):

X-User-Id: user ID from JWT "sub" (fallback: preferred_username, name)
X-Org-Id: org slug from JWT "owner" claim
X-Roles: comma-joined role names from JWT "roles" claim
X-User-Permissions: base-10 int64 bit-field derived from JWT permissions isAdmin (commerce treats absent/0 as no rights).

Auxiliary headers (derivatives of the JWT for convenience):

X-User-Email: email from JWT "email" claim
X-Phone-Number: phone from JWT "phone_number" or "phone" claim
X-User-IsAdmin: "true" if the JWT asserts isAdmin

Trust boundary: all of the above are stripped on ingress (see stripIdentityHeaders) and only re-set after the JWT is validated. A client-supplied X-User-Permissions can NEVER reach a downstream service — Red P0-1 (2026-04-27).

Billing:

Checks commerce service for positive balance
Fail-open: if billing service is unreachable, request proceeds
If balance <= 0: returns 402 Payment Required

Public endpoints (configurable allowlist) bypass all auth checks.
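
The public-path bypass and the fail-open billing rule combine into a small decision function. The sketch below models only that logic: `decide`, `isPublic`, and the int64 balance are hypothetical, and JWT validation is left out entirely.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// errUnreachable stands in for any failure to reach the billing service.
var errUnreachable = errors.New("billing service unreachable")

// isPublic reports whether path starts with one of the allowlisted prefixes.
func isPublic(path string, publicPaths []string) bool {
	for _, p := range publicPaths {
		if strings.HasPrefix(path, p) {
			return true
		}
	}
	return false
}

// decide (hypothetical) returns the status the billing step would produce:
// 200 to let the request continue, 402 when the balance is not positive.
func decide(path string, publicPaths []string, balance int64, lookupErr error) int {
	if isPublic(path, publicPaths) {
		return http.StatusOK // public endpoints bypass all checks
	}
	if lookupErr != nil {
		return http.StatusOK // fail-open: billing outage must not block traffic
	}
	if balance <= 0 {
		return http.StatusPaymentRequired
	}
	return http.StatusOK
}

func main() {
	public := []string{"/health", "/v1/models"}
	fmt.Println(decide("/v1/chat/completions", public, 0, nil))
	fmt.Println(decide("/v1/chat/completions", public, 0, errUnreachable))
	fmt.Println(decide("/v1/models", public, 0, nil))
}
```

Fail-open trades strict enforcement for availability: a billing outage lets some unpaid requests through rather than rejecting every request.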

func NewBackendFactory ¶

func NewBackendFactory(logger logging.Logger, metricCollector *metrics.Metrics) proxy.BackendFactory

NewBackendFactory creates a BackendFactory by stacking all the available middlewares:

oauth2 client credentials
http cache
martian
pubsub
amqp
cel
lua
rate-limit
circuit breaker
metrics collector
opencensus collector

func NewBackendFactoryWithContext ¶

func NewBackendFactoryWithContext(ctx context.Context, logger logging.Logger, metricCollector *metrics.Metrics) proxy.BackendFactory

NewBackendFactoryWithContext creates a BackendFactory by stacking all the available middlewares and injecting the received context

func NewEngine ¶

func NewEngine(cfg config.ServiceConfig, opt luragin.EngineOptions) *gin.Engine

NewEngine creates a new gin engine with middlewares and routing.

func NewExecutor ¶

func NewExecutor(ctx context.Context) cmd.Executor

NewExecutor returns an executor for the cmd package. The executor initializes the entire gateway by registering the components and composing a RouterFactory wrapping all the middlewares.

func NewHandlerFactory ¶

func NewHandlerFactory(logger logging.Logger, metricCollector *metrics.Metrics, rejecter jose.RejecterFactory) router.HandlerFactory

NewHandlerFactory returns a HandlerFactory with a rate-limit and a metrics collector middleware injected

func NewProxyFactory ¶

func NewProxyFactory(logger logging.Logger, backendFactory proxy.BackendFactory, metricCollector *metrics.Metrics) proxy.Factory

NewProxyFactory returns a new ProxyFactory wrapping the injected BackendFactory with the default proxy stack and a metrics collector

func NewTestPluginCmd ¶

func NewTestPluginCmd() cmd.Command

func NewTestPluginCmdWithArgs ¶

func NewTestPluginCmdWithArgs(flags ...cmd.FlagBuilder) cmd.Command

func NewWidgetSecurityMiddleware ¶

func NewWidgetSecurityMiddleware(cfg WidgetSecurityConfig) gin.HandlerFunc

NewWidgetSecurityMiddleware creates a gin middleware that enforces:

Per-IP rate limiting for widget keys (hz_*): prevents any single IP from abusing the free widget endpoint.
Global rate limiting across all widget requests: prevents distributed attacks from exhausting model API budget.
Origin validation: widget keys are only accepted from approved Hanzo domains (Origin or Referer header). Non-browser clients (curl, scripts) that omit Origin headers are rejected for widget keys.

This middleware MUST run after NewAuthMiddleware. It only acts on requests with hz_ bearer tokens; all other requests pass through untouched.
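
The origin check for widget keys can be sketched as below. This illustrates the rule rather than reproducing the middleware: `isWidgetKey` and `originAllowed` are hypothetical helpers, and accepting subdomains of an approved host is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// isWidgetKey reports whether the Authorization value carries an hz_ key.
func isWidgetKey(authorization string) bool {
	return strings.HasPrefix(authorization, "Bearer hz_")
}

// originAllowed checks the Origin header, falling back to Referer, against
// a list of approved hosts. Requests with neither header are rejected,
// which is what excludes curl and scripts.
func originAllowed(origin, referer string, allowedHosts []string) bool {
	raw := origin
	if raw == "" {
		raw = referer
	}
	if raw == "" {
		return false
	}
	u, err := url.Parse(raw)
	if err != nil || u.Hostname() == "" {
		return false
	}
	host := strings.ToLower(u.Hostname())
	for _, a := range allowedHosts {
		// Exact host, or a subdomain of it (assumption for this sketch).
		if host == a || strings.HasSuffix(host, "."+a) {
			return true
		}
	}
	return false
}

func main() {
	allowed := []string{"hanzo.ai"}
	fmt.Println(isWidgetKey("Bearer hz_abc"), originAllowed("https://app.hanzo.ai", "", allowed))
	fmt.Println(originAllowed("https://evilhanzo.ai", "", allowed))
}
```

Matching on the parsed hostname with a leading dot, rather than a raw string suffix, is what keeps look-alike domains such as evilhanzo.ai out.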

func RegisterEncoders ¶

func RegisterEncoders()

RegisterEncoders registers all the available encoders

func RegisterSubscriberFactories ¶

func RegisterSubscriberFactories(_ context.Context, _ config.ServiceConfig, _ logging.Logger) func(n string, p int)

RegisterSubscriberFactories registers all the available sd adaptors

func StartZapListener ¶

func StartZapListener(cfg ZapListenerConfig) error

StartZapListener starts a TLS 1.3+PQ listener on the given port. External clients (e.g. dev CLI) connect here with TLS-wrapped ZAP binary. Each accepted TLS connection is transparently proxied to the internal ZAP node (started by the ZapBackendFactory pool on the internal port), which handles the ZAP handshake, message dispatch, and forwarding to cloud-api.

func StopZapListener ¶

func StopZapListener()

StopZapListener gracefully shuts down the ZAP TLS listener.

func ZapBackendFactory ¶

func ZapBackendFactory(logger logging.Logger, next proxy.BackendFactory) proxy.BackendFactory

ZapBackendFactory wraps a standard BackendFactory and adds ZAP transport support. Backends with "github.com/hanzoai/gateway/zap" in their extra_config will use ZAP binary transport instead of HTTP.

Types ¶

type AgentStarter ¶

type AgentStarter interface {
	Start(
		context.Context,
		[]*config.AsyncAgent,
		logging.Logger,
		chan<- string,
		proxy.Factory,
	) func() error
}

AgentStarter defines a type that starts a set of agents

type AuthConfig ¶

type AuthConfig struct {
	// Enabled controls whether the auth middleware is active.
	// Default: true. Set to false via AUTH_ENABLED=false to disable
	// all auth checks (useful for integration tests and development).
	Enabled bool

	// JWKS URL to fetch signing keys (default: https://hanzo.id/.well-known/jwks)
	JWKSURL string

	// Expected JWT issuer (default: https://hanzo.id)
	Issuer string

	// Expected JWT audience (default: https://api.hanzo.ai)
	Audience string

	// Billing check endpoint (default: http://commerce.hanzo.svc.cluster.local:8001)
	BillingURL string

	// BillingToken is the COMMERCE_SERVICE_TOKEN for authenticating with Commerce.
	BillingToken string

	// BillingEnabled controls whether billing checks are performed.
	// Default: true (checks enabled). Set to false to disable.
	BillingEnabled bool

	// Paths that bypass auth entirely (exact prefix match)
	PublicPaths []string

	// Hosts that bypass auth entirely (e.g. hanzo.id for login)
	PublicHosts []string

	// If true, requests without a token are rejected (402/401).
	// If false (default), requests without a token pass through without headers.
	RequireAuth bool
}

AuthConfig holds configuration for the auth middleware.

func DefaultAuthConfig ¶

func DefaultAuthConfig() AuthConfig

DefaultAuthConfig returns the default auth configuration from environment variables.

type BackendFactory ¶

type BackendFactory interface {
	NewBackendFactory(context.Context, logging.Logger, *metrics.Metrics) proxy.BackendFactory
}

BackendFactory returns a Gateway backend factory, ready to be passed to the Gateway proxy factory

type BaseHAConfig ¶ added in v1.0.0

type BaseHAConfig struct {
	// ServiceDNS is the headless or ClusterIP DNS name of the base-ha
	// service. Used only for the /_ha/leader poll — the actual read path
	// uses the enclosing backend Host (round-robin).
	ServiceDNS string `json:"service_dns"`
	// Port is the HTTP port the base-ha pods listen on.
	Port int `json:"port"`
	// LeaderPollInterval is parsed by time.ParseDuration. Default 1s.
	LeaderPollInterval string `json:"leader_poll_interval"`
	// WriteMethods lists HTTP methods that require writer pinning.
	// Default: POST, PUT, PATCH, DELETE.
	WriteMethods []string `json:"write_methods"`
	// ReadYourWritesTTL is how long after a write the same client pins
	// to the writer for reads. Default 5s. Set to "0s" to disable.
	ReadYourWritesTTL string `json:"read_your_writes_ttl"`
}

BaseHAConfig mirrors the JSON schema documented above. Zero-value defaults are applied at factory time.

type BaseNetworkConfig ¶ added in v1.0.0

type BaseNetworkConfig struct {
	ShardKey             string `json:"shard_key"`
	ShardKeySource       string `json:"shard_key_source"`
	MemberPollIntervalMS int    `json:"member_poll_interval_ms"`
}

type BloomFilterJWT ¶

type BloomFilterJWT struct{}

BloomFilterJWT is the default TokenRejecterFactory implementation.

func (BloomFilterJWT) NewTokenRejecter ¶

func (BloomFilterJWT) NewTokenRejecter(ctx context.Context, cfg config.ServiceConfig, l logging.Logger, reg func(n string, p int)) (jose.ChainedRejecterFactory, error)

NewTokenRejecter registers the bloomfilter component and links it to a token rejecter. It then returns a chained rejecter factory combining the created token rejecter with another based on the CEL component.

type DefaultRunServerFactory ¶

type DefaultRunServerFactory struct{}

DefaultRunServerFactory creates the default RunServer by wrapping the injected RunServer with the plugin loader and the CORS module

func (*DefaultRunServerFactory) NewRunServer ¶

func (*DefaultRunServerFactory) NewRunServer(l logging.Logger, next router.RunServerFunc) RunServer

type EngineFactory ¶

type EngineFactory interface {
	NewEngine(config.ServiceConfig, router.EngineOptions) *gin.Engine
}

EngineFactory returns a gin engine, ready to be passed to the Gateway RouterFactory

type ExecutorBuilder ¶

type ExecutorBuilder struct {
	// PluginLoader is deprecated: Use PluginLoaderWithContext
	PluginLoader                PluginLoader
	PluginLoaderWithContext     PluginLoaderWithContext
	LoggerFactory               LoggerFactory
	SubscriberFactoriesRegister SubscriberFactoriesRegister
	TokenRejecterFactory        TokenRejecterFactory
	MetricsAndTracesRegister    MetricsAndTracesRegister
	EngineFactory               EngineFactory

	ProxyFactory        ProxyFactory
	BackendFactory      BackendFactory
	HandlerFactory      HandlerFactory
	RunServerFactory    RunServerFactory
	AgentStarterFactory AgentStarter

	Middlewares []gin.HandlerFunc
}

ExecutorBuilder is a composable builder. Every injected property is used by the NewCmdExecutor method.

func (*ExecutorBuilder) NewCmdExecutor ¶

func (e *ExecutorBuilder) NewCmdExecutor(ctx context.Context) cmd.Executor

NewCmdExecutor returns an executor for the cmd package. The executor initializes the entire gateway by delegating most of the tasks to the injected collaborators. They register the components and compose a RouterFactory wrapping all the middlewares. Every nil collaborator is replaced by the default one offered by this package.
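The "every nil collaborator is replaced by the default one" behaviour is a common builder pattern. The sketch below illustrates it with two simplified stand-in interfaces; `builder`, `loggerFactory`, `serverFactory` and their methods are hypothetical and do not match the real signatures in this package.

```go
package main

import "fmt"

// loggerFactory and serverFactory are simplified stand-ins for collaborators.
type loggerFactory interface{ NewLogger() string }
type serverFactory interface{ NewRunServer() string }

type defaultLogger struct{}

func (defaultLogger) NewLogger() string { return "default-logger" }

type jsonLogger struct{}

func (jsonLogger) NewLogger() string { return "json-logger" }

type defaultServer struct{}

func (defaultServer) NewRunServer() string { return "default-server" }

// builder holds optional collaborators; nil means "use the default".
type builder struct {
	LoggerFactory loggerFactory
	ServerFactory serverFactory
}

// checkAndSetDefaults fills in every collaborator the caller left nil.
func (b *builder) checkAndSetDefaults() {
	if b.LoggerFactory == nil {
		b.LoggerFactory = defaultLogger{}
	}
	if b.ServerFactory == nil {
		b.ServerFactory = defaultServer{}
	}
}

// describe applies the defaults and reports which collaborators are in use.
func (b *builder) describe() string {
	b.checkAndSetDefaults()
	return b.LoggerFactory.NewLogger() + "+" + b.ServerFactory.NewRunServer()
}

func main() {
	// Only the logger is overridden; the server falls back to the default.
	b := &builder{LoggerFactory: jsonLogger{}}
	fmt.Println(b.describe())
}
```

With this shape a caller overrides only the pieces it cares about and leaves the rest of the struct at its zero value.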

type HandlerFactory ¶

type HandlerFactory interface {
	NewHandlerFactory(logging.Logger, *metrics.Metrics, jose.RejecterFactory) router.HandlerFactory
}

HandlerFactory returns a Gateway router handler factory, ready to be passed to the Gateway RouterFactory

type LoggerBuilder ¶

type LoggerBuilder struct{}

LoggerBuilder is the default LoggerFactory implementation.

func (LoggerBuilder) NewLogger ¶

func (LoggerBuilder) NewLogger(cfg config.ServiceConfig) (logging.Logger, io.Writer, error)

NewLogger sets up the logging components as defined in the configuration.

type LoggerFactory ¶

type LoggerFactory interface {
	NewLogger(config.ServiceConfig) (logging.Logger, io.Writer, error)
}

LoggerFactory returns a Gateway Logger factory, ready to be passed to the Gateway RouterFactory

type MetricsAndTraces ¶

type MetricsAndTraces struct {
	// contains filtered or unexported fields
}

MetricsAndTraces is the default implementation of the MetricsAndTracesRegister interface.

func (*MetricsAndTraces) Close ¶

func (m *MetricsAndTraces) Close()

func (*MetricsAndTraces) Register ¶

func (m *MetricsAndTraces) Register(ctx context.Context, cfg config.ServiceConfig, l logging.Logger) *metrics.Metrics

Register registers the metrics, influx and opencensus packages as required by the given configuration.

type MetricsAndTracesRegister ¶

type MetricsAndTracesRegister interface {
	Register(context.Context, config.ServiceConfig, logging.Logger) *metrics.Metrics
}

MetricsAndTracesRegister registers the defined observability components and returns a metrics collector, if required.

type PluginLoader ¶

type PluginLoader interface {
	Load(folder, pattern string, logger logging.Logger)
}

PluginLoader defines the interface for the collaborator responsible for starting the plugin loaders.

Deprecated: Use PluginLoaderWithContext.

type PluginLoaderWithContext ¶

type PluginLoaderWithContext interface {
	LoadWithContext(ctx context.Context, folder, pattern string, logger logging.Logger)
}

PluginLoaderWithContext defines the interface for the collaborator responsible for starting the plugin loaders.

type ProxyFactory ¶

type ProxyFactory interface {
	NewProxyFactory(logging.Logger, proxy.BackendFactory, *metrics.Metrics) proxy.Factory
}

ProxyFactory returns a Gateway proxy factory, ready to be passed to the Gateway RouterFactory

type RouteEntry ¶

type RouteEntry struct {
	Prefix  string `yaml:"prefix" json:"prefix"`
	Backend string `yaml:"backend" json:"backend"`
	Rewrite string `yaml:"rewrite,omitempty" json:"rewrite,omitempty"` // optional: rewrite prefix
}

RouteEntry maps a path prefix to a backend URL.
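One way a prefix-to-backend mapping with an optional rewrite could be resolved is sketched below. The matching rule (longest prefix wins) and the rewrite semantics (the matched prefix is swapped for Rewrite) are assumptions, since the documentation does not spell them out. The `resolve` helper and the `commerce:8001` backend are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// RouteEntry mirrors the documented struct (JSON tags only, for brevity).
type RouteEntry struct {
	Prefix  string `json:"prefix"`
	Backend string `json:"backend"`
	Rewrite string `json:"rewrite,omitempty"`
}

// resolve is a hypothetical helper: it picks the entry with the longest
// matching prefix and returns the target URL for the given request path.
func resolve(entries []RouteEntry, path string) (string, bool) {
	var best *RouteEntry
	for i := range entries {
		e := &entries[i]
		if strings.HasPrefix(path, e.Prefix) && (best == nil || len(e.Prefix) > len(best.Prefix)) {
			best = e
		}
	}
	if best == nil {
		return "", false
	}
	rest := strings.TrimPrefix(path, best.Prefix)
	prefix := best.Prefix
	if best.Rewrite != "" {
		// Assumed semantics: the matched prefix is replaced by Rewrite.
		prefix = best.Rewrite
	}
	return strings.TrimSuffix(best.Backend, "/") + prefix + rest, true
}

func main() {
	entries := []RouteEntry{
		{Prefix: "/v1/", Backend: "http://cloud-api:8000"},
		{Prefix: "/v1/billing/", Backend: "http://commerce:8001", Rewrite: "/api/"},
	}
	for _, p := range []string{"/v1/chat/completions", "/v1/billing/invoices", "/health"} {
		target, ok := resolve(entries, p)
		fmt.Println(p, "->", target, ok)
	}
}
```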

type RoutesConfig ¶

type RoutesConfig struct {
	Redirects  map[string]string       `yaml:"redirects" json:"redirects"`
	Routes     map[string][]RouteEntry `yaml:"routes" json:"routes"`
	Subdomains map[string]string       `yaml:"subdomains" json:"subdomains"`
}

RoutesConfig is the YAML structure for gateway routing. Loaded from KMS (GATEWAY_ROUTES_KMS_PATH) or local file (GATEWAY_ROUTES_FILE).
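Because the struct carries both yaml and json tags, a routes document can also be decoded with the standard library's JSON decoder. The sketch below mirrors the two structs locally and parses a sample. The sample values, the use of hostnames as map keys, and the `parseRoutes` helper are assumptions for illustration; the real LoadRoutes and LoadRoutesFromFile functions are not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RouteEntry and RoutesConfig mirror the documented structs (JSON tags only).
type RouteEntry struct {
	Prefix  string `json:"prefix"`
	Backend string `json:"backend"`
	Rewrite string `json:"rewrite,omitempty"`
}

type RoutesConfig struct {
	Redirects  map[string]string       `json:"redirects"`
	Routes     map[string][]RouteEntry `json:"routes"`
	Subdomains map[string]string       `json:"subdomains"`
}

// sample is a hypothetical routes document; keys and values are illustrative.
const sample = `{
  "redirects":  {"old.example.com": "https://example.com"},
  "routes":     {"api.example.com": [{"prefix": "/v1/", "backend": "http://cloud-api:8000"}]},
  "subdomains": {"docs": "http://docs:3000"}
}`

// parseRoutes is a hypothetical helper that decodes a JSON routes document.
func parseRoutes(data []byte) (RoutesConfig, error) {
	var cfg RoutesConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return RoutesConfig{}, fmt.Errorf("parse routes: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseRoutes([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for host, entries := range cfg.Routes {
		for _, e := range entries {
			fmt.Printf("%s%s -> %s\n", host, e.Prefix, e.Backend)
		}
	}
}
```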

type RunServer ¶

type RunServer func(context.Context, config.ServiceConfig, http.Handler) error

RunServer is the signature of the function the Gateway router uses to start the service.

type RunServerFactory ¶

type RunServerFactory interface {
	NewRunServer(logging.Logger, router.RunServerFunc) RunServer
}

RunServerFactory returns a RunServer with several wraps around the injected one
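Wrapping a run-server function usually means decorating the http.Handler before delegating to the injected function. The sketch below shows the idea with a simplified `runServer` type that drops the config.ServiceConfig argument. `withHeader` and `demo` are hypothetical, and the terminal function serves one in-memory request instead of listening on a port.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// runServer is a simplified stand-in for RunServer (no ServiceConfig argument).
type runServer func(ctx context.Context, h http.Handler) error

// withHeader returns a runServer that decorates the handler with a response
// header and then delegates to next.
func withHeader(key, value string, next runServer) runServer {
	return func(ctx context.Context, h http.Handler) error {
		wrapped := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Header().Set(key, value)
			h.ServeHTTP(w, r)
		})
		return next(ctx, wrapped)
	}
}

// demo wires a fake terminal runServer that serves a single in-memory request
// and reports the header added by the wrapper.
func demo() (string, error) {
	var got string
	terminal := func(ctx context.Context, h http.Handler) error {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
		got = rec.Header().Get("X-Wrapped")
		return nil
	}
	base := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	})
	err := withHeader("X-Wrapped", "yes", terminal)(context.Background(), base)
	return got, err
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```

Each wrapper has the same signature as the function it wraps, so several of them can be stacked in any order.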

type SubscriberFactoriesRegister ¶

type SubscriberFactoriesRegister interface {
	Register(context.Context, config.ServiceConfig, logging.Logger) func(string, int)
}

SubscriberFactoriesRegister registers all the required subscriber factories from the available service discovery components and adapters and returns a service register function. That function registers the service, by the given name and port, with all the available service discovery clients.

type TokenRejecterFactory ¶

type TokenRejecterFactory interface {
	NewTokenRejecter(context.Context, config.ServiceConfig, logging.Logger, func(string, int)) (jose.ChainedRejecterFactory, error)
}

TokenRejecterFactory returns a jose.ChainedRejecterFactory containing all the required jose.RejecterFactory instances. It should also set up and manage any service involved in the revocation process, if required.

type WidgetSecurityConfig ¶

type WidgetSecurityConfig struct {
	// MaxRequestsPerIP is the maximum number of widget requests per IP
	// within the rate limit window. Default: 10.
	MaxRequestsPerIP int

	// Window is the sliding window duration for per-IP rate limiting.
	// Default: 1 minute.
	Window time.Duration

	// GlobalMaxRequests is the maximum total widget requests across all
	// IPs within the window. Protects against distributed abuse.
	// Default: 600.
	GlobalMaxRequests int

	// AllowedOrigins is the set of origin domains allowed for widget
	// requests. If empty, origin checking is disabled.
	AllowedOrigins []string

	// CleanupInterval controls how often stale entries are evicted
	// from the per-IP rate limit map. Default: 5 minutes.
	CleanupInterval time.Duration
}

WidgetSecurityConfig holds configuration for widget key rate limiting and origin validation.
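The per-IP limit described by MaxRequestsPerIP and Window can be pictured as a sliding-window counter keyed by client IP. The sketch below is one possible shape under that assumption, not the middleware's actual implementation. `ipLimiter` and `simulate` are hypothetical, and the global limit and the cleanup goroutine are left out.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ipLimiter keeps the timestamps of recent requests per IP.
type ipLimiter struct {
	mu     sync.Mutex
	max    int
	window time.Duration
	hits   map[string][]time.Time
}

func newIPLimiter(max int, window time.Duration) *ipLimiter {
	return &ipLimiter{max: max, window: window, hits: make(map[string][]time.Time)}
}

// allow drops timestamps older than the window, then admits the request only
// if fewer than max requests remain inside the window.
func (l *ipLimiter) allow(ip string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	cutoff := now.Add(-l.window)
	kept := l.hits[ip][:0] // filter in place, reusing the backing array
	for _, t := range l.hits[ip] {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	if len(kept) >= l.max {
		l.hits[ip] = kept
		return false
	}
	l.hits[ip] = append(kept, now)
	return true
}

// simulate replays requests from one IP at the given second offsets.
func simulate(max, windowSec int, offsetsSec []int) []bool {
	l := newIPLimiter(max, time.Duration(windowSec)*time.Second)
	start := time.Unix(0, 0)
	out := make([]bool, 0, len(offsetsSec))
	for _, s := range offsetsSec {
		out = append(out, l.allow("203.0.113.7", start.Add(time.Duration(s)*time.Second)))
	}
	return out
}

func main() {
	// Two requests per 60s window: the third is rejected, later ones pass
	// again once the earliest timestamps fall out of the window.
	fmt.Println(simulate(2, 60, []int{0, 10, 20, 61, 71})) // [true true false true true]
}
```

Passing the clock value in as an argument keeps the limiter deterministic under test. A CleanupInterval-style sweep would periodically delete map entries whose slices have become empty.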

func DefaultWidgetSecurityConfig ¶

func DefaultWidgetSecurityConfig() WidgetSecurityConfig

DefaultWidgetSecurityConfig returns safe defaults.

AllowedOrigins can be overridden via WIDGET_ALLOWED_ORIGINS env var (comma-separated list of bare hostnames, no scheme/port). Subdomain matches are automatic: "hanzo.ai" also allows "*.hanzo.ai".
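The origin rules above (comma-separated bare hostnames, automatic subdomain matching, and an empty list disabling the check) can be sketched as follows. `parseAllowedOrigins` and `hostAllowed` are hypothetical helpers, and lowercasing and whitespace trimming are assumptions about normalization.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAllowedOrigins splits a comma-separated list of bare hostnames,
// trimming whitespace, lowercasing, and dropping empty items.
func parseAllowedOrigins(raw string) []string {
	var out []string
	for _, h := range strings.Split(raw, ",") {
		h = strings.ToLower(strings.TrimSpace(h))
		if h != "" {
			out = append(out, h)
		}
	}
	return out
}

// hostAllowed reports whether host equals an allowed entry or is a subdomain
// of one. An empty allow list disables origin checking, as documented.
func hostAllowed(allowed []string, host string) bool {
	if len(allowed) == 0 {
		return true
	}
	host = strings.ToLower(host)
	for _, a := range allowed {
		// The leading dot prevents "evilhanzo.ai" from matching "hanzo.ai".
		if host == a || strings.HasSuffix(host, "."+a) {
			return true
		}
	}
	return false
}

func main() {
	allowed := parseAllowedOrigins(" hanzo.ai, example.com ,")
	for _, h := range []string{"hanzo.ai", "api.hanzo.ai", "evilhanzo.ai", "example.org"} {
		fmt.Println(h, hostAllowed(allowed, h))
	}
}
```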

type ZapConfig ¶

type ZapConfig struct {
	// NodeID is the ZAP node ID for this service
	NodeID string `json:"node_id"`
	// PeerID is the ZAP node ID of the target backend
	PeerID string `json:"peer_id"`
	// PeerAddr is the direct address (host:port) if mDNS is disabled
	PeerAddr string `json:"peer_addr"`
	// ServiceType is the mDNS service type (e.g., "_hanzo._tcp")
	ServiceType string `json:"service_type"`
	// Port is the local ZAP port
	Port int `json:"port"`
	// Timeout is the request timeout in milliseconds
	Timeout int `json:"timeout_ms"`
	// NoDiscovery disables mDNS and uses direct connection
	NoDiscovery bool `json:"no_discovery"`
}

ZapConfig holds the ZAP backend configuration extracted from gateway.json.
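A ZAP configuration fragment might be decoded and sanity-checked as in the sketch below. The struct is mirrored locally; the sample values, the `parseZapConfig` and `requestTimeout` helpers, and the rule that peer_addr must be set when no_discovery is true are assumptions, and the location of the fragment inside gateway.json is not shown here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// ZapConfig mirrors the documented struct.
type ZapConfig struct {
	NodeID      string `json:"node_id"`
	PeerID      string `json:"peer_id"`
	PeerAddr    string `json:"peer_addr"`
	ServiceType string `json:"service_type"`
	Port        int    `json:"port"`
	Timeout     int    `json:"timeout_ms"`
	NoDiscovery bool   `json:"no_discovery"`
}

// requestTimeout converts the millisecond field into a time.Duration.
func (c ZapConfig) requestTimeout() time.Duration {
	return time.Duration(c.Timeout) * time.Millisecond
}

// sample is a hypothetical fragment; all values are illustrative.
const sample = `{
  "node_id": "gateway-1",
  "peer_id": "cloud-1",
  "peer_addr": "127.0.0.1:9652",
  "port": 9651,
  "timeout_ms": 1500,
  "no_discovery": true
}`

// parseZapConfig is a hypothetical helper that decodes and validates a fragment.
func parseZapConfig(data []byte) (ZapConfig, error) {
	var cfg ZapConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return ZapConfig{}, fmt.Errorf("parse zap config: %w", err)
	}
	if cfg.NoDiscovery && cfg.PeerAddr == "" {
		// Assumed rule: a direct connection needs an explicit address.
		return ZapConfig{}, errors.New("peer_addr is required when no_discovery is true")
	}
	return cfg, nil
}

func main() {
	cfg, err := parseZapConfig([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.NodeID, "->", cfg.PeerAddr, "timeout", cfg.requestTimeout())
}
```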

type ZapListenerConfig ¶

type ZapListenerConfig struct {
	Port     int
	CertFile string
	KeyFile  string
	// InternalAddr is the local ZAP node's address to proxy to (e.g. "127.0.0.1:9652").
	InternalAddr string
}

ZapListenerConfig configures the inbound ZAP listener for external clients.

Directories ¶

Path	Synopsis
cmd	
gateway (command)	
gateway-integration (command)	
ingress (command)	Hanzo Ingress: lightweight host-based reverse proxy. Replaces nginx-ingress with a minimal, config-driven proxy.
tests	Package tests implements utility functions to help with API Gateway testing.
