llm-proxy

module

v0.0.1 Latest Latest Go to latest Published: Mar 20, 2026 License: MIT

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/acksell/llm-proxy

Links

Open Source Insights

README ¶

LLM Proxy

A simple, Go-based alternative to the litellm proxy, without all the extra stuff you don't need! A modular reverse proxy that forwards requests to various LLM providers (OpenAI, Anthropic, Gemini) using Go and the Gorilla web toolkit.

Features

Multi-provider support: Full support for OpenAI, Anthropic, and Gemini
Streaming Support: Native streaming support for all providers
OpenAI Integration: Complete OpenAI API compatibility with /openai prefix
Anthropic Integration: Claude API support with /anthropic prefix
Gemini Integration: Google Gemini API support with /gemini prefix
Comprehensive Logging: Request/response monitoring with streaming detection
CORS Support: Browser-based application compatibility
Health Check: Detailed health status for all providers
Configurable Port: Environment variable configuration (default: 9002)
Rate Limiting (experimental): Optional request/token-based limits per user/API key/model/provider

Quick Start

# Get help on available commands
make help

# Install dependencies and build
make install build

# Run the proxy
make run

# Or run in development mode
make dev

Making requests

Once the proxy is running, you can make requests to LLM providers through the proxy:

# Health check (shows all provider statuses)
curl http://localhost:9002/health

# OpenAI Chat completions (replace YOUR_API_KEY with your actual OpenAI API key)
curl -X POST http://localhost:9002/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello, world!"}],
    "max_tokens": 50
  }'

# OpenAI Streaming
curl -X POST http://localhost:9002/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

# Anthropic Messages
curl -X POST http://localhost:9002/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-sonnet-20240229",
    "max_tokens": 100,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Gemini Generate Content
curl -X POST http://localhost:9002/gemini/v1/models/gemini-pro:generateContent?key=YOUR_API_KEY \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Hello!"}]}]
  }'

Testing

The project includes comprehensive integration tests for all providers:

# Run all tests
make test-all

# Run tests for specific providers
make test-openai
make test-anthropic
make test-gemini

# Run health check tests only
make test-health

# Check environment variables
make env-check

Setting up API Keys

To run integration tests, you need to set up environment variables:

export OPENAI_API_KEY=your_openai_key
export ANTHROPIC_API_KEY=your_anthropic_key
export GEMINI_API_KEY=your_gemini_key

Configuration

PORT: Environment variable to set the server port (default: 9002)

Rate Limiting (Experimental)

Disabled by default. Enable via config: see configs/base.yml and configs/dev.yml.
Supports provisional token estimation with post-response reconciliation using X-LLM-Input-Tokens (input tokens only).
Returns 429 Too Many Requests with Retry-After and X-RateLimit-* headers when throttled.
Redis backend is currently not supported; only the in-process memory backend is available.

Minimal dev example (see configs/dev.yml for a full setup):

features:
  rate_limiting:
    enabled: true
    backend: "memory" # single instance only
    estimation:
      max_sample_bytes: 20000
      bytes_per_token: 4 # Fallback to request size (Content-Length based)
      chars_per_token: 4 # Default for message-based estimation
      # Optional per-provider overrides (recommended)
      provider_chars_per_token:
        openai: 5      # ~185–190 tokens per 1k chars (from scripts/token_estimation.py)
        anthropic: 3   # ~290–315 tokens per 1k chars (from scripts/token_estimation.py)
    limits:
      requests_per_minute: 0   # 0 = unlimited (dev defaults)
      tokens_per_minute: 0

Token estimation behavior

We currently account for and reconcile only input tokens. Output tokens are not yet considered for rate limits/credits.
For small JSON requests (size controlled by max_sample_bytes), the proxy extracts textual message content via provider-specific parsers and estimates tokens by character count using chars_per_token (with per-provider overrides).
Default per-provider values come from benchmarks produced by scripts/token_estimation.py. You can run the script to generate your own table and override values in config.
Non-text modalities (images/videos) are not supported for estimation at this time and will fall back to credit-based only behavior essentially via max_sample_bytes.
Optimistic first request: to avoid estimation blocking initial traffic, the first token-bearing request in a window (when current token count is zero) is allowed even if token limits would otherwise apply. Subsequent requests are enforced normally.

API Endpoints

General

GET /health - Health check endpoint for all providers

OpenAI

POST /openai/v1/chat/completions - OpenAI chat completions endpoint (streaming supported)
POST /openai/v1/completions - OpenAI completions endpoint (streaming supported)
* /openai/v1/* - All other OpenAI API endpoints

Anthropic

POST /anthropic/v1/messages - Anthropic messages endpoint (streaming supported)
* /anthropic/v1/* - All other Anthropic API endpoints

Gemini

POST /gemini/v1/models/{model}:generateContent - Gemini content generation (streaming supported)
POST /gemini/v1/models/{model}:streamGenerateContent - Explicit streaming endpoint
* /gemini/v1/* - All other Gemini API endpoints

Architecture

The proxy is built with a modular architecture:

main.go: Core server setup, middleware, and provider registration
providers/openai.go: OpenAI-specific proxy implementation with streaming support
providers/anthropic.go: Anthropic proxy implementation with streaming support
providers/gemini.go: Gemini proxy implementation with streaming support
providers/provider.go: Common interfaces and provider management

Each provider implements its own:

Route registration
Request/response handling with streaming support
Error handling
Health status reporting
Response metadata parsing

Development

Available Make Commands

# Get help on all available commands
make help

# Code quality
make check         # Run all code quality checks
make fmt           # Format Go code
make vet           # Run go vet
make lint          # Run golint

# Building
make build         # Build the binary
make clean         # Clean build artifacts
make install       # Install dependencies

# Running
make run           # Run the built binary
make dev           # Run in development mode

# Testing
make test          # Run unit tests
make test-all      # Run all tests including integration
make test-openai   # Run OpenAI tests only
make test-anthropic # Run Anthropic tests only
make test-gemini   # Run Gemini tests only

Test Structure

Tests are organized by provider:

openai_test.go: OpenAI integration tests (streaming and non-streaming)
anthropic_test.go: Anthropic integration tests (streaming and non-streaming)
gemini_test.go: Gemini integration tests (streaming and non-streaming)
common_test.go: Health check and environment variable tests
test_helpers.go: Shared test utilities

Middleware

Logging: Logs all incoming requests with streaming detection
CORS: Adds CORS headers for browser compatibility
Streaming: Optimized handling for streaming responses
Error Handling: Provider-specific error handling

Adding New Providers

To add a new provider:

Create a new file (e.g., newprovider.go)
Implement the Provider interface
Add streaming detection logic
Add response metadata parsing
Create corresponding test file
Register the provider in main.go

Dependencies

Gorilla Mux - HTTP router and URL matcher

Build Information

The binary includes build-time information:

Git commit hash
Build timestamp
Go version

View build info with:

make version

Directories ¶

Path	Synopsis
cmd
llm-proxy command
llm-proxy-admin command
llm-proxy-keys command
internal
admin
apikeys
config
cost
dynamodb
middleware
providers
ratelimit
pkg
admin Package admin provides shared types and a client SDK for the llm-proxy admin API.	Package admin provides shared types and a client SDK for the llm-proxy admin API.

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL