proxy

package
v0.3.1
This package is not in the latest version of its module.
Published: Jan 26, 2026 License: MIT Imports: 27 Imported by: 0

Documentation

Overview

Package proxy provides the Anthropic token counting endpoint handler.

Package proxy provides the embeddings endpoint handler.

Package proxy provides HTTP handlers for the API proxy.

Package proxy provides the Anthropic messages endpoint handler.

Package proxy provides the native messages endpoint handler.

Package proxy provides rate limiting middleware.


Package proxy provides token counting utilities using tiktoken.

Why Token Estimation Exists

This package estimates input tokens because of a fundamental protocol mismatch between Anthropic's streaming API and OpenAI's Chat Completions API:

  • Anthropic: The message_start event (FIRST event) must include input_tokens
  • OpenAI: Token usage appears in the FINAL chunk of the stream

Since this proxy translates Anthropic-format requests to OpenAI format and sends them to GitHub Copilot, we face a temporal problem: we must emit input_tokens before we know the actual count from the upstream provider.

The solution is to estimate tokens from the request content using tiktoken, then emit that estimate in message_start. The actual token count (when available from the upstream provider) replaces our estimate in the final message_delta event.

This means clients see an estimated count initially, then the real count at the end. For most use cases this is acceptable — the estimate is close enough for UI display, and any billing/quota tracking uses the final accurate count.
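The two-phase reporting described above can be sketched as follows. The event shapes follow the Anthropic streaming format, but the helper functions and the specific token numbers are illustrative, not this package's code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type usage struct {
	InputTokens  int `json:"input_tokens"`
	OutputTokens int `json:"output_tokens,omitempty"`
}

// startEvent renders a message_start SSE frame carrying the tiktoken-based
// estimate, which must be emitted before any upstream usage is known.
func startEvent(estimatedInput int) string {
	payload := map[string]any{
		"type":    "message_start",
		"message": map[string]any{"usage": usage{InputTokens: estimatedInput}},
	}
	b, _ := json.Marshal(payload)
	return fmt.Sprintf("event: message_start\ndata: %s\n\n", b)
}

// deltaEvent renders the final message_delta frame, where the upstream
// provider's real counts replace the estimate.
func deltaEvent(inputTokens, outputTokens int) string {
	payload := map[string]any{
		"type":  "message_delta",
		"usage": usage{InputTokens: inputTokens, OutputTokens: outputTokens},
	}
	b, _ := json.Marshal(payload)
	return fmt.Sprintf("event: message_delta\ndata: %s\n\n", b)
}

func main() {
	fmt.Print(startEvent(1042))      // estimate, known before streaming begins
	fmt.Print(deltaEvent(1017, 238)) // authoritative counts from the final chunk
}
```

A client that only reads message_start sees the estimate; a client that tracks message_delta ends up with the accurate figure.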

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CountTokens

func CountTokens(text string) int

CountTokens counts tokens in a string using tiktoken. It falls back to character-based estimation if the tokenizer fails.
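A minimal sketch of such a fallback, assuming the common rule of thumb of roughly four characters per token; the actual divisor this package uses may differ:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// fallbackCount is a hypothetical character-based estimate for when a real
// tokenizer is unavailable: about one token per four characters of text.
func fallbackCount(text string) int {
	n := utf8.RuneCountInString(text)
	if n == 0 {
		return 0
	}
	tokens := n / 4
	if tokens == 0 {
		tokens = 1 // any non-empty text costs at least one token
	}
	return tokens
}

func main() {
	fmt.Println(fallbackCount("Hello, streaming world!")) // 23 runes -> 5
}
```

Counting runes rather than bytes keeps the estimate stable for non-ASCII input, though any fixed divisor is only a rough approximation.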

func EstimateInputTokens

func EstimateInputTokens(req *translate.AnthropicRequest) int

EstimateInputTokens counts input tokens for an Anthropic request. Uses tiktoken for accurate counting of text content and Anthropic's dimension-based formula for images.
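Anthropic's documentation gives the approximation tokens ≈ (width × height) / 750 for images. A sketch under the assumption that this is the dimension-based formula referred to (resizing caps are ignored here):

```go
package main

import "fmt"

// imageTokens applies Anthropic's published approximation for image cost,
// tokens ≈ (width × height) / 750. Whether this package applies exactly this
// formula, and how it handles oversized images, is an assumption here.
func imageTokens(width, height int) int {
	return width * height / 750
}

func main() {
	fmt.Println(imageTokens(1092, 1092)) // ≈1590 tokens for a 1092x1092 image
}
```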

func EstimateTokensFromCountRequest

func EstimateTokensFromCountRequest(req *CountTokensRequest) int

EstimateTokensFromCountRequest counts tokens from a CountTokensRequest.

func EstimateTokensFromCountRequestWithBeta

func EstimateTokensFromCountRequestWithBeta(req *CountTokensRequest, anthropicBeta string) int

EstimateTokensFromCountRequestWithBeta counts tokens, accounting for MCP/Skill tools.

Types

type CountTokensRequest

type CountTokensRequest struct {
	Model    string                       `json:"model"`
	Messages []translate.AnthropicMessage `json:"messages"`
	System   json.RawMessage              `json:"system,omitempty"`
	Tools    []translate.AnthropicTool    `json:"tools,omitempty"`
}

CountTokensRequest is the request body for token counting.

type CountTokensResponse

type CountTokensResponse struct {
	InputTokens int `json:"input_tokens"`
}

CountTokensResponse is the response for token counting.

type Handler

type Handler struct {
	// contains filtered or unexported fields
}

Handler provides HTTP handlers for the proxy.

func NewHandler

func NewHandler(cfg config.Config, client *copilot.Client) *Handler

NewHandler creates a new handler.

func (*Handler) HandleCompletions

func (h *Handler) HandleCompletions(w http.ResponseWriter, r *http.Request)

HandleCompletions handles chat completion requests.

func (*Handler) HandleCountTokens

func (h *Handler) HandleCountTokens(w http.ResponseWriter, r *http.Request)

HandleCountTokens handles Anthropic token counting requests. This provides an estimate since we don't have access to the actual tokenizer.

func (*Handler) HandleEmbeddings

func (h *Handler) HandleEmbeddings(w http.ResponseWriter, r *http.Request)

HandleEmbeddings handles embedding requests.

func (*Handler) HandleMessages

func (h *Handler) HandleMessages(w http.ResponseWriter, r *http.Request)

HandleMessages handles Anthropic-compatible messages requests. Routes to native /v1/messages if the model supports it, otherwise translates to OpenAI format.

func (*Handler) HandleModels

func (h *Handler) HandleModels(w http.ResponseWriter, r *http.Request)

HandleModels handles model listing requests.

func (*Handler) HandleNativeMessages

func (h *Handler) HandleNativeMessages(w http.ResponseWriter, r *http.Request)

HandleNativeMessages handles Anthropic messages requests by passing them directly to Copilot's native /v1/messages endpoint without translation. This verifies that Copilot natively supports the Anthropic Messages API.

func (*Handler) HandleResponses

func (h *Handler) HandleResponses(w http.ResponseWriter, r *http.Request)

HandleResponses handles OpenAI Responses API requests. This is a pass-through proxy: we forward the request to Copilot's /responses endpoint and stream the response back, fixing ID inconsistencies in the stream.

func (*Handler) HandleRoot

func (h *Handler) HandleRoot(w http.ResponseWriter, r *http.Request)

HandleRoot handles the root endpoint.

type RateLimiter

type RateLimiter struct {
	// contains filtered or unexported fields
}

RateLimiter provides token bucket rate limiting. It is safe for concurrent use.

Unlike a simple interval-based limiter, a token bucket properly queues requests when waitOnLimit is true, preventing bursts after waiting.

func NewRateLimiter

func NewRateLimiter(intervalSecs int, waitOnLimit bool, verbose bool) *RateLimiter

NewRateLimiter creates a new rate limiter. intervalSecs is the minimum time between requests (0 disables rate limiting). waitOnLimit determines whether to wait or return 429 when rate limited.

The limiter uses a token bucket algorithm with burst=1, meaning requests are spaced evenly rather than allowing bursts after idle periods.

func (*RateLimiter) Check

func (rl *RateLimiter) Check() error

Check checks the rate limit and either waits or returns an error. Returns nil if the request can proceed.

func (*RateLimiter) CheckWithContext

func (rl *RateLimiter) CheckWithContext(ctx context.Context) error

CheckWithContext checks the rate limit with context support for cancellation.

func (*RateLimiter) Middleware

func (rl *RateLimiter) Middleware(next http.Handler) http.Handler

Middleware wraps an http.Handler with rate limiting.
