proxy

package
v1.2.0 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 11, 2026 License: MIT Imports: 13 Imported by: 0

Documentation

Overview

Package proxy sits between any client and the inference server. Ollama has no metrics endpoint; the only place per-request numbers exist is the response stream itself, where the final chunk carries token counts and timings. So we forward traffic untouched and read the chunks as they pass through. OpenAI-style endpoints get the same treatment using their usage block.
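The pass-through idea above can be sketched as follows. This is a minimal illustration, not the package's actual code: the names finalChunk, parseFinal, tee, tokPerSec and teeString are hypothetical, and only the JSON field names (done, prompt_eval_count, eval_count, eval_duration, total_duration) come from Ollama's streaming responses, where durations are reported in nanoseconds.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"io"
	"strings"
)

// finalChunk holds the fields Ollama attaches to the last streamed object.
// Durations are nanoseconds. The type name is hypothetical.
type finalChunk struct {
	Model           string `json:"model"`
	Done            bool   `json:"done"`
	PromptEvalCount int    `json:"prompt_eval_count"`
	EvalCount       int    `json:"eval_count"`
	EvalDuration    int64  `json:"eval_duration"`
	TotalDuration   int64  `json:"total_duration"`
}

// parseFinal reports whether line is a done=true chunk and returns it decoded.
func parseFinal(line []byte) (finalChunk, bool) {
	var c finalChunk
	if err := json.Unmarshal(line, &c); err != nil || !c.Done {
		return finalChunk{}, false
	}
	return c, true
}

// tee copies src to dst byte for byte while watching for the final chunk.
func tee(dst io.Writer, src io.Reader) (finalChunk, bool, error) {
	var last finalChunk
	found := false
	sc := bufio.NewScanner(io.TeeReader(src, dst))
	// Assumption: lines fit in 1 MiB; a longer line stops the scan with an error.
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20)
	for sc.Scan() {
		if c, ok := parseFinal(sc.Bytes()); ok {
			last, found = c, true
		}
	}
	return last, found, sc.Err()
}

// tokPerSec converts eval_count and eval_duration into tokens per second.
func tokPerSec(c finalChunk) float64 {
	if c.EvalDuration <= 0 {
		return 0
	}
	return float64(c.EvalCount) / (float64(c.EvalDuration) / 1e9)
}

// teeString runs tee over an in-memory stream; handy for trying the sketch.
func teeString(stream string) (forwarded string, c finalChunk, ok bool) {
	var out bytes.Buffer
	c, ok, _ = tee(&out, strings.NewReader(stream))
	return out.String(), c, ok
}
```

Because the reader is wrapped in io.TeeReader, the client receives exactly the bytes the server sent; the parsing only observes them.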

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type GPUSample added in v1.2.0

type GPUSample struct {
	Name              string
	Util              int
	MemUsed, MemTotal int
	Temp              int
	Power             float64
}

GPUSample is the slice of GPU state /metrics re-exports. It's a copy of what internal/gpu reads, kept here so this package doesn't import it just to render some gauges.

type ModelStat added in v1.1.0

type ModelStat struct {
	Model    string
	Count    int
	AvgTok   float64
	P50, P95 float64
	OutTk    int
}

type Proxy

type Proxy struct {
	// contains filtered or unexported fields
}

func New

func New(upstream string, store *Store) (*Proxy, error)

func (*Proxy) Handler

func (p *Proxy) Handler() http.Handler

func (*Proxy) Listen

func (p *Proxy) Listen(addr string) error

type Request

type Request struct {
	When     time.Time
	Path     string
	Model    string
	PromptTk int
	OutTk    int
	TokSec   float64
	Total    time.Duration
}

type Store

type Store struct {
	// contains filtered or unexported fields
}

Store keeps the last N proxied requests, newest first.
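A bounded, newest-first buffer of this kind might look like the sketch below. Store's fields are unexported, so everything here (ring, newRing, add, recent, and the trimmed request type) is a hypothetical stand-in rather than the package's implementation.

```go
package main

import "sync"

// request is a trimmed stand-in for the package's Request type.
type request struct {
	Model  string
	TokSec float64
}

// ring is a hypothetical bounded buffer that keeps the newest entry first.
type ring struct {
	mu   sync.Mutex
	max  int
	reqs []request // index 0 is the most recent request
}

// newRing clamps max to at least 1 so trimming never sees a negative bound.
func newRing(max int) *ring {
	if max < 1 {
		max = 1
	}
	return &ring{max: max}
}

// add puts q at the front and drops the oldest entries beyond max.
func (r *ring) add(q request) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.reqs = append([]request{q}, r.reqs...)
	if len(r.reqs) > r.max {
		r.reqs = r.reqs[:r.max]
	}
}

// recent returns a copy of up to n entries, newest first.
func (r *ring) recent(n int) []request {
	r.mu.Lock()
	defer r.mu.Unlock()
	if n > len(r.reqs) {
		n = len(r.reqs)
	}
	if n < 0 {
		n = 0
	}
	out := make([]request, n)
	copy(out, r.reqs[:n])
	return out
}
```

Prepending copies the slice on every add, which is simple and fine for a small N; a circular index would avoid the copy at the cost of more bookkeeping.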

func NewStore

func NewStore(max int) *Store

func (*Store) Add

func (s *Store) Add(r Request)

func (*Store) ByModel added in v1.1.0

func (s *Store) ByModel() []ModelStat

ByModel aggregates everything seen so far, busiest model first.
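As a rough sketch of this aggregation: group by model, accumulate, then sort by request count. The types below are trimmed, hypothetical copies of Request and ModelStat; the reading of AvgTok as mean tok/s and the alphabetical tie-break are assumptions, and the per-model percentiles are left out.

```go
package main

import "sort"

// request is a trimmed stand-in for the package's Request type.
type request struct {
	Model  string
	OutTk  int
	TokSec float64
}

// modelStat is a trimmed stand-in for ModelStat (P50 and P95 omitted).
type modelStat struct {
	Model  string
	Count  int
	AvgTok float64 // assumed here to mean average tok/s
	OutTk  int
}

// byModel groups requests per model and orders the result busiest first.
func byModel(reqs []request) []modelStat {
	idx := map[string]int{} // model name -> position in stats
	var stats []modelStat
	for _, r := range reqs {
		i, ok := idx[r.Model]
		if !ok {
			i = len(stats)
			idx[r.Model] = i
			stats = append(stats, modelStat{Model: r.Model})
		}
		stats[i].Count++
		stats[i].OutTk += r.OutTk
		stats[i].AvgTok += r.TokSec // running sum, divided below
	}
	for i := range stats {
		stats[i].AvgTok /= float64(stats[i].Count)
	}
	sort.Slice(stats, func(a, b int) bool {
		if stats[a].Count != stats[b].Count {
			return stats[a].Count > stats[b].Count
		}
		return stats[a].Model < stats[b].Model // tie-break is an assumption
	})
	return stats
}
```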

func (*Store) Err

func (s *Store) Err() error

func (*Store) LastSeen

func (s *Store) LastSeen(model string) time.Time

LastSeen returns when a model last handled a request through the proxy, or the zero time.Time if it never did.

func (*Store) OnAdd added in v1.2.0

func (s *Store) OnAdd(fn func(Request))

OnAdd registers fn to run after each new request lands. It is used to append each request to the history.

func (*Store) Percentiles added in v1.1.0

func (s *Store) Percentiles() (p50, p95 float64)

Percentiles returns the 50th and 95th percentiles of tok/s across everything in the buffer.
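The documentation does not say which percentile definition is used. The sketch below assumes the nearest-rank method (no interpolation) and returns 0 for an empty buffer; percentile and percentiles are hypothetical helper names.

```go
package main

import (
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of vals.
// It sorts a copy, so the caller's slice keeps its order.
func percentile(vals []float64, p float64) float64 {
	if len(vals) == 0 {
		return 0 // assumption: an empty buffer reports zero
	}
	s := append([]float64(nil), vals...)
	sort.Float64s(s)
	rank := int(math.Ceil(p / 100 * float64(len(s))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(s) {
		rank = len(s)
	}
	return s[rank-1]
}

// percentiles mirrors the shape of Store.Percentiles for a plain slice.
func percentiles(vals []float64) (p50, p95 float64) {
	return percentile(vals, 50), percentile(vals, 95)
}
```

With only a handful of samples, nearest-rank p95 is simply the maximum, which is worth keeping in mind when reading the numbers from a nearly empty buffer.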

func (*Store) Preload added in v1.2.0

func (s *Store) Preload(reqs []Request)

Preload drops in requests recovered from disk on startup. reqs is newest-first like everything else here, and goes behind whatever's already live. It doesn't fire the OnAdd callback, so loading doesn't re-write history.
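The relationship between adding, the callback, and preloading can be sketched like this. It is a simplified, hypothetical model: the store type, its fields, and the method names are stand-ins, and the size cap and locking a real store would need are omitted.

```go
package main

// request is a trimmed stand-in for the package's Request type.
type request struct{ Model string }

// store is a hypothetical newest-first buffer with an optional callback.
type store struct {
	reqs  []request     // index 0 is the most recent request
	onAdd func(request) // invoked for live requests only
}

// add places r at the front and then notifies the callback, if one is set.
func (s *store) add(r request) {
	s.reqs = append([]request{r}, s.reqs...)
	if s.onAdd != nil {
		s.onAdd(r)
	}
}

// preload appends recovered requests behind the live ones.
// It deliberately skips onAdd so history is not written a second time.
func (s *store) preload(old []request) {
	s.reqs = append(s.reqs, old...)
}
```

Keeping preload silent means a history writer registered through the callback sees each request once, whether or not the process restarts in between.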

func (*Store) PromText added in v1.1.0

func (s *Store) PromText() string

PromText renders what the proxy has seen in Prometheus exposition format, served at /metrics on the proxy port. It works from a single snapshot, so the per-model lines and the percentiles describe the same set of requests.
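For readers unfamiliar with the exposition format, here is a small sketch of rendering it by hand. The metric names proxy_model_requests and proxy_tok_per_sec, the function promText, and its parameters are all invented for illustration; the real names emitted by PromText are not documented here. Label values are escaped as the text format requires (backslash, double quote, newline).

```go
package main

import (
	"sort"
	"strconv"
	"strings"
)

// labelEscaper applies the escaping the text format requires in label values.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// fmtFloat prints a sample value in its shortest exact form.
func fmtFloat(v float64) string {
	return strconv.FormatFloat(v, 'g', -1, 64)
}

// promText renders one snapshot: per-model counts plus two percentiles.
// All metric names here are hypothetical.
func promText(counts map[string]int, p50, p95 float64) string {
	models := make([]string, 0, len(counts))
	for m := range counts {
		models = append(models, m)
	}
	sort.Strings(models) // stable output regardless of map order

	var b strings.Builder
	b.WriteString("# HELP proxy_model_requests Requests seen per model.\n")
	b.WriteString("# TYPE proxy_model_requests gauge\n")
	for _, m := range models {
		b.WriteString(`proxy_model_requests{model="` + labelEscaper.Replace(m) + `"} `)
		b.WriteString(strconv.Itoa(counts[m]))
		b.WriteByte('\n')
	}
	b.WriteString("# HELP proxy_tok_per_sec Tokens per second over the buffer.\n")
	b.WriteString("# TYPE proxy_tok_per_sec gauge\n")
	b.WriteString(`proxy_tok_per_sec{quantile="0.5"} ` + fmtFloat(p50) + "\n")
	b.WriteString(`proxy_tok_per_sec{quantile="0.95"} ` + fmtFloat(p95) + "\n")
	return b.String()
}
```

Passing the counts and percentiles in together is what keeps the output consistent: both come from the same snapshot rather than from two separate reads of a changing buffer.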

func (*Store) Recent

func (s *Store) Recent(n int) []Request

func (*Store) SetErr

func (s *Store) SetErr(err error)

func (*Store) SetGPU added in v1.2.0

func (s *Store) SetGPU(g []GPUSample)

SetGPU stashes the latest GPU read so /metrics can include it.

func (*Store) TokRates

func (s *Store) TokRates(n int) []float64

TokRates returns tok/s of the last n requests, oldest first.
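Since the store is newest first while TokRates returns oldest first, the result is the first n entries reversed. A minimal sketch over a plain slice, with tokRates as a hypothetical free function and the nil result for n <= 0 as an assumption:

```go
package main

// tokRates takes rates stored newest first and returns the most recent n
// of them ordered oldest first, which suits left-to-right charts.
func tokRates(newestFirst []float64, n int) []float64 {
	if n > len(newestFirst) {
		n = len(newestFirst)
	}
	if n <= 0 {
		return nil
	}
	out := make([]float64, n)
	for i := 0; i < n; i++ {
		out[i] = newestFirst[n-1-i] // walk the window backwards
	}
	return out
}
```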
