Documentation ¶
Overview ¶
Package proxy sits between any client and the inference server. Ollama has no metrics endpoint; the only place per-request numbers exist is the response stream itself, where the final chunk carries token counts and timings. So we forward traffic untouched and read the chunks as they pass through. OpenAI-style endpoints get the same treatment using their usage block.
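As a rough illustration of reading the final chunk, the sketch below scans an Ollama-style NDJSON stream for the line marked `done` and derives tokens per second from it. The field names (`done`, `prompt_eval_count`, `eval_count`, `eval_duration` in nanoseconds) follow Ollama's documented response format. The `chunk` type, the `finalStats` and `tokPerSec` helpers, and the sample model name are assumptions made for this example and are not part of this package.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// chunk holds the fields of interest from one NDJSON line of an Ollama
// stream. Hypothetical type; durations are in nanoseconds.
type chunk struct {
	Model           string `json:"model"`
	Done            bool   `json:"done"`
	PromptEvalCount int    `json:"prompt_eval_count"`
	EvalCount       int    `json:"eval_count"`
	EvalDuration    int64  `json:"eval_duration"`
}

// finalStats scans a streamed body line by line and returns the first
// chunk marked done. A real proxy would do this while forwarding bytes.
func finalStats(body string) (chunk, bool) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		var c chunk
		if err := json.Unmarshal(sc.Bytes(), &c); err != nil {
			continue // not a JSON line; nothing to read from it
		}
		if c.Done {
			return c, true
		}
	}
	return chunk{}, false
}

// tokPerSec converts eval_count and eval_duration to tokens per second.
func tokPerSec(c chunk) float64 {
	if c.EvalDuration <= 0 {
		return 0
	}
	return float64(c.EvalCount) * 1e9 / float64(c.EvalDuration)
}

func main() {
	body := `{"model":"llama3","response":"Hi","done":false}
{"model":"llama3","response":"","done":true,"prompt_eval_count":12,"eval_count":100,"eval_duration":2000000000}`
	if c, ok := finalStats(body); ok {
		fmt.Printf("%s: %.1f tok/s\n", c.Model, tokPerSec(c))
	}
}
```

The sketch parses a string for brevity; it does not show how the proxy copies the stream to the client while reading it.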
Index ¶
- type GPUSample
- type ModelStat
- type Proxy
- type Request
- type Store
- func (s *Store) Add(r Request)
- func (s *Store) ByModel() []ModelStat
- func (s *Store) Err() error
- func (s *Store) LastSeen(model string) time.Time
- func (s *Store) OnAdd(fn func(Request))
- func (s *Store) Percentiles() (p50, p95 float64)
- func (s *Store) Preload(reqs []Request)
- func (s *Store) PromText() string
- func (s *Store) Recent(n int) []Request
- func (s *Store) SetErr(err error)
- func (s *Store) SetGPU(g []GPUSample)
- func (s *Store) TokRates(n int) []float64
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type GPUSample ¶ added in v1.2.0
GPUSample is the subset of GPU state that /metrics re-exports. It's a copy of what internal/gpu reads, kept here so this package doesn't import internal/gpu just to render some gauges.
type Store ¶
type Store struct {
// contains filtered or unexported fields
}
Store keeps the last N proxied requests, newest first.
func (*Store) ByModel ¶ added in v1.1.0
ByModel aggregates everything seen so far, busiest model first.
func (*Store) LastSeen ¶
LastSeen returns when a model last handled a request through the proxy, or the zero time.Time if it never did.
func (*Store) OnAdd ¶ added in v1.2.0
OnAdd registers fn to run after each new request lands. Used to append each request to the history.
func (*Store) Percentiles ¶ added in v1.1.0
Percentiles returns the 50th and 95th percentiles of tok/s across everything in the buffer.
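For intuition about what p50 and p95 mean here, the sketch below computes percentiles with the nearest-rank method over a list of tok/s values. The method actually used by Percentiles is not stated in this documentation, so nearest-rank is an assumption, and the `percentile` helper and sample rates are invented for the example.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of
// vals. It sorts a copy, so the caller's slice is left untouched.
// Hypothetical helper; not this package's implementation.
func percentile(vals []float64, p float64) float64 {
	if len(vals) == 0 {
		return 0
	}
	s := append([]float64(nil), vals...)
	sort.Float64s(s)
	rank := int(math.Ceil(p / 100 * float64(len(s))))
	if rank < 1 {
		rank = 1
	}
	return s[rank-1]
}

func main() {
	rates := []float64{42, 38, 51, 47, 40, 55, 44, 49, 36, 53}
	fmt.Println(percentile(rates, 50), percentile(rates, 95)) // prints "44 55"
}
```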
func (*Store) Preload ¶ added in v1.2.0
Preload drops in requests recovered from disk on startup. reqs is newest-first like everything else here, and goes behind whatever's already live. It doesn't fire the OnAdd callback, so loading doesn't rewrite history.
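The ordering and callback behaviour described above can be sketched with a small stand-in type. `miniStore`, its fields, and the string entries are hypothetical and exist only for this example; the real Store's fields are unexported and not shown here. Locking is left out to keep the sketch short.

```go
package main

import "fmt"

// miniStore is a hypothetical stand-in for Store: a capped list kept
// newest first, with an optional callback fired on live adds.
type miniStore struct {
	max   int
	reqs  []string // newest first
	onAdd func(string)
}

// add puts r at the front, trims to max, then fires the callback.
func (s *miniStore) add(r string) {
	s.reqs = append([]string{r}, s.reqs...)
	if len(s.reqs) > s.max {
		s.reqs = s.reqs[:s.max]
	}
	if s.onAdd != nil {
		s.onAdd(r)
	}
}

// preload places older entries behind the live ones and skips the
// callback, so recovered entries are not written out a second time.
func (s *miniStore) preload(old []string) {
	s.reqs = append(s.reqs, old...)
	if len(s.reqs) > s.max {
		s.reqs = s.reqs[:s.max]
	}
}

func main() {
	written := 0
	s := &miniStore{max: 4, onAdd: func(string) { written++ }}
	s.add("live-1")
	s.add("live-2")
	s.preload([]string{"disk-3", "disk-2", "disk-1"})
	fmt.Println(s.reqs, written) // prints "[live-2 live-1 disk-3 disk-2] 2"
}
```

In this sketch the oldest recovered entry falls off first when the cap is hit, which matches a newest-first buffer of fixed size.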
func (*Store) PromText ¶ added in v1.1.0
PromText renders what the proxy has seen in Prometheus exposition format, served at /metrics on the proxy port. It renders from a single snapshot so that the per-model lines and the percentiles describe the same set of requests.