metadata-center

Published: Nov 7, 2025 License: Apache-2.0

README

Metadata Center

A near real-time load metric collection component, designed for intelligent inference schedulers in large-scale inference services.


English | 中文

Status

Early stage, under rapid development

Background

Load metrics are very important for an LLM inference scheduler.

Typically, the following four load metrics matter most (at the per-engine level):

  1. Total number of requests
  2. Token usage (KVCache usage)
  3. Number of requests in Prefill
  4. Prompt length in Prefill
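As a sketch, these four per-engine metrics could be tracked with atomic counters. The struct and field names below are illustrative assumptions, not this component's actual API:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// EngineMetrics holds the four per-engine load metrics as atomic counters.
// (Illustrative sketch; names are assumptions, not the real API.)
type EngineMetrics struct {
	TotalRequests    atomic.Int64 // 1. total number of in-flight requests
	TokenUsage       atomic.Int64 // 2. token usage (KVCache usage)
	PrefillRequests  atomic.Int64 // 3. number of requests in prefill
	PrefillPromptLen atomic.Int64 // 4. total prompt length in prefill
}

func main() {
	var m EngineMetrics
	// A new request with a 128-token prompt arrives at this engine.
	m.TotalRequests.Add(1)
	m.PrefillRequests.Add(1)
	m.PrefillPromptLen.Add(128)
	fmt.Println(m.TotalRequests.Load(), m.PrefillRequests.Load(), m.PrefillPromptLen.Load())
	// → 1 1 128
}
```

Atomic counters let many request-handling goroutines update the metrics without a lock.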

Timeliness is critical in large-scale services. Stale metrics lead to races: the scheduler may keep choosing the same inference engine before its load metrics are updated.

Polling metrics from engines introduces a fixed periodic delay. Especially in large-scale scenarios, these races grow significantly as QPS (throughput) increases.

Architecture


Cooperating with an Inference Gateway (e.g. AIGW), we can achieve near real-time load metric collection via the following steps:

  1. Request proxied to the Inference Engine:

    a. prefill & total request number: +1

    b. prefill prompt length: +prompt-length

  2. First token responded:

    a. prefill request number: -1

    b. prefill prompt length: -prompt-length

  3. Request done:

    a. total request number: -1
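The three steps above can be sketched as lifecycle hooks that adjust the counters on each event, so the metrics stay current without polling. This is a minimal illustration; the method names are assumptions, not the component's real API:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// EngineMetrics mirrors the per-engine counters described above (sketch).
type EngineMetrics struct {
	TotalRequests    atomic.Int64
	PrefillRequests  atomic.Int64
	PrefillPromptLen atomic.Int64
}

// Step 1: request proxied to the Inference Engine.
func (m *EngineMetrics) OnRequestStart(promptLen int64) {
	m.TotalRequests.Add(1)
	m.PrefillRequests.Add(1)
	m.PrefillPromptLen.Add(promptLen)
}

// Step 2: first token responded; the request has left the prefill phase.
func (m *EngineMetrics) OnFirstToken(promptLen int64) {
	m.PrefillRequests.Add(-1)
	m.PrefillPromptLen.Add(-promptLen)
}

// Step 3: request done.
func (m *EngineMetrics) OnRequestDone() {
	m.TotalRequests.Add(-1)
}

func main() {
	var m EngineMetrics
	m.OnRequestStart(256)
	m.OnFirstToken(256)
	m.OnRequestDone()
	fmt.Println(m.TotalRequests.Load(), m.PrefillRequests.Load(), m.PrefillPromptLen.Load())
	// → 0 0 0: every counter returns to zero after a full request lifecycle
}
```

Because each step's increments are exactly undone by the later steps, the counters are self-balancing: a completed request leaves no residue in the metrics.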

Furthermore, a CAS (compare-and-swap) API could be introduced to reduce races further, if required in the future.

๐Ÿ“š Documentation

๐Ÿ“œ License

This project is licensed under Apache 2.0.
