README

The inference-manager manages inference runtimes (e.g., vLLM and Ollama) in containers, loads models, and processes inference requests. For the inference request flow, see inference_request_flow.md. A minimal client sketch follows the directory listing below.

Directories

Path                                           Synopsis
api/v1                                         Package v1 is a reverse proxy.
common/pkg/api
common/pkg/sse
common/pkg/test
engine/cmd                                     command
engine/internal/autoscaler
engine/internal/config
engine/internal/httputil
engine/internal/metrics
engine/internal/modeldownloader
engine/internal/modeldownloader/common
engine/internal/modeldownloader/huggingface
engine/internal/modeldownloader/s3
engine/internal/ollama
engine/internal/processor
engine/internal/puller
engine/internal/runtime
engine/internal/runtime/vllm
server/cmd                                     command
server/internal/admin
server/internal/config
server/internal/heartbeater
server/internal/infprocessor
server/internal/monitoring
server/internal/rag
server/internal/rate
server/internal/router
server/internal/server
server/internal/taskexchanger
triton-proxy/cmd                               command
triton-proxy/internal/server
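To illustrate how a client might submit an inference request to the server, here is a minimal Go sketch. It assumes the server's api/v1 reverse proxy fronts an OpenAI-compatible chat-completions endpoint; the host, port, endpoint path, and model name are illustrative assumptions, not taken from this repository.

```go
// Minimal client sketch, assuming an OpenAI-compatible endpoint.
// The URL "http://localhost:8080/v1/chat/completions" and the model name
// "example-model" are hypothetical placeholders for illustration only.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

func main() {
	// Build a chat-completion-style request body.
	body, err := json.Marshal(chatRequest{
		Model: "example-model", // hypothetical model name
		Messages: []message{
			{Role: "user", Content: "Hello!"},
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	// The api/v1 package is described as a reverse proxy, so a request like
	// this would be routed to a backing runtime such as vLLM or Ollama.
	resp, err := http.Post("http://localhost:8080/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Print the raw JSON response from the runtime.
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```

This sketch only shows the client side of the flow; the routing from the server's reverse proxy to a specific runtime container is covered in inference_request_flow.md.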