Version: v0.15.0
Published: Apr 26, 2024
License: Apache-2.0
README
inference-manager
TODO
- Implement the API endpoints (but still bypass to Ollama); see the sketch after this list
- Replace Ollama with the inference manager's own implementation
- Support multiple open-source models
- Support multiple models fine-tuned by users
- Support autoscaling (with KEDA?)
- Support multi-GPU & multi-node inference (?)
- Explore optimizations
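As a concrete illustration of the first item, here is a minimal, hypothetical sketch of bypassing requests to Ollama using Go's standard reverse proxy. The address, port, and routing are assumptions for illustration, not the project's actual implementation.

package main

import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
)

func main() {
    // Assumed Ollama address; adjust to wherever the engine runs Ollama.
    ollamaURL, err := url.Parse("http://localhost:11434")
    if err != nil {
        log.Fatal(err)
    }

    // Forward every request unchanged to Ollama. A real engine would add its
    // own API surface, authentication, and model routing in front of this.
    proxy := httputil.NewSingleHostReverseProxy(ollamaURL)

    mux := http.NewServeMux()
    mux.Handle("/", proxy)

    log.Fatal(http.ListenAndServe(":8080", mux))
}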
Here are some other notes:
Running the Engine Locally
Run the following commands:
make build-docker-engine
docker run \
  -v ./configs/engine:/config \
  -p 8080:8080 \
  -p 8081:8081 \
  llm-operator/inference-manager-engine \
  run \
  --config /config/config.yaml
Then hit the HTTP endpoint and verify that Ollama responds.
curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "Why is the sky blue?"
}'
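To exercise the same endpoint from Go instead of curl, a minimal sketch follows. It assumes Ollama is reachable at localhost:11434 as in the curl example and that /api/generate streams newline-delimited JSON objects with "response" and "done" fields, which is Ollama's default streaming behavior.

package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

func main() {
    // Same request body as the curl example above.
    body, err := json.Marshal(map[string]string{
        "model":  "gemma:2b",
        "prompt": "Why is the sky blue?",
    })
    if err != nil {
        log.Fatal(err)
    }

    resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // Ollama streams one JSON object per line; print each generated fragment.
    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        var chunk struct {
            Response string `json:"response"`
            Done     bool   `json:"done"`
        }
        if err := json.Unmarshal(scanner.Bytes(), &chunk); err != nil {
            log.Fatal(err)
        }
        fmt.Print(chunk.Response)
        if chunk.Done {
            fmt.Println()
            break
        }
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
}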