v0.1.3
Published: Jul 16, 2025 License: MIT


OME


OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs). It optimizes the deployment and operation of LLMs by automating model management, intelligent runtime selection, efficient resource utilization, and sophisticated deployment patterns.

Read the documentation to learn more about OME capabilities and features.

Features Overview

  • Model Management: Models are first-class custom resources in OME. Sophisticated model parsing extracts architecture, parameter count, and capabilities directly from model files. Supports distributed storage with automated repair, double encryption, namespace scoping, and multiple formats (SafeTensors, PyTorch, TensorRT, ONNX).

  • Intelligent Runtime Selection: Automatic matching of models to optimal runtime configurations through weighted scoring based on architecture, format, quantization, parameter size, and framework compatibility.

  • Optimized Deployments: Supports multiple deployment patterns including prefill-decode disaggregation, multi-node inference, and traditional Kubernetes deployments with advanced scaling controls.

  • Resource Optimization: Specialized GPU bin-packing scheduling with dynamic re-optimization to maximize cluster efficiency while ensuring high availability.

  • Runtime Integrations: First-class support for SGLang, an advanced inference engine offering cache-aware load balancing, multi-node deployment, prefill-decode disaggregated serving, multi-LoRA adapter serving, and more. Also supports Triton for general model inference.

  • Kubernetes Ecosystem Integration: Deep integration with modern Kubernetes components including Kueue for gang scheduling of multi-pod workloads, LeaderWorkerSet for resilient multi-node deployments, KEDA for advanced custom metrics-based autoscaling, K8s Gateway API for sophisticated traffic routing, and Gateway API Inference Extension for standardized inference endpoints.

  • Automated Benchmarking: Built-in performance evaluation through the BenchmarkJob custom resource, supporting configurable traffic patterns, concurrent load testing, and comprehensive result storage. Enables systematic performance comparison across models and service configurations.
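
As a rough illustration of the BenchmarkJob resource described above, a manifest might look like the following. The apiVersion and kind come from this module's API group (ome.io/v1beta1); the spec field names here are illustrative assumptions, not the actual schema, so consult the documentation for the real fields.

```yaml
# Hypothetical sketch; spec field names below are assumptions, not the real schema.
apiVersion: ome.io/v1beta1
kind: BenchmarkJob
metadata:
  name: llama-benchmark
  namespace: demo
spec:
  # Target service to load-test (field name assumed)
  endpoint:
    inferenceService:
      name: llama-3-70b-instruct
  # Traffic pattern and concurrent load (fields assumed)
  trafficPattern: constant
  concurrency: 8
  # Where to persist results for later comparison (field assumed)
  outputLocation:
    storageUri: oci://benchmarks/results
```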

Production Readiness Status

  • ✅ API version: v1beta1
  • ✅ Comprehensive documentation
  • ✅ Unit and integration test coverage
  • ✅ Production deployments with large-scale LLM workloads
  • ✅ Monitoring via standard metrics and Kubernetes events
  • ✅ Security: RBAC-based access control and model encryption
  • ✅ High availability mode with redundant model storage

Installation

Requires Kubernetes 1.28 or newer

Option 1: OCI Registry

Install OME directly from the OCI registry:

# Install OME CRDs
helm upgrade --install ome-crd oci://ghcr.io/moirai-internal/charts/ome-crd --namespace ome --create-namespace

# Install OME resources
helm upgrade --install ome oci://ghcr.io/moirai-internal/charts/ome-resources --namespace ome

Option 2: Helm Repository

Install using the traditional Helm repository:

# Add the OME Helm repository
helm repo add ome https://sgl-project.github.io/ome
helm repo update

# Install OME CRDs first
helm upgrade --install ome-crd ome/ome-crd --namespace ome --create-namespace

# Install OME resources
helm upgrade --install ome ome/ome-resources --namespace ome

Option 3: Install from Source

For development or customization:

# Clone the repository
git clone https://github.com/sgl-project/ome.git
cd ome

# Install from local charts
helm install ome-crd charts/ome-crd --namespace ome --create-namespace
helm install ome charts/ome-resources --namespace ome

Read the installation guide for more options and advanced configurations.


Architecture

OME uses a component-based architecture built on Kubernetes custom resources:

  • BaseModel/ClusterBaseModel: Define model sources and metadata
  • ServingRuntime/ClusterServingRuntime: Define how models are served
  • InferenceService: Connects models to runtimes for deployment
  • BenchmarkJob: Measures model performance under different workloads
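
To make the relationship between these resources concrete, a minimal manifest pair might look roughly like this. The kinds and apiVersion match the resources listed above, but the spec fields are illustrative assumptions rather than the exact schema:

```yaml
# Sketch only: spec fields are assumed for illustration, not the real schema.
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: llama-3-70b-instruct
spec:
  storage:
    # Source location of the model weights (field name assumed)
    storageUri: hf://meta-llama/Meta-Llama-3-70B-Instruct
---
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: llama-3-70b-instruct
  namespace: demo
spec:
  model:
    # Reference to the ClusterBaseModel above (field name assumed)
    name: llama-3-70b-instruct
  # No runtime specified: OME's controller can select an optimal
  # ServingRuntime automatically via weighted scoring.
```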

OME's controller automatically:

  1. Downloads and parses models to understand their characteristics
  2. Selects the optimal runtime configuration for each model
  3. Generates Kubernetes resources for efficient deployment
  4. Continuously optimizes resource utilization across the cluster

Roadmap

High-level overview of the main priorities:

  • Enhanced model parsing for additional model families and architectures
  • Support for model quantization and optimization workflows
  • Federation across multiple Kubernetes clusters

Community and Support

License

OME is licensed under the MIT License.

Directories

Path Synopsis
cmd
crd-gen command
manager command
model-agent command
ome-agent command
qpext command
spec-gen command
internal
ome-agent/model-metadata
Package modelmetadata provides functionality to extract metadata from model files stored in PVCs and update BaseModel/ClusterBaseModel CRs.
pkg
apis
Package apis contains Kubernetes API groups.
apis/ome/v1beta1
Package v1beta1 contains API Schema definitions for the serving v1beta1 API group +k8s:openapi-gen=true +kubebuilder:object:generate=true +k8s:defaulter-gen=TypeMeta +groupName=ome.io
client/clientset/versioned/fake
This package has the automatically generated fake clientset.
client/clientset/versioned/scheme
This package contains the scheme of the automatically generated clientset.
client/clientset/versioned/typed/ome/v1beta1
This package has the automatically generated typed clients.
client/clientset/versioned/typed/ome/v1beta1/fake
Package fake has the automatically generated clients.
hfutil/hub/samples/basic_download command
Package main demonstrates basic Hugging Face Hub download functionality.
hfutil/hub/samples/enhanced_client command
Package main demonstrates the enhanced Hugging Face Hub client with enterprise features.
hfutil/hub/samples/llama_download command
Package main demonstrates downloading large language models (Llama) from Hugging Face Hub.
hfutil/hub/samples/progress_logging command
Package main demonstrates Hugging Face Hub with progress bars and logging.
modelagent
Package modelagent implements the model agent components for managing models in OME.
ociobjectstore
Package ociobjectstore provides a data store abstraction backed by Oracle Object Storage.
