kube_ai_scheduler

v0.20.0
Published: Mar 20, 2025 License: Apache-2.0 Imports: 13 Imported by: 0

README

Kubernetes AI (KAI) Scheduler

The Kubernetes AI Scheduler is a robust, efficient, and scalable Kubernetes scheduler that optimizes GPU resource allocation for AI and machine learning workloads.

KAI Scheduler is designed to manage large-scale GPU clusters, including clusters with thousands of nodes, and a high throughput of workloads, which makes it well suited to extensive and demanding environments. It allows administrators of Kubernetes clusters to dynamically allocate GPU resources to workloads.

KAI Scheduler supports the entire AI lifecycle, from small, interactive jobs that require minimal resources to large training and inference, all within the same cluster. It ensures optimal resource allocation while maintaining resource fairness between the different consumers. It can run alongside other schedulers installed on the cluster.

Key Features

  • Batch Scheduling: Ensure all pods in a group are scheduled simultaneously or not at all.
  • Bin Packing & Spread Scheduling: Optimize node usage either by minimizing fragmentation (bin-packing) or increasing resiliency and load balancing (spread scheduling).
  • Workload Priority: Prioritize workloads effectively within queues.
  • Hierarchical Queues: Manage workloads with two-level queue hierarchies for flexible organizational control.
  • Resource Distribution: Customize quotas, over-quota weights, limits, and priorities per queue.
  • Fairness Policies: Ensure equitable resource distribution using Dominant Resource Fairness (DRF) and resource reclamation across queues.
  • Workload Consolidation: Reallocate running workloads intelligently to reduce fragmentation and increase cluster utilization.
  • Elastic Workloads: Dynamically scale workloads within defined minimum and maximum pod counts.
  • Dynamic Resource Allocation (DRA): Support vendor-specific hardware resources through Kubernetes ResourceClaims (e.g., GPUs from NVIDIA or AMD).
  • GPU Sharing: Allow multiple workloads to efficiently share single or multiple GPUs, maximizing resource utilization.
  • Cloud & On-premise Support: Fully compatible with dynamic cloud infrastructures (including auto-scalers like Karpenter) as well as static on-premise deployments.
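
To make the Batch Scheduling and Bin Packing & Spread Scheduling features above more concrete, here is a minimal Go sketch of the two ideas. It is not KAI Scheduler's implementation: the Node and Strategy types and the pickNode and PlaceGroup functions are hypothetical names invented for this illustration, and the model assumes nodes are described only by their free GPU count.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a cluster node:
// only the number of free GPUs is tracked.
type Node struct {
	Name     string
	FreeGPUs int
}

// Strategy selects how a node is chosen among those that fit a pod.
type Strategy int

const (
	BinPack Strategy = iota // prefer the fitting node with the fewest free GPUs
	Spread                  // prefer the fitting node with the most free GPUs
)

// pickNode returns the index of the node chosen for a pod requesting gpus,
// or -1 if no node has enough free GPUs. Ties go to the earlier node.
func pickNode(nodes []Node, gpus int, s Strategy) int {
	best := -1
	for i, n := range nodes {
		if n.FreeGPUs < gpus {
			continue // pod does not fit on this node
		}
		if best == -1 {
			best = i
			continue
		}
		if s == BinPack && n.FreeGPUs < nodes[best].FreeGPUs {
			best = i
		}
		if s == Spread && n.FreeGPUs > nodes[best].FreeGPUs {
			best = i
		}
	}
	return best
}

// PlaceGroup tries to place every pod of a group (one GPU request per pod).
// It works on a copy of nodes, so a failed attempt leaves the input untouched:
// either all pods get a node or none do.
func PlaceGroup(nodes []Node, requests []int, s Strategy) ([]string, bool) {
	trial := append([]Node(nil), nodes...)
	placement := make([]string, 0, len(requests))
	for _, gpus := range requests {
		i := pickNode(trial, gpus, s)
		if i == -1 {
			return nil, false // one pod does not fit, so reject the whole group
		}
		trial[i].FreeGPUs -= gpus
		placement = append(placement, trial[i].Name)
	}
	return placement, true
}

func main() {
	nodes := []Node{
		{Name: "node-a", FreeGPUs: 4},
		{Name: "node-b", FreeGPUs: 5},
	}
	fmt.Println(PlaceGroup(nodes, []int{2, 2}, BinPack))
	fmt.Println(PlaceGroup(nodes, []int{2, 2}, Spread))
	fmt.Println(PlaceGroup(nodes, []int{5, 5}, BinPack))
}
```

In this sketch, bin-packing keeps both 2-GPU pods on the fuller node, spread places them on different nodes, and the group of two 5-GPU pods is rejected as a whole because only one of its pods can fit.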

Prerequisites

Before installing KAI Scheduler, ensure you have:

  • A running Kubernetes cluster
  • Helm CLI installed
  • NVIDIA GPU-Operator installed in order to schedule workloads that request GPU resources

Installation

KAI Scheduler is installed in the kai-scheduler namespace. When submitting workloads, make sure to use a dedicated namespace rather than kai-scheduler.

Installation Methods

KAI Scheduler can be installed:

  • From Production (Recommended)
  • From Source (Build it Yourself)

Install from Production

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm upgrade -i kai-scheduler nvidia/kai-scheduler -n kai-scheduler --create-namespace

Build from Source

Follow the instructions here

Quick Start

To start scheduling workloads with KAI Scheduler, continue to the Quick Start example.

API

PLACEHOLDER - How to use?

Support and Getting Help

Please open an issue on the GitHub project for any questions. Your feedback is appreciated.

Documentation

There is no documentation for this package.

Directories

Path
    Synopsis

cmd
pkg
apis/client/clientset/versioned/fake
    This package has the automatically generated fake clientset.
apis/client/clientset/versioned/scheme
    This package contains the scheme of the automatically generated clientset.
apis/client/clientset/versioned/typed/scheduling/v1alpha2
    This package has the automatically generated typed clients.
apis/client/clientset/versioned/typed/scheduling/v1alpha2/fake
    Package fake has the automatically generated clients.
apis/client/clientset/versioned/typed/scheduling/v2
    This package has the automatically generated typed clients.
apis/client/clientset/versioned/typed/scheduling/v2/fake
    Package fake has the automatically generated clients.
apis/client/clientset/versioned/typed/scheduling/v2alpha2
    This package has the automatically generated typed clients.
apis/client/clientset/versioned/typed/scheduling/v2alpha2/fake
    Package fake has the automatically generated clients.
apis/scheduling/v1alpha2
    +groupName=scheduling.run.ai
apis/scheduling/v2
    +groupName=scheduling.run.ai
apis/scheduling/v2alpha2
    +groupName=scheduling.run.ai
binder/binding/mock
    Package mock_binder is a generated GoMock package.
binder/binding/resourcereservation/mock
    Package mock_resourcereservation is a generated GoMock package.
binder/plugins/k8s-plugins/common/mock
    Package mock_common_plugins is a generated GoMock package.
binder/plugins/mock
    Package mock_plugins is a generated GoMock package.
scheduler/api/pod_affinity
    Package pod_affinity is a generated GoMock package.
scheduler/cache
    Package cache is a generated GoMock package.
scheduler/cache/cluster_info/data_lister
    Package data_lister is a generated GoMock package.
scheduler/k8s_utils
    Package k8s_utils is a generated GoMock package.
