Documentation ¶
Overview ¶
Package batcher implements a continuous batching scheduler for inference serving. (Stability: beta)
Unlike fixed-batch schedulers that pad all sequences to the same length and wait for the entire batch to finish, a continuous batching scheduler:
- Assembles variable-length (ragged) batches each step with zero padding.
- Evicts completed sequences immediately, freeing the slot for a new request.
- Fills vacated slots from the pending queue without stalling active sequences.
This approach typically achieves more than 2x the throughput of fixed batching at the same concurrency, because no GPU cycles are spent on padding tokens and no slot is blocked waiting for the slowest sequence in the batch.
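The per-step loop described above can be sketched as follows. This is a minimal, self-contained illustration, not the package's implementation; the `slot` and `step` names and the counter standing in for the model call are all hypothetical.

```go
package main

import "fmt"

// slot holds one active sequence (hypothetical simplified form).
type slot struct {
	id        string
	generated int // tokens produced so far
	maxNew    int // stop after this many
}

// step runs one decode step over a ragged batch: every active slot
// advances by one token, finished slots are evicted immediately, and
// vacated capacity is refilled from the pending queue.
func step(active []*slot, pending *[]*slot, maxBatch int) []*slot {
	// 1. Decode one token per slot (the model call is stubbed as a counter).
	for _, s := range active {
		s.generated++
	}
	// 2. Evict finished slots right away instead of waiting for the batch.
	kept := active[:0]
	for _, s := range active {
		if s.generated < s.maxNew {
			kept = append(kept, s)
		}
	}
	// 3. Refill vacated slots from the pending queue without stalling others.
	for len(kept) < maxBatch && len(*pending) > 0 {
		kept = append(kept, (*pending)[0])
		*pending = (*pending)[1:]
	}
	return kept
}

func main() {
	pending := []*slot{{id: "c", maxNew: 2}}
	active := []*slot{{id: "a", maxNew: 1}, {id: "b", maxNew: 3}}
	active = step(active, &pending, 2)
	for _, s := range active {
		fmt.Println(s.id)
	}
	// "a" finished after one token and was evicted; "c" filled its slot.
}
```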
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CompletionResult ¶
CompletionResult is delivered to the caller when a request finishes.
type Option ¶
type Option func(*Scheduler)
Option configures a Scheduler.
func WithPollInterval ¶
WithPollInterval sets how often the scheduler checks for new work when idle.
type Request ¶
type Request struct {
	ID     string // Caller-assigned identifier.
	Tokens []int  // Input token IDs (prompt).

	// MaxNewTokens is the maximum number of decode steps for this request.
	MaxNewTokens int
}
Request represents an incoming inference request.
type Scheduler ¶
type Scheduler struct {
	// contains filtered or unexported fields
}
Scheduler implements continuous batching.
func New ¶
New creates a continuous batching scheduler.
maxBatchSize controls the maximum number of concurrent active slots. stepFn is invoked once per decode step with the current ragged batch.
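A sketch of the kind of stepFn callback New would invoke each decode step. The stand-in types mirror the Request, Slot, and StepBatch declarations documented on this page, but the exact signature of New is not shown here and the model call is stubbed; treat this as an assumption-laden illustration.

```go
package main

import "fmt"

// Stand-in types mirroring the documented API.
type Request struct {
	ID           string
	Tokens       []int
	MaxNewTokens int
}

type Slot struct {
	Request       Request
	GeneratedToks []int
	Done          bool
}

type StepBatch struct {
	Slots []*Slot
}

// stepFn receives the current ragged batch once per decode step.
// A real callback would run the model over the batch; here a constant
// token stands in for the model's output.
func stepFn(b *StepBatch) {
	for _, s := range b.Slots {
		s.GeneratedToks = append(s.GeneratedToks, 0) // model call stubbed
		if len(s.GeneratedToks) >= s.Request.MaxNewTokens {
			s.Done = true // the scheduler can now evict this slot
		}
	}
}

func main() {
	b := &StepBatch{Slots: []*Slot{
		{Request: Request{ID: "r1", Tokens: []int{1, 2}, MaxNewTokens: 1}},
	}}
	stepFn(b)
	fmt.Println(b.Slots[0].Done)
}
```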
func (*Scheduler) Start ¶
func (s *Scheduler) Start()
Start begins the scheduling loop in a background goroutine.
type Slot ¶
type Slot struct {
	Request       Request
	GeneratedToks []int // Tokens produced so far (decode output).
	Done          bool  // True once the sequence is finished.
}
Slot tracks the state of one active sequence inside the scheduler.
type StepBatch ¶
type StepBatch struct {
	Slots []*Slot
}
StepBatch is the ragged batch handed to the caller each step. It contains only active (non-done) slots at their actual token counts; no padding is ever added.
func (*StepBatch) TotalTokens ¶
TotalTokens returns the total number of tokens across all slots (prompt + generated). Because the batch is ragged, this is a simple sum, not maxLen * batchSize.
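The ragged sum versus the padded product can be sketched as below. The `totalTokens` helper is a hypothetical stand-in that mirrors the documented behavior of TotalTokens; the per-slot counts are made-up numbers.

```go
package main

import "fmt"

// totalTokens mirrors the documented behavior: a plain sum of each
// slot's prompt + generated token count, with no padding term.
func totalTokens(lens []int) int {
	total := 0
	for _, n := range lens {
		total += n
	}
	return total
}

func main() {
	lens := []int{5, 12, 3} // per-slot token counts (prompt + generated)
	fmt.Println(totalTokens(lens)) // 20: tokens actually processed
	// Fixed batching would pad every slot to the longest (12),
	// processing len(lens) * 12 = 36 token positions for the same work.
	fmt.Println(len(lens) * 12)
}
```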