batcher

package
v1.25.2 Latest
Published: Mar 26, 2026 License: Apache-2.0 Imports: 3 Imported by: 0

Documentation

Overview

Package batcher implements a continuous batching scheduler for inference serving. (Stability: beta)

Unlike fixed-batch schedulers that pad all sequences to the same length and wait for the entire batch to finish, a continuous batching scheduler:

  • Assembles variable-length (ragged) batches each step with zero padding.
  • Evicts completed sequences immediately, freeing the slot for a new request.
  • Fills vacated slots from the pending queue without stalling active sequences.

This approach typically achieves 2x or better throughput than fixed batching at the same concurrency, because GPU cycles are never wasted on padding tokens and no slot is blocked waiting for the slowest sequence in the batch.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CompletionResult

type CompletionResult struct {
	RequestID string
	Tokens    []int // All generated tokens.
	Err       error
}

CompletionResult is delivered to the caller when a request finishes.

type Option

type Option func(*Scheduler)

Option configures a Scheduler.

func WithPollInterval

func WithPollInterval(d time.Duration) Option

WithPollInterval sets how often the scheduler checks for new work when idle.

type Request

type Request struct {
	ID     string // Caller-assigned identifier.
	Tokens []int  // Input token IDs (prompt).

	// MaxNewTokens is the maximum number of decode steps for this request.
	MaxNewTokens int
}

Request represents an incoming inference request.

type Scheduler

type Scheduler struct {
	// contains filtered or unexported fields
}

Scheduler implements continuous batching.

func New

func New(maxBatchSize int, stepFn StepFunc, opts ...Option) *Scheduler

New creates a continuous batching scheduler.

maxBatchSize controls the maximum number of concurrent active slots. stepFn is invoked once per decode step with the current ragged batch.

func (*Scheduler) Start

func (s *Scheduler) Start()

Start begins the scheduling loop in a background goroutine.

func (*Scheduler) Stop

func (s *Scheduler) Stop()

Stop gracefully shuts down the scheduler, draining active work.

func (*Scheduler) Submit

func (s *Scheduler) Submit(ctx context.Context, req Request) (CompletionResult, error)

Submit enqueues a request and blocks until it completes or the context is canceled.

type Slot

type Slot struct {
	Request       Request
	GeneratedToks []int // Tokens produced so far (decode output).
	Done          bool  // True once the sequence is finished.
}

Slot tracks the state of one active sequence inside the scheduler.

type StepBatch

type StepBatch struct {
	Slots []*Slot
}

StepBatch is the ragged batch handed to the caller each step. It contains only active (non-done) slots with their actual token counts — no padding is ever added.

func (*StepBatch) TotalTokens

func (b *StepBatch) TotalTokens() int

TotalTokens returns the total number of tokens across all slots (prompt + generated). Because the batch is ragged, this is a simple sum, not maxLen * batchSize.

type StepFunc

type StepFunc func(ctx context.Context, batch *StepBatch)

StepFunc is called once per decode step with the current ragged batch. The implementation should run one forward pass and append exactly one new token to each Slot.GeneratedToks. It must also set Slot.Done = true for any sequence that has finished (EOS or max tokens reached).
