batcher

package
v1.25.2 Latest
Published: Mar 26, 2026 License: Apache-2.0 Imports: 3 Imported by: 0

Documentation

Overview

Package batcher implements a continuous batching scheduler for inference serving. (Stability: beta)

Unlike fixed-batch schedulers that pad all sequences to the same length and wait for the entire batch to finish, a continuous batching scheduler:

  • Assembles variable-length (ragged) batches each step with zero padding.
  • Evicts completed sequences immediately, freeing the slot for a new request.
  • Fills vacated slots from the pending queue without stalling active sequences.

This approach typically achieves 2x or better throughput than fixed batching at the same concurrency, because GPU cycles are never wasted on padding tokens and no slot is blocked waiting for the slowest sequence in the batch.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CompletionResult

type CompletionResult struct {
	RequestID string
	Tokens    []int // All generated tokens.
	Err       error
}

CompletionResult is delivered to the caller when a request finishes.

type Option

type Option func(*Scheduler)

Option configures a Scheduler.

func WithPollInterval

func WithPollInterval(d time.Duration) Option

WithPollInterval sets how often the scheduler checks for new work when idle.

type Request

type Request struct {
	ID     string // Caller-assigned identifier.
	Tokens []int  // Input token IDs (prompt).

	// MaxNewTokens is the maximum number of decode steps for this request.
	MaxNewTokens int
}

Request represents an incoming inference request.

type Scheduler

type Scheduler struct {
	// contains filtered or unexported fields
}

Scheduler implements continuous batching.

func New

func New(maxBatchSize int, stepFn StepFunc, opts ...Option) *Scheduler

New creates a continuous batching scheduler.

maxBatchSize controls the maximum number of concurrent active slots. stepFn is invoked once per decode step with the current ragged batch.

func (*Scheduler) Start

func (s *Scheduler) Start()

Start begins the scheduling loop in a background goroutine.

func (*Scheduler) Stop

func (s *Scheduler) Stop()

Stop gracefully shuts down the scheduler, draining active work.

func (*Scheduler) Submit

func (s *Scheduler) Submit(ctx context.Context, req Request) (CompletionResult, error)

Submit enqueues a request and blocks until it completes or the context is canceled.

type Slot

type Slot struct {
	Request       Request
	GeneratedToks []int // Tokens produced so far (decode output).
	Done          bool  // True once the sequence is finished.
}

Slot tracks the state of one active sequence inside the scheduler.

type StepBatch

type StepBatch struct {
	Slots []*Slot
}

StepBatch is the ragged batch handed to the caller each step. It contains only active (non-done) slots with their actual token counts — no padding is ever added.

func (*StepBatch) TotalTokens

func (b *StepBatch) TotalTokens() int

TotalTokens returns the total number of tokens across all slots (prompt + generated). Because the batch is ragged, this is a simple sum, not maxLen * batchSize.

type StepFunc

type StepFunc func(ctx context.Context, batch *StepBatch)

StepFunc is called once per decode step with the current ragged batch. The implementation should run one forward pass and append exactly one new token to each Slot.GeneratedToks. It must also set Slot.Done = true for any sequence that has finished (EOS or max tokens reached).
