specialize

package
v0.12.3
Published: May 15, 2026 License: Apache-2.0 Imports: 5 Imported by: 0

Documentation

Overview

Package specialize ports cpython/Python/specialize.c. PEP 659 adaptive specialization rewrites adaptive opcodes (LOAD_ATTR, BINARY_OP, CALL, ...) into specialized variants on warmup, and falls back to the adaptive parent on shape mismatch. The 16-bit backoff counter in every adaptive instruction's first cache slot drives the rewrite cadence.

v0.11 lays down the foundation: backoff counter helpers, inline cache struct layouts, the deopt table, and a Quicken pass that stamps the warmup counter into every adaptive cache slot. The per-family specializer entry points (LoadAttr, BinaryOp, Call, ...) follow on top.

CPython: Python/specialize.c

Constants

const (
	BackoffBits        = 4
	MaxBackoff         = 12
	UnreachableBackoff = 15
)

Backoff layout constants.

CPython: Include/internal/pycore_backoff.h:34

const (
	JumpBackwardInitialValue   = 4095
	JumpBackwardInitialBackoff = 12
)

Initial JUMP_BACKWARD counter shape, used by the Tier-2 trace projector to pick hot loops.

CPython: Include/internal/pycore_backoff.h:100

const (
	SideExitInitialValue   = 4095
	SideExitInitialBackoff = 12
)

Initial side-exit temperature.

CPython: Include/internal/pycore_backoff.h:113

const (
	InlineCacheEntriesLoadGlobal     = int(unsafe.Sizeof(LoadGlobalCache{})) / CodeUnitWidth
	InlineCacheEntriesBinaryOp       = int(unsafe.Sizeof(BinaryOpCache{})) / CodeUnitWidth
	InlineCacheEntriesUnpackSequence = int(unsafe.Sizeof(UnpackSequenceCache{})) / CodeUnitWidth
	InlineCacheEntriesCompareOp      = int(unsafe.Sizeof(CompareOpCache{})) / CodeUnitWidth
	InlineCacheEntriesLoadSuperAttr  = int(unsafe.Sizeof(SuperAttrCache{})) / CodeUnitWidth
	InlineCacheEntriesLoadAttr       = int(unsafe.Sizeof(LoadMethodCache{})) / CodeUnitWidth
	InlineCacheEntriesStoreAttr      = int(unsafe.Sizeof(AttrCache{})) / CodeUnitWidth
	InlineCacheEntriesCall           = int(unsafe.Sizeof(CallCache{})) / CodeUnitWidth
	InlineCacheEntriesCallKw         = int(unsafe.Sizeof(CallCache{})) / CodeUnitWidth
	InlineCacheEntriesStoreSubscr    = int(unsafe.Sizeof(StoreSubscrCache{})) / CodeUnitWidth
	InlineCacheEntriesForIter        = int(unsafe.Sizeof(ForIterCache{})) / CodeUnitWidth
	InlineCacheEntriesSend           = int(unsafe.Sizeof(SendCache{})) / CodeUnitWidth
	InlineCacheEntriesToBool         = int(unsafe.Sizeof(ToBoolCache{})) / CodeUnitWidth
	InlineCacheEntriesContainsOp     = int(unsafe.Sizeof(ContainsOpCache{})) / CodeUnitWidth
)

Inline cache widths in codeunits (CPython's INLINE_CACHE_ENTRIES_<FAMILY>). cache_test.go pins these numbers against unsafe.Sizeof so the Go structs cannot silently drift from the C layout.

CPython: Include/internal/pycore_code.h INLINE_CACHE_ENTRIES_*

const CodeUnitWidth = 2

CodeUnitWidth is the width in bytes of one bytecode codeunit. All cache structs are sized as a whole number of codeunits.

Variables

This section is empty.

Functions

func BackoffCounterTriggers

func BackoffCounterTriggers(c BackoffCounter) bool

BackoffCounterTriggers reports whether the value is zero and the counter is not the unreachable sentinel.

CPython: Include/internal/pycore_backoff.h:91 backoff_counter_triggers

func BinaryOp

func BinaryOp(lhs, rhs objects.Object, code []byte, instr int, oparg int32, nextOp compile.Opcode, nextArg int32, locals []objects.Object)

BinaryOp specializes the BINARY_OP at instr based on the two operands and the NB_* oparg. nextOp and nextArg are the opcode and argument of the codeunit that follows the cache cells; the INPLACE_ADD_UNICODE arm inspects them to detect the `s = s + t` pattern and picks the in-place variant when the store target is the same local as the left operand.

CPython: Python/specialize.c:2578 _Py_Specialize_BinaryOp

func CacheCell

func CacheCell(code []byte, instr, k int) uint16

CacheCell reads the kth cache codeunit (1-based: cell 1 is the counter slot, cell 2 is the next field, etc.). Per-family helpers use it to fetch type or dict versions out of the inline cache.

func CacheCount

func CacheCount(op compile.Opcode) int

CacheCount returns the number of trailing codeunits reserved as inline cache after op. Zero means op carries no cache.

func CacheU32

func CacheU32(code []byte, instr, k int) uint32

CacheU32 reads a uint32 split across cache cells k and k+1.

func Call

func Call(callable objects.Object, code []byte, instr int, oparg, nargs int32)

Call specializes the CALL at instr based on the callable on the stack and the (positional) argument count.

CPython: Python/specialize.c:2182 _Py_Specialize_Call

func CallKw

func CallKw(callable objects.Object, code []byte, instr int, nargs int32)

CallKw specializes the CALL_KW at instr based on the callable on the stack. nargs is the positional count (the keyword tuple itself rides on the stack).

CPython: Python/specialize.c:2223 _Py_Specialize_CallKw

func CompareOp

func CompareOp(lhs, rhs objects.Object, code []byte, instr int, oparg int32)

CompareOp specializes the COMPARE_OP at instr.

CPython: Python/specialize.c:2740 _Py_Specialize_CompareOp

func ContainsOp

func ContainsOp(container objects.Object, code []byte, instr int)

ContainsOp specializes the CONTAINS_OP at instr based on the container operand (the right-hand side of `x in container`).

CPython: Python/specialize.c:3108 _Py_Specialize_ContainsOp

func Deopt

func Deopt(op compile.Opcode) compile.Opcode

Deopt returns the adaptive parent of op. For an unspecialized opcode the result is op itself. The dispatch loop calls this when a specialized arm hits a shape mismatch and needs to fall back to the adaptive parent before re-specializing.

CPython: Include/internal/pycore_code.h _PyOpcode_Deopt access
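
A deopt lookup of this kind can be sketched as a table from specialized opcode to adaptive parent. The opcode names and numbers below are hypothetical and chosen only for illustration; the real table is generated from CPython's opcode metadata and is not reproduced here.

```go
package main

import "fmt"

type opcode uint8

// Hypothetical opcode numbering, for illustration only.
const (
	opLoadAttr opcode = iota + 1
	opLoadAttrInstanceValue
	opLoadAttrModule
	opNop
)

// deoptTable maps each specialized opcode to its adaptive parent.
var deoptTable = map[opcode]opcode{
	opLoadAttrInstanceValue: opLoadAttr,
	opLoadAttrModule:        opLoadAttr,
}

// deopt returns the adaptive parent of op, or op itself when op is not a
// specialized variant.
func deopt(op opcode) opcode {
	if parent, ok := deoptTable[op]; ok {
		return parent
	}
	return op
}

func main() {
	fmt.Println(deopt(opLoadAttrModule) == opLoadAttr) // specialized maps to parent
	fmt.Println(deopt(opNop) == opNop)                 // unspecialized maps to itself
}
```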

func ForIter

func ForIter(iter objects.Object, code []byte, instr int, oparg int32)

ForIter specializes the FOR_ITER at instr based on the iterator on the stack. oparg is the codeunit's argument (used for the gen-arm jump-fits check).

CPython: Python/specialize.c:2909 _Py_Specialize_ForIter

func IsUnreachable

func IsUnreachable(c BackoffCounter) bool

IsUnreachable reports whether the counter is the never-trigger sentinel (value 0, backoff 15).

CPython: Include/internal/pycore_backoff.h:38 is_unreachable_backoff_counter

func LoadAttr

func LoadAttr(owner objects.Object, name *objects.Unicode, code []byte, instr int)

LoadAttr specializes the LOAD_ATTR at instr based on the owner and the attribute name being loaded.

CPython: Python/specialize.c:1344 _Py_Specialize_LoadAttr

func LoadGlobal

func LoadGlobal(globals, builtins objects.Object, code []byte, instr int, name *objects.Unicode)

LoadGlobal rewrites the LOAD_GLOBAL at instr to LOAD_GLOBAL_MODULE or LOAD_GLOBAL_BUILTIN when name resolves cleanly in globals or builtins, respectively. On any miss the opcode falls back to its adaptive parent and the counter is restarted with the next, longer backoff.

CPython: Python/specialize.c:1775 _Py_Specialize_LoadGlobal

func LoadSuperAttr

func LoadSuperAttr(globalSuper, cls objects.Object, code []byte, instr int, loadMethod bool)

LoadSuperAttr specializes the LOAD_SUPER_ATTR at instr. globalSuper is the value the bytecode just looked up under the name `super`; cls is the second positional argument. loadMethod mirrors LOAD_SUPER_ATTR's "is this for a method call?" flag.

CPython: Python/specialize.c:827 _Py_Specialize_LoadSuperAttr

func Quicken

func Quicken(code []byte, enableCounters bool)

Quicken stamps initial counters into the adaptive cache cells of code. The buffer holds packed _Py_CODEUNIT pairs (op, arg) with reserved cache slots after every adaptive opcode. enableCounters is the flag CPython spells the same way: when false, every counter is set to the unreachable sentinel so dispatch never trips a specialize attempt (used by the disassembler when it materializes a code object that should not run).

CPython: Python/specialize.c:459 _PyCode_Quicken

func Send

func Send(receiver objects.Object, code []byte, instr int)

Send specializes the SEND at instr based on the receiver. Currently only generators and coroutines have a fast path.

CPython: Python/specialize.c:2964 _Py_Specialize_Send

func SetCacheCell

func SetCacheCell(code []byte, instr, k int, value uint16)

SetCacheCell writes the kth cache codeunit. Mirror of CacheCell.

func SetCacheU32

func SetCacheU32(code []byte, instr, k int, value uint32)

SetCacheU32 writes a uint32 split across cache cells k and k+1 (low 16 bits first, matching the C struct field order on little-endian targets, which is what CPython assumes).
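
The split described above can be sketched with encoding/binary. This is a minimal sketch under stated assumptions: cellOffset, setU32 and getU32 are hypothetical helpers, instr is taken to be a codeunit index with cache cell k stored at codeunit instr+k, and each cell is stored little-endian in the byte slice.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const codeUnitWidth = 2

// cellOffset returns the byte offset of the kth cache cell (1-based) that
// follows the opcode at codeunit index instr. Assumed addressing scheme.
func cellOffset(instr, k int) int { return (instr + k) * codeUnitWidth }

// setU32 stores the low 16 bits in cell k and the high 16 bits in cell k+1.
func setU32(code []byte, instr, k int, v uint32) {
	binary.LittleEndian.PutUint16(code[cellOffset(instr, k):], uint16(v))
	binary.LittleEndian.PutUint16(code[cellOffset(instr, k+1):], uint16(v>>16))
}

// getU32 reassembles the value written by setU32.
func getU32(code []byte, instr, k int) uint32 {
	lo := binary.LittleEndian.Uint16(code[cellOffset(instr, k):])
	hi := binary.LittleEndian.Uint16(code[cellOffset(instr, k+1):])
	return uint32(hi)<<16 | uint32(lo)
}

func main() {
	// One opcode codeunit followed by three cache cells, as for TO_BOOL.
	code := make([]byte, 4*codeUnitWidth)
	setU32(code, 0, 2, 0xDEADBEEF) // version lives in cells 2..3
	fmt.Printf("%#x % x\n", getU32(code, 0, 2), code)
}
```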

func SetOparg

func SetOparg(code []byte, instr int, arg byte)

SetOparg rewrites the oparg byte at instr.

func SetOpcode

func SetOpcode(code []byte, instr int, op compile.Opcode) bool

SetOpcode rewrites the opcode at instr to op. Returns false when the slot already holds an INSTRUMENTED_* opcode (the GIL-disabled build's race-with-instrumentation path); the caller must abandon the specialize attempt.

CPython: Python/specialize.c:702 set_opcode

func Specialize

func Specialize(code []byte, instr int, specialized compile.Opcode)

Specialize rewrites the opcode at instr to specialized and stamps the counter cell with the cooldown shape so the next miss has time to settle before re-specializing. Mirrors CPython's static inline `specialize` helper.

CPython: Python/specialize.c:739 specialize

func StoreAttr

func StoreAttr(owner objects.Object, name *objects.Unicode, code []byte, instr int)

StoreAttr specializes the STORE_ATTR at instr based on the owner and the attribute name being stored. The cache layout is 4 codeunits: counter at cell 1, type version uint32 at cells 2..3, index uint16 at cell 4.

CPython: Python/specialize.c:1376 _Py_Specialize_StoreAttr

func StoreCounter

func StoreCounter(code []byte, instr int, value BackoffCounter)

StoreCounter writes value into the counter cell of the adaptive instruction at instr.

CPython: Python/specialize.c:723 set_counter

func StoreSubscr

func StoreSubscr(container, sub objects.Object, code []byte, instr int)

StoreSubscr specializes the STORE_SUBSCR at instr based on the container and subscript operands. Order matches CPython: list first (with bounds check on int subscript), then dict.

CPython: Python/specialize.c:1894 _Py_Specialize_StoreSubscr

func ToBool

func ToBool(value objects.Object, code []byte, instr int)

ToBool specializes the TO_BOOL at instr based on the operand. The inline cache layout is 3 codeunits: counter at cell 1, version uint32 at cells 2..3 (used by the TO_BOOL_ALWAYS_TRUE arm only).

CPython: Python/specialize.c:3034 _Py_Specialize_ToBool

func UnpackSequence

func UnpackSequence(seq objects.Object, code []byte, instr int, oparg int32)

UnpackSequence specializes the UNPACK_SEQUENCE at instr based on the sequence and target count (oparg).

CPython: Python/specialize.c:2802 _Py_Specialize_UnpackSequence

func Unspecialize

func Unspecialize(code []byte, instr int)

Unspecialize rewrites the opcode at instr back to its adaptive parent and restarts the backoff counter so the next attempt waits exponentially longer. Used by the dispatch loop on shape mismatch.

CPython: Python/specialize.c:753 unspecialize

Types

type AttrCache

type AttrCache struct {
	Counter BackoffCounter
	Version [2]uint16
	Index   uint16
}

AttrCache backs STORE_ATTR. CACHE_ENTRIES = 4.

CPython: Include/internal/pycore_code.h:102 _PyAttrCache

type BackoffCounter

type BackoffCounter struct {
	ValueAndBackoff uint16
}

BackoffCounter packs a 12-bit value above a 4-bit backoff field.

CPython: Include/internal/pycore_structs.h _Py_BackoffCounter
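
The bit layout can be sketched on a bare uint16. pack and unpack are hypothetical helpers written for this illustration; they are not the package's MakeBackoffCounter, though they follow the same documented layout (12-bit value in the high bits, 4-bit backoff in the low bits).

```go
package main

import "fmt"

// backoffBits mirrors the documented BackoffBits constant.
const backoffBits = 4

// pack places value in the upper 12 bits and backoff in the lower 4 bits.
// The caller is assumed to pass value < 4096 and backoff < 16.
func pack(value, backoff uint16) uint16 {
	return value<<backoffBits | backoff
}

// unpack splits a raw 16-bit counter into its value and backoff fields.
func unpack(raw uint16) (value, backoff uint16) {
	return raw >> backoffBits, raw & (1<<backoffBits - 1)
}

func main() {
	raw := pack(4095, 12) // the documented initial JUMP_BACKWARD shape
	v, b := unpack(raw)
	fmt.Printf("raw=%#04x value=%d backoff=%d\n", raw, v, b)
}
```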

func AdaptiveCounterBackoff

func AdaptiveCounterBackoff(c BackoffCounter) BackoffCounter

AdaptiveCounterBackoff computes the next counter after a specialize miss. Mirrors RestartBackoffCounter; kept under a distinct name so the call sites read like the CPython source.

CPython: Python/specialize.c adaptive_counter_backoff

func AdaptiveCounterCooldown

func AdaptiveCounterCooldown() BackoffCounter

func AdaptiveCounterWarmup

func AdaptiveCounterWarmup() BackoffCounter

Adaptive counter shapes used by the specializer. Warmup is the shape Quicken stamps into every fresh cache slot. Cooldown is the shape used after a successful specialize so the next miss has time to settle before a re-specialize attempt.

CPython: Python/specialize.c top of file (adaptive_counter_warmup / adaptive_counter_cooldown helpers)

func AdvanceBackoffCounter

func AdvanceBackoffCounter(c BackoffCounter) BackoffCounter

AdvanceBackoffCounter ticks the value down by one. Called every time the matching adaptive opcode executes.

CPython: Include/internal/pycore_backoff.h:83 advance_backoff_counter
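
How ticking and triggering interact can be sketched as a countdown loop. advance and triggers below are hypothetical stand-ins operating on a raw uint16; they follow the documented behaviour (tick the value down by one; trigger when the value is zero and the counter is not the sentinel) but are not the package's implementations.

```go
package main

import "fmt"

const (
	backoffBits        = 4
	unreachableBackoff = 15
)

// advance ticks the 12-bit value down by one and leaves the backoff bits
// alone. Sketch only: the caller is assumed to check triggers first, so the
// value is nonzero here.
func advance(raw uint16) uint16 {
	return raw - 1<<backoffBits
}

// triggers reports whether the value is zero and the backoff field is not
// the unreachable sentinel.
func triggers(raw uint16) bool {
	value, backoff := raw>>backoffBits, raw&(1<<backoffBits-1)
	return value == 0 && backoff != unreachableBackoff
}

func main() {
	raw := uint16(3<<backoffBits | 2) // value 3, backoff 2
	runs := 0
	for !triggers(raw) {
		raw = advance(raw) // one tick per execution of the adaptive opcode
		runs++
	}
	fmt.Println("executions before a specialize attempt:", runs)
}
```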

func ForgeBackoffCounter

func ForgeBackoffCounter(raw uint16) BackoffCounter

ForgeBackoffCounter wraps a raw 16-bit pattern. Used when reading an existing counter out of a bytecode cache cell.

CPython: Include/internal/pycore_backoff.h:54 forge_backoff_counter

func InitialJumpBackoffCounter

func InitialJumpBackoffCounter() BackoffCounter

InitialJumpBackoffCounter returns the seed for a JUMP_BACKWARD instruction's counter slot.

CPython: Include/internal/pycore_backoff.h:102 initial_jump_backoff_counter

func InitialSideExitBackoffCounter

func InitialSideExitBackoffCounter() BackoffCounter

InitialSideExitBackoffCounter returns the seed for a Tier-2 side-exit temperature counter.

CPython: Include/internal/pycore_backoff.h:116 initial_temperature_backoff_counter

func InitialUnreachableBackoffCounter

func InitialUnreachableBackoffCounter() BackoffCounter

InitialUnreachableBackoffCounter returns the never-trigger sentinel.

CPython: Include/internal/pycore_backoff.h:124 initial_unreachable_backoff_counter

func LoadCounter

func LoadCounter(code []byte, instr int) BackoffCounter

LoadCounter reads the BackoffCounter that lives in the first cache codeunit of an adaptive instruction. instr is the codeunit index of the opcode itself; the counter sits at instr+1.

CPython: Python/specialize.c:730 load_counter

func MakeBackoffCounter

func MakeBackoffCounter(value, backoff uint16) BackoffCounter

MakeBackoffCounter packs value and backoff into a fresh counter. value must fit in 12 bits, backoff must fit in 4.

CPython: Include/internal/pycore_backoff.h:44 make_backoff_counter

func PauseBackoffCounter

func PauseBackoffCounter(c BackoffCounter) BackoffCounter

PauseBackoffCounter bumps the value by 1<<BackoffBits to push the next trigger out by one tick without changing the backoff field. Used when a specialize attempt should be retried later without escalating the backoff.

CPython: Include/internal/pycore_backoff.h:75 pause_backoff_counter

func RestartBackoffCounter

func RestartBackoffCounter(c BackoffCounter) BackoffCounter

RestartBackoffCounter resets a counter after a specialize miss. The backoff field grows by one (capped at MaxBackoff) and the value is reseeded to 2**backoff - 1, where backoff is the new, post-increment field.

CPython: Include/internal/pycore_backoff.h:62 restart_backoff_counter
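
The exponential growth can be sketched on a raw uint16. restart is a hypothetical helper that follows the documented rule; it assumes the input is not the unreachable sentinel, and the starting shape in main is arbitrary.

```go
package main

import "fmt"

const (
	backoffBits = 4
	maxBackoff  = 12
)

// restart grows the backoff field by one, capped at maxBackoff, and reseeds
// the value to 2**backoff - 1 using the new backoff.
// Assumes raw is not the unreachable sentinel (backoff 15).
func restart(raw uint16) uint16 {
	backoff := raw & (1<<backoffBits - 1)
	if backoff < maxBackoff {
		backoff++
	}
	value := uint16(1)<<backoff - 1
	return value<<backoffBits | backoff
}

func main() {
	raw := uint16(1<<backoffBits | 1) // value 1, backoff 1: arbitrary start
	for i := 0; i < 4; i++ {
		raw = restart(raw)
		fmt.Printf("value=%d backoff=%d\n", raw>>backoffBits, raw&15)
	}
}
```

Each miss roughly doubles the number of executions before the next specialize attempt, until the cap of 4095 at backoff 12.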

type BinaryOpCache

type BinaryOpCache struct {
	Counter       BackoffCounter
	ExternalCache [4]uint16
}

BinaryOpCache backs BINARY_OP. CACHE_ENTRIES = 5.

CPython: Include/internal/pycore_code.h:76 _PyBinaryOpCache

type CallCache

type CallCache struct {
	Counter     BackoffCounter
	FuncVersion [2]uint16
}

CallCache backs CALL and CALL_KW. CACHE_ENTRIES = 3.

CPython: Include/internal/pycore_code.h:124 _PyCallCache

type CompareOpCache

type CompareOpCache struct {
	Counter BackoffCounter
}

CompareOpCache backs COMPARE_OP. CACHE_ENTRIES = 1.

CPython: Include/internal/pycore_code.h:90 _PyCompareOpCache

type ContainsOpCache

type ContainsOpCache struct {
	Counter BackoffCounter
}

ContainsOpCache backs CONTAINS_OP. CACHE_ENTRIES = 1.

CPython: Include/internal/pycore_code.h:157 _PyContainsOpCache

type ForIterCache

type ForIterCache struct {
	Counter BackoffCounter
}

ForIterCache backs FOR_ITER. CACHE_ENTRIES = 1.

CPython: Include/internal/pycore_code.h:138 _PyForIterCache

type LoadGlobalCache

type LoadGlobalCache struct {
	Counter            BackoffCounter
	ModuleKeysVersion  uint16
	BuiltinKeysVersion uint16
	Index              uint16
}

LoadGlobalCache backs LOAD_GLOBAL. CACHE_ENTRIES = 4.

CPython: Include/internal/pycore_code.h:67 _PyLoadGlobalCache

type LoadMethodCache

type LoadMethodCache struct {
	Counter     BackoffCounter
	TypeVersion [2]uint16
	Keys        [2]uint16
	Descr       [4]uint16
}

LoadMethodCache backs LOAD_ATTR. CACHE_ENTRIES = 9 (one counter cell, two type-version cells, two keys cells, four descr cells). The widest adaptive cache; LOAD_ATTR specializes into both attribute lookups and unbound-method dispatch, hence the type/keys versions plus the four-slot descr field.

The C layout uses a union for keys_version / dict_offset; in Go we keep a single two-element uint16 field (Keys) because the in-memory width is identical and the union arm is selected at the call site.

CPython: Include/internal/pycore_code.h:108 _PyLoadMethodCache

type SendCache

type SendCache struct {
	Counter BackoffCounter
}

SendCache backs SEND. CACHE_ENTRIES = 1.

CPython: Include/internal/pycore_code.h:144 _PySendCache

type StoreSubscrCache

type StoreSubscrCache struct {
	Counter BackoffCounter
}

StoreSubscrCache backs STORE_SUBSCR. CACHE_ENTRIES = 1.

CPython: Include/internal/pycore_code.h:132 _PyStoreSubscrCache

type SuperAttrCache

type SuperAttrCache struct {
	Counter BackoffCounter
}

SuperAttrCache backs LOAD_SUPER_ATTR. CACHE_ENTRIES = 1.

CPython: Include/internal/pycore_code.h:96 _PySuperAttrCache

type ToBoolCache

type ToBoolCache struct {
	Counter BackoffCounter
	Version [2]uint16
}

ToBoolCache backs TO_BOOL. CACHE_ENTRIES = 3.

CPython: Include/internal/pycore_code.h:150 _PyToBoolCache

type UnpackSequenceCache

type UnpackSequenceCache struct {
	Counter BackoffCounter
}

UnpackSequenceCache backs UNPACK_SEQUENCE. CACHE_ENTRIES = 1.

CPython: Include/internal/pycore_code.h:83 _PyUnpackSequenceCache
