Documentation ¶
Overview ¶
Package specialize ports cpython/Python/specialize.c. PEP 659 adaptive specialization rewrites adaptive opcodes (LOAD_ATTR, BINARY_OP, CALL, ...) into specialized variants on warmup, and falls back to the adaptive parent on shape mismatch. The 16-bit backoff counter in every adaptive instruction's first cache slot drives the rewrite cadence.
v0.11 lays down the foundation: backoff counter helpers, inline cache struct layouts, the deopt table, and a Quicken pass that stamps the warmup counter into every adaptive cache slot. The per-family specializer entry points (LoadAttr, BinaryOp, Call, ...) follow on top.
CPython: Python/specialize.c
Index ¶
- Constants
- func BackoffCounterTriggers(c BackoffCounter) bool
- func BinaryOp(lhs, rhs objects.Object, code []byte, instr int, oparg int32, ...)
- func CacheCell(code []byte, instr, k int) uint16
- func CacheCount(op compile.Opcode) int
- func CacheU32(code []byte, instr, k int) uint32
- func Call(callable objects.Object, code []byte, instr int, oparg, nargs int32)
- func CallKw(callable objects.Object, code []byte, instr int, nargs int32)
- func CompareOp(lhs, rhs objects.Object, code []byte, instr int, oparg int32)
- func ContainsOp(container objects.Object, code []byte, instr int)
- func Deopt(op compile.Opcode) compile.Opcode
- func ForIter(iter objects.Object, code []byte, instr int, oparg int32)
- func IsUnreachable(c BackoffCounter) bool
- func LoadAttr(owner objects.Object, name *objects.Unicode, code []byte, instr int)
- func LoadGlobal(globals, builtins objects.Object, code []byte, instr int, ...)
- func LoadSuperAttr(globalSuper, cls objects.Object, code []byte, instr int, loadMethod bool)
- func Quicken(code []byte, enableCounters bool)
- func Send(receiver objects.Object, code []byte, instr int)
- func SetCacheCell(code []byte, instr, k int, value uint16)
- func SetCacheU32(code []byte, instr, k int, value uint32)
- func SetOparg(code []byte, instr int, arg byte)
- func SetOpcode(code []byte, instr int, op compile.Opcode) bool
- func Specialize(code []byte, instr int, specialized compile.Opcode)
- func StoreAttr(owner objects.Object, name *objects.Unicode, code []byte, instr int)
- func StoreCounter(code []byte, instr int, value BackoffCounter)
- func StoreSubscr(container, sub objects.Object, code []byte, instr int)
- func ToBool(value objects.Object, code []byte, instr int)
- func UnpackSequence(seq objects.Object, code []byte, instr int, oparg int32)
- func Unspecialize(code []byte, instr int)
- type AttrCache
- type BackoffCounter
- func AdaptiveCounterBackoff(c BackoffCounter) BackoffCounter
- func AdaptiveCounterCooldown() BackoffCounter
- func AdaptiveCounterWarmup() BackoffCounter
- func AdvanceBackoffCounter(c BackoffCounter) BackoffCounter
- func ForgeBackoffCounter(raw uint16) BackoffCounter
- func InitialJumpBackoffCounter() BackoffCounter
- func InitialSideExitBackoffCounter() BackoffCounter
- func InitialUnreachableBackoffCounter() BackoffCounter
- func LoadCounter(code []byte, instr int) BackoffCounter
- func MakeBackoffCounter(value, backoff uint16) BackoffCounter
- func PauseBackoffCounter(c BackoffCounter) BackoffCounter
- func RestartBackoffCounter(c BackoffCounter) BackoffCounter
- type BinaryOpCache
- type CallCache
- type CompareOpCache
- type ContainsOpCache
- type ForIterCache
- type LoadGlobalCache
- type LoadMethodCache
- type SendCache
- type StoreSubscrCache
- type SuperAttrCache
- type ToBoolCache
- type UnpackSequenceCache
Constants ¶
const (
	BackoffBits        = 4
	MaxBackoff         = 12
	UnreachableBackoff = 15
)
Backoff layout constants.
CPython: Include/internal/pycore_backoff.h:34
const (
	JumpBackwardInitialValue   = 4095
	JumpBackwardInitialBackoff = 12
)
Initial JUMP_BACKWARD counter shape, used by the Tier-2 trace projector to pick hot loops.
CPython: Include/internal/pycore_backoff.h:100
const (
	SideExitInitialValue   = 4095
	SideExitInitialBackoff = 12
)
Initial side-exit temperature.
CPython: Include/internal/pycore_backoff.h:113
const (
	InlineCacheEntriesLoadGlobal     = int(unsafe.Sizeof(LoadGlobalCache{})) / CodeUnitWidth
	InlineCacheEntriesBinaryOp       = int(unsafe.Sizeof(BinaryOpCache{})) / CodeUnitWidth
	InlineCacheEntriesUnpackSequence = int(unsafe.Sizeof(UnpackSequenceCache{})) / CodeUnitWidth
	InlineCacheEntriesCompareOp      = int(unsafe.Sizeof(CompareOpCache{})) / CodeUnitWidth
	InlineCacheEntriesLoadSuperAttr  = int(unsafe.Sizeof(SuperAttrCache{})) / CodeUnitWidth
	InlineCacheEntriesLoadAttr       = int(unsafe.Sizeof(LoadMethodCache{})) / CodeUnitWidth
	InlineCacheEntriesStoreAttr      = int(unsafe.Sizeof(AttrCache{})) / CodeUnitWidth
	InlineCacheEntriesCall           = int(unsafe.Sizeof(CallCache{})) / CodeUnitWidth
	InlineCacheEntriesCallKw         = int(unsafe.Sizeof(CallCache{})) / CodeUnitWidth
	InlineCacheEntriesStoreSubscr    = int(unsafe.Sizeof(StoreSubscrCache{})) / CodeUnitWidth
	InlineCacheEntriesForIter        = int(unsafe.Sizeof(ForIterCache{})) / CodeUnitWidth
	InlineCacheEntriesSend           = int(unsafe.Sizeof(SendCache{})) / CodeUnitWidth
	InlineCacheEntriesToBool         = int(unsafe.Sizeof(ToBoolCache{})) / CodeUnitWidth
	InlineCacheEntriesContainsOp     = int(unsafe.Sizeof(ContainsOpCache{})) / CodeUnitWidth
)
Inline cache widths in codeunits (INLINE_CACHE_ENTRIES_<FAMILY>). These numbers are pinned by cache_test.go against unsafe.Sizeof so the Go structs can never silently drift from the C layout.
CPython: Include/internal/pycore_code.h INLINE_CACHE_ENTRIES_*
const CodeUnitWidth = 2
CodeUnitWidth is the width in bytes of one bytecode codeunit. All cache structs are sized as a whole number of codeunits.
Functions ¶
func BackoffCounterTriggers ¶
func BackoffCounterTriggers(c BackoffCounter) bool
BackoffCounterTriggers reports whether the value is zero and the counter is not the unreachable sentinel.
CPython: Include/internal/pycore_backoff.h:91 backoff_counter_triggers
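A small sketch of the trigger test may help. It works on raw 16-bit words rather than the package's BackoffCounter type, and the helper names are hypothetical. It assumes the layout this page describes: value in the upper 12 bits, backoff in the lower 4, and the sentinel encoded as value 0 with backoff 15.

```go
package main

import "fmt"

const (
	backoffBits        = 4  // mirrors BackoffBits
	unreachableBackoff = 15 // mirrors UnreachableBackoff
)

// isUnreachable matches the never-trigger sentinel: value 0, backoff 15.
func isUnreachable(raw uint16) bool { return raw == unreachableBackoff }

// triggers is true when the value field is zero and the word is not the sentinel.
func triggers(raw uint16) bool {
	return raw>>backoffBits == 0 && !isUnreachable(raw)
}

func main() {
	// 0x0001: value 0, backoff 1. 0x000F: the sentinel. 0x0011: value 1, backoff 1.
	for _, raw := range []uint16{0x0001, 0x000F, 0x0011} {
		fmt.Printf("%#04x triggers=%v unreachable=%v\n", raw, triggers(raw), isUnreachable(raw))
	}
}
```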
func BinaryOp ¶
func BinaryOp(lhs, rhs objects.Object, code []byte, instr int, oparg int32, nextOp compile.Opcode, nextArg int32, locals []objects.Object)
BinaryOp specializes the BINARY_OP at instr based on the two operands and the NB_* oparg. nextOp / nextArg are the opcode and arg of the *following* codeunit (after the cache cells); the INPLACE_ADD_UNICODE arm peeks at that to detect the `s = s + ""` pattern and pick the in-place variant when the store target is the same local.
CPython: Python/specialize.c:2578 _Py_Specialize_BinaryOp
func CacheCell ¶
CacheCell reads the kth cache codeunit (1-based: cell 1 is the counter slot, cell 2 is the next field, etc.). Per-family helpers use it to fetch type or dict versions out of the inline cache.
func CacheCount ¶
CacheCount returns the number of trailing codeunits reserved as inline cache after op. Zero means op carries no cache.
func Call ¶
Call specializes the CALL at instr based on the callable on the stack and the (positional) argument count.
CPython: Python/specialize.c:2182 _Py_Specialize_Call
func CallKw ¶
CallKw specializes the CALL_KW at instr based on the callable on the stack. nargs is the positional count (the keyword tuple itself rides on the stack).
CPython: Python/specialize.c:2223 _Py_Specialize_CallKw
func CompareOp ¶
CompareOp specializes the COMPARE_OP at instr.
CPython: Python/specialize.c:2740 _Py_Specialize_CompareOp
func ContainsOp ¶
ContainsOp specializes the CONTAINS_OP at instr based on the container operand (the right-hand side of `x in container`).
CPython: Python/specialize.c:3108 _Py_Specialize_ContainsOp
func Deopt ¶
Deopt returns the adaptive parent of op. For an unspecialized opcode the result is op itself. The dispatch loop calls this when a specialized arm hits a shape mismatch and needs to fall back to the adaptive parent before re-specializing.
CPython: Include/internal/pycore_code.h _PyOpcode_Deopt access
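The lookup can be pictured as a table from specialized opcode to adaptive parent, with the identity as the default. The sketch below is illustrative only: the opcode type, the numeric values, and the map-based table are assumptions, and the real table may well be a flat array indexed by opcode.

```go
package main

import "fmt"

// opcode and the numbers below are hypothetical, chosen only for this sketch.
type opcode uint8

const (
	opLoadAttr opcode = iota + 1
	opLoadAttrInstanceValue
	opLoadAttrModule
	opBinaryOp
	opBinaryOpAddInt
)

// deoptTable maps each specialized variant to its adaptive parent.
var deoptTable = map[opcode]opcode{
	opLoadAttrInstanceValue: opLoadAttr,
	opLoadAttrModule:        opLoadAttr,
	opBinaryOpAddInt:        opBinaryOp,
}

// deopt returns the adaptive parent, or op itself when op is not specialized.
func deopt(op opcode) opcode {
	if parent, ok := deoptTable[op]; ok {
		return parent
	}
	return op
}

func main() {
	fmt.Println(deopt(opLoadAttrModule) == opLoadAttr) // specialized variant maps to its parent
	fmt.Println(deopt(opBinaryOp) == opBinaryOp)       // adaptive opcode maps to itself
}
```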
func ForIter ¶
ForIter specializes the FOR_ITER at instr based on the iterator on the stack. oparg is the codeunit's argument (used for the gen-arm jump-fits check).
CPython: Python/specialize.c:2909 _Py_Specialize_ForIter
func IsUnreachable ¶
func IsUnreachable(c BackoffCounter) bool
IsUnreachable reports whether the counter is the never-trigger sentinel (value 0, backoff 15).
CPython: Include/internal/pycore_backoff.h:38 is_unreachable_backoff_counter
func LoadAttr ¶
LoadAttr specializes the LOAD_ATTR at instr based on the owner and the attribute name being loaded.
CPython: Python/specialize.c:1344 _Py_Specialize_LoadAttr
func LoadGlobal ¶
LoadGlobal rewrites the LOAD_GLOBAL at instr to either LOAD_GLOBAL_MODULE or LOAD_GLOBAL_BUILTIN if the name resolves cleanly in globals or builtins. On any miss the opcode falls back to its adaptive parent and the counter is restarted at the next backoff step.
CPython: Python/specialize.c:1775 _Py_Specialize_LoadGlobal
func LoadSuperAttr ¶
LoadSuperAttr specializes the LOAD_SUPER_ATTR at instr. globalSuper is the value the bytecode just looked up under the name `super`; cls is the second positional argument. loadMethod mirrors LOAD_SUPER_ATTR's "is this for a method call?" flag.
CPython: Python/specialize.c:827 _Py_Specialize_LoadSuperAttr
func Quicken ¶
Quicken stamps initial counters into the adaptive cache cells of code. The buffer holds packed _Py_CODEUNIT pairs (op, arg) with reserved cache slots after every adaptive opcode. enableCounters is the flag CPython spells the same way: when false, every counter is set to the unreachable sentinel so dispatch never trips a specialize attempt (used by the disassembler when it materializes a code object that should not run).
CPython: Python/specialize.c:459 _PyCode_Quicken
func Send ¶
Send specializes the SEND at instr based on the receiver. Currently only generators and coroutines have a fast path.
CPython: Python/specialize.c:2964 _Py_Specialize_Send
func SetCacheCell ¶
SetCacheCell writes the kth cache codeunit. Mirror of CacheCell.
func SetCacheU32 ¶
SetCacheU32 writes a uint32 split across cache cells k and k+1 (low 16 bits first, matching the C struct field order on little-endian targets, which is what CPython assumes).
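A sketch of the cell accessors, under two assumptions that this page does not state outright: the byte offset of cell k is (instr + k) * CodeUnitWidth, and each codeunit is stored little-endian in the byte buffer. The function names are local stand-ins, not the package's exported API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const codeUnitWidth = 2

// cellOffset returns the byte offset of the kth cache cell (1-based) that
// follows the instruction at codeunit index instr.
func cellOffset(instr, k int) int { return (instr + k) * codeUnitWidth }

func cacheCell(code []byte, instr, k int) uint16 {
	return binary.LittleEndian.Uint16(code[cellOffset(instr, k):])
}

func setCacheCell(code []byte, instr, k int, v uint16) {
	binary.LittleEndian.PutUint16(code[cellOffset(instr, k):], v)
}

// setCacheU32 splits v across cells k and k+1, low 16 bits first.
func setCacheU32(code []byte, instr, k int, v uint32) {
	setCacheCell(code, instr, k, uint16(v))
	setCacheCell(code, instr, k+1, uint16(v>>16))
}

// cacheU32 reassembles the value written by setCacheU32.
func cacheU32(code []byte, instr, k int) uint32 {
	return uint32(cacheCell(code, instr, k)) | uint32(cacheCell(code, instr, k+1))<<16
}

func main() {
	code := make([]byte, 4*codeUnitWidth) // one opcode plus three cache cells
	setCacheU32(code, 0, 2, 0xDEADBEEF)   // e.g. a version at cells 2..3
	fmt.Printf("%#x %#x %#x\n", cacheCell(code, 0, 2), cacheCell(code, 0, 3), cacheU32(code, 0, 2))
}
```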
func SetOpcode ¶
SetOpcode rewrites the opcode at instr to op. Returns false when the slot already holds an INSTRUMENTED_* opcode (the GIL-disabled build's race-with-instrumentation path); the caller must abandon the specialize attempt.
CPython: Python/specialize.c:702 set_opcode
func Specialize ¶
Specialize rewrites the opcode at instr to specialized and stamps the counter cell with the cooldown shape so the next miss has time to settle before re-specializing. Mirrors CPython's static inline `specialize` helper.
CPython: Python/specialize.c:739 specialize
func StoreAttr ¶
StoreAttr specializes the STORE_ATTR at instr based on the owner and the attribute name being stored. The cache layout is 4 codeunits: counter at cell 1, type version uint32 at cells 2..3, index uint16 at cell 4.
CPython: Python/specialize.c:1376 _Py_Specialize_StoreAttr
func StoreCounter ¶
func StoreCounter(code []byte, instr int, value BackoffCounter)
StoreCounter writes value into the counter cell of the adaptive instruction at instr.
CPython: Python/specialize.c:723 set_counter
func StoreSubscr ¶
StoreSubscr specializes the STORE_SUBSCR at instr based on the container and subscript operands. Order matches CPython: list first (with bounds check on int subscript), then dict.
CPython: Python/specialize.c:1894 _Py_Specialize_StoreSubscr
func ToBool ¶
ToBool specializes the TO_BOOL at instr based on the operand. The inline cache layout is 3 codeunits: counter at cell 1, version uint32 at cells 2..3 (used by the TO_BOOL_ALWAYS_TRUE arm only).
CPython: Python/specialize.c:3034 _Py_Specialize_ToBool
func UnpackSequence ¶
UnpackSequence specializes the UNPACK_SEQUENCE at instr based on the sequence and target count (oparg).
CPython: Python/specialize.c:2802 _Py_Specialize_UnpackSequence
func Unspecialize ¶
Unspecialize rewrites the opcode at instr back to its adaptive parent and restarts the backoff counter so the next attempt waits exponentially longer. Used by the dispatch loop on shape mismatch.
CPython: Python/specialize.c:753 unspecialize
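The Specialize / Unspecialize pair can be sketched end to end on a tiny buffer. This is an illustration under assumptions, not the package's implementation: the opcode numbers are invented, the counter is handled as a raw little-endian uint16, the cooldown word is passed in as a parameter, and restart follows the rule documented for RestartBackoffCounter with the value reseeded from the new backoff.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	codeUnitWidth = 2
	backoffBits   = 4
	maxBackoff    = 12
)

// Hypothetical opcode numbers, chosen only for this sketch.
const (
	opToBool    byte = 10 // adaptive parent
	opToBoolInt byte = 11 // specialized variant
)

func pack(value, backoff uint16) uint16 { return value<<backoffBits | backoff }

func deopt(op byte) byte {
	if op == opToBoolInt {
		return opToBool
	}
	return op
}

// restart grows the backoff by one (capped) and reseeds value to 2**backoff - 1.
func restart(raw uint16) uint16 {
	backoff := raw & (1<<backoffBits - 1)
	if backoff < maxBackoff {
		backoff++
	}
	return pack(uint16(1)<<backoff-1, backoff)
}

func loadCounter(code []byte, instr int) uint16 {
	return binary.LittleEndian.Uint16(code[(instr+1)*codeUnitWidth:])
}

func storeCounter(code []byte, instr int, raw uint16) {
	binary.LittleEndian.PutUint16(code[(instr+1)*codeUnitWidth:], raw)
}

// specialize swaps in the specialized opcode and stamps the cooldown counter.
func specialize(code []byte, instr int, specialized byte, cooldown uint16) {
	code[instr*codeUnitWidth] = specialized
	storeCounter(code, instr, cooldown)
}

// unspecialize restores the adaptive parent and restarts the counter.
func unspecialize(code []byte, instr int) {
	code[instr*codeUnitWidth] = deopt(code[instr*codeUnitWidth])
	storeCounter(code, instr, restart(loadCounter(code, instr)))
}

func main() {
	code := []byte{opToBool, 0, 0, 0, 0, 0, 0, 0} // TO_BOOL plus three cache codeunits
	specialize(code, 0, opToBoolInt, pack(52, 0)) // cooldown word is illustrative
	fmt.Printf("op=%d counter=%#x\n", code[0], loadCounter(code, 0))
	unspecialize(code, 0)
	fmt.Printf("op=%d counter=%#x\n", code[0], loadCounter(code, 0))
}
```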
Types ¶
type AttrCache ¶
type AttrCache struct {
Counter BackoffCounter
Version [2]uint16
Index uint16
}
AttrCache backs STORE_ATTR. CACHE_ENTRIES = 4.
CPython: Include/internal/pycore_code.h:102 _PyAttrCache
type BackoffCounter ¶
type BackoffCounter struct {
ValueAndBackoff uint16
}
BackoffCounter packs a 12-bit value above a 4-bit backoff field.
CPython: Include/internal/pycore_structs.h _Py_BackoffCounter
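The packing can be sketched with a local stand-in type. The names counter, pack, value, and backoff are hypothetical; only the bit layout (12-bit value above a 4-bit backoff) and the constants come from this page.

```go
package main

import "fmt"

const backoffBits = 4 // mirrors BackoffBits

// counter is a local stand-in for BackoffCounter.
type counter struct{ raw uint16 }

// pack places value in the upper 12 bits and backoff in the lower 4.
func pack(value, backoff uint16) counter {
	return counter{raw: value<<backoffBits | backoff}
}

// value extracts the upper 12 bits.
func (c counter) value() uint16 { return c.raw >> backoffBits }

// backoff extracts the lower 4 bits.
func (c counter) backoff() uint16 { return c.raw & (1<<backoffBits - 1) }

func main() {
	// The JumpBackwardInitialValue / JumpBackwardInitialBackoff pair.
	c := pack(4095, 12)
	fmt.Printf("raw=%#x value=%d backoff=%d\n", c.raw, c.value(), c.backoff())
}
```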
func AdaptiveCounterBackoff ¶
func AdaptiveCounterBackoff(c BackoffCounter) BackoffCounter
AdaptiveCounterBackoff computes the next counter after a specialize miss. Mirrors RestartBackoffCounter; kept under a distinct name so the call sites read like the CPython source.
CPython: Python/specialize.c adaptive_counter_backoff
func AdaptiveCounterCooldown ¶
func AdaptiveCounterCooldown() BackoffCounter
func AdaptiveCounterWarmup ¶
func AdaptiveCounterWarmup() BackoffCounter
Adaptive counter shapes used by the specializer. Warmup is the shape Quicken stamps into every fresh cache slot. Cooldown is the shape used after a successful specialize so the next miss has time to settle before a re-specialize attempt.
CPython: Python/specialize.c top of file (adaptive_counter_warmup / adaptive_counter_cooldown helpers)
func AdvanceBackoffCounter ¶
func AdvanceBackoffCounter(c BackoffCounter) BackoffCounter
AdvanceBackoffCounter ticks the value down by one. Called every time the matching adaptive opcode executes.
CPython: Include/internal/pycore_backoff.h:83 advance_backoff_counter
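As a rough illustration of the countdown, the sketch below ticks a raw counter word until it fires. The helpers are local stand-ins working on uint16 rather than BackoffCounter, and the starting word is arbitrary. It assumes the caller checks for a trigger before each tick, so the value field is never decremented below zero.

```go
package main

import "fmt"

const (
	backoffBits = 4  // mirrors BackoffBits
	unreachable = 15 // sentinel: value 0, backoff 15
)

// pack builds a raw word: value in the upper 12 bits, backoff in the lower 4.
func pack(value, backoff uint16) uint16 { return value<<backoffBits | backoff }

// advance ticks the value field down by one; the backoff field is untouched.
func advance(raw uint16) uint16 { return raw - 1<<backoffBits }

// triggers reports a zero value field on a word that is not the sentinel.
func triggers(raw uint16) bool { return raw < unreachable }

// ticksUntilTrigger counts executions before the counter fires.
// It returns -1 for the sentinel, which never fires.
func ticksUntilTrigger(raw uint16) int {
	if raw == unreachable {
		return -1
	}
	n := 0
	for !triggers(raw) {
		raw = advance(raw)
		n++
	}
	return n
}

func main() {
	fmt.Println(ticksUntilTrigger(pack(3, 1)))
}
```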
func ForgeBackoffCounter ¶
func ForgeBackoffCounter(raw uint16) BackoffCounter
ForgeBackoffCounter wraps a raw 16-bit pattern. Used when reading an existing counter out of a bytecode cache cell.
CPython: Include/internal/pycore_backoff.h:54 forge_backoff_counter
func InitialJumpBackoffCounter ¶
func InitialJumpBackoffCounter() BackoffCounter
InitialJumpBackoffCounter returns the seed for a JUMP_BACKWARD instruction's counter slot.
CPython: Include/internal/pycore_backoff.h:102 initial_jump_backoff_counter
func InitialSideExitBackoffCounter ¶
func InitialSideExitBackoffCounter() BackoffCounter
InitialSideExitBackoffCounter returns the seed for a Tier-2 side-exit temperature counter.
CPython: Include/internal/pycore_backoff.h:116 initial_temperature_backoff_counter
func InitialUnreachableBackoffCounter ¶
func InitialUnreachableBackoffCounter() BackoffCounter
InitialUnreachableBackoffCounter returns the never-trigger sentinel.
CPython: Include/internal/pycore_backoff.h:124 initial_unreachable_backoff_counter
func LoadCounter ¶
func LoadCounter(code []byte, instr int) BackoffCounter
LoadCounter reads the BackoffCounter that lives in the first cache codeunit of an adaptive instruction. instr is the codeunit index of the opcode itself; the counter sits at instr+1.
CPython: Python/specialize.c:730 load_counter
func MakeBackoffCounter ¶
func MakeBackoffCounter(value, backoff uint16) BackoffCounter
MakeBackoffCounter packs value and backoff into a fresh counter. value must fit in 12 bits, backoff must fit in 4.
CPython: Include/internal/pycore_backoff.h:44 make_backoff_counter
func PauseBackoffCounter ¶
func PauseBackoffCounter(c BackoffCounter) BackoffCounter
PauseBackoffCounter bumps the value field by one (1<<BackoffBits in the raw word) to push the next trigger out by one tick without changing the backoff field. Used when a specialize attempt should be retried later without escalating the backoff.
CPython: Include/internal/pycore_backoff.h:75 pause_backoff_counter
func RestartBackoffCounter ¶
func RestartBackoffCounter(c BackoffCounter) BackoffCounter
RestartBackoffCounter resets a counter after a specialize miss. The backoff field grows by one (capped at MaxBackoff) and the value is reseeded to 2**backoff - 1, computed from the new backoff.
CPython: Include/internal/pycore_backoff.h:62 restart_backoff_counter
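The escalation is easiest to see by restarting a counter a few times in a row. The sketch below uses raw uint16 words and hypothetical helper names. It assumes, following CPython's restart_backoff_counter, that the value is reseeded from the backoff after the increment.

```go
package main

import "fmt"

const (
	backoffBits = 4  // mirrors BackoffBits
	maxBackoff  = 12 // mirrors MaxBackoff
)

// pack builds a raw word: value in the upper 12 bits, backoff in the lower 4.
func pack(value, backoff uint16) uint16 { return value<<backoffBits | backoff }

// restart grows the backoff by one, capped at maxBackoff, and reseeds the
// value to 2**backoff - 1 using the new backoff.
func restart(raw uint16) uint16 {
	backoff := raw & (1<<backoffBits - 1)
	if backoff < maxBackoff {
		backoff++
	}
	return pack(uint16(1)<<backoff-1, backoff)
}

func main() {
	c := pack(0, 1)
	for i := 0; i < 4; i++ {
		c = restart(c)
		fmt.Printf("backoff=%d value=%d\n", c&(1<<backoffBits-1), c>>backoffBits)
	}
}
```

Each miss roughly doubles the number of executions before the next specialize attempt, until the cap of 4095 is reached at backoff 12.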
type BinaryOpCache ¶
type BinaryOpCache struct {
Counter BackoffCounter
ExternalCache [4]uint16
}
BinaryOpCache backs BINARY_OP. CACHE_ENTRIES = 5.
CPython: Include/internal/pycore_code.h:76 _PyBinaryOpCache
type CallCache ¶
type CallCache struct {
Counter BackoffCounter
FuncVersion [2]uint16
}
CallCache backs CALL and CALL_KW. CACHE_ENTRIES = 3.
CPython: Include/internal/pycore_code.h:124 _PyCallCache
type CompareOpCache ¶
type CompareOpCache struct {
Counter BackoffCounter
}
CompareOpCache backs COMPARE_OP. CACHE_ENTRIES = 1.
CPython: Include/internal/pycore_code.h:90 _PyCompareOpCache
type ContainsOpCache ¶
type ContainsOpCache struct {
Counter BackoffCounter
}
ContainsOpCache backs CONTAINS_OP. CACHE_ENTRIES = 1.
CPython: Include/internal/pycore_code.h:157 _PyContainsOpCache
type ForIterCache ¶
type ForIterCache struct {
Counter BackoffCounter
}
ForIterCache backs FOR_ITER. CACHE_ENTRIES = 1.
CPython: Include/internal/pycore_code.h:138 _PyForIterCache
type LoadGlobalCache ¶
type LoadGlobalCache struct {
Counter BackoffCounter
ModuleKeysVersion uint16
BuiltinKeysVersion uint16
Index uint16
}
LoadGlobalCache backs LOAD_GLOBAL. CACHE_ENTRIES = 4.
CPython: Include/internal/pycore_code.h:67 _PyLoadGlobalCache
type LoadMethodCache ¶
type LoadMethodCache struct {
Counter BackoffCounter
TypeVersion [2]uint16
Keys [2]uint16
Descr [4]uint16
}
LoadMethodCache backs LOAD_ATTR. CACHE_ENTRIES = 9 (1 counter + 2 type version + 2 keys + 4 descr codeunits). The widest adaptive cache; LOAD_ATTR specializes into both attribute lookups and unbound-method dispatch, hence the type/keys versions plus the four-slot descr field.
The C layout uses a union for keys_version / dict_offset; in Go we keep a single two-element uint16 array (Keys) because the in-memory width is identical and the union arm is selected at the call site.
CPython: Include/internal/pycore_code.h:108 _PyLoadMethodCache
type SendCache ¶
type SendCache struct {
Counter BackoffCounter
}
SendCache backs SEND. CACHE_ENTRIES = 1.
CPython: Include/internal/pycore_code.h:144 _PySendCache
type StoreSubscrCache ¶
type StoreSubscrCache struct {
Counter BackoffCounter
}
StoreSubscrCache backs STORE_SUBSCR. CACHE_ENTRIES = 1.
CPython: Include/internal/pycore_code.h:132 _PyStoreSubscrCache
type SuperAttrCache ¶
type SuperAttrCache struct {
Counter BackoffCounter
}
SuperAttrCache backs LOAD_SUPER_ATTR. CACHE_ENTRIES = 1.
CPython: Include/internal/pycore_code.h:96 _PySuperAttrCache
type ToBoolCache ¶
type ToBoolCache struct {
Counter BackoffCounter
Version [2]uint16
}
ToBoolCache backs TO_BOOL. CACHE_ENTRIES = 3.
CPython: Include/internal/pycore_code.h:150 _PyToBoolCache
type UnpackSequenceCache ¶
type UnpackSequenceCache struct {
Counter BackoffCounter
}
UnpackSequenceCache backs UNPACK_SEQUENCE. CACHE_ENTRIES = 1.
CPython: Include/internal/pycore_code.h:83 _PyUnpackSequenceCache