Documentation ¶
Overview ¶
Package perfreader captures PERF_RECORD_SAMPLE events via perf_event_open with REGS_USER + STACK_USER so userspace can DWARF-unwind the raw stack. It is the input stage of the planned --unwind {fp,dwarf,auto} pipeline; DWARF unwinding and symbolization happen in adjacent packages.
Index ¶
Constants ¶
const (
	PerfRegX86AX    = 0
	PerfRegX86BX    = 1
	PerfRegX86CX    = 2
	PerfRegX86DX    = 3
	PerfRegX86SI    = 4
	PerfRegX86DI    = 5
	PerfRegX86BP    = 6
	PerfRegX86SP    = 7
	PerfRegX86IP    = 8
	PerfRegX86Flags = 9
	PerfRegX86CS    = 10
	PerfRegX86SS    = 11
	PerfRegX86DS    = 12
	PerfRegX86ES    = 13
	PerfRegX86FS    = 14
	PerfRegX86GS    = 15
	PerfRegX86R8    = 16
	PerfRegX86R9    = 17
	PerfRegX86R10   = 18
	PerfRegX86R11   = 19
	PerfRegX86R12   = 20
	PerfRegX86R13   = 21
	PerfRegX86R14   = 22
	PerfRegX86R15   = 23
)
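Register indices from the x86 asm/perf_regs.h UAPI header (arch/x86/include/uapi/asm/perf_regs.h). Each value is the bit position of that register in the perf_event_attr sample_regs_user mask and must match the kernel's numbering. The kernel writes only the requested registers into the sample's regs[] array, in ascending bit order, so a value here equals the regs[] index only when every lower-numbered register is also requested.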
const SampleRegsUser = uint64(0) |
	(1 << PerfRegX86AX) |
	(1 << PerfRegX86BX) |
	(1 << PerfRegX86CX) |
	(1 << PerfRegX86DX) |
	(1 << PerfRegX86SI) |
	(1 << PerfRegX86DI) |
	(1 << PerfRegX86BP) |
	(1 << PerfRegX86SP) |
	(1 << PerfRegX86IP) |
	(1 << PerfRegX86R8) |
	(1 << PerfRegX86R9) |
	(1 << PerfRegX86R10) |
	(1 << PerfRegX86R11) |
	(1 << PerfRegX86R12) |
	(1 << PerfRegX86R13) |
	(1 << PerfRegX86R14) |
	(1 << PerfRegX86R15)
SampleRegsUser is the bitmask of registers the kernel is asked to capture per sample. It includes the minimum needed for DWARF CFI unwinding on x86_64 (IP, SP, BP) plus the general-purpose registers libunwind needs to restore frame state (AX..DI, R8..R15). Flags and segment registers are excluded: DWARF never restores them, and capturing them would only waste ring-buffer bandwidth.
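Because the mask skips bits 9..15, the kernel packs the captured registers densely, and a register's slot in the sample is the count of requested registers with a lower bit number. The sketch below shows one way to compute that slot. The names regSlot and sampleRegsUser, and the short constant names, are illustrative and not part of this package.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Bit positions from arch/x86/include/uapi/asm/perf_regs.h (subset).
const (
	regSP    = 7
	regIP    = 8
	regFlags = 9
	regR8    = 16
)

// sampleRegsUser mirrors the documented mask: AX..IP (bits 0-8)
// plus R8..R15 (bits 16-23). Local copy for illustration only.
const sampleRegsUser uint64 = 0x1ff | 0xff<<16

// regSlot returns the index of register reg inside the packed regs[]
// array for the given mask, or -1 if the register was not requested.
func regSlot(mask uint64, reg uint) int {
	bit := uint64(1) << reg
	if mask&bit == 0 {
		return -1
	}
	// Count the requested registers whose bit number is lower than reg.
	return bits.OnesCount64(mask & (bit - 1))
}

func main() {
	fmt.Println(regSlot(sampleRegsUser, regSP))    // 7
	fmt.Println(regSlot(sampleRegsUser, regR8))    // 9
	fmt.Println(regSlot(sampleRegsUser, regFlags)) // -1
}
```

With this mask, R8 lands in slot 9 rather than slot 16, so indexing the captured registers directly with the PerfRegX86* constants is only safe for AX..IP.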
Variables ¶
This section is empty.
Functions ¶
Types ¶
type Config ¶
type Config struct {
PID int
CPU int // -1 for any CPU (only valid with PID != -1)
SampleFreq uint64 // samples per second (Hz)
StackBytes uint32 // bytes of user stack to copy per sample
RingPages int // power-of-two data pages in the ring buffer (excl. metadata page)
}
Config parameterizes a Reader. StackBytes must be a multiple of 8 and no larger than 65528 (kernel limit). 8192 is a reasonable default covering typical stack depths without blowing up ring-buffer bandwidth.
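The constraints above can be checked before opening the event. The following is a minimal sketch, assuming a local config type that copies only the two size fields; the validate function and its error messages are hypothetical, and the package may perform its own checks differently.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a local stand-in for the size fields of Config.
type config struct {
	StackBytes uint32
	RingPages  int
}

// maxStackBytes is the limit stated in the Config documentation.
const maxStackBytes = 65528

// validate applies the documented constraints: StackBytes is a multiple
// of 8 no larger than 65528, and RingPages is a power of two.
func validate(c config) error {
	if c.StackBytes%8 != 0 {
		return errors.New("StackBytes must be a multiple of 8")
	}
	if c.StackBytes > maxStackBytes {
		return errors.New("StackBytes exceeds 65528")
	}
	if c.RingPages <= 0 || c.RingPages&(c.RingPages-1) != 0 {
		return errors.New("RingPages must be a power of two")
	}
	return nil
}

func main() {
	fmt.Println(validate(config{StackBytes: 8192, RingPages: 64})) // <nil>
	fmt.Println(validate(config{StackBytes: 8190, RingPages: 64}))
	fmt.Println(validate(config{StackBytes: 8192, RingPages: 48}))
}
```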
func DefaultConfig ¶
func DefaultConfig() Config
DefaultConfig returns a sensible Config scaffold. Caller must set PID / CPU.
type Lost ¶
Lost indicates the kernel dropped N samples due to ring-buffer overflow. Emitted separately from Sample via the reader's event channel.
type Reader ¶
type Reader struct {
// contains filtered or unexported fields
}
Reader owns one perf_event fd and its mmap'd ring buffer. A profiler creates one Reader per CPU (or one per PID+CPU combination) and pumps events from Events() until Close().
func NewReader ¶
NewReader opens a perf_event sampling CPU_CLOCK at cfg.SampleFreq Hz, attached to cfg.PID (if >= 0). Requires CAP_PERFMON and CAP_BPF at minimum; caller is expected to have those or equivalent.
func (*Reader) FD ¶
FD exposes the underlying perf_event file descriptor so callers can poll it with epoll/select when integrating into a larger event loop.
func (*Reader) ReadNext ¶
ReadNext drains pending records from the ring buffer, invoking cb for each PERF_RECORD_SAMPLE it parses. Records of types other than SAMPLE (mmap/comm/lost/etc.) are handled internally or ignored.
Returns the number of records consumed (all types, including ignored). Returns 0, nil if the ring is empty; the caller should poll the FD and retry.
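To illustrate the record walk, here is a sketch of how records can be separated by their perf_event_header (u32 type, u16 misc, u16 size, where size covers the header itself). It operates on a flat byte slice and assumes little-endian data already copied out of the ring; wrap-around handling and the data_head/data_tail bookkeeping are omitted. walkRecords and record are hypothetical helpers, not this package's API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	recordLost   = 2 // PERF_RECORD_LOST
	recordSample = 9 // PERF_RECORD_SAMPLE
	headerSize   = 8 // sizeof(struct perf_event_header)
)

// walkRecords visits every complete record in buf, passes the payload of
// each SAMPLE record to onSample, and returns the number of records of
// any type that it consumed. An empty buffer yields 0, nil.
func walkRecords(buf []byte, onSample func(body []byte)) (int, error) {
	n := 0
	for len(buf) >= headerSize {
		typ := binary.LittleEndian.Uint32(buf[0:4])
		size := int(binary.LittleEndian.Uint16(buf[6:8]))
		if size < headerSize || size > len(buf) {
			return n, fmt.Errorf("bad record size %d", size)
		}
		if typ == recordSample {
			onSample(buf[headerSize:size])
		}
		buf = buf[size:] // skip to the next header
		n++
	}
	return n, nil
}

// record builds one record with the given type and payload (demo input).
func record(typ uint32, body []byte) []byte {
	rec := make([]byte, headerSize, headerSize+len(body))
	binary.LittleEndian.PutUint32(rec[0:4], typ)
	binary.LittleEndian.PutUint16(rec[6:8], uint16(headerSize+len(body)))
	return append(rec, body...)
}

func main() {
	buf := append(record(recordLost, make([]byte, 16)),
		record(recordSample, make([]byte, 8))...)
	samples := 0
	n, err := walkRecords(buf, func(body []byte) { samples++ })
	fmt.Println(n, samples, err) // 2 1 <nil>
}
```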
type Sample ¶
type Sample struct {
IP uint64 // the sampled instruction pointer
PID uint32 // process ID (tgid) of the sampled task
TID uint32 // thread ID of the sampled task
Time uint64 // nanoseconds since boot, monotonic clock
Callchain []uint64 // kernel-walked FP callchain; includes sentinels like PERF_CONTEXT_USER
ABI uint64 // PERF_SAMPLE_REGS_ABI_{NONE,32,64}
Regs []uint64 // captured user registers, order defined by SampleRegsUser
StackAddr uint64 // RSP/SP at sample time — base address of Stack bytes
Stack []byte // raw stack memory starting at StackAddr
}
Sample is one parsed PERF_RECORD_SAMPLE. Fields mirror the kernel's sample format in the order and subset this package requests via sample_type: PERF_SAMPLE_IP | TID | TIME | CALLCHAIN | REGS_USER | STACK_USER.
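For that sample_type, the record body is laid out as: u64 ip; u32 pid, u32 tid; u64 time; u64 nr followed by nr callchain entries; u64 abi followed by one u64 per requested register when abi is non-zero; then u64 size, size bytes of stack, and u64 dyn_size when size is non-zero. The sketch below decodes that layout on a little-endian host. The sample, cursor, parseSample, and le64 names are illustrative stand-ins, and this is a reading of the perf_event_open layout rather than this package's actual parser.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math/bits"
)

// sample is a local stand-in for a subset of the Sample fields.
type sample struct {
	IP, Time, ABI   uint64
	PID, TID        uint32
	Callchain, Regs []uint64
	Stack           []byte
}

var errShort = errors.New("sample body truncated")

// cursor consumes little-endian values and keeps the first error.
type cursor struct {
	b   []byte
	err error
}

func (c *cursor) take(n uint64) []byte {
	if c.err != nil || n > uint64(len(c.b)) {
		c.err = errShort
		return nil
	}
	out := c.b[:n]
	c.b = c.b[n:]
	return out
}

func (c *cursor) u64() uint64 {
	if b := c.take(8); b != nil {
		return binary.LittleEndian.Uint64(b)
	}
	return 0
}

func (c *cursor) u64s(n uint64) []uint64 {
	if c.err != nil || n > uint64(len(c.b))/8 {
		c.err = errShort
		return nil
	}
	out := make([]uint64, n)
	for i := range out {
		out[i] = c.u64()
	}
	return out
}

// parseSample decodes one SAMPLE body captured with the given register mask.
func parseSample(body []byte, regsMask uint64) (sample, error) {
	c := &cursor{b: body}
	var s sample
	s.IP = c.u64()
	pidTid := c.u64() // u32 pid then u32 tid: pid is the low half
	s.PID, s.TID = uint32(pidTid), uint32(pidTid>>32)
	s.Time = c.u64()
	s.Callchain = c.u64s(c.u64()) // u64 nr, then nr entries
	s.ABI = c.u64()
	if s.ABI != 0 { // registers follow only when the ABI is not NONE
		s.Regs = c.u64s(uint64(bits.OnesCount64(regsMask)))
	}
	if size := c.u64(); size != 0 { // data and dyn_size follow a non-zero size
		data := c.take(size)
		dyn := c.u64()
		if c.err == nil && dyn > size {
			c.err = errors.New("dyn_size exceeds size")
		}
		if c.err == nil {
			s.Stack = data[:dyn] // only the first dyn_size bytes are valid
		}
	}
	if c.err != nil {
		return sample{}, c.err
	}
	return s, nil
}

// le64 packs values as little-endian u64 words (demo input builder).
func le64(vals ...uint64) []byte {
	var b []byte
	for _, v := range vals {
		b = binary.LittleEndian.AppendUint64(b, v)
	}
	return b
}

func main() {
	const mask = 1<<7 | 1<<8 // SP and IP only, to keep the demo short
	body := le64(0x401000, 11<<32|10, 123, // ip, tid=11/pid=10, time
		1, 0x401000, // callchain: nr=1 and one entry
		2, 0x7ffc0000, 0x401000, // abi=64-bit, then SP and IP
		16, 0xaaaa, 0xbbbb, 8) // size=16, two data words, dyn_size=8
	s, err := parseSample(body, mask)
	fmt.Println(s.PID, s.TID, len(s.Regs), len(s.Stack), err) // 10 11 2 8 <nil>
}
```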
Any field the kernel skipped (e.g. REGS_USER when the sampled task was in kernel mode and no user regs were available) will be zero-valued; callers must check ABI and len(Stack) before trusting Regs and Stack.
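A caller-side guard for this might look like the sketch below. It assumes a local copy of the relevant Sample fields and a hypothetical helper named unwindable; the ABI values are the kernel's PERF_SAMPLE_REGS_ABI_* constants, and 17 is the number of bits set in the documented SampleRegsUser mask.

```go
package main

import "fmt"

// Values of PERF_SAMPLE_REGS_ABI_* from the perf_event UAPI header.
const (
	abiNone = 0
	abi32   = 1
	abi64   = 2
)

// sample is a local stand-in for the Sample fields the check needs.
type sample struct {
	ABI   uint64
	Regs  []uint64
	Stack []byte
}

// unwindable reports whether s carries enough user state to attempt a
// 64-bit DWARF unwind: a 64-bit register set of the expected length and
// at least some copied stack bytes.
func unwindable(s sample, wantRegs int) bool {
	return s.ABI == abi64 && len(s.Regs) == wantRegs && len(s.Stack) > 0
}

func main() {
	ok := sample{ABI: abi64, Regs: make([]uint64, 17), Stack: make([]byte, 8192)}
	skipped := sample{ABI: abiNone}
	fmt.Println(unwindable(ok, 17), unwindable(skipped, 17)) // true false
}
```

Samples that fail the check can still be attributed through IP and Callchain, which do not depend on the user register dump.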