perf

Version: v0.7.0
Published: Jan 14, 2023 License: MIT Imports: 16 Imported by: 16

README

Perf

This package is a Go library for interacting with the perf subsystem in Linux. I had trouble finding a Go perf library, so I decided to write this one using Linux's perf as a reference. This library allows you to do things like see how many CPU instructions a function takes (roughly), profile a process for various hardware events, and other interesting things.

Note that because the Go scheduler can schedule a goroutine across many OS threads, it is rather difficult to get an exact profile of an individual goroutine. However, a few tricks can be used: first, a call to runtime.LockOSThread to lock the current goroutine to an OS thread; second, a call to unix.SchedSetaffinity with a CPU set mask. Note that if the pid argument is set to 0, the calling thread is used (the thread that was just locked).

Before using this library you should probably read the perf_event_open man page, which this library uses heavily. See this kernel guide for a tutorial on how to use perf and some of its limitations.
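
The thread-locking trick can be sketched with the standard library alone. In this minimal sketch, onLockedThread is a hypothetical helper and not part of this package; pinning the thread to a core would additionally need unix.SchedSetaffinity from golang.org/x/sys/unix, which is left out here to keep the example dependency-free.

```go
package main

import (
	"fmt"
	"runtime"
)

// onLockedThread is a hypothetical helper (not part of this package). It
// wires the calling goroutine to its current OS thread while f runs, so the
// scheduler cannot move the goroutine to another thread in the meantime.
func onLockedThread(f func() error) error {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	return f()
}

func main() {
	err := onLockedThread(func() error {
		total := 0
		for i := 0; i < 1000; i++ {
			total += i
		}
		fmt.Println("total:", total)
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

Any perf counters opened for the calling thread (pid 0) inside f would then observe only the work done on that one thread.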

Use Cases

If you are looking to interact with the perf subsystem directly through the perf_event_open syscall, then this library is most likely for you. A large number of the utility methods in this package should only be used for testing and/or debugging performance issues. This is because the Go runtime is extremely tricky to profile at the goroutine level, with the exception of a long-running worker goroutine locked to an OS thread. Eventually this library could be used to implement many of the features of perf in pure Go. Currently this library is used in node_exporter as well as perf_exporter, which is a Prometheus exporter for perf-related metrics.

Caveats

  • Some utility functions will call runtime.LockOSThread for you and will unlock the thread after profiling. Note that using these utility functions incurs significant overhead (~0.4ms per call, see Benchmarks).
  • Overflow handling is not implemented.

Setup

Most likely you will need to tweak some system settings unless you are running as root. From man perf_event_open:

   perf_event related configuration files
       Files in /proc/sys/kernel/

           /proc/sys/kernel/perf_event_paranoid
                  The perf_event_paranoid file can be set to restrict access to the performance counters.

                  2   allow only user-space measurements (default since Linux 4.6).
                  1   allow both kernel and user measurements (default before Linux 4.6).
                  0   allow access to CPU-specific data but not raw tracepoint samples.
                  -1  no restrictions.

                  The existence of the perf_event_paranoid file is the official method for determining if a kernel supports perf_event_open().

           /proc/sys/kernel/perf_event_max_sample_rate
                  This sets the maximum sample rate.  Setting this too high can allow users to sample at a rate that impacts overall machine performance and potentially lock up the machine.  The default value is 100000 (samples per second).

           /proc/sys/kernel/perf_event_max_stack
                  This file sets the maximum depth of stack frame entries reported when generating a call trace.

           /proc/sys/kernel/perf_event_mlock_kb
                  Maximum number of pages an unprivileged user can mlock(2).  The default is 516 (kB).
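
To check the current setting before opening any counters, the paranoid level can be read and interpreted as in the following minimal sketch. describeParanoid is a hypothetical helper, not part of this package, and treating values above 2 like 2 is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// describeParanoid maps a perf_event_paranoid level to the access it allows,
// following the man page excerpt above. Values above 2 are treated like 2
// here, which is an assumption of this sketch.
func describeParanoid(level int) string {
	switch {
	case level >= 2:
		return "user-space measurements only"
	case level == 1:
		return "kernel and user measurements"
	case level == 0:
		return "CPU-specific data, but no raw tracepoint samples"
	default:
		return "no restrictions"
	}
}

func main() {
	raw, err := os.ReadFile("/proc/sys/kernel/perf_event_paranoid")
	if err != nil {
		// Per the man page, a missing file means no perf_event_open support.
		fmt.Println("perf_event_paranoid not readable:", err)
		return
	}
	level, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		fmt.Println("unexpected content:", err)
		return
	}
	fmt.Printf("perf_event_paranoid=%d: %s\n", level, describeParanoid(level))
}
```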

Example

Say you wanted to see how many CPU instructions a particular function took:

package main

import (
	"fmt"
	"log"

	perf "github.com/hodgesds/perf-utils"
)

// foo is the function being profiled.
func foo() error {
	var total int
	for i := 0; i < 1000; i++ {
		total++
	}
	return nil
}

func main() {
	// CPUInstructions runs foo and returns the instruction count for the call.
	profileValue, err := perf.CPUInstructions(foo)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("CPU instructions: %+v\n", profileValue)
}

Benchmarks

To profile a single function call there is an overhead of ~0.4ms.

$ go test  -bench=BenchmarkCPUCycles .
goos: linux
goarch: amd64
pkg: github.com/hodgesds/perf-utils
BenchmarkCPUCycles-8        3000            397924 ns/op              32 B/op          1 allocs/op
PASS
ok      github.com/hodgesds/perf-utils  1.255s

The Profiler interface has low overhead and is suitable for many use cases:

$ go test  -bench=BenchmarkProfiler .
goos: linux
goarch: amd64
pkg: github.com/hodgesds/perf-utils
BenchmarkProfiler-8      3000000               488 ns/op              32 B/op          1 allocs/op
PASS
ok      github.com/hodgesds/perf-utils  1.981s

The RunBenchmarks helper function can be used to run a function as a benchmark and report results from PerfEventAttrs:

func BenchmarkRunBenchmarks(b *testing.B) {
	eventAttrs := []unix.PerfEventAttr{
		CPUInstructionsEventAttr(),
		CPUCyclesEventAttr(),
	}
	RunBenchmarks(
		b,
		func(b *testing.B) {
			// Start at 0 so the body runs exactly b.N times.
			for n := 0; n < b.N; n++ {
				a := 42
				for i := 0; i < 1000; i++ {
					a += i
				}
			}
		},
		BenchLock|BenchStrict,
		eventAttrs...,
	)
}

$ go test -bench=BenchmarkRunBenchmarks
goos: linux
goarch: amd64
pkg: github.com/hodgesds/iouring-go/go/src/github.com/hodgesds/perf-utils
BenchmarkRunBenchmarks-8         3119304               388 ns/op              1336 hw_cycles/op             3314 hw_instr/op            0 B/op          0 allocs/op

If you want to benchmark tracepoints (i.e. those listed by perf list or cat /sys/kernel/debug/tracing/available_events), you can use the BenchmarkTracepoints helper:

func BenchmarkBenchmarkTracepoints(b *testing.B) {
	tracepoints := []string{
		"syscalls:sys_enter_getrusage",
	}
	BenchmarkTracepoints(
		b,
		func(b *testing.B) {
			// Start at 0 so the body runs exactly b.N times.
			for n := 0; n < b.N; n++ {
				unix.Getrusage(0, &unix.Rusage{})
			}
		},
		BenchLock|BenchStrict,
		tracepoints...,
	)
}

$ go test -bench=.
goos: linux
goarch: amd64
pkg: github.com/hodgesds/perf-utils
BenchmarkProfiler-8                              1983320               596 ns/op              32 B/op          1 allocs/op
BenchmarkCPUCycles-8                                2335            484068 ns/op              32 B/op          1 allocs/op
BenchmarkThreadLocking-8                        253319848                4.70 ns/op            0 B/op          0 allocs/op
BenchmarkRunBenchmarks-8                         1906320               627 ns/op              1023 hw_cycles/op       3007 hw_instr/op
BenchmarkRunBenchmarksLocked-8                   1903527               632 ns/op              1025 hw_cycles/op       3007 hw_instr/op
BenchmarkBenchmarkTracepointsLocked-8             986607              1221 ns/op                 2.00 syscalls:sys_enter_getrusage/op          0 B/op          0 allocs/op
BenchmarkBenchmarkTracepoints-8                   906022              1258 ns/op                 2.00 syscalls:sys_enter_getrusage/op          0 B/op          0 allocs/op

BPF Support

BPF is supported by using the BPFProfiler which is available via the ProfileTracepoint function. To use BPF you need to create the BPF program and then call AttachBPF with the file descriptor of the BPF program.

Misc

Originally I set out to use go generate to build Go structs that were compatible with perf, and I found a really good article on how to do so. Eventually, after digging through some of the /x/sys/unix code, I found pretty much what I needed. However, if you are interested in interacting with the kernel, I think the article is a worthwhile read.

Documentation

Index

Constants

const (
	// AllCacheProfilers is used to try to configure all cache profilers.
	AllCacheProfilers          CacheProfilerType = 0
	L1DataReadHitProfiler      CacheProfilerType = 1 << iota
	L1DataReadMissProfiler     CacheProfilerType = 1 << iota
	L1DataWriteHitProfiler     CacheProfilerType = 1 << iota
	L1InstrReadMissProfiler    CacheProfilerType = 1 << iota
	L1InstrReadHitProfiler     CacheProfilerType = 1 << iota
	LLReadHitProfiler          CacheProfilerType = 1 << iota
	LLReadMissProfiler         CacheProfilerType = 1 << iota
	LLWriteHitProfiler         CacheProfilerType = 1 << iota
	LLWriteMissProfiler        CacheProfilerType = 1 << iota
	DataTLBReadHitProfiler     CacheProfilerType = 1 << iota
	DataTLBReadMissProfiler    CacheProfilerType = 1 << iota
	DataTLBWriteHitProfiler    CacheProfilerType = 1 << iota
	DataTLBWriteMissProfiler   CacheProfilerType = 1 << iota
	InstrTLBReadHitProfiler    CacheProfilerType = 1 << iota
	InstrTLBReadMissProfiler   CacheProfilerType = 1 << iota
	BPUReadHitProfiler         CacheProfilerType = 1 << iota
	BPUReadMissProfiler        CacheProfilerType = 1 << iota
	NodeCacheReadHitProfiler   CacheProfilerType = 1 << iota
	NodeCacheReadMissProfiler  CacheProfilerType = 1 << iota
	NodeCacheWriteHitProfiler  CacheProfilerType = 1 << iota
	NodeCacheWriteMissProfiler CacheProfilerType = 1 << iota

	// L1DataReadHit is a constant...
	L1DataReadHit = (unix.PERF_COUNT_HW_CACHE_L1D) | (unix.PERF_COUNT_HW_CACHE_OP_READ << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16)
	// L1DataReadMiss is a constant...
	L1DataReadMiss = (unix.PERF_COUNT_HW_CACHE_L1D) | (unix.PERF_COUNT_HW_CACHE_OP_READ << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_MISS << 16)
	// L1DataWriteHit is a constant...
	L1DataWriteHit = (unix.PERF_COUNT_HW_CACHE_L1D) | (unix.PERF_COUNT_HW_CACHE_OP_WRITE << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16)
	// L1InstrReadMiss is a constant...
	L1InstrReadMiss = (unix.PERF_COUNT_HW_CACHE_L1I) | (unix.PERF_COUNT_HW_CACHE_OP_READ << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_MISS << 16)
	// L1InstrReadHit is a constant...
	L1InstrReadHit = (unix.PERF_COUNT_HW_CACHE_L1I) | (unix.PERF_COUNT_HW_CACHE_OP_READ << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16)

	// LLReadHit is a constant...
	LLReadHit = (unix.PERF_COUNT_HW_CACHE_LL) | (unix.PERF_COUNT_HW_CACHE_OP_READ << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16)
	// LLReadMiss is a constant...
	LLReadMiss = (unix.PERF_COUNT_HW_CACHE_LL) | (unix.PERF_COUNT_HW_CACHE_OP_READ << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_MISS << 16)
	// LLWriteHit is a constant...
	LLWriteHit = (unix.PERF_COUNT_HW_CACHE_LL) | (unix.PERF_COUNT_HW_CACHE_OP_WRITE << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16)
	// LLWriteMiss is a constant...
	LLWriteMiss = (unix.PERF_COUNT_HW_CACHE_LL) | (unix.PERF_COUNT_HW_CACHE_OP_WRITE << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_MISS << 16)

	// DataTLBReadHit is a constant...
	DataTLBReadHit = (unix.PERF_COUNT_HW_CACHE_DTLB) | (unix.PERF_COUNT_HW_CACHE_OP_READ << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16)
	// DataTLBReadMiss is a constant...
	DataTLBReadMiss = (unix.PERF_COUNT_HW_CACHE_DTLB) | (unix.PERF_COUNT_HW_CACHE_OP_READ << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_MISS << 16)
	// DataTLBWriteHit is a constant...
	DataTLBWriteHit = (unix.PERF_COUNT_HW_CACHE_DTLB) | (unix.PERF_COUNT_HW_CACHE_OP_WRITE << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16)
	// DataTLBWriteMiss is a constant...
	DataTLBWriteMiss = (unix.PERF_COUNT_HW_CACHE_DTLB) | (unix.PERF_COUNT_HW_CACHE_OP_WRITE << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_MISS << 16)

	// InstrTLBReadHit is a constant...
	InstrTLBReadHit = (unix.PERF_COUNT_HW_CACHE_ITLB) | (unix.PERF_COUNT_HW_CACHE_OP_READ << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16)
	// InstrTLBReadMiss is a constant...
	InstrTLBReadMiss = (unix.PERF_COUNT_HW_CACHE_ITLB) | (unix.PERF_COUNT_HW_CACHE_OP_READ << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_MISS << 16)

	// BPUReadHit is a constant...
	BPUReadHit = (unix.PERF_COUNT_HW_CACHE_BPU) | (unix.PERF_COUNT_HW_CACHE_OP_READ << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16)
	// BPUReadMiss is a constant...
	BPUReadMiss = (unix.PERF_COUNT_HW_CACHE_BPU) | (unix.PERF_COUNT_HW_CACHE_OP_READ << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_MISS << 16)

	// NodeCacheReadHit is a constant...
	NodeCacheReadHit = (unix.PERF_COUNT_HW_CACHE_NODE) | (unix.PERF_COUNT_HW_CACHE_OP_READ << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16)
	// NodeCacheReadMiss is a constant...
	NodeCacheReadMiss = (unix.PERF_COUNT_HW_CACHE_NODE) | (unix.PERF_COUNT_HW_CACHE_OP_READ << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_MISS << 16)
	// NodeCacheWriteHit is a constant...
	NodeCacheWriteHit = (unix.PERF_COUNT_HW_CACHE_NODE) | (unix.PERF_COUNT_HW_CACHE_OP_WRITE << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16)
	// NodeCacheWriteMiss is a constant...
	NodeCacheWriteMiss = (unix.PERF_COUNT_HW_CACHE_NODE) | (unix.PERF_COUNT_HW_CACHE_OP_WRITE << 8) | (unix.PERF_COUNT_HW_CACHE_RESULT_MISS << 16)
)
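
Each cache constant above packs three kernel enum values into one config word: the cache id in the low byte, the operation shifted left by 8, and the result shifted left by 16. The following minimal sketch reproduces that packing with local copies of the kernel's perf_hw_cache_* values, so it does not depend on golang.org/x/sys/unix; cacheConfig and the lower-case constant names are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// Local copies of the kernel's perf_hw_cache_* enum values.
const (
	cacheL1D = 0 // PERF_COUNT_HW_CACHE_L1D
	cacheLL  = 2 // PERF_COUNT_HW_CACHE_LL

	opRead  = 0 // PERF_COUNT_HW_CACHE_OP_READ
	opWrite = 1 // PERF_COUNT_HW_CACHE_OP_WRITE

	resultAccess = 0 // PERF_COUNT_HW_CACHE_RESULT_ACCESS
	resultMiss   = 1 // PERF_COUNT_HW_CACHE_RESULT_MISS
)

// cacheConfig packs a cache id, an operation and a result into one config
// value: cache | (op << 8) | (result << 16).
func cacheConfig(cache, op, result uint64) uint64 {
	return cache | op<<8 | result<<16
}

func main() {
	fmt.Printf("L1DataReadMiss = %#x\n", cacheConfig(cacheL1D, opRead, resultMiss))
	fmt.Printf("LLWriteHit     = %#x\n", cacheConfig(cacheLL, opWrite, resultAccess))
}
```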
const (
	// DebugFS is the filesystem type for debugfs.
	DebugFS = "debugfs"

	// TraceFS is the filesystem type for tracefs.
	TraceFS = "tracefs"

	// ProcMounts is the mount point for file systems in procfs.
	ProcMounts = "/proc/mounts"

	// PerfMaxStack is the procfs file that sets the maximum stack depth for perf events.
	PerfMaxStack = "/proc/sys/kernel/perf_event_max_stack"

	// PerfMaxContexts is the procfs file that contains the max perf contexts per stack.
	PerfMaxContexts = "/proc/sys/kernel/perf_event_max_contexts_per_stack"

	// SyscallsDir is a constant of the default tracing event syscalls directory.
	SyscallsDir = "/sys/kernel/debug/tracing/events/syscalls/"

	// TracingDir is a constant of the default tracing directory.
	TracingDir = "/sys/kernel/debug/tracing"
)
const (
	// PERF_SAMPLE_IDENTIFIER is not defined in x/sys/unix.
	PERF_SAMPLE_IDENTIFIER = 1 << 16

	// PERF_IOC_FLAG_GROUP is not defined in x/sys/unix.
	PERF_IOC_FLAG_GROUP = 1 << 0
)
const (
	// MSRBaseDir is the base dir for MSRs.
	MSRBaseDir = "/dev/cpu"
)
const (
	// PERF_TYPE_TRACEPOINT is a kernel tracepoint.
	PERF_TYPE_TRACEPOINT = 2
)
const (
	PMUEventBaseDir = "/sys/bus/event_source/devices"
)

Variables

var (
	// ErrNoProfiler is returned when no profiler is available for profiling.
	ErrNoProfiler = fmt.Errorf("No profiler available")

	// ProfileValuePool is a sync.Pool of ProfileValue structs.
	ProfileValuePool = sync.Pool{
		New: func() interface{} {
			return &ProfileValue{}
		},
	}
)
var ErrNoLeader = fmt.Errorf("No leader defined")

ErrNoLeader is returned when a leader of a GroupProfiler is not defined.

var (
	// ErrNoMount is when there is no such mount.
	ErrNoMount = fmt.Errorf("no such mount")
)
var (
	// EventAttrSize is the size of a PerfEventAttr
	EventAttrSize = uint32(unsafe.Sizeof(unix.PerfEventAttr{}))
)

Functions

func AlignmentFaultsEventAttr added in v0.0.4

func AlignmentFaultsEventAttr() unix.PerfEventAttr

AlignmentFaultsEventAttr returns a unix.PerfEventAttr configured for AlignmentFaults.

func AvailableEvents added in v0.0.2

func AvailableEvents() (map[string][]string, error)

AvailableEvents returns a mapping of available subsystems and their corresponding list of available events.
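
As an illustration of the returned shape, the sketch below groups lines of the form subsystem:event, as found in the tracing available_events file, into a map keyed by subsystem. parseEvents is a hypothetical helper and is not claimed to be how the package implements AvailableEvents.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEvents groups "subsystem:event" lines by subsystem. Blank or
// malformed lines are skipped.
func parseEvents(data string) map[string][]string {
	events := map[string][]string{}
	for _, line := range strings.Split(data, "\n") {
		subsystem, event, ok := strings.Cut(strings.TrimSpace(line), ":")
		if !ok {
			continue
		}
		events[subsystem] = append(events[subsystem], event)
	}
	return events
}

func main() {
	sample := "syscalls:sys_enter_getrusage\nsyscalls:sys_exit_getrusage\nsched:sched_switch\n"
	for subsystem, names := range parseEvents(sample) {
		fmt.Println(subsystem, names)
	}
}
```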

func AvailablePMUs added in v0.3.0

func AvailablePMUs() (map[string]int, error)

AvailablePMUs returns a mapping of available PMUs from /sys/bus/event_source/devices to the PMU event type (number).

func AvailableSubsystems added in v0.0.7

func AvailableSubsystems() ([]string, error)

AvailableSubsystems returns a slice of available subsystems.

func AvailableTracers added in v0.0.2

func AvailableTracers() ([]string, error)

AvailableTracers returns the list of available tracers.

func BPUEventAttr added in v0.0.4

func BPUEventAttr(op, result int) unix.PerfEventAttr

BPUEventAttr returns a unix.PerfEventAttr configured for BPU events.

func BenchmarkTracepoints added in v0.1.1

func BenchmarkTracepoints(
	b *testing.B,
	f func(b *testing.B),
	options BenchOpt,
	tracepoints ...string,
)

BenchmarkTracepoints runs a benchmark and counts the given tracepoints.

func BusCyclesEventAttr added in v0.0.4

func BusCyclesEventAttr() unix.PerfEventAttr

BusCyclesEventAttr returns a unix.PerfEventAttr configured for BusCycles.

func CPUClockEventAttr added in v0.0.4

func CPUClockEventAttr() unix.PerfEventAttr

CPUClockEventAttr returns a unix.PerfEventAttr configured for CPUClock.

func CPUCyclesEventAttr added in v0.0.4

func CPUCyclesEventAttr() unix.PerfEventAttr

CPUCyclesEventAttr returns a unix.PerfEventAttr configured for CPUCycles.

func CPUInstructionsEventAttr added in v0.0.4

func CPUInstructionsEventAttr() unix.PerfEventAttr

CPUInstructionsEventAttr returns a unix.PerfEventAttr configured for CPUInstructions.

func CPUMigrationsEventAttr added in v0.0.4

func CPUMigrationsEventAttr() unix.PerfEventAttr

CPUMigrationsEventAttr returns a unix.PerfEventAttr configured for CPUMigrations.

func CPURefCyclesEventAttr added in v0.0.4

func CPURefCyclesEventAttr() unix.PerfEventAttr

CPURefCyclesEventAttr returns a unix.PerfEventAttr configured for CPURefCycles.

func CPUTaskClockEventAttr added in v0.0.4

func CPUTaskClockEventAttr() unix.PerfEventAttr

CPUTaskClockEventAttr returns a unix.PerfEventAttr configured for CPUTaskClock.

func CacheMissEventAttr added in v0.0.4

func CacheMissEventAttr() unix.PerfEventAttr

CacheMissEventAttr returns a unix.PerfEventAttr configured for CacheMisses.

func CacheRefEventAttr added in v0.0.4

func CacheRefEventAttr() unix.PerfEventAttr

CacheRefEventAttr returns a unix.PerfEventAttr configured for CacheRef.

func ContextSwitchesEventAttr added in v0.0.4

func ContextSwitchesEventAttr() unix.PerfEventAttr

ContextSwitchesEventAttr returns a unix.PerfEventAttr configured for ContextSwitches.

func CurrentTracer added in v0.0.2

func CurrentTracer() (string, error)

CurrentTracer returns the current tracer.

func DataTLBEventAttr added in v0.0.4

func DataTLBEventAttr(op, result int) unix.PerfEventAttr

DataTLBEventAttr returns a unix.PerfEventAttr configured for DataTLB.

func DebugFSMount

func DebugFSMount() (string, error)

DebugFSMount returns the first found mount point of a debugfs file system.

func EmulationFaultsEventAttr added in v0.0.4

func EmulationFaultsEventAttr() unix.PerfEventAttr

EmulationFaultsEventAttr returns a unix.PerfEventAttr configured for EmulationFaults.

func EventAttrString added in v0.0.9

func EventAttrString(eventAttr *unix.PerfEventAttr) string

EventAttrString returns a short string representation of a unix.PerfEventAttr.

func GetFSMount

func GetFSMount(mountType string) ([]string, error)

GetFSMount is a helper function to get a mount file system type.
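
A lookup like this can be done by scanning /proc/mounts, where each line has the form "device mountpoint fstype options dump pass". The sketch below shows the idea on an in-memory sample; mountPoints is a hypothetical helper and not necessarily how GetFSMount is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// mountPoints returns the mount points of every entry in mounts whose
// filesystem type (third field) equals fsType.
func mountPoints(mounts, fsType string) []string {
	var points []string
	for _, line := range strings.Split(mounts, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 3 && fields[2] == fsType {
			points = append(points, fields[1])
		}
	}
	return points
}

func main() {
	sample := "proc /proc proc rw,nosuid 0 0\n" +
		"debugfs /sys/kernel/debug debugfs rw,relatime 0 0\n" +
		"tracefs /sys/kernel/tracing tracefs rw,relatime 0 0\n"
	fmt.Println(mountPoints(sample, "debugfs"))
	fmt.Println(mountPoints(sample, "tracefs"))
}
```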

func GetTracepointConfig added in v0.0.7

func GetTracepointConfig(subsystem, event string) (uint64, error)

GetTracepointConfig is used to get the configuration for a trace event.
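
The perf_event_open man page describes the config value for a tracepoint as the number stored in the id file under tracing/events/<subsystem>/<event>. The following sketch builds that path and parses the file; tracepointIDPath and parseTracepointID are hypothetical helpers, and the tracing directory used in main is the default from the TracingDir constant.

```go
package main

import (
	"fmt"
	"os"
	"path"
	"strconv"
	"strings"
)

// tracepointIDPath returns the location of a tracepoint's id file.
func tracepointIDPath(tracingDir, subsystem, event string) string {
	return path.Join(tracingDir, "events", subsystem, event, "id")
}

// parseTracepointID converts the file content (a decimal number followed by
// a newline) into the value used as the perf event config.
func parseTracepointID(raw string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
}

func main() {
	p := tracepointIDPath("/sys/kernel/debug/tracing", "syscalls", "sys_enter_getrusage")
	raw, err := os.ReadFile(p)
	if err != nil {
		// Reading debugfs usually requires elevated privileges.
		fmt.Println("cannot read tracepoint id:", err)
		return
	}
	id, err := parseTracepointID(string(raw))
	if err != nil {
		fmt.Println("unexpected id content:", err)
		return
	}
	fmt.Println("tracepoint config:", id)
}
```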

func InstructionTLBEventAttr added in v0.0.4

func InstructionTLBEventAttr(op, result int) unix.PerfEventAttr

InstructionTLBEventAttr returns a unix.PerfEventAttr configured for InstructionTLB.

func L1DataEventAttr added in v0.0.4

func L1DataEventAttr(op, result int) unix.PerfEventAttr

L1DataEventAttr returns a unix.PerfEventAttr configured for L1Data.

func L1InstructionsEventAttr added in v0.0.4

func L1InstructionsEventAttr(op, result int) unix.PerfEventAttr

L1InstructionsEventAttr returns a unix.PerfEventAttr configured for L1Instructions.

func LLCacheEventAttr added in v0.0.4

func LLCacheEventAttr(op, result int) unix.PerfEventAttr

LLCacheEventAttr returns a unix.PerfEventAttr configured for LLCache.

func LockThread added in v0.0.8

func LockThread(core int) (func(), error)

LockThread locks a goroutine to an OS thread and then sets the affinity of the thread to a processor core.

func MSRPaths added in v0.3.0

func MSRPaths() ([]string, error)

MSRPaths returns the set of MSR paths.

func MajorPageFaultsEventAttr added in v0.0.4

func MajorPageFaultsEventAttr() unix.PerfEventAttr

MajorPageFaultsEventAttr returns a unix.PerfEventAttr configured for MajorPageFaults.

func MaxOpenFiles added in v0.5.1

func MaxOpenFiles() (uint64, error)

MaxOpenFiles returns the RLIMIT_NOFILE from getrlimit.

func MinorPageFaultsEventAttr added in v0.0.4

func MinorPageFaultsEventAttr() unix.PerfEventAttr

MinorPageFaultsEventAttr returns a unix.PerfEventAttr configured for MinorPageFaults.

func NodeCacheEventAttr added in v0.0.4

func NodeCacheEventAttr(op, result int) unix.PerfEventAttr

NodeCacheEventAttr returns a unix.PerfEventAttr configured for NUMA cache operations.

func PageFaultsEventAttr added in v0.0.4

func PageFaultsEventAttr() unix.PerfEventAttr

PageFaultsEventAttr returns a unix.PerfEventAttr configured for PageFaults.

func RunBenchmarks added in v0.0.9

func RunBenchmarks(
	b *testing.B,
	f func(b *testing.B),
	options BenchOpt,
	eventAttrs ...unix.PerfEventAttr,
)

RunBenchmarks runs a series of benchmarks for a set of PerfEventAttrs.

func StalledBackendCyclesEventAttr added in v0.0.4

func StalledBackendCyclesEventAttr() unix.PerfEventAttr

StalledBackendCyclesEventAttr returns a unix.PerfEventAttr configured for StalledBackendCycles.

func StalledFrontendCyclesEventAttr added in v0.0.4

func StalledFrontendCyclesEventAttr() unix.PerfEventAttr

StalledFrontendCyclesEventAttr returns a unix.PerfEventAttr configured for StalledFrontendCycles.

func TraceFSMount

func TraceFSMount() (string, error)

TraceFSMount returns the first found mount point of a tracefs file system.

func TracepointEventAttr added in v0.0.7

func TracepointEventAttr(subsystem, event string) (*unix.PerfEventAttr, error)

TracepointEventAttr is used to return a PerfEventAttr for a trace event.

Types

type BPFProfiler added in v0.0.3

type BPFProfiler interface {
	Profiler
	AttachBPF(int) error
}

BPFProfiler is a Profiler that allows attaching a Berkeley Packet Filter (BPF) program to an existing kprobe tracepoint event. You need CAP_SYS_ADMIN privileges to use this interface. See: https://lwn.net/Articles/683504/

func ProfileTracepoint added in v0.0.3

func ProfileTracepoint(subsystem, event string, pid, cpu int, opts ...int) (BPFProfiler, error)

ProfileTracepoint is used to profile a kernel tracepoint event for a specific PID. Events can be listed with `perf list` for Tracepoint Events or in the /sys/kernel/debug/tracing/events directory, with the subsystem being the directory and the event being the subdirectory.

type BenchOpt added in v0.2.0

type BenchOpt uint8

BenchOpt is a benchmark option.

const (
	// BenchLock is used to lock a benchmark to a goroutine.
	BenchLock BenchOpt = 1 << iota
	// BenchStrict is used to fail a benchmark if one or more events cannot be
	// profiled.
	BenchStrict
)

type CacheProfile

type CacheProfile struct {
	L1DataReadHit      *uint64 `json:"l1_data_read_hit,omitempty"`
	L1DataReadMiss     *uint64 `json:"l1_data_read_miss,omitempty"`
	L1DataWriteHit     *uint64 `json:"l1_data_write_hit,omitempty"`
	L1InstrReadMiss    *uint64 `json:"l1_instr_read_miss,omitempty"`
	LastLevelReadHit   *uint64 `json:"last_level_read_hit,omitempty"`
	LastLevelReadMiss  *uint64 `json:"last_level_read_miss,omitempty"`
	LastLevelWriteHit  *uint64 `json:"last_level_write_hit,omitempty"`
	LastLevelWriteMiss *uint64 `json:"last_level_write_miss,omitempty"`
	DataTLBReadHit     *uint64 `json:"data_tlb_read_hit,omitempty"`
	DataTLBReadMiss    *uint64 `json:"data_tlb_read_miss,omitempty"`
	DataTLBWriteHit    *uint64 `json:"data_tlb_write_hit,omitempty"`
	DataTLBWriteMiss   *uint64 `json:"data_tlb_write_miss,omitempty"`
	InstrTLBReadHit    *uint64 `json:"instr_tlb_read_hit,omitempty"`
	InstrTLBReadMiss   *uint64 `json:"instr_tlb_read_miss,omitempty"`
	BPUReadHit         *uint64 `json:"bpu_read_hit,omitempty"`
	BPUReadMiss        *uint64 `json:"bpu_read_miss,omitempty"`
	NodeReadHit        *uint64 `json:"node_read_hit,omitempty"`
	NodeReadMiss       *uint64 `json:"node_read_miss,omitempty"`
	NodeWriteHit       *uint64 `json:"node_write_hit,omitempty"`
	NodeWriteMiss      *uint64 `json:"node_write_miss,omitempty"`
	TimeEnabled        *uint64 `json:"time_enabled,omitempty"`
	TimeRunning        *uint64 `json:"time_running,omitempty"`
}

CacheProfile is returned by a CacheProfiler.

func (*CacheProfile) Reset added in v0.5.0

func (p *CacheProfile) Reset()

Reset sets all values to defaults and will nil any pointers.

type CacheProfiler

type CacheProfiler interface {
	Start() error
	Reset() error
	Stop() error
	Close() error
	Profile(*CacheProfile) error
	HasProfilers() bool
}

CacheProfiler is a cache profiler.

func NewCacheProfiler

func NewCacheProfiler(pid, cpu int, profilerSet CacheProfilerType, opts ...int) (CacheProfiler, error)

NewCacheProfiler returns a new cache profiler.

type CacheProfilerType added in v0.5.0

type CacheProfilerType int
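
The CacheProfilerType constants are declared with 1 << iota, so they appear to be bit flags that can be combined with bitwise OR when building a profilerSet. The sketch below mirrors that declaration pattern with a local, hypothetical profilerType. Note that iota is already 1 on the second line of the const block, so the first flag has the value 2. Treating a zero set as "everything" follows the AllCacheProfilers comment, but the exact check the package performs is an assumption here.

```go
package main

import "fmt"

// profilerType mirrors the declaration pattern of CacheProfilerType.
type profilerType int

const (
	allProfilers profilerType = 0
	readHit      profilerType = 1 << iota // iota is 1 here, so the value is 2
	readMiss     profilerType = 1 << iota // 4
	writeHit     profilerType = 1 << iota // 8
)

// enabled reports whether flag is selected by set. A zero set selects every
// profiler (assumed semantics of an "all" value).
func enabled(set, flag profilerType) bool {
	return set == allProfilers || set&flag != 0
}

func main() {
	set := readHit | writeHit
	fmt.Println("readHit value:", int(readHit))
	fmt.Println("readMiss enabled:", enabled(set, readMiss))
	fmt.Println("writeHit enabled:", enabled(set, writeHit))
	fmt.Println("readMiss enabled for all:", enabled(allProfilers, readMiss))
}
```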

type GroupProfileValue added in v0.0.4

type GroupProfileValue struct {
	Events      uint64
	TimeEnabled uint64
	TimeRunning uint64
	Values      []uint64
}

GroupProfileValue is returned from a GroupProfiler.

type GroupProfiler added in v0.0.4

type GroupProfiler interface {
	Start() error
	Reset() error
	Stop() error
	Close() error
	HasProfilers() bool
	Profile(*GroupProfileValue) error
}

GroupProfiler is used to setup a group profiler.

func NewGroupProfiler added in v0.0.4

func NewGroupProfiler(pid, cpu, opts int, eventAttrs ...unix.PerfEventAttr) (GroupProfiler, error)

NewGroupProfiler returns a GroupProfiler.

type HardwareProfile

type HardwareProfile struct {
	CPUCycles             *uint64 `json:"cpu_cycles,omitempty"`
	Instructions          *uint64 `json:"instructions,omitempty"`
	CacheRefs             *uint64 `json:"cache_refs,omitempty"`
	CacheMisses           *uint64 `json:"cache_misses,omitempty"`
	BranchInstr           *uint64 `json:"branch_instr,omitempty"`
	BranchMisses          *uint64 `json:"branch_misses,omitempty"`
	BusCycles             *uint64 `json:"bus_cycles,omitempty"`
	StalledCyclesFrontend *uint64 `json:"stalled_cycles_frontend,omitempty"`
	StalledCyclesBackend  *uint64 `json:"stalled_cycles_backend,omitempty"`
	RefCPUCycles          *uint64 `json:"ref_cpu_cycles,omitempty"`
	TimeEnabled           *uint64 `json:"time_enabled,omitempty"`
	TimeRunning           *uint64 `json:"time_running,omitempty"`
}

HardwareProfile is returned by a HardwareProfiler. Depending on kernel configuration some fields may return nil.

func (*HardwareProfile) Reset added in v0.5.0

func (p *HardwareProfile) Reset()

Reset sets all values to defaults and will nil any pointers.

type HardwareProfiler

type HardwareProfiler interface {
	Start() error
	Reset() error
	Stop() error
	Close() error
	Profile(*HardwareProfile) error
	HasProfilers() bool
}

HardwareProfiler is a hardware profiler.

func NewHardwareProfiler

func NewHardwareProfiler(pid, cpu int, profilerSet HardwareProfilerType, opts ...int) (HardwareProfiler, error)

NewHardwareProfiler returns a new hardware profiler.

type HardwareProfilerType added in v0.5.0

type HardwareProfilerType int

const (
	AllHardwareProfilers          HardwareProfilerType = 0
	CpuCyclesProfiler             HardwareProfilerType = 1 << iota
	CpuInstrProfiler              HardwareProfilerType = 1 << iota
	CacheRefProfiler              HardwareProfilerType = 1 << iota
	CacheMissesProfiler           HardwareProfilerType = 1 << iota
	BranchInstrProfiler           HardwareProfilerType = 1 << iota
	BranchMissesProfiler          HardwareProfilerType = 1 << iota
	BusCyclesProfiler             HardwareProfilerType = 1 << iota
	StalledCyclesBackendProfiler  HardwareProfilerType = 1 << iota
	StalledCyclesFrontendProfiler HardwareProfilerType = 1 << iota
	RefCpuCyclesProfiler          HardwareProfilerType = 1 << iota
)

type MSR added in v0.3.0

type MSR struct {
	// contains filtered or unexported fields
}

MSR represents a Model Specific Register.

func MSRs added in v0.3.0

func MSRs(flag int, perm os.FileMode, onErr func(error)) []*MSR

MSRs attempts to return all available MSRs.

func NewMSR added in v0.3.0

func NewMSR(path string, flag int, perm os.FileMode) (*MSR, error)

NewMSR returns an MSR.

func (*MSR) Close added in v0.3.0

func (m *MSR) Close() error

Close is used to close the MSR.

func (*MSR) Read added in v0.3.0

func (m *MSR) Read(off int64, buf []byte) error

Read is used to read an MSR value.
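
Read fills buf with raw bytes, so the caller still has to decode them. On x86 an MSR is 64 bits wide, so a minimal sketch of the decoding step, assuming an 8-byte buffer filled in little-endian order, could look like this. decodeMSR is a hypothetical helper and not part of this package.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeMSR interprets the first 8 bytes of buf as a little-endian 64-bit
// register value. The byte order is an assumption that holds on x86.
func decodeMSR(buf []byte) (uint64, error) {
	if len(buf) < 8 {
		return 0, errors.New("msr: need at least 8 bytes")
	}
	return binary.LittleEndian.Uint64(buf[:8]), nil
}

func main() {
	// Stand-in for a buffer previously filled by (*MSR).Read.
	buf := []byte{0x34, 0x12, 0, 0, 0, 0, 0, 0}
	value, err := decodeMSR(buf)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("MSR value: %#x\n", value)
}
```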

type ProfileValue

type ProfileValue struct {
	Value       uint64
	TimeEnabled uint64
	TimeRunning uint64
}

ProfileValue is a value returned by a profiler.
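
TimeEnabled and TimeRunning matter when the kernel multiplexes counters: if an event was only scheduled on the hardware for part of the time it was enabled, the usual perf convention is to estimate the full count as Value * TimeEnabled / TimeRunning. The sketch below applies that convention to a local mirror of the struct; profileValue and scaled are hypothetical names, and whether the package already applies such scaling is not stated here.

```go
package main

import "fmt"

// profileValue mirrors the fields of ProfileValue for this sketch.
type profileValue struct {
	Value       uint64
	TimeEnabled uint64
	TimeRunning uint64
}

// scaled estimates the count the event would have reached had it been
// running for the whole time it was enabled. It returns 0 if the event
// never ran.
func scaled(v profileValue) float64 {
	if v.TimeRunning == 0 {
		return 0
	}
	return float64(v.Value) * float64(v.TimeEnabled) / float64(v.TimeRunning)
}

func main() {
	// The event was on the hardware for half of the enabled time.
	v := profileValue{Value: 1000, TimeEnabled: 200, TimeRunning: 100}
	fmt.Println("raw:", v.Value, "scaled estimate:", scaled(v))
}
```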

func AlignmentFaults

func AlignmentFaults(f func() error) (*ProfileValue, error)

AlignmentFaults is used to profile a function and return the number of alignment faults. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func BPU added in v0.0.2

func BPU(op, result int, f func() error) (*ProfileValue, error)

BPU is used to profile a function for the Branch Predictor Unit. Use PERF_COUNT_HW_CACHE_OP_READ, PERF_COUNT_HW_CACHE_OP_WRITE, or PERF_COUNT_HW_CACHE_OP_PREFETCH for the op and PERF_COUNT_HW_CACHE_RESULT_ACCESS or PERF_COUNT_HW_CACHE_RESULT_MISS for the result. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func BusCycles

func BusCycles(f func() error) (*ProfileValue, error)

BusCycles is used to profile a function and return the number of bus cycles. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func CPUClock

func CPUClock(f func() error) (*ProfileValue, error)

CPUClock is used to profile a function and return the CPU clock timer. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func CPUCycles

func CPUCycles(f func() error) (*ProfileValue, error)

CPUCycles is used to profile a function and return the number of CPU cycles. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func CPUInstructions

func CPUInstructions(f func() error) (*ProfileValue, error)

CPUInstructions is used to profile a function and return the number of CPU instructions. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func CPUMigrations

func CPUMigrations(f func() error) (*ProfileValue, error)

CPUMigrations is used to profile a function and return the number of times the thread has been migrated to a new CPU. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func CPURefCycles

func CPURefCycles(f func() error) (*ProfileValue, error)

CPURefCycles is used to profile a function and return the number of CPU reference cycles, which are not affected by frequency scaling. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func CPUTaskClock

func CPUTaskClock(f func() error) (*ProfileValue, error)

CPUTaskClock is used to profile a function and return the CPU clock timer for the running task. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func CacheMiss

func CacheMiss(f func() error) (*ProfileValue, error)

CacheMiss is used to profile a function and return the number of cache misses. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func CacheRef

func CacheRef(f func() error) (*ProfileValue, error)

CacheRef is used to profile a function and return the number of cache references. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func ContextSwitches

func ContextSwitches(f func() error) (*ProfileValue, error)

ContextSwitches is used to profile a function and return the number of context switches. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func DataTLB added in v0.0.2

func DataTLB(op, result int, f func() error) (*ProfileValue, error)

DataTLB is used to profile the data TLB. Use PERF_COUNT_HW_CACHE_OP_READ, PERF_COUNT_HW_CACHE_OP_WRITE, or PERF_COUNT_HW_CACHE_OP_PREFETCH for the op and PERF_COUNT_HW_CACHE_RESULT_ACCESS or PERF_COUNT_HW_CACHE_RESULT_MISS for the result. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func EmulationFaults

func EmulationFaults(f func() error) (*ProfileValue, error)

EmulationFaults is used to profile a function and return the number of emulation faults. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func InstructionTLB added in v0.0.2

func InstructionTLB(op, result int, f func() error) (*ProfileValue, error)

InstructionTLB is used to profile the instruction TLB. Use PERF_COUNT_HW_CACHE_OP_READ, PERF_COUNT_HW_CACHE_OP_WRITE, or PERF_COUNT_HW_CACHE_OP_PREFETCH for the op and PERF_COUNT_HW_CACHE_RESULT_ACCESS or PERF_COUNT_HW_CACHE_RESULT_MISS for the result. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func L1Data added in v0.0.2

func L1Data(op, result int, f func() error) (*ProfileValue, error)

L1Data is used to profile a function and return the number of L1 data cache events selected by op and result. Use PERF_COUNT_HW_CACHE_OP_READ, PERF_COUNT_HW_CACHE_OP_WRITE, or PERF_COUNT_HW_CACHE_OP_PREFETCH for op and PERF_COUNT_HW_CACHE_RESULT_ACCESS or PERF_COUNT_HW_CACHE_RESULT_MISS for result. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func L1Instructions added in v0.0.2

func L1Instructions(op, result int, f func() error) (*ProfileValue, error)

L1Instructions is used to profile a function for the L1 instruction cache. Use PERF_COUNT_HW_CACHE_OP_READ, PERF_COUNT_HW_CACHE_OP_WRITE, or PERF_COUNT_HW_CACHE_OP_PREFETCH for op and PERF_COUNT_HW_CACHE_RESULT_ACCESS or PERF_COUNT_HW_CACHE_RESULT_MISS for result. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func LLCache added in v0.0.2

func LLCache(op, result int, f func() error) (*ProfileValue, error)

LLCache is used to profile a function and return the number of last level cache events selected by op and result. Use PERF_COUNT_HW_CACHE_OP_READ, PERF_COUNT_HW_CACHE_OP_WRITE, or PERF_COUNT_HW_CACHE_OP_PREFETCH for op and PERF_COUNT_HW_CACHE_RESULT_ACCESS or PERF_COUNT_HW_CACHE_RESULT_MISS for result. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func MajorPageFaults

func MajorPageFaults(f func() error) (*ProfileValue, error)

MajorPageFaults is used to profile a function and return the number of major page faults. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func MinorPageFaults

func MinorPageFaults(f func() error) (*ProfileValue, error)

MinorPageFaults is used to profile a function and return the number of minor page faults. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func NodeCache added in v0.0.2

func NodeCache(op, result int, f func() error) (*ProfileValue, error)

NodeCache is used to profile a function for NUMA operations. Use PERF_COUNT_HW_CACHE_OP_READ, PERF_COUNT_HW_CACHE_OP_WRITE, or PERF_COUNT_HW_CACHE_OP_PREFETCH for op and PERF_COUNT_HW_CACHE_RESULT_ACCESS or PERF_COUNT_HW_CACHE_RESULT_MISS for result. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func PageFaults

func PageFaults(f func() error) (*ProfileValue, error)

PageFaults is used to profile a function and return the number of page faults. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func StalledBackendCycles

func StalledBackendCycles(f func() error) (*ProfileValue, error)

StalledBackendCycles is used to profile a function and return the number of stalled backend cycles. Note that it will call runtime.LockOSThread to ensure accurate profiling.

func StalledFrontendCycles

func StalledFrontendCycles(f func() error) (*ProfileValue, error)

StalledFrontendCycles is used to profile a function and return the number of stalled frontend cycles. Note that it will call runtime.LockOSThread to ensure accurate profiling.

type Profiler

type Profiler interface {
	Start() error
	Reset() error
	Stop() error
	Close() error
	Profile(*ProfileValue) error
}

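Profiler is the interface implemented by the single-event profilers returned by the constructors below. Start, Stop, and Reset control the underlying counter, Profile reads the current value into a ProfileValue, and Close releases the profiler.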
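
One plausible way to drive a Profiler is reset, start, run the workload, stop, then read the value. The sketch below illustrates that ordering against a local copy of the interface. `fakeProfiler`, `measure`, and the local `ProfileValue` field names are assumptions made so the example runs without perf_event_open; they are not this package's implementation.

```go
package main

import "fmt"

// ProfileValue mirrors the shape assumed for this package's ProfileValue;
// the field names are an assumption made for this sketch.
type ProfileValue struct {
	Value       uint64
	TimeEnabled uint64
	TimeRunning uint64
}

// Profiler is copied from the interface documented above.
type Profiler interface {
	Start() error
	Reset() error
	Stop() error
	Close() error
	Profile(*ProfileValue) error
}

// fakeProfiler is a hypothetical in-memory stand-in; it counts calls to tick
// while running instead of counting real hardware or software events.
type fakeProfiler struct {
	running bool
	count   uint64
}

func (p *fakeProfiler) Start() error { p.running = true; return nil }
func (p *fakeProfiler) Reset() error { p.count = 0; return nil }
func (p *fakeProfiler) Stop() error  { p.running = false; return nil }
func (p *fakeProfiler) Close() error { return nil }

func (p *fakeProfiler) Profile(v *ProfileValue) error {
	v.Value = p.count
	return nil
}

// tick simulates one counted event; it is ignored while the profiler is stopped.
func (p *fakeProfiler) tick() {
	if p.running {
		p.count++
	}
}

// measure is a hypothetical helper: it resets and starts p, runs work,
// stops p, and returns the value read by Profile.
func measure(p Profiler, work func()) (uint64, error) {
	if err := p.Reset(); err != nil {
		return 0, err
	}
	if err := p.Start(); err != nil {
		return 0, err
	}
	work()
	if err := p.Stop(); err != nil {
		return 0, err
	}
	var v ProfileValue
	if err := p.Profile(&v); err != nil {
		return 0, err
	}
	return v.Value, nil
}

func main() {
	fp := &fakeProfiler{}
	defer fp.Close()
	n, err := measure(fp, func() {
		for i := 0; i < 3; i++ {
			fp.tick()
		}
	})
	fmt.Println(n, err) // prints "3 <nil>"
}
```

With a real profiler from one of the constructors below, the same `measure` shape would apply, with Close deferred once the profiler is no longer needed.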

func NewAlignFaultsProfiler

func NewAlignFaultsProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewAlignFaultsProfiler returns a Profiler that profiles the number of alignment faults.

func NewBPUProfiler

func NewBPUProfiler(pid, cpu, op, result int, opts ...int) (Profiler, error)

NewBPUProfiler returns a Profiler that profiles the BPU (branch prediction unit).

func NewBranchInstrProfiler

func NewBranchInstrProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewBranchInstrProfiler returns a Profiler that profiles branch instructions.

func NewBranchMissesProfiler

func NewBranchMissesProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewBranchMissesProfiler returns a Profiler that profiles branch misses.

func NewBusCyclesProfiler

func NewBusCyclesProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewBusCyclesProfiler returns a Profiler that profiles bus cycles.

func NewCPUClockProfiler

func NewCPUClockProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewCPUClockProfiler returns a Profiler that profiles the CPU clock, a high-resolution per-CPU timer.

func NewCPUCycleProfiler

func NewCPUCycleProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewCPUCycleProfiler returns a Profiler that profiles CPU cycles.

func NewCPUMigrationsProfiler

func NewCPUMigrationsProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewCPUMigrationsProfiler returns a Profiler that profiles the number of times the process has migrated to a new CPU.

func NewCacheMissesProfiler

func NewCacheMissesProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewCacheMissesProfiler returns a Profiler that profiles cache misses.

func NewCacheRefProfiler

func NewCacheRefProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewCacheRefProfiler returns a Profiler that profiles cache references.

func NewCtxSwitchesProfiler

func NewCtxSwitchesProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewCtxSwitchesProfiler returns a Profiler that profiles the number of context switches.

func NewDataTLBProfiler

func NewDataTLBProfiler(pid, cpu, op, result int, opts ...int) (Profiler, error)

NewDataTLBProfiler returns a Profiler that profiles the data TLB.

func NewEmulationFaultsProfiler

func NewEmulationFaultsProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewEmulationFaultsProfiler returns a Profiler that profiles the number of emulation faults.

func NewInstrProfiler

func NewInstrProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewInstrProfiler returns a Profiler that profiles CPU instructions.

func NewInstrTLBProfiler

func NewInstrTLBProfiler(pid, cpu, op, result int, opts ...int) (Profiler, error)

NewInstrTLBProfiler returns a Profiler that profiles the instruction TLB.

func NewL1DataProfiler

func NewL1DataProfiler(pid, cpu, op, result int, opts ...int) (Profiler, error)

NewL1DataProfiler returns a Profiler that profiles the L1 data cache.

func NewL1InstrProfiler

func NewL1InstrProfiler(pid, cpu, op, result int, opts ...int) (Profiler, error)

NewL1InstrProfiler returns a Profiler that profiles the L1 instruction cache.

func NewLLCacheProfiler

func NewLLCacheProfiler(pid, cpu, op, result int, opts ...int) (Profiler, error)

NewLLCacheProfiler returns a Profiler that profiles the last level cache.

func NewMajorFaultsProfiler

func NewMajorFaultsProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewMajorFaultsProfiler returns a Profiler that profiles the number of major page faults.

func NewMinorFaultsProfiler

func NewMinorFaultsProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewMinorFaultsProfiler returns a Profiler that profiles the number of minor page faults.

func NewNodeCacheProfiler

func NewNodeCacheProfiler(pid, cpu, op, result int, opts ...int) (Profiler, error)

NewNodeCacheProfiler returns a Profiler that profiles node cache accesses.

func NewPageFaultProfiler

func NewPageFaultProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewPageFaultProfiler returns a Profiler that profiles the number of page faults.

func NewProfiler

func NewProfiler(profilerType uint32, config uint64, pid, cpu int, opts ...int) (Profiler, error)

NewProfiler creates a new hardware profiler. It does not support grouping.
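
For cache events, the perf_event_open man page describes the config value as the cache ID combined with the operation shifted left by 8 bits and the result shifted left by 16 bits. The sketch below computes such a value. The constant values are taken from the kernel's perf_event.h; the local names and the `cacheConfig` helper are hypothetical, and whether NewProfiler passes config through unchanged is an assumption.

```go
package main

import "fmt"

// Numeric values as listed in linux/perf_event.h; the Go names are local to
// this sketch and are not exported by this package.
const (
	cacheDTLB  uint64 = 3 // PERF_COUNT_HW_CACHE_DTLB
	opRead     uint64 = 0 // PERF_COUNT_HW_CACHE_OP_READ
	opWrite    uint64 = 1 // PERF_COUNT_HW_CACHE_OP_WRITE
	resultMiss uint64 = 1 // PERF_COUNT_HW_CACHE_RESULT_MISS
)

// cacheConfig is a hypothetical helper that packs a cache ID, operation, and
// result into one config word: id | (op << 8) | (result << 16).
func cacheConfig(cacheID, op, result uint64) uint64 {
	return cacheID | op<<8 | result<<16
}

func main() {
	// Data TLB read misses.
	fmt.Printf("%#x\n", cacheConfig(cacheDTLB, opRead, resultMiss)) // prints 0x10003
}
```

The constructors that take op and result arguments, such as NewDataTLBProfiler, presumably build a value of this form internally, so a hand-built config is only needed when calling NewProfiler directly.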

func NewRefCPUCyclesProfiler

func NewRefCPUCyclesProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewRefCPUCyclesProfiler returns a Profiler that profiles reference CPU cycles, which are not affected by frequency scaling.

func NewStalledCyclesBackProfiler

func NewStalledCyclesBackProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewStalledCyclesBackProfiler returns a Profiler that profiles stalled backend cycles.

func NewStalledCyclesFrontProfiler

func NewStalledCyclesFrontProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewStalledCyclesFrontProfiler returns a Profiler that profiles stalled frontend cycles.

func NewTaskClockProfiler

func NewTaskClockProfiler(pid, cpu int, opts ...int) (Profiler, error)

NewTaskClockProfiler returns a Profiler that profiles the clock count of the running task.

type SoftwareProfile

type SoftwareProfile struct {
	CPUClock        *uint64 `json:"cpu_clock,omitempty"`
	TaskClock       *uint64 `json:"task_clock,omitempty"`
	PageFaults      *uint64 `json:"page_faults,omitempty"`
	ContextSwitches *uint64 `json:"context_switches,omitempty"`
	CPUMigrations   *uint64 `json:"cpu_migrations,omitempty"`
	MinorPageFaults *uint64 `json:"minor_page_faults,omitempty"`
	MajorPageFaults *uint64 `json:"major_page_faults,omitempty"`
	AlignmentFaults *uint64 `json:"alignment_faults,omitempty"`
	EmulationFaults *uint64 `json:"emulation_faults,omitempty"`
	TimeEnabled     *uint64 `json:"time_enabled,omitempty"`
	TimeRunning     *uint64 `json:"time_running,omitempty"`
}

SoftwareProfile is returned by a SoftwareProfiler.
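
Because every field is a pointer tagged with omitempty, a field that was not collected is left out of the JSON output, while a collected zero is still emitted. The sketch below shows this with a local copy of the struct; the `u64` and `encode` helpers are hypothetical conveniences, and the values are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SoftwareProfile is a local copy of the struct documented above.
type SoftwareProfile struct {
	CPUClock        *uint64 `json:"cpu_clock,omitempty"`
	TaskClock       *uint64 `json:"task_clock,omitempty"`
	PageFaults      *uint64 `json:"page_faults,omitempty"`
	ContextSwitches *uint64 `json:"context_switches,omitempty"`
	CPUMigrations   *uint64 `json:"cpu_migrations,omitempty"`
	MinorPageFaults *uint64 `json:"minor_page_faults,omitempty"`
	MajorPageFaults *uint64 `json:"major_page_faults,omitempty"`
	AlignmentFaults *uint64 `json:"alignment_faults,omitempty"`
	EmulationFaults *uint64 `json:"emulation_faults,omitempty"`
	TimeEnabled     *uint64 `json:"time_enabled,omitempty"`
	TimeRunning     *uint64 `json:"time_running,omitempty"`
}

// u64 is a hypothetical helper that returns a pointer to a copy of v.
func u64(v uint64) *uint64 { return &v }

// encode is a hypothetical helper that marshals p to a JSON string and
// returns an empty string on error.
func encode(p SoftwareProfile) string {
	b, err := json.Marshal(p)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Placeholder values; nil fields are omitted from the output.
	p := SoftwareProfile{PageFaults: u64(12), TimeEnabled: u64(1000), TimeRunning: u64(500)}
	fmt.Println(encode(p)) // prints {"page_faults":12,"time_enabled":1000,"time_running":500}
}
```

Consumers of the JSON can therefore distinguish "counter not enabled" (key absent) from "counter read zero" (key present with value 0).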

func (*SoftwareProfile) Reset added in v0.5.0

func (p *SoftwareProfile) Reset()

Reset sets all values to their defaults and sets any pointers to nil.

type SoftwareProfiler

type SoftwareProfiler interface {
	Start() error
	Reset() error
	Stop() error
	Close() error
	Profile(*SoftwareProfile) error
	HasProfilers() bool
}

SoftwareProfiler is a profiler for software events. Profile fills a SoftwareProfile with the collected values.

func NewSoftwareProfiler

func NewSoftwareProfiler(pid, cpu int, profilerSet SoftwareProfilerType, opts ...int) (SoftwareProfiler, error)

NewSoftwareProfiler returns a new software profiler.

type SoftwareProfilerType added in v0.5.0

type SoftwareProfilerType int

const (
	AllSoftwareProfilers  SoftwareProfilerType = 0
	CpuClockProfiler      SoftwareProfilerType = 1 << iota
	TaskClockProfiler     SoftwareProfilerType = 1 << iota
	PageFaultProfiler     SoftwareProfilerType = 1 << iota
	ContextSwitchProfiler SoftwareProfilerType = 1 << iota
	CpuMigrationProfiler  SoftwareProfilerType = 1 << iota
	MinorFaultProfiler    SoftwareProfilerType = 1 << iota
	MajorFaultProfiler    SoftwareProfilerType = 1 << iota
	AlignFaultProfiler    SoftwareProfilerType = 1 << iota
	EmuFaultProfiler      SoftwareProfilerType = 1 << iota
)
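
Because iota counts every entry in the const block, including AllSoftwareProfilers, the first flag is 1 << 1 rather than 1 << 0. The sketch below copies the declaration locally and prints a few values, then combines two flags with bitwise OR. The `has` helper is hypothetical, and treating the zero value as "all profilers" is an assumption based on the constant's name.

```go
package main

import "fmt"

// Local copy of the type and constants documented above.
type SoftwareProfilerType int

const (
	AllSoftwareProfilers  SoftwareProfilerType = 0
	CpuClockProfiler      SoftwareProfilerType = 1 << iota
	TaskClockProfiler     SoftwareProfilerType = 1 << iota
	PageFaultProfiler     SoftwareProfilerType = 1 << iota
	ContextSwitchProfiler SoftwareProfilerType = 1 << iota
	CpuMigrationProfiler  SoftwareProfilerType = 1 << iota
	MinorFaultProfiler    SoftwareProfilerType = 1 << iota
	MajorFaultProfiler    SoftwareProfilerType = 1 << iota
	AlignFaultProfiler    SoftwareProfilerType = 1 << iota
	EmuFaultProfiler      SoftwareProfilerType = 1 << iota
)

// has is a hypothetical helper: it reports whether set selects want,
// treating the zero value as "all profilers" (an assumption).
func has(set, want SoftwareProfilerType) bool {
	return set == AllSoftwareProfilers || set&want != 0
}

func main() {
	fmt.Println(int(CpuClockProfiler), int(TaskClockProfiler), int(EmuFaultProfiler)) // prints 2 4 512

	// Select only page faults and context switches.
	set := PageFaultProfiler | ContextSwitchProfiler
	fmt.Println(int(set), has(set, PageFaultProfiler), has(set, MajorFaultProfiler)) // prints 24 true false
}
```

A combined set like this is the kind of value one would pass as profilerSet to NewSoftwareProfiler.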

Directories

Path Synopsis
msr
zen
