Command memanim creates an animation of memory accesses over time from a "perf mem record" profile. In the animation, the address space is compacted to remove pages that have no recorded references and then mapped onto a Hilbert curve so that nearby accesses appear nearby in 2-D space. The animation is then broken into panels showing all accesses, L2-and-up accesses, etc.
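The compaction and Hilbert mapping described above can be sketched in Go as follows. This is a minimal illustration, not memanim's actual code: it assumes 4 KiB pages and one curve step per 64-byte cache line, and the names compactPages, compactAddr, and hilbertD2XY are hypothetical. hilbertD2XY is the standard iterative conversion from a distance along a Hilbert curve to grid coordinates.

```go
package main

import (
	"fmt"
	"sort"
)

// pageShift assumes 4 KiB pages (an illustrative choice).
const pageShift = 12

// compactPages assigns each page with at least one recorded access a dense
// index, so unreferenced pages take up no space. Pages keep their relative
// order, so addresses that were close stay close.
func compactPages(addrs []uint64) map[uint64]uint64 {
	seen := make(map[uint64]bool)
	for _, a := range addrs {
		seen[a>>pageShift] = true
	}
	pages := make([]uint64, 0, len(seen))
	for p := range seen {
		pages = append(pages, p)
	}
	sort.Slice(pages, func(i, j int) bool { return pages[i] < pages[j] })
	index := make(map[uint64]uint64, len(pages))
	for i, p := range pages {
		index[p] = uint64(i)
	}
	return index
}

// compactAddr converts a recorded address to an offset in the compacted
// space: the page's dense index plus the offset within the page.
func compactAddr(index map[uint64]uint64, addr uint64) uint64 {
	return index[addr>>pageShift]<<pageShift | addr&(1<<pageShift-1)
}

// hilbertD2XY converts distance d (0 <= d < n*n) along a Hilbert curve that
// fills an n×n grid (n a power of two) into (x, y) coordinates.
func hilbertD2XY(n, d uint64) (x, y uint64) {
	t := d
	for s := uint64(1); s < n; s *= 2 {
		rx := 1 & (t / 2)
		ry := 1 & (t ^ rx)
		// Rotate/reflect the sub-square so the curve pieces connect.
		if ry == 0 {
			if rx == 1 {
				x = s - 1 - x
				y = s - 1 - y
			}
			x, y = y, x
		}
		x += s * rx
		y += s * ry
		t /= 4
	}
	return x, y
}

func main() {
	addrs := []uint64{0x7f0000001000, 0x7f0000001040, 0x7f0000900000}
	index := compactPages(addrs)
	for _, a := range addrs {
		d := compactAddr(index, a) >> 6 // one curve step per 64-byte line (assumption)
		x, y := hilbertD2XY(64, d)
		fmt.Printf("%#x -> (%d, %d)\n", a, x, y)
	}
}
```

Because consecutive positions on a Hilbert curve are always adjacent cells, a run of nearby compacted offsets fills a compact region of the grid rather than a thin strip, which is what keeps locality visible in each panel.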
The simplest way to record a memory load profile is "perf mem record <cmd>".
On Sandy Bridge or later, use the following command to record only loads whose latency exceeds a threshold number of cycles:
	perf record -W -d -e cpu/event=0xcd,umask=0x1,ldlat=<thresh>/pp <cmd>
The minimum (and default) latency threshold is 3 cycles.
At a reasonably high latency threshold, such as 50 cycles, it is practical to record every load that exceeds the threshold by setting the sample period to 1 and enlarging the mmap buffer, e.g., with --count 1 -m 1024.
To collect only user-space loads, change pp to ppu.