What can performance counters do for memory subsystem analysis?
Abstract
Nowadays, all major processors provide a set of performance counters which
capture micro-architectural level information, such as the number of elapsed
cycles, cache misses, or instructions executed. Counters can be found in
processor cores, on the processor die, in chipsets, or on I/O cards. They
provide a wealth of information about how software uses the hardware. Many
processors now support events that measure, precisely and with very limited
overhead, the traffic between a core and the memory subsystem. From these
events it is possible to compute average load latency and bus bandwidth
utilization. This information can be used to improve code quality and thread
placement to maximize hardware utilization.
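
As a rough illustration of how such metrics might be derived from raw counts, the sketch below divides an occupancy-style count (cycles during which loads were outstanding) by the number of completed loads to estimate average load latency, and compares achieved bus traffic with a peak rate to estimate utilization. The function names, parameter names, and sample values are hypothetical and do not correspond to the event names of any particular processor; real event names and semantics vary by vendor and model.

```python
def average_load_latency(outstanding_load_cycles, completed_loads):
    """Estimate average load latency in cycles.

    outstanding_load_cycles: hypothetical occupancy count, i.e. the sum over
        all cycles of the number of loads in flight during that cycle.
    completed_loads: hypothetical count of loads that completed.
    """
    if completed_loads == 0:
        return 0.0
    return outstanding_load_cycles / completed_loads


def bus_utilization(bus_transactions, bytes_per_transaction,
                    elapsed_cycles, core_hz, peak_bytes_per_sec):
    """Estimate the fraction of peak bus bandwidth actually used.

    All inputs are assumed values: the transaction and cycle counts would
    come from counters, while the transfer size, clock rate, and peak
    bandwidth would come from the platform documentation.
    """
    if elapsed_cycles == 0 or peak_bytes_per_sec == 0:
        return 0.0
    elapsed_sec = elapsed_cycles / core_hz
    achieved_bytes_per_sec = bus_transactions * bytes_per_transaction / elapsed_sec
    return achieved_bytes_per_sec / peak_bytes_per_sec


if __name__ == "__main__":
    # Made-up sample counts, for illustration only.
    print(average_load_latency(1000, 10))
    print(bus_utilization(1_000_000, 64, 2_000_000_000, 2e9, 6.4e9))
```

Both functions only post-process counts that have already been collected; how the counts are gathered (which events, which sampling interface) is left outside the sketch.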