Message ID: 20240103031409.2504051-1-dapeng1.mi@linux.intel.com
Series: pmu test bugs fix and improvements
Kindly ping ...

On 1/3/2024 11:13 AM, Dapeng Mi wrote:
> When running the pmu test on Sapphire Rapids, we found that the pmu
> test sometimes reports the following failures:
>
> 1. FAIL: Intel: all counters
> 2. FAIL: Intel: core cycles-0
> 3. FAIL: Intel: llc misses-4
>
> Further investigation shows these failures are all false alarms rather
> than real vPMU issues.
>
> Failure 1 is caused by a bug in check_counters_many(), which defines a
> cnt[] array of length 10. On Sapphire Rapids, KVM supports 8 GP
> counters and 3 fixed counters, so the total counter number (11)
> exceeds the current cnt[] length of 10. This causes an out-of-bounds
> access and leads to the "all counters" false alarm. Patches 02~03 fix
> this issue (a rough sketch follows at the end of this mail).
>
> Failure 2 is caused by pipeline and cache warm-up latency. Currently
> "core cycles" is the first event executed. When the measured loop()
> program runs for the first time, the cache hierarchy and pipeline need
> to warm up, and all this warm-up work consumes so many cycles that the
> count exceeds the predefined upper boundary and causes the failure.
> Patch 04 fixes this issue.
>
> Failure 3 is caused by an llc misses count of 0. It is possible and
> reasonable that no LLC miss happens for such a simple loop() asm blob,
> especially with ever-larger LLC sizes on new processors. Patch 09
> fixes this issue by introducing the clflush instruction to force an
> LLC miss (sketched at the end of this mail).
>
> Besides the above bug fixes, this patch series also includes several
> optimizations.
>
> One important optimization (patches 07~08) is to move GLOBAL_CTRL
> enabling/disabling into the loop asm blob, so the precise count for
> the instructions and branches events can be measured and verification
> can be done against the precise count instead of a rough count range.
> This improves verification accuracy (sketched at the end of this
> mail).
>
> Another important optimization (patches 10~11) is to leverage the IBPB
> command to force a branch miss, so the lower boundary of the branch
> misses event can be set to 1 instead of the ambiguous 0. This
> eliminates the ambiguity of a 0 lower bound (sketched at the end of
> this mail).
>
> All these changes are tested on an Intel Sapphire Rapids server
> platform and the pmu test passes. Since I have no AMD platform at
> hand, these changes have not been verified on AMD platforms yet. If
> someone could help verify them on AMD platforms, it would be welcome
> and appreciated.
>
> Changes:
> v2 -> v3:
>   fix "core cycles" failure
>   introduce precise verification for instructions/branches
>   leverage IBPB command to optimize branch misses verification
>   drop v2 introduced slots event verification
> v1 -> v2:
>   introduce clflush to optimize llc misses verification
>   introduce rdrand to optimize branch misses verification
>
> History:
> v2: https://lore.kernel.org/lkml/20231031092921.2885109-1-dapeng1.mi@linux.intel.com/
> v1: https://lore.kernel.org/lkml/20231024075748.1675382-1-dapeng1.mi@linux.intel.com/
>
> Dapeng Mi (10):
>   x86: pmu: Enlarge cnt[] length to 64 in check_counters_many()
>   x86: pmu: Add asserts to warn inconsistent fixed events and counters
>   x86: pmu: Switch instructions and core cycles events sequence
>   x86: pmu: Refine fixed_events[] names
>   x86: pmu: Remove blank line and redundant space
>   x86: pmu: Enable and disable PMCs in loop() asm blob
>   x86: pmu: Improve instruction and branches events verification
>   x86: pmu: Improve LLC misses event verification
>   x86: pmu: Add IBPB indirect jump asm blob
>   x86: pmu: Improve branch misses event verification
>
> Xiong Zhang (1):
>   x86: pmu: Remove duplicate code in pmu_init()
>
>  lib/x86/pmu.c |   5 --
>  x86/pmu.c     | 201 ++++++++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 171 insertions(+), 35 deletions(-)
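
For anyone skimming the series, below are a few illustrative sketches
of the ideas referenced above. All identifiers, helper names and blob
layouts are my simplified assumptions, not the actual patches.

First, the cnt[] sizing fix from patches 02~03. The pmu_counter_t
layout and the counter numbers are stand-ins; only the 64-element size
comes from the patch title:

  #include <assert.h>
  #include <stdint.h>

  /* Hypothetical stand-in for the kvm-unit-tests counter type. */
  typedef struct {
          uint64_t config;
          uint64_t count;
  } pmu_counter_t;

  #define MAX_COUNTERS 64                  /* patch 02 enlarges cnt[] to 64 */

  static const int nr_gp_counters = 8;     /* Sapphire Rapids GP counters */
  static const int nr_fixed_counters = 3;  /* Sapphire Rapids fixed counters */

  static void check_counters_many(void)
  {
          pmu_counter_t cnt[MAX_COUNTERS];

          /*
           * With cnt[10], 8 GP + 3 fixed = 11 counters overflowed the
           * array; asserting up front turns silent out-of-bounds
           * corruption into a loud, debuggable failure.
           */
          assert(nr_gp_counters + nr_fixed_counters <= MAX_COUNTERS);
          (void)cnt;
  }

  int main(void)
  {
          check_counters_many();
          return 0;
  }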
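
Next, the clflush idea from patch 09. The helper below is an assumed
shape; the real patch flushes a line inside the measured loop() blob:

  /*
   * Sketch: guarantee at least one LLC miss by evicting a line from
   * the whole cache hierarchy and then touching it again.
   */
  static inline void force_llc_miss(const void *line)
  {
          unsigned long val;

          asm volatile("clflush (%1)\n\t"    /* evict from all cache levels */
                       "mfence\n\t"          /* complete the flush first    */
                       "mov (%1), %0\n\t"    /* reload: must miss in LLC    */
                       : "=r"(val)
                       : "r"(line)
                       : "memory");
  }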
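
The GLOBAL_CTRL move from patches 07~08, roughly. 0x38f is the
architectural IA32_PERF_GLOBAL_CTRL MSR number; the loop body and
constraints are simplified, and WRMSR of course requires CPL0, as in
the test guest:

  #define MSR_PERF_GLOBAL_CTRL 0x38f   /* architectural MSR number */

  /*
   * Sketch: bracket a fixed-length loop with the enabling and the
   * disabling WRMSR, so the instructions and branches counted between
   * them form an exact, known number (2*n for the loop plus a small
   * constant tail) instead of a rough range.
   */
  static inline void precise_loop(uint64_t enable_mask, uint64_t n)
  {
          uint32_t lo = (uint32_t)enable_mask;
          uint32_t hi = (uint32_t)(enable_mask >> 32);

          asm volatile("wrmsr\n\t"                 /* counters on  */
                       "1: dec %0\n\t"
                       "jnz 1b\n\t"
                       "xor %%eax, %%eax\n\t"
                       "xor %%edx, %%edx\n\t"
                       "wrmsr\n\t"                 /* counters off */
                       : "+r"(n), "+a"(lo), "+d"(hi)
                       : "c"(MSR_PERF_GLOBAL_CTRL)
                       : "memory");
  }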
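
Finally, the IBPB trick from patches 10~11. The MSR number 0x49 and
command bit 0 are the architectural MSR_IA32_PRED_CMD / PRED_CMD_IBPB
encodings; the blob itself is my simplification:

  #define MSR_IA32_PRED_CMD 0x49   /* architectural MSR number       */
  #define PRED_CMD_IBPB     0x1    /* bit 0: indirect branch barrier */

  /*
   * Sketch: an IBPB flushes indirect branch predictions, so the
   * indirect jump right behind it cannot be predicted and must
   * mispredict, giving the branch misses event a guaranteed lower
   * bound of 1 instead of the ambiguous 0.
   */
  static inline void force_branch_miss(void)
  {
          asm volatile("wrmsr\n\t"                    /* issue the IBPB   */
                       "lea 1f(%%rip), %%rax\n\t"
                       "jmp *%%rax\n\t"               /* unpredicted jump */
                       "1:\n\t"
                       :
                       : "c"(MSR_IA32_PRED_CMD), "a"(PRED_CMD_IBPB), "d"(0)
                       : "rax", "memory");
  }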