[0/6] selftests/resctrl: Support diverse platforms with MBM and MBA tests

Message ID: cover.1724970211.git.reinette.chatre@intel.com

Reinette Chatre Aug. 29, 2024, 10:52 p.m. UTC
The resctrl selftests for Memory Bandwidth Allocation (MBA) and Memory
Bandwidth Monitoring (MBM) fail on some Emerald Rapids systems (for
example [1]). The failures result from two properties of these
systems:
1) Emerald Rapids systems can have up to 320MB of L3 cache. The resctrl
   MBA and MBM selftests measure the memory traffic generated by
   accessing a buffer whose hardcoded size of 250MB has been sufficient
   so far. On platforms whose L3 cache is larger than the buffer, the
   buffer fits in the L3 cache, so the "memory bandwidth" tests
   generate little or no memory traffic.
2) Some platform features that generate memory traffic, for example RAS
   features or memory performance features, may drive accesses that
   performance counters and MBM count differently, for instance by
   generating "overhead" traffic that is not counted against any
   specific RMID. Until now these counting differences have always been
   "in the noise". On Emerald Rapids systems, however, maximum MBA
   throttling (10% memory bandwidth) reduces memory bandwidth to a level
   at which the traffic from these other platform features pushes the
   difference between the memory controller performance counters and
   resctrl (MBM) beyond the tests' hardcoded tolerance.

Make the tests more robust against platform variations:
1) Let the buffer used by memory bandwidth tests be guided by the size
   of the L3 cache.
2) Larger buffers require a longer initialization time before they can
   be used for measurement. Rework the tests to ensure that buffer
   initialization is complete before measurements start.
3) Do not compare performance counters and MBM measurements at low
   bandwidth. The value of "low" is hardcoded to 750MiB based on
   measurements on Emerald Rapids, Sapphire Rapids, and Ice Lake
   systems. This limit does not affect AMD systems because it applies
   only to the MBA and MBM tests, which run only on Intel systems.

[1]
https://ark.intel.com/content/www/us/en/ark/products/237261/intel-xeon-platinum-8592-processor-320m-cache-1-9-ghz.html

Reinette Chatre (6):
  selftests/resctrl: Fix sparse warnings
  selftests/resctrl: Ensure measurements skip initialization of default
    benchmark
  selftests/resctrl: Simplify benchmark parameter passing
  selftests/resctrl: Use cache size to determine "fill_buf" buffer size
  selftests/resctrl: Do not compare performance counters and resctrl at
    low bandwidth
  selftests/resctrl: Keep results from first test run

 tools/testing/selftests/resctrl/cmt_test.c    |  33 +--
 tools/testing/selftests/resctrl/fill_buf.c    |  19 +-
 tools/testing/selftests/resctrl/mba_test.c    |  26 +-
 tools/testing/selftests/resctrl/mbm_test.c    |  25 +-
 tools/testing/selftests/resctrl/resctrl.h     |  57 +++--
 .../testing/selftests/resctrl/resctrl_tests.c |  15 +-
 tools/testing/selftests/resctrl/resctrl_val.c | 223 +++++-------------
 7 files changed, 152 insertions(+), 246 deletions(-)