diff mbox series

[5/5] selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests

Message ID 20230911111930.16088-6-ilpo.jarvinen@linux.intel.com (mailing list archive)
State Accepted
Commit ef43c30858754d99373a63dff33280a9969b49bc
Series selftests/resctrl: Fixes to failing tests

Commit Message

Ilpo Järvinen Sept. 11, 2023, 11:19 a.m. UTC
5% difference upper bound for success is a bit on the low side for the
MBA and MBM tests. Some platforms produce outliers that are slightly
above that, typically 6-7%.

Relaxing the MBA/MBM success bound to 8% removes most of the failures
due to those frequent outliers.

Fixes: 06bd03a57f8c ("selftests/resctrl: Fix MBA/MBM results reporting format")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 tools/testing/selftests/resctrl/mba_test.c | 2 +-
 tools/testing/selftests/resctrl/mbm_test.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Reinette Chatre Sept. 12, 2023, 10:10 p.m. UTC | #1
Hi Ilpo,

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> 5% difference upper bound for success is a bit on the low side for the

"a bit on the low side" is very vague.

> MBA and MBM tests. Some platforms produce outliers that are slightly
> above that, typically 6-7%.
> 
> Relaxing the MBA/MBM success bound to 8% removes most of the failures
> due to those frequent outliers.

This description needs more context on what issue is being solved here.
What does the % difference represent? How was new percentage determined?

Did you investigate why there are differences between platforms? From
what I understand these tests measure memory bandwidth using perf and
resctrl and then compare the difference. Are there interesting things
about the platforms on which the difference is higher than 5%? Could
those be systems with multiple sockets (and thus multiple PMUs that need
to be set up, reset, and read)? Can the reading of the counters be improved
instead of relaxing the success criteria? A quick comparison between
get_mem_bw_imc() and get_mem_bw_resctrl() makes me think that a difference
is not surprising ... note how the PMU counters are started and reset
(potentially on multiple sockets) at every iteration while the resctrl
counters keep rolling and new values are just subtracted from previous.

Reinette
Ilpo Järvinen Sept. 13, 2023, 11:43 a.m. UTC | #2
On Tue, 12 Sep 2023, Reinette Chatre wrote:
> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> > 5% difference upper bound for success is a bit on the low side for the
> 
> "a bit on the low side" is very vague.

The commit that introduced that 5% bound plainly admitted it was a "randomly 
chosen value". At least that wasn't vague, I guess. :-)

So what I'm trying to do here is to have "randomly chosen value" replaced 
with a value that seems to work well enough based on measurements on 
a large set of platforms.

Personally, I don't care much about this; I can just ignore the failures 
due to outliers (and also reports about failing MBA/MBM tests, if somebody 
ever sends one to me). But if I were running automated tests, it would 
be annoying to leave a problem like this unaddressed.

> > MBA and MBM tests. Some platforms produce outliers that are slightly
> > above that, typically 6-7%.
> > 
> > Relaxing the MBA/MBM success bound to 8% removes most of the failures
> > due to those frequent outliers.
> 
> This description needs more context on what issue is being solved here.
> What does the % difference represent? How was new percentage determined?
> 
> Did you investigate why there are differences between platforms? From
> what I understand these tests measure memory bandwidth using perf and
> resctrl and then compare the difference. Are there interesting things 
> about the platforms on which the difference is higher than 5%?

Not really, I think. The number just isn't stable enough to always remain 
below 5% (even though it usually does).

The only systematic thing I've come across is that if I play with the read 
pattern for defeating the HW prefetcher (you've seen a patch earlier, and 
it will be in the series I'll send after this one), it has an impact 
that looks more systematic across all MBM/MBA tests. But that's not what 
I'm now trying to address with this patch.

> Could
> those be systems with multiple sockets (and thus multiple PMUs that need
> to be set up, reset, and read)? Can the reading of the counters be improved
> instead of relaxing the success criteria? A quick comparison between
> get_mem_bw_imc() and get_mem_bw_resctrl() makes me think that a difference
> is not surprising ... note how the PMU counters are started and reset
> (potentially on multiple sockets) at every iteration while the resctrl
> counters keep rolling and new values are just subtracted from previous.

Perhaps. I can try to look into it (I'll add it to my todo list so I won't 
forget). But in the meantime, the new value is picked using criteria 
that look better than a "randomly chosen value". If I ever manage to 
address the outliers, the bound could be lowered again.

I'll update the changelog to explain things better.
Reinette Chatre Sept. 13, 2023, 9 p.m. UTC | #3
Hi Ilpo,

On 9/13/2023 4:43 AM, Ilpo Järvinen wrote:
> On Tue, 12 Sep 2023, Reinette Chatre wrote:
>> On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
>>> 5% difference upper bound for success is a bit on the low side for the
>>
>> "a bit on the low side" is very vague.
> 
> The commit that introduced that 5% bound plainly admitted it was a "randomly 
> chosen value". At least that wasn't vague, I guess. :-)
> 
> So what I'm trying to do here is to have "randomly chosen value" replaced 
> with a value that seems to work well enough based on measurements on 
> a large set of platforms.

Already a better motivation for this change. Your cover letter also hints
at this, but this description does not mention that this is not just
another number pulled from the air but indeed one that is based on
significant testing on a large variety of systems. This description can
surely mention all the work you did that led to proposing this new
number, no?

> 
> Personally, I don't care much about this; I can just ignore the failures 
> due to outliers (and also reports about failing MBA/MBM tests, if somebody 
> ever sends one to me). But if I were running automated tests, it would 
> be annoying to leave a problem like this unaddressed.

In no way was I suggesting that this should not be addressed.

> 
>>> MBA and MBM tests. Some platforms produce outliers that are slightly
>>> above that, typically 6-7%.
>>>
>>> Relaxing the MBA/MBM success bound to 8% removes most of the failures
>>> due to those frequent outliers.
>>
>> This description needs more context on what issue is being solved here.
>> What does the % difference represent? How was new percentage determined?
>>
>> Did you investigate why there are differences between platforms? From
>> what I understand these tests measure memory bandwidth using perf and
>> resctrl and then compare the difference. Are there interesting things 
>> about the platforms on which the difference is higher than 5%?
> 
> Not really, I think. The number just isn't stable enough to always remain 
> below 5% (even though it usually does).
> 
> The only systematic thing I've come across is that if I play with the read 
> pattern for defeating the HW prefetcher (you've seen a patch earlier, and 
> it will be in the series I'll send after this one), it has an impact 
> that looks more systematic across all MBM/MBA tests. But that's not what 
> I'm now trying to address with this patch.
> 
>> Could
>> those be systems with multiple sockets (and thus multiple PMUs that need
>> to be set up, reset, and read)? Can the reading of the counters be improved
>> instead of relaxing the success criteria? A quick comparison between
>> get_mem_bw_imc() and get_mem_bw_resctrl() makes me think that a difference
>> is not surprising ... note how the PMU counters are started and reset
>> (potentially on multiple sockets) at every iteration while the resctrl
>> counters keep rolling and new values are just subtracted from previous.
> 
> Perhaps. I can try to look into it (I'll add it to my todo list so I won't 
> forget). But in the meantime, the new value is picked using criteria 
> that look better than a "randomly chosen value". If I ever manage to 
> address the outliers, the bound could be lowered again.
> 
> I'll update the changelog to explain things better.
> 
> 
ok, thank you.

Reinette

Patch

diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index cf8284dadcb2..d3bf4368341e 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -12,7 +12,7 @@ 
 
 #define RESULT_FILE_NAME	"result_mba"
 #define NUM_OF_RUNS		5
-#define MAX_DIFF_PERCENT	5
+#define MAX_DIFF_PERCENT	8
 #define ALLOCATION_MAX		100
 #define ALLOCATION_MIN		10
 #define ALLOCATION_STEP		10
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 1ae131a2e246..d3c0d30c676a 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -11,7 +11,7 @@ 
 #include "resctrl.h"
 
 #define RESULT_FILE_NAME	"result_mbm"
-#define MAX_DIFF_PERCENT	5
+#define MAX_DIFF_PERCENT	8
 #define NUM_OF_RUNS		5
 
 static int