diff mbox series

[v3] selftests/resctrl: Print a message if the result of MBM&CMT tests is failed on Intel cpu

Message ID 20220304103957.487660-1-tan.shaopeng@jp.fujitsu.com (mailing list archive)
State Changes Requested
Headers show
Series [v3] selftests/resctrl: Print a message if the result of MBM&CMT tests is failed on Intel cpu | expand

Commit Message

Shaopeng Tan March 4, 2022, 10:39 a.m. UTC
According to "Intel Resource Director Technology (Intel RDT) on 
2nd Generation Intel Xeon Scalable Processors Reference Manual",
When the Intel Sub-NUMA Clustering(SNC) feature is enabled,
Intel CMT and MBM counters may not be accurate.

However, there does not seem to be an architectural way to detect
if SNC is enabled.

If the result of MBM&CMT test fails on Intel CPU,
print a message to let users know a possible cause of failure.

Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
---
Hello,

In PATCH V2, I tried to detect whether SNC is enabled by NUMA info and
cpuinfo(socket_num), but it is not reliable and no future-proof.

I just print a message to let users know a possible cause of "not ok",
When CMT or MBM test runs on Intel CPU, and the result is "not ok".

This patch is based on v5.16.

 tools/testing/selftests/resctrl/resctrl_tests.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

Comments

Shuah Khan March 8, 2022, 9:38 p.m. UTC | #1
On 3/4/22 3:39 AM, Shaopeng Tan wrote:
> According to "Intel Resource Director Technology (Intel RDT) on
> 2nd Generation Intel Xeon Scalable Processors Reference Manual",
> When the Intel Sub-NUMA Clustering(SNC) feature is enabled,
> Intel CMT and MBM counters may not be accurate.
> 
> However, there does not seem to be an architectural way to detect
> if SNC is enabled.
> 
> If the result of MBM&CMT test fails on Intel CPU,
> print a message to let users know a possible cause of failure.
> 
> Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> ---
> Hello,
> 
> In PATCH V2, I tried to detect whether SNC is enabled by NUMA info and
> cpuinfo(socket_num), but it is not reliable and no future-proof.
> 
> I just print a message to let users know a possible cause of "not ok",
> When CMT or MBM test runs on Intel CPU, and the result is "not ok".
> 
> This patch is based on v5.16.

Also need to be rebased on mainline latest

> 
>   tools/testing/selftests/resctrl/resctrl_tests.c | 12 +++++++++---
>   1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
> index 973f09a66e1e..ec2bdce7b85f 100644
> --- a/tools/testing/selftests/resctrl/resctrl_tests.c
> +++ b/tools/testing/selftests/resctrl/resctrl_tests.c
> @@ -14,8 +14,9 @@
>   #define BENCHMARK_ARG_SIZE	64
>   
>   bool is_amd;
> +bool is_intel;
>   

Why is this a global? I am not seeing a reason. These detect_*()s
could be moved to resctrl.h and get rid of globals.

Instead of adding intel check to detect_amd() add detect_intel()
or is_intel() and have ut return true of it detects intel.


> -void detect_amd(void)
> +void detect_vendor(void)
>   {
>   	FILE *inf = fopen("/proc/cpuinfo", "r");
>   	char *res;
> @@ -29,6 +30,7 @@ void detect_amd(void)
>   		char *s = strchr(res, ':');
>   
>   		is_amd = s && !strcmp(s, ": AuthenticAMD\n");
> +		is_intel = s && !strcmp(s, ": GenuineIntel\n");
>   		free(res);
>   	}
>   	fclose(inf);
> @@ -70,6 +72,8 @@ static void run_mbm_test(bool has_ben, char **benchmark_cmd, int span,
>   		sprintf(benchmark_cmd[5], "%s", MBA_STR);
>   	res = mbm_bw_change(span, cpu_no, bw_report, benchmark_cmd);
>   	ksft_test_result(!res, "MBM: bw change\n");
> +	if (is_intel && res)
> +		ksft_print_msg("Intel CMT and MBM counters may be inaccurate when Sub-NUMA Clustering (SNC) is enabled. Ensure SNC is disabled in the BIOS if this system supports SNC.\n");

This message is rather long. Please make it concise.

>   	mbm_test_cleanup();
>   }
>   
> @@ -106,6 +110,8 @@ static void run_cmt_test(bool has_ben, char **benchmark_cmd, int cpu_no)
>   		sprintf(benchmark_cmd[5], "%s", CMT_STR);
>   	res = cmt_resctrl_val(cpu_no, 5, benchmark_cmd);
>   	ksft_test_result(!res, "CMT: test\n");
> +	if (is_intel && res)
> +		ksft_print_msg("Intel CMT and MBM counters may be inaccurate when Sub-NUMA Clustering (SNC) is enabled. Ensure SNC is disabled in the BIOS if this system supports SNC.\n");

This message is rather long. Please make it concise.

>   	cmt_test_cleanup();
>   }
>   
> @@ -207,8 +213,8 @@ int main(int argc, char **argv)
>   	if (geteuid() != 0)
>   		return ksft_exit_fail_msg("Not running as root, abort testing.\n");
>   
> -	/* Detect AMD vendor */
> -	detect_amd();
> +	/* Detect AMD/INTEL vendor */
> +	detect_vendor();
>   
>   	if (has_ben) {
>   		/* Extract benchmark command from command line. */
> 

thanks,
-- Shuah
Shaopeng Tan (Fujitsu) March 10, 2022, 8:18 a.m. UTC | #2
Hi Shuah,

> On 3/4/22 3:39 AM, Shaopeng Tan wrote:
> > According to "Intel Resource Director Technology (Intel RDT) on 2nd
> > Generation Intel Xeon Scalable Processors Reference Manual", When the
> > Intel Sub-NUMA Clustering(SNC) feature is enabled, Intel CMT and MBM
> > counters may not be accurate.
> >
> > However, there does not seem to be an architectural way to detect if
> > SNC is enabled.
> >
> > If the result of MBM&CMT test fails on Intel CPU, print a message to
> > let users know a possible cause of failure.
> >
> > Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
> > ---
> > Hello,
> >
> > In PATCH V2, I tried to detect whether SNC is enabled by NUMA info and
> > cpuinfo(socket_num), but it is not reliable and no future-proof.
> >
> > I just print a message to let users know a possible cause of "not ok",
> > When CMT or MBM test runs on Intel CPU, and the result is "not ok".
> >
> > This patch is based on v5.16.
> 
> Also need to be rebased on mainline latest

I will rebased on mainline latest in next version.

> >
> >   tools/testing/selftests/resctrl/resctrl_tests.c | 12 +++++++++---
> >   1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c
> > b/tools/testing/selftests/resctrl/resctrl_tests.c
> > index 973f09a66e1e..ec2bdce7b85f 100644
> > --- a/tools/testing/selftests/resctrl/resctrl_tests.c
> > +++ b/tools/testing/selftests/resctrl/resctrl_tests.c
> > @@ -14,8 +14,9 @@
> >   #define BENCHMARK_ARG_SIZE	64
> >
> >   bool is_amd;
> > +bool is_intel;
> >
> 
> Why is this a global? I am not seeing a reason. These detect_*()s could be
> moved to resctrl.h and get rid of globals.
> 
> Instead of adding intel check to detect_amd() add detect_intel() or is_intel()
> and have ut return true of it detects intel.

"is_amd" and "is_intel" are called many times,
in this way, detect_vendor is called only once.

Do you mean in the following way?
"/proc/cpuinfo" will be open/closed many times,
since is_amd()/is_intel() is called in serval places.
So, I think current is better. Do you have any thoughts.
----
bool detect_vendor_str(const char *vendor_str )
{
        FILE *inf = fopen("/proc/cpuinfo", "r");
        char *res;
        bool found = false;

        if (!inf)
                return found;

        res = fgrep(inf, "vendor_id");

        if (res) {
                char *s = strchr(res, ':');

                found = s && !strcmp(s,vendor_str);
                free(res);
        }
        fclose(inf);

        return found;
}

bool is_amd()
{
        return detect_vendor_str(": AuthenticAMD\n");
}
bool is_intel()
{
        return detect_vendor_str(": GenuineIntel\n");
}
----

> > -void detect_amd(void)
> > +void detect_vendor(void)
> >   {
> >   	FILE *inf = fopen("/proc/cpuinfo", "r");
> >   	char *res;
> > @@ -29,6 +30,7 @@ void detect_amd(void)
> >   		char *s = strchr(res, ':');
> >
> >   		is_amd = s && !strcmp(s, ": AuthenticAMD\n");
> > +		is_intel = s && !strcmp(s, ": GenuineIntel\n");
> >   		free(res);
> >   	}
> >   	fclose(inf);
> > @@ -70,6 +72,8 @@ static void run_mbm_test(bool has_ben, char
> **benchmark_cmd, int span,
> >   		sprintf(benchmark_cmd[5], "%s", MBA_STR);
> >   	res = mbm_bw_change(span, cpu_no, bw_report, benchmark_cmd);
> >   	ksft_test_result(!res, "MBM: bw change\n");
> > +	if (is_intel && res)
> > +		ksft_print_msg("Intel CMT and MBM counters may be
> inaccurate when
> > +Sub-NUMA Clustering (SNC) is enabled. Ensure SNC is disabled in the
> > +BIOS if this system supports SNC.\n");
> 
> This message is rather long. Please make it concise.

I will use the following message in next version.
"Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration."

> >   	mbm_test_cleanup();
> >   }
> >
> > @@ -106,6 +110,8 @@ static void run_cmt_test(bool has_ben, char
> **benchmark_cmd, int cpu_no)
> >   		sprintf(benchmark_cmd[5], "%s", CMT_STR);
> >   	res = cmt_resctrl_val(cpu_no, 5, benchmark_cmd);
> >   	ksft_test_result(!res, "CMT: test\n");
> > +	if (is_intel && res)
> > +		ksft_print_msg("Intel CMT and MBM counters may be
> inaccurate when
> > +Sub-NUMA Clustering (SNC) is enabled. Ensure SNC is disabled in the
> > +BIOS if this system supports SNC.\n");
> 
> This message is rather long. Please make it concise.

"Intel CMT may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration."

> >   	cmt_test_cleanup();
> >   }
> >
> > @@ -207,8 +213,8 @@ int main(int argc, char **argv)
> >   	if (geteuid() != 0)
> >   		return ksft_exit_fail_msg("Not running as root, abort
> > testing.\n");
> >
> > -	/* Detect AMD vendor */
> > -	detect_amd();
> > +	/* Detect AMD/INTEL vendor */
> > +	detect_vendor();
> >
> >   	if (has_ben) {
> >   		/* Extract benchmark command from command line. */
> >

Best regards,
Tan Shaopeng
Shuah Khan March 10, 2022, 8:32 p.m. UTC | #3
On 3/10/22 1:18 AM, tan.shaopeng@fujitsu.com wrote:
> Hi Shuah,
> 
>> On 3/4/22 3:39 AM, Shaopeng Tan wrote:

>> Also need to be rebased on mainline latest
> 
> I will rebased on mainline latest in next version.
> 

>> Why is this a global? I am not seeing a reason. These detect_*()s could be
>> moved to resctrl.h and get rid of globals.
>>
>> Instead of adding intel check to detect_amd() add detect_intel() or is_intel()
>> and have ut return true of it detects intel.
> 
> "is_amd" and "is_intel" are called many times,
> in this way, detect_vendor is called only once.
> 
You can do the lookup once and save the value to return for
subsequent calls instead of using global values is_amd
and is_intel.

thanks,
-- Shuah
diff mbox series

Patch

diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index 973f09a66e1e..ec2bdce7b85f 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -14,8 +14,9 @@ 
 #define BENCHMARK_ARG_SIZE	64
 
 bool is_amd;
+bool is_intel;
 
-void detect_amd(void)
+void detect_vendor(void)
 {
 	FILE *inf = fopen("/proc/cpuinfo", "r");
 	char *res;
@@ -29,6 +30,7 @@  void detect_amd(void)
 		char *s = strchr(res, ':');
 
 		is_amd = s && !strcmp(s, ": AuthenticAMD\n");
+		is_intel = s && !strcmp(s, ": GenuineIntel\n");
 		free(res);
 	}
 	fclose(inf);
@@ -70,6 +72,8 @@  static void run_mbm_test(bool has_ben, char **benchmark_cmd, int span,
 		sprintf(benchmark_cmd[5], "%s", MBA_STR);
 	res = mbm_bw_change(span, cpu_no, bw_report, benchmark_cmd);
 	ksft_test_result(!res, "MBM: bw change\n");
+	if (is_intel && res)
+		ksft_print_msg("Intel CMT and MBM counters may be inaccurate when Sub-NUMA Clustering (SNC) is enabled. Ensure SNC is disabled in the BIOS if this system supports SNC.\n");
 	mbm_test_cleanup();
 }
 
@@ -106,6 +110,8 @@  static void run_cmt_test(bool has_ben, char **benchmark_cmd, int cpu_no)
 		sprintf(benchmark_cmd[5], "%s", CMT_STR);
 	res = cmt_resctrl_val(cpu_no, 5, benchmark_cmd);
 	ksft_test_result(!res, "CMT: test\n");
+	if (is_intel && res)
+		ksft_print_msg("Intel CMT and MBM counters may be inaccurate when Sub-NUMA Clustering (SNC) is enabled. Ensure SNC is disabled in the BIOS if this system supports SNC.\n");
 	cmt_test_cleanup();
 }
 
@@ -207,8 +213,8 @@  int main(int argc, char **argv)
 	if (geteuid() != 0)
 		return ksft_exit_fail_msg("Not running as root, abort testing.\n");
 
-	/* Detect AMD vendor */
-	detect_amd();
+	/* Detect AMD/INTEL vendor */
+	detect_vendor();
 
 	if (has_ben) {
 		/* Extract benchmark command from command line. */