diff mbox series

x86/mce: correct cpu_missing reporting in mce_timed_out

Message ID 20211104074431.1091525-1-zhangzl2013@126.com (mailing list archive)
State New, archived
Headers show
Series x86/mce: correct cpu_missing reporting in mce_timed_out | expand

Commit Message

Zhaolong Zhang Nov. 4, 2021, 7:44 a.m. UTC
set cpu_missing before mce_panic() so that it prints correct msg.

Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com>
---
 arch/x86/kernel/cpu/mce/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Borislav Petkov Nov. 4, 2021, 9:13 a.m. UTC | #1
On Thu, Nov 04, 2021 at 03:44:31PM +0800, Zhaolong Zhang wrote:
> set cpu_missing before mce_panic() so that it prints correct msg.
> 
> Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com>
> ---
>  arch/x86/kernel/cpu/mce/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 50a3e455cded..ccefe131ab55 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -903,13 +903,13 @@ static int mce_timed_out(u64 *t, const char *msg)
>  	if (!mca_cfg.monarch_timeout)
>  		goto out;
>  	if ((s64)*t < SPINUNIT) {
> +		cpu_missing = 1;
>  		if (mca_cfg.tolerant <= 1) {
>  			if (cpumask_and(&mce_missing_cpus, cpu_online_mask, &mce_missing_cpus))
>  				pr_emerg("CPUs not responding to MCE broadcast (may include false positives): %*pbl\n",
>  					 cpumask_pr_args(&mce_missing_cpus));
>  			mce_panic(msg, NULL, NULL);
>  		}
> -		cpu_missing = 1;
>  		return 1;
>  	}
>  	*t -= SPINUNIT;
> -- 

Frankly, we might just as well kill that cpu_missing thing because we
already say that some CPUs are not responding.

And that "Some CPUs didn't answer in synchronization" is not really
telling me a whole lot.

Tony, do you see any real need to keep it?

Thx.
Tony Luck Nov. 4, 2021, 3:47 p.m. UTC | #2
> Frankly, we might just as well kill that cpu_missing thing because we
> already say that some CPUs are not responding.

Yes. The more recent commit:

7bb39313cd62 ("x86/mce: Make mce_timed_out() identify holdout CPUs")

tries to provide the more detailed message about *which* CPUs are missing

> Tony, do you see any real need to keep it?

I think cpu_missing can be dropped.

-Tony
Borislav Petkov Nov. 4, 2021, 6:02 p.m. UTC | #3
On Thu, Nov 04, 2021 at 03:47:36PM +0000, Luck, Tony wrote:
> > Frankly, we might just as well kill that cpu_missing thing because we
> > already say that some CPUs are not responding.
> 
> Yes. The more recent commit:
> 
> 7bb39313cd62 ("x86/mce: Make mce_timed_out() identify holdout CPUs")
> 
> tries to provide the more detailed message about *which* CPUs are missing

Exactly.

> I think cpu_missing can be dropped.

Zhaolong, you could send a patch doing that, instead.

Thx.
Zhaolong Zhang Nov. 5, 2021, 2:19 a.m. UTC | #4
At 2021-11-05 02:02:36, "Borislav Petkov" <bp@alien8.de> wrote:
>On Thu, Nov 04, 2021 at 03:47:36PM +0000, Luck, Tony wrote:
>> > Frankly, we might just as well kill that cpu_missing thing because we
>> > already say that some CPUs are not responding.
>> 
>> Yes. The more recent commit:
>> 
>> 7bb39313cd62 ("x86/mce: Make mce_timed_out() identify holdout CPUs")
>> 
>> tries to provide the more detailed message about *which* CPUs are missing
>
>Exactly.
>
>> I think cpu_missing can be dropped.
>
>Zhaolong, you could send a patch doing that, instead.

Thanks for the reply. Let me see how to do it properly.

Regards,
Zhaolong

>
>Thx.
>
>-- 
>Regards/Gruss,
>    Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 50a3e455cded..ccefe131ab55 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -903,13 +903,13 @@  static int mce_timed_out(u64 *t, const char *msg)
 	if (!mca_cfg.monarch_timeout)
 		goto out;
 	if ((s64)*t < SPINUNIT) {
+		cpu_missing = 1;
 		if (mca_cfg.tolerant <= 1) {
 			if (cpumask_and(&mce_missing_cpus, cpu_online_mask, &mce_missing_cpus))
 				pr_emerg("CPUs not responding to MCE broadcast (may include false positives): %*pbl\n",
 					 cpumask_pr_args(&mce_missing_cpus));
 			mce_panic(msg, NULL, NULL);
 		}
-		cpu_missing = 1;
 		return 1;
 	}
 	*t -= SPINUNIT;