Message ID | 20211104074431.1091525-1-zhangzl2013@126.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/mce: correct cpu_missing reporting in mce_timed_out | expand |
On Thu, Nov 04, 2021 at 03:44:31PM +0800, Zhaolong Zhang wrote: > set cpu_missing before mce_panic() so that it prints correct msg. > > Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com> > --- > arch/x86/kernel/cpu/mce/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > index 50a3e455cded..ccefe131ab55 100644 > --- a/arch/x86/kernel/cpu/mce/core.c > +++ b/arch/x86/kernel/cpu/mce/core.c > @@ -903,13 +903,13 @@ static int mce_timed_out(u64 *t, const char *msg) > if (!mca_cfg.monarch_timeout) > goto out; > if ((s64)*t < SPINUNIT) { > + cpu_missing = 1; > if (mca_cfg.tolerant <= 1) { > if (cpumask_and(&mce_missing_cpus, cpu_online_mask, &mce_missing_cpus)) > pr_emerg("CPUs not responding to MCE broadcast (may include false positives): %*pbl\n", > cpumask_pr_args(&mce_missing_cpus)); > mce_panic(msg, NULL, NULL); > } > - cpu_missing = 1; > return 1; > } > *t -= SPINUNIT; > -- Frankly, we might just as well kill that cpu_missing thing because we already say that some CPUs are not responding. And that "Some CPUs didn't answer in synchronization" is not really telling me a whole lot. Tony, do you see any real need to keep it? Thx.
> Frankly, we might just as well kill that cpu_missing thing because we > already say that some CPUs are not responding. Yes. The more recent commit: 7bb39313cd62 ("x86/mce: Make mce_timed_out() identify holdout CPUs") tries to provide the more detailed message about *which* CPUs are missing > Tony, do you see any real need to keep it? I think cpu_missing can be dropped. -Tony
On Thu, Nov 04, 2021 at 03:47:36PM +0000, Luck, Tony wrote: > > Frankly, we might just as well kill that cpu_missing thing because we > > already say that some CPUs are not responding. > > Yes. The more recent commit: > > 7bb39313cd62 ("x86/mce: Make mce_timed_out() identify holdout CPUs") > > tries to provide the more detailed message about *which* CPUs are missing Exactly. > I think cpu_missing can be dropped. Zhaolong, you could send a patch doing that, instead. Thx.
At 2021-11-05 02:02:36, "Borislav Petkov" <bp@alien8.de> wrote: >On Thu, Nov 04, 2021 at 03:47:36PM +0000, Luck, Tony wrote: >> > Frankly, we might just as well kill that cpu_missing thing because we >> > already say that some CPUs are not responding. >> >> Yes. The more recent commit: >> >> 7bb39313cd62 ("x86/mce: Make mce_timed_out() identify holdout CPUs") >> >> tries to provide the more detailed message about *which* CPUs are missing > >Exactly. > >> I think cpu_missing can be dropped. > >Zhaolong, you could send a patch doing that, instead. Thanks for the reply. Let me see how to do it properly. Regards, Zhaolong > >Thx. > >-- >Regards/Gruss, > Boris. > >https://people.kernel.org/tglx/notes-about-netiquette
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 50a3e455cded..ccefe131ab55 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -903,13 +903,13 @@ static int mce_timed_out(u64 *t, const char *msg) if (!mca_cfg.monarch_timeout) goto out; if ((s64)*t < SPINUNIT) { + cpu_missing = 1; if (mca_cfg.tolerant <= 1) { if (cpumask_and(&mce_missing_cpus, cpu_online_mask, &mce_missing_cpus)) pr_emerg("CPUs not responding to MCE broadcast (may include false positives): %*pbl\n", cpumask_pr_args(&mce_missing_cpus)); mce_panic(msg, NULL, NULL); } - cpu_missing = 1; return 1; } *t -= SPINUNIT;
set cpu_missing before mce_panic() so that it prints correct msg. Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com> --- arch/x86/kernel/cpu/mce/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)