
[1/4] x86/mce: do not overwrite no_way_out if mce_end() fails

Message ID 20201118151552.1412-2-gabriele.paoloni@intel.com
State New, archived
Series x86/MCE: some minor fixes

Commit Message

Paoloni, Gabriele Nov. 18, 2020, 3:15 p.m. UTC
Currently if mce_end() fails no_way_out is set equal to worst.
worst is the worst severirty that was found in the MCA banks
associated to the current CPU; however at this point no_way_out
could be already set by mca_start() by looking at all severities
of all CPUs that entered the MCE handler.
if mce_end() fails we first check if no_way_out is already set and
if so we stick to it, otherwise we use the local worst value

Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
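The rule described in the commit message can be modeled in isolation. The sketch below is not kernel code: both helpers are hypothetical, `no_way_out` and `worst` are plain ints, and `MODEL_PANIC_SEVERITY` is an invented stand-in for the kernel's `MCE_PANIC_SEVERITY`. It only contrasts the expression used on the `mce_end() < 0` path before and after the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for MCE_PANIC_SEVERITY; not the kernel's value. */
#define MODEL_PANIC_SEVERITY 8

/*
 * Pre-patch rule: only the local 'worst' counts, so a no_way_out value
 * that was already cached from other CPUs is discarded.
 */
int nwo_on_end_failure_old(int no_way_out, int worst)
{
	(void)no_way_out; /* ignored: this is the behaviour being fixed */
	return worst >= MODEL_PANIC_SEVERITY;
}

/*
 * Patched rule: keep an already-set no_way_out; otherwise fall back to
 * the local 'worst' severity.
 */
int nwo_on_end_failure_new(int no_way_out, int worst)
{
	return no_way_out ? no_way_out : worst >= MODEL_PANIC_SEVERITY;
}
```

With a cached `no_way_out` of 1 and a benign local `worst`, the old rule returns 0 while the patched rule keeps 1.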

Comments

Borislav Petkov Nov. 20, 2020, 5:07 p.m. UTC | #1
On Wed, Nov 18, 2020 at 03:15:49PM +0000, Gabriele Paoloni wrote:
> Currently if mce_end() fails no_way_out is set equal to worst.
> worst is the worst severirty that was found in the MCA banks
		     ^^^^^^^^^

Please introduce a spellchecker into your patch creation workflow.

> associated to the current CPU; however at this point no_way_out
	     ^
	     with


> could be already set by mca_start() by looking at all severities

I think you mean "could have been already set" here

> of all CPUs that entered the MCE handler.
> if mce_end() fails we first check if no_way_out is already set and

Please use passive voice in your commit message: no "we" or "I", etc.

Also, pls start new sentences with a capital letter and end them with a
fullstop.

> if so we stick to it, otherwise we use the local worst value

So basically you're trying to say here that no_way_out might have been
already set and other CPUs could overwrite it and that should not
happen.

Is that what you mean?

> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/mce/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 4102b866e7c0..b990892c6766 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1385,7 +1385,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
>  	 */
>  	if (!lmce) {
>  		if (mce_end(order) < 0)
> -			no_way_out = worst >= MCE_PANIC_SEVERITY;
> +			no_way_out = no_way_out ? no_way_out : worst >= MCE_PANIC_SEVERITY;

I had to stare at this a bit to figure out what you're doing. So how
about simplifying this:

			if (!no_way_out)
				no_way_out = worst >= MCE_PANIC_SEVERITY;

?

Thx.
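The suggested `if` form and the conditional expression in the patch should compute the same value. A small exhaustive check over a few inputs illustrates this; it is a standalone sketch with hypothetical helper names and a stand-in severity threshold, not a test from the kernel tree.

```c
#include <stdbool.h>

/* Hypothetical stand-in for MCE_PANIC_SEVERITY; not the kernel's value. */
#define MODEL_PANIC_SEVERITY 8

/* Form used in the patch: a conditional expression. */
static int update_ternary(int no_way_out, int worst)
{
	return no_way_out ? no_way_out : worst >= MODEL_PANIC_SEVERITY;
}

/* Form suggested in review: assign only while no_way_out is still clear. */
static int update_if(int no_way_out, int worst)
{
	if (!no_way_out)
		no_way_out = worst >= MODEL_PANIC_SEVERITY;
	return no_way_out;
}

/* Returns true if both forms agree on every input in the given ranges. */
bool forms_agree(int max_nwo, int max_worst)
{
	for (int nwo = 0; nwo <= max_nwo; nwo++)
		for (int worst = 0; worst <= max_worst; worst++)
			if (update_ternary(nwo, worst) != update_if(nwo, worst))
				return false;
	return true;
}
```

The `if` form also avoids the self-assignment `no_way_out = no_way_out` when the flag is already set, which is part of why it reads more directly.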
Paoloni, Gabriele Nov. 20, 2020, 5:31 p.m. UTC | #2
Hi Boris

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Friday, November 20, 2020 6:08 PM
> To: Paoloni, Gabriele <gabriele.paoloni@intel.com>
> Cc: Luck, Tony <tony.luck@intel.com>; tglx@linutronix.de;
> mingo@redhat.com; x86@kernel.org; hpa@zytor.com;
> linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-safety@lists.elisa.tech
> Subject: Re: [PATCH 1/4] x86/mce: do not overwrite no_way_out if mce_end() fails
> 
> On Wed, Nov 18, 2020 at 03:15:49PM +0000, Gabriele Paoloni wrote:
> > Currently if mce_end() fails no_way_out is set equal to worst.
> > worst is the worst severirty that was found in the MCA banks
> 		     ^^^^^^^^^
> 
> Please introduce a spellchecker into your patch creation workflow.
> 
> > associated to the current CPU; however at this point no_way_out
> 	     ^
> 	     with
> 
> 
> > could be already set by mca_start() by looking at all severities
> 
> I think you mean "could have been already set" here
> 
> > of all CPUs that entered the MCE handler.
> > if mce_end() fails we first check if no_way_out is already set and
> 
> Please use passive voice in your commit message: no "we" or "I", etc.
> 
> Also, pls start new sentences with a capital letter and end them with a
> fullstop.

Sorry about the grammar errors above; I'll pay more attention in future.

> 
> > if so we stick to it, otherwise we use the local worst value
> 
> So basically you're trying to say here that no_way_out might have been
> already set and other CPUs could overwrite it and that should not
> happen.
> 
> Is that what you mean?

I mean that, on this CPU, mce_start() has already cached global_nwo into
no_way_out at this point, so no_way_out can reflect fatal severities
found by other CPUs.

Now, if mce_end() fails, we only consider the local 'worst' severity
and overwrite the value already cached.
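The scenario described above can be approximated with a single-threaded model: every CPU adds its own `no_way_out` to a shared `global_nwo` counter and then caches the total, roughly as the explanation says mce_start() does. This is a simplified sketch, not kernel code: there is no real concurrency, no atomics and no timeout handling, all names (`struct model_cpu`, `model_mce_start()`, `cpus_panicking_after_end_failure()`) are hypothetical, `MODEL_PANIC_SEVERITY` stands in for `MCE_PANIC_SEVERITY`, and only the case where mce_start() completed and mce_end() then fails is modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for MCE_PANIC_SEVERITY; not the kernel's value. */
#define MODEL_PANIC_SEVERITY 8

struct model_cpu {
	int worst;      /* worst severity found in this CPU's own banks */
	int no_way_out; /* per-CPU flag, later replaced by the cached global */
};

/* Simplified mce_start(): accumulate every CPU's flag, then cache the sum. */
static void model_mce_start(struct model_cpu *cpus, int n)
{
	int global_nwo = 0;

	for (int i = 0; i < n; i++) {
		cpus[i].no_way_out = cpus[i].worst >= MODEL_PANIC_SEVERITY;
		global_nwo += cpus[i].no_way_out;
	}
	for (int i = 0; i < n; i++)
		cpus[i].no_way_out = global_nwo;
}

/* Applies the mce_end() failure path; returns how many CPUs end with no_way_out set. */
int cpus_panicking_after_end_failure(struct model_cpu *cpus, int n, bool patched)
{
	int panics = 0;

	model_mce_start(cpus, n);
	for (int i = 0; i < n; i++) {
		int local = cpus[i].worst >= MODEL_PANIC_SEVERITY;

		if (patched) {
			if (!cpus[i].no_way_out) /* keep the cached value if set */
				cpus[i].no_way_out = local;
		} else {
			cpus[i].no_way_out = local; /* old rule: cached value lost */
		}
		if (cpus[i].no_way_out)
			panics++;
	}
	return panics;
}
```

With one CPU reporting a panic-level `worst` and two benign ones, the pre-patch rule leaves `no_way_out` set on only one CPU, whereas the patched rule keeps the accumulated value on all three.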

> 
> > Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
> > Reviewed-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  arch/x86/kernel/cpu/mce/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> > index 4102b866e7c0..b990892c6766 100644
> > --- a/arch/x86/kernel/cpu/mce/core.c
> > +++ b/arch/x86/kernel/cpu/mce/core.c
> > @@ -1385,7 +1385,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
> >  	 */
> >  	if (!lmce) {
> >  		if (mce_end(order) < 0)
> > -			no_way_out = worst >= MCE_PANIC_SEVERITY;
> > +			no_way_out = no_way_out ? no_way_out : worst >= MCE_PANIC_SEVERITY;
> 
> I had to stare at this a bit to figure out what you're doing. So how
> about simplifying this:
> 
> 			if (!no_way_out)
> 				no_way_out = worst >= MCE_PANIC_SEVERITY;

Yes, that works as well and improves readability.

If OK, I will fix the grammar and rewrite this code in v2.

Many Thanks
Gab

> 
> ?
> 
> Thx.
> 
> --
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
---------------------------------------------------------------------
INTEL CORPORATION ITALIA S.p.A. con unico socio
Sede: Milanofiori Palazzo E 4 
CAP 20094 Assago (MI)
Capitale Sociale Euro 104.000,00 interamente versato
Partita I.V.A. e Codice Fiscale  04236760155
Repertorio Economico Amministrativo n. 997124 
Registro delle Imprese di Milano nr. 183983/5281/33
Soggetta ad attivita' di direzione e coordinamento di 
INTEL CORPORATION, USA

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#180): https://lists.elisa.tech/g/linux-safety/message/180
Mute This Topic: https://lists.elisa.tech/mt/78342501/5278000
Group Owner: linux-safety+owner@lists.elisa.tech
Unsubscribe: https://lists.elisa.tech/g/linux-safety/unsub [linux-safety@archiver.kernel.org]
-=-=-=-=-=-=-=-=-=-=-=-
Borislav Petkov Nov. 20, 2020, 5:32 p.m. UTC | #3
On Wed, Nov 18, 2020 at 03:15:49PM +0000, Gabriele Paoloni wrote:
> Currently if mce_end() fails no_way_out is set equal to worst.
> worst is the worst severirty that was found in the MCA banks
> associated to the current CPU; however at this point no_way_out
> could be already set by mca_start() by looking at all severities
> of all CPUs that entered the MCE handler.
> if mce_end() fails we first check if no_way_out is already set and
> if so we stick to it, otherwise we use the local worst value
> 
> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>

Also, this very likely wants Cc: stable, I'd say, considering the
severity.

Thx.
Borislav Petkov Nov. 20, 2020, 5:33 p.m. UTC | #4
On Fri, Nov 20, 2020 at 05:31:32PM +0000, Paoloni, Gabriele wrote:
> I mean that on this CPU thread at this point mce_start() already cached
> global_nwo and hence could accumulate fatal severities of other CPUs.
> 
> Now here if mce_end() fails we only consider the local 'worst' severity
> and we overwrite those already cached.

Yap, we're on the same page. :)

> If ok I will fix the grammar and rewrite this code in v2.

Sure, lemme go through the rest first.

Thx.
Paoloni, Gabriele Nov. 20, 2020, 5:35 p.m. UTC | #5
[...]

> Also, this very likely wants Cc: stable, I'd say, considering the
> severity.

Sure, will add stable in v2.

Thanks
Gab

> 
> Thx.
> 
> --
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov Nov. 23, 2020, 2:35 p.m. UTC | #6
On Fri, Nov 20, 2020 at 06:33:42PM +0100, Borislav Petkov wrote:
> Sure, lemme go through the rest first.

Done, thx.

Patch

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4102b866e7c0..b990892c6766 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1385,7 +1385,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 */
 	if (!lmce) {
 		if (mce_end(order) < 0)
-			no_way_out = worst >= MCE_PANIC_SEVERITY;
+			no_way_out = no_way_out ? no_way_out : worst >= MCE_PANIC_SEVERITY;
 	} else {
 		/*
 		 * If there was a fatal machine check we should have