diff mbox

[v2,2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

Message ID 20130619175728.2852.73156.stgit@localhost.localdomain (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Naveen N. Rao June 19, 2013, 5:57 p.m. UTC
Add a boot option to disable firmware first mode for corrected errors.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 Documentation/x86/x86_64/boot-options.txt |    5 +++++
 arch/x86/include/asm/acpi.h               |    2 ++
 arch/x86/kernel/acpi/boot.c               |    5 +++++
 drivers/acpi/apei/hest.c                  |    3 ++-
 4 files changed, 14 insertions(+), 1 deletion(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Borislav Petkov June 19, 2013, 6:04 p.m. UTC | #1
On Wed, Jun 19, 2013 at 11:27:42PM +0530, Naveen N. Rao wrote:
> Add a boot option to disable firmware first mode for corrected errors.
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
>  Documentation/x86/x86_64/boot-options.txt |    5 +++++
>  arch/x86/include/asm/acpi.h               |    2 ++
>  arch/x86/kernel/acpi/boot.c               |    5 +++++
>  drivers/acpi/apei/hest.c                  |    3 ++-
>  4 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
> index e9e8ddb..1228b22 100644
> --- a/Documentation/x86/x86_64/boot-options.txt
> +++ b/Documentation/x86/x86_64/boot-options.txt
> @@ -176,6 +176,11 @@ ACPI
>  
>    acpi=noirq	Don't route interrupts
>  
> +  acpi=nocmcff	Disable firmware first mode for corrected errors. This
> +		disables parsing the HEST CMC error source to check if
> +		firmware has set the FF flag. This may result in
> +		duplicate corrected error reports.

Interesting, why? Why would we even need such an option? My impression
is, if ACPI tells us FF, MCE code doesn't poll those banks anymore. So
where do the duplicated reports come from?

Thanks.
Naveen N. Rao June 19, 2013, 6:17 p.m. UTC | #2
On 06/19/2013 11:34 PM, Borislav Petkov wrote:
> On Wed, Jun 19, 2013 at 11:27:42PM +0530, Naveen N. Rao wrote:
>> Add a boot option to disable firmware first mode for corrected errors.
>>
>> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>> ---
>>   Documentation/x86/x86_64/boot-options.txt |    5 +++++
>>   arch/x86/include/asm/acpi.h               |    2 ++
>>   arch/x86/kernel/acpi/boot.c               |    5 +++++
>>   drivers/acpi/apei/hest.c                  |    3 ++-
>>   4 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
>> index e9e8ddb..1228b22 100644
>> --- a/Documentation/x86/x86_64/boot-options.txt
>> +++ b/Documentation/x86/x86_64/boot-options.txt
>> @@ -176,6 +176,11 @@ ACPI
>>
>>     acpi=noirq	Don't route interrupts
>>
>> +  acpi=nocmcff	Disable firmware first mode for corrected errors. This
>> +		disables parsing the HEST CMC error source to check if
>> +		firmware has set the FF flag. This may result in
>> +		duplicate corrected error reports.
>
> Interesting, why? Why would we even need such an option? My impression
> is, if ACPI tells us FF, MCE code doesn't poll those banks anymore. So
> where do the duplicated reports come from?

This option is a way to revert to the existing behavior where we 
continue to enable CMCI/poll. If we enable this option, then we will 
receive error events from CMCI/polling as well as from the firmware 
through GHES.


Thanks,
Naveen

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tony Luck June 19, 2013, 6:19 p.m. UTC | #3
> Interesting, why? Why would we even need such an option? My impression

> is, if ACPI tells us FF, MCE code doesn't poll those banks anymore. So

> where do the duplicated reports come from?


The option is only disabling the Linux side of firmware first ... the BIOS
will still be doing it and generating records to feed to the OS using APEI.

So Linux may see the error in a bank and report it, and BIOS may report
the same error.  Though I'd expect that to be rare as whoever saw it first
would most likely clear the bank before the other could see it.

I asked for the option because I'm nervous about just skipping some banks
on the say-so of the BIOS ... what if the BIOS did something wrong. This
option gives us a way to return to the way things were before this patch.

These parts are now looking good ... but we still need to tackle what Linux
does when it does get the CPER record.  I suspect we need to preserve
the existing "fake an mcelog entry with just the address" on old platforms,
but need to do something smarter on new ones.

-Tony
Borislav Petkov June 19, 2013, 6:36 p.m. UTC | #4
On Wed, Jun 19, 2013 at 06:19:25PM +0000, Luck, Tony wrote:
> > Interesting, why? Why would we even need such an option? My impression
> > is, if ACPI tells us FF, MCE code doesn't poll those banks anymore. So
> > where do the duplicated reports come from?
> 
> The option is only disabling the Linux side of firmware first ... the BIOS
> will still be doing it and generating records to feed to the OS using APEI.
> 
> So Linux may see the error in a bank and report it, and BIOS may report
> the same error.  Though I'd expect that to be rare as whoever saw it first
> would most likely clear the bank before the other could see it.
> 
> I asked for the option because I'm nervous about just skipping some banks
> on the say-so of the BIOS ... what if the BIOS did something wrong. This
> option gives us a way to return to the way things were before this patch.

Yeah, the code I saw only disables the banks in the HEST:

	mce_disable_ce_bank(mc_bank->bank_number)

and leaving the rest in poll mode. But I agree, we need this as a
fallback if BIOS is doing other crack smoking exercises and thus we want
to ignore FF completely.

> These parts are now looking good ... but we still need to tackle what
> Linux does when it does get the CPER record. I suspect we need to
> preserve the existing "fake an mcelog entry with just the address" on
> old platforms, but need to do something smarter on new ones.

Why, fill out struct mce and do mce_log(mce) does not suffice?

I'll take a look at the rest of the stuff tomorrow, on a clear head.

Thanks.
Tony Luck June 19, 2013, 7:05 p.m. UTC | #5
> Why, fill out struct mce and do mce_log(mce) does not suffice?


There is (or should be) a lot more interesting stuff in the CPER than just the address. Stuff
that we don't have fields for in the existing mcelog structure.  We also need to treat filtered
records from modern APEI implementations a bit differently from the old stuff.  The original
user of this code was Westmere-EX, which used it as a workaround for a missing address in
MCi_ADDR for corrected errors.  So in that scenario we had every error being reported and
mcelog(8) deamon doing the threshold analysis to decide when to take action.

In this new modern world - Naveen wants to have the BIOS decide the threshold, so we'd
like Linux to take some action as soon as it sees just one CPER.

-Tony
Borislav Petkov June 19, 2013, 8:14 p.m. UTC | #6
On Wed, Jun 19, 2013 at 07:05:16PM +0000, Luck, Tony wrote:
> > Why, fill out struct mce and do mce_log(mce) does not suffice?
> 
> There is (or should be)

Ha!

> a lot more interesting stuff in the CPER than just the address. Stuff
> that we don't have fields for in the existing mcelog structure. We
> also need to treat filtered records from modern APEI implementations a
> bit differently from the old stuff.

Great, the CPER record is described in the UEFI spec. Those BIOS people
are all like a mafia.

Ok, seriously: so the situation should still be fine, FF reported errors
get the CPER format while the rest, the "old" MCE format.

cper.c is doing printk so I'm guessing it would need to get its own
tracepoint and carry that to userspace.

Concerning the RAS daemon, Robert and I are making good progress so once
we have the persistent events in perf, we can read that tracepoint in
userspace and do whatever we want with the error info.

> The original user of this code was Westmere-EX, which used it as a
> workaround for a missing address in MCi_ADDR for corrected errors.
> So in that scenario we had every error being reported and mcelog(8)
> deamon doing the threshold analysis to decide when to take action.
>
> In this new modern world - Naveen wants to have the BIOS decide the
> threshold, so we'd like Linux to take some action as soon as it sees
> just one CPER.

Why would Linux have to intervene if it is doing FF - wasn't the deal
behind Firmware First for the firmware to get the error first and handle
accordingly?
Tony Luck June 19, 2013, 8:33 p.m. UTC | #7
>> There is (or should be)

>

> Ha!


Oh ye of little faith - I'm sure the BIOS will get this right this time :-)


> Ok, seriously: so the situation should still be fine, FF reported errors

> get the CPER format while the rest, the "old" MCE format.

>

> cper.c is doing printk so I'm guessing it would need to get its own

> tracepoint and carry that to userspace.


Yes - a tracepoint is the right answer here for all the new stuff.

> Concerning the RAS daemon, Robert and I are making good progress so once

> we have the persistent events in perf, we can read that tracepoint in

> userspace and do whatever we want with the error info.


Mauro has a rasdaemon in progress
       git://git.fedorahosted.org/rasdaemon.git
just picks up perf/events and logs to a sqlite database.

>> In this new modern world - Naveen wants to have the BIOS decide the

>> threshold, so we'd like Linux to take some action as soon as it sees

>> just one CPER.

>

> Why would Linux have to intervene if it is doing FF - wasn't the deal

> behind Firmware First for the firmware to get the error first and handle

> accordingly?


Because Linux can do runtime things that the BIOS can't - like offline a 4K page.
Idea here is that BIOS does whatever the OEM thinks is the right level of
threshholding - not bothering the OS with petty details of random corrected
erorrs that mean nothing.  But if there is some repeated error (like a stuck bit)
then the BIOS can provide a CPER to the OS telling it that it would be a good idea
to stop using that page.

And this is where the semantics of a CPER change between the original WSM-EX
implementation ... where Linux expects to see all the errors and do its own
thresholding only taking a page offline if it sees a lot of CPER refer to the same
page; and now - where the BIOS does the counting and tells Linux just once to
take the page offline.

-Tony
Borislav Petkov June 19, 2013, 9:07 p.m. UTC | #8
On Wed, Jun 19, 2013 at 08:33:49PM +0000, Luck, Tony wrote:
> >> There is (or should be)
> >
> > Ha!
> 
> Oh ye of little faith - I'm sure the BIOS will get this right this time :-)
> 
> 
> > Ok, seriously: so the situation should still be fine, FF reported errors
> > get the CPER format while the rest, the "old" MCE format.
> >
> > cper.c is doing printk so I'm guessing it would need to get its own
> > tracepoint and carry that to userspace.
> 
> Yes - a tracepoint is the right answer here for all the new stuff.
> 
> > Concerning the RAS daemon, Robert and I are making good progress so once
> > we have the persistent events in perf, we can read that tracepoint in
> > userspace and do whatever we want with the error info.
> 
> Mauro has a rasdaemon in progress
>        git://git.fedorahosted.org/rasdaemon.git
> just picks up perf/events and logs to a sqlite database.

Actually it uses ftrace's facilities but it is a tracepoint in the end.

And I asked him nicely not to call it rasdaemon because I already have a
RAS daemon but hey, whatever. The more confusion, the better.

> Because Linux can do runtime things that the BIOS can't - like offline
> a 4K page. Idea here is that BIOS does whatever the OEM thinks is the
> right level of threshholding - not bothering the OS with petty details
> of random corrected erorrs that mean nothing. But if there is some
> repeated error (like a stuck bit) then the BIOS can provide a CPER
> to the OS telling it that it would be a good idea to stop using that
> page.

Ok, where is that semantics? What in a CPER record does say "this error
should tell you that you need to offline the containing page and I'm
telling you this exactly only once"? Error Severity 0, i.e. Recoverable?

> And this is where the semantics of a CPER change between the original
> WSM-EX implementation ... where Linux expects to see all the errors
> and do its own thresholding only taking a page offline if it sees a
> lot of CPER refer to the same page; and now - where the BIOS does the
> counting and tells Linux just once to take the page offline.

Ok, we're talking about the S in RAS now. Do we have error recovery
strategies specified anywhere? Are they per-platform or generic? Is this
CPER strategy above, for example, only valid for some platforms or for
all APEI-using hardware?

Questions over questions...
Tony Luck June 19, 2013, 9:28 p.m. UTC | #9
> Ok, where is that semantics? What in a CPER record does say "this error

> should tell you that you need to offline the containing page and I'm

> telling you this exactly only once"? Error Severity 0, i.e. Recoverable?


Naveen - this one is for you (or for your BIOS team).  Can you get us a sample
CPER that you plan to provide when the BIOS decides that its threshold has
been exceeded?  How will it be different from what old WSM-EX platforms
were sending to us?  Hopefully the answer is encoded in the CPER record
and not in some code we have to put in Linux to say "if (IBMplatform) do_thing_1(); else ... "

> Ok, we're talking about the S in RAS now. Do we have error recovery

> strategies specified anywhere? Are they per-platform or generic? Is this

> CPER strategy above, for example, only valid for some platforms or for

> all APEI-using hardware?


mcelog(8) daemon has been doing this for years ... but it used the "predictive
failure analysis" buzzwords that were popular way back then (today the
marketing people seem to prefer "self healing" ). Whatever the name, the
concept is the same ... take some set of corrected event reports and infer
from them that something worse may happen soon, and use that information
to try to avoid the (possibly) impending crash.

> Questions over questions...


Questions are good - they help fill out gaps

-Tony
Borislav Petkov June 19, 2013, 9:41 p.m. UTC | #10
On Wed, Jun 19, 2013 at 09:28:50PM +0000, Luck, Tony wrote:
> > Ok, where is that semantics? What in a CPER record does say "this error
> > should tell you that you need to offline the containing page and I'm
> > telling you this exactly only once"? Error Severity 0, i.e. Recoverable?
> 
> Naveen - this one is for you (or for your BIOS team).  Can you get us a sample
> CPER that you plan to provide when the BIOS decides that its threshold has
> been exceeded?  How will it be different from what old WSM-EX platforms
> were sending to us?  Hopefully the answer is encoded in the CPER record
> and not in some code we have to put in Linux to say "if (IBMplatform) do_thing_1(); else ... "

If we're going to be vendor-specific (which I'm pretty sure we will),
we'd gonna have to export knobs to userspace which each vendor can tweak
for themselves. Otherwise we'd get the DMI quirks ugliness all over
again and we better nip it in the bud, while we can.

> > Ok, we're talking about the S in RAS now. Do we have error recovery
> > strategies specified anywhere? Are they per-platform or generic? Is this
> > CPER strategy above, for example, only valid for some platforms or for
> > all APEI-using hardware?
> 
> mcelog(8) daemon has been doing this for years ... but it used the "predictive
> failure analysis" buzzwords that were popular way back then (today the
> marketing people seem to prefer "self healing" ). Whatever the name, the
> concept is the same ... take some set of corrected event reports and infer
> from them that something worse may happen soon, and use that information
> to try to avoid the (possibly) impending crash.

Ok, so some sort of userspace is enforcing policy based on collected
data/heuristics.

The above question about what to do *without* going to userspace and
back is maybe more interesting and we'd need a clean design there...
we'll see.

> > Questions over questions...
> 
> Questions are good - they help fill out gaps

I know - that's why I'm trying to poke holes early...
Tony Luck June 19, 2013, 10:08 p.m. UTC | #11
> The above question about what to do *without* going to userspace and

> back is maybe more interesting and we'd need a clean design there...

> we'll see.


Yes - this case (where the BIOS did all the threshold math and made the decision)
should be one where Linux kernel could just implement the action directly.
Perhaps controlled by a knob to say whether we really trust the BIOS that much.

But we will also have cases where a smart user agent can correlate data
from multiple sources to identify the real root cause (e.g. some temperature
anomalies around the same time as some memory errors that occur at 10am
on the third Tuesday each month -> cause is air conditioner maintenance guy
that shuts down the a/c for 10 minutes to change the filter).

I'll leave writing an agent that smart as an exercise for the concerned data
center manager :-)

-Tony
Borislav Petkov June 20, 2013, 5:35 a.m. UTC | #12
On Wed, Jun 19, 2013 at 10:08:53PM +0000, Luck, Tony wrote:
> > The above question about what to do *without* going to userspace and
> > back is maybe more interesting and we'd need a clean design there...
> > we'll see.
> 
> Yes - this case (where the BIOS did all the threshold math and made the decision)
> should be one where Linux kernel could just implement the action directly.
> Perhaps controlled by a knob to say whether we really trust the BIOS that much.
> 
> But we will also have cases where a smart user agent can correlate data
> from multiple sources to identify the real root cause (e.g. some temperature
> anomalies around the same time as some memory errors that occur at 10am
> on the third Tuesday each month -> cause is air conditioner maintenance guy
> that shuts down the a/c for 10 minutes to change the filter).

Surely we cannot put that in the kernel. For that we'd need userspace to
decide and only turn knobs in the kernel.

> I'll leave writing an agent that smart as an exercise for the concerned data
> center manager :-)

Me too, as long as it stays in userspace and it only turns
knobs/interfaces in the kernel.
Borislav Petkov June 20, 2013, 7:48 a.m. UTC | #13
On Wed, Jun 19, 2013 at 11:27:42PM +0530, Naveen N. Rao wrote:
> Add a boot option to disable firmware first mode for corrected errors.
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
>  Documentation/x86/x86_64/boot-options.txt |    5 +++++
>  arch/x86/include/asm/acpi.h               |    2 ++
>  arch/x86/kernel/acpi/boot.c               |    5 +++++
>  drivers/acpi/apei/hest.c                  |    3 ++-
>  4 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
> index e9e8ddb..1228b22 100644
> --- a/Documentation/x86/x86_64/boot-options.txt
> +++ b/Documentation/x86/x86_64/boot-options.txt
> @@ -176,6 +176,11 @@ ACPI
>  
>    acpi=noirq	Don't route interrupts
>  
> +  acpi=nocmcff	Disable firmware first mode for corrected errors. This
> +		disables parsing the HEST CMC error source to check if
> +		firmware has set the FF flag. This may result in
> +		duplicate corrected error reports.

Ok, this option should be called something more vendor-agnostic like

acpi=noff

for example.

CMCI is Intel-specific and Firmware First is a generic way to say that
firmware would like to look at the errors first.

AMD has similar thing called error thresholding so...
Naveen N. Rao June 20, 2013, 7:02 p.m. UTC | #14
On 06/20/2013 01:18 PM, Borislav Petkov wrote:
> On Wed, Jun 19, 2013 at 11:27:42PM +0530, Naveen N. Rao wrote:
>> Add a boot option to disable firmware first mode for corrected errors.
>>
>> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>> ---
>>   Documentation/x86/x86_64/boot-options.txt |    5 +++++
>>   arch/x86/include/asm/acpi.h               |    2 ++
>>   arch/x86/kernel/acpi/boot.c               |    5 +++++
>>   drivers/acpi/apei/hest.c                  |    3 ++-
>>   4 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
>> index e9e8ddb..1228b22 100644
>> --- a/Documentation/x86/x86_64/boot-options.txt
>> +++ b/Documentation/x86/x86_64/boot-options.txt
>> @@ -176,6 +176,11 @@ ACPI
>>
>>     acpi=noirq	Don't route interrupts
>>
>> +  acpi=nocmcff	Disable firmware first mode for corrected errors. This
>> +		disables parsing the HEST CMC error source to check if
>> +		firmware has set the FF flag. This may result in
>> +		duplicate corrected error reports.
>
> Ok, this option should be called something more vendor-agnostic like
>
> acpi=noff
>
> for example.
>
> CMCI is Intel-specific and Firmware First is a generic way to say that
> firmware would like to look at the errors first.
>
> AMD has similar thing called error thresholding so...
>

nocmcff actually refers to CMC (Corrected Machine Check) Error source 
which is APEI terminology, so this is not actually Intel-specific and 
should apply equally well for AMD. I chose nocmcff over noff since there 
is also a firmware first flag for MCE Error source in APEI and this 
option only disables FF for CMC.

Thanks,
Naveen

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Naveen N. Rao June 20, 2013, 9:21 p.m. UTC | #15
On 06/20/2013 02:58 AM, Luck, Tony wrote:
>> Ok, where is that semantics? What in a CPER record does say "this error
>> should tell you that you need to offline the containing page and I'm
>> telling you this exactly only once"? Error Severity 0, i.e. Recoverable?
>
> Naveen - this one is for you (or for your BIOS team).  Can you get us a sample
> CPER that you plan to provide when the BIOS decides that its threshold has
> been exceeded?  How will it be different from what old WSM-EX platforms
> were sending to us?  Hopefully the answer is encoded in the CPER record
> and not in some code we have to put in Linux to say "if (IBMplatform) do_thing_1(); else ... "

Looking at the specs, there might be a few ways we can do this:
- One, Error threshold value of 1 in the Hardware Error Notification 
structure of CMC. This field is described as the number of error events 
before OS considers this as an error event. With a threshold value of 1, 
we are essentially asking the OS not to threshold further.
- Two, the Generic Error Data Entry (aka UEFI Section Descriptor) has a 
flag which indicates 'Error Threshold Exceeded'. From the UEFI spec, it 
looks like we could consider this as an indication to offline the page; 
though I am not sure if/how this relates to the threshold value above.

Thoughts?


Thanks,
Naveen

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tony Luck June 20, 2013, 10:11 p.m. UTC | #16
> - Two, the Generic Error Data Entry (aka UEFI Section Descriptor) has a 

> flag which indicates 'Error Threshold Exceeded'. From the UEFI spec, it 

> looks like we could consider this as an indication to offline the page; 

> though I am not sure if/how this relates to the threshold value above.


This one sounds to make sense ... the flag description sounds exactly what
we want - I won't feel embarrassed explaining to people why Linux takes
action when it sees a record like this.

-Tony
Borislav Petkov June 21, 2013, 7:27 a.m. UTC | #17
On Thu, Jun 20, 2013 at 10:11:27PM +0000, Luck, Tony wrote:
> > - Two, the Generic Error Data Entry (aka UEFI Section Descriptor) has a 
> > flag which indicates 'Error Threshold Exceeded'. From the UEFI spec, it 
> > looks like we could consider this as an indication to offline the page; 
> > though I am not sure if/how this relates to the threshold value above.
> 
> This one sounds to make sense ... the flag description sounds exactly what
> we want - I won't feel embarrassed explaining to people why Linux takes
> action when it sees a record like this.

Yep, and firmware can set that flag when it wants to, so decision when
to offline a page is left to the platform.
Naveen N. Rao June 21, 2013, 4:43 p.m. UTC | #18
On 06/21/2013 12:57 PM, Borislav Petkov wrote:
> On Thu, Jun 20, 2013 at 10:11:27PM +0000, Luck, Tony wrote:
>>> - Two, the Generic Error Data Entry (aka UEFI Section Descriptor) has a
>>> flag which indicates 'Error Threshold Exceeded'. From the UEFI spec, it
>>> looks like we could consider this as an indication to offline the page;
>>> though I am not sure if/how this relates to the threshold value above.
>>
>> This one sounds to make sense ... the flag description sounds exactly what
>> we want - I won't feel embarrassed explaining to people why Linux takes
>> action when it sees a record like this.
>
> Yep, and firmware can set that flag when it wants to, so decision when
> to offline a page is left to the platform.
>

Ok, I will work towards a patch which does this.

Thanks,
Naveen

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index e9e8ddb..1228b22 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -176,6 +176,11 @@  ACPI
 
   acpi=noirq	Don't route interrupts
 
+  acpi=nocmcff	Disable firmware first mode for corrected errors. This
+		disables parsing the HEST CMC error source to check if
+		firmware has set the FF flag. This may result in
+		duplicate corrected error reports.
+
 PCI
 
   pci=off		Don't use PCI
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index b31bf97..42db2b8 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -86,6 +86,7 @@  extern int acpi_pci_disabled;
 extern int acpi_skip_timer_override;
 extern int acpi_use_timer_override;
 extern int acpi_fix_pin2_polarity;
+extern int acpi_disable_cmcff;
 
 extern u8 acpi_sci_flags;
 extern int acpi_sci_override_gsi;
@@ -168,6 +169,7 @@  static inline void arch_acpi_set_pdc_bits(u32 *buf)
 
 #define acpi_lapic 0
 #define acpi_ioapic 0
+#define acpi_disable_cmcff 0
 static inline void acpi_noirq_set(void) { }
 static inline void acpi_disable_pci(void) { }
 static inline void disable_acpi(void) { }
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 230c8ea..d1998d5 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -66,6 +66,7 @@  EXPORT_SYMBOL(acpi_pci_disabled);
 int acpi_lapic;
 int acpi_ioapic;
 int acpi_strict;
+int acpi_disable_cmcff;
 
 u8 acpi_sci_flags __initdata;
 int acpi_sci_override_gsi __initdata;
@@ -1619,6 +1620,10 @@  static int __init parse_acpi(char *arg)
 	/* "acpi=copy_dsdt" copys DSDT */
 	else if (strcmp(arg, "copy_dsdt") == 0) {
 		acpi_gbl_copy_dsdt_locally = 1;
+	}
+	/* "acpi=nocmcff" disables FF mode for corrected errors */
+	else if (strcmp(arg, "nocmcff") == 0) {
+		acpi_disable_cmcff = 1;
 	} else {
 		/* Core will printk when we return error. */
 		return -EINVAL;
diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
index d8c69ba..f8b1115 100644
--- a/drivers/acpi/apei/hest.c
+++ b/drivers/acpi/apei/hest.c
@@ -261,7 +261,8 @@  void __init acpi_hest_init(void)
 		goto err;
 	}
 
-	apei_hest_parse(hest_parse_cmc, NULL);
+	if (!acpi_disable_cmcff)
+		apei_hest_parse(hest_parse_cmc, NULL);
 
 	if (!ghes_disable) {
 		rc = apei_hest_parse(hest_parse_ghes_count, &ghes_count);