diff mbox series

[v7,10/25] ACPI / APEI: Tell firmware the estatus queue consumed the records

Message ID 20181203180613.228133-11-james.morse@arm.com (mailing list archive)
State New, archived
Headers show
Series APEI in_nmi() rework and SDEI wire-up | expand

Commit Message

James Morse Dec. 3, 2018, 6:05 p.m. UTC
ACPI has a GHESv2 which is used on hardware reduced platforms to
explicitly acknowledge that the memory for CPER records has been
consumed. This lets an external agent know it can re-use this
memory for something else.

Previously notify_nmi and the estatus queue didn't do this as
they were never used on hardware reduced platforms. Once we move
notify_sea over to use the estatus queue, it may become necessary.

Add the call. This is safe for use in NMI context as the
read_ack_register is pre-mapped by ghes_new() before the
ghes can be added to an RCU list, and then found by the
notification handler.

Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/acpi/apei/ghes.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Borislav Petkov Dec. 11, 2018, 6:36 p.m. UTC | #1
On Mon, Dec 03, 2018 at 06:05:58PM +0000, James Morse wrote:
> ACPI has a GHESv2 which is used on hardware reduced platforms to
> explicitly acknowledge that the memory for CPER records has been
> consumed. This lets an external agent know it can re-use this
> memory for something else.
> 
> Previously notify_nmi and the estatus queue didn't do this as
> they were never used on hardware reduced platforms. Once we move
> notify_sea over to use the estatus queue, it may become necessary.
> 
> Add the call. This is safe for use in NMI context as the
> read_ack_register is pre-mapped by ghes_new() before the
> ghes can be added to an RCU list, and then found by the
> notification handler.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/acpi/apei/ghes.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 366dbdd41ef3..15d94373ba72 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -926,6 +926,10 @@ static int _in_nmi_notify_one(struct ghes *ghes)
>  	__process_error(ghes);
>  	ghes_clear_estatus(ghes, buf_paddr);
>  
> +	if (is_hest_type_generic_v2(ghes) && ghes_ack_error(ghes->generic_v2))

Since ghes_ack_error() is always prepended with this check, you could
push it down into the function:

ghes_ack_error(ghes)
...

	if (!is_hest_type_generic_v2(ghes))
		return 0;

and simplify the two callsites :)
James Morse Jan. 10, 2019, 6:22 p.m. UTC | #2
Hi Boris,

(CC: +Tyler)

On 11/12/2018 18:36, Borislav Petkov wrote:
> On Mon, Dec 03, 2018 at 06:05:58PM +0000, James Morse wrote:
>> ACPI has a GHESv2 which is used on hardware reduced platforms to
>> explicitly acknowledge that the memory for CPER records has been
>> consumed. This lets an external agent know it can re-use this
>> memory for something else.
>>
>> Previously notify_nmi and the estatus queue didn't do this as
>> they were never used on hardware reduced platforms. Once we move
>> notify_sea over to use the estatus queue, it may become necessary.
>>
>> Add the call. This is safe for use in NMI context as the
>> read_ack_register is pre-mapped by ghes_new() before the
>> ghes can be added to an RCU list, and then found by the
>> notification handler.

>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 366dbdd41ef3..15d94373ba72 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -926,6 +926,10 @@ static int _in_nmi_notify_one(struct ghes *ghes)
>>  	__process_error(ghes);
>>  	ghes_clear_estatus(ghes, buf_paddr);
>>  
>> +	if (is_hest_type_generic_v2(ghes) && ghes_ack_error(ghes->generic_v2))
> 
> Since ghes_ack_error() is always prepended with this check, you could
> push it down into the function:
> 
> ghes_ack_error(ghes)
> ...
> 
> 	if (!is_hest_type_generic_v2(ghes))
> 		return 0;
> 
> and simplify the two callsites :)

Great idea! ...

.. huh. Turns out for ghes_proc() we discard any errors other than ENOENT from
ghes_read_estatus() if is_hest_type_generic_v2(). This masks EIO.

Most of the error sources discard the result, the worst thing I can find is
ghes_irq_func() will return IRQ_HANDLED, instead of IRQ_NONE when we didn't
really handle the IRQ. They're registered as SHARED, but I don't have an example
of what goes wrong next.

I think this will also stop the spurious handling code kicking in to shut it up
if its broken and screaming. Unlikely, but not impossible.

Fixed in a prior patch, with Boris' suggestion, ghes_proc()s tail ends up look
like this:
----------------------%<----------------------
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0321d9420b1e..8d1f9930b159 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -700,18 +708,11 @@ static int ghes_proc(struct ghes *ghes)

 out:
        ghes_clear_estatus(ghes, buf_paddr);
+       if (rc != -ENOENT)
+               rc_ack = ghes_ack_error(ghes);

-       if (rc == -ENOENT)
-               return rc;
-
-       /*
-        * GHESv2 type HEST entries introduce support for error acknowledgment,
-        * so only acknowledge the error if this support is present.
-        */
-       if (is_hest_type_generic_v2(ghes))
-               return ghes_ack_error(ghes->generic_v2);
-
-       return rc;
+       /* If rc and rc_ack failed, return the first one */
+       return rc ? rc : rc_ack;
 }
----------------------%<----------------------


Thanks,

James
Tyler Baicar Jan. 10, 2019, 9:01 p.m. UTC | #3
On Thu, Jan 10, 2019 at 1:23 PM James Morse <james.morse@arm.com> wrote:
> >>
> >> +    if (is_hest_type_generic_v2(ghes) && ghes_ack_error(ghes->generic_v2))
> >
> > Since ghes_ack_error() is always prepended with this check, you could
> > push it down into the function:
> >
> > ghes_ack_error(ghes)
> > ...
> >
> >       if (!is_hest_type_generic_v2(ghes))
> >               return 0;
> >
> > and simplify the two callsites :)
>
> Great idea! ...
>
> .. huh. Turns out for ghes_proc() we discard any errors other than ENOENT from
> ghes_read_estatus() if is_hest_type_generic_v2(). This masks EIO.
>
> Most of the error sources discard the result, the worst thing I can find is
> ghes_irq_func() will return IRQ_HANDLED, instead of IRQ_NONE when we didn't
> really handle the IRQ. They're registered as SHARED, but I don't have an example
> of what goes wrong next.
>
> I think this will also stop the spurious handling code kicking in to shut it up
> if its broken and screaming. Unlikely, but not impossible.
>
> Fixed in a prior patch, with Boris' suggestion, ghes_proc()s tail ends up look
> like this:
> ----------------------%<----------------------
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 0321d9420b1e..8d1f9930b159 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -700,18 +708,11 @@ static int ghes_proc(struct ghes *ghes)
>
>  out:
>         ghes_clear_estatus(ghes, buf_paddr);
> +       if (rc != -ENOENT)
> +               rc_ack = ghes_ack_error(ghes);
>
> -       if (rc == -ENOENT)
> -               return rc;
> -
> -       /*
> -        * GHESv2 type HEST entries introduce support for error acknowledgment,
> -        * so only acknowledge the error if this support is present.
> -        */
> -       if (is_hest_type_generic_v2(ghes))
> -               return ghes_ack_error(ghes->generic_v2);
> -
> -       return rc;
> +       /* If rc and rc_ack failed, return the first one */
> +       return rc ? rc : rc_ack;
>  }
> ----------------------%<----------------------
>

Looks good to me, I guess there's no harm in acking invalid error status blocks.

T
Borislav Petkov Jan. 11, 2019, 12:03 p.m. UTC | #4
On Thu, Jan 10, 2019 at 04:01:27PM -0500, Tyler Baicar wrote:
> On Thu, Jan 10, 2019 at 1:23 PM James Morse <james.morse@arm.com> wrote:
> > >>
> > >> +    if (is_hest_type_generic_v2(ghes) && ghes_ack_error(ghes->generic_v2))
> > >
> > > Since ghes_ack_error() is always prepended with this check, you could
> > > push it down into the function:
> > >
> > > ghes_ack_error(ghes)
> > > ...
> > >
> > >       if (!is_hest_type_generic_v2(ghes))
> > >               return 0;
> > >
> > > and simplify the two callsites :)
> >
> > Great idea! ...
> >
> > .. huh. Turns out for ghes_proc() we discard any errors other than ENOENT from
> > ghes_read_estatus() if is_hest_type_generic_v2(). This masks EIO.
> >
> > Most of the error sources discard the result, the worst thing I can find is
> > ghes_irq_func() will return IRQ_HANDLED, instead of IRQ_NONE when we didn't
> > really handle the IRQ. They're registered as SHARED, but I don't have an example
> > of what goes wrong next.
> >
> > I think this will also stop the spurious handling code kicking in to shut it up
> > if its broken and screaming. Unlikely, but not impossible.
> >
> > Fixed in a prior patch, with Boris' suggestion, ghes_proc()s tail ends up look
> > like this:
> > ----------------------%<----------------------
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> > index 0321d9420b1e..8d1f9930b159 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -700,18 +708,11 @@ static int ghes_proc(struct ghes *ghes)
> >
> >  out:
> >         ghes_clear_estatus(ghes, buf_paddr);
> > +       if (rc != -ENOENT)
> > +               rc_ack = ghes_ack_error(ghes);
> >
> > -       if (rc == -ENOENT)
> > -               return rc;
> > -
> > -       /*
> > -        * GHESv2 type HEST entries introduce support for error acknowledgment,
> > -        * so only acknowledge the error if this support is present.
> > -        */
> > -       if (is_hest_type_generic_v2(ghes))
> > -               return ghes_ack_error(ghes->generic_v2);
> > -
> > -       return rc;
> > +       /* If rc and rc_ack failed, return the first one */
> > +       return rc ? rc : rc_ack;
> >  }
> > ----------------------%<----------------------
> >
> 
> Looks good to me, I guess there's no harm in acking invalid error status blocks.

Err, why?

I don't know what the firmware glue does on ARM but if I'd have to
remain logical - which is hard to do with firmware - the proper thing to
do would be this:

	rc = ghes_read_estatus(ghes, &buf_paddr);
	if (rc) {
		ghes_reset_hardware();
	}

	/* clear estatus and bla bla */

	/* Now, I'm in the success case: */
	 ghes_ack_error();


This way, you have the error path clear of something unexpected happened
when reading the hardware, obvious and separated. ghes_reset_hardware()
clears the registers and does the necessary steps to put the hardware in
good state again so that it can report the next error.

And the success path simply acks the error and does possibly the same
thing. The naming of the functions is important though, to denote what
gets called when.

This way you handle all the cases just fine. No looking at the error
type and blabla.

Right?
Tyler Baicar Jan. 11, 2019, 3:32 p.m. UTC | #5
On Fri, Jan 11, 2019 at 7:03 AM Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Jan 10, 2019 at 04:01:27PM -0500, Tyler Baicar wrote:
> > On Thu, Jan 10, 2019 at 1:23 PM James Morse <james.morse@arm.com> wrote:
> > > >>
> > > >> +    if (is_hest_type_generic_v2(ghes) && ghes_ack_error(ghes->generic_v2))
> > > >
> > > > Since ghes_ack_error() is always prepended with this check, you could
> > > > push it down into the function:
> > > >
> > > > ghes_ack_error(ghes)
> > > > ...
> > > >
> > > >       if (!is_hest_type_generic_v2(ghes))
> > > >               return 0;
> > > >
> > > > and simplify the two callsites :)
> > >
> > > Great idea! ...
> > >
> > > .. huh. Turns out for ghes_proc() we discard any errors other than ENOENT from
> > > ghes_read_estatus() if is_hest_type_generic_v2(). This masks EIO.
> > >
> > > Most of the error sources discard the result, the worst thing I can find is
> > > ghes_irq_func() will return IRQ_HANDLED, instead of IRQ_NONE when we didn't
> > > really handle the IRQ. They're registered as SHARED, but I don't have an example
> > > of what goes wrong next.
> > >
> > > I think this will also stop the spurious handling code kicking in to shut it up
> > > if its broken and screaming. Unlikely, but not impossible.
> > >
> > > Fixed in a prior patch, with Boris' suggestion, ghes_proc()s tail ends up look
> > > like this:
> > > ----------------------%<----------------------
> > > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> > > index 0321d9420b1e..8d1f9930b159 100644
> > > --- a/drivers/acpi/apei/ghes.c
> > > +++ b/drivers/acpi/apei/ghes.c
> > > @@ -700,18 +708,11 @@ static int ghes_proc(struct ghes *ghes)
> > >
> > >  out:
> > >         ghes_clear_estatus(ghes, buf_paddr);
> > > +       if (rc != -ENOENT)
> > > +               rc_ack = ghes_ack_error(ghes);
> > >
> > > -       if (rc == -ENOENT)
> > > -               return rc;
> > > -
> > > -       /*
> > > -        * GHESv2 type HEST entries introduce support for error acknowledgment,
> > > -        * so only acknowledge the error if this support is present.
> > > -        */
> > > -       if (is_hest_type_generic_v2(ghes))
> > > -               return ghes_ack_error(ghes->generic_v2);
> > > -
> > > -       return rc;
> > > +       /* If rc and rc_ack failed, return the first one */
> > > +       return rc ? rc : rc_ack;
> > >  }
> > > ----------------------%<----------------------
> > >
> >
> > Looks good to me, I guess there's no harm in acking invalid error status blocks.
>
> Err, why?

If ghes_read_estatus() fails, then either there was no error populated or the
error status block was invalid. If the error status block is invalid, then the
kernel doesn't know what happened in hardware.

I originally thought this was changing what's acked, but it's just changing the
return value of ghes_proc() when ghes_read_estatus() returns -EIO.

> I don't know what the firmware glue does on ARM but if I'd have to
> remain logical - which is hard to do with firmware - the proper thing to
> do would be this:
>
>         rc = ghes_read_estatus(ghes, &buf_paddr);
>         if (rc) {
>                 ghes_reset_hardware();

The kernel would have no way of knowing what to do here.

>         }
>
>         /* clear estatus and bla bla */
>
>         /* Now, I'm in the success case: */
>          ghes_ack_error();
>
>
> This way, you have the error path clear of something unexpected happened
> when reading the hardware, obvious and separated. ghes_reset_hardware()
> clears the registers and does the necessary steps to put the hardware in
> good state again so that it can report the next error.
>
> And the success path simply acks the error and does possibly the same
> thing. The naming of the functions is important though, to denote what
> gets called when.
>
> This way you handle all the cases just fine. No looking at the error
> type and blabla.
>
> Right?
Borislav Petkov Jan. 11, 2019, 5:45 p.m. UTC | #6
On Fri, Jan 11, 2019 at 10:32:23AM -0500, Tyler Baicar wrote:
> The kernel would have no way of knowing what to do here.

What do you mean, there's no way of knowing what to do? It needs to
clear registers so that the next error can get reported properly.

Or of the status read failed and it doesn't need to do anything, then it
shouldn't.

Whatever it is, the kernel either needs to do something in the error
case to clean up, or nothing if the firmware doesn't need anything done
in the error case; *or* ack the error in the success case.

This should all be written down somewhere in that GHES v2
spec/doc/writeup whatever, explaining what the OS is supposed to do to
signal the error has been read by the OS.
James Morse Jan. 11, 2019, 6:09 p.m. UTC | #7
Hi guys,

On 11/01/2019 15:32, Tyler Baicar wrote:
> On Fri, Jan 11, 2019 at 7:03 AM Borislav Petkov <bp@alien8.de> wrote:
>> On Thu, Jan 10, 2019 at 04:01:27PM -0500, Tyler Baicar wrote:
>>> On Thu, Jan 10, 2019 at 1:23 PM James Morse <james.morse@arm.com> wrote:
>>>>>>
>>>>>> +    if (is_hest_type_generic_v2(ghes) && ghes_ack_error(ghes->generic_v2))
>>>>>
>>>>> Since ghes_ack_error() is always prepended with this check, you could
>>>>> push it down into the function:
>>>>>
>>>>> ghes_ack_error(ghes)
>>>>> ...
>>>>>
>>>>>       if (!is_hest_type_generic_v2(ghes))
>>>>>               return 0;
>>>>>
>>>>> and simplify the two callsites :)
>>>>
>>>> Great idea! ...
>>>>
>>>> .. huh. Turns out for ghes_proc() we discard any errors other than ENOENT from
>>>> ghes_read_estatus() if is_hest_type_generic_v2(). This masks EIO.
>>>>
>>>> Most of the error sources discard the result, the worst thing I can find is
>>>> ghes_irq_func() will return IRQ_HANDLED, instead of IRQ_NONE when we didn't
>>>> really handle the IRQ. They're registered as SHARED, but I don't have an example
>>>> of what goes wrong next.
>>>>
>>>> I think this will also stop the spurious handling code kicking in to shut it up
>>>> if its broken and screaming. Unlikely, but not impossible.

[....]

>>> Looks good to me, I guess there's no harm in acking invalid error status blocks.

Great, I didn't miss something nasty...


>> Err, why?
> 
> If ghes_read_estatus() fails, then either there was no error populated or the
> error status block was invalid.
> If the error status block is invalid, then the kernel doesn't know what happened
> in hardware.

What do we mean by 'hardware' here? We're receiving a corrupt report of
something via memory.
The GHESv2 ack just means we're done with the memory. I think it exists because
the external-agent can't peek into the CPU to see if its returned from the
notification.


> I originally thought this was changing what's acked, but it's just changing the
> return value of ghes_proc() when ghes_read_estatus() returns -EIO.

Sorry, that will be due to my bad description.


>> I don't know what the firmware glue does on ARM but if I'd have to
>> remain logical - which is hard to do with firmware - the proper thing to
>> do would be this:
>>
>>         rc = ghes_read_estatus(ghes, &buf_paddr);
>>         if (rc) {
>>                 ghes_reset_hardware();
> 
> The kernel would have no way of knowing what to do here.

Is there anything wrong with what we do today? We stamp on the records so that
we don't processes them again. (especially if is polled), and we tell firmware
it can re-use this memory.

(I think we should return an error, or print a ratelimited warning for corrupt
records)


>>         }
>>
>>         /* clear estatus and bla bla */
>>
>>         /* Now, I'm in the success case: */
>>          ghes_ack_error();
>>
>>
>> This way, you have the error path clear of something unexpected happened
>> when reading the hardware, obvious and separated. ghes_reset_hardware()
>> clears the registers and does the necessary steps to put the hardware in
>> good state again so that it can report the next error.
>>
>> And the success path simply acks the error and does possibly the same
>> thing. The naming of the functions is important though, to denote what
>> gets called when.

I think this duplicates the record-stamping/acking. If there is anything in that
memory region, the action for processed/copied/ignored-because-its-corrupt is
the same.

We can return on ENOENT out earlier, as nothing needs doing in that case. Its
what the GHES_TO_CLEAR spaghetti is for, we can probably move the ack thing into
ghes_clear_estatus(), that way that thing means 'I'm done with this memory'.

Something like:
-------------------------
rc = ghes_read_estatus();
if (rc == -ENOENT)
	return 0;

if (!rc) {
	ghes_do_proc() and friends;
}

ghes_clear_estatus();

return rc;
-------------------------

We would no longer return errors from the ack code, I suspect that can only
happen for a corrupt gas, which we would have caught earlier as we rely on the
mapping being cached.



Thanks,

James
James Morse Jan. 11, 2019, 6:25 p.m. UTC | #8
Hi Boris,

On 11/01/2019 17:45, Borislav Petkov wrote:
> On Fri, Jan 11, 2019 at 10:32:23AM -0500, Tyler Baicar wrote:
>> The kernel would have no way of knowing what to do here.
> 
> What do you mean, there's no way of knowing what to do? It needs to
> clear registers so that the next error can get reported properly.
> 
> Or of the status read failed and it doesn't need to do anything, then it
> shouldn't.

I think we're speaking at cross-purposes. If the error-detecting-hardware has
some state, that's firmware's problem to deal with.
What we're dealing with here is the memory we read the error records from.


> Whatever it is, the kernel either needs to do something in the error
> case to clean up, or nothing if the firmware doesn't need anything done
> in the error case; *or* ack the error in the success case.

We ack it in the corrupt-record case too, because we are done with the memory.


> This should all be written down somewhere in that GHES v2
> spec/doc/writeup whatever, explaining what the OS is supposed to do to
> signal the error has been read by the OS.

I think it is. 18.3.2.8 of ACPI v6.2 (search for Generic Hardware Error Source
version 2", then below the table):
* OSPM detects error (via interrupt/exception or polling the block status)
* OSPM copies the error status block
* OSPM clears the block status field of the error status block
* OSPM acknowledges the error via Read Ack register

The ENOENT case is excluded by 'polling the block status'.
Unsurprisingly the spec doesn't consider the case that firmware generates
corrupt records!


Thanks,

James
Borislav Petkov Jan. 11, 2019, 7:58 p.m. UTC | #9
On Fri, Jan 11, 2019 at 06:25:21PM +0000, James Morse wrote:
> We ack it in the corrupt-record case too, because we are done with the
> memory.

Ok, so the only thing that we need to do unconditionally is ACK in order
to free the memory. Or is there an exception to that set of steps in
error handling?

> I think it is. 18.3.2.8 of ACPI v6.2 (search for Generic Hardware Error Source
> version 2", then below the table):
> * OSPM detects error (via interrupt/exception or polling the block status)
> * OSPM copies the error status block
> * OSPM clears the block status field of the error status block
> * OSPM acknowledges the error via Read Ack register
> 
> The ENOENT case is excluded by 'polling the block status'.

Ok, so we signal the absence of an error record with ENOENT.

        if (!buf_paddr)
                return -ENOENT;

Can that even happen?

Also, in that case, what would happen if we ACK the error anyway? We'd
confuse the firmware?

I sure hope firmware is prepared for spurious ACKs :)

> Unsurprisingly the spec doesn't consider the case that firmware generates
> corrupt records!

You mean the EIO case?

Not surprised at all. But we do not report that record so all good.

Thx.
Borislav Petkov Jan. 11, 2019, 8:01 p.m. UTC | #10
On Fri, Jan 11, 2019 at 06:09:28PM +0000, James Morse wrote:
> We can return on ENOENT out earlier, as nothing needs doing in that case. Its
> what the GHES_TO_CLEAR spaghetti is for, we can probably move the ack thing into
> ghes_clear_estatus(), that way that thing means 'I'm done with this memory'.

That actually sounds nice and other code in the kernel already does
that: when a failure has been encountered during reading status, you
free up resources right then and there. No need for passing retvals back
and forth. And this would simplify the spaghetti. Which is something
good(tm) on its own!

Thx.
Tyler Baicar Jan. 11, 2019, 8:53 p.m. UTC | #11
On Fri, Jan 11, 2019 at 1:09 PM James Morse <james.morse@arm.com> wrote:
> On 11/01/2019 15:32, Tyler Baicar wrote:
> > On Fri, Jan 11, 2019 at 7:03 AM Borislav Petkov <bp@alien8.de> wrote:
> >> On Thu, Jan 10, 2019 at 04:01:27PM -0500, Tyler Baicar wrote:
> >>> On Thu, Jan 10, 2019 at 1:23 PM James Morse <james.morse@arm.com> wrote:
> >>>>>>
> >>>>>> +    if (is_hest_type_generic_v2(ghes) && ghes_ack_error(ghes->generic_v2))
> >>>>>
> >>>>> Since ghes_ack_error() is always prepended with this check, you could
> >>>>> push it down into the function:
> >>>>>
> >>>>> ghes_ack_error(ghes)
> >>>>> ...
> >>>>>
> >>>>>       if (!is_hest_type_generic_v2(ghes))
> >>>>>               return 0;
> >>>>>
> >>>>> and simplify the two callsites :)
> >>>>
> >>>> Great idea! ...
> >>>>
> >>>> .. huh. Turns out for ghes_proc() we discard any errors other than ENOENT from
> >>>> ghes_read_estatus() if is_hest_type_generic_v2(). This masks EIO.
> >>>>
> >>>> Most of the error sources discard the result, the worst thing I can find is
> >>>> ghes_irq_func() will return IRQ_HANDLED, instead of IRQ_NONE when we didn't
> >>>> really handle the IRQ. They're registered as SHARED, but I don't have an example
> >>>> of what goes wrong next.
> >>>>
> >>>> I think this will also stop the spurious handling code kicking in to shut it up
> >>>> if its broken and screaming. Unlikely, but not impossible.
>
> [....]
>
> >>> Looks good to me, I guess there's no harm in acking invalid error status blocks.
>
> Great, I didn't miss something nasty...
>
>
> >> Err, why?
> >
> > If ghes_read_estatus() fails, then either there was no error populated or the
> > error status block was invalid.
> > If the error status block is invalid, then the kernel doesn't know what happened
> > in hardware.
>
> What do we mean by 'hardware' here? We're receiving a corrupt report of
> something via memory.

By Hardware here I meant whatever hardware was reporting the error.

> The GHESv2 ack just means we're done with the memory. I think it exists because
> the external-agent can't peek into the CPU to see if its returned from the
> notification.
>
>
> > I originally thought this was changing what's acked, but it's just changing the
> > return value of ghes_proc() when ghes_read_estatus() returns -EIO.
>
> Sorry, that will be due to my bad description.
>
>
> >> I don't know what the firmware glue does on ARM but if I'd have to
> >> remain logical - which is hard to do with firmware - the proper thing to
> >> do would be this:
> >>
> >>         rc = ghes_read_estatus(ghes, &buf_paddr);
> >>         if (rc) {
> >>                 ghes_reset_hardware();
> >
> > The kernel would have no way of knowing what to do here.
>
> Is there anything wrong with what we do today? We stamp on the records so that
> we don't processes them again. (especially if is polled), and we tell firmware
> it can re-use this memory.
>
> (I think we should return an error, or print a ratelimited warning for corrupt
> records)

Agree, the print is already present in ghes_read_estatus.

> >>         }
> >>
> >>         /* clear estatus and bla bla */
> >>
> >>         /* Now, I'm in the success case: */
> >>          ghes_ack_error();
> >>
> >>
> >> This way, you have the error path clear of something unexpected happened
> >> when reading the hardware, obvious and separated. ghes_reset_hardware()
> >> clears the registers and does the necessary steps to put the hardware in
> >> good state again so that it can report the next error.
> >>
> >> And the success path simply acks the error and does possibly the same
> >> thing. The naming of the functions is important though, to denote what
> >> gets called when.
>
> I think this duplicates the record-stamping/acking. If there is anything in that
> memory region, the action for processed/copied/ignored-because-its-corrupt is
> the same.
>
> We can return on ENOENT out earlier, as nothing needs doing in that case. Its
> what the GHES_TO_CLEAR spaghetti is for, we can probably move the ack thing into
> ghes_clear_estatus(), that way that thing means 'I'm done with this memory'.
>
> Something like:
> -------------------------
> rc = ghes_read_estatus();
> if (rc == -ENOENT)
>         return 0;

We still should be returning at least the -ENOENT from ghes_read_estatus().
That is being used by the SEA handling to determine if an SEA was properly
reported/handled by the host kernel in the KVM SEA case.

Here are the relevant functions:
https://elixir.bootlin.com/linux/latest/source/drivers/acpi/apei/ghes.c#L797
https://elixir.bootlin.com/linux/latest/source/arch/arm64/mm/fault.c#L723
https://elixir.bootlin.com/linux/latest/source/virt/kvm/arm/mmu.c#L1706

>
> if (!rc) {
>         ghes_do_proc() and friends;
> }
>
> ghes_clear_estatus();
>
> return rc;
> -------------------------
>
> We would no longer return errors from the ack code, I suspect that can only
> happen for a corrupt gas, which we would have caught earlier as we rely on the
> mapping being cached.
James Morse Jan. 23, 2019, 6:36 p.m. UTC | #12
Hi Boris,

On 11/01/2019 19:58, Borislav Petkov wrote:
> On Fri, Jan 11, 2019 at 06:25:21PM +0000, James Morse wrote:
>> We ack it in the corrupt-record case too, because we are done with the
>> memory.
> 
> Ok, so the only thing that we need to do unconditionally is ACK in order
> to free the memory. Or is there an exception to that set of steps in
> error handling?

Do you consider ENOENT an error? We don't ack in that case as the memory wasn't
in use.

For the other cases its because the records are bogus, but we still
unconditionally tell firmware we're done with them.


>> I think it is. 18.3.2.8 of ACPI v6.2 (search for Generic Hardware Error Source
>> version 2", then below the table):
>> * OSPM detects error (via interrupt/exception or polling the block status)
>> * OSPM copies the error status block
>> * OSPM clears the block status field of the error status block
>> * OSPM acknowledges the error via Read Ack register
>>
>> The ENOENT case is excluded by 'polling the block status'.
> 
> Ok, so we signal the absence of an error record with ENOENT.
> 
>         if (!buf_paddr)
>                 return -ENOENT;
> 
> Can that even happen?

Yes, for NOTIFY_POLLED its the norm. For the IRQ flavours that walk a list of
GHES, all but one of them will return ENOENT.


> Also, in that case, what would happen if we ACK the error anyway? We'd
> confuse the firmware?

No idea.

> I sure hope firmware is prepared for spurious ACKs :)

We could try it and see. It depends if firmware shares ack locations between
multiple GHES. We could ack an empty GHES, and it removes the records of one we
haven't looked at yet.


>> Unsurprisingly the spec doesn't consider the case that firmware generates
>> corrupt records!
> 
> You mean the EIO case?

Yup,

> Not surprised at all. But we do not report that record so all good.



Thanks,

James
Borislav Petkov Jan. 29, 2019, 11:49 a.m. UTC | #13
On Wed, Jan 23, 2019 at 06:36:38PM +0000, James Morse wrote:
> Do you consider ENOENT an error? We don't ack in that case as the
> memory wasn't in use.

So let's see:

        if (!*buf_paddr)
                return -ENOENT;

can happen when apei_read() has returned 0 but it has managed to do

	*val = 0;

Now, that function returns error values which we should be checking
but we're checking the buf_paddr pointed to value for being 0. Are
we fearing that even if acpi_os_read_memory() or acpi_os_read_port()
succeed, *buf_paddr could still be 0 ?

Because if not, we should be checking whether rc == -EINVAL and then
convert it to -ENOENT.

But ghes_read_estatus() handles the error case first *and* *then* checks
buf_paddr too, to make really really sure we won't be reading from
address 0.

> For the other cases its because the records are bogus, but we still
> unconditionally tell firmware we're done with them.

... to free the memory, yes, ok.

> >> I think it is. 18.3.2.8 of ACPI v6.2 (search for Generic Hardware Error Source
> >> version 2", then below the table):
> >> * OSPM detects error (via interrupt/exception or polling the block status)
> >> * OSPM copies the error status block
> >> * OSPM clears the block status field of the error status block
> >> * OSPM acknowledges the error via Read Ack register
> >>
> >> The ENOENT case is excluded by 'polling the block status'.
> > 
> > Ok, so we signal the absence of an error record with ENOENT.
> > 
> >         if (!buf_paddr)
> >                 return -ENOENT;
> > 
> > Can that even happen?
> 
> Yes, for NOTIFY_POLLED its the norm. For the IRQ flavours that walk a list of
> GHES, all but one of them will return ENOENT.

Lemme get this straight: when we do

	apei_read(&buf_paddr, &g->error_status_address);

in the polled case, buf_paddr can be 0?

> We could try it and see. It depends if firmware shares ack locations between
> multiple GHES. We could ack an empty GHES, and it removes the records of one we
> haven't looked at yet.

Yeah, OTOH, we shouldn't be pushing our luck here, I guess.

So let's sum up: we'll ack the GHES error in all but the -ENOENT cases
in order to free the memory occupied by the error record.

The slightly "pathological" -ENOENT case is I guess how the fw behaves
when it is being polled and also for broken firmware which could report
a 0 buf_paddr.

Btw, that last thing I'm assuming because

  d334a49113a4 ("ACPI, APEI, Generic Hardware Error Source memory error support")

doesn't say what that check was needed for.

Thx.
James Morse Jan. 29, 2019, 6:48 p.m. UTC | #14
Hi Boris,

On 29/01/2019 11:49, Borislav Petkov wrote:
> On Wed, Jan 23, 2019 at 06:36:38PM +0000, James Morse wrote:
>> Do you consider ENOENT an error? We don't ack in that case as the
>> memory wasn't in use.
> 
> So let's see:
> 
>         if (!*buf_paddr)
>                 return -ENOENT;
> 
> can happen when apei_read() has returned 0 but it has managed to do
> 
> 	*val = 0;

> Now, that function returns error values which we should be checking
> but we're checking the buf_paddr pointed to value for being 0. Are
> we fearing that even if acpi_os_read_memory() or acpi_os_read_port()
> succeed, *buf_paddr could still be 0 ?

That's what this code is doing, checking for a successful read, of zero.
The g->error_status_address has to point somewhere as its location is advertised
in the tables.

What is the value of g->error_status_address 'out of reset' or before any error
has occurred? This code expects it to be zero, or to point to a CPER block with
an empty block_status.

(the acpi spec is unclear on when *(g->error_status_address) is written)


> Because if not, we should be checking whether rc == -EINVAL and then
> convert it to -ENOENT.

EINVAL implies the reg->space_id wasn't one of the two "System IO or System
Memory". (I thought the spec required this, but it only says this for EINJ:
'This constraint is an attempt to ensure that the registers are accessible in
the presence of hardware error conditions'.)

apei_check_gar() checks for these two in apei_map_generic_address(), so if this
is the case we would have failed at ghes_new() time.


> But ghes_read_estatus() handles the error case first *and* *then* checks
> buf_paddr too, to make really really sure we won't be reading from
> address 0.

I think this is the distinction between 'failed to read', (because
g->error_status_address has bad alignment or an unsupported address-space
id/access-size), and successfully read 0, which is treated as ENOENT.


>> For the other cases its because the records are bogus, but we still
>> unconditionally tell firmware we're done with them.
> 
> ... to free the memory, yes, ok.
> 
>>>> I think it is. 18.3.2.8 of ACPI v6.2 (search for Generic Hardware Error Source
>>>> version 2", then below the table):
>>>> * OSPM detects error (via interrupt/exception or polling the block status)
>>>> * OSPM copies the error status block
>>>> * OSPM clears the block status field of the error status block
>>>> * OSPM acknowledges the error via Read Ack register
>>>>
>>>> The ENOENT case is excluded by 'polling the block status'.
>>>
>>> Ok, so we signal the absence of an error record with ENOENT.
>>>
>>>         if (!buf_paddr)
>>>                 return -ENOENT;
>>>
>>> Can that even happen?
>>
>> Yes, for NOTIFY_POLLED its the norm. For the IRQ flavours that walk a list of
>> GHES, all but one of them will return ENOENT.


> Lemme get this straight: when we do
> 
> 	apei_read(&buf_paddr, &g->error_status_address);
> 
> in the polled case, buf_paddr can be 0?

If firmware has never generated CPER records, so it has never written to void
*error_status_address, yes.

There seem to be two ways of doing this. This zero check implies an example
system could be:
| g->error_status_address == 0xf00d
| *(u64 *)0xf00d == 0
Firmware populates CPER records, then updates 0xf00d.
(0xf00d would have been pre-mapped by apei_map_generic_address() in ghes_new())
Reads of 0xf00d before CPER records are generated get 0.

Once an error occurs, this system now looks like this:
| g->error_status_address == 0xf00d
| *(u64 *)0xf00d == 0xbeef
| *(u64 *)0xbeef == 0

For new errors, firmware populates CPER records, then updates 0xf00d.
Alternatively firmware could re-use the memory at 0xbeef, generating the CPER
records backwards, so that once 0xbeef is updated, the rest of the record is
visible. (firmware knows not to race with another CPU right?)

Firmware could equally point 0xf00d at 0xbeef at startup, so it has one fewer
values to write when an error occurs. I have an arm64 system with a HEST that
does this. (I'm pretty sure its ACPI support is a copy-and-paste from x86, it
even describes NOTIFY_NMI, who knows what that means on arm!)

When linux processes an error, ghes_clear_estatus() NULLs the
estatus->block_status, (which in this example is at 0xbeef). This is the
documented sequence for GHESv2.
Elsewhere the spec talks of checking the block status which is part of the
records, (not the error_status_address, which is the pointer to the records).

Linux can't NULL 0xf00d, because it doesn't know if firmware will write it again
next time it updates the records.
I can't find where in the spec it says the error status address is written to.
Linux works with both 'at boot' and 'on each error'.
If it were know to have a static value, ghes_copy_tofrom_phys() would not have
been necessary, but its been there since d334a49113a4.

In the worst case, if there is a value at the error_status_address, we have to
map/unmap it every time we poll in case firmware wrote new records at that same
location.

I don't think we can change Linux's behaviour here, without interpreting zero as
CPER records or missing new errors.


>> We could try it and see. It depends if firmware shares ack locations between
>> multiple GHES. We could ack an empty GHES, and it removes the records of one we
>> haven't looked at yet.
> 
> Yeah, OTOH, we shouldn't be pushing our luck here, I guess.
> 
> So let's sum up: we'll ack the GHES error in all but the -ENOENT cases
> in order to free the memory occupied by the error record.

I agree.


> The slightly "pathological" -ENOENT case is I guess how the fw behaves
> when it is being polled and also for broken firmware which could report
> a 0 buf_paddr.
> 
> Btw, that last thing I'm assuming because
> 
>   d334a49113a4 ("ACPI, APEI, Generic Hardware Error Source memory error support")
> 
> doesn't say what that check was needed for.

Heh. I'd assume this was the out-of-reset value on the platform that was
developed for, which implicitly assumed we could never get CPER records at zero.


Thanks,

James
James Morse Jan. 29, 2019, 6:48 p.m. UTC | #15
Hi Tyler,

On 11/01/2019 20:53, Tyler Baicar wrote:
> On Fri, Jan 11, 2019 at 1:09 PM James Morse <james.morse@arm.com> wrote:
>> We can return on ENOENT out earlier, as nothing needs doing in that case. Its
>> what the GHES_TO_CLEAR spaghetti is for, we can probably move the ack thing into
>> ghes_clear_estatus(), that way that thing means 'I'm done with this memory'.
>>
>> Something like:
>> -------------------------
>> rc = ghes_read_estatus();
>> if (rc == -ENOENT)
>>         return 0;
> 
> We still should be returning at least the -ENOENT from ghes_read_estatus().
> That is being used by the SEA handling to determine if an SEA was properly
> reported/handled by the host kernel in the KVM SEA case.

Sorry, my terrible example code. You'll be glad to know I would have caught this
when testing it!


Thanks,

James
Borislav Petkov Jan. 31, 2019, 1:29 p.m. UTC | #16
On Tue, Jan 29, 2019 at 06:48:33PM +0000, James Morse wrote:
> If firmware has never generated CPER records, so it has never written to void
> *error_status_address, yes.

I guess this is the bit of information I was missing.

> There seem to be two ways of doing this. This zero check implies an example
> system could be:
> | g->error_status_address == 0xf00d
> | *(u64 *)0xf00d == 0
> Firmware populates CPER records, then updates 0xf00d.
> (0xf00d would have been pre-mapped by apei_map_generic_address() in ghes_new())
> Reads of 0xf00d before CPER records are generated get 0.

Ok, this sounds like the polled case. FW better have a record ready
before raising the NMI.

> Once an error occurs, this system now looks like this:
> | g->error_status_address == 0xf00d
> | *(u64 *)0xf00d == 0xbeef
> | *(u64 *)0xbeef == 0
> 
> For new errors, firmware populates CPER records, then updates 0xf00d.
> Alternatively firmware could re-use the memory at 0xbeef, generating the CPER
> records backwards, so that once 0xbeef is updated, the rest of the record is
> visible. (firmware knows not to race with another CPU right?)

Thanks for the comic relief. :-P

> Firmware could equally point 0xf00d at 0xbeef at startup, so it has one fewer
> values to write when an error occurs. I have an arm64 system with a HEST that
> does this. (I'm pretty sure its ACPI support is a copy-and-paste from x86, it
> even describes NOTIFY_NMI, who knows what that means on arm!)

Oh the fun.

> When linux processes an error, ghes_clear_estatus() NULLs the
> estatus->block_status, (which in this example is at 0xbeef). This is the
> documented sequence for GHESv2.
> Elsewhere the spec talks of checking the block status which is part of the
> records, (not the error_status_address, which is the pointer to the records).
>
> Linux can't NULL 0xf00d, because it doesn't know if firmware will write it again
> next time it updates the records.
> I can't find where in the spec it says the error status address is written to.
> Linux works with both 'at boot' and 'on each error'.
> If it were know to have a static value, ghes_copy_tofrom_phys() would not have
> been necessary, but its been there since d334a49113a4.
>
> In the worst case, if there is a value at the error_status_address, we have to
> map/unmap it every time we poll in case firmware wrote new records at that same
> location.
> 
> I don't think we can change Linux's behaviour here, without interpreting zero as
> CPER records or missing new errors.

Nah, I was simply trying to figure out why we do that buf_paddr check.
Thanks for the extensive clarification.
diff mbox series

Patch

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 366dbdd41ef3..15d94373ba72 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -926,6 +926,10 @@  static int _in_nmi_notify_one(struct ghes *ghes)
 	__process_error(ghes);
 	ghes_clear_estatus(ghes, buf_paddr);
 
+	if (is_hest_type_generic_v2(ghes) && ghes_ack_error(ghes->generic_v2))
+		pr_warn_ratelimited(FW_WARN GHES_PFX
+				    "Failed to ack error status block!\n");
+
 	return 0;
 }