diff mbox

[v1,2/2] arm64: handle NOTIFY_SEI notification by the APEI driver

Message ID 1527770506-8076-3-git-send-email-gengdongjiu@huawei.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dongjiu Geng May 31, 2018, 12:41 p.m. UTC
When kernel or KVM gets the NOTIFY_SEI notification, it firstly
calls the APEI driver to handle this notification.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
---
 arch/arm64/kernel/traps.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
---
change since https://www.spinics.net/lists/kvm/msg168919.html

1. Remove the handle_guest_sei() helper

Comments

Mark Rutland May 31, 2018, 11:01 a.m. UTC | #1
On Thu, May 31, 2018 at 08:41:46PM +0800, Dongjiu Geng wrote:
> When kernel or KVM gets the NOTIFY_SEI notification, it firstly
> calls the APEI driver to handle this notification.
> 
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> ---
>  arch/arm64/kernel/traps.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> ---
> change since https://www.spinics.net/lists/kvm/msg168919.html
> 
> 1. Remove the handle_guest_sei() helper
> 
> 
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 8bbdc17..676e40c 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -50,6 +50,7 @@
>  #include <asm/exception.h>
>  #include <asm/system_misc.h>
>  #include <asm/sysreg.h>
> +#include <acpi/ghes.h>

Nit: please place newline before the new include, since it comes from a
different directory (and we do so in fault.c).

>  static const char *handler[]= {
>  	"Synchronous Abort",
> @@ -701,6 +702,20 @@ void __noreturn arm64_serror_panic(struct pt_regs *regs, u32 esr)
>  bool arm64_is_fatal_ras_serror(struct pt_regs *regs, unsigned int esr)
>  {
>  	u32 aet = arm64_ras_serror_get_severity(esr);
> +	int ret = -ENOENT;
> +
> +	if (IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
> +		if (interrupts_enabled(regs))
> +			nmi_enter();
> +
> +		ret = ghes_notify_sei();
> +
> +		if (interrupts_enabled(regs))
> +			nmi_exit();
> +
> +		if (!ret)
> +			return false;
> +	}

In do_serror() we already handle nmi_{enter,exit}(), so there's no need
for that here.

TBH, I don't understand why do_sea() does that conditionally today.
Unless there's some constraint I'm missing, I think it would make more
sense to do that regardless of whether the interrupted context had
interrupts enabled. James -- does that make sense to you?

If you update the prior patch with a stub for !CONFIG_ACPI_APEI_SEI, you
can simplify all of the above additions down to:

	if (!ghes_notify_sei())
		return;

... which would look a lot nicer.

Thanks,
Mark.
James Morse May 31, 2018, 4:51 p.m. UTC | #2
Hi Mark, Dongjiu Geng,

On 31/05/18 12:01, Mark Rutland wrote:
> In do_serror() we already handle nmi_{enter,exit}(), so there's no need
> for that here.

Even better: nmi_enter() has a BUG_ON(in_nmi()).


> TBH, I don't understand why do_sea() does that conditionally today.
> Unless there's some constraint I'm missing, 

APEI uses a different fixmap entry and locks when in_nmi(). This was because we
may interrupt the irq-masked region in APEI that was using the regular memory.
(e.g. the 'polled' notification, or something backed by an interrupt.) But,
Borislav has spotted other things in here that are broken[0]. I'm working on
rolling all that into 'v5' of the in_nmi() rework stuff.

We currently get away with this on arm because 'SEA' is the only NMI-like thing,
and it occurs synchronously. The problem cases are all also cases where the
kernel text is corrupt, which we can't possibly hope to handle.

For NOTIFY_SDEI and NOTIFY_SEI this is the wrong pattern as these are
asynchronous. do_serror() has already done most of the work for NOTIFY_SEI, but
we need to use the estatus queue in APEI, which is currently x86 only.


> I think it would make more
> sense to do that regardless of whether the interrupted context had
> interrupts enabled. James -- does that make sense to you?
> 
> If you update the prior patch with a stub for !CONFIG_ACPI_APEI_SEI, you
> can simplify all of the above additions down to:
> 
> 	if (!ghes_notify_sei())
> 		return;
> 
> ... which would look a lot nicer.

The code that calls ghes_notify_sei() may need calling by KVM too, but its
default action to an 'unclaimed' SError will be different.
Because of the race between memory_failure() and return-to-userspace, we may
need to kick the irq work queue (if we can), as we return from do_serror(). [1]
and [2] provide an example for NOTIFY_SEA. SDEI does this by returning to the
kernel through the IRQ handler, (which handles the KVM case too).


I think this series is unsafe until we can use the estatus queue in APEI. Its
also missing the handling for an SError interrupting a KVM guest.


Thanks,

James

[0] https://www.spinics.net/lists/arm-kernel/msg653332.html
[1] https://www.spinics.net/lists/arm-kernel/msg649237.html
[2] https://www.spinics.net/lists/arm-kernel/msg649239.html
Dongjiu Geng June 1, 2018, 7:21 a.m. UTC | #3
Hi Mark, James,
   Thanks the review.

On 2018/6/1 0:51, James Morse wrote:
> Hi Mark, Dongjiu Geng,
> 
> On 31/05/18 12:01, Mark Rutland wrote:
>> In do_serror() we already handle nmi_{enter,exit}(), so there's no need
>> for that here.
> 
> Even better: nmi_enter() has a BUG_ON(in_nmi()).

There are two places call the arm64_is_fatal_ras_serror():
1. do_serror()
2. kvm_handle_guest_serror()

Yes, the do_serror() already handle nmi_{enter,exit}(), so the arm64_is_fatal_ras_serror() does not need to handle it again.
For the kvm_handle_guest_serror(), it does not handle the nmi_{enter,exit}(), so should we add nmi_{enter,exit}() in  kvm_handle_guest_serror() before calling arm64_is_fatal_ras_serror()?

For the NOTIFY_SEA, I do not know why handle_guest_sea() does not handle the nmi_{enter,exit}(). James, does we miss it in handle_guest_sea()?

> 
> 
>> TBH, I don't understand why do_sea() does that conditionally today.
>> Unless there's some constraint I'm missing, 
> 
> APEI uses a different fixmap entry and locks when in_nmi(). This was because we
> may interrupt the irq-masked region in APEI that was using the regular memory.
> (e.g. the 'polled' notification, or something backed by an interrupt.) But,
> Borislav has spotted other things in here that are broken[0]. I'm working on
> rolling all that into 'v5' of the in_nmi() rework stuff.
> 
> We currently get away with this on arm because 'SEA' is the only NMI-like thing,
> and it occurs synchronously. The problem cases are all also cases where the
> kernel text is corrupt, which we can't possibly hope to handle.
> 
> For NOTIFY_SDEI and NOTIFY_SEI this is the wrong pattern as these are
> asynchronous. do_serror() has already done most of the work for NOTIFY_SEI, but
> we need to use the estatus queue in APEI, which is currently x86 only.
I think this patch can based on you 'v5' of the in_nmi() rework stuff

> 
> 
>> I think it would make more
>> sense to do that regardless of whether the interrupted context had
>> interrupts enabled. James -- does that make sense to you?
>>
>> If you update the prior patch with a stub for !CONFIG_ACPI_APEI_SEI, you
>> can simplify all of the above additions down to:
>>
>> 	if (!ghes_notify_sei())
>> 		return;
>>
>> ... which would look a lot nicer.
> 
> The code that calls ghes_notify_sei() may need calling by KVM too, but its
> default action to an 'unclaimed' SError will be different.
> Because of the race between memory_failure() and return-to-userspace, we may
> need to kick the irq work queue (if we can), as we return from do_serror(). [1]
> and [2] provide an example for NOTIFY_SEA. SDEI does this by returning to the
> kernel through the IRQ handler, (which handles the KVM case too).
I can kick the IRQ work queue as you do for the NOTIFY_SEA and NOTIFY_SDEI.

> 
> 
> I think this series is unsafe until we can use the estatus queue in APEI. Its
> also missing the handling for an SError interrupting a KVM guest.

how about this series is based on your patches that uses the estatus queue in APEI to make it safe?

when an SError interrupting a KVM guest, it will trap to hypervisor, hypervisor will call
below software stack to handle it:
kvm_handle_guest_serror()->arm64_is_fatal_ras_serror()->ghes_notify_sei()

so it already handles the case that an SError interrupting a KVM guest.

> 
> 
> Thanks,
> 
> James
> 
> [0] https://www.spinics.net/lists/arm-kernel/msg653332.html
> [1] https://www.spinics.net/lists/arm-kernel/msg649237.html
> [2] https://www.spinics.net/lists/arm-kernel/msg649239.html
> 
> .
>
James Morse June 15, 2018, 4:49 p.m. UTC | #4
Hi gengdongjiu,

On 01/06/18 08:21, gengdongjiu wrote:
> On 2018/6/1 0:51, James Morse wrote:
>> On 31/05/18 12:01, Mark Rutland wrote:
>>> In do_serror() we already handle nmi_{enter,exit}(), so there's no need
>>> for that here.
>>
>> Even better: nmi_enter() has a BUG_ON(in_nmi()).
> 
> There are two places call the arm64_is_fatal_ras_serror():
> 1. do_serror()
> 2. kvm_handle_guest_serror()
> 
> Yes, the do_serror() already handle nmi_{enter,exit}(), so the arm64_is_fatal_ras_serror() does not need to handle it again.

> For the kvm_handle_guest_serror(), it does not handle the nmi_{enter,exit}(),
> so should we add nmi_{enter,exit}() in  kvm_handle_guest_serror() before calling
> arm64_is_fatal_ras_serror()?

kvm_handle_guest_serror() never interrupts anything, so its not an NMI, this is
why the KVM code doesn't do this today. We can take another SError from there
without issue.

But once we call APEI, it needs to prevent re-entrance, so SError has to be
masked. If we're using the estatus-queue (which we should), then we should
always call the notification helper in the same context. If do_serror() is
in_nmi(), then we should fake it up in the appropriate helper.

Given the latest review comments from Borislav, the in_nmi() weirdness in APEI
may all disappear, in which case all that matters is we call APEI with SError
masked.


> > For the NOTIFY_SEA, I do not know why handle_guest_sea() does not handle
> the nmi_{enter,exit}(). James, does we miss it in handle_guest_sea()?

The exception went to the hypervisor, it only interrupted the guest. KVM returns
to process context, and KVM then goes on to call handle_guest_sea().

nmi_enter() is to tell things like printk() that we may have interrupted its
IRQ-masked regions, and it can't take any locks to do the work. We know this is
never true for KVM at this point.


>>> TBH, I don't understand why do_sea() does that conditionally today.
>>> Unless there's some constraint I'm missing, 
>>
>> APEI uses a different fixmap entry and locks when in_nmi(). This was because we
>> may interrupt the irq-masked region in APEI that was using the regular memory.
>> (e.g. the 'polled' notification, or something backed by an interrupt.) But,
>> Borislav has spotted other things in here that are broken[0]. I'm working on
>> rolling all that into 'v5' of the in_nmi() rework stuff.
>>
>> We currently get away with this on arm because 'SEA' is the only NMI-like thing,
>> and it occurs synchronously. The problem cases are all also cases where the
>> kernel text is corrupt, which we can't possibly hope to handle.
>>
>> For NOTIFY_SDEI and NOTIFY_SEI this is the wrong pattern as these are
>> asynchronous. do_serror() has already done most of the work for NOTIFY_SEI, but
>> we need to use the estatus queue in APEI, which is currently x86 only.

> I think this patch can based on you 'v5' of the in_nmi() rework stuff

I think NOTIFY_SEI needs the estatus-queue, so yes. But the KVM changes mean
this will need doing as a separate series.


>>> I think it would make more
>>> sense to do that regardless of whether the interrupted context had
>>> interrupts enabled. James -- does that make sense to you?
>>>
>>> If you update the prior patch with a stub for !CONFIG_ACPI_APEI_SEI, you
>>> can simplify all of the above additions down to:
>>>
>>> 	if (!ghes_notify_sei())
>>> 		return;
>>>
>>> ... which would look a lot nicer.
>>
>> The code that calls ghes_notify_sei() may need calling by KVM too, but its
>> default action to an 'unclaimed' SError will be different.
>> Because of the race between memory_failure() and return-to-userspace, we may
>> need to kick the irq work queue (if we can), as we return from do_serror(). [1]
>> and [2] provide an example for NOTIFY_SEA. SDEI does this by returning to the
>> kernel through the IRQ handler, (which handles the KVM case too).

> I can kick the IRQ work queue as you do for the NOTIFY_SEA and NOTIFY_SDEI.

(NOTIFY_SDEI has its own tricks: it returns to the kernel through the IRQ VBAR
entry)

I think this needs something like the apei_claim_sea() in that series, probably
with some __ version called from do_serror() to avoid the BUG_ON(in_nmi()).


>> I think this series is unsafe until we can use the estatus queue in APEI. Its
>> also missing the handling for an SError interrupting a KVM guest.
> 
> how about this series is based on your patches that uses the estatus queue in APEI to make it safe?
> 
> when an SError interrupting a KVM guest, it will trap to hypervisor, hypervisor will call
> below software stack to handle it:
> kvm_handle_guest_serror()->arm64_is_fatal_ras_serror()->ghes_notify_sei()
> 
> so it already handles the case that an SError interrupting a KVM guest.

It already calls it at some point... this was called out in the cover-letter
(and maybe commit messages) of the series that added that code: We will need to
do more once this is used as a notification from firmware.

That series was about unmasking SError as much as possible, and enabling IESB.
We then had to handle any case where we may now take an SError, but without
kernel-first support (which requires some topology information) all we could do
is ignore corrected/restartable errors, and panic() otherwise. This isn't
actually trying to handle the error.

KVM calls this code quite late. Now that we are trying to handle the error, the
'when' matters. SError is unmasked when we return to the run-loop, we may take
another, less severe, SError before checking the value we had from guest-exit.
We unmask interrupts, we may take an interrupt and run some other code, while
holding an SError ESR that says the system can't be trusted anymore.

(incidentally ... firmware should avoid making two NOTIFY_SEI pending for the
system at the same time. The SError doesn't tell us which GHES or set of CPER
records correspond with this error, so we have to walk the list and process
whatever we find. Two CPUs doing this at the same time will do the wrong thing
if the CPU-context on one of them matters.).


Thanks,

James
diff mbox

Patch

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 8bbdc17..676e40c 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -50,6 +50,7 @@ 
 #include <asm/exception.h>
 #include <asm/system_misc.h>
 #include <asm/sysreg.h>
+#include <acpi/ghes.h>
 
 static const char *handler[]= {
 	"Synchronous Abort",
@@ -701,6 +702,20 @@  void __noreturn arm64_serror_panic(struct pt_regs *regs, u32 esr)
 bool arm64_is_fatal_ras_serror(struct pt_regs *regs, unsigned int esr)
 {
 	u32 aet = arm64_ras_serror_get_severity(esr);
+	int ret = -ENOENT;
+
+	if (IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
+		if (interrupts_enabled(regs))
+			nmi_enter();
+
+		ret = ghes_notify_sei();
+
+		if (interrupts_enabled(regs))
+			nmi_exit();
+
+		if (!ret)
+			return false;
+	}
 
 	switch (aet) {
 	case ESR_ELx_AET_CE:	/* corrected error */