diff mbox

[v4,20/21] KVM: arm64: Take any host SError before entering the guest

Message ID 20171019145807.23251-21-james.morse@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

James Morse Oct. 19, 2017, 2:58 p.m. UTC
On VHE systems KVM masks SError before switching the VBAR value. Any
host RAS error that the CPU knew about before world-switch may become
pending as an SError during world-switch, and only be taken once we enter
the guest.

Until KVM can take RAS SErrors during world switch, add an ESB to
force any RAS errors to be synchronised and taken on the host before
we enter world switch.

RAS errors that become pending during world switch are still taken
once we enter the guest.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/kvm_host.h | 1 +
 1 file changed, 1 insertion(+)

Comments

Christoffer Dall Oct. 31, 2017, 6:23 a.m. UTC | #1
Hi James,

On Thu, Oct 19, 2017 at 03:58:06PM +0100, James Morse wrote:
> On VHE systems KVM masks SError before switching the VBAR value. Any
> host RAS error that the CPU knew about before world-switch may become
> pending as an SError during world-switch, and only be taken once we enter
> the guest.
> 
> Until KVM can take RAS SErrors during world switch, add an ESB to
> force any RAS errors to be synchronised and taken on the host before
> we enter world switch.
> 
> RAS errors that become pending during world switch are still taken
> once we enter the guest.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index cf5d78ba14b5..5dc6f2877762 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -392,6 +392,7 @@ static inline void __cpu_init_stage2(void)
>  
>  static inline void kvm_arm_vhe_guest_enter(void)
>  {
> +	esb();

I don't fully appreciate what the point of this is?

As I understand it, our fundamental goal here is to try to distinguish
between errors happening on the host or in the guest.

If that's correct, then why don't we do it at the last possible moment
when we still have a scratch register left, in the world switch code
itself, and in the case abort the guest entry and report back a "host
SError" return code.

If the answer to that question is, that since we will always have some
instruction window before entering the guest and things will never be
precise anyway, so we do it here where it's more convenient, then my
counter-question would be why we do it at all then?  If we're not
precise anyway, then why not simply take our chances and hope that the
hardware delivers the SError before we mask them, and if not, tough
luck?

>  	local_daif_mask();
>  }
>  
> -- 
> 2.13.3
> 

Thanks,
-Christoffer
James Morse Oct. 31, 2017, 11:43 a.m. UTC | #2
Hi Christoffer,

On 31/10/17 06:23, Christoffer Dall wrote:
> On Thu, Oct 19, 2017 at 03:58:06PM +0100, James Morse wrote:
>> On VHE systems KVM masks SError before switching the VBAR value. Any
>> host RAS error that the CPU knew about before world-switch may become
>> pending as an SError during world-switch, and only be taken once we enter
>> the guest.
>>
>> Until KVM can take RAS SErrors during world switch, add an ESB to
>> force any RAS errors to be synchronised and taken on the host before
>> we enter world switch.
>>
>> RAS errors that become pending during world switch are still taken
>> once we enter the guest.

>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index cf5d78ba14b5..5dc6f2877762 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -392,6 +392,7 @@ static inline void __cpu_init_stage2(void)
>>  
>>  static inline void kvm_arm_vhe_guest_enter(void)
>>  {
>> +	esb();

> I don't fully appreciate what the point of this is?
> 
> As I understand it, our fundamental goal here is to try to distinguish
> between errors happening on the host or in the guest.

Not just host/guest, but also those we can and can't handle.

KVM can't currently take an SError during world switch, so a RAS error that the
CPU was hoping to defer may spread from the host into KVM's
no-SError:world-switch code. If this happens it will (almost certainly) have to
be re-classified as uncontainable.

There is also a firmware-first angle here: NOTIFY_SEI can't be delivered if the
normal world has SError masked, so any error that spreads past this point
becomes a reboot-by-firmware instead of an OS notification and almost-helpful
error message.


> If that's correct, then why don't we do it at the last possible moment
> when we still have a scratch register left, in the world switch code
> itself, and in the case abort the guest entry and report back a "host
> SError" return code.

We have IESB to run the error-barrier as we enter the guest. This would make any
host error pending as an SError, and we would exit the guest immediately. But if
there was an RAS error during world switch, by this point its likely to be
classified as uncontainable.

This esb() is trying to keep this window of code as small as possible, to just
errors that occur during world switch.

With your vcpu load/save this window becomes a lot smaller, it may be possible
to get a VHE-host's arch-code SError handler to take errors from EL2, in which
case this barrier can disappear.
(note to self: guest may still own the debug hardware)


Thanks,

James


> If the answer to that question is, that since we will always have some
> instruction window before entering the guest and things will never be
> precise anyway, so we do it here where it's more convenient, then my
> counter-question would be why we do it at all then?  If we're not
> precise anyway, then why not simply take our chances and hope that the
> hardware delivers the SError before we mask them, and if not, tough
> luck?
Christoffer Dall Nov. 1, 2017, 4:55 a.m. UTC | #3
On Tue, Oct 31, 2017 at 11:43:42AM +0000, James Morse wrote:
> Hi Christoffer,
> 
> On 31/10/17 06:23, Christoffer Dall wrote:
> > On Thu, Oct 19, 2017 at 03:58:06PM +0100, James Morse wrote:
> >> On VHE systems KVM masks SError before switching the VBAR value. Any
> >> host RAS error that the CPU knew about before world-switch may become
> >> pending as an SError during world-switch, and only be taken once we enter
> >> the guest.
> >>
> >> Until KVM can take RAS SErrors during world switch, add an ESB to
> >> force any RAS errors to be synchronised and taken on the host before
> >> we enter world switch.
> >>
> >> RAS errors that become pending during world switch are still taken
> >> once we enter the guest.
> 
> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> index cf5d78ba14b5..5dc6f2877762 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -392,6 +392,7 @@ static inline void __cpu_init_stage2(void)
> >>  
> >>  static inline void kvm_arm_vhe_guest_enter(void)
> >>  {
> >> +	esb();
> 
> > I don't fully appreciate what the point of this is?
> > 
> > As I understand it, our fundamental goal here is to try to distinguish
> > between errors happening on the host or in the guest.
> 
> Not just host/guest, but also those we can and can't handle.
> 
> KVM can't currently take an SError during world switch, so a RAS error that the
> CPU was hoping to defer may spread from the host into KVM's
> no-SError:world-switch code. If this happens it will (almost certainly) have to
> be re-classified as uncontainable.
> 
> There is also a firmware-first angle here: NOTIFY_SEI can't be delivered if the
> normal world has SError masked, so any error that spreads past this point
> becomes a reboot-by-firmware instead of an OS notification and almost-helpful
> error message.
> 
> 
> > If that's correct, then why don't we do it at the last possible moment
> > when we still have a scratch register left, in the world switch code
> > itself, and in the case abort the guest entry and report back a "host
> > SError" return code.
> 
> We have IESB to run the error-barrier as we enter the guest. This would make any
> host error pending as an SError, and we would exit the guest immediately. But if
> there was an RAS error during world switch, by this point its likely to be
> classified as uncontainable.
> 
> This esb() is trying to keep this window of code as small as possible, to just
> errors that occur during world switch.
> 
> With your vcpu load/save this window becomes a lot smaller, it may be possible
> to get a VHE-host's arch-code SError handler to take errors from EL2, in which
> case this barrier can disappear.
> (note to self: guest may still own the debug hardware)
> 

ok, thanks for your detailed explanation.  I didn't consider that the
classification of a RAS error as containable vs. non-containable
depended on where we take the exception.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
James Morse Nov. 2, 2017, 12:18 p.m. UTC | #4
Hi Christoffer,

On 01/11/17 04:55, Christoffer Dall wrote:
> On Tue, Oct 31, 2017 at 11:43:42AM +0000, James Morse wrote:
>> On 31/10/17 06:23, Christoffer Dall wrote:
>>> On Thu, Oct 19, 2017 at 03:58:06PM +0100, James Morse wrote:
>>>> On VHE systems KVM masks SError before switching the VBAR value. Any
>>>> host RAS error that the CPU knew about before world-switch may become
>>>> pending as an SError during world-switch, and only be taken once we enter
>>>> the guest.
>>>>
>>>> Until KVM can take RAS SErrors during world switch, add an ESB to
>>>> force any RAS errors to be synchronised and taken on the host before
>>>> we enter world switch.
>>>>
>>>> RAS errors that become pending during world switch are still taken
>>>> once we enter the guest.
>>
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index cf5d78ba14b5..5dc6f2877762 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -392,6 +392,7 @@ static inline void __cpu_init_stage2(void)
>>>>  
>>>>  static inline void kvm_arm_vhe_guest_enter(void)
>>>>  {
>>>> +	esb();
>>
>>> I don't fully appreciate what the point of this is?
>>>
>>> As I understand it, our fundamental goal here is to try to distinguish
>>> between errors happening on the host or in the guest.
>>
>> Not just host/guest, but also those we can and can't handle.
>>
>> KVM can't currently take an SError during world switch, so a RAS error that the
>> CPU was hoping to defer may spread from the host into KVM's
>> no-SError:world-switch code. If this happens it will (almost certainly) have to
>> be re-classified as uncontainable.
>>
>> There is also a firmware-first angle here: NOTIFY_SEI can't be delivered if the
>> normal world has SError masked, so any error that spreads past this point
>> becomes a reboot-by-firmware instead of an OS notification and almost-helpful
>> error message.
>>
>>
>>> If that's correct, then why don't we do it at the last possible moment
>>> when we still have a scratch register left, in the world switch code
>>> itself, and in the case abort the guest entry and report back a "host
>>> SError" return code.
>>
>> We have IESB to run the error-barrier as we enter the guest. This would make any
>> host error pending as an SError, and we would exit the guest immediately. But if
>> there was an RAS error during world switch, by this point its likely to be
>> classified as uncontainable.
>>
>> This esb() is trying to keep this window of code as small as possible, to just
>> errors that occur during world switch.
>>
>> With your vcpu load/save this window becomes a lot smaller, it may be possible
>> to get a VHE-host's arch-code SError handler to take errors from EL2, in which
>> case this barrier can disappear.
>> (note to self: guest may still own the debug hardware)
>>
> 
> ok, thanks for your detailed explanation.  I didn't consider that the
> classification of a RAS error as containable vs. non-containable
> depended on where we take the exception.

Will makes the point over on patch 11 that until we have different handling for
these different classifications of error, there isn't much point doing this now.
(i.e. we treat an error generated here, or when we enter the guest in the same way).

I was trying to keep my eye on what we need for kernel-first support, so we
don't have to change the code twice, we just expand the error handling to do better.

I'll drop this patch for now, it will come back if/when we get kernel-first
support for RAS.


What about firmware-first? Firmware can always take these errors when the normal
world is running. Dropping the barrier means its up to the CPU when any error
gets reported, if firmware has to use NOTIFY_SEI it will have to do a reboot if
the error occurs during world-switch (as SError is masked). If an error spreads
over this boundary, that's just tough-luck, the kernel would have panic'd anyway.


Sorry for the noise,


Thanks,

James
Christoffer Dall Nov. 3, 2017, 12:49 p.m. UTC | #5
On Thu, Nov 02, 2017 at 12:18:20PM +0000, James Morse wrote:
> Hi Christoffer,
> 
> On 01/11/17 04:55, Christoffer Dall wrote:
> > On Tue, Oct 31, 2017 at 11:43:42AM +0000, James Morse wrote:
> >> On 31/10/17 06:23, Christoffer Dall wrote:
> >>> On Thu, Oct 19, 2017 at 03:58:06PM +0100, James Morse wrote:
> >>>> On VHE systems KVM masks SError before switching the VBAR value. Any
> >>>> host RAS error that the CPU knew about before world-switch may become
> >>>> pending as an SError during world-switch, and only be taken once we enter
> >>>> the guest.
> >>>>
> >>>> Until KVM can take RAS SErrors during world switch, add an ESB to
> >>>> force any RAS errors to be synchronised and taken on the host before
> >>>> we enter world switch.
> >>>>
> >>>> RAS errors that become pending during world switch are still taken
> >>>> once we enter the guest.
> >>
> >>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >>>> index cf5d78ba14b5..5dc6f2877762 100644
> >>>> --- a/arch/arm64/include/asm/kvm_host.h
> >>>> +++ b/arch/arm64/include/asm/kvm_host.h
> >>>> @@ -392,6 +392,7 @@ static inline void __cpu_init_stage2(void)
> >>>>  
> >>>>  static inline void kvm_arm_vhe_guest_enter(void)
> >>>>  {
> >>>> +	esb();
> >>
> >>> I don't fully appreciate what the point of this is?
> >>>
> >>> As I understand it, our fundamental goal here is to try to distinguish
> >>> between errors happening on the host or in the guest.
> >>
> >> Not just host/guest, but also those we can and can't handle.
> >>
> >> KVM can't currently take an SError during world switch, so a RAS error that the
> >> CPU was hoping to defer may spread from the host into KVM's
> >> no-SError:world-switch code. If this happens it will (almost certainly) have to
> >> be re-classified as uncontainable.
> >>
> >> There is also a firmware-first angle here: NOTIFY_SEI can't be delivered if the
> >> normal world has SError masked, so any error that spreads past this point
> >> becomes a reboot-by-firmware instead of an OS notification and almost-helpful
> >> error message.
> >>
> >>
> >>> If that's correct, then why don't we do it at the last possible moment
> >>> when we still have a scratch register left, in the world switch code
> >>> itself, and in the case abort the guest entry and report back a "host
> >>> SError" return code.
> >>
> >> We have IESB to run the error-barrier as we enter the guest. This would make any
> >> host error pending as an SError, and we would exit the guest immediately. But if
> >> there was an RAS error during world switch, by this point its likely to be
> >> classified as uncontainable.
> >>
> >> This esb() is trying to keep this window of code as small as possible, to just
> >> errors that occur during world switch.
> >>
> >> With your vcpu load/save this window becomes a lot smaller, it may be possible
> >> to get a VHE-host's arch-code SError handler to take errors from EL2, in which
> >> case this barrier can disappear.
> >> (note to self: guest may still own the debug hardware)
> >>
> > 
> > ok, thanks for your detailed explanation.  I didn't consider that the
> > classification of a RAS error as containable vs. non-containable
> > depended on where we take the exception.
> 
> Will makes the point over on patch 11 that until we have different handling for
> these different classifications of error, there isn't much point doing this now.
> (i.e. we treat an error generated here, or when we enter the guest in the same way).
> 
> I was trying to keep my eye on what we need for kernel-first support, so we
> don't have to change the code twice, we just expand the error handling to do better.

I figured as much...

> 
> I'll drop this patch for now, it will come back if/when we get kernel-first
> support for RAS.

Either way is fine from my point of view.
> 
> 
> What about firmware-first? Firmware can always take these errors when the normal
> world is running. Dropping the barrier means its up to the CPU when any error
> gets reported, if firmware has to use NOTIFY_SEI it will have to do a reboot if
> the error occurs during world-switch (as SError is masked). If an error spreads
> over this boundary, that's just tough-luck, the kernel would have panic'd anyway.
> 
> 

Does a non-secure esb() cause the error to be delivered to firmware on
the secure side if anything is pending?

I'm not sure I fully understand the interaction between issuing an
esb() in non-secure and firmware handling a RAS error; I thought there
would be none, and that this was only for kernel-first ?

Thanks,
-Christoffer
James Morse Nov. 3, 2017, 4:14 p.m. UTC | #6
Hi Christoffer,

On 03/11/17 12:49, Christoffer Dall wrote:
> Does a non-secure esb() cause the error to be delivered to firmware on
> the secure side if anything is pending?

Yes, the ESB-instruction causes 'synchronisable' errors to become pending as an
SError. They then follow the normal SError rules.


> I'm not sure I fully understand the interaction between issuing an
> esb() in non-secure and firmware handling a RAS error; I thought there
> would be none, and that this was only for kernel-first ?

To implement firmware-first, EL3 has to set SCR_EL3.EA to route external-aborts
and physical SError to EL3. So an ESB-instruction, even in a guest, may cause an
SError to be taken to EL3 for firmware-first handling.

This is why RAS is causing so much noise for KVM, any notification could also
interrupt a guest, so firmware has to emulate an exception taken to EL2
correctly and KVM will have to plumb the notification across to the host APEI code.


Thanks,

James
Christoffer Dall Nov. 6, 2017, 12:45 p.m. UTC | #7
On Fri, Nov 03, 2017 at 04:14:04PM +0000, James Morse wrote:
> Hi Christoffer,
> 
> On 03/11/17 12:49, Christoffer Dall wrote:
> > Does a non-secure esb() cause the error to be delivered to firmware on
> > the secure side if anything is pending?
> 
> Yes, the ESB-instruction causes 'synchronisable' errors to become pending as an
> SError. They then follow the normal SError rules.
> 
> 
> > I'm not sure I fully understand the interaction between issuing an
> > esb() in non-secure and firmware handling a RAS error; I thought there
> > would be none, and that this was only for kernel-first ?
> 
> To implement firmware-first, EL3 has to set SCR_EL3.EA to route external-aborts
> and physical SError to EL3. So an ESB-instruction, even in a guest, may cause an
> SError to be taken to EL3 for firmware-first handling.
> 
> This is why RAS is causing so much noise for KVM, any notification could also
> interrupt a guest, so firmware has to emulate an exception taken to EL2
> correctly and KVM will have to plumb the notification across to the host APEI code.
> 
> 

I see, but since the end result is that either we panic or firmware
reboots the machine, we don't care and can leave this patch out for now.
Makes sense.

Thanks for the explanation!
-Christoffer
diff mbox

Patch

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index cf5d78ba14b5..5dc6f2877762 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -392,6 +392,7 @@  static inline void __cpu_init_stage2(void)
 
 static inline void kvm_arm_vhe_guest_enter(void)
 {
+	esb();
 	local_daif_mask();
 }