Message ID | 20200211135256.24617-1-joro@8bytes.org (mailing list archive)
---|---
Series | Linux as SEV-ES Guest Support
On Tue, Feb 11, 2020 at 02:51:54PM +0100, Joerg Roedel wrote:
> NMI Special Handling
> --------------------
>
> The last thing that needs special handling with SEV-ES is NMIs.
> Hypervisors usually start to intercept IRET instructions when an NMI
> gets injected, to find out when the NMI window is re-opened. But
> handling IRET intercepts requires the hypervisor to access guest
> register state, which is not possible with SEV-ES. The specification
> under [1] solves this problem with an NMI_COMPLETE message sent by the
> guest to the hypervisor, upon which the hypervisor re-opens the NMI
> window for the guest.
>
> This patch-set sends the NMI_COMPLETE message before the actual IRET,
> while the kernel is still on a valid stack and kernel cr3. This opens
> the NMI window a few instructions early, but that is fine, as NMI
> nesting is safe under x86-64 Linux. The alternative would be to
> single-step over the IRET, but that requires more intrusive changes
> to the entry code, because it does not handle entries from kernel
> mode while on the entry stack.
>
> Besides the special handling above, the patch-set contains the
> handlers for the #VC exception and all the exit-codes specified in
> [1].

Oh gawd; so instead of improving the whole NMI situation, AMD went and
made it worse still?!?
On Tue, Feb 11, 2020 at 03:50:08PM +0100, Peter Zijlstra wrote:
> Oh gawd; so instead of improving the whole NMI situation, AMD went and
> made it worse still?!?

Well, that depends on how you want to see it. Under SEV-ES an IRET will
not re-open the NMI window; instead, the guest has to tell the
hypervisor explicitly when it is ready to receive new NMIs, via the
NMI_COMPLETE message. NMIs stay blocked even when an exception happens
in the handler, so this could also be seen as a (slight) improvement.

Regards,

	Joerg
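[For concreteness, the guest side of that NMI_COMPLETE handshake looks
roughly like the sketch below. The exit-code value is the one the GHCB
spec [1] assigns to NMI_COMPLETE; the struct layout and function names
are trimmed-down illustrations, not the patch-set's actual code.]

```c
/*
 * Minimal sketch of the guest side of the NMI_COMPLETE protocol from
 * the GHCB spec [1]. The GHCB layout is trimmed to the fields used
 * here; names and call site are illustrative only.
 */
#define SVM_VMGEXIT_NMI_COMPLETE	0x80000003UL	/* per GHCB spec */

struct ghcb_sketch {
	unsigned long sw_exit_code;
	unsigned long sw_exit_info_1;
	unsigned long sw_exit_info_2;
};

/* Called on the NMI exit path, before the final IRET, while still on
 * a valid kernel stack and kernel cr3. */
static void sev_es_nmi_complete(struct ghcb_sketch *ghcb)
{
	ghcb->sw_exit_code   = SVM_VMGEXIT_NMI_COMPLETE;
	ghcb->sw_exit_info_1 = 0;
	ghcb->sw_exit_info_2 = 0;

	/* VMGEXIT (rep; vmmcall): world-switch to the hypervisor, which
	 * sees NMI_COMPLETE and re-opens the NMI window for this vCPU. */
	asm volatile("rep; vmmcall" : : : "memory");
}
```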
On Tue, Feb 11, 2020 at 7:43 AM Joerg Roedel <joro@8bytes.org> wrote:
> On Tue, Feb 11, 2020 at 03:50:08PM +0100, Peter Zijlstra wrote:
> > Oh gawd; so instead of improving the whole NMI situation, AMD went
> > and made it worse still?!?
>
> Well, that depends on how you want to see it. Under SEV-ES an IRET
> will not re-open the NMI window; instead, the guest has to tell the
> hypervisor explicitly when it is ready to receive new NMIs, via the
> NMI_COMPLETE message. NMIs stay blocked even when an exception happens
> in the handler, so this could also be seen as a (slight) improvement.

I don't get it. VT-x has a VMCS bit, "Interruptibility
state"."Blocking by NMI", that tracks the NMI masking state. Would it
have killed AMD to solve the problem the same way, to retain
architectural behavior inside a SEV-ES VM?

--Andy
> On Feb 11, 2020, at 5:53 AM, Joerg Roedel <joro@8bytes.org> wrote:
>
> * Putting some NMI-load on the guest will make it crash usually
>   within a minute

Suppose you do CPUID or some MMIO and get #VC. You fill in the GHCB to
ask for help. Some time between when you start filling it out and when
you do VMGEXIT, you get an NMI. If the NMI does its own GHCB access [0],
it will clobber the outer #VC’s state, resulting in a failure when the
VMGEXIT happens. There’s a related failure mode if the NMI arrives
after the VMGEXIT but before the result is read.

I suspect you can fix this by saving the GHCB at the beginning of
do_nmi and restoring it at the end. This has the major caveat that it
will not work if do_nmi comes from user mode and schedules, but I
don’t believe this can happen.

[0] Due to the NMI_COMPLETE catastrophe, there is a 100% chance that
this happens.
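[Andy's save/restore idea, sketched below with hypothetical names
(this_cpu_ghcb(), handle_nmi()) and a stand-in GHCB layout; the point
is only the snapshot-around-the-NMI-body structure, not the exact
code.]

```c
/*
 * Sketch of the save/restore suggestion: snapshot the shared GHCB
 * when the NMI comes in and put it back before returning, so the
 * NMI's own GHCB traffic (including the mandatory NMI_COMPLETE)
 * cannot clobber an interrupted #VC handler's in-flight request.
 */
struct ghcb {
	unsigned char page[4096];	/* stand-in for the real layout */
};

static struct ghcb ghcb_page;		/* stand-in for the shared page */

static struct ghcb *this_cpu_ghcb(void)	/* hypothetical accessor */
{
	return &ghcb_page;
}

static void handle_nmi(void)
{
	/* NMI body: may fill the GHCB and VMGEXIT on its own. */
}

void do_nmi_with_ghcb_backup(void)
{
	struct ghcb *ghcb = this_cpu_ghcb();
	struct ghcb backup;

	backup = *ghcb;		/* save the interrupted context's request */
	handle_nmi();
	*ghcb = backup;		/* the outer #VC can now safely VMGEXIT */
}
```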
On Tue, Feb 11, 2020 at 02:12:04PM -0800, Andy Lutomirski wrote:
> On Tue, Feb 11, 2020 at 7:43 AM Joerg Roedel <joro@8bytes.org> wrote:
> > On Tue, Feb 11, 2020 at 03:50:08PM +0100, Peter Zijlstra wrote:
> > > Oh gawd; so instead of improving the whole NMI situation, AMD went
> > > and made it worse still?!?
> >
> > Well, that depends on how you want to see it. Under SEV-ES an IRET
> > will not re-open the NMI window; instead, the guest has to tell the
> > hypervisor explicitly when it is ready to receive new NMIs, via the
> > NMI_COMPLETE message. NMIs stay blocked even when an exception
> > happens in the handler, so this could also be seen as a (slight)
> > improvement.
>
> I don't get it. VT-x has a VMCS bit, "Interruptibility
> state"."Blocking by NMI", that tracks the NMI masking state. Would it
> have killed AMD to solve the problem the same way, to retain
> architectural behavior inside a SEV-ES VM?

No, but it wouldn't solve the problem. Inside an NMI handler there
could be #VC exceptions, which do an IRET on their own. Hardware
NMI-state tracking would re-enable NMIs when the #VC exception returns
to the NMI handler, which is not something every OS is comfortable
with.

Yes, there are many ways to hack around this. The GHCB spec mentions
the single-stepping-over-IRET idea, which I also prototyped in a
previous version of this patch-set. I gave up on it when I discovered
that NMIs which hit while executing in kernel mode but on the entry
stack cause the #VC handler to call into C code while still on the
entry stack, because neither paranoid_entry nor error_entry handles
the from-kernel-with-entry-stack case. This could of course also be
fixed, but it would further complicate entry code that is already
complicated enough by the PTI changes and nested-NMI support.

My patch for using the NMI_COMPLETE message is certainly not perfect
and needs changes, but having the message specified in the protocol
gives the guest the best flexibility in deciding when it is ready to
receive new NMIs, imho.

Regards,

	Joerg
On Tue, Feb 11, 2020 at 07:48:12PM -0800, Andy Lutomirski wrote:
> > On Feb 11, 2020, at 5:53 AM, Joerg Roedel <joro@8bytes.org> wrote:
> >
> > * Putting some NMI-load on the guest will make it crash usually
> >   within a minute
>
> Suppose you do CPUID or some MMIO and get #VC. You fill in the GHCB to
> ask for help. Some time between when you start filling it out and when
> you do VMGEXIT, you get an NMI. If the NMI does its own GHCB access
> [0], it will clobber the outer #VC’s state, resulting in a failure
> when the VMGEXIT happens. There’s a related failure mode if the NMI
> arrives after the VMGEXIT but before the result is read.
>
> I suspect you can fix this by saving the GHCB at the beginning of
> do_nmi and restoring it at the end. This has the major caveat that it
> will not work if do_nmi comes from user mode and schedules, but I
> don’t believe this can happen.
>
> [0] Due to the NMI_COMPLETE catastrophe, there is a 100% chance that
> this happens.

Very true, thank you! You probably saved me a few hours of debugging
this further :)

I will implement better handling for nested #VC exceptions, which
hopefully solves the NMI crashes.

Thanks again,

	Joerg
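[For illustration, the "better handling for nested #VC exceptions"
could take roughly the get/put shape below; the names, the backup-slot
scheme, and the nesting limit are all assumptions, not the eventual
patch.]

```c
/*
 * Illustrative sketch only: treat the GHCB as a resource with get/put
 * semantics. An inner user (e.g. a #VC raised inside the NMI handler)
 * backs up the outer user's GHCB contents on get and restores them on
 * put, generalizing the do_nmi save/restore above.
 */
#define GHCB_BACKUP_MAX	1	/* assumed max nesting: one #VC under NMI */

struct ghcb {
	unsigned char page[4096];	/* stand-in for the real layout */
};

struct ghcb_state {
	struct ghcb backup[GHCB_BACKUP_MAX];
	int depth;
};

static struct ghcb *ghcb_get(struct ghcb_state *st, struct ghcb *ghcb)
{
	/* An outer user is active: save its state before reuse. Deeper
	 * nesting than GHCB_BACKUP_MAX would need more slots. */
	if (st->depth > 0 && st->depth <= GHCB_BACKUP_MAX)
		st->backup[st->depth - 1] = *ghcb;
	st->depth++;
	return ghcb;
}

static void ghcb_put(struct ghcb_state *st, struct ghcb *ghcb)
{
	st->depth--;
	if (st->depth > 0 && st->depth <= GHCB_BACKUP_MAX)
		*ghcb = st->backup[st->depth - 1];	/* restore outer user */
}
```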