[2/2] KVM: x86: Add kvm_x86_ops callback to allow VMX to stash away CR4.VMXE

Message ID 20190327192946.19128-3-sean.j.christopherson@intel.com (mailing list archive)
State New, archived
Series [1/2] KVM: x86: Rename pre_{enter,leave}_smm() ops to reference SMM state save

Commit Message

Sean Christopherson March 27, 2019, 7:29 p.m. UTC
Intel CPUs manually save and load CR4.VMXE internally instead of having
it automatically switched via the SMM State Save.  Specifically, the
SDM uses the following pseudocode to describe SMI and RSM:

SMI:
  enter SMM;
  save the following internal to the processor:
    CR4.VMXE
    an indication of whether the logical processor was in VMX operation (root or non-root)
  IF the logical processor is in VMX operation THEN
    save current VMCS pointer internal to the processor;
    leave VMX operation;
    save VMX-critical state defined below
  FI;
  CR4.VMXE <- 0;
  perform ordinary SMI delivery:
  save processor state in SMRAM;
  set processor state to standard SMM values;

RSM:
  IF VMXE = 1 in CR4 image in SMRAM THEN
    fail and enter shutdown state;
  ELSE
    restore state normally from SMRAM;
    <unrelated stuff omitted for brevity>
  FI;
  CR4.VMXE <- value stored internally;
  <more stuff omitted for brevity>
  leave SMM;

Presumably, the manual handling of CR4.VMXE is done to avoid having to
special case CR4.VMXE in the legacy state save/load flows.  As luck
would have it, KVM has a similar conundrum in its SMM state save/load
flows in that vmx_set_cr4() forbids CR4.VMXE from being set while in
SMM.  The check in vmx_set_cr4() can cause RSM to fail when loading
CR4 from the SMM save state area.

Take a page out of the SDM and manually handle CR4.VMXE during SMI and
RSM.  Alternatively, HF_SMM_MASK could be temporarily disabled when
setting CR4 as part of RSM, but doing so is rather ugly due to the need
to plumb the "from_rsm" information through to emulator_set_cr(), and
also opens up a virtualization hole, e.g. KVM would not properly handle
the (extremely unlikely) scenario where the guest toggles CR4.VMXE in
the SMM save state area from 0->1.

Reported-by: Jon Doron <arilou@gmail.com>
Suggested-by: Jim Mattson <jmattson@google.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Fixes: 5bea5123cbf0 ("KVM: VMX: check nested state and CR4.VMXE against SMM")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_emulate.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/emulate.c             |  2 ++
 arch/x86/kvm/svm.c                 | 12 ++++++++++++
 arch/x86/kvm/vmx/vmx.c             | 22 ++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h             |  2 ++
 arch/x86/kvm/x86.c                 |  9 +++++++++
 7 files changed, 50 insertions(+)

Comments

Jim Mattson March 27, 2019, 7:50 p.m. UTC | #1
On Wed, Mar 27, 2019 at 12:31 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Intel CPUs manually save and load CR4.VMXE internally instead of having
> it automatically switched via the SMM State Save.  Specifically, the
> SDM uses the following pseudocode to describe SMI and RSM:
>
> SMI:
>   enter SMM;
>   save the following internal to the processor:
>     CR4.VMXE
>     an indication of whether the logical processor was in VMX operation (root or non-root)
>   IF the logical processor is in VMX operation THEN
>     save current VMCS pointer internal to the processor;
>     leave VMX operation;
>     save VMX-critical state defined below
>   FI;
>   CR4.VMXE <- 0;
>   perform ordinary SMI delivery:
>   save processor state in SMRAM;
>   set processor state to standard SMM values;
>
> RSM:
>   IF VMXE = 1 in CR4 image in SMRAM THEN
>     fail and enter shutdown state;
>   ELSE
>     restore state normally from SMRAM;
>     <unrelated stuff omitted for brevity>
>   FI;
>   CR4.VMXE <- value stored internally;
>   <more stuff omitted for brevity>
>   leave SMM;
>
> Presumably, the manual handling of CR4.VMXE is done to avoid having to
> special case CR4.VMXE in the legacy state save/load flows.  As luck
> would have it, KVM has a similar conundrum in its SMM state save/load
> flows in that vmx_set_cr4() forbids CR4.VMXE from being set while in
> SMM.  The check in vmx_set_cr4() can cause RSM to fail when loading
> CR4 from the SMM save state area.
>
> Take a page out of the SDM and manually handle CR4.VMXE during SMI and
> RSM.  Alternatively, HF_SMM_MASK could be temporarily disabled when
> setting CR4 as part of RSM, but doing so is rather ugly due to the need
> to plumb the "from_rsm" information through to emulator_set_cr(), and
> also opens up a virtualization hole, e.g. KVM would not properly handle
> the (extremely unlikely) scenario where the guest toggles CR4.VMXE in
> the SMM save state area from 0->1
>
> Reported-by: Jon Doron <arilou@gmail.com>
> Suggested-by: Jim Mattson <jmattson@google.com>
> Cc: Liran Alon <liran.alon@oracle.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Fixes: 5bea5123cbf0 ("KVM: VMX: check nested state and CR4.VMXE against SMM")
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Thanks! This looks like the right way to do it, but I'm probably biased.

> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index a1e00d0a2482..a3444625ca7f 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -162,6 +162,8 @@ struct nested_vmx {
>
>         /* SMM related state */
>         struct {
> +               /* CR4.VMXE=1 on SMM entry? */
> +               bool cr4_vmxe;
>                 /* in VMX operation on SMM entry? */
>                 bool vmxon;
>                 /* in guest mode on SMM entry? */

Where does this state get saved/restored when suspending/resuming a
VCPU that's in SMM?
Sean Christopherson March 27, 2019, 8:20 p.m. UTC | #2
On Wed, Mar 27, 2019 at 12:50:27PM -0700, Jim Mattson wrote:
> On Wed, Mar 27, 2019 at 12:31 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > Intel CPUs manually save and load CR4.VMXE internally instead of having
> > it automatically switched via the SMM State Save.  Specifically, the
> > SDM uses the following pseudocode to describe SMI and RSM:
> >
> > SMI:
> >   enter SMM;
> >   save the following internal to the processor:
> >     CR4.VMXE
> >     an indication of whether the logical processor was in VMX operation (root or non-root)
> >   IF the logical processor is in VMX operation THEN
> >     save current VMCS pointer internal to the processor;
> >     leave VMX operation;
> >     save VMX-critical state defined below
> >   FI;
> >   CR4.VMXE <- 0;
> >   perform ordinary SMI delivery:
> >   save processor state in SMRAM;
> >   set processor state to standard SMM values;
> >
> > RSM:
> >   IF VMXE = 1 in CR4 image in SMRAM THEN
> >     fail and enter shutdown state;
> >   ELSE
> >     restore state normally from SMRAM;
> >     <unrelated stuff omitted for brevity>
> >   FI;
> >   CR4.VMXE <- value stored internally;
> >   <more stuff omitted for brevity>
> >   leave SMM;
> >
> > Presumably, the manual handling of CR4.VMXE is done to avoid having to
> > special case CR4.VMXE in the legacy state save/load flows.  As luck
> > would have it, KVM has a similar conundrum in its SMM state save/load
> > flows in that vmx_set_cr4() forbids CR4.VMXE from being set while in
> > SMM.  The check in vmx_set_cr4() can cause RSM to fail when loading
> > CR4 from the SMM save state area.
> >
> > Take a page out of the SDM and manually handle CR4.VMXE during SMI and
> > RSM.  Alternatively, HF_SMM_MASK could be temporarily disabled when
> > setting CR4 as part of RSM, but doing so is rather ugly due to the need
> > to plumb the "from_rsm" information through to emulator_set_cr(), and
> > also opens up a virtualization hole, e.g. KVM would not properly handle
> > the (extremely unlikely) scenario where the guest toggles CR4.VMXE in
> > the SMM save state area from 0->1
> >
> > Reported-by: Jon Doron <arilou@gmail.com>
> > Suggested-by: Jim Mattson <jmattson@google.com>
> > Cc: Liran Alon <liran.alon@oracle.com>
> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> > Fixes: 5bea5123cbf0 ("KVM: VMX: check nested state and CR4.VMXE against SMM")
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Thanks! This looks like the right way to do it, but I'm probably biased.
> 
> > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> > index a1e00d0a2482..a3444625ca7f 100644
> > --- a/arch/x86/kvm/vmx/vmx.h
> > +++ b/arch/x86/kvm/vmx/vmx.h
> > @@ -162,6 +162,8 @@ struct nested_vmx {
> >
> >         /* SMM related state */
> >         struct {
> > +               /* CR4.VMXE=1 on SMM entry? */
> > +               bool cr4_vmxe;
> >                 /* in VMX operation on SMM entry? */
> >                 bool vmxon;
> >                 /* in guest mode on SMM entry? */
> 
> Where does this state get saved/restored when suspending/resuming a
> VCPU that's in SMM?

cr4_vmxe is handled by vmx_pre_smi_save_state() and
vmx_post_rsm_load_state().

The existing vmxon and guest_mode are handled by post_smi_save_state()
and pre_rsm_load_state(), which were previously pre_{enter,leave}_smm().

Ideally everything in the struct would be managed together, e.g. in
vmx_pre_smi_save_state() and vmx_post_rsm_load_state(), but I didn't
want to touch that house of cards at this time.

And even more ideally, post_smi_save_state() and pre_rsm_load_state()
would be dropped altogether, e.g. the SVM code would also be reworked to
use pre_smi_save_state() and post_rsm_load_state(), but my SVM knowledge
is nowhere near sufficient to judge if that's even remotely feasible.
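
A condensed sketch of the hook ordering this series establishes may help
here; it is simplified from the x86.c and emulate.c hunks in the patch
below, and locals such as "buf" and "smbase" stand in for the real code:

	/* SMI: x86.c, enter_smm() */
	kvm_x86_ops->pre_smi_save_state(vcpu);        /* VMX: stash CR4.VMXE, then clear it */
	enter_smm_save_state_64(vcpu, buf);           /* or the 32-bit variant */
	kvm_x86_ops->post_smi_save_state(vcpu, buf);  /* VMX: leave VMX operation (was pre_enter_smm) */

	/* RSM: emulate.c, em_rsm() */
	ctxt->ops->pre_rsm_load_state(ctxt, smbase);  /* VMX: restore vmxon/guest mode (was pre_leave_smm) */
	rsm_load_state_64(ctxt, smbase);              /* or the 32-bit variant */
	ctxt->ops->post_rsm_load_state(ctxt);         /* VMX: restore the stashed CR4.VMXE */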
Jim Mattson March 27, 2019, 8:28 p.m. UTC | #3
On Wed, Mar 27, 2019 at 1:20 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Wed, Mar 27, 2019 at 12:50:27PM -0700, Jim Mattson wrote:
> > Where does this state get saved/restored when suspending/resuming a
> > VCPU that's in SMM?
>
> cr4_vmxe is handled by vmx_pre_smi_save_state() and
> vmx_post_rsm_load_state().
>
> The existing vmxon and guest_mode are handled by post_smi_save_state()
> and pre_rsm_load_state(), which were previously pre_{enter,leave}_smm().
>
> Ideally everything in the struct would be managed together, e.g. in
> vmx_pre_smi_save_state() and vmx_post_rsm_load_state(), but I didn't
> want to touch that house of cards at this time.
>
> And even more ideally, post_smi_save_state() and pre_rsm_load_state()
> would be dropped altogether, e.g. the SVM code would also be reworked to
> use pre_smi_save_state() and post_rsm_load_state(), but my SVM knowledge
> is nowhere near sufficient to judge if that's even remotely feasible.

No; I meant which of the myriad KVM_GET_* and KVM_SET_* ioctls provide
a mechanism to save and restore these tidbits of vCPU state (e.g. for
live migration)?
Sean Christopherson March 27, 2019, 8:34 p.m. UTC | #4
On Wed, Mar 27, 2019 at 01:28:09PM -0700, Jim Mattson wrote:
> On Wed, Mar 27, 2019 at 1:20 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Wed, Mar 27, 2019 at 12:50:27PM -0700, Jim Mattson wrote:
> > > Where does this state get saved/restored when suspending/resuming a
> > > VCPU that's in SMM?
> >
> > cr4_vmxe is handled by vmx_pre_smi_save_state() and
> > vmx_post_rsm_load_state().
> >
> > The existing vmxon and guest_mode are handled by post_smi_save_state()
> > and pre_rsm_load_state(), which were previously pre_{enter,leave}_smm().
> >
> > Ideally everything in the struct would be managed together, e.g. in
> > vmx_pre_smi_save_state() and vmx_post_rsm_load_state(), but I didn't
> > want to touch that house of cards at this time.
> >
> > And even more ideally, post_smi_save_state() and pre_rsm_load_state()
> > would be dropped altogether, e.g. the SVM code would also be reworked to
> > use pre_smi_save_state() and post_rsm_load_state(), but my SVM knowledge
> > is nowhere near sufficient to judge if that's even remotely feasible.
> 
> No; I meant which of the myriad KVM_GET_* and KVM_SET_* ioctls provide
> a mechanism to save and restore these tidbits of vCPU state (e.g. for
> live migration)?

Ah fudge.  KVM_{GET,SET}_NESTED_STATE, i.e. vmx_{get,set}_nested_state().
This patch would also need to add and handle KVM_STATE_NESTED_SMM_CR4_VMXE.
Jim Mattson March 27, 2019, 8:49 p.m. UTC | #5
On Wed, Mar 27, 2019 at 1:34 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Wed, Mar 27, 2019 at 01:28:09PM -0700, Jim Mattson wrote:
> > On Wed, Mar 27, 2019 at 1:20 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > On Wed, Mar 27, 2019 at 12:50:27PM -0700, Jim Mattson wrote:
> > > > Where does this state get saved/restored when suspending/resuming a
> > > > VCPU that's in SMM?
> > >
> > > cr4_vmxe is handled by vmx_pre_smi_save_state() and
> > > vmx_post_rsm_load_state().
> > >
> > > The existing vmxon and guest_mode are handled by post_smi_save_state()
> > > and pre_rsm_load_state(), which were previously pre_{enter,leave}_smm().
> > >
> > > Ideally everything in the struct would be managed together, e.g. in
> > > vmx_pre_smi_save_state() and vmx_post_rsm_load_state(), but I didn't
> > > want to touch that house of cards at this time.
> > >
> > > And even more ideally, post_smi_save_state() and pre_rsm_load_state()
> > > would be dropped altogether, e.g. the SVM code would also be reworked to
> > > use pre_smi_save_state() and post_rsm_load_state(), but my SVM knowledge
> > > is nowhere near sufficient to judge if that's even remotely feasible.
> >
> > No; I meant which of the myriad KVM_GET_* and KVM_SET_* ioctls provide
> > a mechanism to save and restore these tidbits of vCPU state (e.g. for
> > live migration)?
>
> Ah fudge.  KVM_{GET,SET}_NESTED_STATE, i.e. vmx_{get,set}_nested_state().
> This patch would also need to add and handle KVM_STATE_NESTED_SMM_CR4_VMXE.

Sorry. I haven't been following the evolution of those ioctls since I
first proposed them and Paolo shut me down. I see the handling of
"nested SMM state" now, and I agree that this would be the place to
handle the internal stash of CR4.VMXE. However, vmx_set_nested_state
currently allows only two bits to be set in vmx.smm.flags:
KVM_STATE_NESTED_SMM_GUEST_MODE and KVM_STATE_NESTED_SMM_VMXON. Does
userspace have to opt-in to KVM_STATE_NESTED_SMM_CR4_VMXE through a
new KVM_CAP?
Sean Christopherson March 27, 2019, 10:05 p.m. UTC | #6
On Wed, Mar 27, 2019 at 01:49:22PM -0700, Jim Mattson wrote:
> On Wed, Mar 27, 2019 at 1:34 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Wed, Mar 27, 2019 at 01:28:09PM -0700, Jim Mattson wrote:
> > > On Wed, Mar 27, 2019 at 1:20 PM Sean Christopherson
> > > <sean.j.christopherson@intel.com> wrote:
> > > >
> > > > On Wed, Mar 27, 2019 at 12:50:27PM -0700, Jim Mattson wrote:
> > > > > Where does this state get saved/restored when suspending/resuming a
> > > > > VCPU that's in SMM?
> > > >
> > > > cr4_vmxe is handled by vmx_pre_smi_save_state() and
> > > > vmx_post_rsm_load_state().
> > > >
> > > > The existing vmxon and guest_mode are handled by post_smi_save_state()
> > > > and pre_rsm_load_state(), which were previously pre_{enter,leave}_smm().
> > > >
> > > > Ideally everything in the struct would be managed together, e.g. in
> > > > vmx_pre_smi_save_state() and vmx_post_rsm_load_state(), but I didn't
> > > > want to touch that house of cards at this time.
> > > >
> > > > And even more ideally, post_smi_save_state() and pre_rsm_load_state()
> > > > would be dropped altogether, e.g. the SVM code would also be reworked to
> > > > use pre_smi_save_state() and post_rsm_load_state(), but my SVM knowledge
> > > > is nowhere near sufficient to judge if that's even remotely feasible.
> > >
> > > No; I meant which of the myriad KVM_GET_* and KVM_SET_* ioctls provide
> > > a mechanism to save and restore these tidbits of vCPU state (e.g. for
> > > live migration)?
> >
> > Ah fudge.  KVM_{GET,SET}_NESTED_STATE, i.e. vmx_{get,set}_nested_state().
> > This patch would also need to add and handle KVM_STATE_NESTED_SMM_CR4_VMXE.
> 
> Sorry. I haven't been following the evolution of those ioctls since I
> first proposed them and Paolo shut me down. I see the handling of
> "nested SMM state" now, and I agree that this would be the place to
> handle the internal stash of CR4.VMXE. However, vmx_set_nested_state
> currently allows only two bits to be set in vmx.smm.flags:
> KVM_STATE_NESTED_SMM_GUEST_MODE and KVM_STATE_NESTED_SMM_VMXON. Does
> userspace have to opt-in to KVM_STATE_NESTED_SMM_CR4_VMXE through a
> new KVM_CAP?

Hmm.  I think we could squeak by without it on a technicality.  Nested
VMX was only enabled by default after the bug was introduced.  Any
migration from an older KVM that "officially" supports nested VMX will
crash the guest after migration regardless of the target KVM, since KVM
will signal a #GP on RSM due to CR4.VMXE=1 in the SMM save state area.
I.e. it wouldn't break anything that wasn't already broken.

Toggling HF_SMM_MASK wouldn't suffer the same fate, i.e. new KVMs would
happily accept migration from broken KVMs.  That being said, the only
way that can happen is if migration is triggered during the first SMI
with CR4.VMXE=1, since the guest will have already crashed due to RSM
taking a #GP in the broken pre-migration KVM.  Given this is the first
bug report for the RSM w/ CR4.VMXE=1, I'm going to go out on a limb and
say that the migration corner case is extremely unlikely, to put it mildly.

So I think we're good?  Maybe add an errata in the docs to note the
extremely unlikely corner case?
Jim Mattson March 27, 2019, 10:14 p.m. UTC | #7
On Wed, Mar 27, 2019 at 3:05 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
> Hmm.  I think we could squeak by without it on a technicality.  Nested
> VMX was only enabled by default after the bug was introduced.  Any
> migration from an older KVM that "officially" supports nested VMX will
> crash the guest after migration regardless of the target KVM, since KVM
> will signal a #GP on RSM due to CR4.VMXE=1 in the SMM save state area.
> I.e. it wouldn't break anything that wasn't already broken.
>
> Toggling HF_SMM_MASK wouldn't suffer the same fate, i.e. new KVMs would
> happily accept migration from broken KVMs.  That being said, the only
> way that can happen is if migration is triggered during the first SMI
> with CR4.VMXE=1, since the guest will have already crashed due to RSM
> taking a #GP in the broken pre-migration KVM.  Given this is the first
> bug report for the RSM w/ CR4.VMXE=1, I'm going to go out on a limb and
> say that the migration corner case is extremely unlikely, to put it mildly.
>
> So I think we're good?  Maybe add an errata in the docs to note the
> extremely unlikely corner case?

Yeah, probably. Who's using virtual SMM, anyway? :-)

I don't see any tests of virtual SMM in either kvm-unit-tests or
tools/testing/selftests/kvm, but maybe I'm not looking closely enough.
Vitaly Kuznetsov March 28, 2019, 9:42 a.m. UTC | #8
Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Intel CPUs manually save and load CR4.VMXE internally instead of having
> it automatically switched via the SMM State Save.  Specifically, the
> SDM uses the following pseudocode to describe SMI and RSM:
>
> SMI:
>   enter SMM;
>   save the following internal to the processor:
>     CR4.VMXE
>     an indication of whether the logical processor was in VMX operation (root or non-root)
>   IF the logical processor is in VMX operation THEN
>     save current VMCS pointer internal to the processor;
>     leave VMX operation;
>     save VMX-critical state defined below
>   FI;
>   CR4.VMXE <- 0;
>   perform ordinary SMI delivery:
>   save processor state in SMRAM;
>   set processor state to standard SMM values;
>
> RSM:
>   IF VMXE = 1 in CR4 image in SMRAM THEN
>     fail and enter shutdown state;
>   ELSE
>     restore state normally from SMRAM;
>     <unrelated stuff omitted for brevity>
>   FI;
>   CR4.VMXE <- value stored internally;
>   <more stuff omitted for brevity>
>   leave SMM;
>
> Presumably, the manual handling of CR4.VMXE is done to avoid having to
> special case CR4.VMXE in the legacy state save/load flows.  As luck
> would have it, KVM has a similar conundrum in its SMM state save/load
> flows in that vmx_set_cr4() forbids CR4.VMXE from being set while in
> SMM.  The check in vmx_set_cr4() can cause RSM to fail when loading
> CR4 from the SMM save state area.
>
> Take a page out of the SDM and manually handle CR4.VMXE during SMI and
> RSM.  Alternatively, HF_SMM_MASK could be temporarily disabled when
> setting CR4 as part of RSM, but doing so is rather ugly due to the need
> to plumb the "from_rsm" information through to emulator_set_cr(), and
> also opens up a virtualization hole, e.g. KVM would not properly handle
> the (extremely unlikely) scenario where the guest toggles CR4.VMXE in
> the SMM save state area from 0->1
>
> Reported-by: Jon Doron <arilou@gmail.com>
> Suggested-by: Jim Mattson <jmattson@google.com>
> Cc: Liran Alon <liran.alon@oracle.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Fixes: 5bea5123cbf0 ("KVM: VMX: check nested state and CR4.VMXE against SMM")
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/include/asm/kvm_emulate.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  2 ++
>  arch/x86/kvm/emulate.c             |  2 ++
>  arch/x86/kvm/svm.c                 | 12 ++++++++++++
>  arch/x86/kvm/vmx/vmx.c             | 22 ++++++++++++++++++++++
>  arch/x86/kvm/vmx/vmx.h             |  2 ++
>  arch/x86/kvm/x86.c                 |  9 +++++++++
>  7 files changed, 50 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
> index b2a65d08d1f8..02fa7ec89315 100644
> --- a/arch/x86/include/asm/kvm_emulate.h
> +++ b/arch/x86/include/asm/kvm_emulate.h
> @@ -227,6 +227,7 @@ struct x86_emulate_ops {
>  	unsigned (*get_hflags)(struct x86_emulate_ctxt *ctxt);
>  	void (*set_hflags)(struct x86_emulate_ctxt *ctxt, unsigned hflags);
>  	int (*pre_rsm_load_state)(struct x86_emulate_ctxt *ctxt, u64 smbase);
> +	void (*post_rsm_load_state)(struct x86_emulate_ctxt *ctxt);
>  
>  };
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 29ce45f41ee5..0ceb8b353dc0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1181,8 +1181,10 @@ struct kvm_x86_ops {
>  	void (*get_vmcs12_pages)(struct kvm_vcpu *vcpu);
>  
>  	int (*smi_allowed)(struct kvm_vcpu *vcpu);
> +	void (*pre_smi_save_state)(struct kvm_vcpu *vcpu);
>  	int (*post_smi_save_state)(struct kvm_vcpu *vcpu, char *smstate);
>  	int (*pre_rsm_load_state)(struct kvm_vcpu *vcpu, u64 smbase);
> +	void (*post_rsm_load_state)(struct kvm_vcpu *vcpu);
>  	int (*enable_smi_window)(struct kvm_vcpu *vcpu);
>  
>  	int (*mem_enc_op)(struct kvm *kvm, void __user *argp);
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index ca60eb3358d9..706accf524c3 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2625,6 +2625,8 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
>  		return X86EMUL_UNHANDLEABLE;
>  	}
>  
> +	ctxt->ops->post_rsm_load_state(ctxt);
> +
>  	if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
>  		ctxt->ops->set_nmi_mask(ctxt, false);
>  
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index c0e511c3b298..edc12fb39b32 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -6193,6 +6193,11 @@ static int svm_smi_allowed(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> +static void svm_pre_smi_save_state(struct kvm_vcpu *vcpu)
> +{
> +
> +}
> +
>  static int svm_post_smi_save_state(struct kvm_vcpu *vcpu, char *smstate)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -6243,6 +6248,11 @@ static int svm_pre_rsm_load_state(struct kvm_vcpu *vcpu, u64 smbase)
>  	return ret;
>  }
>  
> +static void svm_post_rsm_load_state(struct kvm_vcpu *vcpu)
> +{
> +
> +}
> +
>  static int enable_smi_window(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -7251,8 +7261,10 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>  	.setup_mce = svm_setup_mce,
>  
>  	.smi_allowed = svm_smi_allowed,
> +	.pre_smi_save_state = svm_pre_smi_save_state,
>  	.post_smi_save_state = svm_post_smi_save_state,
>  	.pre_rsm_load_state = svm_pre_rsm_load_state,
> +	.post_rsm_load_state = svm_post_rsm_load_state,
>  	.enable_smi_window = enable_smi_window,
>  
>  	.mem_enc_op = svm_mem_enc_op,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index fc96101e39ac..617443c1c6d7 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7354,6 +7354,16 @@ static int vmx_smi_allowed(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> +static void vmx_pre_smi_save_state(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	if (kvm_read_cr4(vcpu) & X86_CR4_VMXE) {
> +		vmx->nested.smm.cr4_vmxe = true;
> +		WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) & ~X86_CR4_VMXE));

This WARN_ON fires: vmx_set_cr4() has the following check:

if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
        return 1;

X86_CR4_VMXE can't be unset while nested.vmxon is on....


> +	}
> +}
> +
>  static int vmx_post_smi_save_state(struct kvm_vcpu *vcpu, char *smstate)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -7390,6 +7400,16 @@ static int vmx_pre_rsm_load_state(struct kvm_vcpu *vcpu, u64 smbase)
>  	return 0;
>  }
>  
> +static void vmx_post_rsm_load_state(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	if (vmx->nested.smm.cr4_vmxe) {
> +		WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) | X86_CR4_VMXE));
> +		vmx->nested.smm.cr4_vmxe = false;

If we manage to pass the previous problem this will likely fail:

post_rsm_load_state() is called with HF_SMM_MASK still set.

> +	}
> +}
> +
>  static int enable_smi_window(struct kvm_vcpu *vcpu)
>  {
>  	return 0;
> @@ -7693,8 +7713,10 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>  	.setup_mce = vmx_setup_mce,
>  
>  	.smi_allowed = vmx_smi_allowed,
> +	.pre_smi_save_state = vmx_pre_smi_save_state,
>  	.post_smi_save_state = vmx_post_smi_save_state,
>  	.pre_rsm_load_state = vmx_pre_rsm_load_state,
> +	.post_rsm_load_state = vmx_post_rsm_load_state,
>  	.enable_smi_window = enable_smi_window,
>  
>  	.check_nested_events = NULL,
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index a1e00d0a2482..a3444625ca7f 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -162,6 +162,8 @@ struct nested_vmx {
>  
>  	/* SMM related state */
>  	struct {
> +		/* CR4.VMXE=1 on SMM entry? */
> +		bool cr4_vmxe;
>  		/* in VMX operation on SMM entry? */
>  		bool vmxon;
>  		/* in guest mode on SMM entry? */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index abfac99fb7c5..1b07706df840 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5966,6 +5966,11 @@ static int emulator_pre_rsm_load_state(struct x86_emulate_ctxt *ctxt, u64 smbase
>  	return kvm_x86_ops->pre_rsm_load_state(emul_to_vcpu(ctxt), smbase);
>  }
>  
> +static void emulator_post_rsm_load_state(struct x86_emulate_ctxt *ctxt)
> +{
> +	kvm_x86_ops->post_rsm_load_state(emul_to_vcpu(ctxt));
> +}
> +
>  static const struct x86_emulate_ops emulate_ops = {
>  	.read_gpr            = emulator_read_gpr,
>  	.write_gpr           = emulator_write_gpr,
> @@ -6006,6 +6011,7 @@ static const struct x86_emulate_ops emulate_ops = {
>  	.get_hflags          = emulator_get_hflags,
>  	.set_hflags          = emulator_set_hflags,
>  	.pre_rsm_load_state  = emulator_pre_rsm_load_state,
> +	.post_rsm_load_state = emulator_post_rsm_load_state,
>  };
>  
>  static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
> @@ -7516,6 +7522,9 @@ static void enter_smm(struct kvm_vcpu *vcpu)
>  	u32 cr0;
>  
>  	trace_kvm_enter_smm(vcpu->vcpu_id, vcpu->arch.smbase, true);
> +
> +	kvm_x86_ops->pre_smi_save_state(vcpu);
> +
>  	memset(buf, 0, 512);
>  	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
>  		enter_smm_save_state_64(vcpu, buf);
Paolo Bonzini March 28, 2019, 1:19 p.m. UTC | #9
On 27/03/19 23:14, Jim Mattson wrote:
>> Hmm.  I think we could squeak by without it on a technicality.  Nested
>> VMX was only enabled by default after the bug was introduced.  Any
>> migration from an older KVM that "officially" supports nested VMX will
>> crash the guest after migration regardless of the target KVM, since KVM
>> will signal a #GP on RSM due to CR4.VMXE=1 in the SMM save state area.
>> I.e. it wouldn't break anything that wasn't already broken.
>>
>> Toggling HF_SMM_MASK wouldn't suffer the same fate, i.e. new KVMs would
>> happily accept migration from broken KVMs.  That being said, the only
>> way that can happen is if migration is triggered during the first SMI
>> with CR4.VMXE=1, since the guest will have already crashed due to RSM
>> taking a #GP in the broken pre-migration KVM.  Given this is the first
>> bug report for the RSM w/ CR4.VMXE=1, I'm going to go out on a limb and
>> say that the migration corner case is extremely unlikely, to put it mildly.
>>
>> So I think we're good?  Maybe add an errata in the docs to note the
>> extremely unlikely corner case?
> Yeah, probably. Who's using virtual SMM, anyway? :-)

Anyone that is using DOS with versions of SeaBIOS that run 32-bit drivers
in SMM. :)  But yeah, there is no way that this can ever have worked so
we can fix the API.

Paolo

> I don't see any tests of virtual SMM in either kvm-unit-tests or
> tools/testing/selftests/kvm, but maybe I'm not looking closely enough.
Paolo Bonzini March 28, 2019, 4:39 p.m. UTC | #10
On 27/03/19 21:34, Sean Christopherson wrote:
> Ah fudge.  KVM_{GET,SET}_NESTED_STATE, i.e. vmx_{get,set}_nested_state().
> This patch would also need to add and handle KVM_STATE_NESTED_SMM_CR4_VMXE.

Like this:

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index dabfcf7c3941..88bdff7b866c 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -388,6 +388,7 @@ struct kvm_sync_regs {
 
 #define KVM_STATE_NESTED_SMM_GUEST_MODE	0x00000001
 #define KVM_STATE_NESTED_SMM_VMXON	0x00000002
+#define KVM_STATE_NESTED_SMM_CR4_VMXE	0x00000004
 
 struct kvm_vmx_nested_state {
 	__u64 vmxon_pa;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 153e539c29c9..cbd5f2ddf7f7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5245,6 +5245,9 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
 	if (nested_vmx_allowed(vcpu) && vmx->nested.enlightened_vmcs_enabled)
 		kvm_state.flags |= KVM_STATE_NESTED_EVMCS;
 
+	if (vmx->nested.smm.cr4_vmxe)
+		kvm_state.vmx.smm.flags |= KVM_STATE_NESTED_SMM_CR4_VMXE;
+
 	if (nested_vmx_allowed(vcpu) &&
 	    (vmx->nested.vmxon || vmx->nested.smm.vmxon)) {
 		kvm_state.vmx.vmxon_pa = vmx->nested.vmxon_ptr;
@@ -5365,7 +5368,8 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
 		return -EINVAL;
 
 	if (kvm_state->vmx.smm.flags &
-	    ~(KVM_STATE_NESTED_SMM_GUEST_MODE | KVM_STATE_NESTED_SMM_VMXON))
+	    ~(KVM_STATE_NESTED_SMM_GUEST_MODE | KVM_STATE_NESTED_SMM_VMXON |
+	      KVM_STATE_NESTED_SMM_CR4_VMXE))
 		return -EINVAL;
 
 	/*
@@ -5380,6 +5384,10 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
 	    !(kvm_state->vmx.smm.flags & KVM_STATE_NESTED_SMM_VMXON))
 		return -EINVAL;
 
+	if ((kvm_state->vmx.smm.flags & KVM_STATE_NESTED_SMM_VMXON) &&
+	    !(kvm_state->vmx.smm.flags & KVM_STATE_NESTED_SMM_CR4_VMXE))
+		return -EINVAL;
+
 	vmx_leave_nested(vcpu);
 	if (kvm_state->vmx.vmxon_pa == -1ull)
 		return 0;
@@ -5409,6 +5417,9 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
 		return -EINVAL;
 	}
 
+	vmx->nested.smm.cr4_vmxe =
+		!!(kvm_state->vmx.smm.flags & KVM_STATE_NESTED_SMM_CR4_VMXE);
+
 	if (kvm_state->vmx.smm.flags & KVM_STATE_NESTED_SMM_VMXON) {
 		vmx->nested.smm.vmxon = true;
 		vmx->nested.vmxon = false;

I'll try to put together a testcase.

Paolo
Paolo Bonzini March 28, 2019, 4:58 p.m. UTC | #11
On 28/03/19 10:42, Vitaly Kuznetsov wrote:
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index fc96101e39ac..617443c1c6d7 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -7354,6 +7354,16 @@ static int vmx_smi_allowed(struct kvm_vcpu *vcpu)
>>  	return 1;
>>  }
>>  
>> +static void vmx_pre_smi_save_state(struct kvm_vcpu *vcpu)
>> +{
>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +
>> +	if (kvm_read_cr4(vcpu) & X86_CR4_VMXE) {
>> +		vmx->nested.smm.cr4_vmxe = true;
>> +		WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) & ~X86_CR4_VMXE));
> 
> This WARN_ON fires: vmx_set_cr4() has the following check:
> 
> if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
>         return 1;
> 
> X86_CR4_VMXE can't be unset while nested.vmxon is on....

Could be fixed like this:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 47fbf91fe902..31ef4973e90a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2874,15 +2874,29 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	vmcs_writel(GUEST_CR3, guest_cr3);
 }
 
+static void __vmx_set_cr4_shadow(struct vcpu_vmx *vmx, unsigned long cr4,
+				 unsigned long hw_cr4)
+{
+	WARN_ON((hw_cr4 ^ cr4) & ~vmx->vcpu.arch.cr4_guest_owned_bits);
+	vmcs_writel(CR4_READ_SHADOW, cr4);
+}
+
+static void vmx_set_cr4_shadow(struct vcpu_vmx *vmx, unsigned long cr4)
+{
+	unsigned long hw_cr4 = vmcs_readl(GUEST_CR4);
+	__vmx_set_cr4_shadow(vmx, cr4, hw_cr4);
+}
+
 int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned long hw_cr4;
+
 	/*
 	 * Pass through host's Machine Check Enable value to hw_cr4, which
 	 * is in force while we are in guest mode.  Do not let guests control
 	 * this bit, even if host CR4.MCE == 0.
 	 */
-	unsigned long hw_cr4;
-
 	hw_cr4 = (cr4_read_shadow() & X86_CR4_MCE) | (cr4 & ~X86_CR4_MCE);
 	if (enable_unrestricted_guest)
 		hw_cr4 |= KVM_VM_CR4_ALWAYS_ON_UNRESTRICTED_GUEST;
@@ -2944,8 +2958,8 @@ int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 			hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE);
 	}
 
-	vmcs_writel(CR4_READ_SHADOW, cr4);
 	vmcs_writel(GUEST_CR4, hw_cr4);
+	__vmx_set_cr4_shadow(vmx, cr4, hw_cr4);
 	return 0;
 }
 
@@ -7361,7 +7375,13 @@ static void vmx_pre_smi_save_state(struct kvm_vcpu *vcpu)
 
 	if (kvm_read_cr4(vcpu) & X86_CR4_VMXE) {
 		vmx->nested.smm.cr4_vmxe = true;
-		WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) & ~X86_CR4_VMXE));
+		/*
+		 * We cannot move nested.vmxon to nested.smm.vmxon until after
+		 * the state has been saved, on the other hand we need to
+		 * clear CR4.VMXE *before* it is saved to the SMRAM state save
+		 * area.  Fudge it directly in the VMCS.
+		 */
+		vmx_set_cr4_shadow(vmx, kvm_read_cr4(vcpu) & ~X86_CR4_VMXE);
 	}
 }
 

> 
>> +	}
>> +}
>> +
>>  static int vmx_post_smi_save_state(struct kvm_vcpu *vcpu, char *smstate)
>>  {
>>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> @@ -7390,6 +7400,16 @@ static int vmx_pre_rsm_load_state(struct kvm_vcpu *vcpu, u64 smbase)
>>  	return 0;
>>  }
>>  
>> +static void vmx_post_rsm_load_state(struct kvm_vcpu *vcpu)
>> +{
>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +
>> +	if (vmx->nested.smm.cr4_vmxe) {
>> +		WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) | X86_CR4_VMXE));
>> +		vmx->nested.smm.cr4_vmxe = false;
> 
> If we manage to pass the previous problem this will likely fail:
> 
> post_rsm_load_state() is called with HF_SMM_MASK still set.

And this is hopefully just a matter of

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 706accf524c3..a508d07d7483 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2625,13 +2625,14 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
 		return X86EMUL_UNHANDLEABLE;
 	}
 
-	ctxt->ops->post_rsm_load_state(ctxt);
-
 	if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
 		ctxt->ops->set_nmi_mask(ctxt, false);
 
 	ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
 		~(X86EMUL_SMM_INSIDE_NMI_MASK | X86EMUL_SMM_MASK));
+
+	ctxt->ops->post_rsm_load_state(ctxt);
+
 	return X86EMUL_CONTINUE;
 }
 

but it's probably a good idea to also use vmx_set_cr4_shadow in
vmx_post_rsm_load_state.  At the same time, we should really try
to unify the callbacks so that at least Intel only uses pre_smi_save
and post_rsm_load.

After I have a test case, I'll refresh Sean's patches and send them out.

Paolo

>> +	}
>> +}
>> +
>>  static int enable_smi_window(struct kvm_vcpu *vcpu)
>>  {
>>  	return 0;
>> @@ -7693,8 +7713,10 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>>  	.setup_mce = vmx_setup_mce,
>>  
>>  	.smi_allowed = vmx_smi_allowed,
>> +	.pre_smi_save_state = vmx_pre_smi_save_state,
>>  	.post_smi_save_state = vmx_post_smi_save_state,
>>  	.pre_rsm_load_state = vmx_pre_rsm_load_state,
>> +	.post_rsm_load_state = vmx_post_rsm_load_state,
>>  	.enable_smi_window = enable_smi_window,
>>  
>>  	.check_nested_events = NULL,
>> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
>> index a1e00d0a2482..a3444625ca7f 100644
>> --- a/arch/x86/kvm/vmx/vmx.h
>> +++ b/arch/x86/kvm/vmx/vmx.h
>> @@ -162,6 +162,8 @@ struct nested_vmx {
>>  
>>  	/* SMM related state */
>>  	struct {
>> +		/* CR4.VMXE=1 on SMM entry? */
>> +		bool cr4_vmxe;
>>  		/* in VMX operation on SMM entry? */
>>  		bool vmxon;
>>  		/* in guest mode on SMM entry? */
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index abfac99fb7c5..1b07706df840 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -5966,6 +5966,11 @@ static int emulator_pre_rsm_load_state(struct x86_emulate_ctxt *ctxt, u64 smbase
>>  	return kvm_x86_ops->pre_rsm_load_state(emul_to_vcpu(ctxt), smbase);
>>  }
>>  
>> +static void emulator_post_rsm_load_state(struct x86_emulate_ctxt *ctxt)
>> +{
>> +	kvm_x86_ops->post_rsm_load_state(emul_to_vcpu(ctxt));
>> +}
>> +
>>  static const struct x86_emulate_ops emulate_ops = {
>>  	.read_gpr            = emulator_read_gpr,
>>  	.write_gpr           = emulator_write_gpr,
>> @@ -6006,6 +6011,7 @@ static const struct x86_emulate_ops emulate_ops = {
>>  	.get_hflags          = emulator_get_hflags,
>>  	.set_hflags          = emulator_set_hflags,
>>  	.pre_rsm_load_state  = emulator_pre_rsm_load_state,
>> +	.post_rsm_load_state = emulator_post_rsm_load_state,
>>  };
>>  
>>  static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
>> @@ -7516,6 +7522,9 @@ static void enter_smm(struct kvm_vcpu *vcpu)
>>  	u32 cr0;
>>  
>>  	trace_kvm_enter_smm(vcpu->vcpu_id, vcpu->arch.smbase, true);
>> +
>> +	kvm_x86_ops->pre_smi_save_state(vcpu);
>> +
>>  	memset(buf, 0, 512);
>>  	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
>>  		enter_smm_save_state_64(vcpu, buf);
>
Sean Christopherson March 28, 2019, 5:41 p.m. UTC | #12
On Thu, Mar 28, 2019 at 10:42:31AM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:

...

> > +static void vmx_pre_smi_save_state(struct kvm_vcpu *vcpu)
> > +{
> > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +
> > +	if (kvm_read_cr4(vcpu) & X86_CR4_VMXE) {
> > +		vmx->nested.smm.cr4_vmxe = true;
> > +		WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) & ~X86_CR4_VMXE));
> 
> This WARN_ON fires: vmx_set_cr4() has the following check:
> 
> if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
>         return 1;
> 
> X86_CR4_VMXE can't be unset while nested.vmxon is on....

Blech.  Moving the handling of nested.vmxon here is somewhat doable,
but holy cow does it become a jumbled mess.  Rather than trying to
constantly juggle HF_SMM_MASK, I have a more ambitious idea:

Clear HF_SMM_MASK before loading state.

RFC incoming...
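
A rough sketch of that idea, assuming the SMM hflags can simply be
cleared in em_rsm() before any state is loaded (this is not the actual
RFC; NMI-mask handling and error checking are elided):

	/* em_rsm(): drop SMM flags first so CR4 loads aren't rejected by in-SMM checks */
	ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
			      ~(X86EMUL_SMM_INSIDE_NMI_MASK | X86EMUL_SMM_MASK));

	if (ctxt->ops->pre_rsm_load_state(ctxt, smbase))
		return X86EMUL_UNHANDLEABLE;

	if (emulator_has_longmode(ctxt))
		ret = rsm_load_state_64(ctxt, smbase);
	else
		ret = rsm_load_state_32(ctxt, smbase);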

> > +	}
> > +}
> > +
> >  static int vmx_post_smi_save_state(struct kvm_vcpu *vcpu, char *smstate)
> >  {
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > @@ -7390,6 +7400,16 @@ static int vmx_pre_rsm_load_state(struct kvm_vcpu *vcpu, u64 smbase)
> >  	return 0;
> >  }
> >  
> > +static void vmx_post_rsm_load_state(struct kvm_vcpu *vcpu)
> > +{
> > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +
> > +	if (vmx->nested.smm.cr4_vmxe) {
> > +		WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) | X86_CR4_VMXE));
> > +		vmx->nested.smm.cr4_vmxe = false;
> 
> If we manage to pass the previous problem this will likely fail:
> 
> post_rsm_load_state() is called with HF_SMM_MASK still set.

Patch

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index b2a65d08d1f8..02fa7ec89315 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -227,6 +227,7 @@  struct x86_emulate_ops {
 	unsigned (*get_hflags)(struct x86_emulate_ctxt *ctxt);
 	void (*set_hflags)(struct x86_emulate_ctxt *ctxt, unsigned hflags);
 	int (*pre_rsm_load_state)(struct x86_emulate_ctxt *ctxt, u64 smbase);
+	void (*post_rsm_load_state)(struct x86_emulate_ctxt *ctxt);
 
 };
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 29ce45f41ee5..0ceb8b353dc0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1181,8 +1181,10 @@  struct kvm_x86_ops {
 	void (*get_vmcs12_pages)(struct kvm_vcpu *vcpu);
 
 	int (*smi_allowed)(struct kvm_vcpu *vcpu);
+	void (*pre_smi_save_state)(struct kvm_vcpu *vcpu);
 	int (*post_smi_save_state)(struct kvm_vcpu *vcpu, char *smstate);
 	int (*pre_rsm_load_state)(struct kvm_vcpu *vcpu, u64 smbase);
+	void (*post_rsm_load_state)(struct kvm_vcpu *vcpu);
 	int (*enable_smi_window)(struct kvm_vcpu *vcpu);
 
 	int (*mem_enc_op)(struct kvm *kvm, void __user *argp);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ca60eb3358d9..706accf524c3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2625,6 +2625,8 @@  static int em_rsm(struct x86_emulate_ctxt *ctxt)
 		return X86EMUL_UNHANDLEABLE;
 	}
 
+	ctxt->ops->post_rsm_load_state(ctxt);
+
 	if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
 		ctxt->ops->set_nmi_mask(ctxt, false);
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c0e511c3b298..edc12fb39b32 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -6193,6 +6193,11 @@  static int svm_smi_allowed(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static void svm_pre_smi_save_state(struct kvm_vcpu *vcpu)
+{
+
+}
+
 static int svm_post_smi_save_state(struct kvm_vcpu *vcpu, char *smstate)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -6243,6 +6248,11 @@  static int svm_pre_rsm_load_state(struct kvm_vcpu *vcpu, u64 smbase)
 	return ret;
 }
 
+static void svm_post_rsm_load_state(struct kvm_vcpu *vcpu)
+{
+
+}
+
 static int enable_smi_window(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -7251,8 +7261,10 @@  static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.setup_mce = svm_setup_mce,
 
 	.smi_allowed = svm_smi_allowed,
+	.pre_smi_save_state = svm_pre_smi_save_state,
 	.post_smi_save_state = svm_post_smi_save_state,
 	.pre_rsm_load_state = svm_pre_rsm_load_state,
+	.post_rsm_load_state = svm_post_rsm_load_state,
 	.enable_smi_window = enable_smi_window,
 
 	.mem_enc_op = svm_mem_enc_op,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fc96101e39ac..617443c1c6d7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7354,6 +7354,16 @@  static int vmx_smi_allowed(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static void vmx_pre_smi_save_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (kvm_read_cr4(vcpu) & X86_CR4_VMXE) {
+		vmx->nested.smm.cr4_vmxe = true;
+		WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) & ~X86_CR4_VMXE));
+	}
+}
+
 static int vmx_post_smi_save_state(struct kvm_vcpu *vcpu, char *smstate)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -7390,6 +7400,16 @@  static int vmx_pre_rsm_load_state(struct kvm_vcpu *vcpu, u64 smbase)
 	return 0;
 }
 
+static void vmx_post_rsm_load_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (vmx->nested.smm.cr4_vmxe) {
+		WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) | X86_CR4_VMXE));
+		vmx->nested.smm.cr4_vmxe = false;
+	}
+}
+
 static int enable_smi_window(struct kvm_vcpu *vcpu)
 {
 	return 0;
@@ -7693,8 +7713,10 @@  static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.setup_mce = vmx_setup_mce,
 
 	.smi_allowed = vmx_smi_allowed,
+	.pre_smi_save_state = vmx_pre_smi_save_state,
 	.post_smi_save_state = vmx_post_smi_save_state,
 	.pre_rsm_load_state = vmx_pre_rsm_load_state,
+	.post_rsm_load_state = vmx_post_rsm_load_state,
 	.enable_smi_window = enable_smi_window,
 
 	.check_nested_events = NULL,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a1e00d0a2482..a3444625ca7f 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -162,6 +162,8 @@  struct nested_vmx {
 
 	/* SMM related state */
 	struct {
+		/* CR4.VMXE=1 on SMM entry? */
+		bool cr4_vmxe;
 		/* in VMX operation on SMM entry? */
 		bool vmxon;
 		/* in guest mode on SMM entry? */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index abfac99fb7c5..1b07706df840 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5966,6 +5966,11 @@  static int emulator_pre_rsm_load_state(struct x86_emulate_ctxt *ctxt, u64 smbase
 	return kvm_x86_ops->pre_rsm_load_state(emul_to_vcpu(ctxt), smbase);
 }
 
+static void emulator_post_rsm_load_state(struct x86_emulate_ctxt *ctxt)
+{
+	kvm_x86_ops->post_rsm_load_state(emul_to_vcpu(ctxt));
+}
+
 static const struct x86_emulate_ops emulate_ops = {
 	.read_gpr            = emulator_read_gpr,
 	.write_gpr           = emulator_write_gpr,
@@ -6006,6 +6011,7 @@  static const struct x86_emulate_ops emulate_ops = {
 	.get_hflags          = emulator_get_hflags,
 	.set_hflags          = emulator_set_hflags,
 	.pre_rsm_load_state  = emulator_pre_rsm_load_state,
+	.post_rsm_load_state = emulator_post_rsm_load_state,
 };
 
 static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
@@ -7516,6 +7522,9 @@  static void enter_smm(struct kvm_vcpu *vcpu)
 	u32 cr0;
 
 	trace_kvm_enter_smm(vcpu->vcpu_id, vcpu->arch.smbase, true);
+
+	kvm_x86_ops->pre_smi_save_state(vcpu);
+
 	memset(buf, 0, 512);
 	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
 		enter_smm_save_state_64(vcpu, buf);