Message ID: 20190327192946.19128-3-sean.j.christopherson@intel.com (mailing list archive)
State: New, archived
Series: [1/2] KVM: x86: Rename pre_{enter,leave}_smm() ops to reference SMM state save
On Wed, Mar 27, 2019 at 12:31 PM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > Intel CPUs manually save and load CR4.VMXE internally instead of having > it automatically switched via the SMM State Save. Specifically, the > SDM uses the following pseudocode to describe SMI and RSM: > > SMI: > enter SMM; > save the following internal to the processor: > CR4.VMXE > an indication of whether the logical processor was in VMX operation (root or non-root) > IF the logical processor is in VMX operation THEN > save current VMCS pointer internal to the processor; > leave VMX operation; > save VMX-critical state defined below > FI; > CR4.VMXE <- 0; > perform ordinary SMI delivery: > save processor state in SMRAM; > set processor state to standard SMM values; > > RSM: > IF VMXE = 1 in CR4 image in SMRAM THEN > fail and enter shutdown state; > ELSE > restore state normally from SMRAM; > <unrelated stuff omitted for brevity> > FI; > CR4.VMXE <- value stored internally; > <more stuff omitted for brevity> > leave SMM; > > Presumably, the manual handling of CR4.VMXE is done to avoid having a > to special case CR4.VMXE in the legacy state save/load flows. As luck > would have it, KVM has a similar conundrum in its SMM state save/load > flows in that vmx_set_cr4() forbids CR4.VMXE from being set while in > SMM. The check in vmx_set_cr4() can cause RSM to fail when loading > CR4 from the SMM save state area. > > Take a page out of the SDM and manually handle CR4.VMXE during SMI and > RSM. Alternatively, HF_SMM_MASK could be temporarily disabled when > setting CR4 as part of RSM, but doing so is rather ugly due to the need > to plumb the "from_rsm" information through to emulator_set_cr(), and > also opens up a virtualization hole, e.g. 
KVM would not properly handle > the (extremely unlikely) scenario where the guest toggles CR4.VMXE in > the SMM save state area from 0->1 > > Reported-by: Jon Doron <arilou@gmail.com> > Suggested-by: Jim Mattson <jmattson@google.com> > Cc: Liran Alon <liran.alon@oracle.com> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > Fixes: 5bea5123cbf0 ("KVM: VMX: check nested state and CR4.VMXE against SMM") > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Thanks! This looks like the right way to do it, but I'm probably biased. > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h > index a1e00d0a2482..a3444625ca7f 100644 > --- a/arch/x86/kvm/vmx/vmx.h > +++ b/arch/x86/kvm/vmx/vmx.h > @@ -162,6 +162,8 @@ struct nested_vmx { > > /* SMM related state */ > struct { > + /* CR4.VMXE=1 on SMM entry? */ > + bool cr4_vmxe; > /* in VMX operation on SMM entry? */ > bool vmxon; > /* in guest mode on SMM entry? */ Where does this state get saved/restored when suspending/resuming a VCPU that's in SMM?
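As an aside, the SDM pseudocode quoted above can be reduced to a toy state machine. The sketch below is illustrative only (all names are hypothetical, this is not KVM code): it shows why RSM enters shutdown when the CR4 image in SMRAM has VMXE=1, and why the live CR4.VMXE comes back from the internal stash rather than from SMRAM.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the SDM's SMI/RSM pseudocode -- not KVM code. */
struct lp {
	bool cr4_vmxe;        /* live CR4.VMXE */
	bool internal_vmxe;   /* value saved internal to the processor */
	bool smram_cr4_vmxe;  /* CR4 image in the SMM save state area */
	bool shutdown;
};

static void smi(struct lp *lp)
{
	lp->internal_vmxe = lp->cr4_vmxe;  /* save internally ... */
	lp->cr4_vmxe = false;              /* ... then CR4.VMXE <- 0 */
	lp->smram_cr4_vmxe = lp->cr4_vmxe; /* ordinary SMI delivery: image is always 0 */
}

static void rsm(struct lp *lp)
{
	if (lp->smram_cr4_vmxe) {          /* VMXE=1 in the CR4 image in SMRAM */
		lp->shutdown = true;       /* fail and enter shutdown state */
		return;
	}
	lp->cr4_vmxe = lp->internal_vmxe;  /* restore the internally stored value */
}
```

A guest that rewrites the CR4 image in SMRAM from 0 to 1 before RSM trips the shutdown path, which is the (extremely unlikely) 0->1 toggle scenario the commit message calls out.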
On Wed, Mar 27, 2019 at 12:50:27PM -0700, Jim Mattson wrote: > On Wed, Mar 27, 2019 at 12:31 PM Sean Christopherson > <sean.j.christopherson@intel.com> wrote: > > > > Intel CPUs manually save and load CR4.VMXE internally instead of having > > it automatically switched via the SMM State Save. Specifically, the > > SDM uses the following pseudocode to describe SMI and RSM: > > > > SMI: > > enter SMM; > > save the following internal to the processor: > > CR4.VMXE > > an indication of whether the logical processor was in VMX operation (root or non-root) > > IF the logical processor is in VMX operation THEN > > save current VMCS pointer internal to the processor; > > leave VMX operation; > > save VMX-critical state defined below > > FI; > > CR4.VMXE <- 0; > > perform ordinary SMI delivery: > > save processor state in SMRAM; > > set processor state to standard SMM values; > > > > RSM: > > IF VMXE = 1 in CR4 image in SMRAM THEN > > fail and enter shutdown state; > > ELSE > > restore state normally from SMRAM; > > <unrelated stuff omitted for brevity> > > FI; > > CR4.VMXE <- value stored internally; > > <more stuff omitted for brevity> > > leave SMM; > > > > Presumably, the manual handling of CR4.VMXE is done to avoid having a > > to special case CR4.VMXE in the legacy state save/load flows. As luck > > would have it, KVM has a similar conundrum in its SMM state save/load > > flows in that vmx_set_cr4() forbids CR4.VMXE from being set while in > > SMM. The check in vmx_set_cr4() can cause RSM to fail when loading > > CR4 from the SMM save state area. > > > > Take a page out of the SDM and manually handle CR4.VMXE during SMI and > > RSM. Alternatively, HF_SMM_MASK could be temporarily disabled when > > setting CR4 as part of RSM, but doing so is rather ugly due to the need > > to plumb the "from_rsm" information through to emulator_set_cr(), and > > also opens up a virtualization hole, e.g. 
KVM would not properly handle > > the (extremely unlikely) scenario where the guest toggles CR4.VMXE in > > the SMM save state area from 0->1 > > > > Reported-by: Jon Doron <arilou@gmail.com> > > Suggested-by: Jim Mattson <jmattson@google.com> > > Cc: Liran Alon <liran.alon@oracle.com> > > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > > Fixes: 5bea5123cbf0 ("KVM: VMX: check nested state and CR4.VMXE against SMM") > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > > Thanks! This looks like the right way to do it, but I'm probably biased. > > > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h > > index a1e00d0a2482..a3444625ca7f 100644 > > --- a/arch/x86/kvm/vmx/vmx.h > > +++ b/arch/x86/kvm/vmx/vmx.h > > @@ -162,6 +162,8 @@ struct nested_vmx { > > > > /* SMM related state */ > > struct { > > + /* CR4.VMXE=1 on SMM entry? */ > > + bool cr4_vmxe; > > /* in VMX operation on SMM entry? */ > > bool vmxon; > > /* in guest mode on SMM entry? */ > > Where does this state get saved/restored when suspending/resuming a > VCPU that's in SMM? cr4_vmxe is handled by vmx_pre_smi_save_state() and vmx_post_rsm_load_state(). The existing vmxon and guest_mode are handled by post_smi_save_state() and pre_rsm_load_state(), which were previously pre_{enter,leave}_smm(). Ideally everything in the struct would be managed together, e.g. in vmx_pre_smi_save_state() and vmx_post_rsm_load_state(), but I didn't want to touch that house of cards at this time. And even more ideally, post_smi_save_state() and pre_rsm_load_state() would be dropped altogether, e.g. the SVM code would also be reworked to use pre_smi_save_state() and post_rsm_load_state(), but my SVM knowledge is nowhere near sufficient to judge if that's even remotely feasible.
On Wed, Mar 27, 2019 at 1:20 PM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Wed, Mar 27, 2019 at 12:50:27PM -0700, Jim Mattson wrote: > > Where does this state get saved/restored when suspending/resuming a > > VCPU that's in SMM? > > cr4_vmxe is handled by vmx_pre_smi_save_state() and > vmx_post_rsm_load_state(). > > The existing vmxon and guest_mode are handled by post_smi_save_state() > and pre_rsm_load_state(), which were previously pre_{enter,leave}_smm(). > > Ideally everything in the struct would be managed together, e.g. in > vmx_pre_smi_save_state() and vmx_post_rsm_load_state(), but I didn't > want to touch that house of cards at this time. > > And even more ideally, post_smi_save_state() and pre_rsm_load_state() > would be dropped altogether, e.g. the SVM code would also be reworked to > use pre_smi_save_state() and post_rsm_load_state(), but my SVM knowledge > is nowhere near sufficient to judge if that's even remotely feasible. No; I meant which of the myriad KVM_GET_* and KVM_SET_* ioctls provide a mechanism to save and restore these tidbits of vCPU state (e.g. for live migration)?
On Wed, Mar 27, 2019 at 01:28:09PM -0700, Jim Mattson wrote: > On Wed, Mar 27, 2019 at 1:20 PM Sean Christopherson > <sean.j.christopherson@intel.com> wrote: > > > > On Wed, Mar 27, 2019 at 12:50:27PM -0700, Jim Mattson wrote: > > > Where does this state get saved/restored when suspending/resuming a > > > VCPU that's in SMM? > > > > cr4_vmxe is handled by vmx_pre_smi_save_state() and > > vmx_post_rsm_load_state(). > > > > The existing vmxon and guest_mode are handled by post_smi_save_state() > > and pre_rsm_load_state(), which were previously pre_{enter,leave}_smm(). > > > > Ideally everything in the struct would be managed together, e.g. in > > vmx_pre_smi_save_state() and vmx_post_rsm_load_state(), but I didn't > > want to touch that house of cards at this time. > > > > And even more ideally, post_smi_save_state() and pre_rsm_load_state() > > would be dropped altogether, e.g. the SVM code would also be reworked to > > use pre_smi_save_state() and post_rsm_load_state(), but my SVM knowledge > > is nowhere near sufficient to judge if that's even remotely feasible. > > No; I meant which of the myriad KVM_GET_* and KVM_SET_* ioctls provide > a mechanism to save and restore these tidbits of vCPU state (e.g. for > live migration)? Ah fudge. KVM_{GET,SET}_NESTED_STATE, i.e. vmx_{get,set}_nested_state(). This patch would also need to add and handle KVM_STATE_NESTED_SMM_CR4_VMXE.
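For reference, the change being discussed boils down to widening a subset check. The sketch below is hypothetical (the first two flag values are from the uapi header quoted later in the thread; `smm_flags_valid()` is a stand-in for the validation in vmx_set_nested_state(), not the real function):

```c
#include <assert.h>
#include <stdint.h>

#define KVM_STATE_NESTED_SMM_GUEST_MODE	0x00000001
#define KVM_STATE_NESTED_SMM_VMXON	0x00000002
#define KVM_STATE_NESTED_SMM_CR4_VMXE	0x00000004 /* proposed new flag */

/* Hypothetical stand-in for the smm.flags check in vmx_set_nested_state():
 * reject any bit outside the known set. */
static int smm_flags_valid(uint32_t flags)
{
	return !(flags & ~(KVM_STATE_NESTED_SMM_GUEST_MODE |
			   KVM_STATE_NESTED_SMM_VMXON |
			   KVM_STATE_NESTED_SMM_CR4_VMXE));
}
```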
On Wed, Mar 27, 2019 at 1:34 PM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Wed, Mar 27, 2019 at 01:28:09PM -0700, Jim Mattson wrote: > > On Wed, Mar 27, 2019 at 1:20 PM Sean Christopherson > > <sean.j.christopherson@intel.com> wrote: > > > > > > On Wed, Mar 27, 2019 at 12:50:27PM -0700, Jim Mattson wrote: > > > > Where does this state get saved/restored when suspending/resuming a > > > > VCPU that's in SMM? > > > > > > cr4_vmxe is handled by vmx_pre_smi_save_state() and > > > vmx_post_rsm_load_state(). > > > > > > The existing vmxon and guest_mode are handled by post_smi_save_state() > > > and pre_rsm_load_state(), which were previously pre_{enter,leave}_smm(). > > > > > > Ideally everything in the struct would be managed together, e.g. in > > > vmx_pre_smi_save_state() and vmx_post_rsm_load_state(), but I didn't > > > want to touch that house of cards at this time. > > > > > > And even more ideally, post_smi_save_state() and pre_rsm_load_state() > > > would be dropped altogether, e.g. the SVM code would also be reworked to > > > use pre_smi_save_state() and post_rsm_load_state(), but my SVM knowledge > > > is nowhere near sufficient to judge if that's even remotely feasible. > > > > No; I meant which of the myriad KVM_GET_* and KVM_SET_* ioctls provide > > a mechanism to save and restore these tidbits of vCPU state (e.g. for > > live migration)? > > Ah fudge. KVM_{GET,SET}_NESTED_STATE, i.e. vmx_{get,set}_nested_state(). > This patch would also need to add and handle KVM_STATE_NESTED_SMM_CR4_VMXE. Sorry. I haven't been following the evolution of those ioctls since I first proposed them and Paolo shut me down. I see the handling of "nested SMM state" now, and I agree that this would be the place to handle the internal stash of CR4.VMXE. However, vmx_set_nested_state currently allows only two bits to be set in vmx.smm.flags: KVM_STATE_NESTED_SMM_GUEST_MODE and KVM_STATE_NESTED_SMM_VMXON. 
Does userspace have to opt in to KVM_STATE_NESTED_SMM_CR4_VMXE through a new KVM_CAP?
On Wed, Mar 27, 2019 at 01:49:22PM -0700, Jim Mattson wrote: > On Wed, Mar 27, 2019 at 1:34 PM Sean Christopherson > <sean.j.christopherson@intel.com> wrote: > > > > On Wed, Mar 27, 2019 at 01:28:09PM -0700, Jim Mattson wrote: > > > On Wed, Mar 27, 2019 at 1:20 PM Sean Christopherson > > > <sean.j.christopherson@intel.com> wrote: > > > > > > > > On Wed, Mar 27, 2019 at 12:50:27PM -0700, Jim Mattson wrote: > > > > > Where does this state get saved/restored when suspending/resuming a > > > > > VCPU that's in SMM? > > > > > > > > cr4_vmxe is handled by vmx_pre_smi_save_state() and > > > > vmx_post_rsm_load_state(). > > > > > > > > The existing vmxon and guest_mode are handled by post_smi_save_state() > > > > and pre_rsm_load_state(), which were previously pre_{enter,leave}_smm(). > > > > > > > > Ideally everything in the struct would be managed together, e.g. in > > > > vmx_pre_smi_save_state() and vmx_post_rsm_load_state(), but I didn't > > > > want to touch that house of cards at this time. > > > > > > > > And even more ideally, post_smi_save_state() and pre_rsm_load_state() > > > > would be dropped altogether, e.g. the SVM code would also be reworked to > > > > use pre_smi_save_state() and post_rsm_load_state(), but my SVM knowledge > > > > is nowhere near sufficient to judge if that's even remotely feasible. > > > > > > No; I meant which of the myriad KVM_GET_* and KVM_SET_* ioctls provide > > > a mechanism to save and restore these tidbits of vCPU state (e.g. for > > > live migration)? > > > > Ah fudge. KVM_{GET,SET}_NESTED_STATE, i.e. vmx_{get,set}_nested_state(). > > This patch would also need to add and handle KVM_STATE_NESTED_SMM_CR4_VMXE. > > Sorry. I haven't been following the evolution of those ioctls since I > first proposed them and Paolo shut me down. I see the handling of > "nested SMM state" now, and I agree that this would be the place to > handle the internal stash of CR4.VMXE. 
However, vmx_set_nested_state > currently allows only two bits to be set in vmx.smm.flags: > KVM_STATE_NESTED_SMM_GUEST_MODE and KVM_STATE_NESTED_SMM_VMXON. Does > userspace have to opt-in to KVM_STATE_NESTED_SMM_CR4_VMXE through a > new KVM_CAP? Hmm. I think we could squeak by without it on a technicality. Nested VMX was only enabled by default after the bug was introduced. Any migration from an older KVM that "officially" supports nested VMX will crash the guest after migration regardless of the target KVM, since KVM will signal a #GP on RSM due to CR4.VMXE=1 in the SMM save state area. I.e. it wouldn't break anything that wasn't already broken. Toggling HF_SMM_MASK wouldn't suffer the same fate, i.e. new KVMs would happily accept migration from broken KVMs. That being said, the only way that can happen is if migration is triggered during the first SMI with CR4.VMXE=1, since the guest will have already crashed due to RSM taking a #GP in the broken pre-migration KVM. Given this is the first bug report for the RSM w/ CR4.VMXE=1, I'm going to go out on a limb and say that the migration corner case is extremely unlikely, to put it mildly. So I think we're good? Maybe add an errata in the docs to note the extremely unlikely corner case?
On Wed, Mar 27, 2019 at 3:05 PM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > Hmm. I think we could squeak by without it on a technicality. Nested > VMX was only enabled by default after the bug was introduced. Any > migration from an older KVM that "officially" supports nested VMX will > crash the guest after migration regardless of the target KVM, since KVM > will signal a #GP on RSM due to CR4.VMXE=1 in the SMM save state area. > I.e. it wouldn't break anything that wasn't already broken. > > Toggling HF_SMM_MASK wouldn't suffer the same fate, i.e. new KVMs would > happily accept migration from broken KVMs. That being said, the only > way that can happen is if migration is triggered during the first SMI > with CR4.VMXE=1, since the guest will have already crashed due to RSM > taking a #GP in the broken pre-migration KVM. Given this is the first > bug report for the RSM w/ CR4.VMXE=1, I'm going to go out on a limb and > say that the migration corner case is extremely unlikely, to put it mildly. > > So I think we're good? Maybe add an errata in the docs to note the > extremely unlikely corner case? Yeah, probably. Who's using virtual SMM, anyway? :-) I don't see any tests of virtual SMM in either kvm-unit-tests or tools/testing/selftests/kvm, but maybe I'm not looking closely enough.
Sean Christopherson <sean.j.christopherson@intel.com> writes: > Intel CPUs manually save and load CR4.VMXE internally instead of having > it automatically switched via the SMM State Save. Specifically, the > SDM uses the following pseudocode to describe SMI and RSM: > > SMI: > enter SMM; > save the following internal to the processor: > CR4.VMXE > an indication of whether the logical processor was in VMX operation (root or non-root) > IF the logical processor is in VMX operation THEN > save current VMCS pointer internal to the processor; > leave VMX operation; > save VMX-critical state defined below > FI; > CR4.VMXE <- 0; > perform ordinary SMI delivery: > save processor state in SMRAM; > set processor state to standard SMM values; > > RSM: > IF VMXE = 1 in CR4 image in SMRAM THEN > fail and enter shutdown state; > ELSE > restore state normally from SMRAM; > <unrelated stuff omitted for brevity> > FI; > CR4.VMXE <- value stored internally; > <more stuff omitted for brevity> > leave SMM; > > Presumably, the manual handling of CR4.VMXE is done to avoid having a > to special case CR4.VMXE in the legacy state save/load flows. As luck > would have it, KVM has a similar conundrum in its SMM state save/load > flows in that vmx_set_cr4() forbids CR4.VMXE from being set while in > SMM. The check in vmx_set_cr4() can cause RSM to fail when loading > CR4 from the SMM save state area. > > Take a page out of the SDM and manually handle CR4.VMXE during SMI and > RSM. Alternatively, HF_SMM_MASK could be temporarily disabled when > setting CR4 as part of RSM, but doing so is rather ugly due to the need > to plumb the "from_rsm" information through to emulator_set_cr(), and > also opens up a virtualization hole, e.g. 
KVM would not properly handle > the (extremely unlikely) scenario where the guest toggles CR4.VMXE in > the SMM save state area from 0->1 > > Reported-by: Jon Doron <arilou@gmail.com> > Suggested-by: Jim Mattson <jmattson@google.com> > Cc: Liran Alon <liran.alon@oracle.com> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > Fixes: 5bea5123cbf0 ("KVM: VMX: check nested state and CR4.VMXE against SMM") > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > --- > arch/x86/include/asm/kvm_emulate.h | 1 + > arch/x86/include/asm/kvm_host.h | 2 ++ > arch/x86/kvm/emulate.c | 2 ++ > arch/x86/kvm/svm.c | 12 ++++++++++++ > arch/x86/kvm/vmx/vmx.c | 22 ++++++++++++++++++++++ > arch/x86/kvm/vmx/vmx.h | 2 ++ > arch/x86/kvm/x86.c | 9 +++++++++ > 7 files changed, 50 insertions(+) > > diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h > index b2a65d08d1f8..02fa7ec89315 100644 > --- a/arch/x86/include/asm/kvm_emulate.h > +++ b/arch/x86/include/asm/kvm_emulate.h > @@ -227,6 +227,7 @@ struct x86_emulate_ops { > unsigned (*get_hflags)(struct x86_emulate_ctxt *ctxt); > void (*set_hflags)(struct x86_emulate_ctxt *ctxt, unsigned hflags); > int (*pre_rsm_load_state)(struct x86_emulate_ctxt *ctxt, u64 smbase); > + void (*post_rsm_load_state)(struct x86_emulate_ctxt *ctxt); > > }; > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 29ce45f41ee5..0ceb8b353dc0 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -1181,8 +1181,10 @@ struct kvm_x86_ops { > void (*get_vmcs12_pages)(struct kvm_vcpu *vcpu); > > int (*smi_allowed)(struct kvm_vcpu *vcpu); > + void (*pre_smi_save_state)(struct kvm_vcpu *vcpu); > int (*post_smi_save_state)(struct kvm_vcpu *vcpu, char *smstate); > int (*pre_rsm_load_state)(struct kvm_vcpu *vcpu, u64 smbase); > + void (*post_rsm_load_state)(struct kvm_vcpu *vcpu); > int (*enable_smi_window)(struct kvm_vcpu *vcpu); > > int (*mem_enc_op)(struct kvm 
*kvm, void __user *argp); > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c > index ca60eb3358d9..706accf524c3 100644 > --- a/arch/x86/kvm/emulate.c > +++ b/arch/x86/kvm/emulate.c > @@ -2625,6 +2625,8 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt) > return X86EMUL_UNHANDLEABLE; > } > > + ctxt->ops->post_rsm_load_state(ctxt); > + > if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_INSIDE_NMI_MASK) == 0) > ctxt->ops->set_nmi_mask(ctxt, false); > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index c0e511c3b298..edc12fb39b32 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -6193,6 +6193,11 @@ static int svm_smi_allowed(struct kvm_vcpu *vcpu) > return 1; > } > > +static void svm_pre_smi_save_state(struct kvm_vcpu *vcpu) > +{ > + > +} > + > static int svm_post_smi_save_state(struct kvm_vcpu *vcpu, char *smstate) > { > struct vcpu_svm *svm = to_svm(vcpu); > @@ -6243,6 +6248,11 @@ static int svm_pre_rsm_load_state(struct kvm_vcpu *vcpu, u64 smbase) > return ret; > } > > +static void svm_post_rsm_load_state(struct kvm_vcpu *vcpu) > +{ > + > +} > + > static int enable_smi_window(struct kvm_vcpu *vcpu) > { > struct vcpu_svm *svm = to_svm(vcpu); > @@ -7251,8 +7261,10 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = { > .setup_mce = svm_setup_mce, > > .smi_allowed = svm_smi_allowed, > + .pre_smi_save_state = svm_pre_smi_save_state, > .post_smi_save_state = svm_post_smi_save_state, > .pre_rsm_load_state = svm_pre_rsm_load_state, > + .post_rsm_load_state = svm_post_rsm_load_state, > .enable_smi_window = enable_smi_window, > > .mem_enc_op = svm_mem_enc_op, > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index fc96101e39ac..617443c1c6d7 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -7354,6 +7354,16 @@ static int vmx_smi_allowed(struct kvm_vcpu *vcpu) > return 1; > } > > +static void vmx_pre_smi_save_state(struct kvm_vcpu *vcpu) > +{ > + struct vcpu_vmx *vmx = to_vmx(vcpu); > + > + if 
(kvm_read_cr4(vcpu) & X86_CR4_VMXE) { > + vmx->nested.smm.cr4_vmxe = true; > + WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) & ~X86_CR4_VMXE)); This WARN_ON fires: vmx_set_cr4() has the following check: if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4)) return 1; X86_CR4_VMXE can't be unset while nested.vmxon is on.... > + } > +} > + > static int vmx_post_smi_save_state(struct kvm_vcpu *vcpu, char *smstate) > { > struct vcpu_vmx *vmx = to_vmx(vcpu); > @@ -7390,6 +7400,16 @@ static int vmx_pre_rsm_load_state(struct kvm_vcpu *vcpu, u64 smbase) > return 0; > } > > +static void vmx_post_rsm_load_state(struct kvm_vcpu *vcpu) > +{ > + struct vcpu_vmx *vmx = to_vmx(vcpu); > + > + if (vmx->nested.smm.cr4_vmxe) { > + WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) | X86_CR4_VMXE)); > + vmx->nested.smm.cr4_vmxe = false; If we manage to pass the previous problem this will likely fail: post_rsm_load_state() is called with HF_SMM_MASK still set. > + } > +} > + > static int enable_smi_window(struct kvm_vcpu *vcpu) > { > return 0; > @@ -7693,8 +7713,10 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = { > .setup_mce = vmx_setup_mce, > > .smi_allowed = vmx_smi_allowed, > + .pre_smi_save_state = vmx_pre_smi_save_state, > .post_smi_save_state = vmx_post_smi_save_state, > .pre_rsm_load_state = vmx_pre_rsm_load_state, > + .post_rsm_load_state = vmx_post_rsm_load_state, > .enable_smi_window = enable_smi_window, > > .check_nested_events = NULL, > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h > index a1e00d0a2482..a3444625ca7f 100644 > --- a/arch/x86/kvm/vmx/vmx.h > +++ b/arch/x86/kvm/vmx/vmx.h > @@ -162,6 +162,8 @@ struct nested_vmx { > > /* SMM related state */ > struct { > + /* CR4.VMXE=1 on SMM entry? */ > + bool cr4_vmxe; > /* in VMX operation on SMM entry? */ > bool vmxon; > /* in guest mode on SMM entry? 
*/ > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index abfac99fb7c5..1b07706df840 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -5966,6 +5966,11 @@ static int emulator_pre_rsm_load_state(struct x86_emulate_ctxt *ctxt, u64 smbase > return kvm_x86_ops->pre_rsm_load_state(emul_to_vcpu(ctxt), smbase); > } > > +static void emulator_post_rsm_load_state(struct x86_emulate_ctxt *ctxt) > +{ > + kvm_x86_ops->post_rsm_load_state(emul_to_vcpu(ctxt)); > +} > + > static const struct x86_emulate_ops emulate_ops = { > .read_gpr = emulator_read_gpr, > .write_gpr = emulator_write_gpr, > @@ -6006,6 +6011,7 @@ static const struct x86_emulate_ops emulate_ops = { > .get_hflags = emulator_get_hflags, > .set_hflags = emulator_set_hflags, > .pre_rsm_load_state = emulator_pre_rsm_load_state, > + .post_rsm_load_state = emulator_post_rsm_load_state, > }; > > static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask) > @@ -7516,6 +7522,9 @@ static void enter_smm(struct kvm_vcpu *vcpu) > u32 cr0; > > trace_kvm_enter_smm(vcpu->vcpu_id, vcpu->arch.smbase, true); > + > + kvm_x86_ops->pre_smi_save_state(vcpu); > + > memset(buf, 0, 512); > if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) > enter_smm_save_state_64(vcpu, buf);
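The first failure Vitaly points out can be reduced to a small model. This is an illustrative sketch only (names are hypothetical, not the real vmx_set_cr4()): while nested VMXON is in force, any CR4 value with VMXE clear is rejected, so the WARN_ON in vmx_pre_smi_save_state() fires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the check Vitaly quotes from vmx_set_cr4(). */
struct model_vcpu {
	bool nested_vmxon;
	bool cr4_vmxe;
};

static int model_set_cr4(struct model_vcpu *v, bool vmxe)
{
	if (v->nested_vmxon && !vmxe) /* nested_cr4_valid() fails */
		return 1;             /* vmx_set_cr4() returns nonzero */
	v->cr4_vmxe = vmxe;
	return 0;
}
```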
On 27/03/19 23:14, Jim Mattson wrote: >> Hmm. I think we could squeak by without it on a technicality. Nested >> VMX was only enabled by default after the bug was introduced. Any >> migration from an older KVM that "officially" supports nested VMX will >> crash the guest after migration regardless of the target KVM, since KVM >> will signal a #GP on RSM due to CR4.VMXE=1 in the SMM save state area. >> I.e. it wouldn't break anything that wasn't already broken. >> >> Toggling HF_SMM_MASK wouldn't suffer the same fate, i.e. new KVMs would >> happily accept migration from broken KVMs. That being said, the only >> way that can happen is if migration is triggered during the first SMI >> with CR4.VMXE=1, since the guest will have already crashed due to RSM >> taking a #GP in the broken pre-migration KVM. Given this is the first >> bug report for the RSM w/ CR4.VMXE=1, I'm going to go out on a limb and >> say that the migration corner case is extremely unlikely, to put it mildly. >> >> So I think we're good? Maybe add an errata in the docs to note the >> extremely unlikely corner case? > Yeah, probably. Who's using virtual SMM, anyway? :-) Anyone that is using DOS with versions of SeaBIOS that run 32-bit drivers in SMM. :) But yeah, there is no way that this can ever have worked so we can fix the API. Paolo > I don't see any tests of virtual SMM in either kvm-unit-tests or > tools/testing/selftests/kvm, but maybe I'm not looking closely enough.
On 27/03/19 21:34, Sean Christopherson wrote: > Ah fudge. KVM_{GET,SET}_NESTED_STATE, i.e. vmx_{get,set}_nested_state(). > This patch would also need to add and handle KVM_STATE_NESTED_SMM_CR4_VMXE. Like this: diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index dabfcf7c3941..88bdff7b866c 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -388,6 +388,7 @@ struct kvm_sync_regs { #define KVM_STATE_NESTED_SMM_GUEST_MODE 0x00000001 #define KVM_STATE_NESTED_SMM_VMXON 0x00000002 +#define KVM_STATE_NESTED_SMM_CR4_VMXE 0x00000004 struct kvm_vmx_nested_state { __u64 vmxon_pa; diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 153e539c29c9..cbd5f2ddf7f7 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -5245,6 +5245,9 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu, if (nested_vmx_allowed(vcpu) && vmx->nested.enlightened_vmcs_enabled) kvm_state.flags |= KVM_STATE_NESTED_EVMCS; + if (vmx->nested.smm.cr4_vmxe) + kvm_state.vmx.smm.flags |= KVM_STATE_NESTED_SMM_CR4_VMXE; + if (nested_vmx_allowed(vcpu) && (vmx->nested.vmxon || vmx->nested.smm.vmxon)) { kvm_state.vmx.vmxon_pa = vmx->nested.vmxon_ptr; @@ -5365,7 +5368,8 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu, return -EINVAL; if (kvm_state->vmx.smm.flags & - ~(KVM_STATE_NESTED_SMM_GUEST_MODE | KVM_STATE_NESTED_SMM_VMXON)) + ~(KVM_STATE_NESTED_SMM_GUEST_MODE | KVM_STATE_NESTED_SMM_VMXON | + KVM_STATE_NESTED_SMM_CR4_VMXE)) return -EINVAL; /* @@ -5380,6 +5384,10 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu, !(kvm_state->vmx.smm.flags & KVM_STATE_NESTED_SMM_VMXON)) return -EINVAL; + if ((kvm_state->vmx.smm.flags & KVM_STATE_NESTED_SMM_VMXON) && + !(kvm_state->vmx.smm.flags & KVM_STATE_NESTED_SMM_CR4_VMXE)) + return -EINVAL; + vmx_leave_nested(vcpu); if (kvm_state->vmx.vmxon_pa == -1ull) return 0; @@ -5409,6 +5417,9 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu, return -EINVAL; 
} + vmx->nested.smm.cr4_vmxe = + !!(kvm_state->vmx.smm.flags & KVM_STATE_NESTED_SMM_CR4_VMXE); + if (kvm_state->vmx.smm.flags & KVM_STATE_NESTED_SMM_VMXON) { vmx->nested.smm.vmxon = true; vmx->nested.vmxon = false; I'll try to put together a testcase. Paolo
On 28/03/19 10:42, Vitaly Kuznetsov wrote: >> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >> index fc96101e39ac..617443c1c6d7 100644 >> --- a/arch/x86/kvm/vmx/vmx.c >> +++ b/arch/x86/kvm/vmx/vmx.c >> @@ -7354,6 +7354,16 @@ static int vmx_smi_allowed(struct kvm_vcpu *vcpu) >> return 1; >> } >> >> +static void vmx_pre_smi_save_state(struct kvm_vcpu *vcpu) >> +{ >> + struct vcpu_vmx *vmx = to_vmx(vcpu); >> + >> + if (kvm_read_cr4(vcpu) & X86_CR4_VMXE) { >> + vmx->nested.smm.cr4_vmxe = true; >> + WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) & ~X86_CR4_VMXE)); > > This WARN_ON fires: vmx_set_cr4() has the following check: > > if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4)) > return 1; > > X86_CR4_VMXE can't be unset while nested.vmxon is on.... Could be fixed like this: diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 47fbf91fe902..31ef4973e90a 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2874,15 +2874,29 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) vmcs_writel(GUEST_CR3, guest_cr3); } +static void __vmx_set_cr4_shadow(struct vcpu_vmx *vmx, unsigned long cr4, + unsigned long hw_cr4) +{ + WARN_ON((hw_cr4 ^ cr4) & ~vmx->vcpu.arch.cr4_guest_owned_bits); + vmcs_writel(CR4_READ_SHADOW, cr4); +} + +static void vmx_set_cr4_shadow(struct vcpu_vmx *vmx, unsigned long cr4) +{ + unsigned long hw_cr4 = vmcs_readl(GUEST_CR4); + __vmx_set_cr4_shadow(vmx, cr4, hw_cr4); +} + int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { + struct vcpu_vmx *vmx = to_vmx(vcpu); + unsigned long hw_cr4; + /* * Pass through host's Machine Check Enable value to hw_cr4, which * is in force while we are in guest mode. Do not let guests control * this bit, even if host CR4.MCE == 0. 
*/ - unsigned long hw_cr4; - hw_cr4 = (cr4_read_shadow() & X86_CR4_MCE) | (cr4 & ~X86_CR4_MCE); if (enable_unrestricted_guest) hw_cr4 |= KVM_VM_CR4_ALWAYS_ON_UNRESTRICTED_GUEST; @@ -2944,8 +2958,8 @@ int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE); } - vmcs_writel(CR4_READ_SHADOW, cr4); vmcs_writel(GUEST_CR4, hw_cr4); + __vmx_set_cr4_shadow(vmx, cr4, hw_cr4); return 0; } @@ -7361,7 +7375,13 @@ static void vmx_pre_smi_save_state(struct kvm_vcpu *vcpu) if (kvm_read_cr4(vcpu) & X86_CR4_VMXE) { vmx->nested.smm.cr4_vmxe = true; - WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) & ~X86_CR4_VMXE)); + /* + * We cannot move nested.vmxon to nested.smm.vmxon until after + * the state has been saved, on the other hand we need to + * clear CR4.VMXE *before* it is saved to the SMRAM state save + * area. Fudge it directly in the VMCS. + */ + vmx_set_cr4_shadow(vmx, kvm_read_cr4(vcpu) & ~X86_CR4_VMXE); } } > >> + } >> +} >> + >> static int vmx_post_smi_save_state(struct kvm_vcpu *vcpu, char *smstate) >> { >> struct vcpu_vmx *vmx = to_vmx(vcpu); >> @@ -7390,6 +7400,16 @@ static int vmx_pre_rsm_load_state(struct kvm_vcpu *vcpu, u64 smbase) >> return 0; >> } >> >> +static void vmx_post_rsm_load_state(struct kvm_vcpu *vcpu) >> +{ >> + struct vcpu_vmx *vmx = to_vmx(vcpu); >> + >> + if (vmx->nested.smm.cr4_vmxe) { >> + WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) | X86_CR4_VMXE)); >> + vmx->nested.smm.cr4_vmxe = false; > > If we manage to pass the previous problem this will likely fail: > > post_rsm_load_state() is called with HF_SMM_MASK still set. 
And this is hopefully just a matter of

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 706accf524c3..a508d07d7483 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2625,13 +2625,14 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
 		return X86EMUL_UNHANDLEABLE;
 	}

-	ctxt->ops->post_rsm_load_state(ctxt);
-
 	if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
 		ctxt->ops->set_nmi_mask(ctxt, false);

 	ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
 		~(X86EMUL_SMM_INSIDE_NMI_MASK | X86EMUL_SMM_MASK));
+
+	ctxt->ops->post_rsm_load_state(ctxt);
+
 	return X86EMUL_CONTINUE;
 }

but it's probably a good idea to also use vmx_set_cr4_shadow in
vmx_post_rsm_load_state.

At the same time, we should really try to unify the callbacks so that at
least Intel only uses pre_rsm_save and post_rsm_load.  After I have a
test case, I'll refresh Sean's patches and send them out.

Paolo

>> +	}
>> +}
>> +
>>  static int enable_smi_window(struct kvm_vcpu *vcpu)
>>  {
>>  	return 0;
>> @@ -7693,8 +7713,10 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>>  	.setup_mce = vmx_setup_mce,
>>
>>  	.smi_allowed = vmx_smi_allowed,
>> +	.pre_smi_save_state = vmx_pre_smi_save_state,
>>  	.post_smi_save_state = vmx_post_smi_save_state,
>>  	.pre_rsm_load_state = vmx_pre_rsm_load_state,
>> +	.post_rsm_load_state = vmx_post_rsm_load_state,
>>  	.enable_smi_window = enable_smi_window,
>>
>>  	.check_nested_events = NULL,
>> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
>> index a1e00d0a2482..a3444625ca7f 100644
>> --- a/arch/x86/kvm/vmx/vmx.h
>> +++ b/arch/x86/kvm/vmx/vmx.h
>> @@ -162,6 +162,8 @@ struct nested_vmx {
>>
>>  	/* SMM related state */
>>  	struct {
>> +		/* CR4.VMXE=1 on SMM entry? */
>> +		bool cr4_vmxe;
>>  		/* in VMX operation on SMM entry? */
>>  		bool vmxon;
>>  		/* in guest mode on SMM entry? */
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index abfac99fb7c5..1b07706df840 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -5966,6 +5966,11 @@ static int emulator_pre_rsm_load_state(struct x86_emulate_ctxt *ctxt, u64 smbase
>>  	return kvm_x86_ops->pre_rsm_load_state(emul_to_vcpu(ctxt), smbase);
>>  }
>>
>> +static void emulator_post_rsm_load_state(struct x86_emulate_ctxt *ctxt)
>> +{
>> +	kvm_x86_ops->post_rsm_load_state(emul_to_vcpu(ctxt));
>> +}
>> +
>>  static const struct x86_emulate_ops emulate_ops = {
>>  	.read_gpr = emulator_read_gpr,
>>  	.write_gpr = emulator_write_gpr,
>> @@ -6006,6 +6011,7 @@ static const struct x86_emulate_ops emulate_ops = {
>>  	.get_hflags = emulator_get_hflags,
>>  	.set_hflags = emulator_set_hflags,
>>  	.pre_rsm_load_state = emulator_pre_rsm_load_state,
>> +	.post_rsm_load_state = emulator_post_rsm_load_state,
>>  };
>>
>>  static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
>> @@ -7516,6 +7522,9 @@ static void enter_smm(struct kvm_vcpu *vcpu)
>>  	u32 cr0;
>>
>>  	trace_kvm_enter_smm(vcpu->vcpu_id, vcpu->arch.smbase, true);
>> +
>> +	kvm_x86_ops->pre_smi_save_state(vcpu);
>> +
>>  	memset(buf, 0, 512);
>>  	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
>>  		enter_smm_save_state_64(vcpu, buf);
>
On Thu, Mar 28, 2019 at 10:42:31AM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:

...

> > +static void vmx_pre_smi_save_state(struct kvm_vcpu *vcpu)
> > +{
> > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +
> > +	if (kvm_read_cr4(vcpu) & X86_CR4_VMXE) {
> > +		vmx->nested.smm.cr4_vmxe = true;
> > +		WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) & ~X86_CR4_VMXE));
>
> This WARN_ON fires: vmx_set_cr4() has the following check:
>
>         if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
>                 return 1;
>
> X86_CR4_VMXE can't be unset while nested.vmxon is on....

Blech.  Moving the handling of nested.vmxon here is somewhat doable, but
holy cow does it become a jumbled mess.  Rather than trying to constantly
juggle HF_SMM_MASK, I have a more ambitious idea: Clear HF_SMM_MASK
before loading state.  RFC incoming...

> > +	}
> > +}
> > +
> >  static int vmx_post_smi_save_state(struct kvm_vcpu *vcpu, char *smstate)
> >  {
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > @@ -7390,6 +7400,16 @@ static int vmx_pre_rsm_load_state(struct kvm_vcpu *vcpu, u64 smbase)
> >  	return 0;
> >  }
> >
> > +static void vmx_post_rsm_load_state(struct kvm_vcpu *vcpu)
> > +{
> > +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > +
> > +	if (vmx->nested.smm.cr4_vmxe) {
> > +		WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) | X86_CR4_VMXE));
> > +		vmx->nested.smm.cr4_vmxe = false;
>
> If we manage to pass the previous problem this will likely fail:
>
> post_rsm_load_state() is called with HF_SMM_MASK still set.
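The two failure modes being discussed can be modeled in miniature. This is a hypothetical simplification of the interaction between vmx_set_cr4()'s SMM check and nested_cr4_valid() (the `toy_*` names are illustrative, not KVM code):

```c
#include <assert.h>
#include <stdbool.h>

#define X86_CR4_VMXE (1ul << 13)

struct toy_vcpu {
	unsigned long cr4;
	bool nested_vmxon;	/* stands in for vmx->nested.vmxon */
	bool hf_smm;		/* stands in for HF_SMM_MASK */
};

/* Mirrors the two CR4.VMXE rejection paths; returns 1 on failure. */
static int toy_set_cr4(struct toy_vcpu *v, unsigned long cr4)
{
	/* vmx_set_cr4(): CR4.VMXE may not be set while in SMM... */
	if (v->hf_smm && (cr4 & X86_CR4_VMXE))
		return 1;
	/* ...and nested_cr4_valid(): VMXE may not be cleared while vmxon. */
	if (v->nested_vmxon && !(cr4 & X86_CR4_VMXE))
		return 1;
	v->cr4 = cr4;
	return 0;
}
```

In this model, clearing VMXE during SMI fails while `nested_vmxon` is still set, and setting VMXE during RSM fails while `hf_smm` is still set — which is exactly why the callbacks must be reordered relative to the nested.vmxon and HF_SMM_MASK updates (or, per Sean's idea, HF_SMM_MASK cleared before loading state).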
diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index b2a65d08d1f8..02fa7ec89315 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -227,6 +227,7 @@ struct x86_emulate_ops {
 	unsigned (*get_hflags)(struct x86_emulate_ctxt *ctxt);
 	void (*set_hflags)(struct x86_emulate_ctxt *ctxt, unsigned hflags);
 	int (*pre_rsm_load_state)(struct x86_emulate_ctxt *ctxt, u64 smbase);
+	void (*post_rsm_load_state)(struct x86_emulate_ctxt *ctxt);
 };

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 29ce45f41ee5..0ceb8b353dc0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1181,8 +1181,10 @@ struct kvm_x86_ops {
 	void (*get_vmcs12_pages)(struct kvm_vcpu *vcpu);

 	int (*smi_allowed)(struct kvm_vcpu *vcpu);
+	void (*pre_smi_save_state)(struct kvm_vcpu *vcpu);
 	int (*post_smi_save_state)(struct kvm_vcpu *vcpu, char *smstate);
 	int (*pre_rsm_load_state)(struct kvm_vcpu *vcpu, u64 smbase);
+	void (*post_rsm_load_state)(struct kvm_vcpu *vcpu);
 	int (*enable_smi_window)(struct kvm_vcpu *vcpu);

 	int (*mem_enc_op)(struct kvm *kvm, void __user *argp);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ca60eb3358d9..706accf524c3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2625,6 +2625,8 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
 		return X86EMUL_UNHANDLEABLE;
 	}

+	ctxt->ops->post_rsm_load_state(ctxt);
+
 	if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
 		ctxt->ops->set_nmi_mask(ctxt, false);

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c0e511c3b298..edc12fb39b32 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -6193,6 +6193,11 @@ static int svm_smi_allowed(struct kvm_vcpu *vcpu)
 	return 1;
 }

+static void svm_pre_smi_save_state(struct kvm_vcpu *vcpu)
+{
+
+}
+
 static int svm_post_smi_save_state(struct kvm_vcpu *vcpu, char *smstate)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -6243,6 +6248,11 @@ static int svm_pre_rsm_load_state(struct kvm_vcpu *vcpu, u64 smbase)
 	return ret;
 }

+static void svm_post_rsm_load_state(struct kvm_vcpu *vcpu)
+{
+
+}
+
 static int enable_smi_window(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -7251,8 +7261,10 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.setup_mce = svm_setup_mce,

 	.smi_allowed = svm_smi_allowed,
+	.pre_smi_save_state = svm_pre_smi_save_state,
 	.post_smi_save_state = svm_post_smi_save_state,
 	.pre_rsm_load_state = svm_pre_rsm_load_state,
+	.post_rsm_load_state = svm_post_rsm_load_state,
 	.enable_smi_window = enable_smi_window,

 	.mem_enc_op = svm_mem_enc_op,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fc96101e39ac..617443c1c6d7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7354,6 +7354,16 @@ static int vmx_smi_allowed(struct kvm_vcpu *vcpu)
 	return 1;
 }

+static void vmx_pre_smi_save_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (kvm_read_cr4(vcpu) & X86_CR4_VMXE) {
+		vmx->nested.smm.cr4_vmxe = true;
+		WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) & ~X86_CR4_VMXE));
+	}
+}
+
 static int vmx_post_smi_save_state(struct kvm_vcpu *vcpu, char *smstate)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -7390,6 +7400,16 @@ static int vmx_pre_rsm_load_state(struct kvm_vcpu *vcpu, u64 smbase)
 	return 0;
 }

+static void vmx_post_rsm_load_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (vmx->nested.smm.cr4_vmxe) {
+		WARN_ON(vmx_set_cr4(vcpu, kvm_read_cr4(vcpu) | X86_CR4_VMXE));
+		vmx->nested.smm.cr4_vmxe = false;
+	}
+}
+
 static int enable_smi_window(struct kvm_vcpu *vcpu)
 {
 	return 0;
@@ -7693,8 +7713,10 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.setup_mce = vmx_setup_mce,

 	.smi_allowed = vmx_smi_allowed,
+	.pre_smi_save_state = vmx_pre_smi_save_state,
 	.post_smi_save_state = vmx_post_smi_save_state,
 	.pre_rsm_load_state = vmx_pre_rsm_load_state,
+	.post_rsm_load_state = vmx_post_rsm_load_state,
 	.enable_smi_window = enable_smi_window,

 	.check_nested_events = NULL,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a1e00d0a2482..a3444625ca7f 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -162,6 +162,8 @@ struct nested_vmx {

 	/* SMM related state */
 	struct {
+		/* CR4.VMXE=1 on SMM entry? */
+		bool cr4_vmxe;
 		/* in VMX operation on SMM entry? */
 		bool vmxon;
 		/* in guest mode on SMM entry? */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index abfac99fb7c5..1b07706df840 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5966,6 +5966,11 @@ static int emulator_pre_rsm_load_state(struct x86_emulate_ctxt *ctxt, u64 smbase
 	return kvm_x86_ops->pre_rsm_load_state(emul_to_vcpu(ctxt), smbase);
 }

+static void emulator_post_rsm_load_state(struct x86_emulate_ctxt *ctxt)
+{
+	kvm_x86_ops->post_rsm_load_state(emul_to_vcpu(ctxt));
+}
+
 static const struct x86_emulate_ops emulate_ops = {
 	.read_gpr = emulator_read_gpr,
 	.write_gpr = emulator_write_gpr,
@@ -6006,6 +6011,7 @@ static const struct x86_emulate_ops emulate_ops = {
 	.get_hflags = emulator_get_hflags,
 	.set_hflags = emulator_set_hflags,
 	.pre_rsm_load_state = emulator_pre_rsm_load_state,
+	.post_rsm_load_state = emulator_post_rsm_load_state,
 };

 static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
@@ -7516,6 +7522,9 @@ static void enter_smm(struct kvm_vcpu *vcpu)
 	u32 cr0;

 	trace_kvm_enter_smm(vcpu->vcpu_id, vcpu->arch.smbase, true);
+
+	kvm_x86_ops->pre_smi_save_state(vcpu);
+
 	memset(buf, 0, 512);
 	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
 		enter_smm_save_state_64(vcpu, buf);
Intel CPUs manually save and load CR4.VMXE internally instead of having
it automatically switched via the SMM State Save.  Specifically, the
SDM uses the following pseudocode to describe SMI and RSM:

  SMI:
    enter SMM;
    save the following internal to the processor:
      CR4.VMXE
      an indication of whether the logical processor was in VMX operation (root or non-root)
    IF the logical processor is in VMX operation THEN
      save current VMCS pointer internal to the processor;
      leave VMX operation;
      save VMX-critical state defined below
    FI;
    CR4.VMXE <- 0;
    perform ordinary SMI delivery:
      save processor state in SMRAM;
      set processor state to standard SMM values;

  RSM:
    IF VMXE = 1 in CR4 image in SMRAM THEN
      fail and enter shutdown state;
    ELSE
      restore state normally from SMRAM;
      <unrelated stuff omitted for brevity>
    FI;
    CR4.VMXE <- value stored internally;
    <more stuff omitted for brevity>
    leave SMM;

Presumably, the manual handling of CR4.VMXE is done to avoid having to
special-case CR4.VMXE in the legacy state save/load flows.  As luck
would have it, KVM has a similar conundrum in its SMM state save/load
flows in that vmx_set_cr4() forbids CR4.VMXE from being set while in
SMM.  The check in vmx_set_cr4() can cause RSM to fail when loading
CR4 from the SMM save state area.

Take a page out of the SDM and manually handle CR4.VMXE during SMI and
RSM.  Alternatively, HF_SMM_MASK could be temporarily disabled when
setting CR4 as part of RSM, but doing so is rather ugly due to the need
to plumb the "from_rsm" information through to emulator_set_cr(), and
also opens up a virtualization hole, e.g. KVM would not properly handle
the (extremely unlikely) scenario where the guest toggles CR4.VMXE in
the SMM save state area from 0->1.

Reported-by: Jon Doron <arilou@gmail.com>
Suggested-by: Jim Mattson <jmattson@google.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Fixes: 5bea5123cbf0 ("KVM: VMX: check nested state and CR4.VMXE against SMM")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/asm/kvm_emulate.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/emulate.c             |  2 ++
 arch/x86/kvm/svm.c                 | 12 ++++++++++++
 arch/x86/kvm/vmx/vmx.c             | 22 ++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h             |  2 ++
 arch/x86/kvm/x86.c                 |  9 +++++++++
 7 files changed, 50 insertions(+)
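The SDM pseudocode in the commit message can be sketched as a small C model. This is a hedged, hypothetical simplification for illustration — the `toy_*` names and the flattened state representation are not real hardware or KVM interfaces — but it captures why CR4.VMXE never reaches SMRAM and why RSM shuts down if the SMRAM image nonetheless has VMXE=1:

```c
#include <assert.h>
#include <stdbool.h>

#define CR4_VMXE (1ul << 13)	/* CR4.VMXE is bit 13 */

struct toy_cpu {
	unsigned long cr4;	/* architectural CR4 */
	unsigned long smram_cr4;/* CR4 image in the SMM state save area */
	bool in_vmx_operation;
	/* state saved "internal to the processor" across SMM */
	bool saved_vmxe;
	bool saved_vmx_operation;
	bool shutdown;
};

/* SMI: stash CR4.VMXE internally, clear it, then do the ordinary save. */
static void toy_smi(struct toy_cpu *c)
{
	c->saved_vmxe = c->cr4 & CR4_VMXE;
	c->saved_vmx_operation = c->in_vmx_operation;
	c->in_vmx_operation = false;	/* leave VMX operation */
	c->cr4 &= ~CR4_VMXE;		/* CR4.VMXE <- 0 */
	c->smram_cr4 = c->cr4;		/* ordinary SMI delivery saves CR4 */
}

/* RSM: shut down if the SMRAM image has VMXE=1, else restore. */
static void toy_rsm(struct toy_cpu *c)
{
	if (c->smram_cr4 & CR4_VMXE) {
		c->shutdown = true;	/* fail and enter shutdown state */
		return;
	}
	c->cr4 = c->smram_cr4;		/* restore normally from SMRAM */
	if (c->saved_vmxe)
		c->cr4 |= CR4_VMXE;	/* CR4.VMXE <- internal value */
	c->in_vmx_operation = c->saved_vmx_operation;
}
```

Running the model through an SMI/RSM round trip shows the invariant the patch relies on: the SMRAM image always has VMXE=0, and VMXE plus VMX operation are restored purely from the internally saved state — the same structure KVM mirrors with nested.smm.cr4_vmxe.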