
[v3,6/6] KVM: nSVM: fix SMI injection in guest mode

Message ID 20170925080904.24850-7-lprosek@redhat.com (mailing list archive)
State New, archived

Commit Message

Ladi Prosek Sept. 25, 2017, 8:09 a.m. UTC
Entering SMM while running in guest mode wasn't working very well because several
pieces of the vcpu state were left set up for nested operation.

Some of the issues observed:

* L1 was getting unexpected VM exits (using L1 interception controls but running
  in SMM execution environment)
* MMU was confused (walk_mmu was still set to nested_mmu)
* INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)

The Intel SDM actually prescribes that the logical processor "leave VMX operation"
upon entering SMM in 34.14.1 Default Treatment of SMI Delivery. AMD doesn't seem to
document this, but they provide fields in the SMM state-save area to stash the
current state of SVM. What we need to do is basically get out of guest mode for
the duration of SMM. All of this is completely transparent to L1, i.e. L1 is not
given control and no L1-observable state changes occur.

To avoid code duplication this commit takes advantage of the existing nested
vmexit and run functionality, perhaps at the cost of efficiency. To get out of
guest mode, nested_svm_vmexit is called, unchanged. Re-entering is performed using
enter_svm_guest_mode.
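
For context, the generic wiring added earlier in this series looks roughly as
sketched below. This is a simplified outline, not the exact x86.c code; the
kvm_x86_ops member names are assumed from the svm_-prefixed implementations in
the diff below, and error handling is omitted.

    /* SMI entry in x86.c, simplified sketch -- not the exact code */
    static void enter_smm_sketch(struct kvm_vcpu *vcpu)
    {
            char buf[512];

            memset(buf, 0, sizeof(buf));

            /* let the vendor code leave guest mode and stash its own bits
             * into the state-save image; on nSVM this is
             * svm_prep_enter_smm() -> nested_svm_vmexit() */
            kvm_x86_ops->prep_enter_smm(vcpu, buf);

            /* save the current register state into the image and write it
             * to SMRAM at SMBASE + 0xFE00 */
            enter_smm_save_state_64(vcpu, buf);
            kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00,
                                 buf, sizeof(buf));

            /* ... then switch to the SMM execution environment and jump to
             * the handler; RSM later undoes this and calls post_leave_smm(),
             * which on nSVM re-enters L2 via enter_svm_guest_mode() */
    }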

This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with
OVMF firmware (OVMF_CODE-need-smm.fd).

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/svm.c              | 56 +++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c              |  3 ---
 3 files changed, 57 insertions(+), 5 deletions(-)

Comments

Radim Krčmář Oct. 3, 2017, 7:53 p.m. UTC | #1
2017-09-25 10:09+0200, Ladi Prosek:
> Entering SMM while running in guest mode wasn't working very well because several
> pieces of the vcpu state were left set up for nested operation.
> 
> Some of the issues observed:
> 
> * L1 was getting unexpected VM exits (using L1 interception controls but running
>   in SMM execution environment)
> * MMU was confused (walk_mmu was still set to nested_mmu)
> * INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)
> 
> Intel SDM actually prescribes the logical processor to "leave VMX operation" upon
> entering SMM in 34.14.1 Default Treatment of SMI Delivery. AMD doesn't seem to
> document this but they provide fields in the SMM state-save area to stash the
> current state of SVM. What we need to do is basically get out of guest mode for
> the duration of SMM. All this completely transparent to L1, i.e. L1 is not given
> control and no L1 observable state changes.

The best description I found is in APM vol. 2, 15.22 SMM Support:

• The simplest solution is to not intercept SMI signals. SMIs
  encountered while in a guest context are taken from within the guest
  context. In this case, the SMM handler is not subject to any
  intercepts set up by the VMM and consequently runs outside of the
  virtualization controls. The state saved in the SMM State-Save area as
  seen by the SMM handler reflects the state of the guest that had been
  running at the time the SMI was encountered. When the SMM handler
  executes the RSM instruction, the processor returns to executing in
  the guest context, and any modifications to the SMM State-Save area
  made by the SMM handler are reflected in the guest state.

I think that the last sentence is not implemented correctly:
svm_prep_enter_smm() loads the host state into VMCB and
enter_smm_save_state_64() then puts host state where the SMM handler
would expect guest state.
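
Put differently, to match the APM the L2 register state would have to land in
the state-save area before we drop the L2 controls, roughly (just a sketch of
the ordering, not what v3 does):

    /* sketch: ordering that would match APM 15.22 */
    enter_smm_save_state_64(vcpu, buf);        /* snapshot L2 state first */
    kvm_x86_ops->prep_enter_smm(vcpu, buf);    /* only then leave guest mode,
                                                  leaving the saved image as
                                                  the SMM handler expects it */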

Do Windows intercept SMI?

Thanks.
Ladi Prosek Oct. 4, 2017, 10:10 a.m. UTC | #2
On Tue, Oct 3, 2017 at 9:53 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 2017-09-25 10:09+0200, Ladi Prosek:
>> Entering SMM while running in guest mode wasn't working very well because several
>> pieces of the vcpu state were left set up for nested operation.
>>
>> Some of the issues observed:
>>
>> * L1 was getting unexpected VM exits (using L1 interception controls but running
>>   in SMM execution environment)
>> * MMU was confused (walk_mmu was still set to nested_mmu)
>> * INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)
>>
>> Intel SDM actually prescribes the logical processor to "leave VMX operation" upon
>> entering SMM in 34.14.1 Default Treatment of SMI Delivery. AMD doesn't seem to
>> document this but they provide fields in the SMM state-save area to stash the
>> current state of SVM. What we need to do is basically get out of guest mode for
>> the duration of SMM. All this completely transparent to L1, i.e. L1 is not given
>> control and no L1 observable state changes.
>
> The best description I found is in APM vol. 2, 15.22 SMM Support:
>
> • The simplest solution is to not intercept SMI signals. SMIs
>   encountered while in a guest context are taken from within the guest
>   context. In this case, the SMM handler is not subject to any
>   intercepts set up by the VMM and consequently runs outside of the
>   virtualization controls. The state saved in the SMM State-Save area as
>   seen by the SMM handler reflects the state of the guest that had been
>   running at the time the SMI was encountered. When the SMM handler
>   executes the RSM instruction, the processor returns to executing in
>   the guest context, and any modifications to the SMM State-Save area
>   made by the SMM handler are reflected in the guest state.
>
> I think that the last sentence is not implemented correctly:
> svm_prep_enter_smm() loads the host state into VMCB and
> enter_smm_save_state_64() then puts host state where the SMM handler
> would expect guest state.

That's true. It seems to be easier to switch to the nested guest after
we've restored all the basics from the SMM area. I briefly tried to
reverse the order, which should make it compliant with 15.22 quoted
above and hopefully also with what Intel wants (it's not clear to me
whether their "leave VMX operation" in 34.14.1 Default Treatment of
SMI Delivery means switching to host state or not). I quickly ran into
issues though because entering the nested guest doesn't quite work if
the vCPU is still in real mode.

I think that the bootstrap sequence in em_rsm() has to be tweaked to
make it possible to call post_leave_smm() before restoring state from
the SMM area. I'll look into it.

> Do Windows intercept SMI?

Yes, but only if started with >1 vCPU.

Thanks!
Radim Krčmář Oct. 4, 2017, 2:42 p.m. UTC | #3
2017-10-04 12:10+0200, Ladi Prosek:
> On Tue, Oct 3, 2017 at 9:53 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> > 2017-09-25 10:09+0200, Ladi Prosek:
> >> Entering SMM while running in guest mode wasn't working very well because several
> >> pieces of the vcpu state were left set up for nested operation.
> >>
> >> Some of the issues observed:
> >>
> >> * L1 was getting unexpected VM exits (using L1 interception controls but running
> >>   in SMM execution environment)
> >> * MMU was confused (walk_mmu was still set to nested_mmu)
> >> * INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)
> >>
> >> Intel SDM actually prescribes the logical processor to "leave VMX operation" upon
> >> entering SMM in 34.14.1 Default Treatment of SMI Delivery. AMD doesn't seem to
> >> document this but they provide fields in the SMM state-save area to stash the
> >> current state of SVM. What we need to do is basically get out of guest mode for
> >> the duration of SMM. All this completely transparent to L1, i.e. L1 is not given
> >> control and no L1 observable state changes.
> >
> > The best description I found is in APM vol. 2, 15.22 SMM Support:
> >
> > • The simplest solution is to not intercept SMI signals. SMIs
> >   encountered while in a guest context are taken from within the guest
> >   context. In this case, the SMM handler is not subject to any
> >   intercepts set up by the VMM and consequently runs outside of the
> >   virtualization controls. The state saved in the SMM State-Save area as
> >   seen by the SMM handler reflects the state of the guest that had been
> >   running at the time the SMI was encountered. When the SMM handler
> >   executes the RSM instruction, the processor returns to executing in
> >   the guest context, and any modifications to the SMM State-Save area
> >   made by the SMM handler are reflected in the guest state.
> >
> > I think that the last sentence is not implemented correctly:
> > svm_prep_enter_smm() loads the host state into VMCB and
> > enter_smm_save_state_64() then puts host state where the SMM handler
> > would expect guest state.
> 
> That's true. It seems to be easier to switch to the nested guest after
> we've restored all the basics from the SMM area. I briefly tried to
> reverse the order, which should make it compliant with 15.22 quoted
> above and hopefully also with what Intel wants (it's not clear to me
> whether their "leave VMX operation" in 34.14.1 Default Treatment of
> SMI Delivery means switching to host state or not). I quickly ran into
> issues though because entering the nested guest doesn't quite work if
> the vCPU is still in real mode.

We basically want to keep the L2 state while removing L2 controls.
I think the current order could be maintained if we copied L2 state into
L1 before running current SMM routines.  Afterward, we'd copy restored
L1 state into L2.

It is inefficient, but should be simple. :)
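
Roughly (just a sketch of the idea, untested; the copy helpers are made up):

    /* on SMI: make L1 hold the L2 register state before the existing
     * routines save state and enter SMM */
    copy_l2_regs_to_l1(vcpu);          /* hypothetical helper */
    /* ... existing enter_smm() path runs unchanged ... */

    /* on RSM: the existing routines restore "L1" from SMRAM, then: */
    reenter_l2(vcpu);                  /* hypothetical wrapper around
                                          enter_svm_guest_mode(), controls only */
    copy_l1_regs_to_l2(vcpu);          /* hypothetical helper */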

> I think that the bootstrap sequence in em_rsm() has to be tweaked to
> make it possible to call post_leave_smm() before restoring state from
> the SMM area. I'll look into it.
> 
> > Do Windows intercept SMI?
> 
> Yes, but only if started with >1 vCPU.

Interesting, thanks.
Ladi Prosek Oct. 10, 2017, 8:03 a.m. UTC | #4
On Wed, Oct 4, 2017 at 4:42 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 2017-10-04 12:10+0200, Ladi Prosek:
>> On Tue, Oct 3, 2017 at 9:53 PM, Radim Krčmář <rkrcmar@redhat.com> wrote:
>> > 2017-09-25 10:09+0200, Ladi Prosek:
>> >> Entering SMM while running in guest mode wasn't working very well because several
>> >> pieces of the vcpu state were left set up for nested operation.
>> >>
>> >> Some of the issues observed:
>> >>
>> >> * L1 was getting unexpected VM exits (using L1 interception controls but running
>> >>   in SMM execution environment)
>> >> * MMU was confused (walk_mmu was still set to nested_mmu)
>> >> * INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)
>> >>
>> >> Intel SDM actually prescribes the logical processor to "leave VMX operation" upon
>> >> entering SMM in 34.14.1 Default Treatment of SMI Delivery. AMD doesn't seem to
>> >> document this but they provide fields in the SMM state-save area to stash the
>> >> current state of SVM. What we need to do is basically get out of guest mode for
>> >> the duration of SMM. All this completely transparent to L1, i.e. L1 is not given
>> >> control and no L1 observable state changes.
>> >
>> > The best description I found is in APM vol. 2, 15.22 SMM Support:
>> >
>> > • The simplest solution is to not intercept SMI signals. SMIs
>> >   encountered while in a guest context are taken from within the guest
>> >   context. In this case, the SMM handler is not subject to any
>> >   intercepts set up by the VMM and consequently runs outside of the
>> >   virtualization controls. The state saved in the SMM State-Save area as
>> >   seen by the SMM handler reflects the state of the guest that had been
>> >   running at the time the SMI was encountered. When the SMM handler
>> >   executes the RSM instruction, the processor returns to executing in
>> >   the guest context, and any modifications to the SMM State-Save area
>> >   made by the SMM handler are reflected in the guest state.
>> >
>> > I think that the last sentence is not implemented correctly:
>> > svm_prep_enter_smm() loads the host state into VMCB and
>> > enter_smm_save_state_64() then puts host state where the SMM handler
>> > would expect guest state.
>>
>> That's true. It seems to be easier to switch to the nested guest after
>> we've restored all the basics from the SMM area. I briefly tried to
>> reverse the order, which should make it compliant with 15.22 quoted
>> above and hopefully also with what Intel wants (it's not clear to me
>> whether their "leave VMX operation" in 34.14.1 Default Treatment of
>> SMI Delivery means switching to host state or not). I quickly ran into
>> issues though because entering the nested guest doesn't quite work if
>> the vCPU is still in real mode.
>
> We basically want to keep the L2 state while removing L2 controls.
> I think the current order could be maintained if we copied L2 state into
> L1 before running current SMM routines.  Afterward, we'd copy restored
> L1 state into L2.
>
> It is inefficient, but should be simple. :)

I've been able to make it work like this:

- On SMM entry we save the L2 state into the state-save area.
- Then we exit to L1 and continue with SMM entry as usual.

- On RSM we restore the L2 state from the state-save area (while still
in L1, i.e. is_guest_mode == false).
- Then we re-enter L2.
- Finally we restore the L2 state again (with a minor tweak as we're
no longer switching from real to protected mode).

If we didn't enter from L2, the second restore is omitted.
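
In pseudo-C the RSM side then looks roughly like this (schematic only, helper
names simplified, not the actual v4 diff):

    /* RSM emulation, schematic sketch -- not the v4 patch itself */
    load_state_from_smram(vcpu, smram);     /* placeholder for the existing
                                               rsm_load_state_64() path; we
                                               are still in L1 here */

    /* svm_post_leave_smm(): check the "SVM Guest" flag at FED8h, map the
     * VMCB at FEE0h and call enter_svm_guest_mode() */
    kvm_x86_ops->post_leave_smm(vcpu, smbase);

    if (is_guest_mode(vcpu))
            /* second restore, with the minor tweak mentioned above
             * (no real -> protected mode switch this time) */
            load_state_from_smram(vcpu, smram);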

Definitely inefficient but can be done reusing the existing routines.
I'll send a v4 shortly.

>> I think that the bootstrap sequence in em_rsm() has to be tweaked to
>> make it possible to call post_leave_smm() before restoring state from
>> the SMM area. I'll look into it.
>>
>> > Do Windows intercept SMI?
>>
>> Yes, but only if started with >1 vCPU.
>
> Interesting, thanks.

I lied, Windows always intercepts SMI. It's just that on a single vCPU I
was able to get away without emulating the interception for some reason.
Which means that all this restore-after-RSM logic won't be exercised with
Windows guests on AMD. It's still going to be used on Intel, though, and
the code is common to both.

Thanks!

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2445b2ba26f9..e582b8c9579b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1430,4 +1430,7 @@  static inline int kvm_cpu_get_apicid(int mps_cpu)
 #endif
 }
 
+#define put_smstate(type, buf, offset, val)                      \
+	*(type *)((buf) + (offset) - 0x7e00) = val
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b9f19d715fc3..416ec56d6715 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5401,18 +5401,70 @@  static void svm_setup_mce(struct kvm_vcpu *vcpu)
 
 static int svm_smi_allowed(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	/* Per APM Vol.2 15.22.2 "Response to SMI" */
+	if (!gif_set(svm))
+		return 0;
+
+	if (is_guest_mode(&svm->vcpu) &&
+	    svm->nested.intercept & (1ULL << INTERCEPT_SMI)) {
+		/* TODO: Might need to set exit_info_1 and exit_info_2 here */
+		svm->vmcb->control.exit_code = SVM_EXIT_SMI;
+		svm->nested.exit_required = true;
+		return 0;
+	}
+
 	return 1;
 }
 
 static int svm_prep_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 {
-	/* TODO: Implement */
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret;
+
+	if (is_guest_mode(vcpu)) {
+		/* FED8h - SVM Guest */
+		put_smstate(u64, smstate, 0x7ed8, 1);
+		/* FEE0h - SVM Guest VMCB Physical Address */
+		put_smstate(u64, smstate, 0x7ee0, svm->nested.vmcb);
+
+		svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
+		svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
+		svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
+
+		ret = nested_svm_vmexit(svm);
+		if (ret)
+			return ret;
+	}
 	return 0;
 }
 
 static int svm_post_leave_smm(struct kvm_vcpu *vcpu, u64 smbase)
 {
-	/* TODO: Implement */
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb *nested_vmcb;
+	struct page *page;
+	struct {
+		u64 guest;
+		u64 vmcb;
+	} svm_state_save;
+	int r;
+
+	/* Temporarily set the SMM flag to access the SMM state-save area */
+	vcpu->arch.hflags |= HF_SMM_MASK;
+	r = kvm_vcpu_read_guest(vcpu, smbase + 0xfed8, &svm_state_save,
+				sizeof(svm_state_save));
+	vcpu->arch.hflags &= ~HF_SMM_MASK;
+	if (r)
+		return r;
+
+	if (svm_state_save.guest) {
+		nested_vmcb = nested_svm_map(svm, svm_state_save.vmcb, &page);
+		if (!nested_vmcb)
+			return 1;
+		enter_svm_guest_mode(svm, svm_state_save.vmcb, nested_vmcb, page);
+	}
 	return 0;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5c4c49e8e660..41aa1da599bc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6481,9 +6481,6 @@  static void process_nmi(struct kvm_vcpu *vcpu)
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
 
-#define put_smstate(type, buf, offset, val)			  \
-	*(type *)((buf) + (offset) - 0x7e00) = val
-
 static u32 enter_smm_get_segment_flags(struct kvm_segment *seg)
 {
 	u32 flags = 0;