
[v3,13/13] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM

Message ID 20220803155011.43721-14-mlevitsk@redhat.com (mailing list archive)
State New, archived
Series SMM emulation and interrupt shadow fixes

Commit Message

Maxim Levitsky Aug. 3, 2022, 3:50 p.m. UTC
When #SMI is asserted, the CPU can be in an interrupt shadow
due to sti or mov ss.

Neither the Intel nor the AMD PRM mandates that #SMI be blocked
during the shadow, and on top of that, since neither SVM nor VMX
has true support for an SMI window, waiting for one instruction
would mean single-stepping the guest.

Instead, allow #SMI in this case, but both reset the interrupt
shadow and stash its value in SMRAM, to restore it on exit
from SMM.

This fixes rare failures, seen mostly with Windows guests on VMX,
in which #SMI falls on an sti instruction; these manifest as a VM
entry failure due to EFLAGS.IF not being set, but the STI interrupt
window still being set in the VMCS.
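
For context, a minimal sketch (not KVM code; the function is hypothetical,
while GUEST_INTR_STATE_STI and X86_EFLAGS_IF are the real VMX/x86 symbols)
of the guest-state consistency check that trips in the failure mode above:

	/* Per the SDM, VM entry fails if the guest interruptibility
	 * state reports "blocking by STI" while RFLAGS.IF is clear,
	 * which is exactly what happens when #SMI lands on sti. */
	static bool guest_interruptibility_valid(u64 rflags, u32 intr_state)
	{
		if ((intr_state & GUEST_INTR_STATE_STI) &&
		    !(rflags & X86_EFLAGS_IF))
			return false;	/* the VM entry failure this patch avoids */
		return true;
	}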


Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/emulate.c     | 17 ++++++++++++++---
 arch/x86/kvm/kvm_emulate.h | 10 ++++++----
 arch/x86/kvm/x86.c         | 12 ++++++++++++
 3 files changed, 32 insertions(+), 7 deletions(-)

Comments

Sean Christopherson Aug. 24, 2022, 11:50 p.m. UTC | #1
On Wed, Aug 03, 2022, Maxim Levitsky wrote:
> @@ -518,7 +519,8 @@ struct kvm_smram_state_32 {
>  	u32 reserved1[62];
>  	u32 smbase;
>  	u32 smm_revision;
> -	u32 reserved2[5];
> +	u32 reserved2[4];
> +	u32 int_shadow; /* KVM extension */

Looking at this with fresh(er) eyes, I agree with Jim: KVM shouldn't add its own
fields in SMRAM.  There's no need to use vmcb/vmcs memory either, just add fields
in kvm_vcpu_arch to save/restore the state across SMI/RSM, and then borrow VMX's
approach of supporting migration by adding flags to do out-of-band migration,
e.g. KVM_STATE_NESTED_SMM_STI_BLOCKING and KVM_STATE_NESTED_SMM_MOV_SS_BLOCKING.

	/* SMM state that's not saved in SMRAM. */
	struct {
		struct {
			u8 interruptibility;
		} smm;
	} nested;

That'd finally give us an excuse to move nested_run_pending to common code too :-)
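
For illustration, a sketch of where the suggested out-of-band flags could
slot into the existing smm.flags of struct kvm_nested_state; the two new
defines are hypothetical (and, per the follow-up below, nested state turned
out to be the wrong home for this anyway):

	/* Existing bits in struct kvm_nested_state's smm.flags: */
	#define KVM_STATE_NESTED_SMM_GUEST_MODE		0x00000001
	#define KVM_STATE_NESTED_SMM_VMXON		0x00000002
	/* Hypothetical new bits for SMM interrupt-shadow migration: */
	#define KVM_STATE_NESTED_SMM_STI_BLOCKING	0x00000004
	#define KVM_STATE_NESTED_SMM_MOV_SS_BLOCKING	0x00000008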
Maxim Levitsky Aug. 25, 2022, 10:13 a.m. UTC | #2
On Wed, 2022-08-24 at 23:50 +0000, Sean Christopherson wrote:
> On Wed, Aug 03, 2022, Maxim Levitsky wrote:
> > @@ -518,7 +519,8 @@ struct kvm_smram_state_32 {
> >  	u32 reserved1[62];
> >  	u32 smbase;
> >  	u32 smm_revision;
> > -	u32 reserved2[5];
> > +	u32 reserved2[4];
> > +	u32 int_shadow; /* KVM extension */
> 
> Looking at this with fresh(er) eyes, I agree with Jim: KVM shouldn't add its own
> fields in SMRAM.  There's no need to use vmcb/vmcs memory either, just add fields
> in kvm_vcpu_arch to save/restore the state across SMI/RSM, and then borrow VMX's
> approach of supporting migration by adding flags to do out-of-band migration,
> e.g. KVM_STATE_NESTED_SMM_STI_BLOCKING and KVM_STATE_NESTED_SMM_MOV_SS_BLOCKING.
> 
> 	/* SMM state that's not saved in SMRAM. */
> 	struct {
> 		struct {
> 			u8 interruptibility;
> 		} smm;
> 	} nested;
> 
> That'd finally give us an excuse to move nested_run_pending to common code too :-)
> 
Paolo told me that he wants it to be done this way (save the state in the smram).

My first version of this patch actually saved the state in KVM-internal state;
I personally don't mind much whether we do it this way or another.

But note that I can't use nested state - the int shadow thing has nothing to do with
nesting.

I think that 'struct kvm_vcpu_events' is the right place for this, and in fact it
already has interrupt.shadow (which btw QEMU doesn't migrate...)

My approach was to use the upper 4 bits of 'interrupt.shadow' since it is highly
unlikely that we will ever see more than 16 different interrupt shadows.

It would be a bit cleaner to put it into the 'smi' substruct, but we already
have the 'triple_fault' afterwards

(but I think this was a very recent addition - maybe it is not too late?)

A new 'KVM_VCPUEVENT_VALID_SMM_SHADOW' flag can be added to the struct to indicate the
extra bits if you want.
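
To make the idea concrete, here is the relevant fragment of the existing
uapi struct, with the proposed reuse of the upper bits sketched in comments;
the KVM_VCPUEVENT_VALID_SMM_SHADOW define and its value are hypothetical:

	struct kvm_vcpu_events {
		...
		struct {
			__u8 injected;
			__u8 nr;
			__u8 soft;
			__u8 shadow;	/* bits 0-1: live shadow (MOV_SS/STI);
					 * proposal: upper bits hold the
					 * shadow value saved across SMM */
		} interrupt;
		...
	};

	/* Hypothetical flag indicating the extra shadow bits are valid: */
	#define KVM_VCPUEVENT_VALID_SMM_SHADOW	0x00000040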

Best regards,
	Maxim Levitsky
Sean Christopherson Aug. 25, 2022, 3:44 p.m. UTC | #3
On Thu, Aug 25, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-24 at 23:50 +0000, Sean Christopherson wrote:
> > On Wed, Aug 03, 2022, Maxim Levitsky wrote:
> > > @@ -518,7 +519,8 @@ struct kvm_smram_state_32 {
> > >  	u32 reserved1[62];
> > >  	u32 smbase;
> > >  	u32 smm_revision;
> > > -	u32 reserved2[5];
> > > +	u32 reserved2[4];
> > > +	u32 int_shadow; /* KVM extension */
> > 
> > Looking at this with fresh(er) eyes, I agree with Jim: KVM shouldn't add its own
> > fields in SMRAM.  There's no need to use vmcb/vmcs memory either, just add fields
> > in kvm_vcpu_arch to save/restore the state across SMI/RSM, and then borrow VMX's
> > approach of supporting migration by adding flags to do out-of-band migration,
> > e.g. KVM_STATE_NESTED_SMM_STI_BLOCKING and KVM_STATE_NESTED_SMM_MOV_SS_BLOCKING.
> > 
> > 	/* SMM state that's not saved in SMRAM. */
> > 	struct {
> > 		struct {
> > 			u8 interruptibility;
> > 		} smm;
> > 	} nested;
> > 
> > That'd finally give us an excuse to move nested_run_pending to common code too :-)
> > 
> Paolo told me that he wants it to be done this way (save the state in the
> smram).

Paolo, what's the motivation for using SMRAM?  I don't see any obvious advantage
for KVM.  QEMU apparently would need to migrate interrupt.shadow, but QEMU should
be doing that anyway, no?

> My first version of this patch actually saved the state in KVM-internal state;
> I personally don't mind much whether we do it this way or another.
> 
> But note that I can't use nested state - the int shadow thing has nothing to
> do with nesting.

Oh, duh.

> I think that 'struct kvm_vcpu_events' is the right place for this, and in fact it
> already has interrupt.shadow (which btw QEMU doesn't migrate...)
> 
> My approach was to use the upper 4 bits of 'interrupt.shadow' since it is highly
> unlikely that we will ever see more than 16 different interrupt shadows.

Heh, unless we ensure STI+MOVSS are mutually exclusive... s/16/4, because
KVM_X86_SHADOW_INT_* are currently treated as masks, not values.

Pedantry aside, using interrupt.shadow definitely seems like the way to go.  We
wouldn't even technically need to use the upper four bits since the bits are KVM
controlled and not hardware-defined, though I agree that using bits 5 and 6 would
give us more flexibility if we ever need to convert the masks to values.
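
Concretely, a hedged sketch of the layout being discussed, building on the
existing mask definitions; the two SMM_* defines are hypothetical:

	/* Existing masks in arch/x86/include/asm/kvm_host.h: */
	#define KVM_X86_SHADOW_INT_MOV_SS	0x01
	#define KVM_X86_SHADOW_INT_STI		0x02
	/* Hypothetical SMM-saved copies in bits 5 and 6: */
	#define KVM_X86_SHADOW_INT_SMM_MOV_SS	(1 << 5)
	#define KVM_X86_SHADOW_INT_SMM_STI	(1 << 6)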

> It would be a bit cleaner to put it into the 'smi' substruct, but we already
> have the 'triple_fault' afterwards
> 
> (but I think this was a very recent addition - maybe it is not too late?)
> 
> A new 'KVM_VCPUEVENT_VALID_SMM_SHADOW' flag can be added to the struct to indicate the
> extra bits if you want.
> 
> Best regards,
> 	Maxim Levitsky

Patch

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4bdbc5893a1657..b4bc45cec3249d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2447,7 +2447,7 @@  static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt,
 			     const struct kvm_smram_state_32 *smstate)
 {
 	struct desc_ptr dt;
-	int i;
+	int i, r;
 
 	ctxt->eflags =  smstate->eflags | X86_EFLAGS_FIXED;
 	ctxt->_eip =  smstate->eip;
@@ -2482,8 +2482,16 @@  static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt,
 
 	ctxt->ops->set_smbase(ctxt, smstate->smbase);
 
-	return rsm_enter_protected_mode(ctxt, smstate->cr0,
-					smstate->cr3, smstate->cr4);
+	r = rsm_enter_protected_mode(ctxt, smstate->cr0,
+				     smstate->cr3, smstate->cr4);
+
+	if (r != X86EMUL_CONTINUE)
+		return r;
+
+	ctxt->ops->set_int_shadow(ctxt, 0);
+	ctxt->interruptibility = (u8)smstate->int_shadow;
+
+	return X86EMUL_CONTINUE;
 }
 
 #ifdef CONFIG_X86_64
@@ -2532,6 +2540,9 @@  static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
 	rsm_load_seg_64(ctxt, &smstate->fs, VCPU_SREG_FS);
 	rsm_load_seg_64(ctxt, &smstate->gs, VCPU_SREG_GS);
 
+	ctxt->ops->set_int_shadow(ctxt, 0);
+	ctxt->interruptibility = (u8)smstate->int_shadow;
+
 	return X86EMUL_CONTINUE;
 }
 #endif
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 76c0b8e7890b5d..a7313add0f2a58 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -234,6 +234,7 @@  struct x86_emulate_ops {
 	bool (*guest_has_rdpid)(struct x86_emulate_ctxt *ctxt);
 
 	void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);
+	void (*set_int_shadow)(struct x86_emulate_ctxt *ctxt, u8 shadow);
 
 	unsigned (*get_hflags)(struct x86_emulate_ctxt *ctxt);
 	void (*exiting_smm)(struct x86_emulate_ctxt *ctxt);
@@ -518,7 +519,8 @@  struct kvm_smram_state_32 {
 	u32 reserved1[62];
 	u32 smbase;
 	u32 smm_revision;
-	u32 reserved2[5];
+	u32 reserved2[4];
+	u32 int_shadow; /* KVM extension */
 	u32 cr4; /* CR4 is not present in Intel/AMD SMRAM image */
 	u32 reserved3[5];
 
@@ -566,6 +568,7 @@  static inline void __check_smram32_offsets(void)
 	__CHECK_SMRAM32_OFFSET(smbase,		0xFEF8);
 	__CHECK_SMRAM32_OFFSET(smm_revision,	0xFEFC);
 	__CHECK_SMRAM32_OFFSET(reserved2,	0xFF00);
+	__CHECK_SMRAM32_OFFSET(int_shadow,	0xFF10);
 	__CHECK_SMRAM32_OFFSET(cr4,		0xFF14);
 	__CHECK_SMRAM32_OFFSET(reserved3,	0xFF18);
 	__CHECK_SMRAM32_OFFSET(ds,		0xFF2C);
@@ -625,7 +628,7 @@  struct kvm_smram_state_64 {
 	u64 io_restart_rsi;
 	u64 io_restart_rdi;
 	u32 io_restart_dword;
-	u32 reserved1;
+	u32 int_shadow;
 	u8 io_inst_restart;
 	u8 auto_hlt_restart;
 	u8 reserved2[6];
@@ -663,7 +666,6 @@  struct kvm_smram_state_64 {
 	u64 gprs[16]; /* GPRS in a reversed "natural" X86 order (R15/R14/../RCX/RAX.) */
 };
 
-
 static inline void __check_smram64_offsets(void)
 {
 #define __CHECK_SMRAM64_OFFSET(field, offset) \
@@ -684,7 +686,7 @@  static inline void __check_smram64_offsets(void)
 	__CHECK_SMRAM64_OFFSET(io_restart_rsi,		0xFEB0);
 	__CHECK_SMRAM64_OFFSET(io_restart_rdi,		0xFEB8);
 	__CHECK_SMRAM64_OFFSET(io_restart_dword,	0xFEC0);
-	__CHECK_SMRAM64_OFFSET(reserved1,		0xFEC4);
+	__CHECK_SMRAM64_OFFSET(int_shadow,		0xFEC4);
 	__CHECK_SMRAM64_OFFSET(io_inst_restart,		0xFEC8);
 	__CHECK_SMRAM64_OFFSET(auto_hlt_restart,	0xFEC9);
 	__CHECK_SMRAM64_OFFSET(reserved2,		0xFECA);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4e3ef63baf83df..ae4c20cec7a9fc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8041,6 +8041,11 @@  static void emulator_set_nmi_mask(struct x86_emulate_ctxt *ctxt, bool masked)
 	static_call(kvm_x86_set_nmi_mask)(emul_to_vcpu(ctxt), masked);
 }
 
+static void emulator_set_int_shadow(struct x86_emulate_ctxt *ctxt, u8 shadow)
+{
+	static_call(kvm_x86_set_interrupt_shadow)(emul_to_vcpu(ctxt), shadow);
+}
+
 static unsigned emulator_get_hflags(struct x86_emulate_ctxt *ctxt)
 {
 	return emul_to_vcpu(ctxt)->arch.hflags;
@@ -8121,6 +8126,7 @@  static const struct x86_emulate_ops emulate_ops = {
 	.guest_has_fxsr      = emulator_guest_has_fxsr,
 	.guest_has_rdpid     = emulator_guest_has_rdpid,
 	.set_nmi_mask        = emulator_set_nmi_mask,
+	.set_int_shadow      = emulator_set_int_shadow,
 	.get_hflags          = emulator_get_hflags,
 	.exiting_smm         = emulator_exiting_smm,
 	.leave_smm           = emulator_leave_smm,
@@ -9903,6 +9909,8 @@  static void enter_smm_save_state_32(struct kvm_vcpu *vcpu, struct kvm_smram_stat
 	smram->cr4 = kvm_read_cr4(vcpu);
 	smram->smm_revision = 0x00020000;
 	smram->smbase = vcpu->arch.smbase;
+
+	smram->int_shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
 }
 
 #ifdef CONFIG_X86_64
@@ -9951,6 +9959,8 @@  static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, struct kvm_smram_stat
 	enter_smm_save_seg_64(vcpu, &smram->ds, VCPU_SREG_DS);
 	enter_smm_save_seg_64(vcpu, &smram->fs, VCPU_SREG_FS);
 	enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
+
+	smram->int_shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
 }
 #endif
 
@@ -9987,6 +9997,8 @@  static void enter_smm(struct kvm_vcpu *vcpu)
 	kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
 	kvm_rip_write(vcpu, 0x8000);
 
+	static_call(kvm_x86_set_interrupt_shadow)(vcpu, 0);
+
 	cr0 = vcpu->arch.cr0 & ~(X86_CR0_PE | X86_CR0_EM | X86_CR0_TS | X86_CR0_PG);
 	static_call(kvm_x86_set_cr0)(vcpu, cr0);
 	vcpu->arch.cr0 = cr0;