diff mbox

KVM: VMX: require virtual NMI support

Message ID bc63c9f8-1090-9596-fe9d-49ecf4e60a49@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Paolo Bonzini Oct. 25, 2017, 8:40 a.m. UTC
On 25/10/2017 10:12, Fabian Grünbichler wrote:
> any update on this? 4.14 is soon out the door, and 4.13 with the
> breaking change just got released as part of Ubuntu 17.10 / Artful..

I had posted a patch on October 10th, but no one replied with test results.

Here is it again:

Comments

Fabian Grünbichler Oct. 25, 2017, 9:07 a.m. UTC | #1
On Wed, Oct 25, 2017 at 10:40:11AM +0200, Paolo Bonzini wrote:
> On 25/10/2017 10:12, Fabian Grünbichler wrote:
> > any update on this? 4.14 is soon out the door, and 4.13 with the
> > breaking change just got released as part of Ubuntu 17.10 / Artful..
> 
> I had posted a patch on October 10th, but no one replied with test results.
> 

sorry, missed the (second) thread.

currently building, will get back with test results ASAP.
Fabian Grünbichler Oct. 27, 2017, 8:38 a.m. UTC | #2
On Wed, Oct 25, 2017 at 11:07:54AM +0200, Fabian Grünbichler wrote:
> On Wed, Oct 25, 2017 at 10:40:11AM +0200, Paolo Bonzini wrote:
> > On 25/10/2017 10:12, Fabian Grünbichler wrote:
> > > any update on this? 4.14 is soon out the door, and 4.13 with the
> > > breaking change just got released as part of Ubuntu 17.10 / Artful..
> > 
> > I had posted a patch on October 10th, but no one replied with test results.
> > 
> 
> sorry, missed the (second) thread.
> 
> currently building, will get back with test results ASAP.

so, while we haven't found any affected hardware in our test lab (not
even in the to-be-discarded pile), feedback from users has been positive
across the board for the following CPUs which were not able to use KVM
without the patch:

- Intel Xeon 3040 (Conroe)
- Intel Xeon 5050 (Dempsey)
- Intel Xeon 5130 (Woodcrest)
- Intel Xeon E3-1256L V2 (Ivy Bridge)
- Intel Core 2 Duo T5600 (Merom)

with no negative reports so far[1]. if you want more detailed
information like vmxcap or cpuinfo, those can probably be requested.

the kernel used is based on Ubuntu's 4.13.0-16.19 kernel, which is based
on 4.13.4.

I'll monitor the thread for more feedback one way or another, and
forward any problems that seem relevant.

thanks!
Paolo Bonzini Oct. 30, 2017, 8:34 a.m. UTC | #3
On 27/10/2017 10:38, Fabian Grünbichler wrote:
> On Wed, Oct 25, 2017 at 11:07:54AM +0200, Fabian Grünbichler wrote:
>> On Wed, Oct 25, 2017 at 10:40:11AM +0200, Paolo Bonzini wrote:
>>> On 25/10/2017 10:12, Fabian Grünbichler wrote:
>>>> any update on this? 4.14 is soon out the door, and 4.13 with the
>>>> breaking change just got released as part of Ubuntu 17.10 / Artful..
>>>
>>> I had posted a patch on October 10th, but no one replied with test results.
>>>
>>
>> sorry, missed the (second) thread.
>>
>> currently building, will get back with test results ASAP.
> 
> so, while we haven't found any affected hardware in our test lab (not
> even in the to-be-discarded pile), feedback from users has been positive
> across the board for the following CPUs which were not able to use KVM
> without the patch:
> 
> - Intel Xeon 3040 (Conroe)
> - Intel Xeon 5050 (Dempsey)
> - Intel Xeon 5130 (Woodcrest)
> - Intel Xeon E3-1256L V2 (Ivy Bridge)

Ivy Bridge definitely worked without the patch.

Can you send me a link to the exact patch you applied (pastebin or
something like that)?

Thanks,

Paolo

> - Intel Core 2 Duo T5600 (Merom)
> 
> with no negative reports so far[1]. if you want more detailed
> information like vmxcap or cpuinfo, those can probably be requested.
> 
> the kernel used is based on Ubuntu's 4.13.0-16.19 kernel, which is based
> on 4.13.4.
> 
> I'll monitor the thread for more feedback one way or another, and
> forward any problems that seem relevant.
> 
> thanks!
>
Fabian Grünbichler Oct. 30, 2017, 8:52 a.m. UTC | #4
On Mon, Oct 30, 2017 at 09:34:00AM +0100, Paolo Bonzini wrote:
> On 27/10/2017 10:38, Fabian Grünbichler wrote:
> > On Wed, Oct 25, 2017 at 11:07:54AM +0200, Fabian Grünbichler wrote:
> >> On Wed, Oct 25, 2017 at 10:40:11AM +0200, Paolo Bonzini wrote:
> >>> On 25/10/2017 10:12, Fabian Grünbichler wrote:
> >>>> any update on this? 4.14 is soon out the door, and 4.13 with the
> >>>> breaking change just got released as part of Ubuntu 17.10 / Artful..
> >>>
> >>> I had posted a patch on October 10th, but no one replied with test results.
> >>>
> >>
> >> sorry, missed the (second) thread.
> >>
> >> currently building, will get back with test results ASAP.
> > 
> > so, while we haven't found any affected hardware in our test lab (not
> > even in the to-be-discarded pile), feedback from users has been positive
> > across the board for the following CPUs which were not able to use KVM
> > without the patch:
> > 
> > - Intel Xeon 3040 (Conroe)
> > - Intel Xeon 5050 (Dempsey)
> > - Intel Xeon 5130 (Woodcrest)
> > - Intel Xeon E3-1256L V2 (Ivy Bridge)
> 
> Ivy Bridge definitely worked without the patch.
> 
> Can you send me a link to the exact patch you applied (pastebin or
> something like that)?
> 

the exact wording was the following, so it is possible that that user
confused an unrelated (kernel or other) issue with the one at hand:

-----%<-----

After rolling back the 5.1 update, and staying on pve 5.0, testing the
kernel posted above seams to work. However, I will be staying with the
4.10.17 kernel for now (and pve 5.0). 

-----%<-----

the switch from 4.10 based to 4.13 based coincides with a minor release
unfortunately. like I said, we don't have any test HW available, so we
have to rely on user reports.

there has been no further feedback so far.

patch is available at http://sprunge.us/ULML
diff mbox

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c6ef2940119b..c29bf36485fd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -200,6 +200,10 @@  struct loaded_vmcs {
 	int cpu;
 	bool launched;
 	bool nmi_known_unmasked;
+	/* Support for vnmi-less CPUs */
+	int soft_vnmi_blocked;
+	ktime_t entry_time;
+	s64 vnmi_blocked_time;
 	struct list_head loaded_vmcss_on_cpu_link;
 };
 
@@ -1288,6 +1292,11 @@  static inline bool cpu_has_vmx_invpcid(void)
 		SECONDARY_EXEC_ENABLE_INVPCID;
 }
 
+static inline bool cpu_has_virtual_nmis(void)
+{
+	return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS;
+}
+
 static inline bool cpu_has_vmx_wbinvd_exit(void)
 {
 	return vmcs_config.cpu_based_2nd_exec_ctrl &
@@ -3678,9 +3687,9 @@  static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 				&_vmexit_control) < 0)
 		return -EIO;
 
-	min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING |
-		PIN_BASED_VIRTUAL_NMIS;
-	opt = PIN_BASED_POSTED_INTR | PIN_BASED_VMX_PREEMPTION_TIMER;
+	min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING;
+	opt = PIN_BASED_VIRTUAL_NMIS | PIN_BASED_POSTED_INTR |
+		 PIN_BASED_VMX_PREEMPTION_TIMER;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
 				&_pin_based_exec_control) < 0)
 		return -EIO;
@@ -5531,7 +5540,8 @@  static void enable_irq_window(struct kvm_vcpu *vcpu)
 
 static void enable_nmi_window(struct kvm_vcpu *vcpu)
 {
-	if (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & GUEST_INTR_STATE_STI) {
+	if (!cpu_has_virtual_nmis() ||
+	    vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & GUEST_INTR_STATE_STI) {
 		enable_irq_window(vcpu);
 		return;
 	}
@@ -5571,6 +5581,19 @@  static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	if (!cpu_has_virtual_nmis()) {
+		/*
+		 * Tracking the NMI-blocked state in software is built upon
+		 * finding the next open IRQ window. This, in turn, depends on
+		 * well-behaving guests: They have to keep IRQs disabled at
+		 * least as long as the NMI handler runs. Otherwise we may
+		 * cause NMI nesting, maybe breaking the guest. But as this is
+		 * highly unlikely, we can live with the residual risk.
+		 */
+		vmx->loaded_vmcs->soft_vnmi_blocked = 1;
+		vmx->loaded_vmcs->vnmi_blocked_time = 0;
+	}
+
 	++vcpu->stat.nmi_injections;
 	vmx->loaded_vmcs->nmi_known_unmasked = false;
 
@@ -5589,6 +5612,8 @@  static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	bool masked;
 
+	if (!cpu_has_virtual_nmis())
+		return vmx->loaded_vmcs->soft_vnmi_blocked;
 	if (vmx->loaded_vmcs->nmi_known_unmasked)
 		return false;
 	masked = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & GUEST_INTR_STATE_NMI;
@@ -5600,13 +5625,20 @@  static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-	vmx->loaded_vmcs->nmi_known_unmasked = !masked;
-	if (masked)
-		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
-			      GUEST_INTR_STATE_NMI);
-	else
-		vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
-				GUEST_INTR_STATE_NMI);
+	if (!cpu_has_virtual_nmis()) {
+		if (vmx->loaded_vmcs->soft_vnmi_blocked != masked) {
+			vmx->loaded_vmcs->soft_vnmi_blocked = masked;
+			vmx->loaded_vmcs->vnmi_blocked_time = 0;
+		}
+	} else {
+		vmx->loaded_vmcs->nmi_known_unmasked = !masked;
+		if (masked)
+			vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+				      GUEST_INTR_STATE_NMI);
+		else
+			vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
+					GUEST_INTR_STATE_NMI);
+	}
 }
 
 static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
@@ -5614,6 +5646,10 @@  static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
 	if (to_vmx(vcpu)->nested.nested_run_pending)
 		return 0;
 
+	if (!cpu_has_virtual_nmis() &&
+	    to_vmx(vcpu)->loaded_vmcs->soft_vnmi_blocked)
+		return 0;
+
 	return	!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
 		  (GUEST_INTR_STATE_MOV_SS | GUEST_INTR_STATE_STI
 		   | GUEST_INTR_STATE_NMI));
@@ -6341,6 +6377,7 @@  static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	 * AAK134, BY25.
 	 */
 	if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
+			cpu_has_virtual_nmis() &&
 			(exit_qualification & INTR_INFO_UNBLOCK_NMI))
 		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
 
@@ -6813,7 +6850,7 @@  static struct loaded_vmcs *nested_get_current_vmcs02(struct vcpu_vmx *vmx)
 	}
 
 	/* Create a new VMCS */
-	item = kmalloc(sizeof(struct vmcs02_list), GFP_KERNEL);
+	item = kzalloc(sizeof(struct vmcs02_list), GFP_KERNEL);
 	if (!item)
 		return NULL;
 	item->vmcs02.vmcs = alloc_vmcs();
@@ -7830,6 +7867,7 @@  static int handle_pml_full(struct kvm_vcpu *vcpu)
 	 * "blocked by NMI" bit has to be set before next VM entry.
 	 */
 	if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
+			cpu_has_virtual_nmis() &&
 			(exit_qualification & INTR_INFO_UNBLOCK_NMI))
 		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
 				GUEST_INTR_STATE_NMI);
@@ -8547,6 +8585,25 @@  static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 		return 0;
 	}
 
+	if (unlikely(!cpu_has_virtual_nmis() &&
+		     vmx->loaded_vmcs->soft_vnmi_blocked)) {
+		if (vmx_interrupt_allowed(vcpu)) {
+			vmx->loaded_vmcs->soft_vnmi_blocked = 0;
+		} else if (vmx->loaded_vmcs->vnmi_blocked_time > 1000000000LL &&
+			   vcpu->arch.nmi_pending) {
+			/*
+			 * This CPU don't support us in finding the end of an
+			 * NMI-blocked window if the guest runs with IRQs
+			 * disabled. So we pull the trigger after 1 s of
+			 * futile waiting, but inform the user about this.
+			 */
+			printk(KERN_WARNING "%s: Breaking out of NMI-blocked "
+			       "state on VCPU %d after 1 s timeout\n",
+			       __func__, vcpu->vcpu_id);
+			vmx->loaded_vmcs->soft_vnmi_blocked = 0;
+		}
+	}
+
 	if (exit_reason < kvm_vmx_max_exit_handlers
 	    && kvm_vmx_exit_handlers[exit_reason])
 		return kvm_vmx_exit_handlers[exit_reason](vcpu);
@@ -8830,33 +8887,38 @@  static void vmx_recover_nmi_blocking(struct vcpu_vmx *vmx)
 
 	idtv_info_valid = vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 
-	if (vmx->loaded_vmcs->nmi_known_unmasked)
-		return;
-	/*
-	 * Can't use vmx->exit_intr_info since we're not sure what
-	 * the exit reason is.
-	 */
-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-	unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
-	vector = exit_intr_info & INTR_INFO_VECTOR_MASK;
-	/*
-	 * SDM 3: 27.7.1.2 (September 2008)
-	 * Re-set bit "block by NMI" before VM entry if vmexit caused by
-	 * a guest IRET fault.
-	 * SDM 3: 23.2.2 (September 2008)
-	 * Bit 12 is undefined in any of the following cases:
-	 *  If the VM exit sets the valid bit in the IDT-vectoring
-	 *   information field.
-	 *  If the VM exit is due to a double fault.
-	 */
-	if ((exit_intr_info & INTR_INFO_VALID_MASK) && unblock_nmi &&
-	    vector != DF_VECTOR && !idtv_info_valid)
-		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
-			      GUEST_INTR_STATE_NMI);
-	else
-		vmx->loaded_vmcs->nmi_known_unmasked =
-			!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO)
-			  & GUEST_INTR_STATE_NMI);
+	if (cpu_has_virtual_nmis()) {
+		if (vmx->loaded_vmcs->nmi_known_unmasked)
+			return;
+		/*
+		 * Can't use vmx->exit_intr_info since we're not sure what
+		 * the exit reason is.
+		 */
+		exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+		unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
+		vector = exit_intr_info & INTR_INFO_VECTOR_MASK;
+		/*
+		 * SDM 3: 27.7.1.2 (September 2008)
+		 * Re-set bit "block by NMI" before VM entry if vmexit caused by
+		 * a guest IRET fault.
+		 * SDM 3: 23.2.2 (September 2008)
+		 * Bit 12 is undefined in any of the following cases:
+		 *  If the VM exit sets the valid bit in the IDT-vectoring
+		 *   information field.
+		 *  If the VM exit is due to a double fault.
+		 */
+		if ((exit_intr_info & INTR_INFO_VALID_MASK) && unblock_nmi &&
+		    vector != DF_VECTOR && !idtv_info_valid)
+			vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+				      GUEST_INTR_STATE_NMI);
+		else
+			vmx->loaded_vmcs->nmi_known_unmasked =
+				!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO)
+				  & GUEST_INTR_STATE_NMI);
+	} else if (unlikely(vmx->loaded_vmcs->soft_vnmi_blocked))
+		vmx->loaded_vmcs->vnmi_blocked_time +=
+			ktime_to_ns(ktime_sub(ktime_get(),
+					      vmx->loaded_vmcs->entry_time));
 }
 
 static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
@@ -8973,6 +9035,11 @@  static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long debugctlmsr, cr3, cr4;
 
+	/* Record the guest's net vcpu time for enforced NMI injections. */
+	if (unlikely(!cpu_has_virtual_nmis() &&
+		     vmx->loaded_vmcs->soft_vnmi_blocked))
+		vmx->loaded_vmcs->entry_time = ktime_get();
+
 	/* Don't enter VMX if guest state is invalid, let the exit handler
 	   start emulation until we arrive back to a valid state */
 	if (vmx->emulation_required)