diff mbox series

[3/3,V2] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device

Message ID 20210526172014.GA29007@fuller.cnet (mailing list archive)
State New, archived

Commit Message

Marcelo Tosatti May 26, 2021, 5:20 p.m. UTC
For VMX, when a vcpu enters HLT emulation, pi_pre_block will:

1) Add vcpu to per-cpu list of blocked vcpus.

2) Program the posted-interrupt descriptor "notification vector" 
to POSTED_INTR_WAKEUP_VECTOR

With interrupt remapping, a device interrupt will set the PIR bit for
the vector programmed for the device on the CPU, test-and-set the
ON bit in the posted-interrupt descriptor, and, if the ON bit was
previously clear, generate an interrupt for the notification vector.

This way, a device interrupt wakes the target CPU, which in turn wakes
up the target vcpu.
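
As an illustration only, the delivery sequence described above can be
modelled in plain userspace C. This is a sketch, not kernel code:
struct pi_desc_model and pi_model_post() are hypothetical names, the
layout is not the hardware descriptor layout, and details such as the
suppress-notification bit are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a posted-interrupt descriptor. */
struct pi_desc_model {
	_Atomic uint64_t pir[4];	/* one request bit per vector (256 total) */
	atomic_bool on;			/* outstanding-notification bit */
	uint8_t nv;			/* currently programmed notification vector */
};

/*
 * Model of posting one device interrupt for guest vector 'vector'.
 * Returns the notification vector that would be sent to the physical
 * CPU, or -1 if no notification is sent because ON was already set.
 */
static int pi_model_post(struct pi_desc_model *pi, unsigned int vector)
{
	assert(vector < 256);

	/* Set the PIR bit for the vector. */
	atomic_fetch_or(&pi->pir[vector / 64], UINT64_C(1) << (vector % 64));

	/* Test-and-set ON; notify only if it was previously clear. */
	if (atomic_exchange(&pi->on, true))
		return -1;

	return pi->nv;
}
```

In this model, the first call on a descriptor whose ON bit is clear
returns whatever nv holds at that moment; later calls return -1 until
something clears ON again.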

The problem is that pi_pre_block only programs the notification vector
if kvm_arch_has_assigned_device() is true. It's possible for the
following to happen:

1) vcpu V HLTs on pcpu P, kvm_arch_has_assigned_device is false,
notification vector is not programmed
2) device is assigned to VM
3) device interrupts vcpu V, sets ON bit
(notification vector not programmed, so pcpu P remains in idle)
4) vcpu 0 IPIs vcpu V (in guest), but since pi descriptor ON bit is set,
kvm_vcpu_kick is skipped
5) vcpu 0 busy spins on vcpu V's response for several seconds, until
RCU watchdog NMIs all vCPUs.
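
The sequence above can be replayed in a small single-threaded C model.
Everything here is a hypothetical stand-in (the model_* names, the
vector values, the woken flag); it only mirrors the ordering of steps
1-4 and is not how the kernel implements any of them.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder vector values, not the real kernel vectors. */
enum { MODEL_NOTIFY_VECTOR = 1, MODEL_WAKEUP_VECTOR = 2 };

struct vm_model   { int assigned_devices; };
struct vcpu_model { int nv; bool on; bool blocked; bool woken; };

/* Step 1: the vcpu halts; NV is reprogrammed only if a device is assigned. */
static void model_pre_block(const struct vm_model *vm, struct vcpu_model *v)
{
	v->blocked = true;
	if (vm->assigned_devices > 0)
		v->nv = MODEL_WAKEUP_VECTOR;
}

/* Step 3: a device interrupt is posted to the blocked vcpu. */
static void model_device_interrupt(struct vcpu_model *v)
{
	bool was_on = v->on;

	assert(v->blocked);
	v->on = true;
	/* Only the wakeup vector leads to the blocked vcpu being woken. */
	if (!was_on && v->nv == MODEL_WAKEUP_VECTOR)
		v->woken = true;
}

/* Step 4: an IPI from another vcpu; the kick is skipped if ON is set. */
static void model_ipi(struct vcpu_model *v)
{
	if (v->on)
		return;
	v->on = true;
	v->woken = true;	/* stands in for kvm_vcpu_kick */
}

/* Replays steps 1-4; returns true if vcpu V ends up woken. */
static bool model_run_sequence(bool device_assigned_before_halt)
{
	struct vm_model vm = {
		.assigned_devices = device_assigned_before_halt ? 1 : 0,
	};
	struct vcpu_model v = { .nv = MODEL_NOTIFY_VECTOR };

	model_pre_block(&vm, &v);	/* step 1 */
	vm.assigned_devices = 1;	/* step 2 (no-op if already assigned) */
	model_device_interrupt(&v);	/* step 3 */
	model_ipi(&v);			/* step 4 */
	return v.woken;
}
```

With the device assigned only after the halt,
model_run_sequence(false) leaves the vcpu asleep with ON set, which is
the stuck state described in step 5; model_run_sequence(true) wakes it.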

To fix this, use the start_assignment kvm_x86_ops callback to kick
vcpus out of the halt loop, so the notification vector is 
properly reprogrammed to the wakeup vector.
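
A rough userspace model of the fix, under the same caveats: the
model_* names and the unblock_req flag are hypothetical stand-ins for
the start_assignment callback and the KVM_REQ_UNBLOCK request, and the
model folds the device-count update into the callback for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder vector values, not the real kernel vectors. */
enum { MODEL_NOTIFY_VECTOR = 1, MODEL_WAKEUP_VECTOR = 2 };

struct vcpu_model { int nv; bool blocked; bool unblock_req; };

struct vm_model {
	int assigned_devices;
	struct vcpu_model *vcpus;
	size_t nr_vcpus;
};

/* Entering the halt loop: NV is switched only if a device is assigned. */
static void model_pre_block(const struct vm_model *vm, struct vcpu_model *v)
{
	v->blocked = true;
	if (vm->assigned_devices > 0)
		v->nv = MODEL_WAKEUP_VECTOR;
}

/* Leaving the halt loop: restore the normal notification vector. */
static void model_post_block(struct vcpu_model *v)
{
	v->blocked = false;
	v->nv = MODEL_NOTIFY_VECTOR;
}

/* Stand-in for the start_assignment callback: request unblock on all vcpus. */
static void model_start_assignment(struct vm_model *vm)
{
	assert(vm->vcpus != NULL || vm->nr_vcpus == 0);

	vm->assigned_devices++;
	for (size_t i = 0; i < vm->nr_vcpus; i++)
		vm->vcpus[i].unblock_req = true;
}

/*
 * One pass of the modelled halt loop: a pending request ends the block,
 * and the still-idle vcpu blocks again, this time seeing the device.
 */
static void model_halt_loop_iteration(const struct vm_model *vm,
				      struct vcpu_model *v)
{
	if (!v->blocked || !v->unblock_req)
		return;

	v->unblock_req = false;
	model_post_block(v);
	model_pre_block(vm, v);
}
```

In the model, a vcpu that halted before the assignment keeps the
normal vector until model_start_assignment() runs; after the next loop
iteration it is blocked again with the wakeup vector programmed.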

Reported-by: Pei Zhang <pezhang@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

For build error:
Reported-by: kernel test robot <lkp@intel.com>

---

v2: Add brief comment to vmx_pi_start_assignment (Peter Xu).
    Export kvm_make_all_cpus_request (kernel test robot).

Patch

Index: linux-2.6/arch/x86/kvm/vmx/posted_intr.c
===================================================================
--- linux-2.6.orig/arch/x86/kvm/vmx/posted_intr.c
+++ linux-2.6/arch/x86/kvm/vmx/posted_intr.c
@@ -238,6 +238,20 @@  bool pi_has_pending_interrupt(struct kvm
 
 
 /*
+ * Bail out of the block loop if the VM has an assigned
+ * device, but the blocking vCPU didn't reconfigure the
+ * PI.NV to the wakeup vector, i.e. the assigned device
+ * came along after the initial check in pi_pre_block().
+ */
+void vmx_pi_start_assignment(struct kvm *kvm)
+{
+	if (!irq_remapping_cap(IRQ_POSTING_CAP))
+		return;
+
+	kvm_make_all_cpus_request(kvm, KVM_REQ_UNBLOCK);
+}
+
+/*
  * pi_update_irte - set IRTE for Posted-Interrupts
  *
  * @kvm: kvm
Index: linux-2.6/arch/x86/kvm/vmx/posted_intr.h
===================================================================
--- linux-2.6.orig/arch/x86/kvm/vmx/posted_intr.h
+++ linux-2.6/arch/x86/kvm/vmx/posted_intr.h
@@ -95,5 +95,6 @@  void __init pi_init_cpu(int cpu);
 bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq,
 		   bool set);
+void vmx_pi_start_assignment(struct kvm *kvm);
 
 #endif /* __KVM_X86_VMX_POSTED_INTR_H */
Index: linux-2.6/arch/x86/kvm/vmx/vmx.c
===================================================================
--- linux-2.6.orig/arch/x86/kvm/vmx/vmx.c
+++ linux-2.6/arch/x86/kvm/vmx/vmx.c
@@ -7732,6 +7732,7 @@  static struct kvm_x86_ops vmx_x86_ops __
 	.nested_ops = &vmx_nested_ops,
 
 	.update_pi_irte = pi_update_irte,
+	.start_assignment = vmx_pi_start_assignment,
 
 #ifdef CONFIG_X86_64
 	.set_hv_timer = vmx_set_hv_timer,
Index: linux-2.6/virt/kvm/kvm_main.c
===================================================================
--- linux-2.6.orig/virt/kvm/kvm_main.c
+++ linux-2.6/virt/kvm/kvm_main.c
@@ -307,6 +307,7 @@  bool kvm_make_all_cpus_request(struct kv
 {
 	return kvm_make_all_cpus_request_except(kvm, req, NULL);
 }
+EXPORT_SYMBOL_GPL(kvm_make_all_cpus_request);
 
 #ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
 void kvm_flush_remote_tlbs(struct kvm *kvm)