Message ID | 20210525134321.345140341@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | VMX: configure posted interrupt descriptor when assigning device (v5) |
On Tue, May 25, 2021 at 10:41:18AM -0300, Marcelo Tosatti wrote:
> For VMX, when a vcpu enters HLT emulation, pi_post_block will:
>
> 1) Add vcpu to per-cpu list of blocked vcpus.
>
> 2) Program the posted-interrupt descriptor "notification vector"
> to POSTED_INTR_WAKEUP_VECTOR
>
> With interrupt remapping, an interrupt will set the PIR bit for the
> vector programmed for the device on the CPU, test-and-set the
> ON bit on the posted interrupt descriptor, and if the ON bit is clear
> generate an interrupt for the notification vector.
>
> This way, the target CPU wakes upon a device interrupt and wakes up
> the target vcpu.
>
> Problem is that pi_post_block only programs the notification vector
> if kvm_arch_has_assigned_device() is true. Its possible for the
> following to happen:
>
> 1) vcpu V HLTs on pcpu P, kvm_arch_has_assigned_device is false,
>    notification vector is not programmed
> 2) device is assigned to VM
> 3) device interrupts vcpu V, sets ON bit
>    (notification vector not programmed, so pcpu P remains in idle)
> 4) vcpu 0 IPIs vcpu V (in guest), but since pi descriptor ON bit is set,
>    kvm_vcpu_kick is skipped
> 5) vcpu 0 busy spins on vcpu V's response for several seconds, until
>    RCU watchdog NMIs all vCPUs.
>
> To fix this, use the start_assignment kvm_x86_ops callback to kick
> vcpus out of the halt loop, so the notification vector is
> properly reprogrammed to the wakeup vector.
>
> Reported-by: Pei Zhang <pezhang@redhat.com>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
>
> Index: kvm/arch/x86/kvm/vmx/posted_intr.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/vmx/posted_intr.c
> +++ kvm/arch/x86/kvm/vmx/posted_intr.c
> @@ -236,6 +236,13 @@ bool pi_has_pending_interrupt(struct kvm
>  		(pi_test_sn(pi_desc) && !pi_is_pir_empty(pi_desc));
>  }
>
> +void vmx_pi_start_assignment(struct kvm *kvm)
> +{
> +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> +		return;
> +
> +	kvm_make_all_cpus_request(kvm, KVM_REQ_UNBLOCK);

Shall we add a simple comment block explaining why we need this?

> +}

The patch itself looks right to me.

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks,
Hi Marcelo,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on kvm/queue]
[also build test ERROR on vhost/linux-next v5.13-rc3 next-20210525]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Marcelo-Tosatti/VMX-configure-posted-interrupt-descriptor-when-assigning-device-v5/20210525-215604
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
config: x86_64-rhel-8.3 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
# https://github.com/0day-ci/linux/commit/e515e68f330d3787af0952dcfb3e0fbf2d9b9f06
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Marcelo-Tosatti/VMX-configure-posted-interrupt-descriptor-when-assigning-device-v5/20210525-215604
git checkout e515e68f330d3787af0952dcfb3e0fbf2d9b9f06
# save the attached .config to linux build tree
make W=1 ARCH=x86_64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>, old ones prefixed by <<):
>> ERROR: modpost: "kvm_make_all_cpus_request" [arch/x86/kvm/kvm-intel.ko] undefined!
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Index: kvm/arch/x86/kvm/vmx/posted_intr.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/posted_intr.c
+++ kvm/arch/x86/kvm/vmx/posted_intr.c
@@ -236,6 +236,13 @@ bool pi_has_pending_interrupt(struct kvm
 		(pi_test_sn(pi_desc) && !pi_is_pir_empty(pi_desc));
 }
 
+void vmx_pi_start_assignment(struct kvm *kvm)
+{
+	if (!irq_remapping_cap(IRQ_POSTING_CAP))
+		return;
+
+	kvm_make_all_cpus_request(kvm, KVM_REQ_UNBLOCK);
+}
 
 /*
  * pi_update_irte - set IRTE for Posted-Interrupts
Index: kvm/arch/x86/kvm/vmx/posted_intr.h
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/posted_intr.h
+++ kvm/arch/x86/kvm/vmx/posted_intr.h
@@ -95,5 +95,6 @@ void __init pi_init_cpu(int cpu);
 bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq,
 		   bool set);
+void vmx_pi_start_assignment(struct kvm *kvm);
 
 #endif /* __KVM_X86_VMX_POSTED_INTR_H */
Index: kvm/arch/x86/kvm/vmx/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/vmx.c
+++ kvm/arch/x86/kvm/vmx/vmx.c
@@ -7732,6 +7732,7 @@ static struct kvm_x86_ops vmx_x86_ops __
 	.nested_ops = &vmx_nested_ops,
 
 	.update_pi_irte = pi_update_irte,
+	.start_assignment = vmx_pi_start_assignment,
 
 #ifdef CONFIG_X86_64
 	.set_hv_timer = vmx_set_hv_timer,
For VMX, when a vcpu enters HLT emulation, pi_post_block will:

1) Add vcpu to per-cpu list of blocked vcpus.

2) Program the posted-interrupt descriptor "notification vector"
to POSTED_INTR_WAKEUP_VECTOR

With interrupt remapping, an interrupt will set the PIR bit for the
vector programmed for the device on the CPU, test-and-set the
ON bit on the posted interrupt descriptor, and if the ON bit is clear
generate an interrupt for the notification vector.

This way, the target CPU wakes upon a device interrupt and wakes up
the target vcpu.

Problem is that pi_post_block only programs the notification vector
if kvm_arch_has_assigned_device() is true. It's possible for the
following to happen:

1) vcpu V HLTs on pcpu P, kvm_arch_has_assigned_device is false,
   notification vector is not programmed
2) device is assigned to VM
3) device interrupts vcpu V, sets ON bit
   (notification vector not programmed, so pcpu P remains in idle)
4) vcpu 0 IPIs vcpu V (in guest), but since pi descriptor ON bit is set,
   kvm_vcpu_kick is skipped
5) vcpu 0 busy spins on vcpu V's response for several seconds, until
   RCU watchdog NMIs all vCPUs.

To fix this, use the start_assignment kvm_x86_ops callback to kick
vcpus out of the halt loop, so the notification vector is
properly reprogrammed to the wakeup vector.

Reported-by: Pei Zhang <pezhang@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
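
For readers who want to trace the five-step sequence in the changelog, here is a rough userspace model of it. This is only a sketch and not kernel code: every name in it (pi_desc_model, vcpu_model, model_block, simulate_race and so on) is hypothetical, the vector values are placeholders, the PIR bitmap is collapsed into a single flag, and the "kick out of the halt loop" is modelled simply as the vcpu blocking again once the device is assigned.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two notification vectors; the values
 * are placeholders and not the kernel's definitions. */
enum { MODEL_NV_NOTIFICATION = 1, MODEL_NV_WAKEUP = 2 };

/* Simplified posted-interrupt descriptor (illustration only). */
struct pi_desc_model {
	bool on;          /* outstanding-notification (ON) bit */
	int nv;           /* notification vector */
	bool pir_pending; /* whole PIR bitmap collapsed into one flag */
};

struct vcpu_model {
	struct pi_desc_model pi;
	bool blocked;     /* halted and waiting for a wakeup */
};

/* Step 1: the vcpu blocks. As the changelog describes, the wakeup
 * vector is only programmed when a device is already assigned. */
static void model_block(struct vcpu_model *v, bool has_assigned_device)
{
	v->blocked = true;
	if (has_assigned_device)
		v->pi.nv = MODEL_NV_WAKEUP;
}

/* Assumed model of the fix: the blocked vcpu is kicked out of the halt
 * loop and blocks again, this time seeing the assigned device. */
static void model_start_assignment(struct vcpu_model *v)
{
	if (v->blocked)
		model_block(v, true);
}

/* Step 3: the device posts an interrupt. It sets PIR, test-and-sets ON,
 * and notifies only if ON was clear. The notification wakes the vcpu
 * only when the wakeup vector was programmed. */
static void model_device_interrupt(struct vcpu_model *v)
{
	bool was_on = v->pi.on;

	v->pi.pir_pending = true;
	v->pi.on = true;
	if (!was_on && v->pi.nv == MODEL_NV_WAKEUP)
		v->blocked = false;
}

/* Step 4: an IPI from another vcpu. The kick is skipped when ON is
 * already set. */
static void model_ipi(struct vcpu_model *v)
{
	bool was_on = v->pi.on;

	v->pi.pir_pending = true;
	v->pi.on = true;
	if (!was_on)
		v->blocked = false;
}

/* Runs steps 1-4 and returns true if vcpu V is still blocked at the
 * end, which is the situation where vcpu 0 would spin (step 5). */
bool simulate_race(bool apply_fix)
{
	struct vcpu_model v = {
		.pi = { .on = false, .nv = MODEL_NV_NOTIFICATION,
			.pir_pending = false },
		.blocked = false,
	};

	model_block(&v, false);        /* 1) HLT with no assigned device */
	if (apply_fix)                 /* 2) device is assigned to the VM */
		model_start_assignment(&v);
	model_device_interrupt(&v);    /* 3) device interrupt sets ON */
	model_ipi(&v);                 /* 4) IPI finds ON already set */
	return v.blocked;
}
```

In this model, simulate_race(false) leaves vcpu V blocked with ON set, while simulate_race(true) lets the device interrupt wake it, because the vcpu re-blocked with the wakeup vector programmed.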