@@ -70,13 +70,10 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
/*
* If the vCPU wasn't on the wakeup list and wasn't migrated, then the
* full update can be skipped as neither the vector nor the destination
- * needs to be changed.
+ * needs to be changed. Clear SN even if there is no assigned device,
+ * again for simplicity.
*/
if (pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR && vcpu->cpu == cpu) {
- /*
- * Clear SN if it was set due to being preempted. Again, do
- * this even if there is no assigned device for simplicity.
- */
if (pi_test_and_clear_sn(pi_desc))
goto after_clear_sn;
return;
@@ -200,15 +197,22 @@ void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
if (!vmx_needs_pi_wakeup(vcpu))
return;
- if (kvm_vcpu_is_blocking(vcpu) && !vmx_interrupt_blocked(vcpu))
+ /*
+ * If the vCPU is blocking with IRQs enabled and ISN'T being preempted,
+ * enable the wakeup handler so that notification IRQ wakes the vCPU as
+ * expected. There is no need to enable the wakeup handler if the vCPU
+ * is preempted between setting its wait state and manually scheduling
+ * out, as the task is still runnable, i.e. doesn't need a wake event
+ * from KVM to be scheduled in.
+ *
+ * If the wakeup handler isn't being enabled, Suppress Notifications as
+ * the cost of propagating PIR.IRR to PID.ON is negligible compared to
+ * the cost of a spurious IRQ, and vCPU put/load is a slow path.
+ */
+ if (!vcpu->preempted && kvm_vcpu_is_blocking(vcpu) &&
+ !vmx_interrupt_blocked(vcpu))
pi_enable_wakeup_handler(vcpu);
-
- /*
- * Set SN when the vCPU is preempted. Note, the vCPU can both be seen
- * as blocking and preempted, e.g. if it's preempted between setting
- * its wait state and manually scheduling out.
- */
- if (vcpu->preempted)
+ else
pi_set_sn(pi_desc);
}
Suppress posted interrupt notifications (set PID.SN=1) whenever the vCPU is put, i.e. unloaded, not just when the vCPU is preempted, as KVM doesn't do anything in response to a notification IRQ that arrives in the host, nor does KVM rely on the Outstanding Notification (PID.ON) flag when the vCPU is unloaded. And, the cost of scanning the PIR to manually set PID.ON when loading the vCPU is quite small, especially relative to the cost of loading (and unloading) a vCPU. On the flip side, leaving SN clear means a notification for the vCPU will result in a spurious IRQ for the pCPU, even if vCPU task is scheduled out, running in userspace, etc. Even worse, if the pCPU is running a different vCPU, the spurious IRQ could trigger posted interrupt processing for the wrong vCPU, which is technically a violation of the architecture, as setting bits in PIR aren't supposed to be propagated to the vIRR until a notification IRQ is received. The saving grace of the current behavior is that hardware sends notification interrupts if and only if PID.ON=0, i.e. only the first posted interrupt for a vCPU will trigger a spurious IRQ (for each window where the vCPU is unloaded). Ideally, KVM would suppress notifications before enabling IRQs in the VM-Exit, but KVM relies on PID.ON as an indicator that there is a posted interrupt pending in PIR, e.g. in vmx_sync_pir_to_irr(), and sadly there is no way to ask hardware to set PID.ON, but not generate an interrupt. That could be solved by using pi_has_pending_interrupt() instead of checking only PID.ON, but it's not at all clear that would be a performance win, as KVM would end up scanning the entire PIR whenever an interrupt isn't pending. And long term, the spurious IRQ window, i.e. where a vCPU is loaded with IRQs enabled, can effectively be made smaller for hot paths by moving performance critical VM-Exit handlers into the fastpath, i.e. by never enabling IRQs for hot path VM-Exits. Signed-off-by: Sean Christopherson <seanjc@google.com> --- arch/x86/kvm/vmx/posted_intr.c | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-)