Message ID | 20210510172818.025080848@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | VMX: configure posted interrupt descriptor when assigning device (v3) |
On 10/05/21 19:26, Marcelo Tosatti wrote:
> +void vmx_pi_start_assignment(struct kvm *kvm)
> +{
> +	struct kvm_vcpu *vcpu;
> +	int i;
> +
> +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> +		return;
> +
> +	/*
> +	 * Wakeup will cause the vCPU to bail out of kvm_vcpu_block() and
> +	 * go back through vcpu_block().
> +	 */
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (!kvm_vcpu_apicv_active(vcpu))
> +			continue;
> +
> +		kvm_vcpu_wake_up(vcpu);

Would you still need the check_block callback, if you also added a
kvm_make_request(KVM_REQ_EVENT)?

In fact, since this is entirely not a hot path, can you just do
kvm_make_all_cpus_request(kvm, KVM_REQ_EVENT) instead of this loop?

Thanks,

Paolo

> +	}
> +}
>
>  /*
>   * pi_update_irte - set IRTE for Posted-Interrupts
> Index: kvm/arch/x86/kvm/vmx/posted_intr.h
> ===================================================================
> --- kvm.orig/arch/x86/kvm/vmx/posted_intr.h
> +++ kvm/arch/x86/kvm/vmx/posted_intr.h
> @@ -95,5 +95,7 @@ void __init pi_init_cpu(int cpu);
>  bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu);
>  int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq,
>  		   bool set);
> +void vmx_pi_start_assignment(struct kvm *kvm);
> +int vmx_vcpu_check_block(struct kvm_vcpu *vcpu);
>
>  #endif /* __KVM_X86_VMX_POSTED_INTR_H */
> Index: kvm/arch/x86/kvm/vmx/vmx.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/vmx/vmx.c
> +++ kvm/arch/x86/kvm/vmx/vmx.c
> @@ -7727,13 +7727,13 @@ static struct kvm_x86_ops vmx_x86_ops __
>
>  	.pre_block = vmx_pre_block,
>  	.post_block = vmx_post_block,
> -	.vcpu_check_block = NULL,
> +	.vcpu_check_block = vmx_vcpu_check_block,
>
>  	.pmu_ops = &intel_pmu_ops,
>  	.nested_ops = &vmx_nested_ops,
>
>  	.update_pi_irte = pi_update_irte,
> -	.start_assignment = NULL,
> +	.start_assignment = vmx_pi_start_assignment,
>
>  #ifdef CONFIG_X86_64
>  	.set_hv_timer = vmx_set_hv_timer,
>
On Mon, May 24, 2021 at 05:55:18PM +0200, Paolo Bonzini wrote:
> On 10/05/21 19:26, Marcelo Tosatti wrote:
> > +void vmx_pi_start_assignment(struct kvm *kvm)
> > +{
> > +	struct kvm_vcpu *vcpu;
> > +	int i;
> > +
> > +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
> > +		return;
> > +
> > +	/*
> > +	 * Wakeup will cause the vCPU to bail out of kvm_vcpu_block() and
> > +	 * go back through vcpu_block().
> > +	 */
> > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > +		if (!kvm_vcpu_apicv_active(vcpu))
> > +			continue;
> > +
> > +		kvm_vcpu_wake_up(vcpu);
>
> Would you still need the check_block callback, if you also added a
> kvm_make_request(KVM_REQ_EVENT)?
>
> In fact, since this is entirely not a hot path, can you just do
> kvm_make_all_cpus_request(kvm, KVM_REQ_EVENT) instead of this loop?
>
> Thanks,
>
> Paolo

Hi Paolo,

Don't think so:

int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
{
	return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
}

static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
{
	int ret = -EINTR;
	int idx = srcu_read_lock(&vcpu->kvm->srcu);

	if (kvm_arch_vcpu_runnable(vcpu)) {
		kvm_make_request(KVM_REQ_UNHALT, vcpu); <---- don't want KVM_REQ_UNHALT
		goto out;
	}
	if (kvm_cpu_has_pending_timer(vcpu))
		goto out;
	if (signal_pending(current))
		goto out;

	ret = 0;
out:
	srcu_read_unlock(&vcpu->kvm->srcu, idx);
	return ret;
}

See previous discussion:

Date: Wed, 12 May 2021 14:41:56 +0000
From: Sean Christopherson <seanjc@google.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Peter Xu <peterx@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>,
    kvm@vger.kernel.org, Alex Williamson <alex.williamson@redhat.com>,
    Pei Zhang <pezhang@redhat.com>
Subject: Re: [patch 4/4] KVM: VMX: update vcpu posted-interrupt descriptor when assigning device

On Tue, May 11, 2021, Marcelo Tosatti wrote:
> > The KVM_REQ_UNBLOCK patch will resume execution even any such event
>
>                                               even without any such event
>
> > occuring. So the behaviour would be different from baremetal.

I agree with Marcelo, we don't want to spuriously unhalt the vCPU.  It's legal,
albeit risky, to do something like

	hlt
	/* #UD to triple fault if this CPU is awakened. */
	ud2

when offlining a CPU, in which case the spurious wake event will crash the guest.
On 24/05/21 19:53, Marcelo Tosatti wrote:
> On Mon, May 24, 2021 at 05:55:18PM +0200, Paolo Bonzini wrote:
>> On 10/05/21 19:26, Marcelo Tosatti wrote:
>>> +void vmx_pi_start_assignment(struct kvm *kvm)
>>> +{
>>> +	struct kvm_vcpu *vcpu;
>>> +	int i;
>>> +
>>> +	if (!irq_remapping_cap(IRQ_POSTING_CAP))
>>> +		return;
>>> +
>>> +	/*
>>> +	 * Wakeup will cause the vCPU to bail out of kvm_vcpu_block() and
>>> +	 * go back through vcpu_block().
>>> +	 */
>>> +	kvm_for_each_vcpu(i, vcpu, kvm) {
>>> +		if (!kvm_vcpu_apicv_active(vcpu))
>>> +			continue;
>>> +
>>> +		kvm_vcpu_wake_up(vcpu);
>>
>> Would you still need the check_block callback, if you also added a
>> kvm_make_request(KVM_REQ_EVENT)?
>>
>> In fact, since this is entirely not a hot path, can you just do
>> kvm_make_all_cpus_request(kvm, KVM_REQ_EVENT) instead of this loop?
>>
>> Thanks,
>>
>> Paolo
>
> Hi Paolo,
>
> Don't think so:
>
> static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
> {
> 	int ret = -EINTR;
> 	int idx = srcu_read_lock(&vcpu->kvm->srcu);
>
> 	if (kvm_arch_vcpu_runnable(vcpu)) {
> 		kvm_make_request(KVM_REQ_UNHALT, vcpu); <---- don't want KVM_REQ_UNHALT

UNHALT is incorrect indeed, but requests don't have to unhalt the vCPU.

This case is somewhat similar to signal_pending(), where the next
KVM_RUN ioctl resumes the halt.  It's also similar to
KVM_REQ_PENDING_TIMER.  So you can:

- rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK except in
  arch/powerpc, where instead you add KVM_REQ_PENDING_TIMER to
  arch/powerpc/include/asm/kvm_host.h

- here, you add

	if (kvm_check_request(KVM_REQ_UNBLOCK, vcpu))
		goto out;

- then vmx_pi_start_assignment only needs to

	if (!irq_remapping_cap(IRQ_POSTING_CAP))
		return;

	kvm_make_all_cpus_request(kvm, KVM_REQ_UNBLOCK);

kvm_arch_vcpu_runnable() would still return false, so the mp_state would
not change.

Paolo
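
The distinction drawn in the reply above (leave the block loop without unhalting the vCPU) can be shown with a small userspace model. This is only a sketch under stated assumptions: `toy_vcpu`, `toy_check_request()`, `toy_check_block()` and the `TOY_REQ_*` bit values are hypothetical stand-ins, not the kernel's definitions, and the model ignores SRCU, timers and signals.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request bits; the values are illustrative, not the kernel's. */
enum { TOY_REQ_UNHALT = 1u << 0, TOY_REQ_UNBLOCK = 1u << 1 };

struct toy_vcpu {
	unsigned int requests;	/* pending request bits */
	bool runnable;		/* stand-in for kvm_arch_vcpu_runnable() */
};

/* Test and clear one request bit, in the spirit of kvm_check_request(). */
bool toy_check_request(struct toy_vcpu *v, unsigned int req)
{
	if (!(v->requests & req))
		return false;
	v->requests &= ~req;
	return true;
}

/*
 * Returns -EINTR when the vCPU must leave the block loop and 0 when it
 * may keep blocking. Only a genuine wake event raises TOY_REQ_UNHALT;
 * TOY_REQ_UNBLOCK leaves the loop while the vCPU stays halted.
 */
int toy_check_block(struct toy_vcpu *v)
{
	assert(v);
	if (v->runnable) {
		v->requests |= TOY_REQ_UNHALT;
		return -EINTR;
	}
	if (toy_check_request(v, TOY_REQ_UNBLOCK))
		return -EINTR;
	return 0;
}
```

In this model a pending `TOY_REQ_UNBLOCK` makes `toy_check_block()` return `-EINTR` exactly once and never sets `TOY_REQ_UNHALT`, which is the property the thread is after: the guest-visible halt state is not disturbed.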
Index: kvm/arch/x86/kvm/vmx/posted_intr.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/posted_intr.c
+++ kvm/arch/x86/kvm/vmx/posted_intr.c
@@ -204,6 +204,32 @@ void pi_post_block(struct kvm_vcpu *vcpu
 }
 
 /*
+ * Bail out of the block loop if the VM has an assigned
+ * device, but the blocking vCPU didn't reconfigure the
+ * PI.NV to the wakeup vector, i.e. the assigned device
+ * came along after the initial check in vcpu_block().
+ */
+
+int vmx_vcpu_check_block(struct kvm_vcpu *vcpu)
+{
+	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP))
+		return 0;
+
+	if (!kvm_vcpu_apicv_active(vcpu))
+		return 0;
+
+	if (!kvm_arch_has_assigned_device(vcpu->kvm))
+		return 0;
+
+	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR)
+		return 0;
+
+	return 1;
+}
+
+/*
  * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
  */
 void pi_wakeup_handler(void)
@@ -236,6 +262,25 @@ bool pi_has_pending_interrupt(struct kvm
 		(pi_test_sn(pi_desc) && !pi_is_pir_empty(pi_desc));
 }
 
+void vmx_pi_start_assignment(struct kvm *kvm)
+{
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	if (!irq_remapping_cap(IRQ_POSTING_CAP))
+		return;
+
+	/*
+	 * Wakeup will cause the vCPU to bail out of kvm_vcpu_block() and
+	 * go back through vcpu_block().
+	 */
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_vcpu_apicv_active(vcpu))
+			continue;
+
+		kvm_vcpu_wake_up(vcpu);
+	}
+}
 
 /*
  * pi_update_irte - set IRTE for Posted-Interrupts
Index: kvm/arch/x86/kvm/vmx/posted_intr.h
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/posted_intr.h
+++ kvm/arch/x86/kvm/vmx/posted_intr.h
@@ -95,5 +95,7 @@ void __init pi_init_cpu(int cpu);
 bool pi_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq,
 		   bool set);
+void vmx_pi_start_assignment(struct kvm *kvm);
+int vmx_vcpu_check_block(struct kvm_vcpu *vcpu);
 
 #endif /* __KVM_X86_VMX_POSTED_INTR_H */
Index: kvm/arch/x86/kvm/vmx/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx/vmx.c
+++ kvm/arch/x86/kvm/vmx/vmx.c
@@ -7727,13 +7727,13 @@ static struct kvm_x86_ops vmx_x86_ops __
 
 	.pre_block = vmx_pre_block,
 	.post_block = vmx_post_block,
-	.vcpu_check_block = NULL,
+	.vcpu_check_block = vmx_vcpu_check_block,
 
 	.pmu_ops = &intel_pmu_ops,
 	.nested_ops = &vmx_nested_ops,
 
 	.update_pi_irte = pi_update_irte,
-	.start_assignment = NULL,
+	.start_assignment = vmx_pi_start_assignment,
 
 #ifdef CONFIG_X86_64
 	.set_hv_timer = vmx_set_hv_timer,
For VMX, when a vcpu enters HLT emulation, pi_pre_block will:

1) Add vcpu to per-cpu list of blocked vcpus.

2) Program the posted-interrupt descriptor "notification vector"
to POSTED_INTR_WAKEUP_VECTOR

With interrupt remapping, an interrupt will set the PIR bit for the
vector programmed for the device on the CPU, test-and-set the
ON bit on the posted interrupt descriptor, and if the ON bit is clear
generate an interrupt for the notification vector.

This way, the target CPU wakes upon a device interrupt and wakes up
the target vcpu.

The problem is that pi_pre_block only programs the notification vector
if kvm_arch_has_assigned_device() is true. It's possible for the
following to happen:

1) vcpu V HLTs on pcpu P, kvm_arch_has_assigned_device is false,
notification vector is not programmed
2) device is assigned to VM
3) device interrupts vcpu V, sets ON bit
(notification vector not programmed, so pcpu P remains idle)
4) vcpu 0 IPIs vcpu V (in guest), but since pi descriptor ON bit is set,
kvm_vcpu_kick is skipped
5) vcpu 0 busy spins on vcpu V's response for several seconds, until
RCU watchdog NMIs all vCPUs.

To fix this, use the start_assignment kvm_x86_ops callback to kick
vcpus out of the halt loop, so the notification vector is
properly reprogrammed to the wakeup vector.

Reported-by: Pei Zhang <pezhang@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
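
The five-step sequence in the changelog can be replayed with a toy userspace model of the ON bit and notification vector. This is a rough sketch under stated assumptions, not kernel code: `toy_pi_vcpu`, `toy_pre_block()`, `toy_device_interrupt()`, `toy_send_ipi()` and the two vector values are hypothetical names chosen for illustration, and the model follows the changelog's description rather than the actual VMX implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative vector numbers; assumptions, not the kernel's values. */
#define TOY_NOTIFICATION_VECTOR 0xf2
#define TOY_WAKEUP_VECTOR       0xf1

struct toy_pi_desc {
	bool on;		/* outstanding-notification (ON) bit */
	unsigned char nv;	/* notification vector */
};

struct toy_pi_vcpu {
	struct toy_pi_desc pi;
	bool blocked;		/* vCPU is halted and waiting for a wakeup */
};

/* On HLT: reprogram NV only if a device is already assigned (step 1). */
void toy_pre_block(struct toy_pi_vcpu *v, bool has_assigned_device)
{
	assert(v);
	v->blocked = true;
	if (has_assigned_device)
		v->pi.nv = TOY_WAKEUP_VECTOR;
}

/*
 * Posted device interrupt: test-and-set ON and notify only if it was
 * clear. The blocked vCPU is woken only when NV is the wakeup vector.
 * Returns true if the vCPU was woken (step 3).
 */
bool toy_device_interrupt(struct toy_pi_vcpu *v)
{
	bool was_on = v->pi.on;

	v->pi.on = true;
	if (was_on || !v->blocked || v->pi.nv != TOY_WAKEUP_VECTOR)
		return false;
	v->blocked = false;
	return true;
}

/* IPI from another vCPU: the kick is skipped when ON is already set (step 4). */
bool toy_send_ipi(struct toy_pi_vcpu *v)
{
	if (v->pi.on)
		return false;
	v->pi.on = true;
	v->blocked = false;
	return true;
}
```

Calling `toy_pre_block(&v, false)`, then `toy_device_interrupt(&v)`, then `toy_send_ipi(&v)` leaves `v.blocked` set in this model, mirroring steps 1 to 4. Calling `toy_pre_block(&v, true)` first lets the device interrupt wake the vCPU, which is the state the fix restores by forcing blocked vCPUs back through the pre-block path.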