Message ID | 1440676282-2152-1-git-send-email-p.fedin@samsung.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Aug 27, 2015 at 02:51:22PM +0300, Pavel Fedin wrote:
> Commit 71760950bf3dc796e5e53ea3300dec724a09f593
> ("arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn") introduced
> vgic_queue_irq_to_lr() function with additional vgic_dist_irq_is_pending()
> check before setting LR_STATE_PENDING bit. In some cases it started
> causing the following situation if the userland quickly drops the IRQ back
> to inactive state for some reason:
> 1. Userland injects an IRQ with level == 1, this ends up in
> vgic_update_irq_pending(), which in turn calls vgic_dist_irq_set_pending()
> for this IRQ.
> 2. vCPU gets kicked. But kernel does not manage to reschedule it quickly
> (!!!)
> 3. Userland quickly resets the IRQ to level == 0. vgic_update_irq_pending()
> in this case will call vgic_dist_irq_clear_pending() and reset the
> pending flag.
> 4. vCPU finally wakes up. It successfully rolls through
> __kvm_vgic_flush_hwstate(), which populates vGIC registers. However,
> since neither pending nor active flags are now set for this IRQ,
> vgic_queue_irq_to_lr() does not set any state bits on this LR at all.
> Since this is level-sensitive IRQ, we end up in LR containing only
> LR_EOI_INT bit, causing unnecessary immediate exit from the guest.
>
> This patch fixes the problem by adding forgotten vgic_cpu_irq_clear().
> This causes the IRQ not to be included into any lists, if it has been
> picked up after getting dropped to inactive level. Since this is a
> level-sensitive IRQ, this is correct behavior.
>
> The bug was caught on ARM64 kernel v4.1.6, running qemu "virt" guest,
> where it was caused by emulated pl011.

It's a bit weird to just send this as a new patch without replying to my
mail from yesterday with feedback, explaining changes from what I did etc.

Anyway.
>
> Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
> ---
>  virt/kvm/arm/vgic.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 34dad3c..bf155e3 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1111,7 +1111,8 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>  		kvm_debug("Set active, clear distributor: 0x%x\n", vlr.state);
>  		vgic_irq_clear_active(vcpu, irq);
>  		vgic_update_state(vcpu->kvm);
> -	} else if (vgic_dist_irq_is_pending(vcpu, irq)) {
> +	} else {
> +		WARN_ON(!vgic_dist_irq_is_pending(vcpu, irq));
>  		vlr.state |= LR_STATE_PENDING;
>  		kvm_debug("Set pending: 0x%x\n", vlr.state);
>  	}
> @@ -1567,8 +1568,10 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  	} else {
>  		if (level_triggered) {
>  			vgic_dist_irq_clear_level(vcpu, irq_num);
> -			if (!vgic_dist_irq_soft_pend(vcpu, irq_num))
> +			if (!vgic_dist_irq_soft_pend(vcpu, irq_num)) {
>  				vgic_dist_irq_clear_pending(vcpu, irq_num);
> +				vgic_cpu_irq_clear(vcpu, irq_num);

I think you're missing a potential change to the irq_pending_on_cpu
field here, which you have to compute by calling vgic_update_state()
like we do elsewhere when we change status bits (note that this is
different from the incorrect approach I suggested yesterday where we
always just clear the bit for that vcpu).

> +			}
>  		}
>
>  		ret = false;
> --
> 2.4.4
>

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hello!

> It's a bit weird to just send this as a new patch without replying to my
> mail from yesterday with feedback

Sorry. But the changes are actually minimal, and I remember that I replied to you with the promise of testing your suggestion. So, done, works fine. :)

> I think you're missing a potential change to the irq_pending_on_cpu
> field here, which you have to compute by calling vgic_update_state()
> like we do elsewhere when we change status bits

I have just checked this. vgic_update_state() never resets this bit. This bit is reset only in __kvm_vgic_flush_hwstate(), and only if we have completely consumed everything. I have followed through the code, and it looks like it's perfectly safe to have this bit set while nothing is actually pending. Following __kvm_vgic_flush_hwstate(), having this bit cleared is actually a shorthand for "no interrupt is pending at all". If it is set without any interrupt actually being pending (this ends up with pa_percpu and pa_shared being all zeroes), all three for_each_set_bit() loops will just do nothing, and we still get to the "epilog:" label, just after a slightly longer check. And, since we are here, the guest has already been disturbed.

> different from the incorrect approach I suggested yesterday where we
> always just clear the bit for that vcpu).

Yes, it is an extremely bad idea to clear it, because this bit summarizes all interrupts for this vcpu, and clearing it means that we are going to lose everything. An alternative would be: clear the bit, THEN call vgic_update_state(), which would set it back if necessary. But is this extra bit of complexity worth anything, given the paragraph above?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
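Pavel's argument that a stale summary bit is harmless can be illustrated with a small user-space model. This is a hypothetical sketch: struct model_vcpu, model_flush_hwstate() and the single 64-bit pending word are stand-ins invented for illustration, not the kernel's vgic.c data structures, and the model assumes every pending IRQ fits in a list register. It shows that a summary bit left set with nothing pending behind it only costs an empty scan before the epilog, after which the flush clears the stale bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one vCPU with up to 64 IRQs (not kernel code). */
struct model_vcpu {
	bool pending_on_cpu; /* stands in for this vCPU's bit in irq_pending_on_cpu */
	uint64_t pending;    /* stands in for pa_percpu and pa_shared combined */
};

/* Returns the number of IRQs that would be queued into list registers. */
static int model_flush_hwstate(struct model_vcpu *v)
{
	int queued = 0;

	/* Fast path: a clear summary bit means "no interrupt is pending at all". */
	if (!v->pending_on_cpu)
		goto epilog;

	/* Stands in for the for_each_set_bit() loops; finds nothing if the
	 * summary bit was stale. */
	for (int i = 0; i < 64; i++) {
		if (v->pending & (UINT64_C(1) << i))
			queued++;
	}

	/* This model never runs out of list registers, so everything was
	 * consumed and the summary bit is dropped. */
	v->pending_on_cpu = false;
epilog:
	return queued;
}
```

With a stale bit the function returns 0 and leaves the summary bit cleared; with real pending IRQs it queues them as usual. What the model cannot show is the cost outside the flush path, which is where the rest of the thread goes.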
On Fri, Aug 28, 2015 at 12:11:17PM +0300, Pavel Fedin wrote:
> Hello!
>
> > It's a bit weird to just send this as a new patch without replying to my
> > mail from yesterday with feedback
>
> Sorry. But the changes are actually minimal, and I remember that I replied to you with the promise of
> testing your suggestion. So, done, works fine. :)
>
> > I think you're missing a potential change to the irq_pending_on_cpu
> > field here, which you have to compute by calling vgic_update_state()
> > like we do elsewhere when we change status bits
>
> I have just checked this. vgic_update_state() never resets this bit. This bit is reset only in
> __kvm_vgic_flush_hwstate(), and only if we have completely consumed everything. I have followed
> through the code, and it looks like it's perfectly safe to have this bit set while nothing is actually
> pending. Following __kvm_vgic_flush_hwstate(), having this bit cleared is actually a shorthand for
> "no interrupt is pending at all". If it is set without any interrupt actually being pending (this
> ends up with pa_percpu and pa_shared being all zeroes), all three for_each_set_bit() loops will just
> do nothing, and we still get to the "epilog:" label, just after a slightly longer check. And, since we
> are here, the guest has already been disturbed.

OK, it looks like it is functionally correct, but I'm not thrilled about us setting the state in the VGIC to "something is pending on the CPU" when really nothing is. In that case, we should explicitly mark the bit as a hint and not rely on its correctness, and your patch should explain this in vgic_update_irq_pending().

The alternative is to just call compute_pending, which does nothing more than a few bitwise and/or operations plus a couple of handfuls of load/stores on this IRQ injection path, so I don't see the problem doing this. Does the code look awful? If so, why?

> > different from the incorrect approach I suggested yesterday where we
> > always just clear the bit for that vcpu).
>
> Yes, it is an extremely bad idea to clear it, because this bit summarizes all interrupts for this vcpu,
> and clearing it means that we are going to lose everything.

Yes, I see this.

> An alternative would be: clear the bit, THEN call vgic_update_state(), which would set it back if
> necessary. But is this extra bit of complexity worth anything, given the paragraph above?
>

I'm not sure your suggested approach works, because you could still lose state for other IRQs, I think.

-Christoffer
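The "recompute instead of keeping a hint" option can be sketched as follows. This is a hypothetical single-vCPU model, not the kernel's compute_pending_for_cpu() or vgic_update_state(): struct model_dist, its 32-bit words for private and shared IRQs, model_compute_pending() and model_lower_spi() are invented for illustration. The summary bit is cleared only when the recomputation finds nothing else pending for this vCPU, which addresses the concern about losing other IRQs' state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-vCPU distributor model: 32 private and 32 shared IRQs,
 * one bit each (not the kernel's vgic bitmaps). */
struct model_dist {
	uint32_t priv_pending, priv_enabled;
	uint32_t spi_pending, spi_enabled;
	uint32_t spi_target;  /* shared IRQs routed to this vCPU */
	bool pending_on_cpu;  /* summary bit for this vCPU */
};

/* A few ANDs and one test: the kind of cost discussed in the thread. */
static bool model_compute_pending(const struct model_dist *d)
{
	uint32_t priv = d->priv_pending & d->priv_enabled;
	uint32_t spi = d->spi_pending & d->spi_enabled & d->spi_target;

	return (priv | spi) != 0;
}

/* A level-triggered shared line is dropped: clear its pending bit, then make
 * the summary bit exact. It is cleared only if nothing else is pending, so
 * state belonging to other IRQs is preserved. */
static void model_lower_spi(struct model_dist *d, unsigned int spi)
{
	d->spi_pending &= ~(UINT32_C(1) << spi);
	if (!model_compute_pending(d))
		d->pending_on_cpu = false;
}
```

Clearing conditionally, rather than clearing unconditionally and setting the bit again afterwards, means there is never a window in which the summary bit is wrongly clear.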
Hello!

> The alternative is to just call compute_pending which does nothing more
> than a few bitwise and/or operations plus a couple of handfuls of
> load/stores on this IRQ injection path, so I don't see the problem doing
> this.

I looked at it, and I'm convinced. After all, resetting pending_on_cpu saves us from unnecessary wakeups, IIRC. Posting v2...

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 34dad3c..bf155e3 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1111,7 +1111,8 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
 		kvm_debug("Set active, clear distributor: 0x%x\n", vlr.state);
 		vgic_irq_clear_active(vcpu, irq);
 		vgic_update_state(vcpu->kvm);
-	} else if (vgic_dist_irq_is_pending(vcpu, irq)) {
+	} else {
+		WARN_ON(!vgic_dist_irq_is_pending(vcpu, irq));
 		vlr.state |= LR_STATE_PENDING;
 		kvm_debug("Set pending: 0x%x\n", vlr.state);
 	}
@@ -1567,8 +1568,10 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 	} else {
 		if (level_triggered) {
 			vgic_dist_irq_clear_level(vcpu, irq_num);
-			if (!vgic_dist_irq_soft_pend(vcpu, irq_num))
+			if (!vgic_dist_irq_soft_pend(vcpu, irq_num)) {
 				vgic_dist_irq_clear_pending(vcpu, irq_num);
+				vgic_cpu_irq_clear(vcpu, irq_num);
+			}
 		}
 
 		ret = false;
Commit 71760950bf3dc796e5e53ea3300dec724a09f593 ("arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn") introduced the vgic_queue_irq_to_lr() function with an additional vgic_dist_irq_is_pending() check before setting the LR_STATE_PENDING bit. In some cases this started causing the following situation if userland quickly drops the IRQ back to the inactive state for some reason:

1. Userland injects an IRQ with level == 1. This ends up in vgic_update_irq_pending(), which in turn calls vgic_dist_irq_set_pending() for this IRQ.
2. The vCPU gets kicked, but the kernel does not manage to reschedule it quickly (!!!).
3. Userland quickly resets the IRQ to level == 0. In this case vgic_update_irq_pending() calls vgic_dist_irq_clear_pending() and resets the pending flag.
4. The vCPU finally wakes up. It successfully rolls through __kvm_vgic_flush_hwstate(), which populates the vGIC registers. However, since neither the pending nor the active flag is now set for this IRQ, vgic_queue_irq_to_lr() does not set any state bits on this LR at all. Since this is a level-sensitive IRQ, we end up with an LR containing only the LR_EOI_INT bit, causing an unnecessary immediate exit from the guest.

This patch fixes the problem by adding the missing vgic_cpu_irq_clear() call. This keeps the IRQ off all lists if it is picked up after being dropped to the inactive level. Since this is a level-sensitive IRQ, this is the correct behavior.

The bug was caught on an ARM64 v4.1.6 kernel running a qemu "virt" guest, where it was caused by the emulated pl011.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
---
 virt/kvm/arm/vgic.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
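The four-step race can be reproduced in a toy model. In this hypothetical sketch, struct model_irq and the model_* functions are invented stand-ins for the vgic.c concepts named in the commit message (distributor pending flag, per-CPU pending bit, active flag). Soft-pending state, the separate level bit and real list registers are left out. model_flush() follows the pre-patch else-if check, so without the added per-CPU clear the IRQ is still queued but gets an LR with no state bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a single level-triggered IRQ (not kernel code). */
struct model_irq {
	bool dist_pending; /* ~ distributor pending flag */
	bool cpu_pending;  /* ~ per-CPU pending bit (vgic_cpu_irq_set/clear) */
	bool active;       /* ~ active flag */
};

enum {
	LR_NOT_QUEUED = -1, /* IRQ is on no list, so no LR is used */
	LR_NO_STATE = 0,    /* LR with no state bits: the bug */
	LR_PENDING = 1,
	LR_ACTIVE = 2,
};

/* Step 1: userland raises the line. */
static void model_raise(struct model_irq *irq)
{
	irq->dist_pending = true;
	irq->cpu_pending = true;
}

/* Step 3: userland drops the line before the vCPU runs. with_fix selects
 * whether the per-CPU bit is cleared too, like the added vgic_cpu_irq_clear(). */
static void model_lower(struct model_irq *irq, bool with_fix)
{
	irq->dist_pending = false;
	if (with_fix)
		irq->cpu_pending = false;
}

/* Step 4: the flush queues every IRQ whose per-CPU bit is set and fills in
 * the LR state using the pre-patch "else if pending" logic. */
static int model_flush(const struct model_irq *irq)
{
	int state = LR_NO_STATE;

	if (!irq->cpu_pending)
		return LR_NOT_QUEUED;
	if (irq->active)
		state |= LR_ACTIVE;
	else if (irq->dist_pending)
		state |= LR_PENDING;
	return state;
}
```

Raising the line and flushing yields LR_PENDING. Raising, lowering without the fix, and flushing yields LR_NO_STATE, which is the empty LR described in step 4. With the fix, the IRQ is not queued at all.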