Message ID | 1440571563-7004-1-git-send-email-p.fedin@samsung.com (mailing list archive) |
---|---|
State | New, archived |
On Wed, 26 Aug 2015 09:46:03 +0300
Pavel Fedin <p.fedin@samsung.com> wrote:

Hi Pavel,

> Commit 71760950bf3dc796e5e53ea3300dec724a09f593
> ("arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn") introduced
> vgic_queue_irq_to_lr() function which checks vgic_dist_irq_is_pending()
> before setting LR_STATE_PENDING bit. However, in some cases, the following
> race condition is possible:
> 1. Userland injects an IRQ with level == 1, this ends up in
>    vgic_update_irq_pending(), which in turn calls
>    vgic_dist_irq_set_pending() for this IRQ.
> 2. vCPU gets kicked. But kernel does not manage to reschedule it quickly
>    (!!!)
> 3. Userland quickly resets the IRQ to level == 0. vgic_update_irq_pending()
>    in this case will call vgic_dist_irq_clear_pending() and reset the
>    pending flag.

So userspace drops the line to 0 *before* the guest had a chance to do
anything? Well, this is not the expected behaviour for a level
triggered interrupt, which should look like this:

- device raises the interrupt line
- guest takes the interrupt
- guest pokes the device to clear the interrupt condition
- device lowers the line

The behaviour you describe is that of an edge triggered interrupt, and
it is not surprising at all that you lose interrupts.

This really feels like a userspace bug to me (I vaguely remember some
QEMU issues regarding this a while ago, but my memory is a bit hazy).
Christoffer?

	M.
Hello!

> So userspace drops the line to 0 *before* the guest had a chance to do
> anything? Well, this is not the expected behaviour for a level
> triggered interrupt

I know. But, still...
Imagine that we have misconfigured the HW for some reason. The device
pulses an IRQ line, but we think it's a level IRQ. What will happen on
real hardware? Not much, the interrupt will still be sampled.
So, for better modelling of the hardware, shouldn't we improve KVM's
behavior here? Especially since before v4.1 it actually did not have this
problem.

> This really feels like a userspace bug to me (I vaguely remember some
> QEMU issues regarding this a while ago, but my memory is a bit hazy).

You know, maybe it's really qemu's problem. To tell the truth, I'm too
lazy to read the whole PL011 spec, but qemu appears to pulse the line
without any PL011 interrupt servicing at all. I know this because my
kernel is patched: it uses software emulation of the vCPU interface,
because vGIC is broken on ThunderX, and the LR state change and all the
maintenance are done upon EOIR write (which is trapped). With this change
the consequences of losing an interrupt are much more severe: the IRQ
line gets stuck and stops working at all. Subsequent injections are
blocked by vgic_can_sample_irq(), which returns false because
vgic_irq_is_queued() returns true. That is because
vgic_irq_clear_queued() is called during the maintenance procedure,
which in this case never happens, because the interrupt is never EOIed,
because it was never made PENDING in the LR. Actually that's how I found
this.

So, here is why I am describing these unrelated things here: with IRQ
line processing completely locked up, line switches between 1 and 0 are
still injected (vgic_update_irq_pending() is called with both values; I
added some debug output in order to see this). The guest successfully
boots up to a login prompt, everything is fine, I just cannot type
anything on the console because the serial port's interrupt is locked up.

I suppose that this pulsing has to do with the output FIFO. Could this be
some bug in the kernel's pl011 driver itself, which does something wrong
and does not handle interrupts in a proper way during output?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
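The lockup described in the message above can be sketched as a small standalone C model. This is only an illustration under stated assumptions: `struct toy_line`, `toy_can_sample()`, `toy_inject()` and `toy_guest_run()` are hypothetical names invented here, not kernel functions, and the model covers only the trapped-EOIR software emulation described in the message, where the "queued" flag is cleared solely by maintenance triggered on EOI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one level-sensitive IRQ line. */
struct toy_line {
	bool queued;        /* set while the IRQ occupies an LR */
	bool lr_pending;    /* whether that LR carries the PENDING state */
	unsigned delivered; /* interrupts the guest actually took */
};

/* Mirrors the described rule: a queued level IRQ cannot be sampled again. */
bool toy_can_sample(const struct toy_line *line)
{
	assert(line != NULL);
	return !line->queued;
}

/*
 * Try to inject the IRQ. lr_gets_pending says whether the LR ends up
 * with the PENDING state set (false models the lost-pending race).
 * Returns true if the injection was accepted.
 */
bool toy_inject(struct toy_line *line, bool lr_gets_pending)
{
	if (!toy_can_sample(line))
		return false;
	line->queued = true;
	line->lr_pending = lr_gets_pending;
	return true;
}

/*
 * Guest runs. It only takes (and later EOIs) an interrupt whose LR is
 * PENDING. In this model the queued flag is cleared only on EOI.
 */
void toy_guest_run(struct toy_line *line)
{
	assert(line != NULL);
	if (!line->lr_pending)
		return; /* nothing to take: no EOI, no maintenance */
	line->delivered++;
	line->lr_pending = false;
	line->queued = false; /* maintenance performed on EOI */
}
```

In this model, one injection with `lr_gets_pending == false` leaves `queued` set forever, so every later `toy_inject()` call is rejected, which matches the stuck serial console described above.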
On 26/08/15 11:58, Pavel Fedin wrote:
> Hello!
>
>> So userspace drops the line to 0 *before* the guest had a chance to do
>> anything? Well, this is not the expected behaviour for a level
>> triggered interrupt
>
> I know. But, still...
> Imagine that we have misconfigured the HW for some reason. The device pulses an IRQ line, but we
> think it's a level IRQ. What will happen on real hardware? Not much, the interrupt will still be
> sampled.
> So, for better modelling of the hardware, shouldn't we improve KVM's behavior here? Especially since
> before v4.1 it actually did not have this problem.

I'm sorry, but that's actually a very accurate model of the HW. You
misconfigure the line trigger, you lose interrupts. This happens on real
HW all the time. And if you haven't seen that before, you haven't tried
very hard.

As for v4.1 not having that problem, the pl011 driver has gone through a
lot of rework lately, and I wouldn't be surprised if it now exhibited a
different behaviour thanks to the broken userspace behaviour.

>
>> This really feels like a userspace bug to me (I vaguely remember some
>> QEMU issues regarding this a while ago, but my memory is a bit hazy).
>
> You know, maybe it's really qemu's problem. To tell the truth, I'm too lazy to read the whole PL011
> spec, but qemu appears to pulse the line without any PL011 interrupt servicing at all. I know this
> because my kernel is patched: it uses software emulation of the vCPU interface, because vGIC is broken
> on ThunderX, and the LR state change and all the maintenance are done upon EOIR write (which is trapped).
> With this change the consequences of losing an interrupt are much more severe: the IRQ line gets stuck
> and stops working at all. Subsequent injections are blocked by vgic_can_sample_irq(), which returns
> false because vgic_irq_is_queued() returns true. That is because vgic_irq_clear_queued() is called during
> the maintenance procedure, which in this case never happens, because the interrupt is never EOIed,
> because it was never made PENDING in the LR. Actually that's how I found this.

TL;DR. You're using a different code base, broken HW, and what is
apparently a buggy userspace. Sorry, but I don't really want to
introduce another bug in the VGIC code (we have too many already). And
what you're suggesting is to actually introduce a bug.

Thanks,

	M.
Hello!

> As for v4.1 not having that problem, the pl011 driver has gone through a
> lot of rework lately, and I wouldn't be surprised if it now exhibited a
> different behaviour thanks to the broken userspace behaviour.

Sorry, you misunderstood me. Or I wrote badly. I meant that _KVM_ did
not have this particular problem in kernel v4.0, because:
http://lxr.free-electrons.com/source/virt/kvm/arm/vgic.c?v=4.0#L998
you see, LR_STATE_PENDING is assigned unconditionally. Is this code
correct? I believe yes.
Compare with:
http://lxr.free-electrons.com/source/virt/kvm/arm/vgic.c#L1104
Now it is possible to have an IRQ that is neither PENDING nor ACTIVE.
Does it even make sense?
So what is wrong with the modification as follows?
--- cut ---
	if (vgic_irq_is_active(vcpu, irq)) {
		vlr.state |= LR_STATE_ACTIVE;
		kvm_debug("Set active, clear distributor: 0x%x\n", vlr.state);
		vgic_irq_clear_active(vcpu, irq);
		vgic_update_state(vcpu->kvm);
	} else {
		vlr.state |= LR_STATE_PENDING;
		kvm_debug("Set pending: 0x%x\n", vlr.state);
	}
--- cut ---

Alex, are you reading us? Can you explain why you introduced that extra
check?

> And what you're suggesting is to actually introduce a bug.

Why would that be a bug, if it was not a bug in kernel 4.0?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
Hello!

> Sorry, but I don't really want to introduce another bug
> in the VGIC code (we have too many already). And what you're suggesting
> is to actually introduce a bug.

Another, alternate idea...
So far, we have a situation where an empty LR, containing only the
LR_EOI_INT bit, is queued. Can we say that this is wrong? If you agree,
maybe we should do something else instead? Maybe we should cancel such
"ghost" interrupts early, avoiding immediate and completely unnecessary
maintenance interrupts upon guest entry?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
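One possible reading of the "cancel ghost interrupts early" idea above is sketched below as a standalone C model. It is not a proposed kernel patch and says nothing about what was eventually merged: `toy_try_queue()` is a hypothetical helper, and the bit values of the three macros are chosen for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values are illustrative for this model only. */
#define LR_STATE_PENDING (1u << 0)
#define LR_STATE_ACTIVE  (1u << 1)
#define LR_EOI_INT       (1u << 2)

/*
 * Hypothetical helper: compute the LR state for an IRQ, but refuse to
 * queue it when it is neither active nor pending (a "ghost" interrupt).
 * Returns true and writes *lr_state only when something is deliverable.
 */
bool toy_try_queue(bool active, bool pending, bool level_sensitive,
		   uint32_t *lr_state)
{
	uint32_t state = 0;

	assert(lr_state != NULL);

	if (active)
		state |= LR_STATE_ACTIVE;
	else if (pending)
		state |= LR_STATE_PENDING;

	/* Ghost interrupt: no state bits, so do not occupy an LR at all. */
	if (state == 0)
		return false;

	/* Level-sensitive IRQs ask for a maintenance interrupt on EOI. */
	if (level_sensitive)
		state |= LR_EOI_INT;

	*lr_state = state;
	return true;
}
```

With this shape, the case from the race (not active, no longer pending) returns `false` and leaves the caller's LR untouched, instead of producing an LR that holds only `LR_EOI_INT`.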
On Wed, Aug 26, 2015 at 09:27:21AM +0100, Marc Zyngier wrote:
> On Wed, 26 Aug 2015 09:46:03 +0300
> Pavel Fedin <p.fedin@samsung.com> wrote:
>
> Hi Pavel,
>
> > Commit 71760950bf3dc796e5e53ea3300dec724a09f593
> > ("arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn") introduced
> > vgic_queue_irq_to_lr() function which checks vgic_dist_irq_is_pending()
> > before setting LR_STATE_PENDING bit. However, in some cases, the following
> > race condition is possible:
> > 1. Userland injects an IRQ with level == 1, this ends up in
> >    vgic_update_irq_pending(), which in turn calls
> >    vgic_dist_irq_set_pending() for this IRQ.
> > 2. vCPU gets kicked. But kernel does not manage to reschedule it quickly
> >    (!!!)
> > 3. Userland quickly resets the IRQ to level == 0. vgic_update_irq_pending()
> >    in this case will call vgic_dist_irq_clear_pending() and reset the
> >    pending flag.
>
> So userspace drops the line to 0 *before* the guest had a chance to do
> anything? Well, this is not the expected behaviour for a level
> triggered interrupt, which should look like this:
>
> - device raises the interrupt line
> - guest takes the interrupt
> - guest pokes the device to clear the interrupt condition
> - device lowers the line
>
> The behaviour you describe is that of an edge triggered interrupt, and
> it is not surprising at all that you lose interrupts.
>
> This really feels like a userspace bug to me (I vaguely remember some
> QEMU issues regarding this a while ago, but my memory is a bit hazy).
> Christoffer?
>

I think it's perfectly valid for userspace to raise and lower a level
triggered interrupt at will for some device emulation.

But it is inconsistent to get to a point in the vgic code where we try
to queue something which is neither active nor pending.

See my reply to the original patch.

-Christoffer
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index fdcad86..90d1671 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1111,7 +1111,7 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
 		kvm_debug("Set active, clear distributor: 0x%x\n", vlr.state);
 		vgic_irq_clear_active(vcpu, irq);
 		vgic_update_state(vcpu->kvm);
-	} else if (vgic_dist_irq_is_pending(vcpu, irq)) {
+	} else {
 		vlr.state |= LR_STATE_PENDING;
 		kvm_debug("Set pending: 0x%x\n", vlr.state);
 	}
Commit 71760950bf3dc796e5e53ea3300dec724a09f593
("arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn") introduced
vgic_queue_irq_to_lr() function which checks vgic_dist_irq_is_pending()
before setting LR_STATE_PENDING bit. However, in some cases, the following
race condition is possible:
1. Userland injects an IRQ with level == 1, this ends up in
   vgic_update_irq_pending(), which in turn calls
   vgic_dist_irq_set_pending() for this IRQ.
2. vCPU gets kicked. But kernel does not manage to reschedule it quickly
   (!!!)
3. Userland quickly resets the IRQ to level == 0. vgic_update_irq_pending()
   in this case will call vgic_dist_irq_clear_pending() and reset the
   pending flag.
4. vCPU finally wakes up. It successfully rolls through
   __kvm_vgic_flush_hwstate(), which populates vGIC registers.

Before the aforementioned commit, the LR_STATE_PENDING bit was set
unconditionally, and nothing bad happened. However, now
vgic_queue_irq_to_lr() does not set any state bits on this LR at all,
because vgic_dist_irq_is_pending() returns zero (it was reset in step 3).
Since this is a level-sensitive IRQ, we end up with an LR containing only
the LR_EOI_INT bit. The guest will not get this interrupt.

This patch fixes the problem by bringing back unconditional setting of
the LR_STATE_PENDING bit.

The bug was caught on a Cavium ThunderX machine, kernel v4.1.6, running a
qemu "virt" guest, where it affected the pl011 driver.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
---
 virt/kvm/arm/vgic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
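The four-step race in the commit message can be replayed with a small standalone C model. This is a simplified sketch, not kernel code: `struct toy_irq`, `toy_set_level()`, `toy_queue_irq_to_lr()` and `toy_replay_race()` are hypothetical names, the bit values are illustrative, and the model assumes (as the commit message describes) that the IRQ still reaches the queueing step after its pending flag was cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values are illustrative for this model only. */
#define LR_STATE_PENDING (1u << 0)
#define LR_STATE_ACTIVE  (1u << 1)
#define LR_EOI_INT       (1u << 2)

/* Hypothetical distributor state for one level-sensitive IRQ. */
struct toy_irq {
	bool pending;
	bool active;
};

/* Models userspace changing the line level (steps 1 and 3). */
void toy_set_level(struct toy_irq *irq, bool level)
{
	assert(irq != NULL);
	irq->pending = level;
}

/*
 * Models building the LR state at flush time (step 4).
 * check_pending == true follows the conditional logic described for
 * v4.1; false follows the unconditional logic the patch restores.
 */
uint32_t toy_queue_irq_to_lr(struct toy_irq *irq, bool check_pending)
{
	uint32_t state = 0;

	assert(irq != NULL);

	if (irq->active) {
		state |= LR_STATE_ACTIVE;
		irq->active = false;
	} else if (!check_pending || irq->pending) {
		state |= LR_STATE_PENDING;
	}

	/* Level-sensitive IRQ: request a maintenance interrupt on EOI. */
	state |= LR_EOI_INT;
	return state;
}

/* Replays steps 1 to 4 and returns the resulting LR state. */
uint32_t toy_replay_race(bool check_pending)
{
	struct toy_irq irq = { false, false };

	toy_set_level(&irq, true);  /* step 1: level == 1 */
	/* step 2: vCPU kicked, but not scheduled yet */
	toy_set_level(&irq, false); /* step 3: level == 0 */
	return toy_queue_irq_to_lr(&irq, check_pending); /* step 4 */
}
```

In this model `toy_replay_race(true)` yields an LR holding only `LR_EOI_INT`, while `toy_replay_race(false)` yields `LR_STATE_PENDING | LR_EOI_INT`, which is the difference the patch is about.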