KVM: arm/arm64: BUG FIX: Do not inject spurious interrupts

Message ID 1440676282-2152-1-git-send-email-p.fedin@samsung.com (mailing list archive)
State New, archived

Commit Message

Pavel Fedin Aug. 27, 2015, 11:51 a.m. UTC
Commit 71760950bf3dc796e5e53ea3300dec724a09f593
("arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn") introduced the
vgic_queue_irq_to_lr() function with an additional vgic_dist_irq_is_pending()
check before setting the LR_STATE_PENDING bit. In some cases this causes
the following situation if userland quickly drops the IRQ back to the
inactive state:
1. Userland injects an IRQ with level == 1, this ends up in
   vgic_update_irq_pending(), which in turn calls vgic_dist_irq_set_pending()
   for this IRQ.
2. vCPU gets kicked. But kernel does not manage to reschedule it quickly
   (!!!)
3. Userland quickly resets the IRQ to level == 0. vgic_update_irq_pending()
   in this case will call vgic_dist_irq_clear_pending() and reset the
   pending flag.
4. vCPU finally wakes up. It successfully rolls through
   __kvm_vgic_flush_hwstate(), which populates vGIC registers. However,
   since neither pending nor active flags are now set for this IRQ,
   vgic_queue_irq_to_lr() does not set any state bits on this LR at all.
   Since this is a level-sensitive IRQ, we end up with an LR containing only
   the LR_EOI_INT bit, causing an unnecessary immediate exit from the guest.

This patch fixes the problem by adding the missing vgic_cpu_irq_clear() call.
This keeps the IRQ out of all lists if it is picked up after it has already
dropped back to the inactive level. Since this is a level-sensitive IRQ, this
is the correct behavior.
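
The race in steps 1-4 and the effect of the extra clear can be replayed with a
small userspace model. This is only a sketch under simplifying assumptions: one
level-sensitive IRQ, one vCPU, and hypothetical toy_* names and bit values that
do not correspond to the real definitions in virt/kvm/arm/vgic.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy bit values; hypothetical, not the kernel's LR encoding. */
#define TOY_LR_STATE_PENDING (1u << 0)
#define TOY_LR_EOI_INT       (1u << 1)

struct toy_vgic {
	bool dist_pending; /* distributor pending flag for the IRQ */
	bool cpu_pending;  /* per-vCPU "pick this IRQ up" bit */
};

/* Userland changes the line level; with_fix models the added per-vCPU clear. */
static void toy_update_irq_level(struct toy_vgic *v, bool level, bool with_fix)
{
	assert(v);
	if (level) {
		v->dist_pending = true;
		v->cpu_pending = true;
	} else {
		v->dist_pending = false;
		if (with_fix)
			v->cpu_pending = false;
	}
}

/* Models vCPU entry: returns true and fills *lr if an LR was queued. */
static bool toy_flush_hwstate(const struct toy_vgic *v, uint32_t *lr)
{
	assert(v && lr);
	if (!v->cpu_pending)
		return false;
	*lr = TOY_LR_EOI_INT; /* level-sensitive: EOI maintenance requested */
	if (v->dist_pending)
		*lr |= TOY_LR_STATE_PENDING;
	return true;
}

/* Replays the scenario: raise, slow wakeup, lower, then flush. */
static bool toy_replay_race(bool with_fix, uint32_t *lr)
{
	struct toy_vgic v = { false, false };

	toy_update_irq_level(&v, true, with_fix);
	toy_update_irq_level(&v, false, with_fix);
	return toy_flush_hwstate(&v, lr);
}
```

In this model, toy_replay_race(false, &lr) queues an LR holding only the EOI
bit, while toy_replay_race(true, &lr) queues nothing.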

The bug was caught on ARM64 kernel v4.1.6, running qemu "virt" guest,
where it was caused by emulated pl011.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
---
 virt/kvm/arm/vgic.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Christoffer Dall Aug. 27, 2015, 12:44 p.m. UTC | #1
On Thu, Aug 27, 2015 at 02:51:22PM +0300, Pavel Fedin wrote:
> Commit 71760950bf3dc796e5e53ea3300dec724a09f593
> ("arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn") introduced the
> vgic_queue_irq_to_lr() function with an additional vgic_dist_irq_is_pending()
> check before setting the LR_STATE_PENDING bit. In some cases this causes
> the following situation if userland quickly drops the IRQ back to the
> inactive state:
> 1. Userland injects an IRQ with level == 1, this ends up in
>    vgic_update_irq_pending(), which in turn calls vgic_dist_irq_set_pending()
>    for this IRQ.
> 2. vCPU gets kicked. But kernel does not manage to reschedule it quickly
>    (!!!)
> 3. Userland quickly resets the IRQ to level == 0. vgic_update_irq_pending()
>    in this case will call vgic_dist_irq_clear_pending() and reset the
>    pending flag.
> 4. vCPU finally wakes up. It successfully rolls through
>    __kvm_vgic_flush_hwstate(), which populates vGIC registers. However,
>    since neither pending nor active flags are now set for this IRQ,
>    vgic_queue_irq_to_lr() does not set any state bits on this LR at all.
>    Since this is a level-sensitive IRQ, we end up with an LR containing only
>    the LR_EOI_INT bit, causing an unnecessary immediate exit from the guest.
> 
> This patch fixes the problem by adding the missing vgic_cpu_irq_clear() call.
> This keeps the IRQ out of all lists if it is picked up after it has already
> dropped back to the inactive level. Since this is a level-sensitive IRQ, this
> is the correct behavior.
> 
> The bug was caught on ARM64 kernel v4.1.6, running qemu "virt" guest,
> where it was caused by emulated pl011.

It's a bit weird to just send this as a new patch without replying to my
mail from yesterday with feedback, explaining changes from what I did
etc.  Anyway.

> 
> Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
> ---
>  virt/kvm/arm/vgic.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 34dad3c..bf155e3 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1111,7 +1111,8 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>  		kvm_debug("Set active, clear distributor: 0x%x\n", vlr.state);
>  		vgic_irq_clear_active(vcpu, irq);
>  		vgic_update_state(vcpu->kvm);
> -	} else if (vgic_dist_irq_is_pending(vcpu, irq)) {
> +	} else {
> +		WARN_ON(!vgic_dist_irq_is_pending(vcpu, irq));
>  		vlr.state |= LR_STATE_PENDING;
>  		kvm_debug("Set pending: 0x%x\n", vlr.state);
>  	}
> @@ -1567,8 +1568,10 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  	} else {
>  		if (level_triggered) {
>  			vgic_dist_irq_clear_level(vcpu, irq_num);
> -			if (!vgic_dist_irq_soft_pend(vcpu, irq_num))
> +			if (!vgic_dist_irq_soft_pend(vcpu, irq_num)) {
>  				vgic_dist_irq_clear_pending(vcpu, irq_num);
> +				vgic_cpu_irq_clear(vcpu, irq_num);

I think you're missing a potential change to the irq_pending_on_cpu
field here, which you have to compute by calling vgic_update_state()
like we do elsewhere when we change status bits (note that this is
different from the incorrect approach I suggested yesterday where we
always just clear the bit for that vcpu).

> +			}
>  		}
>  
>  		ret = false;
> -- 
> 2.4.4
> 

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pavel Fedin Aug. 28, 2015, 9:11 a.m. UTC | #2
Hello!

> It's a bit weird to just send this as a new patch without replying to my
> mail from yesterday with feedback

 Sorry. But the changes are actually minimal, and I remember replying to you with a promise to
test your suggestion. So, done, works fine. :)

> I think you're missing a potential change to the irq_pending_on_cpu
> field here, which you have to compute by calling vgic_update_state()
> like we do elsewhere when we change status bits

 I have just checked this. vgic_update_state() never resets this bit. The bit is reset only in
__kvm_vgic_flush_hwstate(), and only if we have consumed absolutely everything. I have followed
the code, and it looks perfectly safe to have this bit set while nothing is actually
pending. In __kvm_vgic_flush_hwstate(), having this bit cleared is really shorthand for
"no interrupt is pending at all". If it is set without any interrupt actually being pending (which
ends up with pa_percpu and pa_shared being all zeroes), all three for_each_set_bit() loops will
simply do nothing, and we still get to the "epilog:" label, just after a slightly longer check.
And, since we are here, the guest has already been disturbed.
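
The argument above can be illustrated with a minimal model of the flush path.
It is a sketch only: toy_vcpu and toy_flush() are hypothetical names, a single
32-bit word stands in for the pending bitmaps, and the model assumes there is
always a free LR, which the real __kvm_vgic_flush_hwstate() cannot assume.

```c
#include <assert.h>
#include <stdint.h>

struct toy_vcpu {
	uint32_t pending;   /* one bit per IRQ really pending for this vCPU */
	int pending_on_cpu; /* summary hint: "something may be pending" */
};

/* Walks the pending bitmap and returns how many IRQs were queued. */
static int toy_flush(struct toy_vcpu *c)
{
	int queued = 0;

	assert(c);
	if (!c->pending_on_cpu)
		return 0; /* fast path: the hint says nothing is pending */

	for (int irq = 0; irq < 32; irq++) {
		if (c->pending & (UINT32_C(1) << irq)) {
			/* Pretend the IRQ was moved into a list register. */
			c->pending &= ~(UINT32_C(1) << irq);
			queued++;
		}
	}

	c->pending_on_cpu = 0; /* everything consumed: drop the hint */
	return queued;
}
```

With the hint set and an empty bitmap, toy_flush() takes the slow path, queues
nothing, and clears the hint, so the only cost in this model is the extra scan.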

> different from the incorrect approach I suggested yesterday where we
> always just clear the bit for that vcpu).

 Yes, it is an extremely bad idea to clear it, because this bit summarizes all interrupts for this
vcpu, and clearing it means that we are going to lose everything.
 An alternative would be: clear the bit, THEN call vgic_update_state(), which would set it back if
necessary. But is this extra bit of complexity worth anything, given the paragraph above?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


Christoffer Dall Sept. 14, 2015, 11:29 a.m. UTC | #3
On Fri, Aug 28, 2015 at 12:11:17PM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > It's a bit weird to just send this as a new patch without replying to my
> > mail from yesterday with feedback
> 
>  Sorry. But the changes are actually minimal, and I remember replying to you with a promise to
> test your suggestion. So, done, works fine. :)
> 
> > I think you're missing a potential change to the irq_pending_on_cpu
> > field here, which you have to compute by calling vgic_update_state()
> > like we do elsewhere when we change status bits
> 
>  I have just checked this. vgic_update_state() never resets this bit. The bit is reset only in
> __kvm_vgic_flush_hwstate(), and only if we have consumed absolutely everything. I have followed
> the code, and it looks perfectly safe to have this bit set while nothing is actually
> pending. In __kvm_vgic_flush_hwstate(), having this bit cleared is really shorthand for
> "no interrupt is pending at all". If it is set without any interrupt actually being pending (which
> ends up with pa_percpu and pa_shared being all zeroes), all three for_each_set_bit() loops will
> simply do nothing, and we still get to the "epilog:" label, just after a slightly longer check.
> And, since we are here, the guest has already been disturbed.

ok, looks like it is functionally correct, but I'm not thrilled about us
setting the state in the VGIC to "something is pending on the CPU" when
really nothing is.  In that case, we should explicitly mark the bit as
a hint and not rely on its correctness, and your patch should explain
this in vgic_update_irq_pending().

The alternative is to just call compute_pending which does nothing more
than a few bitwise and/or operations plus a couple of handfuls of
load/stores on this IRQ injection path, so I don't see the problem doing
this.
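
A rough sketch of that alternative, recomputing the per-vCPU summary bit
instead of leaving it stale or clearing it blindly, might look as follows. The
toy_* names, the single 32-bit bitmaps, and the two-vCPU limit are assumptions
made for illustration; this is not the code that was later posted as v2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 2

struct toy_dist {
	uint32_t pending;                 /* IRQs pending in the distributor */
	uint32_t enabled;                 /* IRQs enabled */
	uint32_t target[TOY_NR_CPUS];     /* IRQs routed to each vCPU */
	unsigned long irq_pending_on_cpu; /* one summary bit per vCPU */
};

/* A few bitwise operations decide whether anything is pending for cpu. */
static bool toy_compute_pending(const struct toy_dist *d, int cpu)
{
	assert(d && cpu >= 0 && cpu < TOY_NR_CPUS);
	return (d->pending & d->enabled & d->target[cpu]) != 0;
}

/* Sets or clears the summary bit so that it always matches the bitmaps. */
static void toy_refresh_summary(struct toy_dist *d, int cpu)
{
	if (toy_compute_pending(d, cpu))
		d->irq_pending_on_cpu |= 1ul << cpu;
	else
		d->irq_pending_on_cpu &= ~(1ul << cpu);
}

/* Lowering one IRQ line recomputes the summary rather than clearing it. */
static void toy_lower_irq(struct toy_dist *d, int cpu, int irq)
{
	d->pending &= ~(UINT32_C(1) << irq);
	toy_refresh_summary(d, cpu);
}
```

Because the summary is derived from the bitmaps, lowering one IRQ while another
is still pending for the same vCPU leaves the bit set; it is cleared only when
the last pending IRQ goes away.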

Does the code look awful?  If so, why?

> 
> > different from the incorrect approach I suggested yesterday where we
> > always just clear the bit for that vcpu).
> 
>  Yes, it is an extremely bad idea to clear it, because this bit summarizes all interrupts for this
> vcpu, and clearing it means that we are going to lose everything.

yes, I see this.

>  An alternative would be: clear the bit, THEN call vgic_update_state(), which would set it back if
> necessary. But is this extra bit of complexity worth anything, given the paragraph above?
> 
I'm not sure your suggested approach works, because you could still
lose state for other IRQs, I think.

-Christoffer
Pavel Fedin Sept. 25, 2015, 2:01 p.m. UTC | #4
Hello!

> The alternative is to just call compute_pending which does nothing more
> than a few bitwise and/or operations plus a couple of handfuls of
> load/stores on this IRQ injection path, so I don't see the problem doing
> this.

 I looked at it and I'm convinced. After all, resetting pending_on_cpu saves us from unnecessary
wakeups, IIRC. Posting v2...

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


Patch

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 34dad3c..bf155e3 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1111,7 +1111,8 @@  static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
 		kvm_debug("Set active, clear distributor: 0x%x\n", vlr.state);
 		vgic_irq_clear_active(vcpu, irq);
 		vgic_update_state(vcpu->kvm);
-	} else if (vgic_dist_irq_is_pending(vcpu, irq)) {
+	} else {
+		WARN_ON(!vgic_dist_irq_is_pending(vcpu, irq));
 		vlr.state |= LR_STATE_PENDING;
 		kvm_debug("Set pending: 0x%x\n", vlr.state);
 	}
@@ -1567,8 +1568,10 @@  static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 	} else {
 		if (level_triggered) {
 			vgic_dist_irq_clear_level(vcpu, irq_num);
-			if (!vgic_dist_irq_soft_pend(vcpu, irq_num))
+			if (!vgic_dist_irq_soft_pend(vcpu, irq_num)) {
 				vgic_dist_irq_clear_pending(vcpu, irq_num);
+				vgic_cpu_irq_clear(vcpu, irq_num);
+			}
 		}
 
 		ret = false;