
[v3,51/59] KVM: arm/arm64: GICv4: Add doorbell interrupt handling

Message ID 8a35ee18-85a5-168b-b3d2-b9f51390b475@arm.com (mailing list archive)
State New, archived

Commit Message

Marc Zyngier Aug. 4, 2017, 7:44 a.m. UTC
On 31/07/17 18:26, Marc Zyngier wrote:
> When a vPE is not running, a VLPI being made pending results in a
> doorbell interrupt being delivered. Let's handle this interrupt
> and update the pending_last flag that indicates that VLPIs are
> pending. The corresponding vcpu is also kicked into action.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  virt/kvm/arm/vgic/vgic-v4.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
> index 534d3051a078..6af3cde6d7d4 100644
> --- a/virt/kvm/arm/vgic/vgic-v4.c
> +++ b/virt/kvm/arm/vgic/vgic-v4.c
> @@ -21,6 +21,19 @@
>  
>  #include "vgic.h"
>  
> +static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
> +{
> +	struct kvm_vcpu *vcpu = info;
> +
> +	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
> +		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> +		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> +		kvm_vcpu_kick(vcpu);
> +	}

This code is so obviously broken that I completely overlooked it.

If we have taken a doorbell interrupt, then it means nothing was
otherwise pending (because we'd have been kicked out of the blocking
state, and will have masked the doorbell). So checking for pending
interrupts is pointless.

Furthermore, calling kvm_vgic_vcpu_pending_irq() takes the ap_list
lock. If we take a doorbell interrupt while injecting a virtual
interrupt (from userspace, for example) on the same CPU, we end up
in deadlock land. This would be solved by Christoffer's latest
crop of timer patches, but there is no point getting there in the
first place.

The patchlet below solves it:

diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
index 15feb1151797..48e4d6ebeaa8 100644
--- a/virt/kvm/arm/vgic/vgic-v4.c
+++ b/virt/kvm/arm/vgic/vgic-v4.c
@@ -94,11 +94,9 @@ static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
 {
 	struct kvm_vcpu *vcpu = info;
 
-	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
-		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
-		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
-		kvm_vcpu_kick(vcpu);
-	}
+	vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
+	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
+	kvm_vcpu_kick(vcpu);

and I've queued it for the next round.

Thanks,

	M.

Comments

Christoffer Dall Aug. 28, 2017, 6:18 p.m. UTC | #1
On Fri, Aug 04, 2017 at 08:44:04AM +0100, Marc Zyngier wrote:
> On 31/07/17 18:26, Marc Zyngier wrote:
> > When a vPE is not running, a VLPI being made pending results in a
> > doorbell interrupt being delivered. Let's handle this interrupt
> > and update the pending_last flag that indicates that VLPIs are
> > pending. The corresponding vcpu is also kicked into action.
> > 
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> >  virt/kvm/arm/vgic/vgic-v4.c | 34 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 34 insertions(+)
> > 
> > diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
> > index 534d3051a078..6af3cde6d7d4 100644
> > --- a/virt/kvm/arm/vgic/vgic-v4.c
> > +++ b/virt/kvm/arm/vgic/vgic-v4.c
> > @@ -21,6 +21,19 @@
> >  
> >  #include "vgic.h"
> >  
> > +static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
> > +{
> > +	struct kvm_vcpu *vcpu = info;
> > +
> > +	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
> > +		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> > +		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > +		kvm_vcpu_kick(vcpu);
> > +	}
> 
> This code is so obviously broken that I completely overlooked it.
> 
> If we have taken a doorbell interrupt, then it means nothing was
> otherwise pending (because we'd have been kicked out of the blocking
> state, and will have masked the doorbell). So checking for pending
> interrupts is pointless.
> 
> Furthermore, calling kvm_vgic_vcpu_pending_irq() takes the ap_list
> lock. If we take a doorbell interrupt while injecting a virtual
> interrupt (from userspace, for example) on the same CPU, we end up
> in deadlock land. This would be solved by Christoffer's latest
> crop of timer patches, but there is no point getting there in the
> first place.
> 
> The patchlet below solves it:
> 
> diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
> index 15feb1151797..48e4d6ebeaa8 100644
> --- a/virt/kvm/arm/vgic/vgic-v4.c
> +++ b/virt/kvm/arm/vgic/vgic-v4.c
> @@ -94,11 +94,9 @@ static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
>  {
>  	struct kvm_vcpu *vcpu = info;
>  
> -	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
> -		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> -		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> -		kvm_vcpu_kick(vcpu);
> -	}
> +	vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> +	kvm_vcpu_kick(vcpu);

I don't think you need the request and kick, because if you're getting
this doorbell, doesn't that also mean that the VCPU is not running in
the guest and you simply need to make sure the VCPU thread gets
scheduled again, so you could call kvm_vcpu_wake_up() instead.

Unless the request is there to ensure proper memory barriers around
setting pending_last?

In that case, is the read barrier taken care of by prepare_to_swait in
kvm_vcpu_block()?

Drew?


Thanks,
-Christoffer
Andrew Jones Aug. 30, 2017, 10:31 a.m. UTC | #2
On Mon, Aug 28, 2017 at 08:18:50PM +0200, Christoffer Dall wrote:
> On Fri, Aug 04, 2017 at 08:44:04AM +0100, Marc Zyngier wrote:
> > On 31/07/17 18:26, Marc Zyngier wrote:
> > > When a vPE is not running, a VLPI being made pending results in a
> > > doorbell interrupt being delivered. Let's handle this interrupt
> > > and update the pending_last flag that indicates that VLPIs are
> > > pending. The corresponding vcpu is also kicked into action.
> > > 
> > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > > ---
> > >  virt/kvm/arm/vgic/vgic-v4.c | 34 ++++++++++++++++++++++++++++++++++
> > >  1 file changed, 34 insertions(+)
> > > 
> > > diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
> > > index 534d3051a078..6af3cde6d7d4 100644
> > > --- a/virt/kvm/arm/vgic/vgic-v4.c
> > > +++ b/virt/kvm/arm/vgic/vgic-v4.c
> > > @@ -21,6 +21,19 @@
> > >  
> > >  #include "vgic.h"
> > >  
> > > +static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
> > > +{
> > > +	struct kvm_vcpu *vcpu = info;
> > > +
> > > +	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
> > > +		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> > > +		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > > +		kvm_vcpu_kick(vcpu);
> > > +	}
> > 
> > This code is so obviously broken that I completely overlooked it.
> > 
> > If we have taken a doorbell interrupt, then it means nothing was
> > otherwise pending (because we'd have been kicked out of the blocking
> > state, and will have masked the doorbell). So checking for pending
> > interrupts is pointless.
> > 
> > Furthermore, calling kvm_vgic_vcpu_pending_irq() takes the ap_list
> > lock. If we take a doorbell interrupt while injecting a virtual
> > interrupt (from userspace, for example) on the same CPU, we end up
> > in deadlock land. This would be solved by Christoffer's latest
> > crop of timer patches, but there is no point getting there in the
> > first place.
> > 
> > The patchlet below solves it:
> > 
> > diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
> > index 15feb1151797..48e4d6ebeaa8 100644
> > --- a/virt/kvm/arm/vgic/vgic-v4.c
> > +++ b/virt/kvm/arm/vgic/vgic-v4.c
> > @@ -94,11 +94,9 @@ static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
> >  {
> >  	struct kvm_vcpu *vcpu = info;
> >  
> > -	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
> > -		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> > -		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > -		kvm_vcpu_kick(vcpu);
> > -	}
> > +	vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> > +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > +	kvm_vcpu_kick(vcpu);
> 
> I don't think you need the request and kick, because if you're getting
> this doorbell, doesn't that also mean that the VCPU is not running in
> the guest and you simply need to make sure the VCPU thread gets
> scheduled again, so you could call kvm_vcpu_wake_up() instead.

While we do that in kvm_timer_inject_irq_work(), which is scheduled from
the same function that vgic_v4_doorbell_handler() would be enabled from
(kvm_vcpu_block->kvm_arch_vcpu_blocking), just a wake up isn't sufficient
in this case.

> 
> Unless the request is there to ensure proper memory barriers around
> setting pending_last?

Right, unlike pending timers, we need the barriers in this handler.
Pending timers are safe because their pending test compares state set by
the VCPU thread itself to state acquired by the VCPU thread reading a host
sysreg itself. IOW, handlers making "set vcpu timer pending requests"
don't need to do anything but wake up the VCPU. Handlers that set some
sort of irq pending state, like vgic_v4_doorbell_handler(), do need to
worry about the visibility of that state though.

> 
> In that case, is the read barrier taken care of by prepare_to_swait in
> kvm_vcpu_block()?

Thanks for the bug report :-)

There's a barrier, but it's not properly paired. Currently we have

 VCPU                                        handler
 ----                                        -------
 for (;;) {
   WRITE_ONCE(task->state, INTERRUPTIBLE);   pending=true;
   smp_mb();                                 smp_wmb(); // kvm_make_request()
   if (pending) {                            WRITE_ONCE(vcpu->state, NORMAL);
     ... stop waiting ...
   }
   schedule();
 }

Proper barrier use with swait/swake should instead look like this

 VCPU                                        handler
 ----                                        -------
 for (;;) {
   WRITE_ONCE(task->state, INTERRUPTIBLE);
   smp_mb();
   if (READ_ONCE(task->state) == NORMAL) {   pending=true;
     smp_rmb();                              smp_wmb();
     if (pending)                            WRITE_ONCE(vcpu->state, NORMAL);
       ... stop waiting ...
     else
       continue;
   }
   schedule();
 }

But checking task state adds complexity and would only cover a small
window anyway (the window between prepare_to_swait() and
kvm_vcpu_check_block()). We need to cover a larger window, for this
particular case it's from kvm_arch_vcpu_blocking() to
kvm_vcpu_check_block(). Better use of VCPU requests is the answer:

 VCPU                                        handler
 ----                                        -------
 kvm_arch_vcpu_blocking();
 for (;;) {
   prepare_to_swait(); 
   if (test_bit(IRQ_PENDING)) {              pending=true;
     smp_rmb();                              smp_wmb();
     if (pending)                            set_bit(IRQ_PENDING);
       ... stop waiting ...
     else
       continue;
   }
   schedule();
 }

The handler side is covered by kvm_make_request() and the vcpu kick, so
this patch looks good. However we need a patch to kvm_arch_vcpu_runnable()
to fix the VCPU side.

 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-       return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
-               && !v->arch.power_off && !v->arch.pause);
+       if (v->arch.power_off || v->arch.pause)
+               return false;
+
+       if (kvm_test_request(KVM_REQ_IRQ_PENDING, v)) {
+               smp_rmb();
+               return true;
+       }
+
+       return false;
 }

We can keep the kvm_vgic_vcpu_pending_irq() check, after the IRQ_PENDING
check, if there are cases where IRQ_PENDING wouldn't be set, but
kvm_vgic_vcpu_pending_irq() would return true. Either way we also get a
bit of an optimization with this fix.

I'll think/test some more, and then send a patch.

Thanks,
drew
Christoffer Dall Aug. 30, 2017, 11:55 a.m. UTC | #3
On Wed, Aug 30, 2017 at 12:31:41PM +0200, Andrew Jones wrote:
> On Mon, Aug 28, 2017 at 08:18:50PM +0200, Christoffer Dall wrote:
> > On Fri, Aug 04, 2017 at 08:44:04AM +0100, Marc Zyngier wrote:
> > > On 31/07/17 18:26, Marc Zyngier wrote:
> > > > When a vPE is not running, a VLPI being made pending results in a
> > > > doorbell interrupt being delivered. Let's handle this interrupt
> > > > and update the pending_last flag that indicates that VLPIs are
> > > > pending. The corresponding vcpu is also kicked into action.
> > > > 
> > > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > > > ---
> > > >  virt/kvm/arm/vgic/vgic-v4.c | 34 ++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 34 insertions(+)
> > > > 
> > > > diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
> > > > index 534d3051a078..6af3cde6d7d4 100644
> > > > --- a/virt/kvm/arm/vgic/vgic-v4.c
> > > > +++ b/virt/kvm/arm/vgic/vgic-v4.c
> > > > @@ -21,6 +21,19 @@
> > > >  
> > > >  #include "vgic.h"
> > > >  
> > > > +static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
> > > > +{
> > > > +	struct kvm_vcpu *vcpu = info;
> > > > +
> > > > +	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
> > > > +		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> > > > +		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > > > +		kvm_vcpu_kick(vcpu);
> > > > +	}
> > > 
> > > This code is so obviously broken that I completely overlooked it.
> > > 
> > > If we have taken a doorbell interrupt, then it means nothing was
> > > otherwise pending (because we'd have been kicked out of the blocking
> > > state, and will have masked the doorbell). So checking for pending
> > > interrupts is pointless.
> > > 
> > > Furthermore, calling kvm_vgic_vcpu_pending_irq() takes the ap_list
> > > lock. If we take a doorbell interrupt while injecting a virtual
> > > interrupt (from userspace, for example) on the same CPU, we end up
> > > in deadlock land. This would be solved by Christoffer's latest
> > > crop of timer patches, but there is no point getting there in the
> > > first place.
> > > 
> > > The patchlet below solves it:
> > > 
> > > diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
> > > index 15feb1151797..48e4d6ebeaa8 100644
> > > --- a/virt/kvm/arm/vgic/vgic-v4.c
> > > +++ b/virt/kvm/arm/vgic/vgic-v4.c
> > > @@ -94,11 +94,9 @@ static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
> > >  {
> > >  	struct kvm_vcpu *vcpu = info;
> > >  
> > > -	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
> > > -		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> > > -		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > > -		kvm_vcpu_kick(vcpu);
> > > -	}
> > > +	vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> > > +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > > +	kvm_vcpu_kick(vcpu);
> > 
> > I don't think you need the request and kick, because if you're getting
> > this doorbell, doesn't that also mean that the VCPU is not running in
> > the guest and you simply need to make sure the VCPU thread gets
> > scheduled again, so you could call kvm_vcpu_wake_up() instead.
> 
> While we do that in kvm_timer_inject_irq_work(), which is scheduled from
> the same function that vgic_v4_doorbell_handler() would be enabled from
> (kvm_vcpu_block->kvm_arch_vcpu_blocking), just a wake up isn't sufficient
> in this case.
> 
> > 
> > Unless the request is there to ensure proper memory barriers around
> > setting pending_last?
> 
> Right, unlike pending timers, we need the barriers in this handler.
> Pending timers are safe because their pending test compares state set by
> the VCPU thread itself to state acquired by the VCPU thread reading a host
> sysreg itself. IOW, handlers making "set vcpu timer pending requests"
> don't need to do anything but wake up the VCPU. Handlers that set some
> sort of irq pending state, like vgic_v4_doorbell_handler(), do need to
> worry about the visibility of that state though.
> 
> > 
> > In that case, is the read barrier taken care of by prepare_to_swait in
> > kvm_vcpu_block()?
> 
> Thanks for the bug report :-)
> 
> There's a barrier, but it's not properly paired. Currently we have
> 
>  VCPU                                        handler
>  ----                                        -------
>  for (;;) {
>    WRITE_ONCE(task->state, INTERRUPTIBLE);   pending=true;
>    smp_mb();                                 smp_wmb(); // kvm_make_request()
>    if (pending) {                            WRITE_ONCE(vcpu->state, NORMAL);
>      ... stop waiting ...
>    }
>    schedule();
>  }
> 
> Proper barrier use with swait/swake should instead look like this
> 
>  VCPU                                        handler
>  ----                                        -------
>  for (;;) {
>    WRITE_ONCE(task->state, INTERRUPTIBLE);
>    smp_mb();
>    if (READ_ONCE(task->state) == NORMAL) {   pending=true;
>      smp_rmb();                              smp_wmb();
>      if (pending)                            WRITE_ONCE(vcpu->state, NORMAL);
>        ... stop waiting ...
>      else
>        continue;
>    }
>    schedule();
>  }
> 
> But checking task state adds complexity and would only cover a small
> window anyway (the window between prepare_to_swait() and
> kvm_vcpu_check_block()). We need to cover a larger window, for this
> particular case it's from kvm_arch_vcpu_blocking() to
> kvm_vcpu_check_block(). Better use of VCPU requests is the answer:
> 
>  VCPU                                        handler
>  ----                                        -------
>  kvm_arch_vcpu_blocking();
>  for (;;) {
>    prepare_to_swait(); 
>    if (test_bit(IRQ_PENDING)) {              pending=true;
>      smp_rmb();                              smp_wmb();
>      if (pending)                            set_bit(IRQ_PENDING);
>        ... stop waiting ...
>      else
>        continue;
>    }
>    schedule();
>  }
> 
> The handler side is covered by kvm_make_request() and the vcpu kick, so
> this patch looks good. However we need a patch to kvm_arch_vcpu_runnable()
> to fix the VCPU side.
> 
>  int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>  {
> -       return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
> -               && !v->arch.power_off && !v->arch.pause);
> +       if (v->arch.power_off || v->arch.pause)
> +               return false;

Why don't we need to ensure reads of these flags are observable when
waking up, if the thread waking us up had set the variables prior to
issuing the wake up call?

> +
> +       if (kvm_test_request(KVM_REQ_IRQ_PENDING, v)) {
> +               smp_rmb();

This looks wrong.  What does this barrier ensure exactly?  Some kind of
read following whoever called this function?

> +               return true;
> +       }
>  }
> 
> We can keep the kvm_vgic_vcpu_pending_irq() check, after the IRQ_PENDING
> check, if there are cases where IRQ_PENDING wouldn't be set, but
> kvm_vgic_vcpu_pending_irq() would return true. Either way we also get a
> bit of an optimization with this fix.
> 
> I'll think/test some more, and then send a patch.
> 

I'll review it more thoroughly then.

Thanks,
-Christoffer
Andrew Jones Aug. 30, 2017, 12:28 p.m. UTC | #4
On Wed, Aug 30, 2017 at 01:55:10PM +0200, Christoffer Dall wrote:
> On Wed, Aug 30, 2017 at 12:31:41PM +0200, Andrew Jones wrote:
> > On Mon, Aug 28, 2017 at 08:18:50PM +0200, Christoffer Dall wrote:
> > > On Fri, Aug 04, 2017 at 08:44:04AM +0100, Marc Zyngier wrote:
> > > > On 31/07/17 18:26, Marc Zyngier wrote:
> > > > > When a vPE is not running, a VLPI being made pending results in a
> > > > > doorbell interrupt being delivered. Let's handle this interrupt
> > > > > and update the pending_last flag that indicates that VLPIs are
> > > > > pending. The corresponding vcpu is also kicked into action.
> > > > > 
> > > > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > > > > ---
> > > > >  virt/kvm/arm/vgic/vgic-v4.c | 34 ++++++++++++++++++++++++++++++++++
> > > > >  1 file changed, 34 insertions(+)
> > > > > 
> > > > > diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
> > > > > index 534d3051a078..6af3cde6d7d4 100644
> > > > > --- a/virt/kvm/arm/vgic/vgic-v4.c
> > > > > +++ b/virt/kvm/arm/vgic/vgic-v4.c
> > > > > @@ -21,6 +21,19 @@
> > > > >  
> > > > >  #include "vgic.h"
> > > > >  
> > > > > +static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
> > > > > +{
> > > > > +	struct kvm_vcpu *vcpu = info;
> > > > > +
> > > > > +	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
> > > > > +		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> > > > > +		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > > > > +		kvm_vcpu_kick(vcpu);
> > > > > +	}
> > > > 
> > > > This code is so obviously broken that I completely overlooked it.
> > > > 
> > > > If we have taken a doorbell interrupt, then it means nothing was
> > > > otherwise pending (because we'd have been kicked out of the blocking
> > > > state, and will have masked the doorbell). So checking for pending
> > > > interrupts is pointless.
> > > > 
> > > > Furthermore, calling kvm_vgic_vcpu_pending_irq() takes the ap_list
> > > > lock. If we take a doorbell interrupt while injecting a virtual
> > > > interrupt (from userspace, for example) on the same CPU, we end up
> > > > in deadlock land. This would be solved by Christoffer's latest
> > > > crop of timer patches, but there is no point getting there in the
> > > > first place.
> > > > 
> > > > The patchlet below solves it:
> > > > 
> > > > diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
> > > > index 15feb1151797..48e4d6ebeaa8 100644
> > > > --- a/virt/kvm/arm/vgic/vgic-v4.c
> > > > +++ b/virt/kvm/arm/vgic/vgic-v4.c
> > > > @@ -94,11 +94,9 @@ static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
> > > >  {
> > > >  	struct kvm_vcpu *vcpu = info;
> > > >  
> > > > -	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
> > > > -		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> > > > -		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > > > -		kvm_vcpu_kick(vcpu);
> > > > -	}
> > > > +	vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> > > > +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > > > +	kvm_vcpu_kick(vcpu);
> > > 
> > > I don't think you need the request and kick, because if you're getting
> > > this doorbell, doesn't that also mean that the VCPU is not running in
> > > the guest and you simply need to make sure the VCPU thread gets
> > > scheduled again, so you could call kvm_vcpu_wake_up() instead.
> > 
> > While we do that in kvm_timer_inject_irq_work(), which is scheduled from
> > the same function that vgic_v4_doorbell_handler() would be enabled from
> > (kvm_vcpu_block->kvm_arch_vcpu_blocking), just a wake up isn't sufficient
> > in this case.
> > 
> > > 
> > > Unless the request is there to ensure proper memory barriers around
> > > setting pending_last?
> > 
> > Right, unlike pending timers, we need the barriers in this handler.
> > Pending timers are safe because their pending test compares state set by
> > the VCPU thread itself to state acquired by the VCPU thread reading a host
> > sysreg itself. IOW, handlers making "set vcpu timer pending requests"
> > don't need to do anything but wake up the VCPU. Handlers that set some
> > sort of irq pending state, like vgic_v4_doorbell_handler(), do need to
> > worry about the visibility of that state though.
> > 
> > > 
> > > In that case, is the read barrier taken care of by prepare_to_swait in
> > > kvm_vcpu_block()?
> > 
> > Thanks for the bug report :-)
> > 
> > There's a barrier, but it's not properly paired. Currently we have
> > 
> >  VCPU                                        handler
> >  ----                                        -------
> >  for (;;) {
> >    WRITE_ONCE(task->state, INTERRUPTIBLE);   pending=true;
> >    smp_mb();                                 smp_wmb(); // kvm_make_request()
> >    if (pending) {                            WRITE_ONCE(vcpu->state, NORMAL);
> >      ... stop waiting ...
> >    }
> >    schedule();
> >  }
> > 
> > Proper barrier use with swait/swake should instead look like this
> > 
> >  VCPU                                        handler
> >  ----                                        -------
> >  for (;;) {
> >    WRITE_ONCE(task->state, INTERRUPTIBLE);
> >    smp_mb();
> >    if (READ_ONCE(task->state) == NORMAL) {   pending=true;
> >      smp_rmb();                              smp_wmb();
> >      if (pending)                            WRITE_ONCE(vcpu->state, NORMAL);
> >        ... stop waiting ...
> >      else
> >        continue;
> >    }
> >    schedule();
> >  }
> > 
> > But checking task state adds complexity and would only cover a small
> > window anyway (the window between prepare_to_swait() and
> > kvm_vcpu_check_block()). We need to cover a larger window, for this
> > particular case it's from kvm_arch_vcpu_blocking() to
> > kvm_vcpu_check_block(). Better use of VCPU requests is the answer:
> > 
> >  VCPU                                        handler
> >  ----                                        -------
> >  kvm_arch_vcpu_blocking();
> >  for (;;) {
> >    prepare_to_swait(); 
> >    if (test_bit(IRQ_PENDING)) {              pending=true;
> >      smp_rmb();                              smp_wmb();
> >      if (pending)                            set_bit(IRQ_PENDING);
> >        ... stop waiting ...
> >      else
> >        continue;
> >    }
> >    schedule();
> >  }
> > 
> > The handler side is covered by kvm_make_request() and the vcpu kick, so
> > this patch looks good. However we need a patch to kvm_arch_vcpu_runnable()
> > to fix the VCPU side.
> > 
> >  int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
> >  {
> > -       return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
> > -               && !v->arch.power_off && !v->arch.pause);
> > +       if (v->arch.power_off || v->arch.pause)
> > +               return false;
> 
> Why don't we need to ensure reads of these flags are observable when
> waking up, if the thread waking us up had set the variables prior to
> issuing the wake up call?

They're "special". I analyzed the cases for them and I *think* they're
all safe. I can do another round, or perhaps better would be to replace
them in some way with VCPU requests to make the analysis consistent.

> 
> > +
> > +       if (kvm_test_request(KVM_REQ_IRQ_PENDING, v)) {
> > +               smp_rmb();
> 
> This looks wrong.  What does this barrier ensure exactly?  Some kind of
> read following whoever called this function?

Yes, it's similar to how kvm_check_request() is written. Actually, I
should probably just copy that more completely, i.e.

 if (kvm_test_request(KVM_REQ_IRQ_PENDING, vcpu)) {
    /*
     * Ensure the rest of the request is visible to the caller.
     * Pairs with the smp_wmb in kvm_make_request.
     */
     smp_mb__after_atomic();
     return true;
 }

> 
> > +               return true;
> > +       }
> >  }
> > 
> > We can keep the kvm_vgic_vcpu_pending_irq() check, after the IRQ_PENDING
> > check, if there are cases where IRQ_PENDING wouldn't be set, but
> > kvm_vgic_vcpu_pending_irq() would return true. Either way we also get a
> > bit of an optimization with this fix.
> > 
> > I'll think/test some more, and then send a patch.
> > 
> 
> I'll review it more thoroughly then.

Thanks,
drew
Marc Zyngier Aug. 31, 2017, 12:18 p.m. UTC | #5
On 28/08/17 19:18, Christoffer Dall wrote:
> On Fri, Aug 04, 2017 at 08:44:04AM +0100, Marc Zyngier wrote:
>> On 31/07/17 18:26, Marc Zyngier wrote:
>>> When a vPE is not running, a VLPI being made pending results in a
>>> doorbell interrupt being delivered. Let's handle this interrupt
>>> and update the pending_last flag that indicates that VLPIs are
>>> pending. The corresponding vcpu is also kicked into action.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> ---
>>>  virt/kvm/arm/vgic/vgic-v4.c | 34 ++++++++++++++++++++++++++++++++++
>>>  1 file changed, 34 insertions(+)
>>>
>>> diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
>>> index 534d3051a078..6af3cde6d7d4 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v4.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v4.c
>>> @@ -21,6 +21,19 @@
>>>  
>>>  #include "vgic.h"
>>>  
>>> +static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
>>> +{
>>> +	struct kvm_vcpu *vcpu = info;
>>> +
>>> +	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
>>> +		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
>>> +		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>>> +		kvm_vcpu_kick(vcpu);
>>> +	}
>>
>> This code is so obviously broken that I completely overlooked it.
>>
>> If we have taken a doorbell interrupt, then it means nothing was
>> otherwise pending (because we'd have been kicked out of the blocking
>> state, and will have masked the doorbell). So checking for pending
>> interrupts is pointless.
>>
>> Furthermore, calling kvm_vgic_vcpu_pending_irq() takes the ap_list
>> lock. If we take a doorbell interrupt while injecting a virtual
>> interrupt (from userspace, for example) on the same CPU, we end up
>> in deadlock land. This would be solved by Christoffer's latest
>> crop of timer patches, but there is no point getting there in the
>> first place.
>>
>> The patchlet below solves it:
>>
>> diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
>> index 15feb1151797..48e4d6ebeaa8 100644
>> --- a/virt/kvm/arm/vgic/vgic-v4.c
>> +++ b/virt/kvm/arm/vgic/vgic-v4.c
>> @@ -94,11 +94,9 @@ static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
>>  {
>>  	struct kvm_vcpu *vcpu = info;
>>  
>> -	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
>> -		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
>> -		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>> -		kvm_vcpu_kick(vcpu);
>> -	}
>> +	vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
>> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>> +	kvm_vcpu_kick(vcpu);
> 
> I don't think you need the request and kick, because if you're getting
> this doorbell, doesn't that also mean that the VCPU is not running in
> the guest and you simply need to make sure the VCPU thread gets
> scheduled again, so you could call kvm_vcpu_wake_up() instead.
> 
> Unless the request is there to ensure proper memory barriers around
> setting pending_last?

We definitely need some form of barrier here (the write needs to be
observed on the CPU we're waking up on), and I wanted to model this in a
similar way to our injection path (where we do have the make_request +
kick pattern).

After all, we really have an interrupt pending that deserves attention.
Is there any reason why we'd want to model it differently from the rest
of the injection path?

Thanks,

	M.
Christoffer Dall Aug. 31, 2017, 12:41 p.m. UTC | #6
On Thu, Aug 31, 2017 at 01:18:20PM +0100, Marc Zyngier wrote:
> On 28/08/17 19:18, Christoffer Dall wrote:
> > On Fri, Aug 04, 2017 at 08:44:04AM +0100, Marc Zyngier wrote:
> >> On 31/07/17 18:26, Marc Zyngier wrote:
> >>> When a vPE is not running, a VLPI being made pending results in a
> >>> doorbell interrupt being delivered. Let's handle this interrupt
> >>> and update the pending_last flag that indicates that VLPIs are
> >>> pending. The corresponding vcpu is also kicked into action.
> >>>
> >>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >>> ---
> >>>  virt/kvm/arm/vgic/vgic-v4.c | 34 ++++++++++++++++++++++++++++++++++
> >>>  1 file changed, 34 insertions(+)
> >>>
> >>> diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
> >>> index 534d3051a078..6af3cde6d7d4 100644
> >>> --- a/virt/kvm/arm/vgic/vgic-v4.c
> >>> +++ b/virt/kvm/arm/vgic/vgic-v4.c
> >>> @@ -21,6 +21,19 @@
> >>>  
> >>>  #include "vgic.h"
> >>>  
> >>> +static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
> >>> +{
> >>> +	struct kvm_vcpu *vcpu = info;
> >>> +
> >>> +	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
> >>> +		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> >>> +		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >>> +		kvm_vcpu_kick(vcpu);
> >>> +	}
> >>
> >> This code is so obviously broken that I completely overlooked it.
> >>
> >> If we have taken a doorbell interrupt, then it means nothing was
> >> otherwise pending (because we'd have been kicked out of the blocking
> >> state, and will have masked the doorbell). So checking for pending
> >> interrupts is pointless.
> >>
> >> Furthermore, calling kvm_vgic_vcpu_pending_irq() takes the ap_list
> >> lock. If we take a doorbell interrupt while injecting a virtual
> >> interrupt (from userspace, for example) on the same CPU, we end up
> >> in deadlock land. This would be solved by Christoffer's latest
> >> crop of timer patches, but there is no point getting there in the
> >> first place.
> >>
> >> The patchlet below solves it:
> >>
> >> diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
> >> index 15feb1151797..48e4d6ebeaa8 100644
> >> --- a/virt/kvm/arm/vgic/vgic-v4.c
> >> +++ b/virt/kvm/arm/vgic/vgic-v4.c
> >> @@ -94,11 +94,9 @@ static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
> >>  {
> >>  	struct kvm_vcpu *vcpu = info;
> >>  
> >> -	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
> >> -		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> >> -		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >> -		kvm_vcpu_kick(vcpu);
> >> -	}
> >> +	vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
> >> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >> +	kvm_vcpu_kick(vcpu);
> > 
> > I don't think you need the request and kick, because if you're getting
> > this doorbell, doesn't that also mean that the VCPU is not running in
> > the guest and you simply need to make sure the VCPU thread gets
> > scheduled again, so you could call kvm_vcpu_wake_up() instead.
> > 
> > Unless the request is there to ensure proper memory barriers around
> > setting pending_last?
> 
> We definitely need some form of barrier here (the write needs to be
> observed on the CPU we're waking up on), and I wanted to model this in a
> similar way to our injection path (where we do have the make_request +
> kick pattern).
> 
> After all, we really have an interrupt pending that deserves attention.
> Is there any reason why we'd want to model it differently from the rest
> of the injection path?
> 

The difference is that you'll never need this if the VCPU is running;
this is only when the VCPU is on a waitqueue and sleeping, and therefore
you don't actually need a kick, but you just need to wake up the VCPU
thread.

But kvm_vcpu_kick() also wakes up the thread; it's just a question of
using the most appropriate hammer and making it clear what exactly is
going on.

But, as Drew pointed out, we use the request framework to ensure proper
barriers (when he fixes it) after waking up a VCPU, so in this case it's
to make sure the VCPU thread observed pending_last==true after being
woken up.

Interestingly, we don't need that for the timer, since we don't change
random in-memory values, but read the physical counter, which is the
thing that changes when waking up the thread.

So, sorry for a lot of noise, your code is fine, and Drew fixes KVM, I
think.

Thanks,
-Christoffer

Patch

diff --git a/virt/kvm/arm/vgic/vgic-v4.c b/virt/kvm/arm/vgic/vgic-v4.c
index 15feb1151797..48e4d6ebeaa8 100644
--- a/virt/kvm/arm/vgic/vgic-v4.c
+++ b/virt/kvm/arm/vgic/vgic-v4.c
@@ -94,11 +94,9 @@  static irqreturn_t vgic_v4_doorbell_handler(int irq, void *info)
 {
 	struct kvm_vcpu *vcpu = info;
 
-	if (!kvm_vgic_vcpu_pending_irq(vcpu)) {
-		vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
-		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
-		kvm_vcpu_kick(vcpu);
-	}
+	vcpu->arch.vgic_cpu.vgic_v3.its_vpe.pending_last = true;
+	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
+	kvm_vcpu_kick(vcpu);
 
 	return IRQ_HANDLED;
 }