
[v3] xen/arm: fix rank/vgic lock inversion bug

Message ID 1483486167-24607-1-git-send-email-sstabellini@kernel.org (mailing list archive)
State New, archived

Commit Message

Stefano Stabellini Jan. 3, 2017, 11:29 p.m. UTC
Always set the new physical irq affinity at the beginning of
vgic_migrate_irq, in all cases.

No need to set physical irq affinity in gic_update_one_lr anymore,
solving the lock inversion problem.

After migrating an interrupt from vcpu/pcpu 0 to vcpu/pcpu 1, it is
possible to receive a physical interrupt on pcpu 1, which Xen is
supposed to inject into vcpu 1, before the LR on pcpu 0 has been
cleared. In this case the irq is still marked as
GIC_IRQ_GUEST_MIGRATING, and struct pending_irq is still "inflight" on
vcpu 0. As the irq cannot be "inflight" on vcpu 0 and vcpu 1
simultaneously, drop the interrupt.

Coverity-ID: 1381855
Coverity-ID: 1381853

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/arch/arm/gic.c  |  6 +-----
 xen/arch/arm/vgic.c | 19 +++++++++++--------
 2 files changed, 12 insertions(+), 13 deletions(-)

Comments

Julien Grall Jan. 27, 2017, 6:11 p.m. UTC | #1
(CC Artem as ARM coverity admin)

Hi Stefano,

On 03/01/17 23:29, Stefano Stabellini wrote:
> Always set the new physical irq affinity at the beginning of
> vgic_migrate_irq, in all cases.
>
> No need to set physical irq affinity in gic_update_one_lr anymore,
> solving the lock inversion problem.
>
> After migrating an interrupt from vcpu/pcpu 0 to vcpu/pcpu 1, it is
> possible to receive a physical interrupt on pcpu 1, which Xen is
> supposed to inject into vcpu 1, before the LR on pcpu 0 has been
> cleared. In this case the irq is still marked as
> GIC_IRQ_GUEST_MIGRATING, and struct pending_irq is still "inflight" on
> vcpu 0. As the irq cannot be "inflight" on vcpu 0 and vcpu 1
> simultaneously, drop the interrupt.

I am not sure I understand how this is related to the fix for the
rank/vgic lock inversion bug. Am I right to think that this bug was
already present?

>
> Coverity-ID: 1381855
> Coverity-ID: 1381853

I am a bit confused... somehow those numbers disappeared from the main
Coverity page, which means Coverity thinks they have been fixed. However,
this patch has not been upstreamed yet. Artem, are you testing Xen upstream?

>
> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
> ---
>  xen/arch/arm/gic.c  |  6 +-----
>  xen/arch/arm/vgic.c | 19 +++++++++++--------
>  2 files changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index a5348f2..767fc9e 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -504,11 +504,7 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>              gic_raise_guest_irq(v, irq, p->priority);
>          else {
>              list_del_init(&p->inflight);
> -            if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> -            {
> -                struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
> -                irq_set_affinity(p->desc, cpumask_of(v_target->processor));
> -            }
> +            clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
>          }
>      }
>  }
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 364d5f0..11ffb9b 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -264,20 +264,17 @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
>      if ( p->desc == NULL )
>          return;
>
> +    irq_set_affinity(p->desc, cpumask_of(new->processor));
> +
>      /* migration already in progress, no need to do anything */
>      if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>          return;
> +    if ( list_empty(&p->inflight) )
> +        return;
>
>      perfc_incr(vgic_irq_migrates);
>
>      spin_lock_irqsave(&old->arch.vgic.lock, flags);
> -
> -    if ( list_empty(&p->inflight) )
> -    {
> -        irq_set_affinity(p->desc, cpumask_of(new->processor));
> -        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
> -        return;
> -    }
>      /* If the IRQ is still lr_pending, re-inject it to the new vcpu */
>      if ( !list_empty(&p->lr_queue) )
>      {
> @@ -286,7 +283,6 @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
>          list_del_init(&p->inflight);
>          irq_set_affinity(p->desc, cpumask_of(new->processor));

This call is not necessary anymore.

>          spin_unlock_irqrestore(&old->arch.vgic.lock, flags);

Now all the paths end with spin_unlock followed by return. Can you send
a patch to do the cleanup?

> -        vgic_vcpu_inject_irq(new, irq);

The comment on top of the if says: "If the IRQ is still lr_pending,
re-inject it to the new vCPU". However, you remove the interrupt from the
list but never re-inject it.

Looking at the context, I think we should keep the call to
vgic_vcpu_inject_irq, otherwise we lose the interrupt forever.

>          return;
>      }
>      /* if the IRQ is in a GICH_LR register, set GIC_IRQ_GUEST_MIGRATING
> @@ -495,6 +491,13 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
>          return;
>      }
>
> +    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
> +    {
> +        /* Drop the interrupt, because it is still inflight on another vcpu */
> +        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);

If I understand correctly, the problem arises because LRs are cleared
lazily and the other vCPU is still running. It is a short window and I
don't think it should happen often.

I would suggest kicking the other vCPU with an SGI to get the LRs
cleared. Any opinions?
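
A minimal sketch of what that could look like (hypothetical: smp_send_event_check_mask() is Xen's event-check SGI helper, and how to identify "old", the vCPU the vIRQ is migrating away from, is an open question here):

    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
    {
        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
        /* Kick the pCPU that may still hold the stale LR, so that it
         * traps into Xen and clears its LRs on the way back to the
         * guest. "old" is a placeholder for the vCPU the vIRQ is
         * migrating away from. */
        smp_send_event_check_mask(cpumask_of(old->processor));
        return;
    }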

> +        return;
> +    }
> +
>      set_bit(GIC_IRQ_GUEST_QUEUED, &n->status);
>
>      if ( !list_empty(&n->inflight) )
>

Cheers,
Stefano Stabellini Jan. 31, 2017, 11:49 p.m. UTC | #2
On Fri, 27 Jan 2017, Julien Grall wrote:
> (CC Artem as ARM coverity admin)
> 
> Hi Stefano,
> 
> On 03/01/17 23:29, Stefano Stabellini wrote:
> > Always set the new physical irq affinity at the beginning of
> > vgic_migrate_irq, in all cases.
> > 
> > No need to set physical irq affinity in gic_update_one_lr anymore,
> > solving the lock inversion problem.
> > 
> > After migrating an interrupt from vcpu/pcpu 0 to vcpu/pcpu 1, it is
> > possible to receive a physical interrupt on pcpu 1, which Xen is
> > supposed to inject into vcpu 1, before the LR on pcpu 0 has been
> > cleared. In this case the irq is still marked as
> > GIC_IRQ_GUEST_MIGRATING, and struct pending_irq is still "inflight" on
> > vcpu 0. As the irq cannot be "inflight" on vcpu 0 and vcpu 1
> > simultaneously, drop the interrupt.
> 
> I am not sure I understand how this is related to the fix for the rank/vgic
> lock inversion bug. Am I right to think that this bug was already present?

Yes, it is related: without this patch, this situation cannot happen,
because irq_set_affinity is called from gic_update_one_lr.


> > Coverity-ID: 1381855
> > Coverity-ID: 1381853
> 
> I am a bit confused... somehow those numbers disappeared from the main Coverity
> page, which means Coverity thinks they have been fixed. However, this patch has
> not been upstreamed yet. Artem, are you testing Xen upstream?
> 
> > 
> > Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
> > ---
> >  xen/arch/arm/gic.c  |  6 +-----
> >  xen/arch/arm/vgic.c | 19 +++++++++++--------
> >  2 files changed, 12 insertions(+), 13 deletions(-)
> > 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index a5348f2..767fc9e 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -504,11 +504,7 @@ static void gic_update_one_lr(struct vcpu *v, int i)
> >              gic_raise_guest_irq(v, irq, p->priority);
> >          else {
> >              list_del_init(&p->inflight);
> > -            if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > -            {
> > -                struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
> > -                irq_set_affinity(p->desc, cpumask_of(v_target->processor));
> > -            }
> > +            clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
> >          }
> >      }
> >  }
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index 364d5f0..11ffb9b 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -264,20 +264,17 @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu
> > *new, unsigned int irq)
> >      if ( p->desc == NULL )
> >          return;
> > 
> > +    irq_set_affinity(p->desc, cpumask_of(new->processor));
> > +
> >      /* migration already in progress, no need to do anything */
> >      if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> >          return;
> > +    if ( list_empty(&p->inflight) )
> > +        return;
> > 
> >      perfc_incr(vgic_irq_migrates);
> > 
> >      spin_lock_irqsave(&old->arch.vgic.lock, flags);
> > -
> > -    if ( list_empty(&p->inflight) )
> > -    {
> > -        irq_set_affinity(p->desc, cpumask_of(new->processor));
> > -        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
> > -        return;
> > -    }
> >      /* If the IRQ is still lr_pending, re-inject it to the new vcpu */
> >      if ( !list_empty(&p->lr_queue) )
> >      {
> > @@ -286,7 +283,6 @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu
> > *new, unsigned int irq)
> >          list_del_init(&p->inflight);
> >          irq_set_affinity(p->desc, cpumask_of(new->processor));
> 
> This call is not necessary anymore.
>
> >          spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
> 
> Now all the paths end with spin_unlock followed by return. Can you send a
> patch to do the cleanup?

Sure


> > -        vgic_vcpu_inject_irq(new, irq);
> 
> The comment on top of the if says: "If the IRQ is still lr_pending, re-inject
> it to the new vCPU". However, you remove the interrupt from the list but never
> re-inject it.
> 
> Looking at the context, I think we should keep the call to
> vgic_vcpu_inject_irq, otherwise we lose the interrupt forever.

It looks like I made a mistake while generating the patch: I meant to
remove the redundant irq_set_affinity call, instead of the necessary
vgic_vcpu_inject_irq call. I don't know how it happened, sorry. Thanks
for catching the error.
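
For reference, the hunk was presumably meant to read along these lines (a sketch of the intended change, not the actual resend):

    @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
             list_del_init(&p->inflight);
    -        irq_set_affinity(p->desc, cpumask_of(new->processor));
             spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
             vgic_vcpu_inject_irq(new, irq);
             return;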

 
> >          return;
> >      }
> >      /* if the IRQ is in a GICH_LR register, set GIC_IRQ_GUEST_MIGRATING
> > @@ -495,6 +491,13 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int
> > virq)
> >          return;
> >      }
> > 
> > +    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
> > +    {
> > +        /* Drop the interrupt, because it is still inflight on another vcpu
> > */
> > +        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> 
> If I understand correctly, the problem arises because LRs are cleared lazily
> and the other vCPU is still running. It is a short window and I don't think
> it should happen often.
> 
> I would suggest kicking the other vCPU with an SGI to get the LRs cleared. Any
> opinions?

I am fine with sending an SGI, but we still need
http://marc.info/?l=xen-devel&m=148237295020488 to know the destination
of the SGI.

The problem is what happens next: is it safe to busy-wait until the LR
is cleared?  It is safe only if we cannot receive a second interrupt
notification while the first one has not been deactivated yet by the
guest:

- guest vcpu0 reads GICC_IAR for virq0
- guest vcpu1 change target for virq0 to vcpu1
- Xen traps the write and changes the physical affinity of virq0 (pirq0)
  to vcpu1 (pcpu1)
- Xen receives a second virq0 (pirq0) interrupt notification in pcpu1,
  but virq0 is still MIGRATING and guest vcpu0 has not EOIed it yet
- Xen sends an SGI to vcpu0 (pcpu0), but the interrupt has not been
  EOIed yet, thus, the LR doesn't get cleared

Can this happen? I read 3.2.3 and 3.2.4 a few times but I am still
unsure; it looks like it can. If it can happen, we cannot busy-wait.


> > +        return;
> > +    }
> > +
> >      set_bit(GIC_IRQ_GUEST_QUEUED, &n->status);
> > 
> >      if ( !list_empty(&n->inflight) )
Julien Grall Feb. 1, 2017, 4:31 p.m. UTC | #3
Hi Stefano,

CC Andre to get more feedback.

On 31/01/2017 23:49, Stefano Stabellini wrote:
> On Fri, 27 Jan 2017, Julien Grall wrote:
>> On 03/01/17 23:29, Stefano Stabellini wrote:
>>> Always set the new physical irq affinity at the beginning of
>>> vgic_migrate_irq, in all cases.
>>>
>>> No need to set physical irq affinity in gic_update_one_lr anymore,
>>> solving the lock inversion problem.
>>>
>>> After migrating an interrupt from vcpu/pcpu 0 to vcpu/pcpu 1, it is
>>> possible to receive a physical interrupt on pcpu 1, which Xen is
>>> supposed to inject into vcpu 1, before the LR on pcpu 0 has been
>>> cleared. In this case the irq is still marked as
>>> GIC_IRQ_GUEST_MIGRATING, and struct pending_irq is still "inflight" on
>>> vcpu 0. As the irq cannot be "inflight" on vcpu 0 and vcpu 1
>>> simultaneously, drop the interrupt.
>>
>> I am not sure I understand how this is related to the fix for the rank/vgic
>> lock inversion bug. Am I right to think that this bug was already present?
>
> Yes, it is related: without this patch, this situation cannot happen,
> because irq_set_affinity is called from gic_update_one_lr.

I am not convinced about that. gic_update_one_lr will take the lock of 
the current vCPU whilst vgic_vcpu_inject_irq is taking the lock of the 
vCPU where the vIRQ has been routed.

The memory access of list_del_init could potentially be
re-ordered with irq_set_affinity.

However, what is saving us is the memory barrier hidden in the spin_lock
call in gicv{2,3}_irq_set_affinity and the other one in
vgic_vcpu_inject_irq.

So I agree this is currently working, but I would argue it is by luck.

[...]

>>> -        vgic_vcpu_inject_irq(new, irq);
>>
>> The comment on top of the if says: "If the IRQ is still lr_pending, re-inject
>> it to the new vCPU". However, you remove the interrupt from the list but never
>> re-inject it.
>>
>> Looking at the context, I think we should keep the call to
>> vgic_vcpu_inject_irq, otherwise we lose the interrupt forever.
>
> It looks like I made a mistake while generating the patch: I meant to
> remove the redundant irq_set_affinity call, instead of the necessary
> vgic_vcpu_inject_irq call. I don't know how it happened, sorry. Thanks
> for catching the error.
>
>
>>>          return;
>>>      }
>>>      /* if the IRQ is in a GICH_LR register, set GIC_IRQ_GUEST_MIGRATING
>>> @@ -495,6 +491,13 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int
>>> virq)
>>>          return;
>>>      }
>>>
>>> +    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
>>> +    {
>>> +        /* Drop the interrupt, because it is still inflight on another vcpu
>>> */
>>> +        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
>>
>> If I understand correctly, the problem arises because LRs are cleared lazily
>> and the other vCPU is still running. It is a short window and I don't think
>> it should happen often.
>>
>> I would suggest kicking the other vCPU with an SGI to get the LRs cleared. Any
>> opinions?
>
> I am fine with sending an SGI, but we still need
> http://marc.info/?l=xen-devel&m=148237295020488 to know the destination
> of the SGI.

Hmmm correct.

>
> The problem is what happens next: is it safe to busy-wait until the LR
> is cleared?

Technically we already have this problem as LRs are cleared with the 
vgic lock taken.

So if by mistake you receive the interrupt on the wrong pCPU, you may
busy-wait on the spinlock for some time.

Note, I am not saying we should do the same here :). I would argue it
depends on how much time you expect clearing the LRs to take.

> It is safe only if we cannot receive a second interrupt
> notification while the first one has not been deactivated yet by the
> guest:
>
> - guest vcpu0 reads GICC_IAR for virq0
> - guest vcpu1 change target for virq0 to vcpu1
> - Xen traps the write and changes the physical affinity of virq0 (pirq0)
>   to vcpu1 (pcpu1)
> - Xen receives a second virq0 (pirq0) interrupt notification in pcpu1,
>   but virq0 is still MIGRATING and guest vcpu0 has not EOIed it yet
> - Xen sends an SGI to vcpu0 (pcpu0), but the interrupt has not been
>   EOIed yet, thus, the LR doesn't get cleared
>
> Can this happen? I read 3.2.3 and 3.2.4 a few times but I am still
> unsure; it looks like it can. If it can happen, we cannot busy-wait.

With 3.2.3 and 3.2.4, you are referring to the spec IHI 0048B, right?

I will leave aside PPIs and SGIs as they target one specific CPU and 
migration will never happen.

So we have to worry about SPIs and LPIs (though the latter are not yet
supported). From the note in 3.2.3:
"For any processor, if an interrupt is active and pending, the GIC does 
not signal an interrupt exception request
for the interrupt to any processor until the active status is cleared."

Given that SPIs have an active state and will be shared, this scenario
cannot happen.

For LPIs, there is no active state. So as soon as they are EOIed, they
might come up again. Depending on how we will handle irq migration, your
scenario could become true. I am not sure if we should take LPIs into
account right now.

To be honest, I don't much like the idea of kicking the other vCPU. But
I don't have a better idea for clearing the LRs. I have CCed Andre,
who has more knowledge of the GIC than me. He may have a clever idea
here :).

Cheers,
Stefano Stabellini Feb. 1, 2017, 11:23 p.m. UTC | #4
On Wed, 1 Feb 2017, Julien Grall wrote:
> Hi Stefano,
> 
> CC Andre to get more feedback.
> 
> On 31/01/2017 23:49, Stefano Stabellini wrote:
> > On Fri, 27 Jan 2017, Julien Grall wrote:
> > > On 03/01/17 23:29, Stefano Stabellini wrote:
> > > > Always set the new physical irq affinity at the beginning of
> > > > vgic_migrate_irq, in all cases.
> > > > 
> > > > No need to set physical irq affinity in gic_update_one_lr anymore,
> > > > solving the lock inversion problem.
> > > > 
> > > > After migrating an interrupt from vcpu/pcpu 0 to vcpu/pcpu 1, it is
> > > > possible to receive a physical interrupt on pcpu 1, which Xen is
> > > > supposed to inject into vcpu 1, before the LR on pcpu 0 has been
> > > > cleared. In this case the irq is still marked as
> > > > GIC_IRQ_GUEST_MIGRATING, and struct pending_irq is still "inflight" on
> > > > vcpu 0. As the irq cannot be "inflight" on vcpu 0 and vcpu 1
> > > > simultaneously, drop the interrupt.
> > > 
> > > I am not sure I understand how this is related to the fix for the rank/vgic
> > > lock inversion bug. Am I right to think that this bug was already present?
> > 
> > Yes, it is related: without this patch, this situation cannot happen,
> > because irq_set_affinity is called from gic_update_one_lr.
> 
> I am not convinced about that. gic_update_one_lr will take the lock of the
> current vCPU whilst vgic_vcpu_inject_irq is taking the lock of the vCPU where
> the vIRQ has been routed.
> 
> The memory access of list_del_init could potentially be
> re-ordered with irq_set_affinity.
> 
> However, what is saving us is the memory barrier hidden in the spin_lock call
> in gicv{2,3}_irq_set_affinity and the other one in vgic_vcpu_inject_irq.
> 
> So I agree this is currently working, but I would argue it is by luck.
 
I agree we should have a memory barrier after list_del_init; I think I
even mentioned it in one of the past emails on related threads. One way
or another, we'll make the problem go away with this patch (or other
implementations of it).


> > > > -        vgic_vcpu_inject_irq(new, irq);
> > > 
> > > The comment on top of the if says: "If the IRQ is still lr_pending,
> > > re-inject it to the new vCPU". However, you remove the interrupt from the
> > > list but never re-inject it.
> > > 
> > > Looking at the context, I think we should keep the call to
> > > vgic_vcpu_inject_irq, otherwise we lose the interrupt forever.
> > 
> > It looks like I made a mistake while generating the patch: I meant to
> > remove the redundant irq_set_affinity call, instead of the necessary
> > vgic_vcpu_inject_irq call. I don't know how it happened, sorry. Thanks
> > for catching the error.
> > 
> > 
> > > >          return;
> > > >      }
> > > >      /* if the IRQ is in a GICH_LR register, set GIC_IRQ_GUEST_MIGRATING
> > > > @@ -495,6 +491,13 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned
> > > > int
> > > > virq)
> > > >          return;
> > > >      }
> > > > 
> > > > +    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
> > > > +    {
> > > > +        /* Drop the interrupt, because it is still inflight on another
> > > > vcpu
> > > > */
> > > > +        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> > > 
> > > If I understand correctly, the problem arises because LRs are cleared
> > > lazily and the other vCPU is still running. It is a short window and I
> > > don't think it should happen often.
> > > 
> > > I would suggest kicking the other vCPU with an SGI to get the LRs cleared.
> > > Any opinions?
> > 
> > I am fine with sending an SGI, but we still need
> > http://marc.info/?l=xen-devel&m=148237295020488 to know the destination
> > of the SGI.
> 
> Hmmm correct.
> 
> > 
> > The problem is what happens next: is it safe to busy-wait until the LR
> > is cleared?
> 
> Technically we already have this problem as LRs are cleared with the vgic lock
> taken.
> 
> So if by mistake you receive the interrupt on the wrong pCPU, you may
> busy-wait on the spinlock for some time.
>
> Note, I am not saying we should do the same here :). I would argue it
> depends on how much time you expect clearing the LRs to take.

It can take an arbitrarily long time, I think. We cannot rely on it
being short, except in the case discussed below.


> > It is safe only if we cannot receive a second interrupt
> > notification while the first one has not been deactivated yet by the
> > guest:
> > 
> > - guest vcpu0 reads GICC_IAR for virq0
> > - guest vcpu1 change target for virq0 to vcpu1
> > - Xen traps the write and changes the physical affinity of virq0 (pirq0)
> >   to vcpu1 (pcpu1)
> > - Xen receives a second virq0 (pirq0) interrupt notification in pcpu1,
> >   but virq0 is still MIGRATING and guest vcpu0 has not EOIed it yet
> > - Xen sends an SGI to vcpu0 (pcpu0), but the interrupt has not been
> >   EOIed yet, thus, the LR doesn't get cleared
> > 
> > Can this happen? I read 3.2.3 and 3.2.4 a few times but I am still
> > unsure; it looks like it can. If it can happen, we cannot busy-wait.
> 
> With 3.2.3 and 3.2.4, you are referring to the spec IHI 0048B, right?

Right


> I will leave aside PPIs and SGIs as they target one specific CPU and migration
> will never happen.

Yes


> So we have to worry about SPIs and LPIs (though the latter are not yet
> supported). From the note in 3.2.3:
> "For any processor, if an interrupt is active and pending, the GIC does not
> signal an interrupt exception request
> for the interrupt to any processor until the active status is cleared."
> 
> Given that SPIs have an active state and will be shared, this scenario
> cannot happen.

Yes, but just above:

  A GICv1 implementation might ensure that only one processor can make a
  1-N interrupt active, removing the requirement for a lock on the ISR.
  This is not required by the architecture, and generic GIC code must not
  rely on this behavior.

then, the spec also says:

  The GIC maintains a state machine for each supported interrupt on each
  CPU interface.

So I am thinking that the statement you quoted wouldn't apply: the
second interrupt wouldn't mark the state of the interrupt as "active
and pending" because, the interrupt being targeted at another processor,
it would simply be marked as "pending" in the CPU interface of the other
processor.


> For LPIs, there is no active state. So as soon as they are EOIed, they might
> come up again. Depending on how we will handle irq migration, your scenario
> could become true. I am not sure if we should take LPIs into account right now.
> 
> To be honest, I don't much like the idea of kicking the other vCPU. But I
> don't have a better idea for clearing the LRs.

Me neither, that's why I was proposing a different solution instead. We
still have the option to take the right lock in vgic_migrate_irq: 

http://marc.info/?l=xen-devel&m=148237289620471

The code is more complex, but I think it's safe in all cases.
Julien Grall Feb. 2, 2017, 12:10 p.m. UTC | #5
Hi Stefano,

On 01/02/17 23:23, Stefano Stabellini wrote:
> On Wed, 1 Feb 2017, Julien Grall wrote:
>> On 31/01/2017 23:49, Stefano Stabellini wrote:
>>> On Fri, 27 Jan 2017, Julien Grall wrote:
>>>> On 03/01/17 23:29, Stefano Stabellini wrote:
>> So we have to worry about SPIs and LPIs (though the latter are not yet
>> supported). From the note in 3.2.3:
>> "For any processor, if an interrupt is active and pending, the GIC does not
>> signal an interrupt exception request
>> for the interrupt to any processor until the active status is cleared."
>>
>> Given that SPIs have an active state and will be shared, this scenario
>> cannot happen.
>
> Yes, but just above:
>
>   A GICv1 implementation might ensure that only one processor can make a
>   1-N interrupt active, removing the requirement for a lock on the ISR.
>   This is not required by the architecture, and generic GIC code must not
>   rely on this behavior.

That's for a GICv1 implementation. However, only GICv2 and onwards 
support virtualization.

>
> then, the spec also says:
>
>   The GIC maintains a state machine for each supported interrupt on each
>   CPU interface.
>
> So I am thinking that the statement you quoted wouldn't apply: the
> second interrupt wouldn't mark the state of the interrupt as "active
> and pending" because, the interrupt being targeted at another processor,
> it would simply be marked as "pending" in the CPU interface of the other
> processor.

I don't have the proper quote in the spec, but I can confirm the
interrupt can only be active on a single CPU at a time. It is also
implied in the description of the 1-N model (see 1.4.3): "Only one
processor handles this interrupt".

>
>
>> For LPIs, there is no active state. So as soon as they are EOIed, they might
>> come up again. Depending on how we will handle irq migration, your scenario
>> could become true. I am not sure if we should take LPIs into account right now.
>>
>> To be honest, I don't much like the idea of kicking the other vCPU. But I
>> don't have a better idea for clearing the LRs.
>
> Me neither, that's why I was proposing a different solution instead. We
> still have the option to take the right lock in vgic_migrate_irq:
>
> http://marc.info/?l=xen-devel&m=148237289620471
>
> The code is more complex, but I think it's safe in all cases.

It is not only complex but also really confusing, as we would have a
variable protected by two locks, where the two locks do not need to be
taken at the same time.

I may have an idea to avoid the lock in vgic_get_target_vcpu completely.
The lock is only here to read the target vcpu in the rank; the rest does
not need a lock, right? So could we not read the target vcpu atomically
instead?

Cheers,
Artem Mygaiev Feb. 2, 2017, 2:10 p.m. UTC | #6
Hello Julien, Stefano

[coverity-related question]

On 27.01.17 20:11, Julien Grall wrote:
> (CC Artem as ARM coverity admin)
>> Coverity-ID: 1381855
>> Coverity-ID: 1381853
>
> I am a bit confused... somehow those numbers disappeared from the main Coverity page, which means Coverity thinks they have been fixed. However, this patch has not been upstreamed yet. Artem, are you testing Xen upstream?
I have re-run the Coverity Scan with upstream, but those are still marked as fixed in Coverity due to code changes, i.e. they were not forced to the "fixed" state manually.

I went through the code and didn't notice any changes that would lead to this, and I do not understand how it got fixed. I can try to bisect unless you have some ideas...
Stefano Stabellini Feb. 2, 2017, 9:57 p.m. UTC | #7
On Thu, 2 Feb 2017, Artem Mygaiev wrote:
> Hello Julien, Stefano
> 
> [coverity-related question]
> 
> On 27.01.17 20:11, Julien Grall wrote:
> > (CC Artem as ARM coverity admin)
> >> Coverity-ID: 1381855
> >> Coverity-ID: 1381853
> >
> > I am a bit confused... somehow those numbers disappeared from the main Coverity page, which means Coverity thinks they have been fixed. However, this patch has not been upstreamed yet. Artem, are you testing Xen upstream?
> I have re-run the Coverity Scan with upstream, but those are still marked as fixed in Coverity due to code changes, i.e. they were not forced to the "fixed" state manually.
> 
> I went through the code and didn't notice any changes that would lead to this, and I do not understand how it got fixed. I can try to bisect unless you have some ideas...

Hi Artem, thanks for looking into it, but I don't know if it's worth
spending too much time on finding the reason why Coverity isn't spotting
it anymore: I think we understand the problem well enough to fix it
properly.
Stefano Stabellini Feb. 2, 2017, 10:56 p.m. UTC | #8
On Thu, 2 Feb 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 01/02/17 23:23, Stefano Stabellini wrote:
> > On Wed, 1 Feb 2017, Julien Grall wrote:
> > > On 31/01/2017 23:49, Stefano Stabellini wrote:
> > > > On Fri, 27 Jan 2017, Julien Grall wrote:
> > > > > On 03/01/17 23:29, Stefano Stabellini wrote:
> > > So we have to worry about SPIs and LPIs (though the latter are not yet
> > > supported).
> > > From the note in 3.2.3:
> > > "For any processor, if an interrupt is active and pending, the GIC does
> > > not
> > > signal an interrupt exception request
> > > for the interrupt to any processor until the active status is cleared."
> > > 
> > > Given that SPIs have an active state and will be shared, this scenario
> > > cannot happen.
> > 
> > Yes, but just above:
> > 
> >   A GICv1 implementation might ensure that only one processor can make a
> >   1-N interrupt active, removing the requirement for a lock on the ISR.
> >   This is not required by the architecture, and generic GIC code must not
> >   rely on this behavior.
> 
> That's for a GICv1 implementation. However, only GICv2 and onwards support
> virtualization.
> 
> > 
> > then, the spec also says:
> > 
> >   The GIC maintains a state machine for each supported interrupt on each
> >   CPU interface.
> > 
> > So I am thinking that the statement you quoted wouldn't apply: the
> > second interrupt wouldn't mark the state of the interrupt as "active
> > and pending" because, the interrupt being targeted at another processor,
> > it would simply be marked as "pending" in the CPU interface of the other
> > processor.
> 
> I don't have the proper quote in the spec, but I can confirm the interrupt can
> only be active on a single CPU at a time. It is also implied in the
> description of the 1-N model (see 1.4.3): "Only one processor handles this
> interrupt".

Thanks, in that case I feel much better about this approach :-)


> > > For LPIs, there is no active state. So as soon as they are EOIed, they
> > > might come up again. Depending on how we will handle irq migration, your
> > > scenario could become true. I am not sure if we should take LPIs into
> > > account right now.
> > > 
> > > To be honest, I don't much like the idea of kicking the other vCPU. But I
> > > don't have a better idea for clearing the LRs.

What if we skip the interrupt if it's an LPI, and we kick the other vcpu
and wait if it's an SPI? Software should be more tolerant of lost
interrupts in the case of LPIs. We are also considering rate-limiting them
anyway, which implies the possibility of skipping some LPIs at times.
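
A rough sketch of that policy (hypothetical: is_lpi() and "old" are placeholders, and the busy-wait is exactly the part under discussion):

    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
    {
        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
        if ( is_lpi(virq) )
            return;                 /* LPI: tolerate the lost interrupt */
        /* SPI: kick the old pCPU so it clears its LRs, then wait for
         * the migration to complete before retrying the injection. */
        smp_send_event_check_mask(cpumask_of(old->processor));
        while ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
            cpu_relax();
        vgic_vcpu_inject_irq(v, virq);
        return;
    }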


> > Me neither, that's why I was proposing a different solution instead. We
> > still have the option to take the right lock in vgic_migrate_irq:
> > 
> > http://marc.info/?l=xen-devel&m=148237289620471
> > 
> > The code is more complex, but I think it's safe in all cases.
> 
> It is not only complex but also really confusing, as we would have a variable
> protected by two locks, where the two locks do not need to be taken at the same time.

Yes, but there is a large in-code comment about it :-)


> I may have an idea to avoid the lock in vgic_get_target_vcpu completely. The
> lock is only here to read the target vcpu in the rank; the rest does not need
> a lock, right? So could we not read the target vcpu atomically instead?

Yes, I think that could solve the lock inversion bug:

- remove the spin_lock in vgic_get_target_vcpu and replace it with an atomic read
- replace rank->vcpu writes with atomic writes
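
A minimal sketch of that approach (assuming the rank keeps the target vCPU id in a byte-sized per-IRQ array; the rank->vcpu[] field and the index calculation are illustrative, not the current layout):

    /* Writer side, where the guest retargets the vIRQ: */
    write_atomic(&rank->vcpu[virq % 32], (uint8_t)new->vcpu_id);

    /* Reader side, replacing the rank lock in vgic_get_target_vcpu(): */
    struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
    unsigned int vcpu_id = read_atomic(&rank->vcpu[virq % 32]);
    return v->domain->vcpu[vcpu_id];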

However, it would not solve the other issue affecting the current code:
http://marc.info/?l=xen-devel&m=148218667104072, which is related to the
problem you mentioned about irq_set_affinity and list_del_init in
gic_update_one_lr not being separated by a barrier. Arguably, that bug
could be solved separately. It would be easier to solve that bug by one
of these approaches:

1) use the same (vgic) lock in gic_update_one_lr and vgic_migrate_irq
2) remove the irq_set_affinity call from gic_update_one_lr

where 1) is the approach taken by v2 of this series and 2) is the
approach taken by this patch.
Julien Grall Feb. 8, 2017, 7:08 p.m. UTC | #9
Hi Stefano,

On 02/02/17 22:56, Stefano Stabellini wrote:
> On Thu, 2 Feb 2017, Julien Grall wrote:
>> On 01/02/17 23:23, Stefano Stabellini wrote:
>>> On Wed, 1 Feb 2017, Julien Grall wrote:
>>>> On 31/01/2017 23:49, Stefano Stabellini wrote:
>>>>> On Fri, 27 Jan 2017, Julien Grall wrote:
>>>>>> On 03/01/17 23:29, Stefano Stabellini wrote:
>>>> For LPIs, there is no active state. So as soon as they are EOIed, they
>>>> might come up again. Depending on how we will handle irq migration, your
>>>> scenario could become true. I am not sure if we should take LPIs into
>>>> account right now.
>>>>
>>>> To be honest, I don't much like the idea of kicking the other vCPU. But I
>>>> don't have a better idea for clearing the LRs.
>
> What if we skip the interrupt if it's an LPI, and we kick the other vcpu
> and wait if it's an SPI? Software should be more tolerant of lost
> interrupts in the case of LPIs. We are also considering rate-limiting them
> anyway, which implies the possibility of skipping some LPIs at times.

I will skip the answer here, as your suggestion to solve the lock
inversion sounds better.

>
>
>>> Me neither, that's why I was proposing a different solution instead. We
>>> still have the option to take the right lock in vgic_migrate_irq:
>>>
>>> http://marc.info/?l=xen-devel&m=148237289620471
>>>
>>> The code is more complex, but I think it's safe in all cases.
>>
>> It is not only complex but also really confusing, as we would have a variable
>> protected by two locks, where the two locks do not need to be taken at the same time.
>
> Yes, but there is a large in-code comment about it :-)
>
>
>> I may have an idea to avoid the lock in vgic_get_target_vcpu completely. The
>> lock is only here to read the target vcpu in the rank; the rest does not need
>> a lock, right? So could we not read the target vcpu atomically instead?
>
> Yes, I think that could solve the lock inversion bug:
>
> - remove the spin_lock in vgic_get_target_vcpu and replace it with an atomic read
> - replace rank->vcpu writes with atomic writes
>
> However, it would not solve the other issue affecting the current code:
> http://marc.info/?l=xen-devel&m=148218667104072, which is related to the
> problem you mentioned about irq_set_affinity and list_del_init in
> gic_update_one_lr not being separated by a barrier. Arguably, that bug
> could be solved separately.  It would be easier to solve that bug by one
> of these approaches:
>
> 1) use the same (vgic) lock in gic_update_one_lr and vgic_migrate_irq
> 2) remove the irq_set_affinity call from gic_update_one_lr
>
> where 1) is the approach taken by v2 of this series and 2) is the
> approach taken by this patch.

I think there is an easier solution. You want to make sure that any
writes (particularly list_del_init) before routing the IRQ are visible
to the other processors. So a simple barrier on both sides should be
enough here.
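
As a sketch of the barrier placement (assuming Xen's smp_wmb()/smp_rmb(); the reader-side pairing point is illustrative):

    /* gic_update_one_lr(), writer side on the old pCPU: */
    list_del_init(&p->inflight);
    smp_wmb();    /* make the removal visible before re-routing the IRQ */
    if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
    {
        struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
        irq_set_affinity(p->desc, cpumask_of(v_target->processor));
    }

    /* vgic_vcpu_inject_irq(), reader side on the new pCPU: */
    smp_rmb();    /* pair with the writer's smp_wmb() */
    if ( !list_empty(&n->inflight) )
    {
        gic_raise_inflight_irq(v, virq);
        goto out;
    }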

Cheers,
Stefano Stabellini Feb. 11, 2017, 1:37 a.m. UTC | #10
On Wed, 8 Feb 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 02/02/17 22:56, Stefano Stabellini wrote:
> > On Thu, 2 Feb 2017, Julien Grall wrote:
> > > On 01/02/17 23:23, Stefano Stabellini wrote:
> > > > On Wed, 1 Feb 2017, Julien Grall wrote:
> > > > > On 31/01/2017 23:49, Stefano Stabellini wrote:
> > > > > > On Fri, 27 Jan 2017, Julien Grall wrote:
> > > > > > > On 03/01/17 23:29, Stefano Stabellini wrote:
> > > > > For LPIs, there is no active state. So as soon as they are EOIed,
> > > > > they might come up again. Depending on how we will handle irq
> > > > > migration, your scenario could become true. I am not sure if we
> > > > > should take LPIs into account right now.
> > > > > 
> > > > > To be honest, I don't much like the idea of kicking the other vCPU.
> > > > > But I don't have a better idea for clearing the LRs.
> > 
> > What if we skip the interrupt if it's an LPI, and we kick the other vcpu
> > and wait if it's an SPI? Software should be more tolerant of lost
> > interrupts in the case of LPIs. We are also considering rate-limiting them
> > anyway, which implies the possibility of skipping some LPIs at times.
> 
> I will skip the answer here, as your suggestion to solve the lock inversion
> sounds better.

OK


> > > > Me neither, that's why I was proposing a different solution instead. We
> > > > still have the option to take the right lock in vgic_migrate_irq:
> > > > 
> > > > http://marc.info/?l=xen-devel&m=148237289620471
> > > > 
> > > > The code is more complex, but I think it's safe in all cases.
> > > 
> > > It is not only complex but also really confusing, as we would have a
> > > variable protected by two locks, where the two locks do not need to be
> > > taken at the same time.
> > 
> > Yes, but there is a large in-code comment about it :-)
> > 
> > 
> > > I may have an idea to avoid the lock in vgic_get_target_vcpu completely.
> > > The lock is only here to read the target vcpu in the rank; the rest does
> > > not need a lock, right? So could we not read the target vcpu atomically
> > > instead?
> > 
> > Yes, I think that could solve the lock inversion bug:
> > 
> > - remove the spin_lock in vgic_get_target_vcpu and replace it with an atomic
> > read
> > - replace rank->vcpu writes with atomic writes
> > 
> > However, it would not solve the other issue affecting the current code:
> > http://marc.info/?l=xen-devel&m=148218667104072, which is related to the
> > problem you mentioned about irq_set_affinity and list_del_init in
> > gic_update_one_lr not being separated by a barrier. Arguably, that bug
> > could be solved separately.  It would be easier to solve that bug by one
> > of these approaches:
> > 
> > 1) use the same (vgic) lock in gic_update_one_lr and vgic_migrate_irq
> > 2) remove the irq_set_affinity call from gic_update_one_lr
> > 
> > where 1) is the approach taken by v2 of this series and 2) is the
> > approach taken by this patch.
> 
> I think there is an easier solution. You want to make sure that any writes
> (particularly list_del_init) before routing the IRQ are visible to the other
> processors. So a simple barrier on both sides should be enough here.

You are right

Patch

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index a5348f2..767fc9e 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -504,11 +504,7 @@  static void gic_update_one_lr(struct vcpu *v, int i)
             gic_raise_guest_irq(v, irq, p->priority);
         else {
             list_del_init(&p->inflight);
-            if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
-            {
-                struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
-                irq_set_affinity(p->desc, cpumask_of(v_target->processor));
-            }
+            clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
         }
     }
 }
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 364d5f0..11ffb9b 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -264,20 +264,17 @@  void vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
     if ( p->desc == NULL )
         return;
 
+    irq_set_affinity(p->desc, cpumask_of(new->processor));
+
     /* migration already in progress, no need to do anything */
     if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
         return;
+    if ( list_empty(&p->inflight) )
+        return;
 
     perfc_incr(vgic_irq_migrates);
 
     spin_lock_irqsave(&old->arch.vgic.lock, flags);
-
-    if ( list_empty(&p->inflight) )
-    {
-        irq_set_affinity(p->desc, cpumask_of(new->processor));
-        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
-        return;
-    }
     /* If the IRQ is still lr_pending, re-inject it to the new vcpu */
     if ( !list_empty(&p->lr_queue) )
     {
@@ -286,7 +283,6 @@  void vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
         list_del_init(&p->inflight);
         irq_set_affinity(p->desc, cpumask_of(new->processor));
         spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
-        vgic_vcpu_inject_irq(new, irq);
         return;
     }
     /* if the IRQ is in a GICH_LR register, set GIC_IRQ_GUEST_MIGRATING
@@ -495,6 +491,13 @@  void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq)
         return;
     }
 
+    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
+    {
+        /* Drop the interrupt, because it is still inflight on another vcpu */
+        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
+        return;
+    }
+
     set_bit(GIC_IRQ_GUEST_QUEUED, &n->status);
 
     if ( !list_empty(&n->inflight) )