
[v5,3/3] xen/arm: vgic_migrate_irq: do not race against GIC_IRQ_GUEST_MIGRATING

Message ID 1488406545-26164-3-git-send-email-sstabellini@kernel.org (mailing list archive)
State New, archived

Commit Message

Stefano Stabellini March 1, 2017, 10:15 p.m. UTC
A potential race condition occurs when vgic_migrate_irq is called a
second time, while GIC_IRQ_GUEST_MIGRATING is already set. In that case,
vgic_migrate_irq takes a different vgic lock from gic_update_one_lr.
vgic_migrate_irq running concurrently with gic_update_one_lr could cause
data corruption, as they both access the inflight list.

This patch fixes the problem. After setting the new vcpu target,
vgic_migrate_irq checks both GIC_IRQ_GUEST_MIGRATING and
GIC_IRQ_GUEST_VISIBLE. If both are set, we can just return, because the
new target has already been set: when gic_update_one_lr reaches the
GIC_IRQ_GUEST_MIGRATING test, it will do the right thing.

Otherwise, if GIC_IRQ_GUEST_MIGRATING is set but GIC_IRQ_GUEST_VISIBLE
is not, gic_update_one_lr is running at the very same time on another
pcpu, so it just waits until it completes (GIC_IRQ_GUEST_MIGRATING is
cleared).

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/arch/arm/gic.c  |  5 ++++-
 xen/arch/arm/vgic.c | 16 ++++++++++++++--
 2 files changed, 18 insertions(+), 3 deletions(-)

Comments

Julien Grall March 3, 2017, 5:47 p.m. UTC | #1
Hi Stefano,

On 01/03/17 22:15, Stefano Stabellini wrote:
> A potential race condition occurs when vgic_migrate_irq is called a
> second time, while GIC_IRQ_GUEST_MIGRATING is already set. In that case,
> vgic_migrate_irq takes a different vgic lock from gic_update_one_lr.

Hmmm, vgic_migrate_irq will bail out before accessing inflight list if 
GIC_IRQ_GUEST_MIGRATING is already set:

/* migrating already in progress, no need to do anything */
if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
   return;

And test_bit is atomic. So I don't understand the data corruption 
problem you mention.

> vgic_migrate_irq running concurrently with gic_update_one_lr could cause
> data corruptions, as they both access the inflight list.
>
> This patch fixes this problem. In vgic_migrate_irq after setting the new
> vcpu target, it checks both GIC_IRQ_GUEST_MIGRATING and
> GIC_IRQ_GUEST_VISIBLE. If they are both set we can just return because
> we have already set the new target: when gic_update_one_lr reaches
> the GIC_IRQ_GUEST_MIGRATING test, it will do the right thing.
>
> Otherwise, if GIC_IRQ_GUEST_MIGRATING is set but GIC_IRQ_GUEST_VISIBLE
> is not, gic_update_one_lr is running at the very same time on another
> pcpu, so it just waits until it completes (GIC_IRQ_GUEST_MIGRATING is
> cleared).
>
> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
> ---
>  xen/arch/arm/gic.c  |  5 ++++-
>  xen/arch/arm/vgic.c | 16 ++++++++++++++--
>  2 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 16bb150..a805300 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -508,10 +508,13 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>               * next pcpu, inflight is already cleared. No concurrent
>               * accesses to inflight. */
>              smp_mb();
> -            if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> +            if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>              {
>                  struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
>                  irq_set_affinity(p->desc, cpumask_of(v_target->processor));
> +                /* Set the new affinity, then clear MIGRATING. */
> +                smp_mb();
> +                clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
>              }
>          }
>      }
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index a323e7e..9141b34 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -252,13 +252,25 @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu *new,
>      spin_lock_irqsave(&old->arch.vgic.lock, flags);
>      write_atomic(t_vcpu, new->vcpu_id);
>
> -    /* migration already in progress, no need to do anything */
> -    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> +    /* Set the new target, then check MIGRATING and VISIBLE, it pairs
> +     * with the barrier in gic_update_one_lr. */
> +    smp_mb();
> +
> +    /* no need to do anything, gic_update_one_lr will take care of it */
> +    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) &&
> +         test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
>      {
>          spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
>          return;
>      }
>
> +    /* gic_update_one_lr is currently running, wait until its completion */
> +    while ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> +    {
> +        cpu_relax();
> +        smp_rmb();
> +    }
> +
>      if ( list_empty(&p->inflight) )
>      {
>          irq_set_affinity(p->desc, cpumask_of(new->processor));
>

Cheers,
Stefano Stabellini March 29, 2017, 11:47 p.m. UTC | #2
On Fri, 3 Mar 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 01/03/17 22:15, Stefano Stabellini wrote:
> > A potential race condition occurs when vgic_migrate_irq is called a
> > second time, while GIC_IRQ_GUEST_MIGRATING is already set. In that case,
> > vgic_migrate_irq takes a different vgic lock from gic_update_one_lr.
> 
> Hmmm, vgic_migrate_irq will bail out before accessing inflight list if
> GIC_IRQ_GUEST_MIGRATING is already set:
> 
> /* migrating already in progress, no need to do anything */
> if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>   return;
> 
> And test_bit is atomic. So I don't understand what is the corruption problem
> you mention.

The scenario is a bit convoluted: GIC_IRQ_GUEST_MIGRATING is already set
and vgic_migrate_irq is called to move the irq again, even though the
first migration is not complete yet. This could happen:


      CPU 0                                    CPU 1
  gic_update_one_lr
  test_and_clear_bit MIGRATING
  read target (old)
                                            write target (new)
                                            vgic_migrate_irq
                                              test_bit MIGRATING
                                              irq_set_affinity (new)
                                              return
  irq_set_affinity (old)


After this patch this would happen:

      CPU 0                                    CPU 1
  gic_update_one_lr
  test_bit MIGRATING
  read target (old)
                                            write target (new)
                                            vgic_migrate_irq
                                              test MIGRATING && GIC_IRQ_GUEST_VISIBLE (false)
                                              wait until !MIGRATING
  irq_set_affinity (old)
  clear_bit MIGRATING
                                              irq_set_affinity (new)


> > vgic_migrate_irq running concurrently with gic_update_one_lr could cause
> > data corruptions, as they both access the inflight list.
> > 
> > This patch fixes this problem. In vgic_migrate_irq after setting the new
> > vcpu target, it checks both GIC_IRQ_GUEST_MIGRATING and
> > GIC_IRQ_GUEST_VISIBLE. If they are both set we can just return because
> > we have already set the new target: when gic_update_one_lr reaches
> > the GIC_IRQ_GUEST_MIGRATING test, it will do the right thing.
> > 
> > Otherwise, if GIC_IRQ_GUEST_MIGRATING is set but GIC_IRQ_GUEST_VISIBLE
> > is not, gic_update_one_lr is running at the very same time on another
> > pcpu, so it just waits until it completes (GIC_IRQ_GUEST_MIGRATING is
> > cleared).
> > 
> > Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
> > ---
> >  xen/arch/arm/gic.c  |  5 ++++-
> >  xen/arch/arm/vgic.c | 16 ++++++++++++++--
> >  2 files changed, 18 insertions(+), 3 deletions(-)
> > 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 16bb150..a805300 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -508,10 +508,13 @@ static void gic_update_one_lr(struct vcpu *v, int i)
> >               * next pcpu, inflight is already cleared. No concurrent
> >               * accesses to inflight. */
> >              smp_mb();
> > -            if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > +            if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> >              {
> >                  struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
> >                  irq_set_affinity(p->desc, cpumask_of(v_target->processor));
> > +                /* Set the new affinity, then clear MIGRATING. */
> > +                smp_mb();
> > +                clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
> >              }
> >          }
> >      }
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index a323e7e..9141b34 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -252,13 +252,25 @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu
> > *new,
> >      spin_lock_irqsave(&old->arch.vgic.lock, flags);
> >      write_atomic(t_vcpu, new->vcpu_id);
> > 
> > -    /* migration already in progress, no need to do anything */
> > -    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > +    /* Set the new target, then check MIGRATING and VISIBLE, it pairs
> > +     * with the barrier in gic_update_one_lr. */
> > +    smp_mb();
> > +
> > +    /* no need to do anything, gic_update_one_lr will take care of it */
> > +    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) &&
> > +         test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
> >      {
> >          spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
> >          return;
> >      }
> > 
> > +    /* gic_update_one_lr is currently running, wait until its completion */
> > +    while ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > +    {
> > +        cpu_relax();
> > +        smp_rmb();
> > +    }
> > +
> >      if ( list_empty(&p->inflight) )
> >      {
> >          irq_set_affinity(p->desc, cpumask_of(new->processor));
> > 
> 
> Cheers,
> 
> -- 
> Julien Grall
>
Julien Grall March 31, 2017, 4:02 p.m. UTC | #3
Hi Stefano,

On 30/03/17 00:47, Stefano Stabellini wrote:
> On Fri, 3 Mar 2017, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 01/03/17 22:15, Stefano Stabellini wrote:
>>> A potential race condition occurs when vgic_migrate_irq is called a
>>> second time, while GIC_IRQ_GUEST_MIGRATING is already set. In that case,
>>> vgic_migrate_irq takes a different vgic lock from gic_update_one_lr.
>>
>> Hmmm, vgic_migrate_irq will bail out before accessing inflight list if
>> GIC_IRQ_GUEST_MIGRATING is already set:
>>
>> /* migrating already in progress, no need to do anything */
>> if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>>   return;
>>
>> And test_bit is atomic. So I don't understand what is the corruption problem
>> you mention.
>
> The scenario is a bit convoluted: GIC_IRQ_GUEST_MIGRATING is already set
> and vgic_migrate_irq is called to move the irq again, even though the
> first migration is not complete yet.

What you describe is not data corruption to me. The host IRQ will be 
routed to the wrong pCPU, and then what? The IRQ will still trigger, 
admittedly on the wrong pCPU; it will be slower, but we are capable of 
handling that.

The use case you describe would only happen if a guest tries to change 
the routing multiple times while an interrupt is pending. So to be 
honest, a sane guest would not do that; this would only affect 
misbehaving guests.

So I don't think this is worth supporting, considering how much this 
patch increases the code complexity in a component that is already a 
nightmare to maintain.

> This could happen:
>
>
>       CPU 0                                    CPU 1
>   gic_update_one_lr
>   test_and_clear_bit MIGRATING
>   read target (old)
>                                             write target (new)
>                                             vgic_migrate_irq
>                                               test_bit MIGRATING
>                                               irq_set_affinity (new)
>                                               return
>   irq_set_affinity (old)

Cheers,
Stefano Stabellini March 31, 2017, 8:24 p.m. UTC | #4
On Fri, 31 Mar 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 30/03/17 00:47, Stefano Stabellini wrote:
> > On Fri, 3 Mar 2017, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 01/03/17 22:15, Stefano Stabellini wrote:
> > > > A potential race condition occurs when vgic_migrate_irq is called a
> > > > second time, while GIC_IRQ_GUEST_MIGRATING is already set. In that case,
> > > > vgic_migrate_irq takes a different vgic lock from gic_update_one_lr.
> > > 
> > > Hmmm, vgic_migrate_irq will bail out before accessing inflight list if
> > > GIC_IRQ_GUEST_MIGRATING is already set:
> > > 
> > > /* migrating already in progress, no need to do anything */
> > > if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > >   return;
> > > 
> > > And test_bit is atomic. So I don't understand what is the corruption
> > > problem
> > > you mention.
> > 
> > The scenario is a bit convoluted: GIC_IRQ_GUEST_MIGRATING is already set
> > and vgic_migrate_irq is called to move the irq again, even though the
> > first migration is not complete yet.
> 
> What you described is not a data corruption to me.

No, it is not, thanks to the previous two patches. The commit
description needs an update.


> The host IRQ will be routed
> to the wrong pCPU and then what? The IRQ will still trigger, ok on the wrong
> pCPU, it will be slower but we are capable to handle that.
> 
> The use case you describe would only happen if a guest is trying to change the
> routing multiple times while an interrupt is pending. So to be honest, a sane
> guest would not do that. But this would only affect stupid guest.
> 
> So I don't think this is worth to support considering how this patch will
> increase the code complexity in a component that is already a nightmare to
> handle.

I think we have to fix this because it is not predictable: latency could
be much higher, depending on who wins the race. It also uses more Xen
resources -- the time Xen spends sending and handling SGIs could be used
for something else. I think it is more important to be predictable than
correct. Especially given that a sane guest shouldn't do this, I would
rather refuse a "nested" migration we cannot handle (even though it is a
guest mistake) than provide unreliable latency.

In other words, I think we need to fix this one way or another, but we
might be able to come up with a better fix than this.



> > This could happen:
> > 
> > 
> >       CPU 0                                    CPU 1
> >   gic_update_one_lr
> >   test_and_clear_bit MIGRATING
> >   read target (old)
> >                                             write target (new)
> >                                             vgic_migrate_irq
> >                                               test_bit MIGRATING
> >                                               irq_set_affinity (new)
> >                                               return
> >   irq_set_affinity (old)
Julien Grall April 3, 2017, 11:03 a.m. UTC | #5
Hi Stefano,

On 31/03/17 21:24, Stefano Stabellini wrote:
> On Fri, 31 Mar 2017, Julien Grall wrote:
>> On 30/03/17 00:47, Stefano Stabellini wrote:
>>> On Fri, 3 Mar 2017, Julien Grall wrote:
>> What you described is not a data corruption to me.
>
> No, it is not, thanks to the previous two patches. The commit
> description needs an update.
>
>
>> The host IRQ will be routed
>> to the wrong pCPU and then what? The IRQ will still trigger, ok on the wrong
>> pCPU, it will be slower but we are capable to handle that.
>>
>> The use case you describe would only happen if a guest is trying to change the
>> routing multiple times while an interrupt is pending. So to be honest, a sane
>> guest would not do that. But this would only affect stupid guest.
>>
>> So I don't think this is worth to support considering how this patch will
>> increase the code complexity in a component that is already a nightmare to
>> handle.
>
> I think we have to fix this because it is not predictable. Latency could
> be much higher, depending on who wins the race. It also uses more Xen
> resources -- the time that Xen spends to send and to handle SGIs could
> be used for something  else.  I think it is more important to be
> predictable than correct. Especially given that a sane guest shouldn't
> do this, I prefer to refuse a "nested" migration we cannot handle (even
> though it is a mistake) than provide unreliable latency.

Good point. We already have a couple of places in the vGIC that we don't 
handle, printing a message instead (see the ACTIVER and I*PENDR 
registers).

I would prefer to refuse "nested" migrations and warn the guest. If 
someone complains, then we can think about it.

Cheers,
Stefano Stabellini April 3, 2017, 9:24 p.m. UTC | #6
On Mon, 3 Apr 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 31/03/17 21:24, Stefano Stabellini wrote:
> > On Fri, 31 Mar 2017, Julien Grall wrote:
> > > On 30/03/17 00:47, Stefano Stabellini wrote:
> > > > On Fri, 3 Mar 2017, Julien Grall wrote:
> > > What you described is not a data corruption to me.
> > 
> > No, it is not, thanks to the previous two patches. The commit
> > description needs an update.
> > 
> > 
> > > The host IRQ will be routed
> > > to the wrong pCPU and then what? The IRQ will still trigger, ok on the
> > > wrong
> > > pCPU, it will be slower but we are capable to handle that.
> > > 
> > > The use case you describe would only happen if a guest is trying to change
> > > the
> > > routing multiple times while an interrupt is pending. So to be honest, a
> > > sane
> > > guest would not do that. But this would only affect stupid guest.
> > > 
> > > So I don't think this is worth to support considering how this patch will
> > > increase the code complexity in a component that is already a nightmare to
> > > handle.
> > 
> > I think we have to fix this because it is not predictable. Latency could
> > be much higher, depending on who wins the race. It also uses more Xen
> > resources -- the time that Xen spends to send and to handle SGIs could
> > be used for something  else.  I think it is more important to be
> > predictable than correct. Especially given that a sane guest shouldn't
> > do this, I prefer to refuse a "nested" migration we cannot handle (even
> > though it is a mistake) than provide unreliable latency.
> 
> Good point. We already have a couple of place in the vGIC we don't handle and
> print a message instead (see ACTIVER, I*PENDR registers).
> 
> I would prefer to refuse "nested" migration and warn the guest. If someone
> complain, then we can think about it.

That's fine by me.

Patch

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 16bb150..a805300 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -508,10 +508,13 @@  static void gic_update_one_lr(struct vcpu *v, int i)
              * next pcpu, inflight is already cleared. No concurrent
              * accesses to inflight. */
             smp_mb();
-            if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
+            if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
             {
                 struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
                 irq_set_affinity(p->desc, cpumask_of(v_target->processor));
+                /* Set the new affinity, then clear MIGRATING. */
+                smp_mb();
+                clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
             }
         }
     }
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index a323e7e..9141b34 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -252,13 +252,25 @@  void vgic_migrate_irq(struct vcpu *old, struct vcpu *new,
     spin_lock_irqsave(&old->arch.vgic.lock, flags);
     write_atomic(t_vcpu, new->vcpu_id);
 
-    /* migration already in progress, no need to do anything */
-    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
+    /* Set the new target, then check MIGRATING and VISIBLE; this pairs
+     * with the barrier in gic_update_one_lr. */
+    smp_mb();
+
+    /* no need to do anything, gic_update_one_lr will take care of it */
+    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) &&
+         test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
     {
         spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
         return;
     }
 
+    /* gic_update_one_lr is currently running, wait until its completion */
+    while ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
+    {
+        cpu_relax();
+        smp_rmb();
+    }
+
     if ( list_empty(&p->inflight) )
     {
         irq_set_affinity(p->desc, cpumask_of(new->processor));