| Message ID | 1488406545-26164-3-git-send-email-sstabellini@kernel.org (mailing list archive) |
| --- | --- |
| State | New, archived |
Hi Stefano,

On 01/03/17 22:15, Stefano Stabellini wrote:
> A potential race condition occurs when vgic_migrate_irq is called a
> second time, while GIC_IRQ_GUEST_MIGRATING is already set. In that case,
> vgic_migrate_irq takes a different vgic lock from gic_update_one_lr.

Hmmm, vgic_migrate_irq will bail out before accessing the inflight list if
GIC_IRQ_GUEST_MIGRATING is already set:

    /* migrating already in progress, no need to do anything */
    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
        return;

And test_bit is atomic. So I don't understand what is the corruption
problem you mention.

> vgic_migrate_irq running concurrently with gic_update_one_lr could cause
> data corruptions, as they both access the inflight list.
>
> This patch fixes this problem. In vgic_migrate_irq after setting the new
> vcpu target, it checks both GIC_IRQ_GUEST_MIGRATING and
> GIC_IRQ_GUEST_VISIBLE. If they are both set we can just return because
> we have already set the new target: when gic_update_one_lr reaches
> the GIC_IRQ_GUEST_MIGRATING test, it will do the right thing.
>
> Otherwise, if GIC_IRQ_GUEST_MIGRATING is set but GIC_IRQ_GUEST_VISIBLE
> is not, gic_update_one_lr is running at the very same time on another
> pcpu, so it just waits until it completes (GIC_IRQ_GUEST_MIGRATING is
> cleared).
>
> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
> ---
>  xen/arch/arm/gic.c  |  5 ++++-
>  xen/arch/arm/vgic.c | 16 ++++++++++++++--
>  2 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 16bb150..a805300 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -508,10 +508,13 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>               * next pcpu, inflight is already cleared. No concurrent
>               * accesses to inflight. */
>              smp_mb();
> -            if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> +            if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>              {
>                  struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
>                  irq_set_affinity(p->desc, cpumask_of(v_target->processor));
> +                /* Set the new affinity, then clear MIGRATING. */
> +                smp_mb();
> +                clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
>              }
>          }
>      }
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index a323e7e..9141b34 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -252,13 +252,25 @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu *new,
>      spin_lock_irqsave(&old->arch.vgic.lock, flags);
>      write_atomic(t_vcpu, new->vcpu_id);
>
> -    /* migration already in progress, no need to do anything */
> -    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> +    /* Set the new target, then check MIGRATING and VISIBLE, it pairs
> +     * with the barrier in gic_update_one_lr. */
> +    smp_mb();
> +
> +    /* no need to do anything, gic_update_one_lr will take care of it */
> +    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) &&
> +         test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
>      {
>          spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
>          return;
>      }
>
> +    /* gic_update_one_lr is currently running, wait until its completion */
> +    while ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> +    {
> +        cpu_relax();
> +        smp_rmb();
> +    }
> +
>      if ( list_empty(&p->inflight) )
>      {
>          irq_set_affinity(p->desc, cpumask_of(new->processor));

Cheers,
On Fri, 3 Mar 2017, Julien Grall wrote:
> Hi Stefano,
>
> On 01/03/17 22:15, Stefano Stabellini wrote:
> > A potential race condition occurs when vgic_migrate_irq is called a
> > second time, while GIC_IRQ_GUEST_MIGRATING is already set. In that case,
> > vgic_migrate_irq takes a different vgic lock from gic_update_one_lr.
>
> Hmmm, vgic_migrate_irq will bail out before accessing the inflight list if
> GIC_IRQ_GUEST_MIGRATING is already set:
>
>     /* migrating already in progress, no need to do anything */
>     if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>         return;
>
> And test_bit is atomic. So I don't understand what is the corruption
> problem you mention.

The scenario is a bit convoluted: GIC_IRQ_GUEST_MIGRATING is already set
and vgic_migrate_irq is called to move the irq again, even though the
first migration is not complete yet. This could happen:

  CPU 0                                CPU 1
  gic_update_one_lr
    test_and_clear_bit MIGRATING
    read target (old)
                                       write target (new)
                                       vgic_migrate_irq
                                         test_bit MIGRATING
                                         irq_set_affinity (new)
                                         return
    irq_set_affinity (old)

After this patch this would happen:

  CPU 0                                CPU 1
  gic_update_one_lr
    test_bit MIGRATING
    read target (old)
                                       write target (new)
                                       vgic_migrate_irq
                                         test MIGRATING && GIC_IRQ_GUEST_VISIBLE
                                         (false)
                                         wait until !MIGRATING
    irq_set_affinity (old)
    clear_bit MIGRATING
                                       irq_set_affinity (new)

> > vgic_migrate_irq running concurrently with gic_update_one_lr could cause
> > data corruptions, as they both access the inflight list.
> >
> > This patch fixes this problem. In vgic_migrate_irq after setting the new
> > vcpu target, it checks both GIC_IRQ_GUEST_MIGRATING and
> > GIC_IRQ_GUEST_VISIBLE. If they are both set we can just return because
> > we have already set the new target: when gic_update_one_lr reaches
> > the GIC_IRQ_GUEST_MIGRATING test, it will do the right thing.
> >
> > Otherwise, if GIC_IRQ_GUEST_MIGRATING is set but GIC_IRQ_GUEST_VISIBLE
> > is not, gic_update_one_lr is running at the very same time on another
> > pcpu, so it just waits until it completes (GIC_IRQ_GUEST_MIGRATING is
> > cleared).
> >
> > Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
> > ---
> >  xen/arch/arm/gic.c  |  5 ++++-
> >  xen/arch/arm/vgic.c | 16 ++++++++++++++--
> >  2 files changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 16bb150..a805300 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -508,10 +508,13 @@ static void gic_update_one_lr(struct vcpu *v, int i)
> >               * next pcpu, inflight is already cleared. No concurrent
> >               * accesses to inflight. */
> >              smp_mb();
> > -            if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > +            if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> >              {
> >                  struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
> >                  irq_set_affinity(p->desc, cpumask_of(v_target->processor));
> > +                /* Set the new affinity, then clear MIGRATING. */
> > +                smp_mb();
> > +                clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
> >              }
> >          }
> >      }
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index a323e7e..9141b34 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -252,13 +252,25 @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu *new,
> >      spin_lock_irqsave(&old->arch.vgic.lock, flags);
> >      write_atomic(t_vcpu, new->vcpu_id);
> >
> > -    /* migration already in progress, no need to do anything */
> > -    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > +    /* Set the new target, then check MIGRATING and VISIBLE, it pairs
> > +     * with the barrier in gic_update_one_lr. */
> > +    smp_mb();
> > +
> > +    /* no need to do anything, gic_update_one_lr will take care of it */
> > +    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) &&
> > +         test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
> >      {
> >          spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
> >          return;
> >      }
> >
> > +    /* gic_update_one_lr is currently running, wait until its completion */
> > +    while ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > +    {
> > +        cpu_relax();
> > +        smp_rmb();
> > +    }
> > +
> >      if ( list_empty(&p->inflight) )
> >      {
> >          irq_set_affinity(p->desc, cpumask_of(new->processor));
>
> Cheers,
>
> --
> Julien Grall
Hi Stefano,

On 30/03/17 00:47, Stefano Stabellini wrote:
> On Fri, 3 Mar 2017, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 01/03/17 22:15, Stefano Stabellini wrote:
>>> A potential race condition occurs when vgic_migrate_irq is called a
>>> second time, while GIC_IRQ_GUEST_MIGRATING is already set. In that case,
>>> vgic_migrate_irq takes a different vgic lock from gic_update_one_lr.
>>
>> Hmmm, vgic_migrate_irq will bail out before accessing the inflight list if
>> GIC_IRQ_GUEST_MIGRATING is already set:
>>
>>     /* migrating already in progress, no need to do anything */
>>     if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>>         return;
>>
>> And test_bit is atomic. So I don't understand what is the corruption problem
>> you mention.
>
> The scenario is a bit convoluted: GIC_IRQ_GUEST_MIGRATING is already set
> and vgic_migrate_irq is called to move the irq again, even though the
> first migration is not complete yet.

What you described is not data corruption to me. The host IRQ will be
routed to the wrong pCPU and then what? The IRQ will still trigger, ok on
the wrong pCPU, it will be slower, but we are capable of handling that.

The use case you describe would only happen if a guest is trying to change
the routing multiple times while an interrupt is pending. So to be honest,
a sane guest would not do that. But this would only affect stupid guests.

So I don't think this is worth supporting, considering how this patch will
increase the code complexity in a component that is already a nightmare to
handle.

> This could happen:
>
>   CPU 0                                CPU 1
>   gic_update_one_lr
>     test_and_clear_bit MIGRATING
>     read target (old)
>                                        write target (new)
>                                        vgic_migrate_irq
>                                          test_bit MIGRATING
>                                          irq_set_affinity (new)
>                                          return
>     irq_set_affinity (old)

Cheers,
On Fri, 31 Mar 2017, Julien Grall wrote:
> Hi Stefano,
>
> On 30/03/17 00:47, Stefano Stabellini wrote:
> > On Fri, 3 Mar 2017, Julien Grall wrote:
> > > Hi Stefano,
> > >
> > > On 01/03/17 22:15, Stefano Stabellini wrote:
> > > > A potential race condition occurs when vgic_migrate_irq is called a
> > > > second time, while GIC_IRQ_GUEST_MIGRATING is already set. In that case,
> > > > vgic_migrate_irq takes a different vgic lock from gic_update_one_lr.
> > >
> > > Hmmm, vgic_migrate_irq will bail out before accessing the inflight list if
> > > GIC_IRQ_GUEST_MIGRATING is already set:
> > >
> > >     /* migrating already in progress, no need to do anything */
> > >     if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> > >         return;
> > >
> > > And test_bit is atomic. So I don't understand what is the corruption
> > > problem you mention.
> >
> > The scenario is a bit convoluted: GIC_IRQ_GUEST_MIGRATING is already set
> > and vgic_migrate_irq is called to move the irq again, even though the
> > first migration is not complete yet.
>
> What you described is not data corruption to me.

No, it is not, thanks to the previous two patches. The commit
description needs an update.

> The host IRQ will be routed
> to the wrong pCPU and then what? The IRQ will still trigger, ok on the wrong
> pCPU, it will be slower, but we are capable of handling that.
>
> The use case you describe would only happen if a guest is trying to change
> the routing multiple times while an interrupt is pending. So to be honest, a
> sane guest would not do that. But this would only affect stupid guests.
>
> So I don't think this is worth supporting, considering how this patch will
> increase the code complexity in a component that is already a nightmare to
> handle.

I think we have to fix this because it is not predictable. Latency could
be much higher, depending on who wins the race. It also uses more Xen
resources -- the time that Xen spends sending and handling SGIs could
be used for something else. I think it is more important to be
predictable than correct. Especially given that a sane guest shouldn't
do this, I prefer to refuse a "nested" migration we cannot handle (even
though it is a mistake) rather than provide unreliable latency.

In other words, I think we need to fix this one way or another, but we
might be able to come up with a better fix than this.

> > This could happen:
> >
> >   CPU 0                                CPU 1
> >   gic_update_one_lr
> >     test_and_clear_bit MIGRATING
> >     read target (old)
> >                                        write target (new)
> >                                        vgic_migrate_irq
> >                                          test_bit MIGRATING
> >                                          irq_set_affinity (new)
> >                                          return
> >     irq_set_affinity (old)
Hi Stefano,

On 31/03/17 21:24, Stefano Stabellini wrote:
> On Fri, 31 Mar 2017, Julien Grall wrote:
>> On 30/03/17 00:47, Stefano Stabellini wrote:
>>> On Fri, 3 Mar 2017, Julien Grall wrote:
>> What you described is not data corruption to me.
>
> No, it is not, thanks to the previous two patches. The commit
> description needs an update.
>
>> The host IRQ will be routed
>> to the wrong pCPU and then what? The IRQ will still trigger, ok on the wrong
>> pCPU, it will be slower, but we are capable of handling that.
>>
>> The use case you describe would only happen if a guest is trying to change
>> the routing multiple times while an interrupt is pending. So to be honest,
>> a sane guest would not do that. But this would only affect stupid guests.
>>
>> So I don't think this is worth supporting, considering how this patch will
>> increase the code complexity in a component that is already a nightmare to
>> handle.
>
> I think we have to fix this because it is not predictable. Latency could
> be much higher, depending on who wins the race. It also uses more Xen
> resources -- the time that Xen spends sending and handling SGIs could
> be used for something else. I think it is more important to be
> predictable than correct. Especially given that a sane guest shouldn't
> do this, I prefer to refuse a "nested" migration we cannot handle (even
> though it is a mistake) rather than provide unreliable latency.

Good point. We already have a couple of places in the vGIC we don't handle
and print a message instead (see the ACTIVER and I*PENDR registers).

I would prefer to refuse "nested" migration and warn the guest. If someone
complains, then we can think about it.

Cheers,
On Mon, 3 Apr 2017, Julien Grall wrote:
> Hi Stefano,
>
> On 31/03/17 21:24, Stefano Stabellini wrote:
> > On Fri, 31 Mar 2017, Julien Grall wrote:
> > > On 30/03/17 00:47, Stefano Stabellini wrote:
> > > > On Fri, 3 Mar 2017, Julien Grall wrote:
> > > What you described is not data corruption to me.
> >
> > No, it is not, thanks to the previous two patches. The commit
> > description needs an update.
> >
> > > The host IRQ will be routed
> > > to the wrong pCPU and then what? The IRQ will still trigger, ok on the
> > > wrong pCPU, it will be slower, but we are capable of handling that.
> > >
> > > The use case you describe would only happen if a guest is trying to
> > > change the routing multiple times while an interrupt is pending. So to
> > > be honest, a sane guest would not do that. But this would only affect
> > > stupid guests.
> > >
> > > So I don't think this is worth supporting, considering how this patch
> > > will increase the code complexity in a component that is already a
> > > nightmare to handle.
> >
> > I think we have to fix this because it is not predictable. Latency could
> > be much higher, depending on who wins the race. It also uses more Xen
> > resources -- the time that Xen spends sending and handling SGIs could
> > be used for something else. I think it is more important to be
> > predictable than correct. Especially given that a sane guest shouldn't
> > do this, I prefer to refuse a "nested" migration we cannot handle (even
> > though it is a mistake) rather than provide unreliable latency.
>
> Good point. We already have a couple of places in the vGIC we don't handle
> and print a message instead (see the ACTIVER and I*PENDR registers).
>
> I would prefer to refuse "nested" migration and warn the guest. If someone
> complains, then we can think about it.

That's fine by me.
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 16bb150..a805300 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -508,10 +508,13 @@ static void gic_update_one_lr(struct vcpu *v, int i)
              * next pcpu, inflight is already cleared. No concurrent
              * accesses to inflight. */
             smp_mb();
-            if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
+            if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
             {
                 struct vcpu *v_target = vgic_get_target_vcpu(v, irq);
                 irq_set_affinity(p->desc, cpumask_of(v_target->processor));
+                /* Set the new affinity, then clear MIGRATING. */
+                smp_mb();
+                clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
             }
         }
     }
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index a323e7e..9141b34 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -252,13 +252,25 @@ void vgic_migrate_irq(struct vcpu *old, struct vcpu *new,
     spin_lock_irqsave(&old->arch.vgic.lock, flags);
     write_atomic(t_vcpu, new->vcpu_id);
 
-    /* migration already in progress, no need to do anything */
-    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
+    /* Set the new target, then check MIGRATING and VISIBLE, it pairs
+     * with the barrier in gic_update_one_lr. */
+    smp_mb();
+
+    /* no need to do anything, gic_update_one_lr will take care of it */
+    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) &&
+         test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
     {
         spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
         return;
     }
 
+    /* gic_update_one_lr is currently running, wait until its completion */
+    while ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
+    {
+        cpu_relax();
+        smp_rmb();
+    }
+
     if ( list_empty(&p->inflight) )
     {
         irq_set_affinity(p->desc, cpumask_of(new->processor));
A potential race condition occurs when vgic_migrate_irq is called a
second time, while GIC_IRQ_GUEST_MIGRATING is already set. In that case,
vgic_migrate_irq takes a different vgic lock from gic_update_one_lr.
vgic_migrate_irq running concurrently with gic_update_one_lr could cause
data corruptions, as they both access the inflight list.

This patch fixes this problem. In vgic_migrate_irq after setting the new
vcpu target, it checks both GIC_IRQ_GUEST_MIGRATING and
GIC_IRQ_GUEST_VISIBLE. If they are both set we can just return because
we have already set the new target: when gic_update_one_lr reaches the
GIC_IRQ_GUEST_MIGRATING test, it will do the right thing.

Otherwise, if GIC_IRQ_GUEST_MIGRATING is set but GIC_IRQ_GUEST_VISIBLE
is not, gic_update_one_lr is running at the very same time on another
pcpu, so it just waits until it completes (GIC_IRQ_GUEST_MIGRATING is
cleared).

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/arch/arm/gic.c  |  5 ++++-
 xen/arch/arm/vgic.c | 16 ++++++++++++++--
 2 files changed, 18 insertions(+), 3 deletions(-)