
[1/2] xen: fix a (latent) cpupool-related race during domain destroy

Message ID 146847849719.25458.17913062138178199099.stgit@Solace.fritz.box (mailing list archive)
State New, archived

Commit Message

Dario Faggioli July 14, 2016, 6:41 a.m. UTC
So, during domain destruction, we do:
 cpupool_rm_domain()    [ in domain_destroy() ]
 sched_destroy_domain() [ in complete_domain_destroy() ]

Therefore, there's a window during which, from the
scheduler's point of view, a domain is still there, but
without it being part of any cpupool.

In fact, cpupool_rm_domain() sets d->cpupool=NULL,
and we don't allow that to hold for any domain,
with the only exception of the idle one.
If we stumble upon such a domain, there are ASSERT()s
and BUG_ON()s that do trigger.

This never happens right now, but only because none
of the functions containing one of those sanity checks
is called during the window described above. However,
Credit2 goes (during load balancing) through the list
of domains assigned to a certain runqueue, not only
the ones that are running or runnable. If one of those
domains has already had cpupool_rm_domain() called
on it, and we call one of those functions which check
for d->cpupool!=NULL... Boom!

For example, while doing Credit2 development, calling
something similar to __vcpu_has_soft_affinity() from
balance_load() makes `xl shutdown <domid>' reliably
crash (this is how I discovered this).
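
To make the failure mode concrete, here is a simplified
sketch of the kind of check that trips; the helper name,
fields and exact logic are illustrative assumptions, not
a verbatim copy of the hypervisor code:

 /*
  * Illustrative sketch only: a sanity check in the style of the
  * ones referred to above.
  */
 static inline bool vcpu_has_soft_affinity_sketch(const struct vcpu *v)
 {
     /*
      * d->cpupool is expected to be non-NULL for any non-idle domain.
      * In the window between cpupool_rm_domain() (which sets it to
      * NULL) and sched_destroy_domain(), this expectation is violated,
      * so either the ASSERT() fires or the dereference below faults.
      */
     ASSERT(v->domain->cpupool != NULL);

     return !cpumask_subset(cpupool_online_cpumask(v->domain->cpupool),
                            v->cpu_soft_affinity);
 }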

On the other hand, cpupool_rm_domain() "only" does
cpupool-related bookkeeping, and there's no harm in
postponing it a little bit.

Finally, considering that, during domain initialization,
we do:
 cpupool_add_domain()
 sched_init_domain()

It makes much more sense for the domain
destroy path to be the mirror of it, i.e.:
 sched_destroy_domain()
 cpupool_rm_domain()

This patch does that, and it is considered worthwhile,
as it fixes a bug, even if only a latent one.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/common/domain.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Andrew Cooper July 14, 2016, 9:37 a.m. UTC | #1
On 14/07/16 07:41, Dario Faggioli wrote:
> So, during domain destruction, we do:
>  cpupool_rm_domain()    [ in domain_destroy() ]
>  sched_destroy_domain() [ in complete_domain_destroy() ]
>
> Therefore, there's a window during which, from the
> scheduler's point of view, a domain is still there, but
> without it being part of any cpupool.
>
> In fact, cpupool_rm_domain() sets d->cpupool=NULL,
> and we don't allow that to hold for any domain,
> with the only exception of the idle one.
> If we stumble upon such a domain, there are ASSERT()s
> and BUG_ON()s that do trigger.
>
> This never happens right now, but only because none
> of the functions containing one of those sanity checks
> is called during the window described above. However,
> Credit2 goes (during load balancing) through the list
> of domains assigned to a certain runqueue, not only
> the ones that are running or runnable. If one of those
> domains has already had cpupool_rm_domain() called
> on it, and we call one of those functions which check
> for d->cpupool!=NULL... Boom!
>
> For example, while doing Credit2 development, calling
> something similar to __vcpu_has_soft_affinity() from
> balance_load() makes `xl shutdown <domid>' reliably
> crash (this is how I discovered this).
>
> On the other hand, cpupool_rm_domain() "only" does
> cpupool-related bookkeeping, and there's no harm in
> postponing it a little bit.
>
> Finally, considering that, during domain initialization,
> we do:
>  cpupool_add_domain()
>  sched_init_domain()
>
> It makes much more sense for the domain
> destroy path to be the mirror of it, i.e.:
>  sched_destroy_domain()
>  cpupool_rm_domain()
>
> This patch does that, and it is considered worthwhile,
> as it fixes a bug, even if only a latent one.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

As the cpupool bookkeeping is very closely related to the scheduler
bookkeeping, how about having the sched_*_domain() functions involve the
cpupool_*_domain() functions?

That way it can't get out-of-order like this in the future.

~Andrew
Dario Faggioli July 14, 2016, 2:54 p.m. UTC | #2
On Thu, 2016-07-14 at 10:37 +0100, Andrew Cooper wrote:
> On 14/07/16 07:41, Dario Faggioli wrote:
> > 
> > So, during domain destruction, we do:
> >  cpupool_rm_domain()    [ in domain_destroy() ]
> >  sched_destroy_domain() [ in complete_domain_destroy() ]
> > 
> > Therefore, there's a window during which, from the
> > scheduler's point of view, a domain is still there, but
> > without it being part of any cpupool.
> > 
> > [...]
> >
> > On the other hand, cpupool_rm_domain() "only" does
> > cpupool-related bookkeeping, and there's no harm in
> > postponing it a little bit.
> > 
> > Finally, considering that, during domain initialization,
> > we do:
> >  cpupool_add_domain()
> >  sched_init_domain()
> > 
> > It makes much more sense for the domain
> > destroy path to be the mirror of it, i.e.:
> >  sched_destroy_domain()
> >  cpupool_rm_domain()
> > 
> > This patch does that, and it is considered worthwhile,
> > as it fixes a bug, even if only a latent one.
> > 
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> As the cpupool bookkeeping is very closely related to the scheduler
> bookkeeping, how about having the sched_*_domain() functions involve
> the
> cpupool_*_domain() functions?
> 
That's certainly a good point.

At minimum, I certainly can (and probably should have :-P) put a couple
of ASSERT()-s in place.

However, both cpupool_add_domain() and cpupool_rm_domain() are called
only once, and I guess I can make them go into sched_init_domain() and
sched_destroy_domain(), respectively.

Dario
Andrew Cooper July 14, 2016, 2:55 p.m. UTC | #3
On 14/07/16 15:54, Dario Faggioli wrote:
> On Thu, 2016-07-14 at 10:37 +0100, Andrew Cooper wrote:
>> On 14/07/16 07:41, Dario Faggioli wrote:
>>> So, during domain destruction, we do:
>>>  cpupool_rm_domain()    [ in domain_destroy() ]
>>>  sched_destroy_domain() [ in complete_domain_destroy() ]
>>>
>>> Therefore, there's a window during which, from the
>>> scheduler's point of view, a domain is still there, but
>>> without it being part of any cpupool.
>>>
>>> [...]
>>>
>>> On the other hand, cpupool_rm_domain() "only" does
>>> cpupool-related bookkeeping, and there's no harm in
>>> postponing it a little bit.
>>>
>>> Finally, considering that, during domain initialization,
>>> we do:
>>>  cpupool_add_domain()
>>>  sched_init_domain()
>>>
>>> It makes much more sense for the domain
>>> destroy path to be the mirror of it, i.e.:
>>>  sched_destroy_domain()
>>>  cpupool_rm_domain()
>>>
>>> This patch does that, and it is considered worthwhile,
>>> as it fixes a bug, even if only a latent one.
>>>
>>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> As the cpupool bookkeeping is very closely related to the scheduler
>> bookkeeping, how about having the sched_*_domain() functions involve
>> the
>> cpupool_*_domain() functions?
>>
> That's certainly a good point.
>
> At minimum, I certainly can (and probably should have :-P) put a couple
> of ASSERT()-s in place.
>
> However, both cpupool_add_domain() and cpupool_rm_domain() are called
> only once, and I guess I can make them go into sched_init_domain() and
> sched_destroy_domain(), respectively.

I think that would be the most robust solution.

~Andrew
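
[Editorial note: for reference, a minimal sketch of what Andrew's
suggestion could look like. The extra poolid parameter, the
SCHED_OP()/DOM2OP() call sites and the exact placement are
assumptions for illustration, not the actual follow-up patch.]

 int sched_init_domain(struct domain *d, int poolid)
 {
     int ret;

     /* Let the scheduler code own the cpupool bookkeeping... */
     ret = cpupool_add_domain(d, poolid);
     if ( ret )
         return ret;

     /* ...and only then do the per-scheduler domain initialisation. */
     return SCHED_OP(DOM2OP(d), init_domain, d);
 }

 void sched_destroy_domain(struct domain *d)
 {
     /* Per-scheduler teardown first... */
     SCHED_OP(DOM2OP(d), destroy_domain, d);

     /* ...then drop the domain from its cpupool, mirroring init. */
     cpupool_rm_domain(d);
 }

With this shape, domain.c only ever calls sched_init_domain() and
sched_destroy_domain(), so the cpupool_add_domain()/cpupool_rm_domain()
pairing can no longer be reordered by accident.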

Patch

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 42c07ee..f8096d3 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -811,6 +811,8 @@  static void complete_domain_destroy(struct rcu_head *head)
         destroy_waitqueue_vcpu(v);
     }
 
+    cpupool_rm_domain(d);
+
     grant_table_destroy(d);
 
     arch_domain_destroy(d);
@@ -868,8 +870,6 @@  void domain_destroy(struct domain *d)
 
     TRACE_1D(TRC_DOM0_DOM_REM, d->domain_id);
 
-    cpupool_rm_domain(d);
-
     /* Delete from task list and task hashtable. */
     spin_lock(&domlist_update_lock);
     pd = &domain_list;