[v2,1/2] xen: fix a (latent) cpupool-related race during domain destroy

Message ID 146851308019.22413.8905002507733716302.stgit@Solace.fritz.box
State New, archived

Commit Message

Dario Faggioli July 14, 2016, 4:18 p.m. UTC
So, during domain destruction, we do:
 cpupool_rm_domain()    [ in domain_destroy() ]
 sched_destroy_domain() [ in complete_domain_destroy() ]

Therefore, there's a window during which, from the
scheduler's point of view, a domain still sits outside
of any cpupool.

In fact, cpupool_rm_domain() does d->cpupool=NULL,
and we don't allow that to hold true for anything
but the idle domain (and there are, in fact, ASSERT()s
and BUG_ON()s to that effect).

Currently, we never really check d->cpupool during the
window, but that does not mean the race is not there.
For instance, Credit2 at some point (during load balancing)
iterates over the list of domains, and if we add logic
that needs to check d->cpupool, and any one of them has
already had cpupool_rm_domain() called on it... Boom!

(In fact, calling __vcpu_has_soft_affinity() from inside
balance_load() makes `xl shutdown <domid>' reliably
crash, and this is how I discovered this.)
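
For illustration, here is a minimal sketch (not the actual
Credit2 code; the helper name and the exact check are made
up) of the kind of logic that trips over the window:

    /* Sketch only: any path that reaches through d->cpupool
     * while d is in the destruction window dereferences NULL,
     * as cpupool_rm_domain() has already cleared d->cpupool
     * but the scheduler still sees (and runs) the domain. */
    static bool vcpu_fits_in_pool_cpus(const struct vcpu *v)
    {
        const struct domain *d = v->domain;

        return cpumask_test_cpu(v->processor,
                                d->cpupool->cpu_valid); /* Boom! */
    }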

On the other hand, cpupool_rm_domain() "only" does
cpupool-related bookkeeping, and there's no harm in
postponing it a little bit.

Also, considering that, during domain initialization,
we do:
 cpupool_add_domain()
 sched_init_domain()

It makes sense for the destruction path to look like
the opposite of it, i.e.:
 sched_destroy_domain()
 cpupool_rm_domain()

And hence that's what this patch does.

Actually, for better robustness, what we really do is
move both cpupool_add_domain() and cpupool_rm_domain()
inside sched_init_domain() and sched_destroy_domain(),
respectively (and also add a couple of ASSERT()s).
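
Schematically, the call ordering changes like this (just a
sketch of the ordering; the real hunks are below):

    /* Before (asymmetric, with the window in between): */
    domain_create():            cpupool_add_domain(); sched_init_domain();
    domain_destroy():           cpupool_rm_domain();  /* d->cpupool = NULL */
    complete_domain_destroy():  sched_destroy_domain();

    /* After (properly nested): */
    domain_create():            sched_init_domain()    -> cpupool_add_domain()
    complete_domain_destroy():  sched_destroy_domain() -> cpupool_rm_domain()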

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
Changes from v1:
 * followed Andrew's suggestion of moving the cpupool functions inside the
   sched functions;
 * reworded the changelog.
---
 xen/common/domain.c     |    7 +------
 xen/common/schedule.c   |   13 ++++++++++++-
 xen/include/xen/sched.h |    2 +-
 3 files changed, 14 insertions(+), 8 deletions(-)

Comments

Andrew Cooper July 14, 2016, 5:08 p.m. UTC | #1
On 14/07/16 17:18, Dario Faggioli wrote:
> So, during domain destruction, we do:
>  cpupool_rm_domain()    [ in domain_destroy() ]
>  sched_destroy_domain() [ in complete_domain_destroy() ]
>
> Therefore, there's a window during which, from the
> scheduler's point of view, a domain still sits outside
> of any cpupool.
>
> In fact, cpupool_rm_domain() does d->cpupool=NULL,
> and we don't allow that to hold true for anything
> but the idle domain (and there are, in fact, ASSERT()s
> and BUG_ON()s to that effect).
>
> Currently, we never really check d->cpupool during the
> window, but that does not mean the race is not there.
> For instance, Credit2 at some point (during load balancing)
> iterates over the list of domains, and if we add logic
> that needs to check d->cpupool, and any one of them has
> already had cpupool_rm_domain() called on it... Boom!
>
> (In fact, calling __vcpu_has_soft_affinity() from inside
> balance_load() makes `xl shutdown <domid>' reliably
> crash, and this is how I discovered this.)
>
> On the other hand, cpupool_rm_domain() "only" does
> cpupool-related bookkeeping, and there's no harm in
> postponing it a little bit.
>
> Also, considering that, during domain initialization,
> we do:
>  cpupool_add_domain()
>  sched_init_domain()
>
> It makes sense for the destruction path to look like
> the opposite of it, i.e.:
>  sched_destroy_domain()
>  cpupool_rm_domain()
>
> And hence that's what this patch does.
>
> Actually, for better robustness, what we really do is
> move both cpupool_add_domain() and cpupool_rm_domain()
> inside sched_init_domain() and sched_destroy_domain(),
> respectively (and also add a couple of ASSERT()s).
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

This definitely looks better, although I will leave the in-depth review
to people more familiar with the scheduler internals.
Juergen Gross July 15, 2016, 9:38 a.m. UTC | #2
On 14/07/16 18:18, Dario Faggioli wrote:
> So, during domain destruction, we do:
>  cpupool_rm_domain()    [ in domain_destroy() ]
>  sched_destroy_domain() [ in complete_domain_destroy() ]
> 
> Therefore, there's a window during which, from the
> scheduler's point of view, a domain still sits outside
> of any cpupool.
>
> In fact, cpupool_rm_domain() does d->cpupool=NULL,
> and we don't allow that to hold true for anything
> but the idle domain (and there are, in fact, ASSERT()s
> and BUG_ON()s to that effect).
>
> Currently, we never really check d->cpupool during the
> window, but that does not mean the race is not there.
> For instance, Credit2 at some point (during load balancing)
> iterates over the list of domains, and if we add logic
> that needs to check d->cpupool, and any one of them has
> already had cpupool_rm_domain() called on it... Boom!
> 
> (In fact, calling __vcpu_has_soft_affinity() from inside
> balance_load() makes `xl shutdown <domid>' reliably
> crash, and this is how I discovered this.)
> 
> On the other hand, cpupool_rm_domain() "only" does
> cpupool-related bookkeeping, and there's no harm in
> postponing it a little bit.
> 
> Also, considering that, during domain initialization,
> we do:
>  cpupool_add_domain()
>  sched_init_domain()
> 
> It makes sense for the destruction path to look like
> the opposite of it, i.e.:
>  sched_destroy_domain()
>  cpupool_rm_domain()
> 
> And hence that's what this patch does.
> 
> Actually, for better robustness, what we really do is
> move both cpupool_add_domain() and cpupool_rm_domain()
> inside sched_init_domain() and sched_destroy_domain(),
> respectively (and also add a couple of ASSERT()s).
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Hmm, are you aware of commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a,
which explicitly moved cpupool_rm_domain() to the place you are now
removing it from? Please verify that the scenario mentioned in the
description of that commit still works with your patch.


Juergen
Dario Faggioli July 15, 2016, 10:14 a.m. UTC | #3
On Fri, 2016-07-15 at 11:38 +0200, Juergen Gross wrote:
> Hmm, are you aware of commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a,
> which explicitly moved cpupool_rm_domain() to the place you are now
> removing it from? Please verify that the scenario mentioned in the
> description of that commit still works with your patch.
> 
Sorry, but I only partly see the problem.

In particular, I'm probably not fully understanding, from that commit
changelog, what set of operations/commands I should run to check
whether or not I have reintroduced the issue.

What I did so far is as follows:

root@Zhaman:~# xl cpupool-list 
Name               CPUs   Sched     Active   Domain count
Pool-0              12    credit       y          1
Pool-credit          4    credit       y          1
root@Zhaman:~# xl list -c
Name                                        ID   Mem VCPUs	State	Time(s)         Cpupool
Domain-0                                     0  1019    16     r-----      34.5          Pool-0
vm1                                          1  4096     4     -b----       9.7     Pool-credit
root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
libxl: error: libxl.c:6998:libxl_cpupool_cpuremove: Error removing cpu 9 from cpupool: Device or resource busy
Some cpus may have not or only partially been removed from 'Pool-credit'.
If a cpu can't be added to another cpupool, add it to 'Pool-credit' again and retry.
root@Zhaman:~# xl cpupool-list -c
Name               CPU list
Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
Pool-credit        9
root@Zhaman:~# xl shutdown vm1
Shutting down domain 1
root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
root@Zhaman:~# xl cpupool-list -c
Name               CPU list
Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
Pool-credit

If (with vm1 still in Pool-credit) I do this, it indeed fails:

root@Zhaman:~# xl shutdown vm1 & xl cpupool-cpu-remove Pool-credit all
[1] 3275
Shutting down domain 2
libxl: error: libxl.c:6998:libxl_cpupool_cpuremove: Error removing cpu 9 from cpupool: Device or resource busy
Some cpus may have not or only partially been removed from 'Pool-credit'.
If a cpu can't be added to another cpupool, add it to 'Pool-credit' again and retry.
[1]+  Done                    xl shutdown vm1
root@Zhaman:~# xl cpupool-list -c
Name               CPU list
Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
Pool-credit        9

But that does not look too strange to me, as it's entirely possible
that the domain has not been moved yet when we try to remove the last
cpu. And in fact, after the domain has properly shut down:

root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
root@Zhaman:~# xl cpupool-list 
Name               CPUs   Sched     Active   Domain count
Pool-0              12    credit       y          1
Pool-credit          0    credit       y          0

And in fact, looking at the code introduced by that commit, the
important part, to me, seems to be the moving of the domain to
cpupool0, which is indeed the right thing to do. OTOH, what I am
seeing and fixing happens (well, could happen) all the time, even
when the domain being shut down is already in cpupool0, and (as you
say yourself in your changelog) there is no such issue as removing
the last cpu from cpupool0.

What am I missing?

Thanks and Regards,
Dario
Juergen Gross July 15, 2016, 10:36 a.m. UTC | #4
On 15/07/16 12:14, Dario Faggioli wrote:
> On Fri, 2016-07-15 at 11:38 +0200, Juergen Gross wrote:
>> Hmm, are you aware of commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a,
>> which explicitly moved cpupool_rm_domain() to the place you are now
>> removing it from? Please verify that the scenario mentioned in the
>> description of that commit still works with your patch.
>>
> Sorry, but I only partly see the problem.
> 
> In particular, I'm probably not fully understanding, from that commit
> changelog, what set of operations/commands I should run to check
> whether or not I have reintroduced the issue.

You need to create a domain in a cpupool and destroy it again while
some dom0 process is still holding a reference to it (resulting in a
zombie domain). Then try to destroy the cpupool.
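
Roughly, the failure mode I have in mind looks like this (a
sketch modelled from memory on xen/common/cpupool.c, so take
the exact names with a grain of salt):

    /* A pool with domains still accounted to it can't go away.
     * With your patch, cpupool_rm_domain() (which decrements
     * n_dom) only runs once the last reference to the domain
     * is dropped; a zombie domain never gets there, keeps
     * n_dom raised, and this keeps returning -EBUSY. */
    static int cpupool_destroy(struct cpupool *c)
    {
        if ( c->n_dom != 0 )
            return -EBUSY;
        /* ... actually free the pool ... */
        return 0;
    }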

> 
> What I did so far is as follows:
> 
> root@Zhaman:~# xl cpupool-list 
> Name               CPUs   Sched     Active   Domain count
> Pool-0              12    credit       y          1
> Pool-credit          4    credit       y          1
> root@Zhaman:~# xl list -c
> Name                                        ID   Mem VCPUs	State	Time(s)         Cpupool
> Domain-0                                     0  1019    16     r-----      34.5          Pool-0
> vm1                                          1  4096     4     -b----       9.7     Pool-credit
> root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
> libxl: error: libxl.c:6998:libxl_cpupool_cpuremove: Error removing cpu 9 from cpupool: Device or resource busy
> Some cpus may have not or only partially been removed from 'Pool-credit'.
> If a cpu can't be added to another cpupool, add it to 'Pool-credit' again and retry.
> root@Zhaman:~# xl cpupool-list -c
> Name               CPU list
> Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
> Pool-credit        9
> root@Zhaman:~# xl shutdown vm1
> Shutting down domain 1
> root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
> root@Zhaman:~# xl cpupool-list -c
> Name               CPU list
> Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
> Pool-credit
> 
> If (with vm1 still in Pool-credit) I do this, it indeed fails:
> 
> root@Zhaman:~# xl shutdown vm1 & xl cpupool-cpu-remove Pool-credit all
> [1] 3275
> Shutting down domain 2
> libxl: error: libxl.c:6998:libxl_cpupool_cpuremove: Error removing cpu 9 from cpupool: Device or resource busy
> Some cpus may have not or only partially been removed from 'Pool-credit'.
> If a cpu can't be added to another cpupool, add it to 'Pool-credit' again and retry.
> [1]+  Done                    xl shutdown vm1
> root@Zhaman:~# xl cpupool-list -c
> Name               CPU list
> Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
> Pool-credit        9
> 
> But that does not look too strange to me, as it's entirely possible
> that the domain has not been moved yet when we try to remove the last
> cpu. And in fact, after the domain has properly shut down:
> 
> root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
> root@Zhaman:~# xl cpupool-list 
> Name               CPUs   Sched     Active   Domain count
> Pool-0              12    credit       y          1
> Pool-credit          0    credit       y          0
> 
> And in fact, looking at the code introduced by that commit, the
> important part, to me, seems to be the moving of the domain to
> cpupool0, which is indeed the right thing to do. OTOH, what I am
> seeing and fixing happens (well, could happen) all the time, even
> when the domain being shut down is already in cpupool0, and (as you
> say yourself in your changelog) there is no such issue as removing
> the last cpu from cpupool0.
> 
> What am I missing?

The domain being a zombie might change the picture. Moving it to
cpupool0 was failing before my patch, and it might fail again with
your patch applied.


Juergen

Patch

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 42c07ee..339ee56 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -379,10 +379,7 @@  struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
         goto fail;
     init_status |= INIT_arch;
 
-    if ( (err = cpupool_add_domain(d, poolid)) != 0 )
-        goto fail;
-
-    if ( (err = sched_init_domain(d)) != 0 )
+    if ( (err = sched_init_domain(d, poolid)) != 0 )
         goto fail;
 
     if ( (err = late_hwdom_init(d)) != 0 )
@@ -868,8 +865,6 @@  void domain_destroy(struct domain *d)
 
     TRACE_1D(TRC_DOM0_DOM_REM, d->domain_id);
 
-    cpupool_rm_domain(d);
-
     /* Delete from task list and task hashtable. */
     spin_lock(&domlist_update_lock);
     pd = &domain_list;
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 7ac12d3..852f840 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -379,8 +379,15 @@  void sched_destroy_vcpu(struct vcpu *v)
     SCHED_OP(VCPU2OP(v), free_vdata, v->sched_priv);
 }
 
-int sched_init_domain(struct domain *d)
+int sched_init_domain(struct domain *d, int poolid)
 {
+    int ret;
+
+    ASSERT(d->cpupool == NULL);
+
+    if ( (ret = cpupool_add_domain(d, poolid)) )
+        return ret;
+
     SCHED_STAT_CRANK(dom_init);
     TRACE_1D(TRC_SCHED_DOM_ADD, d->domain_id);
     return SCHED_OP(DOM2OP(d), init_domain, d);
@@ -388,9 +395,13 @@  int sched_init_domain(struct domain *d)
 
 void sched_destroy_domain(struct domain *d)
 {
+    ASSERT(d->cpupool != NULL || is_idle_domain(d));
+
     SCHED_STAT_CRANK(dom_destroy);
     TRACE_1D(TRC_SCHED_DOM_REM, d->domain_id);
     SCHED_OP(DOM2OP(d), destroy_domain, d);
+
+    cpupool_rm_domain(d);
 }
 
 void vcpu_sleep_nosync(struct vcpu *v)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 46c82e7..888bc19 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -637,7 +637,7 @@  void noreturn asm_domain_crash_synchronous(unsigned long addr);
 void scheduler_init(void);
 int  sched_init_vcpu(struct vcpu *v, unsigned int processor);
 void sched_destroy_vcpu(struct vcpu *v);
-int  sched_init_domain(struct domain *d);
+int  sched_init_domain(struct domain *d, int poolid);
 void sched_destroy_domain(struct domain *d);
 int sched_move_domain(struct domain *d, struct cpupool *c);
 long sched_adjust(struct domain *, struct xen_domctl_scheduler_op *);