Message ID | 146851308019.22413.8905002507733716302.stgit@Solace.fritz.box (mailing list archive) |
---|---|
State | New, archived |
On 14/07/16 17:18, Dario Faggioli wrote:
> So, during domain destruction, we do:
>  cpupool_rm_domain()    [ in domain_destroy() ]
>  sched_destroy_domain() [ in complete_domain_destroy() ]
>
> Therefore, there's a window during which, from the
> scheduler's point of view, a domain sits outside
> of any cpupool.
>
> In fact, cpupool_rm_domain() does d->cpupool=NULL, and
> we don't allow that to hold true for anything but the
> idle domain (and there are, in fact, ASSERT()s and
> BUG_ON()s to that effect).
>
> Currently, we never really check d->cpupool during that
> window, but that does not mean the race is not there.
> For instance, Credit2 at some point (during load balancing)
> iterates over the list of domains; if we add logic that
> needs to check d->cpupool, and any one of them has had
> cpupool_rm_domain() called on it already... Boom!
>
> (In fact, calling __vcpu_has_soft_affinity() from inside
> balance_load() makes `xl shutdown <domid>' reliably
> crash, and this is how I discovered this.)
>
> On the other hand, cpupool_rm_domain() "only" does
> cpupool-related bookkeeping, and there's no harm in
> postponing it a little bit.
>
> Also, considering that, during domain initialization,
> we do:
>  cpupool_add_domain()
>  sched_init_domain()
>
> It makes sense for the destruction path to look like
> the opposite of it, i.e.:
>  sched_destroy_domain()
>  cpupool_rm_domain()
>
> And hence that's what this patch does.
>
> Actually, for better robustness, what we really do is
> move both cpupool_add_domain() and cpupool_rm_domain()
> inside sched_init_domain() and sched_destroy_domain(),
> respectively (and also add a couple of ASSERT()s).
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

This definitely looks better, although I will leave the in-depth review to people more familiar with the scheduler internals.
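To make the race window concrete, here is a minimal, self-contained toy model in plain C. This is not Xen code: the structures, balance_load() and the pool handling are simplified stand-ins, sketching why scheduler code running between the two destruction steps can blow up.

#include <assert.h>
#include <stdio.h>

/* Toy stand-ins -- NOT the real Xen structures. */
struct cpupool { int id; };
struct domain  { struct cpupool *cpupool; int is_idle; };

static struct cpupool pool0 = { 0 };

/* Models domain_destroy()'s bookkeeping: the pool pointer goes away... */
static void cpupool_rm_domain(struct domain *d) { d->cpupool = NULL; }

/* ...while the scheduler teardown only happens later, from the RCU
 * callback (complete_domain_destroy()). */
static void sched_destroy_domain(struct domain *d) { (void)d; }

/* Stand-in for scheduler code (think Credit2's balance_load()) that
 * walks live domains and dereferences d->cpupool. */
static void balance_load(struct domain *d)
{
    /* The invariant Xen ASSERT()s: only the idle domain is pool-less. */
    assert(d->cpupool != NULL || d->is_idle);
    printf("balancing domain in pool %d\n", d->cpupool->id);
}

int main(void)
{
    struct domain d = { &pool0, 0 };

    balance_load(&d);       /* fine: d is still in pool0 */

    /* Pre-patch ordering: cpupool_rm_domain() runs first, while the
     * domain stays visible to the scheduler until the RCU callback. */
    cpupool_rm_domain(&d);

    /* balance_load(&d);    <-- uncommented, this is the "Boom!":
     *                          assert fires / NULL dereference.   */

    sched_destroy_domain(&d);   /* with the patch, this runs first and
                                   itself calls cpupool_rm_domain() at
                                   the end, closing the window */
    return 0;
}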
On 14/07/16 18:18, Dario Faggioli wrote:
> So, during domain destruction, we do:
>  cpupool_rm_domain()    [ in domain_destroy() ]
>  sched_destroy_domain() [ in complete_domain_destroy() ]
>
> Therefore, there's a window during which, from the
> scheduler's point of view, a domain sits outside
> of any cpupool.
>
> In fact, cpupool_rm_domain() does d->cpupool=NULL, and
> we don't allow that to hold true for anything but the
> idle domain (and there are, in fact, ASSERT()s and
> BUG_ON()s to that effect).
>
> Currently, we never really check d->cpupool during that
> window, but that does not mean the race is not there.
> For instance, Credit2 at some point (during load balancing)
> iterates over the list of domains; if we add logic that
> needs to check d->cpupool, and any one of them has had
> cpupool_rm_domain() called on it already... Boom!
>
> (In fact, calling __vcpu_has_soft_affinity() from inside
> balance_load() makes `xl shutdown <domid>' reliably
> crash, and this is how I discovered this.)
>
> On the other hand, cpupool_rm_domain() "only" does
> cpupool-related bookkeeping, and there's no harm in
> postponing it a little bit.
>
> Also, considering that, during domain initialization,
> we do:
>  cpupool_add_domain()
>  sched_init_domain()
>
> It makes sense for the destruction path to look like
> the opposite of it, i.e.:
>  sched_destroy_domain()
>  cpupool_rm_domain()
>
> And hence that's what this patch does.
>
> Actually, for better robustness, what we really do is
> move both cpupool_add_domain() and cpupool_rm_domain()
> inside sched_init_domain() and sched_destroy_domain(),
> respectively (and also add a couple of ASSERT()s).
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Hmm, are you aware of commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a,
which explicitly moved cpupool_rm_domain() to the place from which you
are now removing it again? Please verify that the scenario mentioned in
the description of that commit still works with your patch.

Juergen
On Fri, 2016-07-15 at 11:38 +0200, Juergen Gross wrote:
> Hmm, are you aware of commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a,
> which explicitly moved cpupool_rm_domain() to the place from which you
> are now removing it again? Please verify that the scenario mentioned
> in the description of that commit still works with your patch.
>
Sorry, but I only partly see the problem.

In particular, I'm probably not fully understanding, from that commit
changelog, what set of operations/commands I should run to check
whether or not I have reintroduced the issue.

What I did so far is as follows:

root@Zhaman:~# xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0               12    credit       y          1
Pool-credit           4    credit       y          1
root@Zhaman:~# xl list -c
Name               ID   Mem VCPUs      State   Time(s)   Cpupool
Domain-0            0  1019    16     r-----      34.5   Pool-0
vm1                 1  4096     4     -b----       9.7   Pool-credit
root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
libxl: error: libxl.c:6998:libxl_cpupool_cpuremove: Error removing cpu 9 from cpupool: Device or resource busy
Some cpus may have not or only partially been removed from 'Pool-credit'.
If a cpu can't be added to another cpupool, add it to 'Pool-credit' again and retry.
root@Zhaman:~# xl cpupool-list -c
Name               CPU list
Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
Pool-credit        9
root@Zhaman:~# xl shutdown vm1
Shutting down domain 1
root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
root@Zhaman:~# xl cpupool-list -c
Name               CPU list
Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
Pool-credit

If (with vm1 still in Pool-credit) I do this, it indeed fails:

root@Zhaman:~# xl shutdown vm1 & xl cpupool-cpu-remove Pool-credit all
[1] 3275
Shutting down domain 2
libxl: error: libxl.c:6998:libxl_cpupool_cpuremove: Error removing cpu 9 from cpupool: Device or resource busy
Some cpus may have not or only partially been removed from 'Pool-credit'.
If a cpu can't be added to another cpupool, add it to 'Pool-credit' again and retry.
[1]+  Done                    xl shutdown vm1
root@Zhaman:~# xl cpupool-list -c
Name               CPU list
Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
Pool-credit        9

But that does not look too strange to me, as it's entirely possible
that the domain has not been moved yet when we try to remove the last
cpu. And in fact, after the domain has properly shut down:

root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
root@Zhaman:~# xl cpupool-list
Name               CPUs   Sched     Active   Domain count
Pool-0               12    credit       y          1
Pool-credit           0    credit       y          0

And in fact, looking at the code introduced by that commit, the
important part, to me, seems to be the moving of the domain to
cpupool0, which is indeed the right thing to do. OTOH, what I am
seeing and fixing happens (well, could happen) all the time, even
when the domain being shut down is already in cpupool0, and (as you
say yourself in your changelog) there is no such issue as removing
the last cpu of cpupool0.

What am I missing?

Thanks and Regards,
Dario
On 15/07/16 12:14, Dario Faggioli wrote:
> On Fri, 2016-07-15 at 11:38 +0200, Juergen Gross wrote:
>> Hmm, are you aware of commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a,
>> which explicitly moved cpupool_rm_domain() to the place from which you
>> are now removing it again? Please verify that the scenario mentioned
>> in the description of that commit still works with your patch.
>>
> Sorry, but I only partly see the problem.
>
> In particular, I'm probably not fully understanding, from that commit
> changelog, what set of operations/commands I should run to check
> whether or not I have reintroduced the issue.

You need to create a domain in a cpupool and destroy it again while
some dom0 process is still holding a reference to it (resulting in a
zombie domain). Then try to destroy the cpupool.

> What I did so far is as follows:
>
> root@Zhaman:~# xl cpupool-list
> Name               CPUs   Sched     Active   Domain count
> Pool-0               12    credit       y          1
> Pool-credit           4    credit       y          1
> root@Zhaman:~# xl list -c
> Name               ID   Mem VCPUs      State   Time(s)   Cpupool
> Domain-0            0  1019    16     r-----      34.5   Pool-0
> vm1                 1  4096     4     -b----       9.7   Pool-credit
> root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
> libxl: error: libxl.c:6998:libxl_cpupool_cpuremove: Error removing cpu 9 from cpupool: Device or resource busy
> Some cpus may have not or only partially been removed from 'Pool-credit'.
> If a cpu can't be added to another cpupool, add it to 'Pool-credit' again and retry.
> root@Zhaman:~# xl cpupool-list -c
> Name               CPU list
> Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
> Pool-credit        9
> root@Zhaman:~# xl shutdown vm1
> Shutting down domain 1
> root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
> root@Zhaman:~# xl cpupool-list -c
> Name               CPU list
> Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
> Pool-credit
>
> If (with vm1 still in Pool-credit) I do this, it indeed fails:
>
> root@Zhaman:~# xl shutdown vm1 & xl cpupool-cpu-remove Pool-credit all
> [1] 3275
> Shutting down domain 2
> libxl: error: libxl.c:6998:libxl_cpupool_cpuremove: Error removing cpu 9 from cpupool: Device or resource busy
> Some cpus may have not or only partially been removed from 'Pool-credit'.
> If a cpu can't be added to another cpupool, add it to 'Pool-credit' again and retry.
> [1]+  Done                    xl shutdown vm1
> root@Zhaman:~# xl cpupool-list -c
> Name               CPU list
> Pool-0             0,1,2,3,4,5,10,11,12,13,14,15
> Pool-credit        9
>
> But that does not look too strange to me, as it's entirely possible
> that the domain has not been moved yet when we try to remove the last
> cpu. And in fact, after the domain has properly shut down:
>
> root@Zhaman:~# xl cpupool-cpu-remove Pool-credit all
> root@Zhaman:~# xl cpupool-list
> Name               CPUs   Sched     Active   Domain count
> Pool-0               12    credit       y          1
> Pool-credit           0    credit       y          0
>
> And in fact, looking at the code introduced by that commit, the
> important part, to me, seems to be the moving of the domain to
> cpupool0, which is indeed the right thing to do. OTOH, what I am
> seeing and fixing happens (well, could happen) all the time, even
> when the domain being shut down is already in cpupool0, and (as you
> say yourself in your changelog) there is no such issue as removing
> the last cpu of cpupool0.
>
> What am I missing?

The domain being a zombie domain might change the picture. Moving it to
cpupool0 was failing before my patch and it might do so again with your
patch applied.

Juergen
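For concreteness, the scenario Juergen describes could be driven along these lines with standard xl commands. This is a hedged sketch, not a verified recipe: testpool.cfg and vm1.cfg are assumed config files, and how exactly a dom0 process ends up holding the reference (e.g. a stuck device model or a lingering foreign memory mapping) depends on the setup.

# Create a cpupool and a domain inside it (config files assumed).
xl cpupool-create testpool.cfg
xl create vm1.cfg 'pool="testpool"'

# ... arrange for some dom0 process to keep holding a reference to
# vm1's memory (this is what turns it into a zombie on destruction) ...

xl destroy vm1             # vm1 lingers as a zombie while the reference is held
xl list                    # the zombie domain is still listed

# The step whose behaviour commit bac6334b51d9 addressed: destroying
# the pool while the zombie domain is still assigned to it.
xl cpupool-destroy testpool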
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 42c07ee..339ee56 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -379,10 +379,7 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
         goto fail;
     init_status |= INIT_arch;
 
-    if ( (err = cpupool_add_domain(d, poolid)) != 0 )
-        goto fail;
-
-    if ( (err = sched_init_domain(d)) != 0 )
+    if ( (err = sched_init_domain(d, poolid)) != 0 )
         goto fail;
 
     if ( (err = late_hwdom_init(d)) != 0 )
@@ -868,8 +865,6 @@ void domain_destroy(struct domain *d)
 
     TRACE_1D(TRC_DOM0_DOM_REM, d->domain_id);
 
-    cpupool_rm_domain(d);
-
     /* Delete from task list and task hashtable. */
     spin_lock(&domlist_update_lock);
     pd = &domain_list;
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 7ac12d3..852f840 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -379,8 +379,15 @@ void sched_destroy_vcpu(struct vcpu *v)
         SCHED_OP(VCPU2OP(v), free_vdata, v->sched_priv);
 }
 
-int sched_init_domain(struct domain *d)
+int sched_init_domain(struct domain *d, int poolid)
 {
+    int ret;
+
+    ASSERT(d->cpupool == NULL);
+
+    if ( (ret = cpupool_add_domain(d, poolid)) )
+        return ret;
+
     SCHED_STAT_CRANK(dom_init);
     TRACE_1D(TRC_SCHED_DOM_ADD, d->domain_id);
     return SCHED_OP(DOM2OP(d), init_domain, d);
@@ -388,9 +395,13 @@ int sched_init_domain(struct domain *d)
 
 void sched_destroy_domain(struct domain *d)
 {
+    ASSERT(d->cpupool != NULL || is_idle_domain(d));
+
     SCHED_STAT_CRANK(dom_destroy);
     TRACE_1D(TRC_SCHED_DOM_REM, d->domain_id);
     SCHED_OP(DOM2OP(d), destroy_domain, d);
+
+    cpupool_rm_domain(d);
 }
 
 void vcpu_sleep_nosync(struct vcpu *v)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 46c82e7..888bc19 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -637,7 +637,7 @@ void noreturn asm_domain_crash_synchronous(unsigned long addr);
 void scheduler_init(void);
 int sched_init_vcpu(struct vcpu *v, unsigned int processor);
 void sched_destroy_vcpu(struct vcpu *v);
-int sched_init_domain(struct domain *d);
+int sched_init_domain(struct domain *d, int poolid);
 void sched_destroy_domain(struct domain *d);
 int sched_move_domain(struct domain *d, struct cpupool *c);
 long sched_adjust(struct domain *, struct xen_domctl_scheduler_op *);
So, during domain destruction, we do:
 cpupool_rm_domain()    [ in domain_destroy() ]
 sched_destroy_domain() [ in complete_domain_destroy() ]

Therefore, there's a window during which, from the
scheduler's point of view, a domain sits outside
of any cpupool.

In fact, cpupool_rm_domain() does d->cpupool=NULL, and
we don't allow that to hold true for anything but the
idle domain (and there are, in fact, ASSERT()s and
BUG_ON()s to that effect).

Currently, we never really check d->cpupool during that
window, but that does not mean the race is not there.
For instance, Credit2 at some point (during load balancing)
iterates over the list of domains; if we add logic that
needs to check d->cpupool, and any one of them has had
cpupool_rm_domain() called on it already... Boom!

(In fact, calling __vcpu_has_soft_affinity() from inside
balance_load() makes `xl shutdown <domid>' reliably
crash, and this is how I discovered this.)

On the other hand, cpupool_rm_domain() "only" does
cpupool-related bookkeeping, and there's no harm in
postponing it a little bit.

Also, considering that, during domain initialization,
we do:
 cpupool_add_domain()
 sched_init_domain()

It makes sense for the destruction path to look like
the opposite of it, i.e.:
 sched_destroy_domain()
 cpupool_rm_domain()

And hence that's what this patch does.

Actually, for better robustness, what we really do is
move both cpupool_add_domain() and cpupool_rm_domain()
inside sched_init_domain() and sched_destroy_domain(),
respectively (and also add a couple of ASSERT()s).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
Changes from v1:
 * followed Andrew's suggestion of moving the cpupool functions inside
   the sched functions;
 * reworded the changelog.
---
 xen/common/domain.c     |    7 +------
 xen/common/schedule.c   |   13 ++++++++++++-
 xen/include/xen/sched.h |    2 +-
 3 files changed, 14 insertions(+), 8 deletions(-)