Message ID | 20201221181446.7791-3-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xen/domain: More structured teardown | expand |
Hi Andrew, On 21/12/2020 18:14, Andrew Cooper wrote: > There is no common equivelent of domain_reliquish_resources(), which has s/equivelent/equivalent/ > caused various pieces of common cleanup to live in inappropriate > places. > > Perhaps most obviously, evtchn_destroy() is called for every continuation of > domain_reliquish_resources(), which can easily be thousands of times. s/reliquish/relinquish/ > > Create domain_teardown() to be a new top level facility, and call it from the > appropriate positions in domain_kill() and domain_create()'s error path. > > No change in behaviour yet. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > --- > CC: Jan Beulich <JBeulich@suse.com> > CC: Roger Pau Monné <roger.pau@citrix.com> > CC: Wei Liu <wl@xen.org> > CC: Stefano Stabellini <sstabellini@kernel.org> > CC: Julien Grall <julien@xen.org> > CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> > --- > xen/common/domain.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++ > xen/include/xen/sched.h | 8 +++++++ > 2 files changed, 67 insertions(+) > > diff --git a/xen/common/domain.c b/xen/common/domain.c > index ce3667f1b4..ef1987335b 100644 > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -273,6 +273,59 @@ static int __init parse_extra_guest_irqs(const char *s) > custom_param("extra_guest_irqs", parse_extra_guest_irqs); > > /* > + * Release resources held by a domain. There may or may not be live > + * references to the domain, and it may or may not be fully constructed. > + * > + * d->is_dying differing between DOMDYING_dying and DOMDYING_dead can be used > + * to determine if live references to the domain exist, and also whether > + * continuations are permitted. > + * > + * If d->is_dying is DOMDYING_dead, this must not return non-zero. > + */ > +static int domain_teardown(struct domain *d) > +{ > + BUG_ON(!d->is_dying); > + > + /* > + * This hypercall can take minutes of wallclock time to complete. This > + * logic implements a co-routine, stashing state in struct domain across > + * hypercall continuation boundaries. > + */ > + switch ( d->teardown.val ) > + { > + /* > + * Record the current progress. Subsequent hypercall continuations > + * will logically restart work from this point. > + * > + * PROGRESS() markers must not be in the middle of loops. The loop > + * variable isn't preserved across a continuation. > + * > + * To avoid redundant work, there should be a marker before each > + * function which may return -ERESTART. > + */ > +#define PROGRESS(x) \ > + d->teardown.val = PROG_ ## x; \ > + /* Fallthrough */ \ > + case PROG_ ## x > + > + enum { > + PROG_done = 1, > + }; > + > + case 0: > + PROGRESS(done): > + break; > + > +#undef PROGRESS > + > + default: > + BUG(); > + } > + > + return 0; > +} > + > +/* > * Destroy a domain once all references to it have been dropped. Used either > * from the RCU path, or from the domain_create() error path before the domain > * is inserted into the domlist. > @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid, > if ( init_status & INIT_watchdog ) > watchdog_domain_destroy(d); > > + /* Must not hit a continuation in this context. */ > + ASSERT(domain_teardown(d) == 0); The ASSERT() will become a NOP in production build, so domain_teardown_down() will not be called. However, I think it would be better if we pass an extra argument to indicated wheter the code is allowed to preempt. This would make the preemption check more obvious in evtchn_destroy() compare to the current d->is_dying != DOMDYING_dead. > + > _domain_destroy(d); > > return ERR_PTR(err); > @@ -733,6 +789,9 @@ int domain_kill(struct domain *d) > domain_set_outstanding_pages(d, 0); > /* fallthrough */ > case DOMDYING_dying: > + rc = domain_teardown(d); > + if ( rc ) > + break; > rc = evtchn_destroy(d); > if ( rc ) > break; > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h > index faf5fda36f..3f35c537b8 100644 > --- a/xen/include/xen/sched.h > +++ b/xen/include/xen/sched.h > @@ -525,6 +525,14 @@ struct domain > /* Argo interdomain communication support */ > struct argo_domain *argo; > #endif > + > + /* > + * Continuation information for domain_teardown(). All fields entirely > + * private. > + */ > + struct { > + unsigned int val; > + } teardown; > }; > > static inline struct page_list_head *page_to_list( > Cheers,
On 21/12/2020 18:36, Julien Grall wrote: >> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid, >> if ( init_status & INIT_watchdog ) >> watchdog_domain_destroy(d); >> + /* Must not hit a continuation in this context. */ >> + ASSERT(domain_teardown(d) == 0); > The ASSERT() will become a NOP in production build, so > domain_teardown_down() will not be called. Urgh - its not really a nop, but it's evaluation isn't symmetric between debug and release builds. I'll need an extra local variable. > > However, I think it would be better if we pass an extra argument to > indicated wheter the code is allowed to preempt. This would make the > preemption check more obvious in evtchn_destroy() compare to the > current d->is_dying != DOMDYING_dead. We can have a predicate if you'd prefer, but plumbing an extra parameter is wasteful, and can only cause confusion if it is out of sync with d->is_dying. ~Andrew
On 21.12.2020 19:45, Andrew Cooper wrote: > On 21/12/2020 18:36, Julien Grall wrote: >>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid, >>> if ( init_status & INIT_watchdog ) >>> watchdog_domain_destroy(d); >>> + /* Must not hit a continuation in this context. */ >>> + ASSERT(domain_teardown(d) == 0); >> The ASSERT() will become a NOP in production build, so >> domain_teardown_down() will not be called. > > Urgh - its not really a nop, but it's evaluation isn't symmetric between > debug and release builds. I'll need an extra local variable. Or use ASSERT_UNREACHABLE(). (I admit I don't really like the resulting constructs, and would like to propose an alternative, even if I fear it'll be controversial.) >> However, I think it would be better if we pass an extra argument to >> indicated wheter the code is allowed to preempt. This would make the >> preemption check more obvious in evtchn_destroy() compare to the >> current d->is_dying != DOMDYING_dead. > > We can have a predicate if you'd prefer, but plumbing an extra parameter > is wasteful, and can only cause confusion if it is out of sync with > d->is_dying. I agree here - it wasn't so long ago that event_channel.c gained a DOMDYING_dead check, and I don't see why we shouldn't extend this approach to here and elsewhere. Jan
Hi, On 22/12/2020 07:50, Jan Beulich wrote: > On 21.12.2020 19:45, Andrew Cooper wrote: >> On 21/12/2020 18:36, Julien Grall wrote: >>>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid, >>>> if ( init_status & INIT_watchdog ) >>>> watchdog_domain_destroy(d); >>>> + /* Must not hit a continuation in this context. */ >>>> + ASSERT(domain_teardown(d) == 0); >>> The ASSERT() will become a NOP in production build, so >>> domain_teardown_down() will not be called. >> >> Urgh - its not really a nop, but it's evaluation isn't symmetric between >> debug and release builds. I'll need an extra local variable. > > Or use ASSERT_UNREACHABLE(). (I admit I don't really like the > resulting constructs, and would like to propose an alternative, > even if I fear it'll be controversial.) > >>> However, I think it would be better if we pass an extra argument to >>> indicated wheter the code is allowed to preempt. This would make the >>> preemption check more obvious in evtchn_destroy() compare to the >>> current d->is_dying != DOMDYING_dead. >> >> We can have a predicate if you'd prefer, but plumbing an extra parameter >> is wasteful, and can only cause confusion if it is out of sync with >> d->is_dying. > > I agree here - it wasn't so long ago that event_channel.c gained > a DOMDYING_dead check, and I don't see why we shouldn't extend > this approach to here and elsewhere. I think the d->is_dying != DOMYING_dead is difficult to understand even with the comment on top. This was ok in one place, but now it will spread everywhere. So at least, I would suggest to introduce a wrapper that is better named. There is also a futureproof concern. At the moment, we are considering the preemption will not be needed in domain_create(). I am ready to bet that the assumption is going to be broken sooner or later. As we don't have an extra parameter, we would need to investigate all the callers in helpers to see where we need to drop the d->is_dying != DOMYING_dead. I am sure you will agree that there is a risk to screw up. With an extra parameter, it is easier to re-enable preemption when needed. Anyway, both of you seem to agree on the approach here. If that's what you want, then so it be, but that's not something I will feel confident to Ack. Although, I will not Nack it. Cheers,
On 21.12.2020 19:14, Andrew Cooper wrote: > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -273,6 +273,59 @@ static int __init parse_extra_guest_irqs(const char *s) > custom_param("extra_guest_irqs", parse_extra_guest_irqs); > > /* > + * Release resources held by a domain. There may or may not be live > + * references to the domain, and it may or may not be fully constructed. > + * > + * d->is_dying differing between DOMDYING_dying and DOMDYING_dead can be used > + * to determine if live references to the domain exist, and also whether > + * continuations are permitted. > + * > + * If d->is_dying is DOMDYING_dead, this must not return non-zero. > + */ > +static int domain_teardown(struct domain *d) > +{ > + BUG_ON(!d->is_dying); > + > + /* > + * This hypercall can take minutes of wallclock time to complete. This > + * logic implements a co-routine, stashing state in struct domain across > + * hypercall continuation boundaries. > + */ > + switch ( d->teardown.val ) > + { > + /* > + * Record the current progress. Subsequent hypercall continuations > + * will logically restart work from this point. > + * > + * PROGRESS() markers must not be in the middle of loops. The loop > + * variable isn't preserved across a continuation. > + * > + * To avoid redundant work, there should be a marker before each > + * function which may return -ERESTART. > + */ > +#define PROGRESS(x) \ > + d->teardown.val = PROG_ ## x; \ > + /* Fallthrough */ \ > + case PROG_ ## x > + > + enum { > + PROG_done = 1, > + }; > + > + case 0: > + PROGRESS(done): > + break; > + > +#undef PROGRESS > + > + default: > + BUG(); > + } > + > + return 0; > +} While as an initial step this may be okay, I envision ordering issues with domain_relinquish_resources() - there may be per-arch things that want doing before certain common ones, and others which should only be done when certain common teardown was already performed. Therefore rather than this being a "common equivalent of domain_relinquish_resources()", I'd rather see (and call) it a (designated) replacement, where individual per-arch functions would get called at appropriate times. Jan
On 22.12.2020 11:25, Julien Grall wrote: > On 22/12/2020 07:50, Jan Beulich wrote: >> On 21.12.2020 19:45, Andrew Cooper wrote: >>> On 21/12/2020 18:36, Julien Grall wrote: >>>>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid, >>>>> if ( init_status & INIT_watchdog ) >>>>> watchdog_domain_destroy(d); >>>>> + /* Must not hit a continuation in this context. */ >>>>> + ASSERT(domain_teardown(d) == 0); >>>> The ASSERT() will become a NOP in production build, so >>>> domain_teardown_down() will not be called. >>> >>> Urgh - its not really a nop, but it's evaluation isn't symmetric between >>> debug and release builds. I'll need an extra local variable. >> >> Or use ASSERT_UNREACHABLE(). (I admit I don't really like the >> resulting constructs, and would like to propose an alternative, >> even if I fear it'll be controversial.) >> >>>> However, I think it would be better if we pass an extra argument to >>>> indicated wheter the code is allowed to preempt. This would make the >>>> preemption check more obvious in evtchn_destroy() compare to the >>>> current d->is_dying != DOMDYING_dead. >>> >>> We can have a predicate if you'd prefer, but plumbing an extra parameter >>> is wasteful, and can only cause confusion if it is out of sync with >>> d->is_dying. >> >> I agree here - it wasn't so long ago that event_channel.c gained >> a DOMDYING_dead check, and I don't see why we shouldn't extend >> this approach to here and elsewhere. > > I think the d->is_dying != DOMYING_dead is difficult to understand even > with the comment on top. This was ok in one place, but now it will > spread everywhere. So at least, I would suggest to introduce a wrapper > that is better named. > > There is also a futureproof concern. At the moment, we are considering > the preemption will not be needed in domain_create(). I am ready to bet > that the assumption is going to be broken sooner or later. This is a fair consideration, yet I'm having trouble seeing what it might be that would cause domain_create() to require preemption. The function is supposed to only produce an empty container. But yes, if e.g. vCPU creation was to move here, the situation would indeed change. Jan
Hi Jan, On 22/12/2020 10:53, Jan Beulich wrote: > On 22.12.2020 11:25, Julien Grall wrote: >> On 22/12/2020 07:50, Jan Beulich wrote: >>> On 21.12.2020 19:45, Andrew Cooper wrote: >>>> On 21/12/2020 18:36, Julien Grall wrote: >>>>>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid, >>>>>> if ( init_status & INIT_watchdog ) >>>>>> watchdog_domain_destroy(d); >>>>>> + /* Must not hit a continuation in this context. */ >>>>>> + ASSERT(domain_teardown(d) == 0); >>>>> The ASSERT() will become a NOP in production build, so >>>>> domain_teardown_down() will not be called. >>>> >>>> Urgh - its not really a nop, but it's evaluation isn't symmetric between >>>> debug and release builds. I'll need an extra local variable. >>> >>> Or use ASSERT_UNREACHABLE(). (I admit I don't really like the >>> resulting constructs, and would like to propose an alternative, >>> even if I fear it'll be controversial.) >>> >>>>> However, I think it would be better if we pass an extra argument to >>>>> indicated wheter the code is allowed to preempt. This would make the >>>>> preemption check more obvious in evtchn_destroy() compare to the >>>>> current d->is_dying != DOMDYING_dead. >>>> >>>> We can have a predicate if you'd prefer, but plumbing an extra parameter >>>> is wasteful, and can only cause confusion if it is out of sync with >>>> d->is_dying. >>> >>> I agree here - it wasn't so long ago that event_channel.c gained >>> a DOMDYING_dead check, and I don't see why we shouldn't extend >>> this approach to here and elsewhere. >> >> I think the d->is_dying != DOMYING_dead is difficult to understand even >> with the comment on top. This was ok in one place, but now it will >> spread everywhere. So at least, I would suggest to introduce a wrapper >> that is better named. >> >> There is also a futureproof concern. At the moment, we are considering >> the preemption will not be needed in domain_create(). I am ready to bet >> that the assumption is going to be broken sooner or later. > > This is a fair consideration, yet I'm having trouble seeing what it > might be that would cause domain_create() to require preemption. > The function is supposed to only produce an empty container. But yes, > if e.g. vCPU creation was to move here, the situation would indeed > change. We are not really creating an "empty container". There are at least one initial pool of memory allocated for HAP (256 pages) which need to be freed if we fail to create the domain. There is a second pool in discussion for the IOMMU (see [1]) which will also initially allocate 256 pages. To me it looks like domain_create() is getting quite big and preemption is going to be required sooner or later. Cheers, [1] <20201005094905.2929-6-paul@xen.org> > > Jan >
On 22/12/2020 10:53, Jan Beulich wrote: > On 22.12.2020 11:25, Julien Grall wrote: >> On 22/12/2020 07:50, Jan Beulich wrote: >>> On 21.12.2020 19:45, Andrew Cooper wrote: >>>> On 21/12/2020 18:36, Julien Grall wrote: >>>>>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid, >>>>>> if ( init_status & INIT_watchdog ) >>>>>> watchdog_domain_destroy(d); >>>>>> + /* Must not hit a continuation in this context. */ >>>>>> + ASSERT(domain_teardown(d) == 0); >>>>> The ASSERT() will become a NOP in production build, so >>>>> domain_teardown_down() will not be called. >>>> Urgh - its not really a nop, but it's evaluation isn't symmetric between >>>> debug and release builds. I'll need an extra local variable. >>> Or use ASSERT_UNREACHABLE(). (I admit I don't really like the >>> resulting constructs, and would like to propose an alternative, >>> even if I fear it'll be controversial.) >>> >>>>> However, I think it would be better if we pass an extra argument to >>>>> indicated wheter the code is allowed to preempt. This would make the >>>>> preemption check more obvious in evtchn_destroy() compare to the >>>>> current d->is_dying != DOMDYING_dead. >>>> We can have a predicate if you'd prefer, but plumbing an extra parameter >>>> is wasteful, and can only cause confusion if it is out of sync with >>>> d->is_dying. >>> I agree here - it wasn't so long ago that event_channel.c gained >>> a DOMDYING_dead check, and I don't see why we shouldn't extend >>> this approach to here and elsewhere. >> I think the d->is_dying != DOMYING_dead is difficult to understand even >> with the comment on top. This was ok in one place, but now it will >> spread everywhere. So at least, I would suggest to introduce a wrapper >> that is better named. >> >> There is also a futureproof concern. At the moment, we are considering >> the preemption will not be needed in domain_create(). I am ready to bet >> that the assumption is going to be broken sooner or later. > This is a fair consideration, yet I'm having trouble seeing what it > might be that would cause domain_create() to require preemption. > The function is supposed to only produce an empty container. But yes, > if e.g. vCPU creation was to move here, the situation would indeed > change. As discussed, I no longer think that is a good plan, especially if we want a sane mechanism for not allocating AMX memory for domains not configured to use AMX. domain_create() (and vcpu_create() to a lesser extent) are the functions which can't become preemptible, because they are the allocation and setup of the objects which would be used to store continuation information for other hypercalls. The only option if these get too complicated is to split the complexity out into other hypercalls. ~Andrew
On 22/12/2020 10:35, Jan Beulich wrote: > On 21.12.2020 19:14, Andrew Cooper wrote: >> --- a/xen/common/domain.c >> +++ b/xen/common/domain.c >> @@ -273,6 +273,59 @@ static int __init parse_extra_guest_irqs(const char *s) >> custom_param("extra_guest_irqs", parse_extra_guest_irqs); >> >> /* >> + * Release resources held by a domain. There may or may not be live >> + * references to the domain, and it may or may not be fully constructed. >> + * >> + * d->is_dying differing between DOMDYING_dying and DOMDYING_dead can be used >> + * to determine if live references to the domain exist, and also whether >> + * continuations are permitted. >> + * >> + * If d->is_dying is DOMDYING_dead, this must not return non-zero. >> + */ >> +static int domain_teardown(struct domain *d) >> +{ >> + BUG_ON(!d->is_dying); >> + >> + /* >> + * This hypercall can take minutes of wallclock time to complete. This >> + * logic implements a co-routine, stashing state in struct domain across >> + * hypercall continuation boundaries. >> + */ >> + switch ( d->teardown.val ) >> + { >> + /* >> + * Record the current progress. Subsequent hypercall continuations >> + * will logically restart work from this point. >> + * >> + * PROGRESS() markers must not be in the middle of loops. The loop >> + * variable isn't preserved across a continuation. >> + * >> + * To avoid redundant work, there should be a marker before each >> + * function which may return -ERESTART. >> + */ >> +#define PROGRESS(x) \ >> + d->teardown.val = PROG_ ## x; \ >> + /* Fallthrough */ \ >> + case PROG_ ## x >> + >> + enum { >> + PROG_done = 1, >> + }; >> + >> + case 0: >> + PROGRESS(done): >> + break; >> + >> +#undef PROGRESS >> + >> + default: >> + BUG(); >> + } >> + >> + return 0; >> +} > While as an initial step this may be okay, I envision ordering issues > with domain_relinquish_resources() - there may be per-arch things that > want doing before certain common ones, and others which should only > be done when certain common teardown was already performed. Therefore > rather than this being a "common equivalent of > domain_relinquish_resources()", I'd rather see (and call) it a > (designated) replacement, where individual per-arch functions would > get called at appropriate times. Over time, I do expect it to replace domain_relinquish_resources(). I'm not sure if that will take the form of a specific arch_domain_teardown() call (and reserving some space in teardown.val for arch use), or whether it will be a load of possibly-CONFIG'd stubs. I do also have a work in progress supporting per-vcpu continuations to be able to remove the TODO's introduced into the paging() path. However there is a bit more plumbing work required than I have time right now, so this will have to wait a little until I've got some of the more urgent 4.15 work done. ~Andrew
On 22.12.2020 12:46, Andrew Cooper wrote: > On 22/12/2020 10:35, Jan Beulich wrote: >> On 21.12.2020 19:14, Andrew Cooper wrote: >>> --- a/xen/common/domain.c >>> +++ b/xen/common/domain.c >>> @@ -273,6 +273,59 @@ static int __init parse_extra_guest_irqs(const char *s) >>> custom_param("extra_guest_irqs", parse_extra_guest_irqs); >>> >>> /* >>> + * Release resources held by a domain. There may or may not be live >>> + * references to the domain, and it may or may not be fully constructed. >>> + * >>> + * d->is_dying differing between DOMDYING_dying and DOMDYING_dead can be used >>> + * to determine if live references to the domain exist, and also whether >>> + * continuations are permitted. >>> + * >>> + * If d->is_dying is DOMDYING_dead, this must not return non-zero. >>> + */ >>> +static int domain_teardown(struct domain *d) >>> +{ >>> + BUG_ON(!d->is_dying); >>> + >>> + /* >>> + * This hypercall can take minutes of wallclock time to complete. This >>> + * logic implements a co-routine, stashing state in struct domain across >>> + * hypercall continuation boundaries. >>> + */ >>> + switch ( d->teardown.val ) >>> + { >>> + /* >>> + * Record the current progress. Subsequent hypercall continuations >>> + * will logically restart work from this point. >>> + * >>> + * PROGRESS() markers must not be in the middle of loops. The loop >>> + * variable isn't preserved across a continuation. >>> + * >>> + * To avoid redundant work, there should be a marker before each >>> + * function which may return -ERESTART. >>> + */ >>> +#define PROGRESS(x) \ >>> + d->teardown.val = PROG_ ## x; \ >>> + /* Fallthrough */ \ >>> + case PROG_ ## x >>> + >>> + enum { >>> + PROG_done = 1, >>> + }; >>> + >>> + case 0: >>> + PROGRESS(done): >>> + break; >>> + >>> +#undef PROGRESS >>> + >>> + default: >>> + BUG(); >>> + } >>> + >>> + return 0; >>> +} >> While as an initial step this may be okay, I envision ordering issues >> with domain_relinquish_resources() - there may be per-arch things that >> want doing before certain common ones, and others which should only >> be done when certain common teardown was already performed. Therefore >> rather than this being a "common equivalent of >> domain_relinquish_resources()", I'd rather see (and call) it a >> (designated) replacement, where individual per-arch functions would >> get called at appropriate times. > > Over time, I do expect it to replace domain_relinquish_resources(). I'm > not sure if that will take the form of a specific arch_domain_teardown() > call (and reserving some space in teardown.val for arch use), or whether > it will be a load of possibly-CONFIG'd stubs. I'd consider helpful if you could slightly re-word the description then. Reviewed-by: Jan Beulich <jbeulich@suse.com> Jan
diff --git a/xen/common/domain.c b/xen/common/domain.c index ce3667f1b4..ef1987335b 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -273,6 +273,59 @@ static int __init parse_extra_guest_irqs(const char *s) custom_param("extra_guest_irqs", parse_extra_guest_irqs); /* + * Release resources held by a domain. There may or may not be live + * references to the domain, and it may or may not be fully constructed. + * + * d->is_dying differing between DOMDYING_dying and DOMDYING_dead can be used + * to determine if live references to the domain exist, and also whether + * continuations are permitted. + * + * If d->is_dying is DOMDYING_dead, this must not return non-zero. + */ +static int domain_teardown(struct domain *d) +{ + BUG_ON(!d->is_dying); + + /* + * This hypercall can take minutes of wallclock time to complete. This + * logic implements a co-routine, stashing state in struct domain across + * hypercall continuation boundaries. + */ + switch ( d->teardown.val ) + { + /* + * Record the current progress. Subsequent hypercall continuations + * will logically restart work from this point. + * + * PROGRESS() markers must not be in the middle of loops. The loop + * variable isn't preserved across a continuation. + * + * To avoid redundant work, there should be a marker before each + * function which may return -ERESTART. + */ +#define PROGRESS(x) \ + d->teardown.val = PROG_ ## x; \ + /* Fallthrough */ \ + case PROG_ ## x + + enum { + PROG_done = 1, + }; + + case 0: + PROGRESS(done): + break; + +#undef PROGRESS + + default: + BUG(); + } + + return 0; +} + +/* * Destroy a domain once all references to it have been dropped. Used either * from the RCU path, or from the domain_create() error path before the domain * is inserted into the domlist. @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid, if ( init_status & INIT_watchdog ) watchdog_domain_destroy(d); + /* Must not hit a continuation in this context. */ + ASSERT(domain_teardown(d) == 0); + _domain_destroy(d); return ERR_PTR(err); @@ -733,6 +789,9 @@ int domain_kill(struct domain *d) domain_set_outstanding_pages(d, 0); /* fallthrough */ case DOMDYING_dying: + rc = domain_teardown(d); + if ( rc ) + break; rc = evtchn_destroy(d); if ( rc ) break; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index faf5fda36f..3f35c537b8 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -525,6 +525,14 @@ struct domain /* Argo interdomain communication support */ struct argo_domain *argo; #endif + + /* + * Continuation information for domain_teardown(). All fields entirely + * private. + */ + struct { + unsigned int val; + } teardown; }; static inline struct page_list_head *page_to_list(
There is no common equivelent of domain_reliquish_resources(), which has caused various pieces of common cleanup to live in inappropriate places. Perhaps most obviously, evtchn_destroy() is called for every continuation of domain_reliquish_resources(), which can easily be thousands of times. Create domain_teardown() to be a new top level facility, and call it from the appropriate positions in domain_kill() and domain_create()'s error path. No change in behaviour yet. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Wei Liu <wl@xen.org> CC: Stefano Stabellini <sstabellini@kernel.org> CC: Julien Grall <julien@xen.org> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> --- xen/common/domain.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++ xen/include/xen/sched.h | 8 +++++++ 2 files changed, 67 insertions(+)