diff mbox series

[2/3] xen/domain: Introduce domain_teardown()

Message ID 20201221181446.7791-3-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series xen/domain: More structured teardown | expand

Commit Message

Andrew Cooper Dec. 21, 2020, 6:14 p.m. UTC
There is no common equivelent of domain_reliquish_resources(), which has
caused various pieces of common cleanup to live in inappropriate
places.

Perhaps most obviously, evtchn_destroy() is called for every continuation of
domain_reliquish_resources(), which can easily be thousands of times.

Create domain_teardown() to be a new top level facility, and call it from the
appropriate positions in domain_kill() and domain_create()'s error path.

No change in behaviour yet.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
---
 xen/common/domain.c     | 59 +++++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/sched.h |  8 +++++++
 2 files changed, 67 insertions(+)

Comments

Julien Grall Dec. 21, 2020, 6:36 p.m. UTC | #1
Hi Andrew,

On 21/12/2020 18:14, Andrew Cooper wrote:
> There is no common equivelent of domain_reliquish_resources(), which has

s/equivelent/equivalent/

> caused various pieces of common cleanup to live in inappropriate
> places.
> 
> Perhaps most obviously, evtchn_destroy() is called for every continuation of
> domain_reliquish_resources(), which can easily be thousands of times.

s/reliquish/relinquish/

> 
> Create domain_teardown() to be a new top level facility, and call it from the
> appropriate positions in domain_kill() and domain_create()'s error path.
> 
> No change in behaviour yet.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien@xen.org>
> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> ---
>   xen/common/domain.c     | 59 +++++++++++++++++++++++++++++++++++++++++++++++++
>   xen/include/xen/sched.h |  8 +++++++
>   2 files changed, 67 insertions(+)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index ce3667f1b4..ef1987335b 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -273,6 +273,59 @@ static int __init parse_extra_guest_irqs(const char *s)
>   custom_param("extra_guest_irqs", parse_extra_guest_irqs);
>   
>   /*
> + * Release resources held by a domain.  There may or may not be live
> + * references to the domain, and it may or may not be fully constructed.
> + *
> + * d->is_dying differing between DOMDYING_dying and DOMDYING_dead can be used
> + * to determine if live references to the domain exist, and also whether
> + * continuations are permitted.
> + *
> + * If d->is_dying is DOMDYING_dead, this must not return non-zero.
> + */
> +static int domain_teardown(struct domain *d)
> +{
> +    BUG_ON(!d->is_dying);
> +
> +    /*
> +     * This hypercall can take minutes of wallclock time to complete.  This
> +     * logic implements a co-routine, stashing state in struct domain across
> +     * hypercall continuation boundaries.
> +     */
> +    switch ( d->teardown.val )
> +    {
> +        /*
> +         * Record the current progress.  Subsequent hypercall continuations
> +         * will logically restart work from this point.
> +         *
> +         * PROGRESS() markers must not be in the middle of loops.  The loop
> +         * variable isn't preserved across a continuation.
> +         *
> +         * To avoid redundant work, there should be a marker before each
> +         * function which may return -ERESTART.
> +         */
> +#define PROGRESS(x)                             \
> +        d->teardown.val = PROG_ ## x;           \
> +        /* Fallthrough */                       \
> +    case PROG_ ## x
> +
> +        enum {
> +            PROG_done = 1,
> +        };
> +
> +    case 0:
> +    PROGRESS(done):
> +        break;
> +
> +#undef PROGRESS
> +
> +    default:
> +        BUG();
> +    }
> +
> +    return 0;
> +}
> +
> +/*
>    * Destroy a domain once all references to it have been dropped.  Used either
>    * from the RCU path, or from the domain_create() error path before the domain
>    * is inserted into the domlist.
> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid,
>       if ( init_status & INIT_watchdog )
>           watchdog_domain_destroy(d);
>   
> +    /* Must not hit a continuation in this context. */
> +    ASSERT(domain_teardown(d) == 0);
The ASSERT() will become a NOP in production build, so 
domain_teardown_down() will not be called.

However, I think it would be better if we pass an extra argument to 
indicated wheter the code is allowed to preempt. This would make the 
preemption check more obvious in evtchn_destroy() compare to the current 
d->is_dying != DOMDYING_dead.

> +
>       _domain_destroy(d);
>   
>       return ERR_PTR(err);
> @@ -733,6 +789,9 @@ int domain_kill(struct domain *d)
>           domain_set_outstanding_pages(d, 0);
>           /* fallthrough */
>       case DOMDYING_dying:
> +        rc = domain_teardown(d);
> +        if ( rc )
> +            break;
>           rc = evtchn_destroy(d);
>           if ( rc )
>               break;
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index faf5fda36f..3f35c537b8 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -525,6 +525,14 @@ struct domain
>       /* Argo interdomain communication support */
>       struct argo_domain *argo;
>   #endif
> +
> +    /*
> +     * Continuation information for domain_teardown().  All fields entirely
> +     * private.
> +     */
> +    struct {
> +        unsigned int val;
> +    } teardown;
>   };
>   
>   static inline struct page_list_head *page_to_list(
> 

Cheers,
Andrew Cooper Dec. 21, 2020, 6:45 p.m. UTC | #2
On 21/12/2020 18:36, Julien Grall wrote:
>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid,
>>       if ( init_status & INIT_watchdog )
>>           watchdog_domain_destroy(d);
>>   +    /* Must not hit a continuation in this context. */
>> +    ASSERT(domain_teardown(d) == 0);
> The ASSERT() will become a NOP in production build, so
> domain_teardown_down() will not be called.

Urgh - its not really a nop, but it's evaluation isn't symmetric between
debug and release builds.  I'll need an extra local variable.

>
> However, I think it would be better if we pass an extra argument to
> indicated wheter the code is allowed to preempt. This would make the
> preemption check more obvious in evtchn_destroy() compare to the
> current d->is_dying != DOMDYING_dead.

We can have a predicate if you'd prefer, but plumbing an extra parameter
is wasteful, and can only cause confusion if it is out of sync with
d->is_dying.

~Andrew
Jan Beulich Dec. 22, 2020, 7:50 a.m. UTC | #3
On 21.12.2020 19:45, Andrew Cooper wrote:
> On 21/12/2020 18:36, Julien Grall wrote:
>>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid,
>>>       if ( init_status & INIT_watchdog )
>>>           watchdog_domain_destroy(d);
>>>   +    /* Must not hit a continuation in this context. */
>>> +    ASSERT(domain_teardown(d) == 0);
>> The ASSERT() will become a NOP in production build, so
>> domain_teardown_down() will not be called.
> 
> Urgh - its not really a nop, but it's evaluation isn't symmetric between
> debug and release builds.  I'll need an extra local variable.

Or use ASSERT_UNREACHABLE(). (I admit I don't really like the
resulting constructs, and would like to propose an alternative,
even if I fear it'll be controversial.)

>> However, I think it would be better if we pass an extra argument to
>> indicated wheter the code is allowed to preempt. This would make the
>> preemption check more obvious in evtchn_destroy() compare to the
>> current d->is_dying != DOMDYING_dead.
> 
> We can have a predicate if you'd prefer, but plumbing an extra parameter
> is wasteful, and can only cause confusion if it is out of sync with
> d->is_dying.

I agree here - it wasn't so long ago that event_channel.c gained
a DOMDYING_dead check, and I don't see why we shouldn't extend
this approach to here and elsewhere.

Jan
Julien Grall Dec. 22, 2020, 10:25 a.m. UTC | #4
Hi,

On 22/12/2020 07:50, Jan Beulich wrote:
> On 21.12.2020 19:45, Andrew Cooper wrote:
>> On 21/12/2020 18:36, Julien Grall wrote:
>>>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid,
>>>>        if ( init_status & INIT_watchdog )
>>>>            watchdog_domain_destroy(d);
>>>>    +    /* Must not hit a continuation in this context. */
>>>> +    ASSERT(domain_teardown(d) == 0);
>>> The ASSERT() will become a NOP in production build, so
>>> domain_teardown_down() will not be called.
>>
>> Urgh - its not really a nop, but it's evaluation isn't symmetric between
>> debug and release builds.  I'll need an extra local variable.
> 
> Or use ASSERT_UNREACHABLE(). (I admit I don't really like the
> resulting constructs, and would like to propose an alternative,
> even if I fear it'll be controversial.)
> 
>>> However, I think it would be better if we pass an extra argument to
>>> indicated wheter the code is allowed to preempt. This would make the
>>> preemption check more obvious in evtchn_destroy() compare to the
>>> current d->is_dying != DOMDYING_dead.
>>
>> We can have a predicate if you'd prefer, but plumbing an extra parameter
>> is wasteful, and can only cause confusion if it is out of sync with
>> d->is_dying.
> 
> I agree here - it wasn't so long ago that event_channel.c gained
> a DOMDYING_dead check, and I don't see why we shouldn't extend
> this approach to here and elsewhere.

I think the d->is_dying != DOMYING_dead is difficult to understand even 
with the comment on top. This was ok in one place, but now it will 
spread everywhere. So at least, I would suggest to introduce a wrapper 
that is better named.

There is also a futureproof concern. At the moment, we are considering 
the preemption will not be needed in domain_create(). I am ready to bet 
that the assumption is going to be broken sooner or later.

As we don't have an extra parameter, we would need to investigate all 
the callers in helpers to see where we need to drop the d->is_dying != 
DOMYING_dead. I am sure you will agree that there is a risk to screw up.

With an extra parameter, it is easier to re-enable preemption when needed.

Anyway, both of you seem to agree on the approach here. If that's what 
you want, then so it be, but that's not something I will feel confident 
to Ack. Although, I will not Nack it.

Cheers,
Jan Beulich Dec. 22, 2020, 10:35 a.m. UTC | #5
On 21.12.2020 19:14, Andrew Cooper wrote:
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -273,6 +273,59 @@ static int __init parse_extra_guest_irqs(const char *s)
>  custom_param("extra_guest_irqs", parse_extra_guest_irqs);
>  
>  /*
> + * Release resources held by a domain.  There may or may not be live
> + * references to the domain, and it may or may not be fully constructed.
> + *
> + * d->is_dying differing between DOMDYING_dying and DOMDYING_dead can be used
> + * to determine if live references to the domain exist, and also whether
> + * continuations are permitted.
> + *
> + * If d->is_dying is DOMDYING_dead, this must not return non-zero.
> + */
> +static int domain_teardown(struct domain *d)
> +{
> +    BUG_ON(!d->is_dying);
> +
> +    /*
> +     * This hypercall can take minutes of wallclock time to complete.  This
> +     * logic implements a co-routine, stashing state in struct domain across
> +     * hypercall continuation boundaries.
> +     */
> +    switch ( d->teardown.val )
> +    {
> +        /*
> +         * Record the current progress.  Subsequent hypercall continuations
> +         * will logically restart work from this point.
> +         *
> +         * PROGRESS() markers must not be in the middle of loops.  The loop
> +         * variable isn't preserved across a continuation.
> +         *
> +         * To avoid redundant work, there should be a marker before each
> +         * function which may return -ERESTART.
> +         */
> +#define PROGRESS(x)                             \
> +        d->teardown.val = PROG_ ## x;           \
> +        /* Fallthrough */                       \
> +    case PROG_ ## x
> +
> +        enum {
> +            PROG_done = 1,
> +        };
> +
> +    case 0:
> +    PROGRESS(done):
> +        break;
> +
> +#undef PROGRESS
> +
> +    default:
> +        BUG();
> +    }
> +
> +    return 0;
> +}

While as an initial step this may be okay, I envision ordering issues
with domain_relinquish_resources() - there may be per-arch things that
want doing before certain common ones, and others which should only
be done when certain common teardown was already performed. Therefore
rather than this being a "common equivalent of
domain_relinquish_resources()", I'd rather see (and call) it a
(designated) replacement, where individual per-arch functions would
get called at appropriate times.

Jan
Jan Beulich Dec. 22, 2020, 10:53 a.m. UTC | #6
On 22.12.2020 11:25, Julien Grall wrote:
> On 22/12/2020 07:50, Jan Beulich wrote:
>> On 21.12.2020 19:45, Andrew Cooper wrote:
>>> On 21/12/2020 18:36, Julien Grall wrote:
>>>>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid,
>>>>>        if ( init_status & INIT_watchdog )
>>>>>            watchdog_domain_destroy(d);
>>>>>    +    /* Must not hit a continuation in this context. */
>>>>> +    ASSERT(domain_teardown(d) == 0);
>>>> The ASSERT() will become a NOP in production build, so
>>>> domain_teardown_down() will not be called.
>>>
>>> Urgh - its not really a nop, but it's evaluation isn't symmetric between
>>> debug and release builds.  I'll need an extra local variable.
>>
>> Or use ASSERT_UNREACHABLE(). (I admit I don't really like the
>> resulting constructs, and would like to propose an alternative,
>> even if I fear it'll be controversial.)
>>
>>>> However, I think it would be better if we pass an extra argument to
>>>> indicated wheter the code is allowed to preempt. This would make the
>>>> preemption check more obvious in evtchn_destroy() compare to the
>>>> current d->is_dying != DOMDYING_dead.
>>>
>>> We can have a predicate if you'd prefer, but plumbing an extra parameter
>>> is wasteful, and can only cause confusion if it is out of sync with
>>> d->is_dying.
>>
>> I agree here - it wasn't so long ago that event_channel.c gained
>> a DOMDYING_dead check, and I don't see why we shouldn't extend
>> this approach to here and elsewhere.
> 
> I think the d->is_dying != DOMYING_dead is difficult to understand even 
> with the comment on top. This was ok in one place, but now it will 
> spread everywhere. So at least, I would suggest to introduce a wrapper 
> that is better named.
> 
> There is also a futureproof concern. At the moment, we are considering 
> the preemption will not be needed in domain_create(). I am ready to bet 
> that the assumption is going to be broken sooner or later.

This is a fair consideration, yet I'm having trouble seeing what it
might be that would cause domain_create() to require preemption.
The function is supposed to only produce an empty container. But yes,
if e.g. vCPU creation was to move here, the situation would indeed
change.

Jan
Julien Grall Dec. 22, 2020, 11:05 a.m. UTC | #7
Hi Jan,

On 22/12/2020 10:53, Jan Beulich wrote:
> On 22.12.2020 11:25, Julien Grall wrote:
>> On 22/12/2020 07:50, Jan Beulich wrote:
>>> On 21.12.2020 19:45, Andrew Cooper wrote:
>>>> On 21/12/2020 18:36, Julien Grall wrote:
>>>>>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid,
>>>>>>         if ( init_status & INIT_watchdog )
>>>>>>             watchdog_domain_destroy(d);
>>>>>>     +    /* Must not hit a continuation in this context. */
>>>>>> +    ASSERT(domain_teardown(d) == 0);
>>>>> The ASSERT() will become a NOP in production build, so
>>>>> domain_teardown_down() will not be called.
>>>>
>>>> Urgh - its not really a nop, but it's evaluation isn't symmetric between
>>>> debug and release builds.  I'll need an extra local variable.
>>>
>>> Or use ASSERT_UNREACHABLE(). (I admit I don't really like the
>>> resulting constructs, and would like to propose an alternative,
>>> even if I fear it'll be controversial.)
>>>
>>>>> However, I think it would be better if we pass an extra argument to
>>>>> indicated wheter the code is allowed to preempt. This would make the
>>>>> preemption check more obvious in evtchn_destroy() compare to the
>>>>> current d->is_dying != DOMDYING_dead.
>>>>
>>>> We can have a predicate if you'd prefer, but plumbing an extra parameter
>>>> is wasteful, and can only cause confusion if it is out of sync with
>>>> d->is_dying.
>>>
>>> I agree here - it wasn't so long ago that event_channel.c gained
>>> a DOMDYING_dead check, and I don't see why we shouldn't extend
>>> this approach to here and elsewhere.
>>
>> I think the d->is_dying != DOMYING_dead is difficult to understand even
>> with the comment on top. This was ok in one place, but now it will
>> spread everywhere. So at least, I would suggest to introduce a wrapper
>> that is better named.
>>
>> There is also a futureproof concern. At the moment, we are considering
>> the preemption will not be needed in domain_create(). I am ready to bet
>> that the assumption is going to be broken sooner or later.
> 
> This is a fair consideration, yet I'm having trouble seeing what it
> might be that would cause domain_create() to require preemption.
> The function is supposed to only produce an empty container. But yes,
> if e.g. vCPU creation was to move here, the situation would indeed
> change.
We are not really creating an "empty container". There are at least one 
initial pool of memory allocated for HAP (256 pages) which need to be 
freed if we fail to create the domain.

There is a second pool in discussion for the IOMMU (see [1]) which will 
also initially allocate 256 pages.

To me it looks like domain_create() is getting quite big and preemption 
is going to be required sooner or later.

Cheers,

[1] <20201005094905.2929-6-paul@xen.org>

> 
> Jan
>
Andrew Cooper Dec. 22, 2020, 11:11 a.m. UTC | #8
On 22/12/2020 10:53, Jan Beulich wrote:
> On 22.12.2020 11:25, Julien Grall wrote:
>> On 22/12/2020 07:50, Jan Beulich wrote:
>>> On 21.12.2020 19:45, Andrew Cooper wrote:
>>>> On 21/12/2020 18:36, Julien Grall wrote:
>>>>>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid,
>>>>>>        if ( init_status & INIT_watchdog )
>>>>>>            watchdog_domain_destroy(d);
>>>>>>    +    /* Must not hit a continuation in this context. */
>>>>>> +    ASSERT(domain_teardown(d) == 0);
>>>>> The ASSERT() will become a NOP in production build, so
>>>>> domain_teardown_down() will not be called.
>>>> Urgh - its not really a nop, but it's evaluation isn't symmetric between
>>>> debug and release builds.  I'll need an extra local variable.
>>> Or use ASSERT_UNREACHABLE(). (I admit I don't really like the
>>> resulting constructs, and would like to propose an alternative,
>>> even if I fear it'll be controversial.)
>>>
>>>>> However, I think it would be better if we pass an extra argument to
>>>>> indicated wheter the code is allowed to preempt. This would make the
>>>>> preemption check more obvious in evtchn_destroy() compare to the
>>>>> current d->is_dying != DOMDYING_dead.
>>>> We can have a predicate if you'd prefer, but plumbing an extra parameter
>>>> is wasteful, and can only cause confusion if it is out of sync with
>>>> d->is_dying.
>>> I agree here - it wasn't so long ago that event_channel.c gained
>>> a DOMDYING_dead check, and I don't see why we shouldn't extend
>>> this approach to here and elsewhere.
>> I think the d->is_dying != DOMYING_dead is difficult to understand even 
>> with the comment on top. This was ok in one place, but now it will 
>> spread everywhere. So at least, I would suggest to introduce a wrapper 
>> that is better named.
>>
>> There is also a futureproof concern. At the moment, we are considering 
>> the preemption will not be needed in domain_create(). I am ready to bet 
>> that the assumption is going to be broken sooner or later.
> This is a fair consideration, yet I'm having trouble seeing what it
> might be that would cause domain_create() to require preemption.
> The function is supposed to only produce an empty container. But yes,
> if e.g. vCPU creation was to move here, the situation would indeed
> change.

As discussed, I no longer think that is a good plan, especially if we
want a sane mechanism for not allocating AMX memory for domains not
configured to use AMX.

domain_create() (and vcpu_create() to a lesser extent) are the functions
which can't become preemptible, because they are the allocation and
setup of the objects which would be used to store continuation
information for other hypercalls.

The only option if these get too complicated is to split the complexity
out into other hypercalls.

~Andrew
Andrew Cooper Dec. 22, 2020, 11:46 a.m. UTC | #9
On 22/12/2020 10:35, Jan Beulich wrote:
> On 21.12.2020 19:14, Andrew Cooper wrote:
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -273,6 +273,59 @@ static int __init parse_extra_guest_irqs(const char *s)
>>  custom_param("extra_guest_irqs", parse_extra_guest_irqs);
>>  
>>  /*
>> + * Release resources held by a domain.  There may or may not be live
>> + * references to the domain, and it may or may not be fully constructed.
>> + *
>> + * d->is_dying differing between DOMDYING_dying and DOMDYING_dead can be used
>> + * to determine if live references to the domain exist, and also whether
>> + * continuations are permitted.
>> + *
>> + * If d->is_dying is DOMDYING_dead, this must not return non-zero.
>> + */
>> +static int domain_teardown(struct domain *d)
>> +{
>> +    BUG_ON(!d->is_dying);
>> +
>> +    /*
>> +     * This hypercall can take minutes of wallclock time to complete.  This
>> +     * logic implements a co-routine, stashing state in struct domain across
>> +     * hypercall continuation boundaries.
>> +     */
>> +    switch ( d->teardown.val )
>> +    {
>> +        /*
>> +         * Record the current progress.  Subsequent hypercall continuations
>> +         * will logically restart work from this point.
>> +         *
>> +         * PROGRESS() markers must not be in the middle of loops.  The loop
>> +         * variable isn't preserved across a continuation.
>> +         *
>> +         * To avoid redundant work, there should be a marker before each
>> +         * function which may return -ERESTART.
>> +         */
>> +#define PROGRESS(x)                             \
>> +        d->teardown.val = PROG_ ## x;           \
>> +        /* Fallthrough */                       \
>> +    case PROG_ ## x
>> +
>> +        enum {
>> +            PROG_done = 1,
>> +        };
>> +
>> +    case 0:
>> +    PROGRESS(done):
>> +        break;
>> +
>> +#undef PROGRESS
>> +
>> +    default:
>> +        BUG();
>> +    }
>> +
>> +    return 0;
>> +}
> While as an initial step this may be okay, I envision ordering issues
> with domain_relinquish_resources() - there may be per-arch things that
> want doing before certain common ones, and others which should only
> be done when certain common teardown was already performed. Therefore
> rather than this being a "common equivalent of
> domain_relinquish_resources()", I'd rather see (and call) it a
> (designated) replacement, where individual per-arch functions would
> get called at appropriate times.

Over time, I do expect it to replace domain_relinquish_resources().  I'm
not sure if that will take the form of a specific arch_domain_teardown()
call (and reserving some space in teardown.val for arch use), or whether
it will be a load of possibly-CONFIG'd stubs.

I do also have a work in progress supporting per-vcpu continuations to
be able to remove the TODO's introduced into the paging() path.  However
there is a bit more plumbing work required than I have time right now,
so this will have to wait a little until I've got some of the more
urgent 4.15 work done.

~Andrew
Jan Beulich Dec. 22, 2020, 11:55 a.m. UTC | #10
On 22.12.2020 12:46, Andrew Cooper wrote:
> On 22/12/2020 10:35, Jan Beulich wrote:
>> On 21.12.2020 19:14, Andrew Cooper wrote:
>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -273,6 +273,59 @@ static int __init parse_extra_guest_irqs(const char *s)
>>>  custom_param("extra_guest_irqs", parse_extra_guest_irqs);
>>>  
>>>  /*
>>> + * Release resources held by a domain.  There may or may not be live
>>> + * references to the domain, and it may or may not be fully constructed.
>>> + *
>>> + * d->is_dying differing between DOMDYING_dying and DOMDYING_dead can be used
>>> + * to determine if live references to the domain exist, and also whether
>>> + * continuations are permitted.
>>> + *
>>> + * If d->is_dying is DOMDYING_dead, this must not return non-zero.
>>> + */
>>> +static int domain_teardown(struct domain *d)
>>> +{
>>> +    BUG_ON(!d->is_dying);
>>> +
>>> +    /*
>>> +     * This hypercall can take minutes of wallclock time to complete.  This
>>> +     * logic implements a co-routine, stashing state in struct domain across
>>> +     * hypercall continuation boundaries.
>>> +     */
>>> +    switch ( d->teardown.val )
>>> +    {
>>> +        /*
>>> +         * Record the current progress.  Subsequent hypercall continuations
>>> +         * will logically restart work from this point.
>>> +         *
>>> +         * PROGRESS() markers must not be in the middle of loops.  The loop
>>> +         * variable isn't preserved across a continuation.
>>> +         *
>>> +         * To avoid redundant work, there should be a marker before each
>>> +         * function which may return -ERESTART.
>>> +         */
>>> +#define PROGRESS(x)                             \
>>> +        d->teardown.val = PROG_ ## x;           \
>>> +        /* Fallthrough */                       \
>>> +    case PROG_ ## x
>>> +
>>> +        enum {
>>> +            PROG_done = 1,
>>> +        };
>>> +
>>> +    case 0:
>>> +    PROGRESS(done):
>>> +        break;
>>> +
>>> +#undef PROGRESS
>>> +
>>> +    default:
>>> +        BUG();
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>> While as an initial step this may be okay, I envision ordering issues
>> with domain_relinquish_resources() - there may be per-arch things that
>> want doing before certain common ones, and others which should only
>> be done when certain common teardown was already performed. Therefore
>> rather than this being a "common equivalent of
>> domain_relinquish_resources()", I'd rather see (and call) it a
>> (designated) replacement, where individual per-arch functions would
>> get called at appropriate times.
> 
> Over time, I do expect it to replace domain_relinquish_resources().  I'm
> not sure if that will take the form of a specific arch_domain_teardown()
> call (and reserving some space in teardown.val for arch use), or whether
> it will be a load of possibly-CONFIG'd stubs.

I'd consider helpful if you could slightly re-word the description then.

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
diff mbox series

Patch

diff --git a/xen/common/domain.c b/xen/common/domain.c
index ce3667f1b4..ef1987335b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -273,6 +273,59 @@  static int __init parse_extra_guest_irqs(const char *s)
 custom_param("extra_guest_irqs", parse_extra_guest_irqs);
 
 /*
+ * Release resources held by a domain.  There may or may not be live
+ * references to the domain, and it may or may not be fully constructed.
+ *
+ * d->is_dying differing between DOMDYING_dying and DOMDYING_dead can be used
+ * to determine if live references to the domain exist, and also whether
+ * continuations are permitted.
+ *
+ * If d->is_dying is DOMDYING_dead, this must not return non-zero.
+ */
+static int domain_teardown(struct domain *d)
+{
+    BUG_ON(!d->is_dying);
+
+    /*
+     * This hypercall can take minutes of wallclock time to complete.  This
+     * logic implements a co-routine, stashing state in struct domain across
+     * hypercall continuation boundaries.
+     */
+    switch ( d->teardown.val )
+    {
+        /*
+         * Record the current progress.  Subsequent hypercall continuations
+         * will logically restart work from this point.
+         *
+         * PROGRESS() markers must not be in the middle of loops.  The loop
+         * variable isn't preserved across a continuation.
+         *
+         * To avoid redundant work, there should be a marker before each
+         * function which may return -ERESTART.
+         */
+#define PROGRESS(x)                             \
+        d->teardown.val = PROG_ ## x;           \
+        /* Fallthrough */                       \
+    case PROG_ ## x
+
+        enum {
+            PROG_done = 1,
+        };
+
+    case 0:
+    PROGRESS(done):
+        break;
+
+#undef PROGRESS
+
+    default:
+        BUG();
+    }
+
+    return 0;
+}
+
+/*
  * Destroy a domain once all references to it have been dropped.  Used either
  * from the RCU path, or from the domain_create() error path before the domain
  * is inserted into the domlist.
@@ -553,6 +606,9 @@  struct domain *domain_create(domid_t domid,
     if ( init_status & INIT_watchdog )
         watchdog_domain_destroy(d);
 
+    /* Must not hit a continuation in this context. */
+    ASSERT(domain_teardown(d) == 0);
+
     _domain_destroy(d);
 
     return ERR_PTR(err);
@@ -733,6 +789,9 @@  int domain_kill(struct domain *d)
         domain_set_outstanding_pages(d, 0);
         /* fallthrough */
     case DOMDYING_dying:
+        rc = domain_teardown(d);
+        if ( rc )
+            break;
         rc = evtchn_destroy(d);
         if ( rc )
             break;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index faf5fda36f..3f35c537b8 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -525,6 +525,14 @@  struct domain
     /* Argo interdomain communication support */
     struct argo_domain *argo;
 #endif
+
+    /*
+     * Continuation information for domain_teardown().  All fields entirely
+     * private.
+     */
+    struct {
+        unsigned int val;
+    } teardown;
 };
 
 static inline struct page_list_head *page_to_list(