[2/3] xen/domain: Introduce domain_teardown()

Message ID	20201221181446.7791-3-andrew.cooper3@citrix.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=3GnW=FZ=lists.xenproject.org=xen-devel-bounces@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9E81422D08 Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org> IronPort-SDR: YM3is+7Zqq7V8NlCoaB8HckE19fpwhw6g/oSB9Nlw3U5Tvw3lNLeurhFXLG0q6HzmPtP1Vt6yd GDEVl5p1jIateQHhbHTGmRHS6KXmrnebYHDe1wWPcpOLvqZ8F4H+nVbgfF5Z5W10JiOsbtj5Qh m3HII8KswCQOiK4j0+m0fmUL319hk5NJ9MhnZAh9xPm5WMJ1GTaOetOZogH/kL5tUZdI6cdMeI wnErJIkn4lxsqG/04yvguXKumnXAV586kWuEVf2s/kkwFfOmjBvmgPMUws9E+L8WS2IoBuJXRr R/g= From: Andrew Cooper <andrew.cooper3@citrix.com> To: Xen-devel <xen-devel@lists.xenproject.org> CC: Andrew Cooper <andrew.cooper3@citrix.com>, Jan Beulich <JBeulich@suse.com>, =?utf-8?q?Roger_Pau_Monn=C3=A9?= <roger.pau@citrix.com>, Wei Liu <wl@xen.org>, Stefano Stabellini <sstabellini@kernel.org>, Julien Grall <julien@xen.org>, Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> Subject: [PATCH 2/3] xen/domain: Introduce domain_teardown() Date: Mon, 21 Dec 2020 18:14:45 +0000 Message-ID: <20201221181446.7791-3-andrew.cooper3@citrix.com> In-Reply-To: <20201221181446.7791-1-andrew.cooper3@citrix.com> References: <20201221181446.7791-1-andrew.cooper3@citrix.com> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit
Series	xen/domain: More structured teardown \| expand [0/3] xen/domain: More structured teardown [1/3] xen/domain: Reorder trivial initialisation in early domain_create() [2/3] xen/domain: Introduce domain_teardown() [3/3] xen/evtchn: Clean up teardown handling

Andrew Cooper Dec. 21, 2020, 6:14 p.m. UTC

There is no common equivelent of domain_reliquish_resources(), which has
caused various pieces of common cleanup to live in inappropriate
places.

Perhaps most obviously, evtchn_destroy() is called for every continuation of
domain_reliquish_resources(), which can easily be thousands of times.

Create domain_teardown() to be a new top level facility, and call it from the
appropriate positions in domain_kill() and domain_create()'s error path.

No change in behaviour yet.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
---
 xen/common/domain.c     | 59 +++++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/sched.h |  8 +++++++
 2 files changed, 67 insertions(+)

Julien Grall Dec. 21, 2020, 6:36 p.m. UTC | #1

Hi Andrew,

On 21/12/2020 18:14, Andrew Cooper wrote:
> There is no common equivelent of domain_reliquish_resources(), which has

s/equivelent/equivalent/

> caused various pieces of common cleanup to live in inappropriate
> places.
> 
> Perhaps most obviously, evtchn_destroy() is called for every continuation of
> domain_reliquish_resources(), which can easily be thousands of times.

s/reliquish/relinquish/

> 
> Create domain_teardown() to be a new top level facility, and call it from the
> appropriate positions in domain_kill() and domain_create()'s error path.
> 
> No change in behaviour yet.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien@xen.org>
> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> ---
>   xen/common/domain.c     | 59 +++++++++++++++++++++++++++++++++++++++++++++++++
>   xen/include/xen/sched.h |  8 +++++++
>   2 files changed, 67 insertions(+)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index ce3667f1b4..ef1987335b 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -273,6 +273,59 @@ static int __init parse_extra_guest_irqs(const char *s)
>   custom_param("extra_guest_irqs", parse_extra_guest_irqs);
>   
>   /*
> + * Release resources held by a domain.  There may or may not be live
> + * references to the domain, and it may or may not be fully constructed.
> + *
> + * d->is_dying differing between DOMDYING_dying and DOMDYING_dead can be used
> + * to determine if live references to the domain exist, and also whether
> + * continuations are permitted.
> + *
> + * If d->is_dying is DOMDYING_dead, this must not return non-zero.
> + */
> +static int domain_teardown(struct domain *d)
> +{
> +    BUG_ON(!d->is_dying);
> +
> +    /*
> +     * This hypercall can take minutes of wallclock time to complete.  This
> +     * logic implements a co-routine, stashing state in struct domain across
> +     * hypercall continuation boundaries.
> +     */
> +    switch ( d->teardown.val )
> +    {
> +        /*
> +         * Record the current progress.  Subsequent hypercall continuations
> +         * will logically restart work from this point.
> +         *
> +         * PROGRESS() markers must not be in the middle of loops.  The loop
> +         * variable isn't preserved across a continuation.
> +         *
> +         * To avoid redundant work, there should be a marker before each
> +         * function which may return -ERESTART.
> +         */
> +#define PROGRESS(x)                             \
> +        d->teardown.val = PROG_ ## x;           \
> +        /* Fallthrough */                       \
> +    case PROG_ ## x
> +
> +        enum {
> +            PROG_done = 1,
> +        };
> +
> +    case 0:
> +    PROGRESS(done):
> +        break;
> +
> +#undef PROGRESS
> +
> +    default:
> +        BUG();
> +    }
> +
> +    return 0;
> +}
> +
> +/*
>    * Destroy a domain once all references to it have been dropped.  Used either
>    * from the RCU path, or from the domain_create() error path before the domain
>    * is inserted into the domlist.
> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid,
>       if ( init_status & INIT_watchdog )
>           watchdog_domain_destroy(d);
>   
> +    /* Must not hit a continuation in this context. */
> +    ASSERT(domain_teardown(d) == 0);
The ASSERT() will become a NOP in production build, so 
domain_teardown_down() will not be called.

However, I think it would be better if we pass an extra argument to 
indicated wheter the code is allowed to preempt. This would make the 
preemption check more obvious in evtchn_destroy() compare to the current 
d->is_dying != DOMDYING_dead.

> +
>       _domain_destroy(d);
>   
>       return ERR_PTR(err);
> @@ -733,6 +789,9 @@ int domain_kill(struct domain *d)
>           domain_set_outstanding_pages(d, 0);
>           /* fallthrough */
>       case DOMDYING_dying:
> +        rc = domain_teardown(d);
> +        if ( rc )
> +            break;
>           rc = evtchn_destroy(d);
>           if ( rc )
>               break;
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index faf5fda36f..3f35c537b8 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -525,6 +525,14 @@ struct domain
>       /* Argo interdomain communication support */
>       struct argo_domain *argo;
>   #endif
> +
> +    /*
> +     * Continuation information for domain_teardown().  All fields entirely
> +     * private.
> +     */
> +    struct {
> +        unsigned int val;
> +    } teardown;
>   };
>   
>   static inline struct page_list_head *page_to_list(
> 

Cheers,

Andrew Cooper Dec. 21, 2020, 6:45 p.m. UTC | #2

On 21/12/2020 18:36, Julien Grall wrote:
>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid,
>>       if ( init_status & INIT_watchdog )
>>           watchdog_domain_destroy(d);
>>   +    /* Must not hit a continuation in this context. */
>> +    ASSERT(domain_teardown(d) == 0);
> The ASSERT() will become a NOP in production build, so
> domain_teardown_down() will not be called.

Urgh - its not really a nop, but it's evaluation isn't symmetric between
debug and release builds.  I'll need an extra local variable.

>
> However, I think it would be better if we pass an extra argument to
> indicated wheter the code is allowed to preempt. This would make the
> preemption check more obvious in evtchn_destroy() compare to the
> current d->is_dying != DOMDYING_dead.

We can have a predicate if you'd prefer, but plumbing an extra parameter
is wasteful, and can only cause confusion if it is out of sync with
d->is_dying.

~Andrew

Jan Beulich Dec. 22, 2020, 7:50 a.m. UTC | #3

On 21.12.2020 19:45, Andrew Cooper wrote:
> On 21/12/2020 18:36, Julien Grall wrote:
>>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid,
>>>       if ( init_status & INIT_watchdog )
>>>           watchdog_domain_destroy(d);
>>>   +    /* Must not hit a continuation in this context. */
>>> +    ASSERT(domain_teardown(d) == 0);
>> The ASSERT() will become a NOP in production build, so
>> domain_teardown_down() will not be called.
> 
> Urgh - its not really a nop, but it's evaluation isn't symmetric between
> debug and release builds.  I'll need an extra local variable.

Or use ASSERT_UNREACHABLE(). (I admit I don't really like the
resulting constructs, and would like to propose an alternative,
even if I fear it'll be controversial.)

>> However, I think it would be better if we pass an extra argument to
>> indicated wheter the code is allowed to preempt. This would make the
>> preemption check more obvious in evtchn_destroy() compare to the
>> current d->is_dying != DOMDYING_dead.
> 
> We can have a predicate if you'd prefer, but plumbing an extra parameter
> is wasteful, and can only cause confusion if it is out of sync with
> d->is_dying.

I agree here - it wasn't so long ago that event_channel.c gained
a DOMDYING_dead check, and I don't see why we shouldn't extend
this approach to here and elsewhere.

Jan

Julien Grall Dec. 22, 2020, 10:25 a.m. UTC | #4

Hi,

On 22/12/2020 07:50, Jan Beulich wrote:
> On 21.12.2020 19:45, Andrew Cooper wrote:
>> On 21/12/2020 18:36, Julien Grall wrote:
>>>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid,
>>>>        if ( init_status & INIT_watchdog )
>>>>            watchdog_domain_destroy(d);
>>>>    +    /* Must not hit a continuation in this context. */
>>>> +    ASSERT(domain_teardown(d) == 0);
>>> The ASSERT() will become a NOP in production build, so
>>> domain_teardown_down() will not be called.
>>
>> Urgh - its not really a nop, but it's evaluation isn't symmetric between
>> debug and release builds.  I'll need an extra local variable.
> 
> Or use ASSERT_UNREACHABLE(). (I admit I don't really like the
> resulting constructs, and would like to propose an alternative,
> even if I fear it'll be controversial.)
> 
>>> However, I think it would be better if we pass an extra argument to
>>> indicated wheter the code is allowed to preempt. This would make the
>>> preemption check more obvious in evtchn_destroy() compare to the
>>> current d->is_dying != DOMDYING_dead.
>>
>> We can have a predicate if you'd prefer, but plumbing an extra parameter
>> is wasteful, and can only cause confusion if it is out of sync with
>> d->is_dying.
> 
> I agree here - it wasn't so long ago that event_channel.c gained
> a DOMDYING_dead check, and I don't see why we shouldn't extend
> this approach to here and elsewhere.

I think the d->is_dying != DOMYING_dead is difficult to understand even 
with the comment on top. This was ok in one place, but now it will 
spread everywhere. So at least, I would suggest to introduce a wrapper 
that is better named.

There is also a futureproof concern. At the moment, we are considering 
the preemption will not be needed in domain_create(). I am ready to bet 
that the assumption is going to be broken sooner or later.

As we don't have an extra parameter, we would need to investigate all 
the callers in helpers to see where we need to drop the d->is_dying != 
DOMYING_dead. I am sure you will agree that there is a risk to screw up.

With an extra parameter, it is easier to re-enable preemption when needed.

Anyway, both of you seem to agree on the approach here. If that's what 
you want, then so it be, but that's not something I will feel confident 
to Ack. Although, I will not Nack it.

Cheers,

Jan Beulich Dec. 22, 2020, 10:35 a.m. UTC | #5

On 21.12.2020 19:14, Andrew Cooper wrote:
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -273,6 +273,59 @@ static int __init parse_extra_guest_irqs(const char *s)
>  custom_param("extra_guest_irqs", parse_extra_guest_irqs);
>  
>  /*
> + * Release resources held by a domain.  There may or may not be live
> + * references to the domain, and it may or may not be fully constructed.
> + *
> + * d->is_dying differing between DOMDYING_dying and DOMDYING_dead can be used
> + * to determine if live references to the domain exist, and also whether
> + * continuations are permitted.
> + *
> + * If d->is_dying is DOMDYING_dead, this must not return non-zero.
> + */
> +static int domain_teardown(struct domain *d)
> +{
> +    BUG_ON(!d->is_dying);
> +
> +    /*
> +     * This hypercall can take minutes of wallclock time to complete.  This
> +     * logic implements a co-routine, stashing state in struct domain across
> +     * hypercall continuation boundaries.
> +     */
> +    switch ( d->teardown.val )
> +    {
> +        /*
> +         * Record the current progress.  Subsequent hypercall continuations
> +         * will logically restart work from this point.
> +         *
> +         * PROGRESS() markers must not be in the middle of loops.  The loop
> +         * variable isn't preserved across a continuation.
> +         *
> +         * To avoid redundant work, there should be a marker before each
> +         * function which may return -ERESTART.
> +         */
> +#define PROGRESS(x)                             \
> +        d->teardown.val = PROG_ ## x;           \
> +        /* Fallthrough */                       \
> +    case PROG_ ## x
> +
> +        enum {
> +            PROG_done = 1,
> +        };
> +
> +    case 0:
> +    PROGRESS(done):
> +        break;
> +
> +#undef PROGRESS
> +
> +    default:
> +        BUG();
> +    }
> +
> +    return 0;
> +}

While as an initial step this may be okay, I envision ordering issues
with domain_relinquish_resources() - there may be per-arch things that
want doing before certain common ones, and others which should only
be done when certain common teardown was already performed. Therefore
rather than this being a "common equivalent of
domain_relinquish_resources()", I'd rather see (and call) it a
(designated) replacement, where individual per-arch functions would
get called at appropriate times.

Jan

Jan Beulich Dec. 22, 2020, 10:53 a.m. UTC | #6

On 22.12.2020 11:25, Julien Grall wrote:
> On 22/12/2020 07:50, Jan Beulich wrote:
>> On 21.12.2020 19:45, Andrew Cooper wrote:
>>> On 21/12/2020 18:36, Julien Grall wrote:
>>>>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid,
>>>>>        if ( init_status & INIT_watchdog )
>>>>>            watchdog_domain_destroy(d);
>>>>>    +    /* Must not hit a continuation in this context. */
>>>>> +    ASSERT(domain_teardown(d) == 0);
>>>> The ASSERT() will become a NOP in production build, so
>>>> domain_teardown_down() will not be called.
>>>
>>> Urgh - its not really a nop, but it's evaluation isn't symmetric between
>>> debug and release builds.  I'll need an extra local variable.
>>
>> Or use ASSERT_UNREACHABLE(). (I admit I don't really like the
>> resulting constructs, and would like to propose an alternative,
>> even if I fear it'll be controversial.)
>>
>>>> However, I think it would be better if we pass an extra argument to
>>>> indicated wheter the code is allowed to preempt. This would make the
>>>> preemption check more obvious in evtchn_destroy() compare to the
>>>> current d->is_dying != DOMDYING_dead.
>>>
>>> We can have a predicate if you'd prefer, but plumbing an extra parameter
>>> is wasteful, and can only cause confusion if it is out of sync with
>>> d->is_dying.
>>
>> I agree here - it wasn't so long ago that event_channel.c gained
>> a DOMDYING_dead check, and I don't see why we shouldn't extend
>> this approach to here and elsewhere.
> 
> I think the d->is_dying != DOMYING_dead is difficult to understand even 
> with the comment on top. This was ok in one place, but now it will 
> spread everywhere. So at least, I would suggest to introduce a wrapper 
> that is better named.
> 
> There is also a futureproof concern. At the moment, we are considering 
> the preemption will not be needed in domain_create(). I am ready to bet 
> that the assumption is going to be broken sooner or later.

This is a fair consideration, yet I'm having trouble seeing what it
might be that would cause domain_create() to require preemption.
The function is supposed to only produce an empty container. But yes,
if e.g. vCPU creation was to move here, the situation would indeed
change.

Jan

Julien Grall Dec. 22, 2020, 11:05 a.m. UTC | #7

Hi Jan,

On 22/12/2020 10:53, Jan Beulich wrote:
> On 22.12.2020 11:25, Julien Grall wrote:
>> On 22/12/2020 07:50, Jan Beulich wrote:
>>> On 21.12.2020 19:45, Andrew Cooper wrote:
>>>> On 21/12/2020 18:36, Julien Grall wrote:
>>>>>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid,
>>>>>>         if ( init_status & INIT_watchdog )
>>>>>>             watchdog_domain_destroy(d);
>>>>>>     +    /* Must not hit a continuation in this context. */
>>>>>> +    ASSERT(domain_teardown(d) == 0);
>>>>> The ASSERT() will become a NOP in production build, so
>>>>> domain_teardown_down() will not be called.
>>>>
>>>> Urgh - its not really a nop, but it's evaluation isn't symmetric between
>>>> debug and release builds.  I'll need an extra local variable.
>>>
>>> Or use ASSERT_UNREACHABLE(). (I admit I don't really like the
>>> resulting constructs, and would like to propose an alternative,
>>> even if I fear it'll be controversial.)
>>>
>>>>> However, I think it would be better if we pass an extra argument to
>>>>> indicated wheter the code is allowed to preempt. This would make the
>>>>> preemption check more obvious in evtchn_destroy() compare to the
>>>>> current d->is_dying != DOMDYING_dead.
>>>>
>>>> We can have a predicate if you'd prefer, but plumbing an extra parameter
>>>> is wasteful, and can only cause confusion if it is out of sync with
>>>> d->is_dying.
>>>
>>> I agree here - it wasn't so long ago that event_channel.c gained
>>> a DOMDYING_dead check, and I don't see why we shouldn't extend
>>> this approach to here and elsewhere.
>>
>> I think the d->is_dying != DOMYING_dead is difficult to understand even
>> with the comment on top. This was ok in one place, but now it will
>> spread everywhere. So at least, I would suggest to introduce a wrapper
>> that is better named.
>>
>> There is also a futureproof concern. At the moment, we are considering
>> the preemption will not be needed in domain_create(). I am ready to bet
>> that the assumption is going to be broken sooner or later.
> 
> This is a fair consideration, yet I'm having trouble seeing what it
> might be that would cause domain_create() to require preemption.
> The function is supposed to only produce an empty container. But yes,
> if e.g. vCPU creation was to move here, the situation would indeed
> change.
We are not really creating an "empty container". There are at least one 
initial pool of memory allocated for HAP (256 pages) which need to be 
freed if we fail to create the domain.

There is a second pool in discussion for the IOMMU (see [1]) which will 
also initially allocate 256 pages.

To me it looks like domain_create() is getting quite big and preemption 
is going to be required sooner or later.

Cheers,

[1] <20201005094905.2929-6-paul@xen.org>

> 
> Jan
>

Andrew Cooper Dec. 22, 2020, 11:11 a.m. UTC | #8

On 22/12/2020 10:53, Jan Beulich wrote:
> On 22.12.2020 11:25, Julien Grall wrote:
>> On 22/12/2020 07:50, Jan Beulich wrote:
>>> On 21.12.2020 19:45, Andrew Cooper wrote:
>>>> On 21/12/2020 18:36, Julien Grall wrote:
>>>>>> @@ -553,6 +606,9 @@ struct domain *domain_create(domid_t domid,
>>>>>>        if ( init_status & INIT_watchdog )
>>>>>>            watchdog_domain_destroy(d);
>>>>>>    +    /* Must not hit a continuation in this context. */
>>>>>> +    ASSERT(domain_teardown(d) == 0);
>>>>> The ASSERT() will become a NOP in production build, so
>>>>> domain_teardown_down() will not be called.
>>>> Urgh - its not really a nop, but it's evaluation isn't symmetric between
>>>> debug and release builds.  I'll need an extra local variable.
>>> Or use ASSERT_UNREACHABLE(). (I admit I don't really like the
>>> resulting constructs, and would like to propose an alternative,
>>> even if I fear it'll be controversial.)
>>>
>>>>> However, I think it would be better if we pass an extra argument to
>>>>> indicated wheter the code is allowed to preempt. This would make the
>>>>> preemption check more obvious in evtchn_destroy() compare to the
>>>>> current d->is_dying != DOMDYING_dead.
>>>> We can have a predicate if you'd prefer, but plumbing an extra parameter
>>>> is wasteful, and can only cause confusion if it is out of sync with
>>>> d->is_dying.
>>> I agree here - it wasn't so long ago that event_channel.c gained
>>> a DOMDYING_dead check, and I don't see why we shouldn't extend
>>> this approach to here and elsewhere.
>> I think the d->is_dying != DOMYING_dead is difficult to understand even 
>> with the comment on top. This was ok in one place, but now it will 
>> spread everywhere. So at least, I would suggest to introduce a wrapper 
>> that is better named.
>>
>> There is also a futureproof concern. At the moment, we are considering 
>> the preemption will not be needed in domain_create(). I am ready to bet 
>> that the assumption is going to be broken sooner or later.
> This is a fair consideration, yet I'm having trouble seeing what it
> might be that would cause domain_create() to require preemption.
> The function is supposed to only produce an empty container. But yes,
> if e.g. vCPU creation was to move here, the situation would indeed
> change.

As discussed, I no longer think that is a good plan, especially if we
want a sane mechanism for not allocating AMX memory for domains not
configured to use AMX.

domain_create() (and vcpu_create() to a lesser extent) are the functions
which can't become preemptible, because they are the allocation and
setup of the objects which would be used to store continuation
information for other hypercalls.

The only option if these get too complicated is to split the complexity
out into other hypercalls.

~Andrew

Andrew Cooper Dec. 22, 2020, 11:46 a.m. UTC | #9

On 22/12/2020 10:35, Jan Beulich wrote:
> On 21.12.2020 19:14, Andrew Cooper wrote:
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -273,6 +273,59 @@ static int __init parse_extra_guest_irqs(const char *s)
>>  custom_param("extra_guest_irqs", parse_extra_guest_irqs);
>>  
>>  /*
>> + * Release resources held by a domain.  There may or may not be live
>> + * references to the domain, and it may or may not be fully constructed.
>> + *
>> + * d->is_dying differing between DOMDYING_dying and DOMDYING_dead can be used
>> + * to determine if live references to the domain exist, and also whether
>> + * continuations are permitted.
>> + *
>> + * If d->is_dying is DOMDYING_dead, this must not return non-zero.
>> + */
>> +static int domain_teardown(struct domain *d)
>> +{
>> +    BUG_ON(!d->is_dying);
>> +
>> +    /*
>> +     * This hypercall can take minutes of wallclock time to complete.  This
>> +     * logic implements a co-routine, stashing state in struct domain across
>> +     * hypercall continuation boundaries.
>> +     */
>> +    switch ( d->teardown.val )
>> +    {
>> +        /*
>> +         * Record the current progress.  Subsequent hypercall continuations
>> +         * will logically restart work from this point.
>> +         *
>> +         * PROGRESS() markers must not be in the middle of loops.  The loop
>> +         * variable isn't preserved across a continuation.
>> +         *
>> +         * To avoid redundant work, there should be a marker before each
>> +         * function which may return -ERESTART.
>> +         */
>> +#define PROGRESS(x)                             \
>> +        d->teardown.val = PROG_ ## x;           \
>> +        /* Fallthrough */                       \
>> +    case PROG_ ## x
>> +
>> +        enum {
>> +            PROG_done = 1,
>> +        };
>> +
>> +    case 0:
>> +    PROGRESS(done):
>> +        break;
>> +
>> +#undef PROGRESS
>> +
>> +    default:
>> +        BUG();
>> +    }
>> +
>> +    return 0;
>> +}
> While as an initial step this may be okay, I envision ordering issues
> with domain_relinquish_resources() - there may be per-arch things that
> want doing before certain common ones, and others which should only
> be done when certain common teardown was already performed. Therefore
> rather than this being a "common equivalent of
> domain_relinquish_resources()", I'd rather see (and call) it a
> (designated) replacement, where individual per-arch functions would
> get called at appropriate times.

Over time, I do expect it to replace domain_relinquish_resources().  I'm
not sure if that will take the form of a specific arch_domain_teardown()
call (and reserving some space in teardown.val for arch use), or whether
it will be a load of possibly-CONFIG'd stubs.

I do also have a work in progress supporting per-vcpu continuations to
be able to remove the TODO's introduced into the paging() path.  However
there is a bit more plumbing work required than I have time right now,
so this will have to wait a little until I've got some of the more
urgent 4.15 work done.

~Andrew

Jan Beulich Dec. 22, 2020, 11:55 a.m. UTC | #10

On 22.12.2020 12:46, Andrew Cooper wrote:
> On 22/12/2020 10:35, Jan Beulich wrote:
>> On 21.12.2020 19:14, Andrew Cooper wrote:
>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -273,6 +273,59 @@ static int __init parse_extra_guest_irqs(const char *s)
>>>  custom_param("extra_guest_irqs", parse_extra_guest_irqs);
>>>  
>>>  /*
>>> + * Release resources held by a domain.  There may or may not be live
>>> + * references to the domain, and it may or may not be fully constructed.
>>> + *
>>> + * d->is_dying differing between DOMDYING_dying and DOMDYING_dead can be used
>>> + * to determine if live references to the domain exist, and also whether
>>> + * continuations are permitted.
>>> + *
>>> + * If d->is_dying is DOMDYING_dead, this must not return non-zero.
>>> + */
>>> +static int domain_teardown(struct domain *d)
>>> +{
>>> +    BUG_ON(!d->is_dying);
>>> +
>>> +    /*
>>> +     * This hypercall can take minutes of wallclock time to complete.  This
>>> +     * logic implements a co-routine, stashing state in struct domain across
>>> +     * hypercall continuation boundaries.
>>> +     */
>>> +    switch ( d->teardown.val )
>>> +    {
>>> +        /*
>>> +         * Record the current progress.  Subsequent hypercall continuations
>>> +         * will logically restart work from this point.
>>> +         *
>>> +         * PROGRESS() markers must not be in the middle of loops.  The loop
>>> +         * variable isn't preserved across a continuation.
>>> +         *
>>> +         * To avoid redundant work, there should be a marker before each
>>> +         * function which may return -ERESTART.
>>> +         */
>>> +#define PROGRESS(x)                             \
>>> +        d->teardown.val = PROG_ ## x;           \
>>> +        /* Fallthrough */                       \
>>> +    case PROG_ ## x
>>> +
>>> +        enum {
>>> +            PROG_done = 1,
>>> +        };
>>> +
>>> +    case 0:
>>> +    PROGRESS(done):
>>> +        break;
>>> +
>>> +#undef PROGRESS
>>> +
>>> +    default:
>>> +        BUG();
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>> While as an initial step this may be okay, I envision ordering issues
>> with domain_relinquish_resources() - there may be per-arch things that
>> want doing before certain common ones, and others which should only
>> be done when certain common teardown was already performed. Therefore
>> rather than this being a "common equivalent of
>> domain_relinquish_resources()", I'd rather see (and call) it a
>> (designated) replacement, where individual per-arch functions would
>> get called at appropriate times.
> 
> Over time, I do expect it to replace domain_relinquish_resources().  I'm
> not sure if that will take the form of a specific arch_domain_teardown()
> call (and reserving some space in teardown.val for arch use), or whether
> it will be a load of possibly-CONFIG'd stubs.

I'd consider helpful if you could slightly re-word the description then.

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan

[2/3] xen/domain: Introduce domain_teardown()

Commit Message

Comments

Patch