Message ID | de46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia@amazon.com
---|---
State | New, archived
Series | Revert "domctl: improve locking during domain destruction"
On 24.03.2020 16:21, Hongyan Xia wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
>
> Unfortunately, even though that commit dropped the domctl lock and
> allowed other domctl to continue, it created severe lock contention
> within domain destructions themselves. Multiple domain destructions in
> parallel now spin for the global heap lock when freeing memory and could
> spend a long time before the next hypercall continuation.

I'm not at all happy to see this reverted; instead I was hoping that we
could drop the domctl lock in further cases. If a lack of continuations
is the problem, did you try forcing them to occur more frequently?

> In contrast,
> after dropping that commit, parallel domain destructions will just fail
> to take the domctl lock, creating a hypercall continuation and backing
> off immediately, allowing the thread that holds the lock to destroy a
> domain much more quickly and allowing backed-off threads to process
> events and irqs.
>
> On a 144-core server with 4TiB of memory, destroying 32 guests (each
> with 4 vcpus and 122GiB memory) simultaneously takes:
>
> before the revert: 29 minutes
> after the revert: 6 minutes

This wants comparing against numbers demonstrating the bad effects of
the global domctl lock. Iirc they were quite a bit higher than 6 min,
perhaps depending on guest properties.

Jan
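[To illustrate the alternative Jan raises here — forcing continuations to occur more frequently during the memory-freeing phase — a minimal C sketch follows. It is not the code touched by this patch: the function name relinquish_pages_sketch and the batch size of 64 are illustrative assumptions, though hypercall_preempt_check(), page_list_remove_head() and put_page() are existing Xen primitives.]

/*
 * Illustrative sketch only -- not the code changed by this patch.
 * Shows the kind of periodic preemption check being suggested: bail
 * out with -ERESTART every so many freed pages, so a hypercall
 * continuation is created even while a destruction is contending on
 * the global heap lock for its individual frees.
 */
static int relinquish_pages_sketch(struct domain *d,
                                   struct page_list_head *list)
{
    struct page_info *pg;
    unsigned int done = 0;

    while ( (pg = page_list_remove_head(list)) != NULL )
    {
        put_page(pg);                 /* eventually takes the heap lock */

        /* Hypothetical batch size; tune to trade latency vs. overhead. */
        if ( ++done >= 64 && hypercall_preempt_check() )
            return -ERESTART;         /* caller creates a continuation */
    }

    return 0;
}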
On 24/03/2020 16:13, Jan Beulich wrote:
> On 24.03.2020 16:21, Hongyan Xia wrote:
>> From: Hongyan Xia <hongyxia@amazon.com>
>> In contrast,
>> after dropping that commit, parallel domain destructions will just fail
>> to take the domctl lock, creating a hypercall continuation and backing
>> off immediately, allowing the thread that holds the lock to destroy a
>> domain much more quickly and allowing backed-off threads to process
>> events and irqs.
>>
>> On a 144-core server with 4TiB of memory, destroying 32 guests (each
>> with 4 vcpus and 122GiB memory) simultaneously takes:
>>
>> before the revert: 29 minutes
>> after the revert: 6 minutes
>
> This wants comparing against numbers demonstrating the bad effects of
> the global domctl lock. Iirc they were quite a bit higher than 6 min,
> perhaps depending on guest properties.

Your original commit message doesn't contain any clue in which
cases the domctl lock was an issue. So please provide information
on the setups you think it will make it worse.

Cheers,
On 24/03/2020 15:21, Hongyan Xia wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
>
> Unfortunately, even though that commit dropped the domctl lock and
> allowed other domctl to continue, it created severe lock contention
> within domain destructions themselves. Multiple domain destructions in
> parallel now spin for the global heap lock when freeing memory and could
> spend a long time before the next hypercall continuation. In contrast,
> after dropping that commit, parallel domain destructions will just fail
> to take the domctl lock, creating a hypercall continuation and backing
> off immediately, allowing the thread that holds the lock to destroy a
> domain much more quickly and allowing backed-off threads to process
> events and irqs.
>
> On a 144-core server with 4TiB of memory, destroying 32 guests (each
> with 4 vcpus and 122GiB memory) simultaneously takes:
>
> before the revert: 29 minutes
> after the revert: 6 minutes
>
> This is timed between the first page and the very last page of all 32
> guests is released back to the heap.
>
> This reverts commit 228ab9992ffb1d8f9d2475f2581e68b2913acb88.
>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

Reviewed-by: Julien Grall <julien@xen.org>

> ---
>  xen/common/domain.c | 11 +----------
>  xen/common/domctl.c |  5 +----
>  2 files changed, 2 insertions(+), 14 deletions(-)
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index b4eb476a9c..7b02f5ead7 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -698,20 +698,11 @@ int domain_kill(struct domain *d)
>      if ( d == current->domain )
>          return -EINVAL;
>
> -    /* Protected by d->domain_lock. */
> +    /* Protected by domctl_lock. */
>      switch ( d->is_dying )
>      {
>      case DOMDYING_alive:
> -        domain_unlock(d);
>          domain_pause(d);
> -        domain_lock(d);
> -        /*
> -         * With the domain lock dropped, d->is_dying may have changed. Call
> -         * ourselves recursively if so, which is safe as then we won't come
> -         * back here.
> -         */
> -        if ( d->is_dying != DOMDYING_alive )
> -            return domain_kill(d);
>          d->is_dying = DOMDYING_dying;
>          argo_destroy(d);
>          evtchn_destroy(d);
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index a69b3b59a8..e010079203 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -571,14 +571,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>          break;
>
>      case XEN_DOMCTL_destroydomain:
> -        domctl_lock_release();
> -        domain_lock(d);
>          ret = domain_kill(d);
> -        domain_unlock(d);
>          if ( ret == -ERESTART )
>              ret = hypercall_create_continuation(
>                  __HYPERVISOR_domctl, "h", u_domctl);
> -        goto domctl_out_unlock_domonly;
> +        break;
>
>      case XEN_DOMCTL_setnodeaffinity:
>      {
>
On 24.03.2020 19:39, Julien Grall wrote:
> On 24/03/2020 16:13, Jan Beulich wrote:
>> On 24.03.2020 16:21, Hongyan Xia wrote:
>>> From: Hongyan Xia <hongyxia@amazon.com>
>>> In contrast,
>>> after dropping that commit, parallel domain destructions will just fail
>>> to take the domctl lock, creating a hypercall continuation and backing
>>> off immediately, allowing the thread that holds the lock to destroy a
>>> domain much more quickly and allowing backed-off threads to process
>>> events and irqs.
>>>
>>> On a 144-core server with 4TiB of memory, destroying 32 guests (each
>>> with 4 vcpus and 122GiB memory) simultaneously takes:
>>>
>>> before the revert: 29 minutes
>>> after the revert: 6 minutes
>>
>> This wants comparing against numbers demonstrating the bad effects of
>> the global domctl lock. Iirc they were quite a bit higher than 6 min,
>> perhaps depending on guest properties.
>
> Your original commit message doesn't contain any clue in which
> cases the domctl lock was an issue. So please provide information
> on the setups you think it will make it worse.

I did never observe the issue myself - let's see whether one of the SUSE
people possibly involved in this back then recall (or have further
pointers; Jim, Charles?), or whether any of the (partly former) Citrix
folks do. My vague recollection is that the issue was the tool stack as
a whole stalling for far too long in particular when destroying very
large guests. One important aspect not discussed in the commit message
at all is that holding the domctl lock block basically _all_ tool stack
operations (including e.g. creation of new guests), whereas the new
issue attempted to be addressed is limited to just domain cleanup.

Jan
On Wed, 2020-03-25 at 08:11 +0100, Jan Beulich wrote:
> On 24.03.2020 19:39, Julien Grall wrote:
> > On 24/03/2020 16:13, Jan Beulich wrote:
> > > On 24.03.2020 16:21, Hongyan Xia wrote:
> > > > From: Hongyan Xia <hongyxia@amazon.com>
> > > > In contrast,
> > > > after dropping that commit, parallel domain destructions will just fail
> > > > to take the domctl lock, creating a hypercall continuation and backing
> > > > off immediately, allowing the thread that holds the lock to destroy a
> > > > domain much more quickly and allowing backed-off threads to process
> > > > events and irqs.
> > > >
> > > > On a 144-core server with 4TiB of memory, destroying 32 guests (each
> > > > with 4 vcpus and 122GiB memory) simultaneously takes:
> > > >
> > > > before the revert: 29 minutes
> > > > after the revert: 6 minutes
> > >
> > > This wants comparing against numbers demonstrating the bad effects of
> > > the global domctl lock. Iirc they were quite a bit higher than 6 min,
> > > perhaps depending on guest properties.
> >
> > Your original commit message doesn't contain any clue in which
> > cases the domctl lock was an issue. So please provide information
> > on the setups you think it will make it worse.
>
> I did never observe the issue myself - let's see whether one of the SUSE
> people possibly involved in this back then recall (or have further
> pointers; Jim, Charles?), or whether any of the (partly former) Citrix
> folks do. My vague recollection is that the issue was the tool stack as
> a whole stalling for far too long in particular when destroying very
> large guests. One important aspect not discussed in the commit message
> at all is that holding the domctl lock block basically _all_ tool stack
> operations (including e.g. creation of new guests), whereas the new
> issue attempted to be addressed is limited to just domain cleanup.

The best solution is to make the heap scalable instead of a global
lock, but that is not going to be trivial. Of course, another solution
is to keep the domctl lock dropped in domain_kill() but have another
domain_kill lock so that competing domain_kill()s will try to take that
lock and back off with hypercall continuation. But this is kind of
hacky (we introduce a lock to reduce spinlock contention elsewhere),
which is probably not a solution but a workaround.

Seeing the dramatic increase from 6 to 29 minutes in concurrent guest
destruction, I wonder if the benefit of that commit can outweigh this
negative though.

Hongyan
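[A rough sketch of the workaround Hongyan describes above, not a proposed patch: a dedicated global lock serialises destructions while the domctl lock stays dropped. The lock name domain_kill_lock and the wrapper domain_kill_serialised() are hypothetical; DEFINE_SPINLOCK(), spin_trylock() and the -ERESTART/continuation handling in do_domctl() are existing Xen machinery.]

/*
 * Sketch of the "extra domain_kill lock" workaround. A competing
 * caller fails the trylock and returns -ERESTART, so do_domctl()
 * creates a hypercall continuation instead of spinning on the
 * global heap lock behind the current destruction.
 */
static DEFINE_SPINLOCK(domain_kill_lock);    /* hypothetical name */

int domain_kill_serialised(struct domain *d)
{
    int rc;

    if ( !spin_trylock(&domain_kill_lock) )
        return -ERESTART;        /* back off, retry via continuation */

    rc = domain_kill(d);

    spin_unlock(&domain_kill_lock);

    return rc;
}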
On 3/25/20 1:11 AM, Jan Beulich wrote:
> On 24.03.2020 19:39, Julien Grall wrote:
>> On 24/03/2020 16:13, Jan Beulich wrote:
>>> On 24.03.2020 16:21, Hongyan Xia wrote:
>>>> From: Hongyan Xia <hongyxia@amazon.com>
>>>> In contrast,
>>>> after dropping that commit, parallel domain destructions will just fail
>>>> to take the domctl lock, creating a hypercall continuation and backing
>>>> off immediately, allowing the thread that holds the lock to destroy a
>>>> domain much more quickly and allowing backed-off threads to process
>>>> events and irqs.
>>>>
>>>> On a 144-core server with 4TiB of memory, destroying 32 guests (each
>>>> with 4 vcpus and 122GiB memory) simultaneously takes:
>>>>
>>>> before the revert: 29 minutes
>>>> after the revert: 6 minutes
>>>
>>> This wants comparing against numbers demonstrating the bad effects of
>>> the global domctl lock. Iirc they were quite a bit higher than 6 min,
>>> perhaps depending on guest properties.
>>
>> Your original commit message doesn't contain any clue in which
>> cases the domctl lock was an issue. So please provide information
>> on the setups you think it will make it worse.
>
> I did never observe the issue myself - let's see whether one of the SUSE
> people possibly involved in this back then recall (or have further
> pointers; Jim, Charles?), or whether any of the (partly former) Citrix
> folks do. My vague recollection is that the issue was the tool stack as
> a whole stalling for far too long in particular when destroying very
> large guests.

I too only have a vague memory of the issue but do recall shutting down
large guests (e.g. 500GB) taking a long time and blocking other
toolstack operations. I haven't checked on the behavior in quite some
time though.

> One important aspect not discussed in the commit message
> at all is that holding the domctl lock block basically _all_ tool stack
> operations (including e.g. creation of new guests), whereas the new
> issue attempted to be addressed is limited to just domain cleanup.

I more vaguely recall shutting down the host taking a *long* time when
dom0 had large amounts of memory, e.g. when it had all host memory (no
dom0_mem= setting and autoballooning enabled).

Regards,
Jim
Hi Jim,

On 26/03/2020 16:55, Jim Fehlig wrote:
> On 3/25/20 1:11 AM, Jan Beulich wrote:
>> On 24.03.2020 19:39, Julien Grall wrote:
>>> On 24/03/2020 16:13, Jan Beulich wrote:
>>>> On 24.03.2020 16:21, Hongyan Xia wrote:
>>>>> From: Hongyan Xia <hongyxia@amazon.com>
>>>>> In contrast,
>>>>> after dropping that commit, parallel domain destructions will just fail
>>>>> to take the domctl lock, creating a hypercall continuation and backing
>>>>> off immediately, allowing the thread that holds the lock to destroy a
>>>>> domain much more quickly and allowing backed-off threads to process
>>>>> events and irqs.
>>>>>
>>>>> On a 144-core server with 4TiB of memory, destroying 32 guests (each
>>>>> with 4 vcpus and 122GiB memory) simultaneously takes:
>>>>>
>>>>> before the revert: 29 minutes
>>>>> after the revert: 6 minutes
>>>>
>>>> This wants comparing against numbers demonstrating the bad effects of
>>>> the global domctl lock. Iirc they were quite a bit higher than 6 min,
>>>> perhaps depending on guest properties.
>>>
>>> Your original commit message doesn't contain any clue in which
>>> cases the domctl lock was an issue. So please provide information
>>> on the setups you think it will make it worse.
>>
>> I did never observe the issue myself - let's see whether one of the SUSE
>> people possibly involved in this back then recall (or have further
>> pointers; Jim, Charles?), or whether any of the (partly former) Citrix
>> folks do. My vague recollection is that the issue was the tool stack as
>> a whole stalling for far too long in particular when destroying very
>> large guests.
>
> I too only have a vague memory of the issue but do recall shutting down
> large guests (e.g. 500GB) taking a long time and blocking other
> toolstack operations. I haven't checked on the behavior in quite some
> time though.

It might be worth checking how toolstack operations (such as domain
creation) are affected by the revert. @Hongyan would you be able to
test it?

>> One important aspect not discussed in the commit message
>> at all is that holding the domctl lock block basically _all_ tool stack
>> operations (including e.g. creation of new guests), whereas the new
>> issue attempted to be addressed is limited to just domain cleanup.
>
> I more vaguely recall shutting down the host taking a *long* time when
> dom0 had large amounts of memory, e.g. when it had all host memory (no
> dom0_mem= setting and autoballooning enabled).

AFAIK, we never relinquish memory from dom0. So I am not sure how a
large amount of memory in Dom0 would affect the host shutting down.

Cheers,
diff --git a/xen/common/domain.c b/xen/common/domain.c
index b4eb476a9c..7b02f5ead7 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -698,20 +698,11 @@ int domain_kill(struct domain *d)
     if ( d == current->domain )
         return -EINVAL;

-    /* Protected by d->domain_lock. */
+    /* Protected by domctl_lock. */
     switch ( d->is_dying )
     {
     case DOMDYING_alive:
-        domain_unlock(d);
         domain_pause(d);
-        domain_lock(d);
-        /*
-         * With the domain lock dropped, d->is_dying may have changed. Call
-         * ourselves recursively if so, which is safe as then we won't come
-         * back here.
-         */
-        if ( d->is_dying != DOMDYING_alive )
-            return domain_kill(d);
         d->is_dying = DOMDYING_dying;
         argo_destroy(d);
         evtchn_destroy(d);
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index a69b3b59a8..e010079203 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -571,14 +571,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         break;

     case XEN_DOMCTL_destroydomain:
-        domctl_lock_release();
-        domain_lock(d);
         ret = domain_kill(d);
-        domain_unlock(d);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(
                 __HYPERVISOR_domctl, "h", u_domctl);
-        goto domctl_out_unlock_domonly;
+        break;

     case XEN_DOMCTL_setnodeaffinity:
     {
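[For context on the behaviour this revert restores, a simplified sketch of the do_domctl() entry path follows. It is not a verbatim copy of the real function: the helper handle_one_domctl() is a hypothetical stand-in for the domctl dispatch, and error handling is omitted. domctl_lock_acquire()/domctl_lock_release() and the continuation call are the existing primitives referenced in the patch.]

/*
 * Simplified sketch: with the revert applied, the global domctl lock is
 * taken via trylock at hypercall entry, so a parallel destruction that
 * loses the race backs off through a continuation instead of spinning
 * on the heap lock inside domain_kill().
 */
long do_domctl_sketch(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
{
    long ret;

    if ( !domctl_lock_acquire() )            /* trylock of the global lock */
        return hypercall_create_continuation(
            __HYPERVISOR_domctl, "h", u_domctl);

    /* Hypothetical dispatch helper; XEN_DOMCTL_destroydomain ends up in
     * domain_kill(), which returns -ERESTART when it needs to be resumed. */
    ret = handle_one_domctl(u_domctl);

    if ( ret == -ERESTART )
        ret = hypercall_create_continuation(
            __HYPERVISOR_domctl, "h", u_domctl);

    domctl_lock_release();

    return ret;
}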