[v1] domctl: hold domctl lock while domain is destroyed

Message ID 2e7044de3cd8a6768a20250e61fe262f3a018724.1631790362.git.isaikin-dmitry@yandex.ru (mailing list archive)
State New, archived
Series [v1] domctl: hold domctl lock while domain is destroyed

Commit Message

Dmitry Isaykin Sept. 16, 2021, 11:10 a.m. UTC
From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>

This significantly speeds up concurrent destruction of multiple domains on x86.

I identified the place taking the most time:

    do_domctl(case XEN_DOMCTL_destroydomain)
      -> domain_kill()
           -> domain_relinquish_resources()
                -> relinquish_memory(d, &d->page_list, PGT_l4_page_table)

My reference setup: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, Xen 4.14.

I used this command for the test:

    for i in $(seq 1 5) ; do xl destroy test-vm-${i} & done

Without holding the lock, the call to `relinquish_memory(d, &d->page_list, PGT_l4_page_table)`
took about 3 seconds for each domain being destroyed on my setup (an HVM guest with 2GB of memory).

With the lock held, it took only 100 ms.

Signed-off-by: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
---
 xen/common/domctl.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Jan Beulich Sept. 16, 2021, 12:30 p.m. UTC | #1
On 16.09.2021 13:10, Dmitry Isaikin wrote:
> From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
> 
> This significantly speeds up concurrent destruction of multiple domains on x86.

This effectively is a simplistic revert of 228ab9992ffb ("domctl:
improve locking during domain destruction"). There it was found to
actually improve things; now you're claiming the opposite. It'll
take more justification, clearly identifying that you actually
revert an earlier change, and an explanation why then you don't
revert that change altogether. You will want to specifically also
consider the cleaning up of huge VMs, where use of the (global)
domctl lock may hamper progress of other (parallel) operations on
the system.

> I identify the place taking the most time:
> 
>     do_domctl(case XEN_DOMCTL_destroydomain)
>       -> domain_kill()
>            -> domain_relinquish_resources()
>                 -> relinquish_memory(d, &d->page_list, PGT_l4_page_table)
> 
> My reference setup: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, Xen 4.14.
> 
> I use this command for test:
> 
>     for i in $(seq 1 5) ; do xl destroy test-vm-${i} & done
> 
> Without holding the lock all calls of `relinquish_memory(d, &d->page_list, PGT_l4_page_table)`
> took on my setup (for HVM with 2GB of memory) about 3 seconds for each destroying domain.
> 
> With holding the lock it took only 100 ms.

I'm further afraid I can't make the connection. Do you have an
explanation for why there would be such a massive difference?
What would prevent progress of relinquish_memory() with the
domctl lock not held?

Jan
Roger Pau Monné Sept. 16, 2021, 1:08 p.m. UTC | #2
On Thu, Sep 16, 2021 at 02:30:39PM +0200, Jan Beulich wrote:
> On 16.09.2021 13:10, Dmitry Isaikin wrote:
> > From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
> > 
> > This significantly speeds up concurrent destruction of multiple domains on x86.
> 
> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
> improve locking during domain destruction"). There it was found to
> actually improve things; now you're claiming the opposite. It'll
> take more justification, clearly identifying that you actually
> revert an earlier change, and an explanation why then you don't
> revert that change altogether. You will want to specifically also
> consider the cleaning up of huge VMs, where use of the (global)
> domctl lock may hamper progress of other (parallel) operations on
> the system.
> 
> > I identify the place taking the most time:
> > 
> >     do_domctl(case XEN_DOMCTL_destroydomain)
> >       -> domain_kill()
> >            -> domain_relinquish_resources()
> >                 -> relinquish_memory(d, &d->page_list, PGT_l4_page_table)
> > 
> > My reference setup: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, Xen 4.14.
> > 
> > I use this command for test:
> > 
> >     for i in $(seq 1 5) ; do xl destroy test-vm-${i} & done
> > 
> > Without holding the lock all calls of `relinquish_memory(d, &d->page_list, PGT_l4_page_table)`
> > took on my setup (for HVM with 2GB of memory) about 3 seconds for each destroying domain.
> > 
> > With holding the lock it took only 100 ms.
> 
> I'm further afraid I can't make the connection. Do you have an
> explanation for why there would be such a massive difference?
> What would prevent progress of relinquish_memory() with the
> domctl lock not held?

As a starting point, I would recommend that Dmitry use lock profiling with
and without this change applied and try to spot which lock is causing the
contention. That should be fairly easy and could shed some light.

Regards, Roger.
Andrew Cooper Sept. 16, 2021, 5:52 p.m. UTC | #3
On 16/09/2021 13:30, Jan Beulich wrote:
> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>> From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
>>
>> This significantly speeds up concurrent destruction of multiple domains on x86.
> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
> improve locking during domain destruction"). There it was found to
> actually improve things;

Was it?  I recall that it was simply an expectation that performance
would be better...

Amazon previously identified 228ab9992ffb as a massive perf hit, too.

Clearly some of the reasoning behind 228ab9992ffb was flawed and/or
incomplete, and it appears as if it wasn't necessarily a wise move in
hindsight.

~Andrew
Jan Beulich Sept. 17, 2021, 6:17 a.m. UTC | #4
On 16.09.2021 19:52, Andrew Cooper wrote:
> On 16/09/2021 13:30, Jan Beulich wrote:
>> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>>> From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
>>>
>>> This significantly speeds up concurrent destruction of multiple domains on x86.
>> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
>> improve locking during domain destruction"). There it was found to
>> actually improve things;
> 
> Was it?  I recall that it was simply an expectation that performance
> would be better...

My recollection is that it was, for one of our customers.

> Amazon previously identified 228ab9992ffb as a massive perf hit, too.

Interesting. I don't recall any mail to that effect.

> Clearly some of the reasoning behind 228ab9992ffb was flawed and/or
> incomplete, and it appears as if it wasn't necessarily a wise move in
> hindsight.

Possible; I continue to think though that the present observation wants
properly understanding instead of more or less blindly undoing that
change.

Jan
Julien Grall Sept. 17, 2021, 9:27 a.m. UTC | #5
Hi,

(+ some AWS folks)

On 17/09/2021 11:17, Jan Beulich wrote:
> On 16.09.2021 19:52, Andrew Cooper wrote:
>> On 16/09/2021 13:30, Jan Beulich wrote:
>>> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>>>> From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
>>>>
>>>> This significantly speeds up concurrent destruction of multiple domains on x86.
>>> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
>>> improve locking during domain destruction"). There it was found to
>>> actually improve things;
>>
>> Was it?  I recall that it was simply an expectation that performance
>> would be better...
> 
> My recollection is that it was, for one of our customers.
> 
>> Amazon previously identified 228ab9992ffb as a massive perf hit, too.
> 
> Interesting. I don't recall any mail to that effect.

Here we go:

https://lore.kernel.org/xen-devel/de46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia@amazon.com/

We have been using the revert for quite a while in production and didn't 
notice any regression.

> 
>> Clearly some of the reasoning behind 228ab9992ffb was flawed and/or
>> incomplete, and it appears as if it wasn't necessarily a wise move in
>> hindsight.
> 
> Possible; I continue to think though that the present observation wants
> properly understanding instead of more or less blindly undoing that
> change.

To be honest, I think this is the other way around. You wrote and merged 
a patch with the following justification:

"
     There is no need to hold the global domctl lock across domain_kill() -
     the domain lock is fully sufficient here, and parallel cleanup after
     multiple domains performs quite a bit better this way.
"

Clearly, the original commit message is lacking details on the exact 
setups and numbers. But we now have two stakeholders with proof that 
your patch is harmful to the setup you claim perform better with your patch.

To me this is enough justification to revert the original patch. Anyone 
against the revert, should provide clear details of why the patch should 
not be reverted.

Cheers,
Andrew Cooper Sept. 17, 2021, 9:41 a.m. UTC | #6
On 17/09/2021 10:27, Julien Grall wrote:
> Hi,
> 
> (+ some AWS folks)
> 
> On 17/09/2021 11:17, Jan Beulich wrote:
>> On 16.09.2021 19:52, Andrew Cooper wrote:
>>> On 16/09/2021 13:30, Jan Beulich wrote:
>>>> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>>>>> From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
>>>>>
>>>>> This significantly speeds up concurrent destruction of multiple
>>>>> domains on x86.
>>>> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
>>>> improve locking during domain destruction"). There it was found to
>>>> actually improve things;
>>>
>>> Was it?  I recall that it was simply an expectation that performance
>>> would be better...
>>
>> My recollection is that it was, for one of our customers.
>>
>>> Amazon previously identified 228ab9992ffb as a massive perf hit, too.
>>
>> Interesting. I don't recall any mail to that effect.
> 
> Here we go:
> 
> https://lore.kernel.org/xen-devel/de46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia@amazon.com/
> 
> We have been using the revert for quite a while in production and didn't
> notice any regression.
> 
>>
>>> Clearly some of the reasoning behind 228ab9992ffb was flawed and/or
>>> incomplete, and it appears as if it wasn't necessarily a wise move in
>>> hindsight.
>>
>> Possible; I continue to think though that the present observation wants
>> properly understanding instead of more or less blindly undoing that
>> change.
> 
> To be honest, I think this is the other way around. You wrote and merged
> a patch with the following justification:
> 
> "
>     There is no need to hold the global domctl lock across domain_kill() -
>     the domain lock is fully sufficient here, and parallel cleanup after
>     multiple domains performs quite a bit better this way.
> "
> 
> Clearly, the original commit message is lacking details on the exact
> setups and numbers. But we now have two stakeholders with proof that
> your patch is harmful to the setup you claim perform better with your
> patch.
> 
> To me this is enough justification to revert the original patch. Anyone
> against the revert, should provide clear details of why the patch should
> not be reverted.

I second a revert.

I was concerned at the time that the claim was unsubstantiated, and now
there is plenty of evidence to counter the claim.

~Andrew
Jan Beulich Sept. 17, 2021, 9:47 a.m. UTC | #7
On 17.09.2021 11:41, Andrew Cooper wrote:
> On 17/09/2021 10:27, Julien Grall wrote:
>> Hi,
>>
>> (+ some AWS folks)
>>
>> On 17/09/2021 11:17, Jan Beulich wrote:
>>> On 16.09.2021 19:52, Andrew Cooper wrote:
>>>> On 16/09/2021 13:30, Jan Beulich wrote:
>>>>> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>>>>>> From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
>>>>>>
>>>>>> This significantly speeds up concurrent destruction of multiple
>>>>>> domains on x86.
>>>>> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
>>>>> improve locking during domain destruction"). There it was found to
>>>>> actually improve things;
>>>>
>>>> Was it?  I recall that it was simply an expectation that performance
>>>> would be better...
>>>
>>> My recollection is that it was, for one of our customers.
>>>
>>>> Amazon previously identified 228ab9992ffb as a massive perf hit, too.
>>>
>>> Interesting. I don't recall any mail to that effect.
>>
>> Here we go:
>>
>> https://lore.kernel.org/xen-devel/de46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia@amazon.com/
>>
>> We have been using the revert for quite a while in production and didn't
>> notice any regression.
>>
>>>
>>>> Clearly some of the reasoning behind 228ab9992ffb was flawed and/or
>>>> incomplete, and it appears as if it wasn't necessarily a wise move in
>>>> hindsight.
>>>
>>> Possible; I continue to think though that the present observation wants
>>> properly understanding instead of more or less blindly undoing that
>>> change.
>>
>> To be honest, I think this is the other way around. You wrote and merged
>> a patch with the following justification:
>>
>> "
>>     There is no need to hold the global domctl lock across domain_kill() -
>>     the domain lock is fully sufficient here, and parallel cleanup after
>>     multiple domains performs quite a bit better this way.
>> "
>>
>> Clearly, the original commit message is lacking details on the exact
>> setups and numbers. But we now have two stakeholders with proof that
>> your patch is harmful to the setup you claim perform better with your
>> patch.
>>
>> To me this is enough justification to revert the original patch. Anyone
>> against the revert, should provide clear details of why the patch should
>> not be reverted.
> 
> I second a revert.
> 
> I was concerned at the time that the claim was unsubstantiated, and now
> there is plenty of evidence to counter the claim.

Well, I won't object to a proper revert. I still think we'd better get to
the bottom of this, not the least because I thought there was agreement
that mid to long term we should get rid of global locking wherever
possible. Or are both of you saying that using a global lock here is
obviously fine? And does either of you have at least a theory to explain
the observation? I can only say that I find it puzzling.

Jan
Julien Grall Sept. 17, 2021, 4:01 p.m. UTC | #8
Hi Jan,

On 17/09/2021 14:47, Jan Beulich wrote:
> On 17.09.2021 11:41, Andrew Cooper wrote:
>> On 17/09/2021 10:27, Julien Grall wrote:
>>> Hi,
>>>
>>> (+ some AWS folks)
>>>
>>> On 17/09/2021 11:17, Jan Beulich wrote:
>>>> On 16.09.2021 19:52, Andrew Cooper wrote:
>>>>> On 16/09/2021 13:30, Jan Beulich wrote:
>>>>>> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>>>>>>> From: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
>>>>>>>
>>>>>>> This significantly speeds up concurrent destruction of multiple
>>>>>>> domains on x86.
>>>>>> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
>>>>>> improve locking during domain destruction"). There it was found to
>>>>>> actually improve things;
>>>>>
>>>>> Was it?  I recall that it was simply an expectation that performance
>>>>> would be better...
>>>>
>>>> My recollection is that it was, for one of our customers.
>>>>
>>>>> Amazon previously identified 228ab9992ffb as a massive perf hit, too.
>>>>
>>>> Interesting. I don't recall any mail to that effect.
>>>
>>> Here we go:
>>>
>>> https://lore.kernel.org/xen-devel/de46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongyxia@amazon.com/
>>>
>>> We have been using the revert for quite a while in production and didn't
>>> notice any regression.
>>>
>>>>
>>>>> Clearly some of the reasoning behind 228ab9992ffb was flawed and/or
>>>>> incomplete, and it appears as if it wasn't necessarily a wise move in
>>>>> hindsight.
>>>>
>>>> Possible; I continue to think though that the present observation wants
>>>> properly understanding instead of more or less blindly undoing that
>>>> change.
>>>
>>> To be honest, I think this is the other way around. You wrote and merged
>>> a patch with the following justification:
>>>
>>> "
>>>      There is no need to hold the global domctl lock across domain_kill() -
>>>      the domain lock is fully sufficient here, and parallel cleanup after
>>>      multiple domains performs quite a bit better this way.
>>> "
>>>
>>> Clearly, the original commit message is lacking details on the exact
>>> setups and numbers. But we now have two stakeholders with proof that
>>> your patch is harmful to the setup you claim perform better with your
>>> patch.
>>>
>>> To me this is enough justification to revert the original patch. Anyone
>>> against the revert, should provide clear details of why the patch should
>>> not be reverted.
>>
>> I second a revert.
>>
>> I was concerned at the time that the claim was unsubstantiated, and now
>> there is plenty of evidence to counter the claim.
> 
> Well, I won't object to a proper revert. I still think we'd better get to
> the bottom of this, not the least because I thought there was agreement
> that mid to long term we should get rid of global locking wherever
> possible. Or are both of you saying that using a global lock here is
> obviously fine? And does either of you have at least a theory to explain
> the observation? I can only say that I find it puzzling.

I will quote what Hongyan wrote back on the first report:

"
The best solution is to make the heap scalable instead of a global
lock, but that is not going to be trivial.

Of course, another solution is to keep the domctl lock dropped in
domain_kill() but have another domain_kill lock so that competing
domain_kill()s will try to take that lock and back off with hypercall
continuation. But this is kind of hacky (we introduce a lock to reduce
spinlock contention elsewhere), which is probably not a solution but a
workaround.
"

Cheers,
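
By way of illustration only, here is a minimal sketch of the back-off workaround quoted above: keep the domctl lock dropped around domain_kill(), but serialise competing destroys on a dedicated lock and retry via hypercall continuation when it is already taken. The lock name and its placement inside do_domctl() are assumptions made for the sketch; this is not actual or proposed Xen code.

    /* Hypothetical fragment of do_domctl(); "destroy_lock" is an assumed name. */
    static DEFINE_SPINLOCK(destroy_lock);      /* at file scope in domctl.c */

    case XEN_DOMCTL_destroydomain:
        domctl_lock_release();
        if ( !spin_trylock(&destroy_lock) )
        {
            /* Another destroy is in flight: back off via a continuation
               instead of spinning, so the caller retries later. */
            ret = hypercall_create_continuation(
                __HYPERVISOR_domctl, "h", u_domctl);
            goto domctl_out_unlock_domonly;
        }
        domain_lock(d);
        ret = domain_kill(d);
        domain_unlock(d);
        spin_unlock(&destroy_lock);
        if ( ret == -ERESTART )
            ret = hypercall_create_continuation(
                __HYPERVISOR_domctl, "h", u_domctl);
        goto domctl_out_unlock_domonly;

As Hongyan notes, this only moves the serialisation point rather than removing it, so it is a workaround rather than a fix.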
Jan Beulich Sept. 20, 2021, 8:19 a.m. UTC | #9
On 17.09.2021 18:01, Julien Grall wrote:
> I will quote what Hongyan wrote back on the first report:
> 
> "
> The best solution is to make the heap scalable instead of a global
> lock, but that is not going to be trivial.
> 
> Of course, another solution is to keep the domctl lock dropped in
> domain_kill() but have another domain_kill lock so that competing
> domain_kill()s will try to take that lock and back off with hypercall
> continuation. But this is kind of hacky (we introduce a lock to reduce
> spinlock contention elsewhere), which is probably not a solution but a
> workaround.
> "

Interesting. Is contention on the heap lock merely suspected or
was that lock proven to be the problem one?

Jan
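
For context on the heap-lock theory raised above: every page given back by relinquish_memory() is ultimately returned to the allocator in xen/common/page_alloc.c, whose free path is guarded by a single global heap lock, roughly of this shape (a much-simplified sketch, not the real code; the real path also scrubs pages, merges buddies, etc.):

    /* Much-simplified sketch of the page allocator's free path. */
    static DEFINE_SPINLOCK(heap_lock);        /* one lock for the whole heap */

    static void free_heap_pages(struct page_info *pg, unsigned int order)
    {
        spin_lock(&heap_lock);
        /* ... hand the 2^order pages back to the buddy allocator ... */
        spin_unlock(&heap_lock);
    }

If this is where the time goes, several domains relinquishing memory in parallel would mostly be contending on this one lock, whereas serialising the destroys on the domctl lock (as this patch does) avoids that contention. That would at least be consistent with the 3 s vs. 100 ms numbers in the commit message, though, as Jan asks, it still needs to be confirmed, e.g. with the lock profiling Roger suggests.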

Patch

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 12d6144d28..b9a50d3e5d 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -497,14 +497,13 @@  long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         break;
 
     case XEN_DOMCTL_destroydomain:
-        domctl_lock_release();
         domain_lock(d);
         ret = domain_kill(d);
         domain_unlock(d);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(
                 __HYPERVISOR_domctl, "h", u_domctl);
-        goto domctl_out_unlock_domonly;
+        break;
 
     case XEN_DOMCTL_setnodeaffinity:
     {