diff mbox series

[XEN,for-4.13,v2] x86/domctl: have XEN_DOMCTL_getpageframeinfo3 preemptible

Message ID 20191125145915.106589-1-anthony.perard@citrix.com (mailing list archive)
State New, archived
Headers show
Series [XEN,for-4.13,v2] x86/domctl: have XEN_DOMCTL_getpageframeinfo3 preemptible | expand

Commit Message

Anthony PERARD Nov. 25, 2019, 2:59 p.m. UTC
This hypercall can take a long time to finish because it attempts to
grab the `hostp2m' lock up to 1024 times. The accumulated wait for the
lock can take several seconds.

This can easily happen with a guest with 32 vcpus and plenty of RAM,
during localhost migration.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
---

Notes:
    Changes in v2:
    - fix coding style.
    - check for translated guests.
    - avoid preemption on the last iteration.
    - add a comment in the public header.
    
    Further possible improvement to the hypercall:
    - process several GFNs after grabbing the hostp2m lock
    - Remove the limit

 xen/arch/x86/domctl.c       | 20 ++++++++++++++++++++
 xen/include/public/domctl.h |  4 ++++
 2 files changed, 24 insertions(+)

Comments

Jan Beulich Nov. 25, 2019, 4:22 p.m. UTC | #1
On 25.11.2019 15:59, Anthony PERARD wrote:
> This hypercall can take a long time to finish because it attempts to
> grab the `hostp2m' lock up to 1024 times. The accumulated wait for the
> lock can take several seconds.
> 
> This can easily happen with a guest with 32 vcpus and plenty of RAM,
> during localhost migration.
> 
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>

As indicated on v1 already, this being a workaround rather than a fix
should be stated clearly in the description. Especially if more such
operations turn up, it'll become increasingly obvious that the root
of the problem will need dealing with rather than papering over some
of the symptoms. With this taken care of I'd be (still hesitantly)
willing to give my ack for this as a short term "solution".

> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -425,6 +425,26 @@ long arch_do_domctl(
>                  ret = -EFAULT;
>                  break;
>              }
> +
> +            /*
> +             * Avoid checking for preemption when the `hostp2m' lock isn't
> +             * involve, i.e. non-translated guest, and avoid preemption on
> +             * the last iteration.
> +             */
> +            if ( paging_mode_translate(d) &&
> +                 likely((i + 1) < num) && hypercall_preempt_check() )
> +            {
> +                domctl->u.getpageframeinfo3.num = num - i - 1;
> +                domctl->u.getpageframeinfo3.array.p =
> +                    guest_handle + ((i + 1) * width);
> +                if ( __copy_to_guest(u_domctl, domctl, 1) )
> +                {
> +                    ret = -EFAULT;
> +                    break;
> +                }
> +                return hypercall_create_continuation(__HYPERVISOR_domctl,
> +                                                     "h", u_domctl);
> +            }
>          }
>  
>          break;
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index a03e80e5984a..1b69eb75cb20 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -163,6 +163,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_getdomaininfo_t);
>  #define XEN_DOMCTL_PFINFO_LTAB_MASK (0xfU<<28)
>  
>  /* XEN_DOMCTL_getpageframeinfo3 */
> +/*
> + * Both value `num' and `array' are modified by the hypercall to allow
> + * preemption.

s/are/may be/ ?

If Juergen wants to still allow this in, I'd be fine taking care of both
remarks while committing.

Jan
Anthony PERARD Nov. 25, 2019, 5:37 p.m. UTC | #2
On Mon, Nov 25, 2019 at 05:22:19PM +0100, Jan Beulich wrote:
> On 25.11.2019 15:59, Anthony PERARD wrote:
> > This hypercall can take a long time to finish because it attempts to
> > grab the `hostp2m' lock up to 1024 times. The accumulated wait for the
> > lock can take several seconds.
> > 
> > This can easily happen with a guest with 32 vcpus and plenty of RAM,
> > during localhost migration.
> > 
> > Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> 
> As indicated on v1 already, this being a workaround rather than a fix
> should be stated clearly in the description. Especially if more such
> operations turn up, it'll become increasingly obvious that the root
> of the problem will need dealing with rather than papering over some
> of the symptoms. With this taken care of I'd be (still hesitantly)
> willing to give my ack for this as a short term "solution".

Sorry to have lead you to believe that the patch was *the* solution to
the problem described. I don't think the patch itself is a workaround or
a fix, it is simply an improvement to the hypercall. That improvement
could be used to remove the limit on `num' (something that I've read on
xen-devel as a possible improvement).

Would it be enough to add this following paragraph to the commit description?

    While the patch doesn't fix the problem with the lock contention and
    the fact that the `hostp2m' lock is currently global (and not on a
    single page), it is still an improvement to the hypercall.


I don't like the terms "workaround" or "short term solution" as a
description for this patch. Both implies that the patch could be
reverted once the root issue is taking care of.

I'll keep working to try to improve the use of the hostp2m lock, at
least with that hypercall, but I don't have a solution yet and it would
be nice to have this patch in the release.

> > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> > index a03e80e5984a..1b69eb75cb20 100644
> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -163,6 +163,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_getdomaininfo_t);
> >  #define XEN_DOMCTL_PFINFO_LTAB_MASK (0xfU<<28)
> >  
> >  /* XEN_DOMCTL_getpageframeinfo3 */
> > +/*
> > + * Both value `num' and `array' are modified by the hypercall to allow
> > + * preemption.
> 
> s/are/may be/ ?

I don't think the distinction is necessary. How would that be useful to
know that both values may not be modified? I though the goal of the
added description was to warn against reusing the values after calling
the hypercall.

Thanks,
Jan Beulich Nov. 26, 2019, 9:01 a.m. UTC | #3
On 25.11.2019 18:37, Anthony PERARD wrote:
> On Mon, Nov 25, 2019 at 05:22:19PM +0100, Jan Beulich wrote:
>> On 25.11.2019 15:59, Anthony PERARD wrote:
>>> This hypercall can take a long time to finish because it attempts to
>>> grab the `hostp2m' lock up to 1024 times. The accumulated wait for the
>>> lock can take several seconds.
>>>
>>> This can easily happen with a guest with 32 vcpus and plenty of RAM,
>>> during localhost migration.
>>>
>>> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
>>
>> As indicated on v1 already, this being a workaround rather than a fix
>> should be stated clearly in the description. Especially if more such
>> operations turn up, it'll become increasingly obvious that the root
>> of the problem will need dealing with rather than papering over some
>> of the symptoms. With this taken care of I'd be (still hesitantly)
>> willing to give my ack for this as a short term "solution".
> 
> Sorry to have lead you to believe that the patch was *the* solution to
> the problem described. I don't think the patch itself is a workaround or
> a fix, it is simply an improvement to the hypercall. That improvement
> could be used to remove the limit on `num' (something that I've read on
> xen-devel as a possible improvement).

Hmm, yes, this is a good point. I wonder why you don't drop the limit
then right away, at least for translated guests. This would then make
clear that ...

> Would it be enough to add this following paragraph to the commit description?
> 
>     While the patch doesn't fix the problem with the lock contention and
>     the fact that the `hostp2m' lock is currently global (and not on a
>     single page), it is still an improvement to the hypercall.
> 
> 
> I don't like the terms "workaround" or "short term solution" as a
> description for this patch. Both implies that the patch could be
> reverted once the root issue is taking care of.

... indeed the patch isn't a candidate for reverting down the road
(which so far I did in fact imply).

Of course if Jürgen indicated that he'd be willing to accept the
patch in its current form, but not in its possible extended one,
then - making the description state this planned improvement _and_
there being a promise to actually follow up for 4.14 - I'd be okay
with the code change remaining as it is.

Then again - dropping the (arbitrary) limit on the number of
entries isn't going to be really helpful when the hypercall even
with this limit in place may already take several seconds, as you
say. I'd agree though that the change still is a long term
improvement. So I would probably indeed leave the code change as
is, but amend your suggested addition to the description by
pointing out the possibility of dropping the arbitrary limit.

>>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>>> index a03e80e5984a..1b69eb75cb20 100644
>>> --- a/xen/include/public/domctl.h
>>> +++ b/xen/include/public/domctl.h
>>> @@ -163,6 +163,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_getdomaininfo_t);
>>>  #define XEN_DOMCTL_PFINFO_LTAB_MASK (0xfU<<28)
>>>  
>>>  /* XEN_DOMCTL_getpageframeinfo3 */
>>> +/*
>>> + * Both value `num' and `array' are modified by the hypercall to allow
>>> + * preemption.
>>
>> s/are/may be/ ?
> 
> I don't think the distinction is necessary. How would that be useful to
> know that both values may not be modified? I though the goal of the
> added description was to warn against reusing the values after calling
> the hypercall.

If you write "are", you're saying that it _will_ be modified, i.e. a
caller may (even if just for some sanity checking) verify that the fields
indeed did change. I think wording in the public headers in particular
should precisely represent all possible behaviors.

Jan
George Dunlap Nov. 26, 2019, 10 a.m. UTC | #4
On 11/26/19 9:01 AM, Jan Beulich wrote:
>>>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>>>> index a03e80e5984a..1b69eb75cb20 100644
>>>> --- a/xen/include/public/domctl.h
>>>> +++ b/xen/include/public/domctl.h
>>>> @@ -163,6 +163,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_getdomaininfo_t);
>>>>  #define XEN_DOMCTL_PFINFO_LTAB_MASK (0xfU<<28)
>>>>  
>>>>  /* XEN_DOMCTL_getpageframeinfo3 */
>>>> +/*
>>>> + * Both value `num' and `array' are modified by the hypercall to allow
>>>> + * preemption.
>>>
>>> s/are/may be/ ?
>>
>> I don't think the distinction is necessary. How would that be useful to
>> know that both values may not be modified? I though the goal of the
>> added description was to warn against reusing the values after calling
>> the hypercall.
> 
> If you write "are", you're saying that it _will_ be modified, i.e. a
> caller may (even if just for some sanity checking) verify that the fields
> indeed did change. I think wording in the public headers in particular
> should precisely represent all possible behaviors.

FWIW I agree with this.

 -George
Jürgen Groß Nov. 26, 2019, 1:01 p.m. UTC | #5
On 25.11.19 17:22, Jan Beulich wrote:
> On 25.11.2019 15:59, Anthony PERARD wrote:
>> This hypercall can take a long time to finish because it attempts to
>> grab the `hostp2m' lock up to 1024 times. The accumulated wait for the
>> lock can take several seconds.
>>
>> This can easily happen with a guest with 32 vcpus and plenty of RAM,
>> during localhost migration.
>>
>> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> 
> As indicated on v1 already, this being a workaround rather than a fix
> should be stated clearly in the description. Especially if more such
> operations turn up, it'll become increasingly obvious that the root
> of the problem will need dealing with rather than papering over some
> of the symptoms. With this taken care of I'd be (still hesitantly)
> willing to give my ack for this as a short term "solution".
> 
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -425,6 +425,26 @@ long arch_do_domctl(
>>                   ret = -EFAULT;
>>                   break;
>>               }
>> +
>> +            /*
>> +             * Avoid checking for preemption when the `hostp2m' lock isn't
>> +             * involve, i.e. non-translated guest, and avoid preemption on
>> +             * the last iteration.
>> +             */
>> +            if ( paging_mode_translate(d) &&
>> +                 likely((i + 1) < num) && hypercall_preempt_check() )
>> +            {
>> +                domctl->u.getpageframeinfo3.num = num - i - 1;
>> +                domctl->u.getpageframeinfo3.array.p =
>> +                    guest_handle + ((i + 1) * width);
>> +                if ( __copy_to_guest(u_domctl, domctl, 1) )
>> +                {
>> +                    ret = -EFAULT;
>> +                    break;
>> +                }
>> +                return hypercall_create_continuation(__HYPERVISOR_domctl,
>> +                                                     "h", u_domctl);
>> +            }
>>           }
>>   
>>           break;
>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> index a03e80e5984a..1b69eb75cb20 100644
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -163,6 +163,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_domctl_getdomaininfo_t);
>>   #define XEN_DOMCTL_PFINFO_LTAB_MASK (0xfU<<28)
>>   
>>   /* XEN_DOMCTL_getpageframeinfo3 */
>> +/*
>> + * Both value `num' and `array' are modified by the hypercall to allow
>> + * preemption.
> 
> s/are/may be/ ?
> 
> If Juergen wants to still allow this in, I'd be fine taking care of both
> remarks while committing.

Sorry, was only scanning for an Ack.

Release-acked-by: Juergen Gross <jgross@suse.com>


Juergen
diff mbox series

Patch

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 43e368d63bb9..b461aadbd640 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -425,6 +425,26 @@  long arch_do_domctl(
                 ret = -EFAULT;
                 break;
             }
+
+            /*
+             * Avoid checking for preemption when the `hostp2m' lock isn't
+             * involve, i.e. non-translated guest, and avoid preemption on
+             * the last iteration.
+             */
+            if ( paging_mode_translate(d) &&
+                 likely((i + 1) < num) && hypercall_preempt_check() )
+            {
+                domctl->u.getpageframeinfo3.num = num - i - 1;
+                domctl->u.getpageframeinfo3.array.p =
+                    guest_handle + ((i + 1) * width);
+                if ( __copy_to_guest(u_domctl, domctl, 1) )
+                {
+                    ret = -EFAULT;
+                    break;
+                }
+                return hypercall_create_continuation(__HYPERVISOR_domctl,
+                                                     "h", u_domctl);
+            }
         }
 
         break;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index a03e80e5984a..1b69eb75cb20 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -163,6 +163,10 @@  DEFINE_XEN_GUEST_HANDLE(xen_domctl_getdomaininfo_t);
 #define XEN_DOMCTL_PFINFO_LTAB_MASK (0xfU<<28)
 
 /* XEN_DOMCTL_getpageframeinfo3 */
+/*
+ * Both value `num' and `array' are modified by the hypercall to allow
+ * preemption.
+ */
 struct xen_domctl_getpageframeinfo3 {
     /* IN variables. */
     uint64_aligned_t num;