diff mbox series

[4/6] xen/hypercall: Cope with -ERESTART on more hypercall paths

Message ID 20191205223008.8623-5-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series xen: Support continuations from tasklets | expand

Commit Message

Andrew Cooper Dec. 5, 2019, 10:30 p.m. UTC
These hypercalls each use continue_hypercall_on_cpu(), whose API is about to
switch to use -ERESTART.  Update the soon-to-be affected paths to cope,
folding existing contination logic where applicable.

In addition:
 * For platform op and sysctl, insert a cpu_relax() into what is otherwise a
   tight spinlock loop, and make the continuation logic common at the
   epilogue.
 * Contrary to the comment in the code, kexec_exec() does return in the
   KEXEC_REBOOT case, needs to pass ret back to the caller.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
---
 xen/arch/x86/platform_hypercall.c | 14 ++++++++++++--
 xen/common/compat/domain.c        |  9 ++++-----
 xen/common/domain.c               |  8 ++++----
 xen/common/kexec.c                | 20 ++++++++++++++++----
 xen/common/sysctl.c               | 13 +++++++++++--
 5 files changed, 47 insertions(+), 17 deletions(-)

Comments

Julien Grall Dec. 8, 2019, 12:57 p.m. UTC | #1
Hi Andrew,

On 05/12/2019 22:30, Andrew Cooper wrote:
> These hypercalls each use continue_hypercall_on_cpu(), whose API is about to
> switch to use -ERESTART.  Update the soon-to-be affected paths to cope,
> folding existing contination logic where applicable.
> 
> In addition:
>   * For platform op and sysctl, insert a cpu_relax() into what is otherwise a
>     tight spinlock loop, and make the continuation logic common at the
>     epilogue.
>   * Contrary to the comment in the code, kexec_exec() does return in the
>     KEXEC_REBOOT case, needs to pass ret back to the caller.

It is not entirely trivial to me that KEXEC_REBOOT refers to 
KEXEC_DEFAULT_TYPE. The more that if you look at the kexec_reboot() 
helper, it will not return (see BUG()). What may return is 
continue_hypercall_on_cpu().

So would it make sense to use KEXEC_DEFAULT_TYPE?

[...]

> @@ -816,6 +819,13 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>    out:
>       spin_unlock(&xenpf_lock);
>   
> +    if ( ret == -ERESTART )
> +    {
> +    create_continuation:

Shall we indent create_continuation the same way as out?

[...]

> @@ -1263,13 +1263,25 @@ static int do_kexec_op_internal(unsigned long op,
>   
>   long do_kexec_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) uarg)
>   {
> -    return do_kexec_op_internal(op, uarg, 0);
> +    int ret = do_kexec_op_internal(op, uarg, 0);
Shouldn't it be long (or unsigned long) here? Otherwise, the return of 
hypercall_create_continuation() may be truncated.


> +
> +    if ( ret == -ERESTART )
> +        ret = hypercall_create_continuation(__HYPERVISOR_kexec_op,
> +                                            "lh", op, uarg);
> +
> +    return ret;
>   }

[...]

> @@ -516,6 +519,12 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>            __copy_to_guest(u_sysctl, op, 1) )
>           ret = -EFAULT;
>   
> +    if ( ret == -ERESTART )
> +    {
> +    create_continuation:


Similar question as the previous label create_continuation.

> +        ret = hypercall_create_continuation(__HYPERVISOR_sysctl, "h", u_sysctl);
> +    }
> +
>       return ret;
>   }
>   
>
Jan Beulich Dec. 9, 2019, 4:25 p.m. UTC | #2
On 05.12.2019 23:30, Andrew Cooper wrote:
> These hypercalls each use continue_hypercall_on_cpu(), whose API is about to
> switch to use -ERESTART.  Update the soon-to-be affected paths to cope,
> folding existing contination logic where applicable.
> 
> In addition:
>  * For platform op and sysctl, insert a cpu_relax() into what is otherwise a
>    tight spinlock loop, and make the continuation logic common at the
>    epilogue.

Is this really needed with a hypercall_preempt_check() invocation
already in the bodies of these loops?

Jan
Jan Beulich Dec. 9, 2019, 4:29 p.m. UTC | #3
On 09.12.2019 17:25, Jan Beulich wrote:
> On 05.12.2019 23:30, Andrew Cooper wrote:
>> These hypercalls each use continue_hypercall_on_cpu(), whose API is about to
>> switch to use -ERESTART.  Update the soon-to-be affected paths to cope,
>> folding existing contination logic where applicable.
>>
>> In addition:
>>  * For platform op and sysctl, insert a cpu_relax() into what is otherwise a
>>    tight spinlock loop, and make the continuation logic common at the
>>    epilogue.
> 
> Is this really needed with a hypercall_preempt_check() invocation
> already in the bodies of these loops?

And if it's really to be added, shouldn't it be at the bottom
of the loop bodies rather than at the top?

Jan
Andrew Cooper Dec. 9, 2019, 5:37 p.m. UTC | #4
On 08/12/2019 12:57, Julien Grall wrote:
> Hi Andrew,
>
> On 05/12/2019 22:30, Andrew Cooper wrote:
>> These hypercalls each use continue_hypercall_on_cpu(), whose API is
>> about to
>> switch to use -ERESTART.  Update the soon-to-be affected paths to cope,
>> folding existing contination logic where applicable.
>>
>> In addition:
>>   * For platform op and sysctl, insert a cpu_relax() into what is
>> otherwise a
>>     tight spinlock loop, and make the continuation logic common at the
>>     epilogue.
>>   * Contrary to the comment in the code, kexec_exec() does return in the
>>     KEXEC_REBOOT case, needs to pass ret back to the caller.
>
> It is not entirely trivial to me that KEXEC_REBOOT refers to
> KEXEC_DEFAULT_TYPE. The more that if you look at the kexec_reboot()
> helper, it will not return (see BUG()). What may return is
> continue_hypercall_on_cpu().
>
> So would it make sense to use KEXEC_DEFAULT_TYPE?

I'm not sure why I capitalised it, but no - using KEXEC_DEFAULT_TYPE is
worse.  A casual reader is far more likely to understand kexec_reboot()
in this context.

>
> [...]
>
>> @@ -816,6 +819,13 @@ ret_t
>> do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>>    out:
>>       spin_unlock(&xenpf_lock);
>>   +    if ( ret == -ERESTART )
>> +    {
>> +    create_continuation:
>
> Shall we indent create_continuation the same way as out?

They have different scopes, and while it may look weird, this is in
accordance with our style.

>
> [...]
>
>> @@ -1263,13 +1263,25 @@ static int do_kexec_op_internal(unsigned long
>> op,
>>     long do_kexec_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void)
>> uarg)
>>   {
>> -    return do_kexec_op_internal(op, uarg, 0);
>> +    int ret = do_kexec_op_internal(op, uarg, 0);
> Shouldn't it be long (or unsigned long) here? Otherwise, the return of
> hypercall_create_continuation() may be truncated.

If you're concerned about truncation via this pattern, then there are
other areas of the code to be worred about.

However, there is nothing to truncate.  The return value of
hypercall_create_continuation() is the primary hypercall number, i.e.
__HYPERVISOR_kexec_op in this case.

~Andrew
Andrew Cooper Dec. 9, 2019, 5:43 p.m. UTC | #5
On 09/12/2019 16:29, Jan Beulich wrote:
> On 09.12.2019 17:25, Jan Beulich wrote:
>> On 05.12.2019 23:30, Andrew Cooper wrote:
>>> These hypercalls each use continue_hypercall_on_cpu(), whose API is about to
>>> switch to use -ERESTART.  Update the soon-to-be affected paths to cope,
>>> folding existing contination logic where applicable.
>>>
>>> In addition:
>>>  * For platform op and sysctl, insert a cpu_relax() into what is otherwise a
>>>    tight spinlock loop, and make the continuation logic common at the
>>>    epilogue.
>> Is this really needed with a hypercall_preempt_check() invocation
>> already in the bodies of these loops?

Yes.  The reason you're supposed to pause is to stop having memory
traffic constantly trying to pull the spinlock's cacheline into shared
state.

Racing round a tight loop constantly reading 4 or 5 memory locations is
almost as bad.

> And if it's really to be added, shouldn't it be at the bottom
> of the loop bodies rather than at the top?

It doesn't matter.

~Andrew
Jan Beulich Dec. 10, 2019, 8:27 a.m. UTC | #6
On 09.12.2019 18:43, Andrew Cooper wrote:
> On 09/12/2019 16:29, Jan Beulich wrote:
>> On 09.12.2019 17:25, Jan Beulich wrote:
>>> On 05.12.2019 23:30, Andrew Cooper wrote:
>>>> These hypercalls each use continue_hypercall_on_cpu(), whose API is about to
>>>> switch to use -ERESTART.  Update the soon-to-be affected paths to cope,
>>>> folding existing contination logic where applicable.
>>>>
>>>> In addition:
>>>>  * For platform op and sysctl, insert a cpu_relax() into what is otherwise a
>>>>    tight spinlock loop, and make the continuation logic common at the
>>>>    epilogue.
>>> Is this really needed with a hypercall_preempt_check() invocation
>>> already in the bodies of these loops?
> 
> Yes.  The reason you're supposed to pause is to stop having memory
> traffic constantly trying to pull the spinlock's cacheline into shared
> state.
> 
> Racing round a tight loop constantly reading 4 or 5 memory locations is
> almost as bad.
> 
>> And if it's really to be added, shouldn't it be at the bottom
>> of the loop bodies rather than at the top?
> 
> It doesn't matter.

Well, modern documentation indeed gives no hint whatsoever towards
its placement. Recalling the initial guidelines from Intel (from
even before the whitepaper was made available) there was a
suggestion towards placing it close ahead of the problematic
memory access. The last example in the whitepaper looks to support
this without explicitly saying so.

Anyway, preferably with it moved
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
Julien Grall Dec. 11, 2019, 12:01 p.m. UTC | #7
Hi,

On 09/12/2019 17:37, Andrew Cooper wrote:
> On 08/12/2019 12:57, Julien Grall wrote:
>> Hi Andrew,
>>
>> On 05/12/2019 22:30, Andrew Cooper wrote:
>>> These hypercalls each use continue_hypercall_on_cpu(), whose API is
>>> about to
>>> switch to use -ERESTART.  Update the soon-to-be affected paths to cope,
>>> folding existing contination logic where applicable.
>>>
>>> In addition:
>>>    * For platform op and sysctl, insert a cpu_relax() into what is
>>> otherwise a
>>>      tight spinlock loop, and make the continuation logic common at the
>>>      epilogue.
>>>    * Contrary to the comment in the code, kexec_exec() does return in the
>>>      KEXEC_REBOOT case, needs to pass ret back to the caller.
>>
>> It is not entirely trivial to me that KEXEC_REBOOT refers to
>> KEXEC_DEFAULT_TYPE. The more that if you look at the kexec_reboot()
>> helper, it will not return (see BUG()). What may return is
>> continue_hypercall_on_cpu().
>>
>> So would it make sense to use KEXEC_DEFAULT_TYPE?
> 
> I'm not sure why I capitalised it, but no - using KEXEC_DEFAULT_TYPE is
> worse.  A casual reader is far more likely to understand kexec_reboot()
> in this context.

But kexec_reboot() cannot return (see BUG()) so a reader may not 
understand why you suggest that it will return. So I think this want to 
be reworded.
>>
>> [...]
>>
>>> @@ -816,6 +819,13 @@ ret_t
>>> do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>>>     out:
>>>        spin_unlock(&xenpf_lock);
>>>    +    if ( ret == -ERESTART )
>>> +    {
>>> +    create_continuation:
>>
>> Shall we indent create_continuation the same way as out?
> 
> They have different scopes, and while it may look weird, this is in
> accordance with our style.
> 
>>
>> [...]
>>
>>> @@ -1263,13 +1263,25 @@ static int do_kexec_op_internal(unsigned long
>>> op,
>>>      long do_kexec_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void)
>>> uarg)
>>>    {
>>> -    return do_kexec_op_internal(op, uarg, 0);
>>> +    int ret = do_kexec_op_internal(op, uarg, 0);
>> Shouldn't it be long (or unsigned long) here? Otherwise, the return of
>> hypercall_create_continuation() may be truncated.
> 
> If you're concerned about truncation via this pattern, then there are
> other areas of the code to be worred about.

I knew you would mention the other places :).

> 
> However, there is nothing to truncate.  The return value of
> hypercall_create_continuation() is the primary hypercall number, i.e.
> __HYPERVISOR_kexec_op in this case.

Make sense. And I guess the signed bit will be propagated so if 
do_kexec_op() return -EINVAL (32-bit), then it would still be -EINVAL 
(64-bit).

Cheers,
diff mbox series

Patch

diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index b19f6ec4ed..c0c209baac 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -201,9 +201,12 @@  ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
      * with this vcpu.
      */
     while ( !spin_trylock(&xenpf_lock) )
+    {
+        cpu_relax();
+
         if ( hypercall_preempt_check() )
-            return hypercall_create_continuation(
-                __HYPERVISOR_platform_op, "h", u_xenpf_op);
+            goto create_continuation;
+    }
 
     switch ( op->cmd )
     {
@@ -816,6 +819,13 @@  ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
  out:
     spin_unlock(&xenpf_lock);
 
+    if ( ret == -ERESTART )
+    {
+    create_continuation:
+        ret = hypercall_create_continuation(__HYPERVISOR_platform_op,
+                                            "h", u_xenpf_op);
+    }
+
     return ret;
 }
 
diff --git a/xen/common/compat/domain.c b/xen/common/compat/domain.c
index 11c6afc463..1a14403672 100644
--- a/xen/common/compat/domain.c
+++ b/xen/common/compat/domain.c
@@ -79,11 +79,6 @@  int compat_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) ar
 
             xfree(ctxt);
         }
-
-        if ( rc == -ERESTART )
-            rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih",
-                                               cmd, vcpuid, arg);
-
         break;
     }
 
@@ -130,6 +125,10 @@  int compat_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) ar
         break;
     }
 
+    if ( rc == -ERESTART )
+        rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih",
+                                           cmd, vcpuid, arg);
+
     return rc;
 }
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 865a1cb9d7..ab7e4d09c0 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1422,10 +1422,6 @@  long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -EINVAL;
 
         rc = arch_initialise_vcpu(v, arg);
-        if ( rc == -ERESTART )
-            rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih",
-                                               cmd, vcpuid, arg);
-
         break;
 
     case VCPUOP_up:
@@ -1598,6 +1594,10 @@  long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    if ( rc == -ERESTART )
+        rc = hypercall_create_continuation(__HYPERVISOR_vcpu_op, "iih",
+                                           cmd, vcpuid, arg);
+
     return rc;
 }
 
diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index a262cc5a18..2fca75cec0 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -842,7 +842,7 @@  static int kexec_exec(XEN_GUEST_HANDLE_PARAM(void) uarg)
         break;
     }
 
-    return -EINVAL; /* never reached */
+    return ret;
 }
 
 static int kexec_swap_images(int type, struct kexec_image *new,
@@ -1220,7 +1220,7 @@  static int do_kexec_op_internal(unsigned long op,
         return ret;
 
     if ( test_and_set_bit(KEXEC_FLAG_IN_HYPERCALL, &kexec_flags) )
-        return hypercall_create_continuation(__HYPERVISOR_kexec_op, "lh", op, uarg);
+        return -ERESTART;
 
     switch ( op )
     {
@@ -1263,13 +1263,25 @@  static int do_kexec_op_internal(unsigned long op,
 
 long do_kexec_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
-    return do_kexec_op_internal(op, uarg, 0);
+    int ret = do_kexec_op_internal(op, uarg, 0);
+
+    if ( ret == -ERESTART )
+        ret = hypercall_create_continuation(__HYPERVISOR_kexec_op,
+                                            "lh", op, uarg);
+
+    return ret;
 }
 
 #ifdef CONFIG_COMPAT
 int compat_kexec_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) uarg)
 {
-    return do_kexec_op_internal(op, uarg, 1);
+    int ret = do_kexec_op_internal(op, uarg, 1);
+
+    if ( ret == -ERESTART )
+        ret = hypercall_create_continuation(__HYPERVISOR_kexec_op,
+                                            "lh", op, uarg);
+
+    return ret;
 }
 #endif
 
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index f88a285e7f..7b55047bb9 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -51,9 +51,12 @@  long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
      * with this vcpu.
      */
     while ( !spin_trylock(&sysctl_lock) )
+    {
+        cpu_relax();
+
         if ( hypercall_preempt_check() )
-            return hypercall_create_continuation(
-                __HYPERVISOR_sysctl, "h", u_sysctl);
+            goto create_continuation;
+    }
 
     switch ( op->cmd )
     {
@@ -516,6 +519,12 @@  long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
          __copy_to_guest(u_sysctl, op, 1) )
         ret = -EFAULT;
 
+    if ( ret == -ERESTART )
+    {
+    create_continuation:
+        ret = hypercall_create_continuation(__HYPERVISOR_sysctl, "h", u_sysctl);
+    }
+
     return ret;
 }