diff mbox series

[5/6] xen/tasklet: Return -ERESTART from continue_hypercall_on_cpu()

Message ID 20191205223008.8623-6-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series xen: Support continuations from tasklets | expand

Commit Message

Andrew Cooper Dec. 5, 2019, 10:30 p.m. UTC
Some hypercalls tasklets want to create a continuation, rather than fail the
hypercall with a hard error.  By the time the tasklet is executing, it is too
late to create the continuation, and even continue_hypercall_on_cpu() doesn't
have enough state to do it correctly.

All callers of continue_hypercall_on_cpu() have been updated to turn -ERESTART
into a continuation, where appropriate modifications can be made to register
and/or memory parameters.

This changes the continue_hypercall_on_cpu() behaviour to unconditionally
create a hypercall continuation, in case the tasklet wants to use it, and then
to have arch_hypercall_tasklet_result() cancel the continuation when a result
is available.  None of these hypercalls are fast paths.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>

There is one RFC point.  The statement in the header file of "If this function
returns 0 then the function is guaranteed to run at some point in the future."
was never true.  In the case of a CPU miss, the hypercall would be blindly
failed with -EINVAL.

The current behaviour with this patch is to not cancel the continuation, which
I think is less bad, but still not great.  Thoughts?
---
 xen/arch/arm/traps.c     |  1 +
 xen/arch/x86/hypercall.c |  7 +++++++
 xen/common/domain.c      |  9 +++++----
 xen/include/xen/domain.h | 11 ++++++++---
 4 files changed, 21 insertions(+), 7 deletions(-)

Comments

Jan Beulich Dec. 9, 2019, 4:52 p.m. UTC | #1
On 05.12.2019 23:30, Andrew Cooper wrote:
> Some hypercalls tasklets want to create a continuation, rather than fail the
> hypercall with a hard error.  By the time the tasklet is executing, it is too
> late to create the continuation, and even continue_hypercall_on_cpu() doesn't
> have enough state to do it correctly.

I think it would be quite nice if you made clear what piece of state
it is actually missing. To be honest, I don't recall anymore.

> All callers of continue_hypercall_on_cpu() have been updated to turn -ERESTART
> into a continuation, where appropriate modifications can be made to register
> and/or memory parameters.
> 
> This changes the continue_hypercall_on_cpu() behaviour to unconditionally
> create a hypercall continuation, in case the tasklet wants to use it, and then
> to have arch_hypercall_tasklet_result() cancel the continuation when a result
> is available.  None of these hypercalls are fast paths.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Wei Liu <wl@xen.org>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien@xen.org>
> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> 
> There is one RFC point.  The statement in the header file of "If this function
> returns 0 then the function is guaranteed to run at some point in the future."
> was never true.  In the case of a CPU miss, the hypercall would be blindly
> failed with -EINVAL.

"Was never true" sounds like "completely broken". Afaict it was true
in all cases except the purely hypothetical one of the tasklet ending
up executing on the wrong CPU.

> The current behaviour with this patch is to not cancel the continuation, which
> I think is less bad, but still not great.  Thoughts?

Well, that's a guest live lock then aiui. Is there any way for the
guest to make it out of there? If not, perhaps it'd be "better" to
crash the guest? (I don't suppose there's anything we can do to
still make the tasklet run on the intended CPU.)

> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1489,6 +1489,7 @@ void arch_hypercall_tasklet_result(struct vcpu *v, long res)
>  {
>      struct cpu_user_regs *regs = &v->arch.cpu_info->guest_cpu_user_regs;
>  
> +    regs->pc += 4;  /* Skip over 'hvc #XEN_HYPERCALL_TAG' */
>      HYPERCALL_RESULT_REG(regs) = res;
>  }
>  
> --- a/xen/arch/x86/hypercall.c
> +++ b/xen/arch/x86/hypercall.c
> @@ -170,6 +170,13 @@ void arch_hypercall_tasklet_result(struct vcpu *v, long res)
>  {
>      struct cpu_user_regs *regs = &v->arch.user_regs;
>  
> +    /*
> +     * PV hypercalls are all 2-byte instructions (INT $0x82, SYSCALL).  HVM
> +     * hypercalls are all 3-byte instructions (VMCALL, VMMCALL).
> +     *
> +     * Move %rip forwards to complete the continuation.
> +     */
> +    regs->rip += 2 + is_hvm_vcpu(v);
>      regs->rax = res;
>  }

To leave the system in consistent state, perhaps better to call
hypercall_cancel_continuation() along with the PC/RIP updating?

> --- a/xen/include/xen/domain.h
> +++ b/xen/include/xen/domain.h
> @@ -96,9 +96,11 @@ void domctl_lock_release(void);
>  
>  /*
>   * Continue the current hypercall via func(data) on specified cpu.
> - * If this function returns 0 then the function is guaranteed to run at some
> - * point in the future. If this function returns an error code then the
> - * function has not been and will not be executed.
> + *
> + * This function returns -ERESTART in the success case, and a higher level
> + * caller is required to set up a hypercall continuation.  func() will be run
> + * at some point in the future.  If this function returns any other error code
> + * then func() has not, and will not be executed.
>   */
>  int continue_hypercall_on_cpu(
>      unsigned int cpu, long (*func)(void *data), void *data);

How is this comment any better wrt the "was never true" comment
you've made above? The function still wouldn't be invoked in
case of a CPU miss.

Jan
Andrew Cooper Dec. 9, 2019, 5:49 p.m. UTC | #2
On 09/12/2019 16:52, Jan Beulich wrote:
> On 05.12.2019 23:30, Andrew Cooper wrote:
>> Some hypercalls tasklets want to create a continuation, rather than fail the
>> hypercall with a hard error.  By the time the tasklet is executing, it is too
>> late to create the continuation, and even continue_hypercall_on_cpu() doesn't
>> have enough state to do it correctly.
> I think it would be quite nice if you made clear what piece of state
> it is actually missing. To be honest, I don't recall anymore.

How to correctly mutate the registers and/or memory (which is specific
to the hypercall subop in some cases).

>> All callers of continue_hypercall_on_cpu() have been updated to turn -ERESTART
>> into a continuation, where appropriate modifications can be made to register
>> and/or memory parameters.
>>
>> This changes the continue_hypercall_on_cpu() behaviour to unconditionally
>> create a hypercall continuation, in case the tasklet wants to use it, and then
>> to have arch_hypercall_tasklet_result() cancel the continuation when a result
>> is available.  None of these hypercalls are fast paths.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Wei Liu <wl@xen.org>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Stefano Stabellini <sstabellini@kernel.org>
>> CC: Julien Grall <julien@xen.org>
>> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>>
>> There is one RFC point.  The statement in the header file of "If this function
>> returns 0 then the function is guaranteed to run at some point in the future."
>> was never true.  In the case of a CPU miss, the hypercall would be blindly
>> failed with -EINVAL.
> "Was never true" sounds like "completely broken". Afaict it was true
> in all cases except the purely hypothetical one of the tasklet ending
> up executing on the wrong CPU.

There is nothing hypothetical about it.  It really will go wrong when a
CPU gets offlined.

>
>> The current behaviour with this patch is to not cancel the continuation, which
>> I think is less bad, but still not great.  Thoughts?
> Well, that's a guest live lock then aiui.

It simply continues again.  It will livelock only if the hypercall picks
a bad cpu all the time.

> Is there any way for the guest to make it out of there? If not, perhaps it'd be "better" to
> crash the guest? (I don't suppose there's anything we can do to
> still make the tasklet run on the intended CPU.)

That is why I implemented it like this.  If it is a stray interaction
with a CPU offline, the next time around it will work fine.

Anything else is rather more broken.

>
>> --- a/xen/arch/arm/traps.c
>> +++ b/xen/arch/arm/traps.c
>> @@ -1489,6 +1489,7 @@ void arch_hypercall_tasklet_result(struct vcpu *v, long res)
>>  {
>>      struct cpu_user_regs *regs = &v->arch.cpu_info->guest_cpu_user_regs;
>>  
>> +    regs->pc += 4;  /* Skip over 'hvc #XEN_HYPERCALL_TAG' */
>>      HYPERCALL_RESULT_REG(regs) = res;
>>  }
>>  
>> --- a/xen/arch/x86/hypercall.c
>> +++ b/xen/arch/x86/hypercall.c
>> @@ -170,6 +170,13 @@ void arch_hypercall_tasklet_result(struct vcpu *v, long res)
>>  {
>>      struct cpu_user_regs *regs = &v->arch.user_regs;
>>  
>> +    /*
>> +     * PV hypercalls are all 2-byte instructions (INT $0x82, SYSCALL).  HVM
>> +     * hypercalls are all 3-byte instructions (VMCALL, VMMCALL).
>> +     *
>> +     * Move %rip forwards to complete the continuation.
>> +     */
>> +    regs->rip += 2 + is_hvm_vcpu(v);
>>      regs->rax = res;
>>  }
> To leave the system in consistent state, perhaps better to call
> hypercall_cancel_continuation() along with the PC/RIP updating?

Probably, yes.

>
>> --- a/xen/include/xen/domain.h
>> +++ b/xen/include/xen/domain.h
>> @@ -96,9 +96,11 @@ void domctl_lock_release(void);
>>  
>>  /*
>>   * Continue the current hypercall via func(data) on specified cpu.
>> - * If this function returns 0 then the function is guaranteed to run at some
>> - * point in the future. If this function returns an error code then the
>> - * function has not been and will not be executed.
>> + *
>> + * This function returns -ERESTART in the success case, and a higher level
>> + * caller is required to set up a hypercall continuation.  func() will be run
>> + * at some point in the future.  If this function returns any other error code
>> + * then func() has not, and will not be executed.
>>   */
>>  int continue_hypercall_on_cpu(
>>      unsigned int cpu, long (*func)(void *data), void *data);
> How is this comment any better wrt the "was never true" comment
> you've made above? The function still wouldn't be invoked in
> case of a CPU miss.

Depends now the miss came about.  It certainly stands a far better
chance now of actually executing.

~Andrew
Jan Beulich Dec. 10, 2019, 8:55 a.m. UTC | #3
On 09.12.2019 18:49, Andrew Cooper wrote:
> On 09/12/2019 16:52, Jan Beulich wrote:
>> On 05.12.2019 23:30, Andrew Cooper wrote:
>>> Some hypercalls tasklets want to create a continuation, rather than fail the
>>> hypercall with a hard error.  By the time the tasklet is executing, it is too
>>> late to create the continuation, and even continue_hypercall_on_cpu() doesn't
>>> have enough state to do it correctly.
>> I think it would be quite nice if you made clear what piece of state
>> it is actually missing. To be honest, I don't recall anymore.
> 
> How to correctly mutate the registers and/or memory (which is specific
> to the hypercall subop in some cases).

Well, in-memory arguments can be accessed as long as the mapping is
the right one (which it typically wouldn't be inside a tasklet). Do
existing continue_hypercall_on_cpu() users need this? Looking over
patch 4 again, I didn't think so. (Which isn't to say that removing
the latent issue is not a good thing.)

In-register values can be changed as long as the respective exit
path will suitably pick up the value, which I thought was always
the case.

Hence I'm afraid your single reply sentence didn't really clarify
matters. I'm sorry if this is just because of me being dense.

>>> There is one RFC point.  The statement in the header file of "If this function
>>> returns 0 then the function is guaranteed to run at some point in the future."
>>> was never true.  In the case of a CPU miss, the hypercall would be blindly
>>> failed with -EINVAL.
>> "Was never true" sounds like "completely broken". Afaict it was true
>> in all cases except the purely hypothetical one of the tasklet ending
>> up executing on the wrong CPU.
> 
> There is nothing hypothetical about it.  It really will go wrong when a
> CPU gets offlined.

Accepted, but it's still not like "completely broken". I would even
suppose the case wasn't considered when CPU offlining support was
introduced, not when continue_hypercall_on_cpu() came into existence
(which presumably is when the comment was written).

Anyway - yes, I agree this is a fair solution to the issue at hand.

>>> The current behaviour with this patch is to not cancel the continuation, which
>>> I think is less bad, but still not great.  Thoughts?
>> Well, that's a guest live lock then aiui.
> 
> It simply continues again.  It will livelock only if the hypercall picks
> a bad cpu all the time.

Oh, I see I was mislead by continue_hypercall_tasklet_handler() not
updating info->cpu, not paying attention to it actually freeing info.
Plus a crucial aspect looks to be that there are no "chained" uses of
continue_hypercall_on_cpu() anymore (the microcode loading one being
gone now) - afaict any such wouldn't guarantee forward progress with
this new model (without recording somewhere which CPUs had been dealt
with already).

Jan
Andrew Cooper Dec. 10, 2019, 5:55 p.m. UTC | #4
On 10/12/2019 08:55, Jan Beulich wrote:
> On 09.12.2019 18:49, Andrew Cooper wrote:
>> On 09/12/2019 16:52, Jan Beulich wrote:
>>> On 05.12.2019 23:30, Andrew Cooper wrote:
>>>> Some hypercalls tasklets want to create a continuation, rather than fail the
>>>> hypercall with a hard error.  By the time the tasklet is executing, it is too
>>>> late to create the continuation, and even continue_hypercall_on_cpu() doesn't
>>>> have enough state to do it correctly.
>>> I think it would be quite nice if you made clear what piece of state
>>> it is actually missing. To be honest, I don't recall anymore.
>> How to correctly mutate the registers and/or memory (which is specific
>> to the hypercall subop in some cases).
> Well, in-memory arguments can be accessed as long as the mapping is
> the right one (which it typically wouldn't be inside a tasklet). Do
> existing continue_hypercall_on_cpu() users need this? Looking over
> patch 4 again, I didn't think so. (Which isn't to say that removing
> the latent issue is not a good thing.)
>
> In-register values can be changed as long as the respective exit
> path will suitably pick up the value, which I thought was always
> the case.
>
> Hence I'm afraid your single reply sentence didn't really clarify
> matters. I'm sorry if this is just because of me being dense.

How, physically, would you arrange for continue_hypercall_on_cpu() to
make the requisite state adjustments?

Yes - registers and memory can be accessed, but only the hypercall
(sub?)op handler knows how to mutate them appropriately.

You'd have to copy the mutation logic into continue_hypercall_on_cpu(),
and pass in op/subops and a union of all pointers, *and* whatever
intermediate state the subop handler needs.

Or you can return -ERESTART and let the caller DTRT with the state it
has in context, as it would in other cases requiring a continuation.

>
>>>> There is one RFC point.  The statement in the header file of "If this function
>>>> returns 0 then the function is guaranteed to run at some point in the future."
>>>> was never true.  In the case of a CPU miss, the hypercall would be blindly
>>>> failed with -EINVAL.
>>> "Was never true" sounds like "completely broken". Afaict it was true
>>> in all cases except the purely hypothetical one of the tasklet ending
>>> up executing on the wrong CPU.
>> There is nothing hypothetical about it.  It really will go wrong when a
>> CPU gets offlined.
> Accepted, but it's still not like "completely broken".

I didn't mean it like that.  I mean "it has never had the property it
claimed", which is distinct from "the claim used to be true, but was
then accidentally regressed".

> I would even
> suppose the case wasn't considered when CPU offlining support was
> introduced, not when continue_hypercall_on_cpu() came into existence
> (which presumably is when the comment was written).
>
> Anyway - yes, I agree this is a fair solution to the issue at hand.
>
>>>> The current behaviour with this patch is to not cancel the continuation, which
>>>> I think is less bad, but still not great.  Thoughts?
>>> Well, that's a guest live lock then aiui.
>> It simply continues again.  It will livelock only if the hypercall picks
>> a bad cpu all the time.
> Oh, I see I was mislead by continue_hypercall_tasklet_handler() not
> updating info->cpu, not paying attention to it actually freeing info.
> Plus a crucial aspect looks to be that there are no "chained" uses of
> continue_hypercall_on_cpu() anymore (the microcode loading one being
> gone now) - afaict any such wouldn't guarantee forward progress with
> this new model (without recording somewhere which CPUs had been dealt
> with already).

I'd forgotten that we had that, but I can't say I'm sad to see the back
of it.  I recall at the time saying that it wasn't a clever move.

For now, I suggest that we ignore that case.  If an when a real usecase
appears, we can consider making adjustments.

~Andrew
Jan Beulich Dec. 11, 2019, 7:41 a.m. UTC | #5
On 10.12.2019 18:55, Andrew Cooper wrote:
> On 10/12/2019 08:55, Jan Beulich wrote:
>> On 09.12.2019 18:49, Andrew Cooper wrote:
>>> On 09/12/2019 16:52, Jan Beulich wrote:
>>>> On 05.12.2019 23:30, Andrew Cooper wrote:
>>>>> Some hypercalls tasklets want to create a continuation, rather than fail the
>>>>> hypercall with a hard error.  By the time the tasklet is executing, it is too
>>>>> late to create the continuation, and even continue_hypercall_on_cpu() doesn't
>>>>> have enough state to do it correctly.
>>>> I think it would be quite nice if you made clear what piece of state
>>>> it is actually missing. To be honest, I don't recall anymore.
>>> How to correctly mutate the registers and/or memory (which is specific
>>> to the hypercall subop in some cases).
>> Well, in-memory arguments can be accessed as long as the mapping is
>> the right one (which it typically wouldn't be inside a tasklet). Do
>> existing continue_hypercall_on_cpu() users need this? Looking over
>> patch 4 again, I didn't think so. (Which isn't to say that removing
>> the latent issue is not a good thing.)
>>
>> In-register values can be changed as long as the respective exit
>> path will suitably pick up the value, which I thought was always
>> the case.
>>
>> Hence I'm afraid your single reply sentence didn't really clarify
>> matters. I'm sorry if this is just because of me being dense.
> 
> How, physically, would you arrange for continue_hypercall_on_cpu() to
> make the requisite state adjustments?

You can't (at least not without having sufficient further context),
I agree. Yet ...

> Yes - registers and memory can be accessed, but only the hypercall
> (sub?)op handler knows how to mutate them appropriately.
> 
> You'd have to copy the mutation logic into continue_hypercall_on_cpu(),
> and pass in op/subops and a union of all pointers, *and* whatever
> intermediate state the subop handler needs.
> 
> Or you can return -ERESTART and let the caller DTRT with the state it
> has in context, as it would in other cases requiring a continuation.

... it continues to be unclear to me whether you're fixing an actual
bug here, or just a latent one. The existing uses of
continue_hypercall_on_cpu() don't look to require state updates
beyond the hypercall return value (or else, aiui, they wouldn't have
worked in the first place), and that one had a way to get modified.

>>>>> The current behaviour with this patch is to not cancel the continuation, which
>>>>> I think is less bad, but still not great.  Thoughts?
>>>> Well, that's a guest live lock then aiui.
>>> It simply continues again.  It will livelock only if the hypercall picks
>>> a bad cpu all the time.
>> Oh, I see I was mislead by continue_hypercall_tasklet_handler() not
>> updating info->cpu, not paying attention to it actually freeing info.
>> Plus a crucial aspect looks to be that there are no "chained" uses of
>> continue_hypercall_on_cpu() anymore (the microcode loading one being
>> gone now) - afaict any such wouldn't guarantee forward progress with
>> this new model (without recording somewhere which CPUs had been dealt
>> with already).
> 
> I'd forgotten that we had that, but I can't say I'm sad to see the back
> of it.  I recall at the time saying that it wasn't a clever move.
> 
> For now, I suggest that we ignore that case.  If an when a real usecase
> appears, we can consider making adjustments.

Oh, of course - I didn't mean to even remotely suggest anything else.

Jan
Andrew Cooper Dec. 11, 2019, 9 a.m. UTC | #6
On 11/12/2019 07:41, Jan Beulich wrote:
> On 10.12.2019 18:55, Andrew Cooper wrote:
>> On 10/12/2019 08:55, Jan Beulich wrote:
>>> On 09.12.2019 18:49, Andrew Cooper wrote:
>>>> On 09/12/2019 16:52, Jan Beulich wrote:
>>>>> On 05.12.2019 23:30, Andrew Cooper wrote:
>>>>>> Some hypercalls tasklets want to create a continuation, rather than fail the
>>>>>> hypercall with a hard error.  By the time the tasklet is executing, it is too
>>>>>> late to create the continuation, and even continue_hypercall_on_cpu() doesn't
>>>>>> have enough state to do it correctly.
>>>>> I think it would be quite nice if you made clear what piece of state
>>>>> it is actually missing. To be honest, I don't recall anymore.
>>>> How to correctly mutate the registers and/or memory (which is specific
>>>> to the hypercall subop in some cases).
>>> Well, in-memory arguments can be accessed as long as the mapping is
>>> the right one (which it typically wouldn't be inside a tasklet). Do
>>> existing continue_hypercall_on_cpu() users need this? Looking over
>>> patch 4 again, I didn't think so. (Which isn't to say that removing
>>> the latent issue is not a good thing.)
>>>
>>> In-register values can be changed as long as the respective exit
>>> path will suitably pick up the value, which I thought was always
>>> the case.
>>>
>>> Hence I'm afraid your single reply sentence didn't really clarify
>>> matters. I'm sorry if this is just because of me being dense.
>> How, physically, would you arrange for continue_hypercall_on_cpu() to
>> make the requisite state adjustments?
> You can't (at least not without having sufficient further context),
> I agree. Yet ...
>
>> Yes - registers and memory can be accessed, but only the hypercall
>> (sub?)op handler knows how to mutate them appropriately.
>>
>> You'd have to copy the mutation logic into continue_hypercall_on_cpu(),
>> and pass in op/subops and a union of all pointers, *and* whatever
>> intermediate state the subop handler needs.
>>
>> Or you can return -ERESTART and let the caller DTRT with the state it
>> has in context, as it would in other cases requiring a continuation.
> ... it continues to be unclear to me whether you're fixing an actual
> bug here, or just a latent one.

I'm not fixing any bug.

I am making a fundamental change in behaviour, so tasklet context can
use -ERESTART.

Tasklet context doesn't even know what the primary hypercall index needs
to be to correctly create a continuation.

~Andrew
diff mbox series

Patch

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index a20474f87c..5d35d2b7e9 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1489,6 +1489,7 @@  void arch_hypercall_tasklet_result(struct vcpu *v, long res)
 {
     struct cpu_user_regs *regs = &v->arch.cpu_info->guest_cpu_user_regs;
 
+    regs->pc += 4;  /* Skip over 'hvc #XEN_HYPERCALL_TAG' */
     HYPERCALL_RESULT_REG(regs) = res;
 }
 
diff --git a/xen/arch/x86/hypercall.c b/xen/arch/x86/hypercall.c
index 7f299d45c6..42d95f9b9a 100644
--- a/xen/arch/x86/hypercall.c
+++ b/xen/arch/x86/hypercall.c
@@ -170,6 +170,13 @@  void arch_hypercall_tasklet_result(struct vcpu *v, long res)
 {
     struct cpu_user_regs *regs = &v->arch.user_regs;
 
+    /*
+     * PV hypercalls are all 2-byte instructions (INT $0x82, SYSCALL).  HVM
+     * hypercalls are all 3-byte instructions (VMCALL, VMMCALL).
+     *
+     * Move %rip forwards to complete the continuation.
+     */
+    regs->rip += 2 + is_hvm_vcpu(v);
     regs->rax = res;
 }
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index ab7e4d09c0..eb69db3078 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1665,7 +1665,7 @@  static void continue_hypercall_tasklet_handler(void *data)
 {
     struct migrate_info *info = data;
     struct vcpu *v = info->vcpu;
-    long res = -EINVAL;
+    long res = -ERESTART;
 
     /* Wait for vcpu to sleep so that we can access its register state. */
     vcpu_sleep_sync(v);
@@ -1675,7 +1675,8 @@  static void continue_hypercall_tasklet_handler(void *data)
     if ( likely(info->cpu == smp_processor_id()) )
         res = info->func(info->data);
 
-    arch_hypercall_tasklet_result(v, res);
+    if ( res != -ERESTART )
+        arch_hypercall_tasklet_result(v, res);
 
     this_cpu(continue_info) = NULL;
 
@@ -1726,8 +1727,8 @@  int continue_hypercall_on_cpu(
 
     tasklet_schedule_on_cpu(&info->vcpu->continue_hypercall_tasklet, cpu);
 
-    /* Dummy return value will be overwritten by tasklet. */
-    return 0;
+    /* Start a continuation.  Value will be overwritten by the tasklet. */
+    return -ERESTART;
 }
 
 /*
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 1cb205d977..83c737bca4 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -96,9 +96,11 @@  void domctl_lock_release(void);
 
 /*
  * Continue the current hypercall via func(data) on specified cpu.
- * If this function returns 0 then the function is guaranteed to run at some
- * point in the future. If this function returns an error code then the
- * function has not been and will not be executed.
+ *
+ * This function returns -ERESTART in the success case, and a higher level
+ * caller is required to set up a hypercall continuation.  func() will be run
+ * at some point in the future.  If this function returns any other error code
+ * then func() has not, and will not be executed.
  */
 int continue_hypercall_on_cpu(
     unsigned int cpu, long (*func)(void *data), void *data);
@@ -106,6 +108,9 @@  int continue_hypercall_on_cpu(
 /*
  * Companion to continue_hypercall_on_cpu(), to feed func()'s result back into
  * vcpu regsiter state.
+ *
+ * Must undo the effects of the hypercall continuation created by
+ * continue_hypercall_on_cpu()'s caller.
  */
 void arch_hypercall_tasklet_result(struct vcpu *v, long res);