[v8,05/13] tools/libxc: support to resume uncooperative HVM guests

Message ID 1455763403-18641-6-git-send-email-wency@cn.fujitsu.com (mailing list archive)
State New, archived

Commit Message

Wen Congyang Feb. 18, 2016, 2:43 a.m. UTC
Before this patch:
1. Suspend
a. PVHVM and PV: we suspend the guest the same way, by sending it a suspend
   request. If the guest doesn't support evtchn, the xenstore variant is
   used instead, suspending the guest via the XenBus control node.
b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to suspend
   the guest.

2. Resume
a. fast path (fast=1)
   Does not change the guest state. We call libxl__domain_resume(..., 1),
   which calls xc_domain_resume(..., 1 /* fast=1 */) to resume the guest.
   PV:       modify the return code to 1, and then call the domctl
             XEN_DOMCTL_resumedomain
   PVHVM:    same as PV
   pure HVM: do nothing in modify_returncode, and then call the domctl
             XEN_DOMCTL_resumedomain
b. slow path (fast=0)
   Used when the guest's state has been changed. We call
   libxl__domain_resume(..., 0) to resume the guest.
   PV:       update start info and reset all secondary CPU states, then call
             the domctl XEN_DOMCTL_resumedomain
   PVHVM:    cannot be resumed. You will get the following error message:
                 "Cannot resume uncooperative HVM guests"
   pure HVM: same as PVHVM

After this patch:
1. Suspend
   unchanged

2. Resume
a. fast path:
   unchanged
b. slow path:
   PV:       unchanged
   PVHVM:    call XEN_DOMCTL_resumedomain to resume the guest. Because we
             don't modify the return code, the PV driver will disconnect
             and reconnect.
             The guest ends up doing the XENMAPSPACE_shared_info
             XENMEM_add_to_physmap hypercall and resetting all of its CPU
             states to point to the shared_info (well, except the ones past
             32). That is, the Linux kernel does this regardless of whether
             SCHEDOP_shutdown:SHUTDOWN_suspend returns 1 or not.
   pure HVM: call XEN_DOMCTL_resumedomain to resume the guest.

Under COLO, we update the guest's state (memory, CPU registers, device
status, ...). In this case, we cannot use the fast path to resume it.
We keep the return code 0 and use the slow path to resume the guest. Because
resuming an HVM guest via the slow path is not currently supported, this
patch makes that resume call not fail.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxc/xc_resume.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)
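
The before/after behaviour listed in the commit message can be read as a small decision table. The following is a minimal, self-contained C model of that table, not the libxc code itself: the enum names, resume_plan(), and the "patched" flag are made up for illustration, and only the outcomes stated in the commit message are encoded.

```c
#include <stdbool.h>

/* Guest flavours distinguished in the commit message (hypothetical names). */
enum guest_kind { GUEST_PV, GUEST_PVHVM, GUEST_HVM };

/* What the toolstack does before issuing XEN_DOMCTL_resumedomain. */
enum resume_prep {
    PREP_SET_RETURN_CODE_1,  /* fast path, PV and PVHVM */
    PREP_NONE,               /* nothing besides the domctl itself */
    PREP_REBUILD_PV_STATE,   /* slow path, PV: start info + secondary CPUs */
    PREP_UNSUPPORTED         /* "Cannot resume uncooperative HVM guests" */
};

/*
 * Model of the commit message's table. "patched" selects the behaviour
 * described under "After this patch"; it only matters for (PV)HVM guests
 * on the slow path.
 */
enum resume_prep resume_plan(enum guest_kind kind, bool fast, bool patched)
{
    if (fast)
        return kind == GUEST_HVM ? PREP_NONE : PREP_SET_RETURN_CODE_1;

    if (kind == GUEST_PV)
        return PREP_REBUILD_PV_STATE;

    /* PVHVM and pure HVM on the slow path. */
    return patched ? PREP_NONE : PREP_UNSUPPORTED;
}
```

In this model the patch changes exactly two cells of the table: PVHVM and pure HVM with fast=0 move from PREP_UNSUPPORTED to PREP_NONE, i.e. a bare XEN_DOMCTL_resumedomain.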

Comments

Wei Liu Feb. 18, 2016, 12:13 p.m. UTC | #1
I proposed an alternative commit log in a previous reply:

===
Use XEN_DOMCTL_resumedomain to resume (PV)HVM guest in slow path

Previously it was not possible to resume a PVHVM or pure HVM guest in the
slow path because libxc didn't support that.

Using XEN_DOMCTL_resumedomain without modifying the guest return code to
resume a guest is considered to be always safe. Introduce a function to do
that for (PV)HVM guests in slow path resume.

This patch fixes a bug that denies (PV)HVM slow path resume. This will
enable COLO to work properly: COLO requires the HVM guest to start in the
new context that has been set up by COLO, hence slow path resume is
required.
===

Note that I fixed one place in this version, changing "guest state" to
"guest return code" in the second paragraph. And that sentence is a big
assumption that I don't know to be true or not; it is reverse-engineered
from the comment before xc_domain_resume and from what Linux does.

But the more I think about it, the less sure I am that I'm writing the right
thing. I also can't judge what the right behaviour is on the Linux side.

Konrad, can you fact-check the commit message a bit? And maybe you can
help answer the following questions?

1. If we use fast=0 on PVHVM guest, will it work?
2. If we use fast=0 on HVM guest, will it work?

What is worse, when I say "work" I actually have no clear definition of
it. There doesn't seem to be a defined state that the guest needs to be in.

Wei.
Konrad Rzeszutek Wilk Feb. 19, 2016, 2:15 p.m. UTC | #2
On Thu, Feb 18, 2016 at 12:13:36PM +0000, Wei Liu wrote:
> Konrad, can you fact-check the commit message a bit? And maybe you can
> help answer the following questions?
> 
> 1. If we use fast=0 on PVHVM guest, will it work?

Yes.
> 2. If we use fast=0 on HVM guest, will it work?

Yes.

> 
> What is worse, when I say "work" I actually have no clear definition of
> it. There doesn't seem to be a defined state that the guest needs to be.

For PVHVM guests, fast = 0 requires that the guest make a hypercall
to SCHEDOP_shutdown(SHUTDOWN_suspend). After the hypercall has
completed (so Xen has suspended the guest and later resumed it), it
is the guest's responsibility to set up the Xen infrastructure again, as in
retrieve the shared_info (XENMAPSPACE_shared_info), set up XenBus, etc.

For HVM guests, fast = 0 suspends the guest without the guest making
any hypercalls. It is in effect the hypervisor injecting an S3 suspend.
Afterwards the guest is resumed and continues as usual. There are no PV
drivers, hence no need to re-establish the Xen PV infrastructure.

Hope this helps.
> 
> Wei.
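
One way to picture the PVHVM case described above: the guest's suspend hypercall eventually returns, and the value it sees decides how much the guest has to rebuild. The sketch below is a self-contained C model under stated assumptions. after_suspend_returns() and the struct fields are hypothetical names, not Linux or Xen identifiers. It follows the commit message's statements that Linux re-maps shared_info whether or not the call returns 1, and that the PV drivers disconnect and reconnect when the return code is left unmodified; that they skip the reconnect when the code is 1 is an inference, not something the thread states outright.

```c
#include <stdbool.h>

/*
 * Work a PV-aware HVM guest does once the suspend hypercall returns
 * (hypothetical model, not kernel code).
 */
struct resume_work {
    bool remap_shared_info;    /* XENMEM_add_to_physmap, XENMAPSPACE_shared_info */
    bool reconnect_pv_devices; /* PV drivers disconnect and reconnect */
};

/*
 * suspend_rc models the value returned by
 * SCHEDOP_shutdown(SHUTDOWN_suspend):
 *   1 - the toolstack modified the return code (fast resume)
 *   0 - the return code was left alone (slow resume)
 */
struct resume_work after_suspend_returns(int suspend_rc)
{
    struct resume_work w;

    /* Per the commit message, Linux does this regardless of the code. */
    w.remap_shared_info = true;

    /* Assumption: only an unmodified return code triggers a reconnect. */
    w.reconnect_pv_devices = (suspend_rc != 1);

    return w;
}
```

A pure HVM guest has no counterpart to this function: it makes no hypercall, so there is no return code for it to inspect.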
Wei Liu Feb. 19, 2016, 2:43 p.m. UTC | #3
On Fri, Feb 19, 2016 at 09:15:38AM -0500, Konrad Rzeszutek Wilk wrote:
> For PVHVM guests, fast = 0, requires that the guest makes an hypercall
> to  SCHEDOP_shutdown(SHUTDOWN_suspend). After the hypercall has
> completed (so Xen has suspended the guest then later resumed it), it
> would be the guest responsibility to setup Xen infrastructure. As in
> retrieve the shared_info (XENMAPSPACE_shared_info), setup XenBus, etc.
> 
> For HVM guests, fast = 0, suspends the guests without the guest making
> any hypercalls. It is in effect the hypervisor injecting an S3 suspend.
> Afterwards the guest is resumed and continues as usual. No PV drivers -
> hence no need to re-establish Xen PV infrastructure.
> 

Wait, isn't this function about resuming a guest? I'm confused because
you talk about the HV injecting an S3 suspend. I guess you wrote the wrong
thing?

My guess is below, from the perspective of resuming a guest:

  A PVHVM guest would have used SCHEDOP_shutdown(SHUTDOWN_suspend) to
  suspend. So when the toolstack uses fast=0, the guest resumes from the
  hypercall with the return code unmodified. The guest then sets up the Xen
  infrastructure again.

  An HVM guest would have used S3 suspend to suspend itself. So when the
  toolstack uses fast=0, the hypervisor injects an S3 resume and the guest
  just takes the normal path like a real machine does.

Does that make sense?

Wei.

Ian Campbell Feb. 19, 2016, 2:52 p.m. UTC | #4
On Fri, 2016-02-19 at 14:43 +0000, Wei Liu wrote:
> My guess is below, from the perspective of resuming a guest
> 
>   PVHVM guest would have used SCHEDOP_shutdown(SHUTDOWN_suspend) to
>   suspend. So when toolstack uses fast=0, the guest resumes from the
>   hypercall with return code unmodified. Guest then re-setup Xen
>   infrastructure.

Who or what has torn down the existing infrastructure from the guest's life
before the suspend in this case? AFAIR a guest expects to return
from SCHEDOP_shutdown(SHUTDOWN_suspend) with return code == 0 in a freshly
minted new domain, but in the resume case it is actually resuming in the
original domain, complete with any evtchns, grant table mappings, etc.
still intact from before it slept.

Perhaps I'm misremembering and the guest is expected to deal with the
possibility of resources already being in place when it sets the infra up
again?

Ian.
Wei Liu Feb. 19, 2016, 3:16 p.m. UTC | #5
On Fri, Feb 19, 2016 at 02:52:11PM +0000, Ian Campbell wrote:
> Who or what has torn down the existing infrastructure from the guest's life
> before the suspend in this case? AFAI Remember a guest expects to return
> from SCHEDOP_shutdown(SHUTDOWN_suspend) with return code == 0 in a freshly
> minted new domain, but in the resume case it is actually resuming in the
> original domain, complete with any evtchn's and grant tables mappings etc
> still intact from before it slept.
> 
> Perhaps I'm misremembering and the guest is expected to deal with the
> possibility of resources already being in place when it re-sets up the
> infra?
> 

Sigh, this is the sort of thing that gets on my nerves. I should try to
write something down when we come to a conclusion. I would be happy to
have any definite answer on the expected behaviour of the guest.
Extrapolation is not very helpful in the face of so many different
versions of Linux and the BSDs.

But if the confusion is only about PVHVM guests with fast=0, we can
forbid that specific combination for now. That should be enough to move
COLO forward.

Wei.

Konrad Rzeszutek Wilk Feb. 19, 2016, 4:20 p.m. UTC | #6
On Fri, Feb 19, 2016 at 03:16:27PM +0000, Wei Liu wrote:
> On Fri, Feb 19, 2016 at 02:52:11PM +0000, Ian Campbell wrote:
> > On Fri, 2016-02-19 at 14:43 +0000, Wei Liu wrote:
> > > On Fri, Feb 19, 2016 at 09:15:38AM -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Thu, Feb 18, 2016 at 12:13:36PM +0000, Wei Liu wrote:
> > > > > On Thu, Feb 18, 2016 at 10:43:15AM +0800, Wen Congyang wrote:
> > > > > > Before this patch:
> > > > > > 1. suspend
> > > > > > a. PVHVM and PV: we use the same way to suspend the guest (send the
> > > > > > suspend
> > > > > >    request to the guest). If the guest doesn't support evtchn, the
> > > > > > xenstore
> > > > > >    variant will be used, suspending the guest via XenBus control
> > > > > > node.
> > > > > > b. pure HVM: we call xc_domain_shutdown(..., SHUTDOWN_suspend) to
> > > > > > suspend
> > > > > >    the guest
> > > > > > 
> > > > > > 2. Resume:
> > > > > > a. fast path(fast=1)
> > > > > >    Do not change the guest state. We call libxl__domain_resume(..,
> > > > > > 1) which
> > > > > >    calls xc_domain_resume(..., 1 /* fast=1*/) to resume the guest.
> > > > > >    PV:       modify the return code to 1, and than call the domctl:
> > > > > >              XEN_DOMCTL_resumedomain
> > > > > >    PVHVM:    same with PV
> > > > > >    pure HVM: do nothing in modify_returncode, and than call the
> > > > > > domctl:
> > > > > >              XEN_DOMCTL_resumedomain
> > > > > > b. slow
> > > > > >    Used when the guest's state have been changed. Will call
> > > > > >    libxl__domain_resume(..., 0) to resume the guest.
> > > > > >    PV:       update start info, and reset all secondary CPU states.
> > > > > > Than call
> > > > > >              the domctl: XEN_DOMCTL_resumedomain
> > > > > >    PVHVM:    can not be resumed. You will get the following error
> > > > > > message:
> > > > > >                  "Cannot resume uncooperative HVM guests"
> > > > > >    pure HVM: same with PVHVM
> > > > > > 
> > > > > > After this patch:
> > > > > > 1. suspend
> > > > > >    unchanged
> > > > > > 
> > > > > > 2. Resume
> > > > > > a. fast path:
> > > > > >    unchanged
> > > > > > b. slow
> > > > > >    PV:       unchanged
> > > > > >    PVHVM:    call XEN_DOMCTL_resumedomain to resume the guest.
> > > > > > Because we
> > > > > >              don't modify the return code, the PV driver will
> > > > > > disconnect
> > > > > >              and reconnect.
> > > > > >              The guest ends up doing the XENMAPSPACE_shared_info
> > > > > >              XENMEM_add_to_physmap hypercall and resetting all of
> > > > > > its CPU
> > > > > >              states to point to the shared_info(well except the
> > > > > > ones past 32).
> > > > > >              That is the Linux kernel does that - regardless
> > > > > > whether the
> > > > > >              SCHEDOP_shutdown:SHUTDOWN_suspend returns 1 or not.
> > > > > >    Pure HVM: call XEN_DOMCTL_resumedomain to resume the guest.
> > > > > > 
> > > > > > Under COLO, we will update the guest's state(modify memory, cpu's
> > > > > > registers,
> > > > > > device status...). In this case, we cannot use the fast path to
> > > > > > resume it.
> > > > > > Keep the return code 0, and use a slow path to resume the guest.
> > > > > > While
> > > > > > resuming HVM using slow path is not supported currently, this patch
> > > > > > is to
> > > > > > make the resume call to not fail.
> > > > > > 
> > > > > > Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> > > > > > Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> > > > > > Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > > > 
> > > > > I proposed an alternative commit log in a previous reply:
> > > > > 
> > > > > ===
> > > > > Use XEN_DOMCTL_resumedomain to resume (PV)HVM guest in slow path
> > > > > 
> > > > > Previously it was not possible to resume PVHVM or pure HVM guest in
> > > > > slow
> > > > > path because libxc didn't support that.
> > > > > 
> > > > > Using XEN_DOMCTL_resumedomain without modifying guest return code  to
> > > > > resume a
> > > > > guest is considered to be always safe.  Introduce a function to do
> > > > > that for
> > > > > (PV)HVM guests in slow path resume.
> > > > > 
> > > > > This patch fixes a bug that denies (PV)HVM slow path resume.  This
> > > > > will
> > > > > enable COLO to work properly:  COLO requires HVM guest to start in
> > > > > the
> > > > > new context that has been set up by COLO, hence slow path resume is
> > > > > required.
> > > > > ===
> > > > > 
> > > > > Note that I fix one place in this version from "guest state" to
> > > > > "guest
> > > > > return code" in the second paragraph. And that sentence is a big big
> > > > > assumption that I don't know whether it is true or not --
> > > > > reverse-engineer from comment before xc_domain_resume and what Linux
> > > > > does.
> > > > > 
> > > > > But the more I think the more I'm not sure if I'm writing the right
> > > > > thing. I also can't judge what is the right behaviour on the Linux
> > > > > side.
> > > > > 
> > > > > Konrad, can you fact-check the commit message a bit? And maybe you
> > > > > can
> > > > > help answer the following questions?
> > > > > 
> > > > > 1. If we use fast=0 on PVHVM guest, will it work?
> > > > 
> > > > Yes.
> > > > > 2. If we use fast=0 on HVM guest, will it work?
> > > > 
> > > > Yes.
> > > > 
> > > > > 
> > > > > What is worse, when I say "work" I actually have no clear definition
> > > > > of
> > > > > it. There doesn't seem to be a defined state that the guest needs to
> > > > > be.
> > > > 
> > > > For PVHVM guests, fast = 0, requires that the guest makes an hypercall
> > > > to  SCHEDOP_shutdown(SHUTDOWN_suspend). After the hypercall has
> > > > completed (so Xen has suspended the guest then later resumed it), it
> > > > would be the guest responsibility to setup Xen infrastructure. As in
> > > > retrieve the shared_info (XENMAPSPACE_shared_info), setup XenBus, etc.
> > > > 
> > > > For HVM guests, fast = 0, suspends the guests without the guest making
> > > > any hypercalls. It is in effect the hypervisor injecting an S3 suspend.
> > > > Afterwards the guest is resumed and continues as usual. No PV drivers -
> > > > hence no need to re-establish Xen PV infrastructure.
> > > > 
> > > 
> > > Wait, isn't this function about resuming a guest? I'm confused because
> > > you talk about HV injecting S3 suspend. I guess you wrote the wrong
> > > thing?

I was writing the whole chain - suspend, and then resume. This patch is
about resume - but to get to resume you need to suspend first.

> > > 
> > > My guess is below, from the perspective of resuming a guest
> > > 
> > >   PVHVM guest would have used SCHEDOP_shutdown(SHUTDOWN_suspend) to
> > >   suspend. So when toolstack uses fast=0, the guest resumes from the
> > >   hypercall with return code unmodified. Guest then re-setup Xen
> > >   infrastructure.
> > 
> > Who or what has torn down the existing infrastructure from the guest's life
> > before the suspend in this case? AFAI Remember a guest expects to return

The guest. Or it can ignore it and just re-init all its settings.

> > from SCHEDOP_shutdown(SHUTDOWN_suspend) with return code == 0 in a freshly
> > minted new domain, but in the resume case it is actually resuming in the
> > original domain, complete with any evtchn's and grant tables mappings etc
> > still intact from before it slept.
> > 
> > Perhaps I'm misremembering and the guest is expected to deal with the
> > possibility of resources already being in place when it re-sets up the
> > infra?

Correct - albeit all of them are stale. Though on some off-chance they may
be set correctly.

> > 
> 
> Sigh, this is the sort of thing that gets on my nerves. I should try to
> write something down when we come to a conclusion.  I would be happy to
> have a definite answer on the expected behaviour of the guest.
> Extrapolation is not very helpful in the face of so many different
> versions of Linux and the BSDs.
> 
> But, if the confusion is only about PVHVM guest with fast=0, we can
> forbid that specific combination for now. That should be enough to move
> COLO forward.

.. forbid what? PVHVM resuming with fast=0? Why?  Because the guest may
fall on its face?
> 
> Wei.
> 
> > Ian.
> >
Wei Liu Feb. 19, 2016, 4:42 p.m. UTC | #7
On Fri, Feb 19, 2016 at 11:20:08AM -0500, Konrad Rzeszutek Wilk wrote:
[...]
> > > > > > ===
> > > > > > 
> > > > > > Note that I fix one place in this version from "guest state" to
> > > > > > "guest
> > > > > > return code" in the second paragraph. And that sentence is a big big
> > > > > > assumption that I don't know whether it is true or not --
> > > > > > reverse-engineer from comment before xc_domain_resume and what Linux
> > > > > > does.
> > > > > > 
> > > > > > But the more I think the more I'm not sure if I'm writing the right
> > > > > > thing. I also can't judge what is the right behaviour on the Linux
> > > > > > side.
> > > > > > 
> > > > > > Konrad, can you fact-check the commit message a bit? And maybe you
> > > > > > can
> > > > > > help answer the following questions?
> > > > > > 
> > > > > > 1. If we use fast=0 on PVHVM guest, will it work?
> > > > > 
> > > > > Yes.
> > > > > > 2. If we use fast=0 on HVM guest, will it work?
> > > > > 
> > > > > Yes.
> > > > > 
> > > > > > 
> > > > > > What is worse, when I say "work" I actually have no clear definition
> > > > > > of
> > > > > > it. There doesn't seem to be a defined state that the guest needs to
> > > > > > be.
> > > > > 
> > > > > For PVHVM guests, fast = 0, requires that the guest makes an hypercall
> > > > > to  SCHEDOP_shutdown(SHUTDOWN_suspend). After the hypercall has
> > > > > completed (so Xen has suspended the guest then later resumed it), it
> > > > > would be the guest responsibility to setup Xen infrastructure. As in
> > > > > retrieve the shared_info (XENMAPSPACE_shared_info), setup XenBus, etc.
> > > > > 
> > > > > For HVM guests, fast = 0, suspends the guests without the guest making
> > > > > any hypercalls. It is in effect the hypervisor injecting an S3 suspend.
> > > > > Afterwards the guest is resumed and continues as usual. No PV drivers -
> > > > > hence no need to re-establish Xen PV infrastructure.
> > > > > 
> > > > 
> > > > Wait, isn't this function about resuming a guest? I'm confused because
> > > > you talk about HV injecting S3 suspend. I guess you wrote the wrong
> > > > thing?
> 
> I was writing the whole chain - suspend, and then resume. This patch is
> about resume - but to get to resume you need to suspend first.
> 

Yes, of course. I was thinking more about writing it down as a comment for
xc_domain_resume, so I wrote something from the perspective of resuming.

If you don't disagree with my extrapolation in the previous email we don't
need to quibble about the wording anymore.

> > > > 
> > > > My guess is below, from the perspective of resuming a guest
> > > > 
> > > >   PVHVM guest would have used SCHEDOP_shutdown(SHUTDOWN_suspend) to
> > > >   suspend. So when toolstack uses fast=0, the guest resumes from the
> > > >   hypercall with return code unmodified. Guest then re-setup Xen
> > > >   infrastructure.
> > > 
> > > Who or what has torn down the existing infrastructure from the guest's life
> > > before the suspend in this case? AFAI Remember a guest expects to return
> 
> The guest. Or it can ignore it and just re-init all its settings.
> 
> > > from SCHEDOP_shutdown(SHUTDOWN_suspend) with return code == 0 in a freshly
> > > minted new domain, but in the resume case it is actually resuming in the
> > > original domain, complete with any evtchn's and grant tables mappings etc
> > > still intact from before it slept.
> > > 
> > > Perhaps I'm misremembering and the guest is expected to deal with the
> > > possibility of resources already being in place when it re-sets up the
> > > infra?
> 
> Correct - albeit all of them are stale. Though on some off-chance they may
> be set correctly.
> 
> > > 
> > 
> > Sigh, this is the sort of thing that gets on my nerves. I should try to
> > write something down when we come to a conclusion.  I would be happy to
> > have a definite answer on the expected behaviour of the guest.
> > Extrapolation is not very helpful in the face of so many different
> > versions of Linux and the BSDs.
> > 
> > But, if the confusion is only about PVHVM guest with fast=0, we can
> > forbid that specific combination for now. That should be enough to move
> > COLO forward.
> 
> .. forbid what? PVHVM resuming with fast=0? Why?  Because the guest may
> fall on its face?

Yes, forbid resuming PVHVM with fast=0 if we have no clear definition of
how it works. It's not because the guest would fall, it's because we can't
tell which side (the guest or the toolstack) is buggy when the guest
falls.

But it looks like we (you ;-) ) have a clear idea of how it works, we
(you) just need to write it down.

Wei.

> > 
> > Wei.
> > 
> > > Ian.
> > >
Konrad Rzeszutek Wilk Feb. 19, 2016, 5:16 p.m. UTC | #8
> > .. forbid what? PVHVM resuming with fast=0? Why?  Because the guest may
> > fall on its face?
> 
> Yes, forbid resuming PVHVM with fast=0 if we have no clear definition of
> how it works. It's not because the guest would fall, it's because we can't
> tell which side (the guest or the toolstack) is buggy when the guest
> falls.
> 
> But it looks like we (you ;-) ) have a clear idea of how it works, we
> (you) just need to write it down.


Where? The header file where SHUTDOWN_suspend is introduced?

Or the libxc ones?
Wei Liu Feb. 19, 2016, 5:21 p.m. UTC | #9
On Fri, Feb 19, 2016 at 12:16:31PM -0500, Konrad Rzeszutek Wilk wrote:
> > > .. forbid what? PVHVM resuming with fast=0? Why?  Because the guest may
> > > fall on its face?
> > 
> > Yes, forbid resuming PVHVM with fast=0 if we have no clear definition of
> > how it works. It's not because the guest would fall, it's because we can't
> > tell which side (the guest or the toolstack) is buggy when the guest
> > falls.
> > 
> > But it looks like we (you ;-) ) have a clear idea of how it works, we
> > (you) just need to write it down.
> 
> 
> Where? The header file where SHUTDOWN_suspend is introduced?
> 
> Or the libxc ones?

I have no opinion on whether the Xen public header should contain such text,
but I do wish to have better documentation for xc_domain_resume.  Basically
it is just turning what you wrote in this thread into a comment for
xc_domain_resume.

Wei.
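
As a starting point for the kind of documentation asked for above, the
behaviour described in the commit message (after this patch) can be
summarised as a small decision table. This is only a sketch for discussion,
not libxc code: the enum names and resume_action_for() are hypothetical and
exist nowhere in the tree; only the behaviour per guest type and per value
of "fast" is taken from the commit message.

```c
#include <assert.h>

/* Hypothetical guest classification, for illustration only. */
enum guest_kind { GUEST_PV, GUEST_PVHVM, GUEST_PURE_HVM };

/* Hypothetical summary of what the toolstack does on resume. */
enum resume_action {
    /* Set the suspend hypercall return code to 1, then XEN_DOMCTL_resumedomain. */
    RESUME_SET_RC_THEN_DOMCTL,
    /* XEN_DOMCTL_resumedomain only; the return code is left untouched. */
    RESUME_DOMCTL_ONLY,
    /* Update start info, reset secondary CPU states, then XEN_DOMCTL_resumedomain. */
    RESUME_REBUILD_PV_THEN_DOMCTL
};

/* Map (guest kind, fast flag) to the action described in the commit message. */
static enum resume_action resume_action_for(enum guest_kind kind, int fast)
{
    assert(fast == 0 || fast == 1);

    if (fast) {
        /* Fast path: guest state unchanged; pure HVM has no return code to modify. */
        return kind == GUEST_PURE_HVM ? RESUME_DOMCTL_ONLY
                                      : RESUME_SET_RC_THEN_DOMCTL;
    }

    /* Slow path: PV is rebuilt; PVHVM and pure HVM are simply resumed. */
    return kind == GUEST_PV ? RESUME_REBUILD_PV_THEN_DOMCTL
                            : RESUME_DOMCTL_ONLY;
}
```

For a PVHVM guest with fast=0 the table yields RESUME_DOMCTL_ONLY: the guest
returns from SCHEDOP_shutdown(SHUTDOWN_suspend) with the return code
unmodified and is then expected to re-establish its Xen infrastructure
itself, as discussed in this thread.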

Patch

diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index e692b81..4eedf87 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -108,6 +108,26 @@  static int xc_domain_resume_cooperative(xc_interface *xch, uint32_t domid)
     return do_domctl(xch, &domctl);
 }
 
+static int xc_domain_resume_hvm(xc_interface *xch, uint32_t domid)
+{
+    DECLARE_DOMCTL;
+
+    /*
+     * The domctl XEN_DOMCTL_resumedomain unpauses each vcpu. After
+     * the domctl, the guest will run.
+     *
+     * If it is PVHVM, the guest called the hypercall
+     *    SCHEDOP_shutdown:SHUTDOWN_suspend
+     * to suspend itself. We don't modify the return code, so the PV driver
+     * will disconnect and reconnect.
+     *
+     * If it is a pure HVM guest, the guest will continue running.
+     */
+    domctl.cmd = XEN_DOMCTL_resumedomain;
+    domctl.domain = domid;
+    return do_domctl(xch, &domctl);
+}
+
 static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
 {
     DECLARE_DOMCTL;
@@ -137,10 +157,7 @@  static int xc_domain_resume_any(xc_interface *xch, uint32_t domid)
      */
 #if defined(__i386__) || defined(__x86_64__)
     if ( info.hvm )
-    {
-        ERROR("Cannot resume uncooperative HVM guests");
-        return rc;
-    }
+        return xc_domain_resume_hvm(xch, domid);
 
     if ( xc_domain_get_guest_width(xch, domid, &dinfo->guest_width) != 0 )
     {