
[RFC] libxc: Document xc_domain_resume

Message ID 1456775966-8475-1-git-send-email-konrad.wilk@oracle.com (mailing list archive)
State New, archived

Commit Message

Konrad Rzeszutek Wilk Feb. 29, 2016, 7:59 p.m. UTC
Document the save and suspend mechanism.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxc/include/xenctrl.h | 52 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

Comments

Andrew Cooper March 1, 2016, 12:16 a.m. UTC | #1
On 29/02/2016 19:59, Konrad Rzeszutek Wilk wrote:
> Document the save and suspend mechanism.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  tools/libxc/include/xenctrl.h | 52 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
>
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 150d727..9778947 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -565,6 +565,58 @@ int xc_domain_destroy(xc_interface *xch,
>   * This function resumes a suspended domain. The domain should have
>   * been previously suspended.
>   *
> + * Note that there is no 'xc_domain_suspend' as suspending a domain
> + * is quite the endeavour. As such this long comment will describe the
> + * suspend and resume path.

I am not sure this second sentence is useful.

> + *
> + * For the purpose of this explanation there are three guests:
> + * PV (using hypercalls for privileged operations), HVM
> + * (fully hardware virtualized guests using emulated devices for everything),
> + * and PVHVM (hardware virtualized guest with PV drivers).

PV aware with hardware virtualisation.  It is perfectly possible to be
"PV aware" without having blkfront and netfront drivers.  I realise this
is a grey area, but "PV drivers" does tend to imply the blk/net
protocols rather than the full "PV awareness".

> + *
> + * HVM guests are the simplest - they suspend via S3 and resume from
> + * S3. Upon resume they have to re-negotiate with the emulated devices.

And S4.

> + *
> + * PV and PVHVM communate via via hypercalls for suspend (and resume).
> + * For suspend the toolstack initiates the process by writing the string
> + * "suspend" into the XenBus key "control/shutdown".

I feel it is worth commenting about the stupidity of this protocol
whereby the ack mechanism is to clear the key, and the only reject/fail
mechanism is to leave the key unmodified and wait for the toolstack to
timeout.  (Similarly memory/target for ballooning.)
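
For illustration, a minimal sketch of the toolstack side of that protocol,
assuming libxenstore's xs_write()/xs_read() API from <xenstore.h> and a
simple polling loop (the real toolstack uses xenstore watches and its own
timeout handling; error handling is elided):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <xenstore.h>

/* Ask domid to suspend by writing "suspend" into control/shutdown, then
 * poll for the guest acknowledging by clearing the key.  Returns true if
 * the guest acked within timeout_s seconds. */
static bool request_guest_suspend(struct xs_handle *xs, int domid,
                                  int timeout_s)
{
    char path[64];

    snprintf(path, sizeof(path), "/local/domain/%d/control/shutdown", domid);

    if (!xs_write(xs, XBT_NULL, path, "suspend", strlen("suspend")))
        return false;

    while (timeout_s-- > 0) {
        unsigned int len = 0;
        char *val = xs_read(xs, XBT_NULL, path, &len);
        bool acked = (val == NULL || len == 0);

        free(val);
        if (acked)
            return true;    /* key cleared: the guest acked the request */
        sleep(1);
    }

    return false;           /* key untouched: treat as reject/fail */
}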

> + *
> + * The PV guest stashes anything it deems necessary in 'struct start_info'
> + * in case of failure (PVHVM may ignore this) and calls the

What do you mean for the failure case here?

> + * SCHEDOP_shutdown::SHUTDOWN_suspend hypercall (for PV as argument it
> + * passes the MFN of 'struct start_info').
> + *
> + * And then the guest is suspended.
> + *
> + * At this point the guest may be resumed on the same host under the same
> + * domain (checkpointing or suspending failed), or on a different host.

Slightly misleading.

The guest may be resumed in the same domain (in which case domid is the
same and all gubbins are still in place), or in a new domain; likely a
different domid, possibly a different host (but not impossible to switch
host and retain the same numeric domid) at which point all gubbins are lost.

> + *
> + * The checkpointing or notifying an guest that the suspend failed is by

"a guest"

> + * having the SCHEDOP_shutdown::SHUTDOWN_suspend hypercall return a non-zero
> + * value.

Do we have to document it as "suspend failed"?  In the case of a
checkpoint, it really isn't a failure.
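
To make that return-value convention concrete, a rough guest-side sketch
(the helper names below are hypothetical; a real kernel, e.g. Linux via
HYPERVISOR_suspend(), issues SCHEDOP_shutdown with reason SHUTDOWN_suspend
and, for PV, passes the MFN of 'struct start_info' as an extra argument):

#include <stdint.h>

/* Hypothetical wrapper around the guest kernel's hypercall mechanism. */
extern int hypervisor_suspend(uint64_t start_info_mfn);

/* Hypothetical resume helpers for the two cases described above. */
extern void resume_in_same_domain(void);  /* cancelled suspend / checkpoint */
extern void resume_in_new_domain(void);   /* restored or migrated domain */

static void guest_suspend(uint64_t start_info_mfn)
{
    /* ... quiesce devices, stash state needed if the suspend is cancelled ... */

    int rc = hypervisor_suspend(start_info_mfn);

    if (rc != 0)
        resume_in_same_domain();  /* non-zero: same domid, state still in place */
    else
        resume_in_new_domain();   /* zero: rediscover start_info, rebuild state */
}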

> + *
> + * The PV and PVHVM resume paths are similar. For PV it would be similar to bootup
> + * - figure out where the 'struct start_info' is (or if the suspend was
> + * cancelled aka checkpointed - reuse the saved values).

PV isn't similar to boot.

On boot, PV guests get start_info in %rsi (or %esi) from the domain
builder.  In the case of suspend (failed or otherwise), start_info is in
%rdx (or %edx), mutated as applicable by the save/restore logic.

For HVM, there is no start info relevant for suspend/resume.

~Andrew

> + *
> + * From here on they differ depending on whether the guest is PV or PVHVM
> + * in specifics but follow overall the same path:
> + *  - PV: Bringing up the vCPUs,
> + *  - PVHVM: Setting up the vector callback,
> + *  - Bring up vCPU runstates,
> + *  - Remap the grant tables if checkpointing or set them up from scratch,
> + *
> + * If the resume was not checkpointing (or if suspend was successful) we would
> + * set up the PV timers and the different PV events. Lastly the PV drivers
> + * re-negotiate with the backend.
> + *
> + * This function would return before the guest started resuming. That is,
> + * the guest would be in a non-running state and its vCPU context would be
> + * in the SCHEDOP_shutdown::SHUTDOWN_suspend hypercall return path
> + * (for PV and PVHVM). For HVM it would be in the QEMU-emulated
> + * BIOS handling S3 suspend.
> + *
>   * @parm xch a handle to an open hypervisor interface
>   * @parm domid the domain id to resume
>   * @parm fast use cooperative resume (guest must support this)
Wei Liu March 1, 2016, 1:43 p.m. UTC | #2
On Mon, Feb 29, 2016 at 02:59:26PM -0500, Konrad Rzeszutek Wilk wrote:
> Document the save and suspend mechanism.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  tools/libxc/include/xenctrl.h | 52 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 150d727..9778947 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -565,6 +565,58 @@ int xc_domain_destroy(xc_interface *xch,
>   * This function resumes a suspended domain. The domain should have
>   * been previously suspended.
>   *
> + * Note that there is no 'xc_domain_suspend' as suspending a domain
> + * is quite the endeavour. As such this long comment will describe the
> + * suspend and resume path.
> + *
> + * For the purpose of this explanation there are three guests:
> + * PV (using hypercalls for privileged operations), HVM
> + * (fully hardware virtualized guests using emulated devices for everything),
> + * and PVHVM (hardware virtualized guest with PV drivers).
> + *
> + * HVM guests are the simplest - they suspend via S3 and resume from
> + * S3. Upon resume they have to re-negotiate with the emulated devices.
> + *
> + * PV and PVHVM communate via via hypercalls for suspend (and resume).

"communicate"?

Wei.

Patch

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 150d727..9778947 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -565,6 +565,58 @@  int xc_domain_destroy(xc_interface *xch,
  * This function resumes a suspended domain. The domain should have
  * been previously suspended.
  *
+ * Note that there is no 'xc_domain_suspend' as suspending a domain
+ * is quite the endeavour. As such this long comment will describe the
+ * suspend and resume path.
+ *
+ * For the purpose of this explanation there are three guests:
+ * PV (using hypercalls for privileged operations), HVM
+ * (fully hardware virtualized guests using emulated devices for everything),
+ * and PVHVM (hardware virtualized guest with PV drivers).
+ *
+ * HVM guests are the simplest - they suspend via S3 and resume from
+ * S3. Upon resume they have to re-negotiate with the emulated devices.
+ *
+ * PV and PVHVM communate via via hypercalls for suspend (and resume).
+ * For suspend the toolstack initiates the process by writing the string
+ * "suspend" into the XenBus key "control/shutdown".
+ *
+ * The PV guest stashes anything it deems necessary in 'struct start_info'
+ * in case of failure (PVHVM may ignore this) and calls the
+ * SCHEDOP_shutdown::SHUTDOWN_suspend hypercall (for PV as argument it
+ * passes the MFN of 'struct start_info').
+ *
+ * And then the guest is suspended.
+ *
+ * At this point the guest may be resumed on the same host under the same
+ * domain (checkpointing or suspending failed), or on a different host.
+ *
+ * The checkpointing or notifying an guest that the suspend failed is by
+ * having the SCHEDOP_shutdown::SHUTDOWN_suspend hypercall return a non-zero
+ * value.
+ *
+ * The PV and PVHVM resume paths are similar. For PV it would be similar to bootup
+ * - figure out where the 'struct start_info' is (or if the suspend was
+ * cancelled aka checkpointed - reuse the saved values).
+ *
+ * From here on they differ depending on whether the guest is PV or PVHVM
+ * in specifics but follow overall the same path:
+ *  - PV: Bringing up the vCPUs,
+ *  - PVHVM: Setting up the vector callback,
+ *  - Bring up vCPU runstates,
+ *  - Remap the grant tables if checkpointing or set them up from scratch,
+ *
+ * If the resume was not checkpointing (or if suspend was successful) we would
+ * set up the PV timers and the different PV events. Lastly the PV drivers
+ * re-negotiate with the backend.
+ *
+ * This function would return before the guest started resuming. That is,
+ * the guest would be in a non-running state and its vCPU context would be
+ * in the SCHEDOP_shutdown::SHUTDOWN_suspend hypercall return path
+ * (for PV and PVHVM). For HVM it would be in the QEMU-emulated
+ * BIOS handling S3 suspend.
+ *
  * @parm xch a handle to an open hypervisor interface
  * @parm domid the domain id to resume
  * @parm fast use cooperative resume (guest must support this)
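
For completeness, a minimal usage sketch of the function being documented,
using only the public libxc calls (xc_interface_open(), xc_domain_resume(),
xc_interface_close()); error handling is kept to a bare minimum:

#include <stdio.h>
#include <xenctrl.h>

/* Resume a previously suspended domain.  With fast != 0 the cooperative
 * ("fast") resume path is used, which the guest must support (e.g. a
 * checkpointed guest resuming in place); otherwise the slow path is taken. */
static int resume_domain(uint32_t domid, int fast)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    int rc;

    if (!xch) {
        fprintf(stderr, "failed to open libxc interface\n");
        return -1;
    }

    rc = xc_domain_resume(xch, domid, fast);
    if (rc)
        fprintf(stderr, "xc_domain_resume(%u) failed: rc=%d\n", domid, rc);

    xc_interface_close(xch);
    return rc;
}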