[v3,2/2] docs/designs: Add a design document for migration of xenstore data
diff mbox series

Message ID 20200128122823.12920-3-pdurrant@amazon.com
State Superseded
Headers show
Series
  • docs: Migration design documents
Related show

Commit Message

Paul Durrant Jan. 28, 2020, 12:28 p.m. UTC
This patch details proposes extra migration data and xenstore protocol
extensions to support non-cooperative live migration of guests.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wl@xen.org>

v3:
 - New in v3
---
 docs/designs/xenstore-migration.md | 122 +++++++++++++++++++++++++++++
 1 file changed, 122 insertions(+)
 create mode 100644 docs/designs/xenstore-migration.md

Comments

Jürgen Groß Jan. 28, 2020, 1:35 p.m. UTC | #1
On 28.01.20 13:28, Paul Durrant wrote:
> This patch details proposes extra migration data and xenstore protocol
> extensions to support non-cooperative live migration of guests.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Julien Grall <julien@xen.org>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Wei Liu <wl@xen.org>
> 
> v3:
>   - New in v3
> ---
>   docs/designs/xenstore-migration.md | 122 +++++++++++++++++++++++++++++
>   1 file changed, 122 insertions(+)
>   create mode 100644 docs/designs/xenstore-migration.md
> 
> diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
> new file mode 100644
> index 0000000000..9020b6ff9a
> --- /dev/null
> +++ b/docs/designs/xenstore-migration.md
> @@ -0,0 +1,122 @@
> +**watch data**
> +
> +
> +`<path>|<token>|`
> +
> +`<path>` again is considered relative and, together with `<token>`, should
> +be suitable to formulate an `ADD_DOMAIN_WATCHES` operation (see below).
> +Note that `<path>` must not be a *special* value (beginning with `@`).

Why not? This is possible for a guest, so it should be possible to
migrate such a watch.

Today it might not make any sense, but we should not block any future
special values which might make sense for a guest to use.

> +
> +
> +### Protocol Extension
> +
> +The `WATCH` operation does not allow specification of a `<domid>`; it is
> +assumed that the watch pertains to the domain that owns the shared ring
> +over which the operation is passed. Hence, for the tool-stack to be able
> +to register a watch on behalf of a domain a new operation is needed:
> +
> +```
> +ADD_DOMAIN_WATCHES      <domid>|<watch>|+
> +
> +Adds watches on behalf of the specified domain.
> +
> +<watch> is a NUL separated tuple of <path>|<token>. <path> must not be
> +@<wspecial>, otherwise the semantics of this operation are identical to
> +the domain issuing WATCH <path>|<token>|.

I wouldn't exclude @<wspecial>.


Juergen
Durrant, Paul Jan. 28, 2020, 1:45 p.m. UTC | #2
> -----Original Message-----
> From: Jürgen Groß <jgross@suse.com>
> Sent: 28 January 2020 13:36
> To: Durrant, Paul <pdurrant@amazon.co.uk>; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Julien Grall
> <julien@xen.org>; Wei Liu <wl@xen.org>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; George Dunlap <George.Dunlap@eu.citrix.com>;
> Andrew Cooper <andrew.cooper3@citrix.com>; Ian Jackson
> <ian.jackson@eu.citrix.com>
> Subject: Re: [Xen-devel] [PATCH v3 2/2] docs/designs: Add a design
> document for migration of xenstore data
> 
> On 28.01.20 13:28, Paul Durrant wrote:
> > This patch details proposes extra migration data and xenstore protocol
> > extensions to support non-cooperative live migration of guests.
> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> > ---
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Julien Grall <julien@xen.org>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Wei Liu <wl@xen.org>
> >
> > v3:
> >   - New in v3
> > ---
> >   docs/designs/xenstore-migration.md | 122 +++++++++++++++++++++++++++++
> >   1 file changed, 122 insertions(+)
> >   create mode 100644 docs/designs/xenstore-migration.md
> >
> > diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-
> migration.md
> > new file mode 100644
> > index 0000000000..9020b6ff9a
> > --- /dev/null
> > +++ b/docs/designs/xenstore-migration.md
> > @@ -0,0 +1,122 @@
> > +**watch data**
> > +
> > +
> > +`<path>|<token>|`
> > +
> > +`<path>` again is considered relative and, together with `<token>`,
> should
> > +be suitable to formulate an `ADD_DOMAIN_WATCHES` operation (see below).
> > +Note that `<path>` must not be a *special* value (beginning with `@`).
> 
> Why not? This is possible for a guest, so it should be possible to
> migrate such a watch.
> 
> Today it might not make any sense, but we should not block any future
> special values which might make sense for a guest to use.
> 

Ok. I was just limiting things to what is actually needed. If you think there is merit in allowing them then I have no objection. I'll remove the exclusions.

  Paul
Wei Liu Jan. 28, 2020, 1:46 p.m. UTC | #3
On Tue, Jan 28, 2020 at 12:28:23PM +0000, Paul Durrant wrote:
> +```
> +0     1     2     3     4     5     6     7 octet
> ++------------------------+------------------------+
> +| type                   | record specific data   |
> ++------------------------+                        |
> +...
> ++-------------------------------------------------+
> +```
> +
> +
> +| Field | Description |
> +|---|---|
> +| `type` | 0x00000000: invalid |
> +|        | 0x00000001: node data |
> +|        | 0x00000002: watch data |
> +|        | 0x00000003 - 0xFFFFFFFF: reserved for future use |
> +
> +
> +where data is always in the form of a NUL separated and terminated tuple
> +as follows
> +

Is there any padding requirement for a record? I take it there isn't?

Wei.
Durrant, Paul Jan. 28, 2020, 2:15 p.m. UTC | #4
> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Wei
> Liu
> Sent: 28 January 2020 13:46
> To: Durrant, Paul <pdurrant@amazon.co.uk>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Julien Grall
> <julien@xen.org>; Wei Liu <wl@xen.org>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; George Dunlap <George.Dunlap@eu.citrix.com>;
> Andrew Cooper <andrew.cooper3@citrix.com>; Ian Jackson
> <ian.jackson@eu.citrix.com>; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [PATCH v3 2/2] docs/designs: Add a design
> document for migration of xenstore data
> 
> On Tue, Jan 28, 2020 at 12:28:23PM +0000, Paul Durrant wrote:
> > +```
> > +0     1     2     3     4     5     6     7 octet
> > ++------------------------+------------------------+
> > +| type                   | record specific data   |
> > ++------------------------+                        |
> > +...
> > ++-------------------------------------------------+
> > +```
> > +
> > +
> > +| Field | Description |
> > +|---|---|
> > +| `type` | 0x00000000: invalid |
> > +|        | 0x00000001: node data |
> > +|        | 0x00000002: watch data |
> > +|        | 0x00000003 - 0xFFFFFFFF: reserved for future use |
> > +
> > +
> > +where data is always in the form of a NUL separated and terminated
> tuple
> > +as follows
> > +
> 
> Is there any padding requirement for a record? I take it there isn't?
> 

No, the padding is at the generic record level (already part of the spec).

  Paul

> Wei.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel

Patch
diff mbox series

diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
new file mode 100644
index 0000000000..9020b6ff9a
--- /dev/null
+++ b/docs/designs/xenstore-migration.md
@@ -0,0 +1,122 @@ 
+# Xenstore Migration
+
+## Background
+
+The design for *Non-Cooperative Migration of Guests*[1] explains that extra
+save records are required in the migrations stream to allow a guest running
+PV drivers to be migrated without its co-operation. Moreover the save
+records must include details of registered xenstore watches as well as
+content; information that cannot currently be recovered from `xenstored`,
+and hence some extension to the xenstore protocol[2] will also be required.
+
+The *libxenlight Domain Image Format* specification[3] already defines a
+record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
+transferring xenstore data pertaining to the domain directly as it is
+specified such that keys are relative to the path
+`/local/domain/$dm_domid/device-model/$domid`. Thus it is necessary to
+define at least one new save record type.
+
+## Proposal
+
+### New Save Record
+
+A new mandatory record type should be defined within the libxenlight Domain
+Image Format:
+
+`0x00000007: DOMAIN_XENSTORE_DATA`
+
+The format of each of these new records should be as follows:
+
+
+```
+0     1     2     3     4     5     6     7 octet
++------------------------+------------------------+
+| type                   | record specific data   |
++------------------------+                        |
+...
++-------------------------------------------------+
+```
+
+
+| Field | Description |
+|---|---|
+| `type` | 0x00000000: invalid |
+|        | 0x00000001: node data |
+|        | 0x00000002: watch data |
+|        | 0x00000003 - 0xFFFFFFFF: reserved for future use |
+
+
+where data is always in the form of a NUL separated and terminated tuple
+as follows
+
+
+**node data**
+
+
+`<path>|<value>|<perm-as-string>|`
+
+
+`<path>` is considered relative to the domain path `/local/domain/$domid`
+and hence must not begin with `/`.
+`<path>` and `<value>` should be suitable to formulate a `WRITE` operation
+to the receiving xenstore and `<perm-as-string>` should be similarly suitable
+to formulate a subsequent `SET_PERMS` operation.
+
+**watch data**
+
+
+`<path>|<token>|`
+
+`<path>` again is considered relative and, together with `<token>`, should
+be suitable to formulate an `ADD_DOMAIN_WATCHES` operation (see below).
+Note that `<path>` must not be a *special* value (beginning with `@`).
+
+
+### Protocol Extension
+
+The `WATCH` operation does not allow specification of a `<domid>`; it is
+assumed that the watch pertains to the domain that owns the shared ring
+over which the operation is passed. Hence, for the tool-stack to be able
+to register a watch on behalf of a domain a new operation is needed:
+
+```
+ADD_DOMAIN_WATCHES      <domid>|<watch>|+
+
+Adds watches on behalf of the specified domain.
+
+<watch> is a NUL separated tuple of <path>|<token>. <path> must not be
+@<wspecial>, otherwise the semantics of this operation are identical to
+the domain issuing WATCH <path>|<token>|.
+```
+
+The watch information for a domain also needs to be extracted from the
+sending xenstored so the following operation is also needed:
+
+```
+GET_DOMAIN_WATCHES      <domid>|<index>   <gencnt>|<watch>|* 
+
+Gets the list of watches that are currently registered for the domain.
+
+<watch> is a NUL separated tuple of <path>|<token>. The sub-list returned
+will start at <index> into the the overall list of watches and may be
+truncated such that the returned data fits within XENSTORE_PAYLOAD_MAX.
+If <index> is beyond the end of the overall list then the returned sub-
+list will be empty. If the value of <gencnt> changes then it indicates
+that the overall watch list has changed and thus it may be necessary
+to re-issue the operation for previous values of <index>.
+```
+
+It may also be desirable to state in the protocol specification that
+the `INTRODUCE` operation should not clear the `<mfn>` specified such that
+a `RELEASE` operation followed by an `INTRODUCE` operation form an
+idempotent pair. The current implementation of *C xentored* does this
+(in the `domain_conn_reset()` function) but this could be dropped as this
+behaviour is not currently specified and the page will always be zeroed
+for a newly created domain.
+
+
+* * *
+
+[1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md
+[2] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore.txt
+[3] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/specs/libxl-migration-stream.pandoc