diff mbox series

[v6,2/2] docs/designs: Add a design document for migration of xenstore data

Message ID 20200305173041.5141-3-pdurrant@amzn.com (mailing list archive)
State New, archived
Headers show
Series docs: Migration design documents | expand

Commit Message

pdurrant@amzn.com March 5, 2020, 5:30 p.m. UTC
From: Paul Durrant <pdurrant@amazon.com>

This patch details proposes extra migration data and xenstore protocol
extensions to support non-cooperative live migration of guests.

NOTE: doc/misc/xenstore.txt is also amened to replace the <mfn> term
      for the INTRODUCE operation with the <gfn>, since this is what
      it actually is.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wl@xen.org>

v6:
 - Addressed comments from Julien

v5:
 - Add QUIESCE
 - Make semantics of <index> in GET_DOMAIN_WATCHES more clear

v4:
 - Drop the restrictions on special paths

v3:
 - New in v3
---
 docs/designs/xenstore-migration.md | 171 +++++++++++++++++++++++++++++
 docs/misc/xenstore.txt             |   6 +-
 2 files changed, 174 insertions(+), 3 deletions(-)
 create mode 100644 docs/designs/xenstore-migration.md

Comments

Julien Grall March 6, 2020, 12:02 p.m. UTC | #1
Hi Paul,

On 05/03/2020 17:30, pdurrant@amzn.com wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> This patch details proposes extra migration data and xenstore protocol
> extensions to support non-cooperative live migration of guests.
> 
> NOTE: doc/misc/xenstore.txt is also amened to replace the <mfn> term
>        for the INTRODUCE operation with the <gfn>, since this is what
>        it actually is.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Julien Grall <julien@xen.org>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Wei Liu <wl@xen.org>
> 
> v6:
>   - Addressed comments from Julien
> 
> v5:
>   - Add QUIESCE
>   - Make semantics of <index> in GET_DOMAIN_WATCHES more clear
> 
> v4:
>   - Drop the restrictions on special paths
> 
> v3:
>   - New in v3
> ---
>   docs/designs/xenstore-migration.md | 171 +++++++++++++++++++++++++++++
>   docs/misc/xenstore.txt             |   6 +-
>   2 files changed, 174 insertions(+), 3 deletions(-)
>   create mode 100644 docs/designs/xenstore-migration.md
> 
> diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
> new file mode 100644
> index 0000000000..7e61f462f0
> --- /dev/null
> +++ b/docs/designs/xenstore-migration.md
> @@ -0,0 +1,171 @@
> +# Xenstore Migration
> +
> +## Background
> +
> +The design for *Non-Cooperative Migration of Guests*[1] explains that extra
> +save records are required in the migrations stream to allow a guest running
> +PV drivers to be migrated without its co-operation. Moreover the save
> +records must include details of registered xenstore watches as well as
> +content; information that cannot currently be recovered from `xenstored`,
> +and hence some extension to the xenstore protocol[2] will also be required.
> +
> +The *libxenlight Domain Image Format* specification[3] already defines a
> +record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
> +transferring xenstore data pertaining to the domain directly as it is
> +specified such that keys are relative to the path
> +`/local/domain/$dm_domid/device-model/$domid`. Thus it is necessary to
> +define at least one new save record type.
> +
> +## Proposal
> +
> +### New Save Record
> +
> +A new mandatory record type should be defined within the libxenlight Domain
> +Image Format:
> +
> +`0x00000007: DOMAIN_XENSTORE_DATA`
> +
> +The format of each of these new records should be as follows:
> +
> +
> +```
> +0     1     2     3     4     5     6     7 octet
> ++------------------------+------------------------+
> +| type                   | record specific data   |
> ++------------------------+                        |
> +...
> ++-------------------------------------------------+
> +```
> +
> +NB: The record data does not contain a length because the libxenlight record
> +header specifies the length.
> +
> +
> +| Field  | Description                                      |
> +|--------|--------------------------------------------------|
> +| `type` | 0x00000000: invalid                              |
> +|        | 0x00000001: node data                            |
> +|        | 0x00000002: watch data                           |
> +|        | 0x00000003: transaction data                     |
> +|        | 0x00000004 - 0xFFFFFFFF: reserved for future use |
> +
> +
> +where data is always in the form of a tuple as follows

NIT: missing full stop.

> +
> +
> +**node data**
> +
> +
> +`<path>|<perm-count>|<perm-as-string>|+<value|>`
> +
In the xenstore spec, | means a 'nul (zero) byte'. Are you using the 
same terminology here? If so, it may be worth mentioning it.

> +
> +`<path>` and `<value|>` should be suitable to formulate a `WRITE` operation
> +to the receiving xenstored and the `<perm-as-string>|+` list should be
> +similarly suitable to formulate a subsequent `SET_PERMS` operation.
> +`<perm-count>` specifies the number of entries in the `<perm-as-string>|+`
> +list and `<value|>` must be placed at the end because it may contain NUL
> +octets.

What is the size of <perm-count>? Also, we may want perm-count to be 
aligned to its size so we don't have to worry of unaligned access.

How about moving <perm-count> at the beginning of the data blob?

> +
> +
> +**watch data**
> +
> +
> +`<path>|<token>|`
> +
> +`<path>` again is absolute and, together with `<token>`, should
> +be suitable to formulate an `ADD_DOMAIN_WATCHES` operation (see below).
> +
> +
> +**transaction data**
> +
> +
> +`<transid-count>|<transid>|+`
> +
> +Each `<transid>` should be a uint32_t value represented as unsigned decimal
> +suitable for passing as a *tx_id* to the re-defined `TRANSACTION_START`
> +operation (see below). `<transid-count>` is the number of entries in the
> +`<transid>|+` list.

What is the size of <transid-count>?. Also the | suggests we would have 
a NUL byte between the two numbers. This would mean the <transid> will 
not be aligned to its size.

> +
> +
> +### Protocol Extension
> +
> +Before xenstore state is migrated it is necessary to wait for any pending
> +reads, writes, watch registrations etc. to complete, and also to make sure
> +that xenstored does not start processing any new requests (so that new
> +requests remain pending on the shared ring for subsequent processing on the
> +new host). Hence the following operation is needed:
> +
> +```
> +QUIESCE                 <domid>|
> +
> +Complete processing of any request issued by the specified domain, and
> +do not process any further requests from the shared ring.
> +``` > +
> +The `WATCH` operation does not allow specification of a `<domid>`; it is
> +assumed that the watch pertains to the domain that owns the shared ring
> +over which the operation is passed. Hence, for the tool-stack to be able
> +to register a watch on behalf of a domain a new operation is needed:
> +
> +```
> +ADD_DOMAIN_WATCHES      <domid>|<watch>|+
> +
> +Adds watches on behalf of the specified domain.
> +
> +<watch> is a NUL separated tuple of <path>|<token>. The semantics of this
> +operation are identical to the domain issuing WATCH <path>|<token>| for
> +each <watch>.
> +```
> +
> +The watch information for a domain also needs to be extracted from the
> +sending xenstored so the following operation is also needed:
> +
> +```
> +GET_DOMAIN_WATCHES      <domid>|<index>   <gencnt>|<watch>|*
> +
> +Gets the list of watches that are currently registered for the domain.
> +
> +<watch> is a NUL separated tuple of <path>|<token>. The sub-list returned
> +will start at <index> items into the the overall list of watches and may
> +be truncated (at a <watch> boundary) such that the returned data fits
> +within XENSTORE_PAYLOAD_MAX.
> +
> +If <index> is beyond the end of the overall list then the returned sub-
> +list will be empty. If the value of <gencnt> changes then it indicates
> +that the overall watch list has changed and thus it may be necessary
> +to re-issue the operation for previous values of <index>.
> +```
> +
> +To deal with transactions that were pending when the domain is migrated
> +it is necessary to start transactions with the same `<trans-id>` in the
> +receiving xenstored but for them to result in an `EAGAIN` when the
> +`TRANSACTION_END` operation is peformed. Thus the `TRANSACTION_START`
> +operation needs to be re-defined as follows:
> +
> +```
> +TRANSACTION_START	|			<transid>|
> +	<transid> is an opaque uint32_t represented as unsigned decimal.
> +    If tx_id is 0 for this operation then a new transaction will be started
> +    with a tx_id allocated by xenstored. If a non-0 tx_id is specified then
> +    a transaction with that tx_id will be started and automatically marked
> +    `conflicting'. The tx_id will always be passed back in <transid>.
> +    After this, the tx_id may be used in the request header field for
> +    other operations.
> +    When a transaction is started whole db is copied; reads and writes
> +    happen on the copy.
> +```

The transaction ID are per connection. As the toolstack will issue the 
command on restore, you would reserve the ID for the wrong domain. So I 
think you want to introduce a new command maybe 
RESERVE_DOMAIN_TRANSACTIONS?

Cheers,
Paul Durrant March 6, 2020, 12:19 p.m. UTC | #2
> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 06 March 2020 12:03
> To: pdurrant@amzn.com; xen-devel@lists.xenproject.org
> Cc: Durrant, Paul <pdurrant@amazon.co.uk>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
> <George.Dunlap@eu.citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Jan Beulich
> <jbeulich@suse.com>; Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Wei Liu <wl@xen.org>
> Subject: RE: [EXTERNAL][PATCH v6 2/2] docs/designs: Add a design document for migration of xenstore
> data
> 
> CAUTION: This email originated from outside of the organization. Do not click links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> Hi Paul,
> 
> On 05/03/2020 17:30, pdurrant@amzn.com wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > This patch details proposes extra migration data and xenstore protocol
> > extensions to support non-cooperative live migration of guests.
> >
> > NOTE: doc/misc/xenstore.txt is also amened to replace the <mfn> term
> >        for the INTRODUCE operation with the <gfn>, since this is what
> >        it actually is.
> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> > ---
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Julien Grall <julien@xen.org>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Wei Liu <wl@xen.org>
> >
> > v6:
> >   - Addressed comments from Julien
> >
> > v5:
> >   - Add QUIESCE
> >   - Make semantics of <index> in GET_DOMAIN_WATCHES more clear
> >
> > v4:
> >   - Drop the restrictions on special paths
> >
> > v3:
> >   - New in v3
> > ---
> >   docs/designs/xenstore-migration.md | 171 +++++++++++++++++++++++++++++
> >   docs/misc/xenstore.txt             |   6 +-
> >   2 files changed, 174 insertions(+), 3 deletions(-)
> >   create mode 100644 docs/designs/xenstore-migration.md
> >
> > diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
> > new file mode 100644
> > index 0000000000..7e61f462f0
> > --- /dev/null
> > +++ b/docs/designs/xenstore-migration.md
> > @@ -0,0 +1,171 @@
> > +# Xenstore Migration
> > +
> > +## Background
> > +
> > +The design for *Non-Cooperative Migration of Guests*[1] explains that extra
> > +save records are required in the migrations stream to allow a guest running
> > +PV drivers to be migrated without its co-operation. Moreover the save
> > +records must include details of registered xenstore watches as well as
> > +content; information that cannot currently be recovered from `xenstored`,
> > +and hence some extension to the xenstore protocol[2] will also be required.
> > +
> > +The *libxenlight Domain Image Format* specification[3] already defines a
> > +record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
> > +transferring xenstore data pertaining to the domain directly as it is
> > +specified such that keys are relative to the path
> > +`/local/domain/$dm_domid/device-model/$domid`. Thus it is necessary to
> > +define at least one new save record type.
> > +
> > +## Proposal
> > +
> > +### New Save Record
> > +
> > +A new mandatory record type should be defined within the libxenlight Domain
> > +Image Format:
> > +
> > +`0x00000007: DOMAIN_XENSTORE_DATA`
> > +
> > +The format of each of these new records should be as follows:
> > +
> > +
> > +```
> > +0     1     2     3     4     5     6     7 octet
> > ++------------------------+------------------------+
> > +| type                   | record specific data   |
> > ++------------------------+                        |
> > +...
> > ++-------------------------------------------------+
> > +```
> > +
> > +NB: The record data does not contain a length because the libxenlight record
> > +header specifies the length.
> > +
> > +
> > +| Field  | Description                                      |
> > +|--------|--------------------------------------------------|
> > +| `type` | 0x00000000: invalid                              |
> > +|        | 0x00000001: node data                            |
> > +|        | 0x00000002: watch data                           |
> > +|        | 0x00000003: transaction data                     |
> > +|        | 0x00000004 - 0xFFFFFFFF: reserved for future use |
> > +
> > +
> > +where data is always in the form of a tuple as follows
> 
> NIT: missing full stop.
> 

Ok.

> > +
> > +
> > +**node data**
> > +
> > +
> > +`<path>|<perm-count>|<perm-as-string>|+<value|>`
> > +
> In the xenstore spec, | means a 'nul (zero) byte'. Are you using the
> same terminology here? If so, it may be worth mentioning it.
> 

Ok, I'll cut'n'paste the format definitions in for ease of reference I think.

> > +
> > +`<path>` and `<value|>` should be suitable to formulate a `WRITE` operation
> > +to the receiving xenstored and the `<perm-as-string>|+` list should be
> > +similarly suitable to formulate a subsequent `SET_PERMS` operation.
> > +`<perm-count>` specifies the number of entries in the `<perm-as-string>|+`
> > +list and `<value|>` must be placed at the end because it may contain NUL
> > +octets.
> 
> What is the size of <perm-count>? Also, we may want perm-count to be
> aligned to its size so we don't have to worry of unaligned access.
> 
> How about moving <perm-count> at the beginning of the data blob?
>

Not sure we really need to care about alignment... I'll leave as-is for the moment.
 
> > +
> > +
> > +**watch data**
> > +
> > +
> > +`<path>|<token>|`
> > +
> > +`<path>` again is absolute and, together with `<token>`, should
> > +be suitable to formulate an `ADD_DOMAIN_WATCHES` operation (see below).
> > +
> > +
> > +**transaction data**
> > +
> > +
> > +`<transid-count>|<transid>|+`
> > +
> > +Each `<transid>` should be a uint32_t value represented as unsigned decimal
> > +suitable for passing as a *tx_id* to the re-defined `TRANSACTION_START`
> > +operation (see below). `<transid-count>` is the number of entries in the
> > +`<transid>|+` list.
> 
> What is the size of <transid-count>?. Also the | suggests we would have
> a NUL byte between the two numbers. This would mean the <transid> will
> not be aligned to its size.
> 

I was considering transid as text encoded (to match the TRANSACTION_START operation) but I agree the mix is odd. Perhaps I'll just use structures for the migration records instead.

> > +
> > +
> > +### Protocol Extension
> > +
> > +Before xenstore state is migrated it is necessary to wait for any pending
> > +reads, writes, watch registrations etc. to complete, and also to make sure
> > +that xenstored does not start processing any new requests (so that new
> > +requests remain pending on the shared ring for subsequent processing on the
> > +new host). Hence the following operation is needed:
> > +
> > +```
> > +QUIESCE                 <domid>|
> > +
> > +Complete processing of any request issued by the specified domain, and
> > +do not process any further requests from the shared ring.
> > +``` > +
> > +The `WATCH` operation does not allow specification of a `<domid>`; it is
> > +assumed that the watch pertains to the domain that owns the shared ring
> > +over which the operation is passed. Hence, for the tool-stack to be able
> > +to register a watch on behalf of a domain a new operation is needed:
> > +
> > +```
> > +ADD_DOMAIN_WATCHES      <domid>|<watch>|+
> > +
> > +Adds watches on behalf of the specified domain.
> > +
> > +<watch> is a NUL separated tuple of <path>|<token>. The semantics of this
> > +operation are identical to the domain issuing WATCH <path>|<token>| for
> > +each <watch>.
> > +```
> > +
> > +The watch information for a domain also needs to be extracted from the
> > +sending xenstored so the following operation is also needed:
> > +
> > +```
> > +GET_DOMAIN_WATCHES      <domid>|<index>   <gencnt>|<watch>|*
> > +
> > +Gets the list of watches that are currently registered for the domain.
> > +
> > +<watch> is a NUL separated tuple of <path>|<token>. The sub-list returned
> > +will start at <index> items into the the overall list of watches and may
> > +be truncated (at a <watch> boundary) such that the returned data fits
> > +within XENSTORE_PAYLOAD_MAX.
> > +
> > +If <index> is beyond the end of the overall list then the returned sub-
> > +list will be empty. If the value of <gencnt> changes then it indicates
> > +that the overall watch list has changed and thus it may be necessary
> > +to re-issue the operation for previous values of <index>.
> > +```
> > +
> > +To deal with transactions that were pending when the domain is migrated
> > +it is necessary to start transactions with the same `<trans-id>` in the
> > +receiving xenstored but for them to result in an `EAGAIN` when the
> > +`TRANSACTION_END` operation is peformed. Thus the `TRANSACTION_START`
> > +operation needs to be re-defined as follows:
> > +
> > +```
> > +TRANSACTION_START    |                       <transid>|
> > +     <transid> is an opaque uint32_t represented as unsigned decimal.
> > +    If tx_id is 0 for this operation then a new transaction will be started
> > +    with a tx_id allocated by xenstored. If a non-0 tx_id is specified then
> > +    a transaction with that tx_id will be started and automatically marked
> > +    `conflicting'. The tx_id will always be passed back in <transid>.
> > +    After this, the tx_id may be used in the request header field for
> > +    other operations.
> > +    When a transaction is started whole db is copied; reads and writes
> > +    happen on the copy.
> > +```
> 
> The transaction ID are per connection. As the toolstack will issue the
> command on restore, you would reserve the ID for the wrong domain. So I
> think you want to introduce a new command maybe
> RESERVE_DOMAIN_TRANSACTIONS?

Yes, of course you are correct. So much for trying to re-use things :-( I'll spec a new command.

  Paul

> 
> Cheers,
> 
> --
> Julien Grall
Julien Grall March 6, 2020, 4:18 p.m. UTC | #3
Hi Paul,

On 06/03/2020 12:19, Paul Durrant wrote:
>>> +
>>> +`<path>` and `<value|>` should be suitable to formulate a `WRITE` operation
>>> +to the receiving xenstored and the `<perm-as-string>|+` list should be
>>> +similarly suitable to formulate a subsequent `SET_PERMS` operation.
>>> +`<perm-count>` specifies the number of entries in the `<perm-as-string>|+`
>>> +list and `<value|>` must be placed at the end because it may contain NUL
>>> +octets.
>>
>> What is the size of <perm-count>? Also, we may want perm-count to be
>> aligned to its size so we don't have to worry of unaligned access.
>>
>> How about moving <perm-count> at the beginning of the data blob?
>>
> 
> Not sure we really need to care about alignment... I'll leave as-is for the moment.

Well, you are just going to make the software more complex because you 
will have to do multiple memory access to read the field (unalign access 
are not supported everywhere). The fact that field is just after a 
flexible size field is going to make the code much worse.

Re-ordering the fields in the stream is going to be so much easier to 
do. So I would strongly recommend to rework your stream here.

Cheers,
diff mbox series

Patch

diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
new file mode 100644
index 0000000000..7e61f462f0
--- /dev/null
+++ b/docs/designs/xenstore-migration.md
@@ -0,0 +1,171 @@ 
+# Xenstore Migration
+
+## Background
+
+The design for *Non-Cooperative Migration of Guests*[1] explains that extra
+save records are required in the migrations stream to allow a guest running
+PV drivers to be migrated without its co-operation. Moreover the save
+records must include details of registered xenstore watches as well as
+content; information that cannot currently be recovered from `xenstored`,
+and hence some extension to the xenstore protocol[2] will also be required.
+
+The *libxenlight Domain Image Format* specification[3] already defines a
+record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
+transferring xenstore data pertaining to the domain directly as it is
+specified such that keys are relative to the path
+`/local/domain/$dm_domid/device-model/$domid`. Thus it is necessary to
+define at least one new save record type.
+
+## Proposal
+
+### New Save Record
+
+A new mandatory record type should be defined within the libxenlight Domain
+Image Format:
+
+`0x00000007: DOMAIN_XENSTORE_DATA`
+
+The format of each of these new records should be as follows:
+
+
+```
+0     1     2     3     4     5     6     7 octet
++------------------------+------------------------+
+| type                   | record specific data   |
++------------------------+                        |
+...
++-------------------------------------------------+
+```
+
+NB: The record data does not contain a length because the libxenlight record
+header specifies the length.
+
+
+| Field  | Description                                      |
+|--------|--------------------------------------------------|
+| `type` | 0x00000000: invalid                              |
+|        | 0x00000001: node data                            |
+|        | 0x00000002: watch data                           |
+|        | 0x00000003: transaction data                     |
+|        | 0x00000004 - 0xFFFFFFFF: reserved for future use |
+
+
+where data is always in the form of a tuple as follows
+
+
+**node data**
+
+
+`<path>|<perm-count>|<perm-as-string>|+<value|>`
+
+
+`<path>` and `<value|>` should be suitable to formulate a `WRITE` operation
+to the receiving xenstored and the `<perm-as-string>|+` list should be
+similarly suitable to formulate a subsequent `SET_PERMS` operation.
+`<perm-count>` specifies the number of entries in the `<perm-as-string>|+`
+list and `<value|>` must be placed at the end because it may contain NUL
+octets.
+
+
+**watch data**
+
+
+`<path>|<token>|`
+
+`<path>` again is absolute and, together with `<token>`, should
+be suitable to formulate an `ADD_DOMAIN_WATCHES` operation (see below).
+
+
+**transaction data**
+
+
+`<transid-count>|<transid>|+`
+
+Each `<transid>` should be a uint32_t value represented as unsigned decimal
+suitable for passing as a *tx_id* to the re-defined `TRANSACTION_START`
+operation (see below). `<transid-count>` is the number of entries in the
+`<transid>|+` list.
+
+
+### Protocol Extension
+
+Before xenstore state is migrated it is necessary to wait for any pending
+reads, writes, watch registrations etc. to complete, and also to make sure
+that xenstored does not start processing any new requests (so that new
+requests remain pending on the shared ring for subsequent processing on the
+new host). Hence the following operation is needed:
+
+```
+QUIESCE                 <domid>|
+
+Complete processing of any request issued by the specified domain, and
+do not process any further requests from the shared ring.
+```
+
+The `WATCH` operation does not allow specification of a `<domid>`; it is
+assumed that the watch pertains to the domain that owns the shared ring
+over which the operation is passed. Hence, for the tool-stack to be able
+to register a watch on behalf of a domain a new operation is needed:
+
+```
+ADD_DOMAIN_WATCHES      <domid>|<watch>|+
+
+Adds watches on behalf of the specified domain.
+
+<watch> is a NUL separated tuple of <path>|<token>. The semantics of this
+operation are identical to the domain issuing WATCH <path>|<token>| for
+each <watch>.
+```
+
+The watch information for a domain also needs to be extracted from the
+sending xenstored so the following operation is also needed:
+
+```
+GET_DOMAIN_WATCHES      <domid>|<index>   <gencnt>|<watch>|*
+
+Gets the list of watches that are currently registered for the domain.
+
+<watch> is a NUL separated tuple of <path>|<token>. The sub-list returned
+will start at <index> items into the the overall list of watches and may
+be truncated (at a <watch> boundary) such that the returned data fits
+within XENSTORE_PAYLOAD_MAX.
+
+If <index> is beyond the end of the overall list then the returned sub-
+list will be empty. If the value of <gencnt> changes then it indicates
+that the overall watch list has changed and thus it may be necessary
+to re-issue the operation for previous values of <index>.
+```
+
+To deal with transactions that were pending when the domain is migrated
+it is necessary to start transactions with the same `<trans-id>` in the
+receiving xenstored but for them to result in an `EAGAIN` when the
+`TRANSACTION_END` operation is peformed. Thus the `TRANSACTION_START`
+operation needs to be re-defined as follows:
+
+```
+TRANSACTION_START	|			<transid>|
+	<transid> is an opaque uint32_t represented as unsigned decimal.
+    If tx_id is 0 for this operation then a new transaction will be started
+    with a tx_id allocated by xenstored. If a non-0 tx_id is specified then
+    a transaction with that tx_id will be started and automatically marked
+    `conflicting'. The tx_id will always be passed back in <transid>.
+    After this, the tx_id may be used in the request header field for
+    other operations.
+    When a transaction is started whole db is copied; reads and writes
+    happen on the copy.
+```
+
+It may also be desirable to state in the protocol specification that
+the `INTRODUCE` operation should not clear the `<gfn>` specified such that
+a `RELEASE` operation followed by an `INTRODUCE` operation form an
+idempotent pair. The current implementation of *C xentored* does this
+(in the `domain_conn_reset()` function) but this could be dropped as this
+behaviour is not currently specified and the page will always be zeroed
+for a newly created domain.
+
+
+* * *
+
+[1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md
+[2] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore.txt
+[3] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/specs/libxl-migration-stream.pandoc
diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
index 6f8569d576..51e6b12931 100644
--- a/docs/misc/xenstore.txt
+++ b/docs/misc/xenstore.txt
@@ -254,7 +254,7 @@  TRANSACTION_END		F|
 
 ---------- Domain management and xenstored communications ----------
 
-INTRODUCE		<domid>|<mfn>|<evtchn>|?
+INTRODUCE		<domid>|<gfn>|<evtchn>|?
 	Notifies xenstored to communicate with this domain.
 
 	INTRODUCE is currently only used by xend (during domain
@@ -262,12 +262,12 @@  INTRODUCE		<domid>|<mfn>|<evtchn>|?
 	xenstored prevents its use other than by dom0.
 
 	<domid> must be a real domain id (not 0 and not a special
-	DOMID_... value).  <mfn> must be a machine page in that domain
+	DOMID_... value).  <gfn> must be a machine page in that domain
 	represented in signed decimal (!).  <evtchn> must be event
 	channel is an unbound event channel in <domid> (likewise in
 	decimal), on which xenstored will call bind_interdomain.
 	Violations of these rules may result in undefined behaviour;
-	for example passing a high-bit-set 32-bit mfn as an unsigned
+	for example passing a high-bit-set 32-bit gfn as an unsigned
 	decimal will attempt to use 0x7fffffff instead (!).
 
 RELEASE			<domid>|