diff mbox

[v6,12/18] tools/libx{l, c}: add back channel to libxc

Message ID 1451442548-26974-13-git-send-email-wency@cn.fujitsu.com (mailing list archive)
State New, archived
Headers show

Commit Message

Wen Congyang Dec. 30, 2015, 2:29 a.m. UTC
In COLO mode, both VMs are running, and are considered in sync if the
visible network traffic is identical.  After some time, they fall out of
sync.

At this point, the two VMs have definitely diverged.  Lets call the
primary dirty bitmap set A, while the secondary dirty bitmap set B.

Sets A and B are different.

Under normal migration, the page data for set A will be sent form the
primary to the secondary.

However, the set difference B - A (lets call this C) is out-of-date on
the secondary (with respect to the primary) and will not be sent by the
primary, as it was not memory dirtied by the primary.  The secondary
needs the page data for C to reconstruct an exact copy of the primary at
the checkpoint.

The secondary cannot calculate C as it doesn't know A.  Instead, the
secondary must send B to the primary, at which point the primary
calculates the union of A and B (lets call this D) which is all the
pages dirtied by both the primary and the secondary, and sends all page
data covered by D.

In the general case, D is a superset of both A and B.  Without the
backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
copy of the primary.

We transfer the dirty bitmap on libxc side, so we need to introduce back
channel to libxc.

Note: it is different from the paper. We change the original design to
the current one, according to our following concerns:
1. The original design needs extra memory on Secondary host. When there's
   multiple backups on one host, the memory cost is high.
2. The memory cache code will be another 1k+, it will make the review
   more time consuming.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
commit message:
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxc/include/xenguest.h   |  4 ++--
 tools/libxc/xc_nomigrate.c       |  4 ++--
 tools/libxc/xc_sr_restore.c      |  2 +-
 tools/libxc/xc_sr_save.c         |  2 +-
 tools/libxl/libxl_save_callout.c | 39 ++++++++++++++++++++++++++-------------
 tools/libxl/libxl_save_helper.c  |  8 ++++++--
 6 files changed, 38 insertions(+), 21 deletions(-)

Comments

Konrad Rzeszutek Wilk Jan. 25, 2016, 7:41 p.m. UTC | #1
On Wed, Dec 30, 2015 at 10:29:02AM +0800, Wen Congyang wrote:
> In COLO mode, both VMs are running, and are considered in sync if the
> visible network traffic is identical.  After some time, they fall out of
> sync.
> 
> At this point, the two VMs have definitely diverged.  Lets call the
> primary dirty bitmap set A, while the secondary dirty bitmap set B.
> 
> Sets A and B are different.
> 
> Under normal migration, the page data for set A will be sent form the

s/form/from/

> primary to the secondary.
> 
> However, the set difference B - A (lets call this C) is out-of-date on
> the secondary (with respect to the primary) and will not be sent by the
> primary, as it was not memory dirtied by the primary.  The secondary

s/primary/primary (to secondary)/

> needs the page data for C to reconstruct an exact copy of the primary at

s/the page data/C page data/

> the checkpoint.
> 
> The secondary cannot calculate C as it doesn't know A.  Instead, the
> secondary must send B to the primary, at which point the primary
> calculates the union of A and B (lets call this D) which is all the
> pages dirtied by both the primary and the secondary, and sends all page
> data covered by D.

You could invert this - the primary could send A to secondary? I presume
this non-optimal as the 'A' set is much much bigger than 'C' set?

It may be good to include this in the commit description.

> 
> In the general case, D is a superset of both A and B.  Without the
> backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
> copy of the primary.
> 
> We transfer the dirty bitmap on libxc side, so we need to introduce back
> channel to libxc.

> 
> Note: it is different from the paper. We change the original design to
> the current one, according to our following concerns:
> 1. The original design needs extra memory on Secondary host. When there's
>    multiple backups on one host, the memory cost is high.
> 2. The memory cache code will be another 1k+, it will make the review
>    more time consuming.

Well, that 2) is a very good reason :-)
> 
> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
> commit message:

? Huh?

> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Campbell <Ian.Campbell@citrix.com>
> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>

.. snip..
> index 05159bb..d4dc501 100644
> --- a/tools/libxc/xc_sr_restore.c
> +++ b/tools/libxc/xc_sr_restore.c
> @@ -722,7 +722,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>                        unsigned long *console_gfn, domid_t console_domid,
>                        unsigned int hvm, unsigned int pae, int superpages,
>                        int checkpointed_stream,
> -                      struct restore_callbacks *callbacks)
> +                      struct restore_callbacks *callbacks, int back_fd)
>  {
>      struct xc_sr_context ctx =
>          {
> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
> index 8ffd71d..a49d083 100644
> --- a/tools/libxc/xc_sr_save.c
> +++ b/tools/libxc/xc_sr_save.c
> @@ -824,7 +824,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
>  int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
>                     uint32_t max_iters, uint32_t max_factor, uint32_t flags,
>                     struct save_callbacks* callbacks, int hvm,
> -                   int checkpointed_stream)
> +                   int checkpointed_stream, int back_fd)
>  {
>      struct xc_sr_context ctx =
>          {


But where is the code?

Or is that suppose to be done in another patch? If so you may want to
mention that in the commit description?
Wen Congyang Jan. 26, 2016, 8:03 a.m. UTC | #2
On 01/26/2016 03:41 AM, Konrad Rzeszutek Wilk wrote:
> On Wed, Dec 30, 2015 at 10:29:02AM +0800, Wen Congyang wrote:
>> In COLO mode, both VMs are running, and are considered in sync if the
>> visible network traffic is identical.  After some time, they fall out of
>> sync.
>>
>> At this point, the two VMs have definitely diverged.  Lets call the
>> primary dirty bitmap set A, while the secondary dirty bitmap set B.
>>
>> Sets A and B are different.
>>
>> Under normal migration, the page data for set A will be sent form the
> 
> s/form/from/
> 
>> primary to the secondary.
>>
>> However, the set difference B - A (lets call this C) is out-of-date on
>> the secondary (with respect to the primary) and will not be sent by the
>> primary, as it was not memory dirtied by the primary.  The secondary
> 
> s/primary/primary (to secondary)/
> 
>> needs the page data for C to reconstruct an exact copy of the primary at
> 
> s/the page data/C page data/
> 
>> the checkpoint.
>>
>> The secondary cannot calculate C as it doesn't know A.  Instead, the
>> secondary must send B to the primary, at which point the primary
>> calculates the union of A and B (lets call this D) which is all the
>> pages dirtied by both the primary and the secondary, and sends all page
>> data covered by D.
> 
> You could invert this - the primary could send A to secondary? I presume
> this non-optimal as the 'A' set is much much bigger than 'C' set?

'C' set is the one in 'B' set but not in 'A' set.

> 
> It may be good to include this in the commit description.
> 
>>
>> In the general case, D is a superset of both A and B.  Without the
>> backchannel dirty bitmap, a COLO checkpoint can't reconstruct a valid
>> copy of the primary.
>>
>> We transfer the dirty bitmap on libxc side, so we need to introduce back
>> channel to libxc.
> 
>>
>> Note: it is different from the paper. We change the original design to
>> the current one, according to our following concerns:
>> 1. The original design needs extra memory on Secondary host. When there's
>>    multiple backups on one host, the memory cost is high.
>> 2. The memory cache code will be another 1k+, it will make the review
>>    more time consuming.
> 
> Well, that 2) is a very good reason :-)
>>
>> Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
>> commit message:
> 
> ? Huh?

I don't know what it is. Will remove it in the next version.

> 
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
> 
> .. snip..
>> index 05159bb..d4dc501 100644
>> --- a/tools/libxc/xc_sr_restore.c
>> +++ b/tools/libxc/xc_sr_restore.c
>> @@ -722,7 +722,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>>                        unsigned long *console_gfn, domid_t console_domid,
>>                        unsigned int hvm, unsigned int pae, int superpages,
>>                        int checkpointed_stream,
>> -                      struct restore_callbacks *callbacks)
>> +                      struct restore_callbacks *callbacks, int back_fd)
>>  {
>>      struct xc_sr_context ctx =
>>          {
>> diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
>> index 8ffd71d..a49d083 100644
>> --- a/tools/libxc/xc_sr_save.c
>> +++ b/tools/libxc/xc_sr_save.c
>> @@ -824,7 +824,7 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type)
>>  int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
>>                     uint32_t max_iters, uint32_t max_factor, uint32_t flags,
>>                     struct save_callbacks* callbacks, int hvm,
>> -                   int checkpointed_stream)
>> +                   int checkpointed_stream, int back_fd)
>>  {
>>      struct xc_sr_context ctx =
>>          {
> 
> 
> But where is the code?
> 
> Or is that suppose to be done in another patch? If so you may want to
> mention that in the commit description?

Do you mean where is the code that uses back_fd? It is in another series:
http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02904.html

Thanks
Wen Congyang

> 
> 
> 
> .
>
Konrad Rzeszutek Wilk Jan. 26, 2016, 2:29 p.m. UTC | #3
> > Or is that suppose to be done in another patch? If so you may want to
> > mention that in the commit description?
> 
> Do you mean where is the code that uses back_fd? It is in another series:
> http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02904.html

Ah right that big patchset one. Hadn't looked at that yet - it is a bit hard
without having a git tree on which the foundation patches (this patch
series) are applied so I can look at the contents of the functions.
Wen Congyang Jan. 27, 2016, 12:52 a.m. UTC | #4
On 01/26/2016 10:29 PM, Konrad Rzeszutek Wilk wrote:
>>> Or is that suppose to be done in another patch? If so you may want to
>>> mention that in the commit description?
>>
>> Do you mean where is the code that uses back_fd? It is in another series:
>> http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02904.html
> 
> Ah right that big patchset one. Hadn't looked at that yet - it is a bit hard
> without having a git tree on which the foundation patches (this patch
> series) are applied so I can look at the contents of the functions.

Yes, I will provide a git tree to help review.

Thanks
Wen Congyang

> 
> 
> 
> 
> .
>
diff mbox

Patch

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index e8bc46d..bd133af 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -82,7 +82,7 @@  struct save_callbacks {
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
                    uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
                    struct save_callbacks* callbacks, int hvm,
-                   int checkpointed_stream);
+                   int checkpointed_stream, int back_fd);
 
 /* callbacks provided by xc_domain_restore */
 struct restore_callbacks {
@@ -121,7 +121,7 @@  int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                       unsigned long *console_mfn, domid_t console_domid,
                       unsigned int hvm, unsigned int pae, int superpages,
                       int checkpointed_stream,
-                      struct restore_callbacks *callbacks);
+                      struct restore_callbacks *callbacks, int back_fd);
 
 /**
  * This function will create a domain for a paravirtualized Linux
diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
index c9124df..089f767 100644
--- a/tools/libxc/xc_nomigrate.c
+++ b/tools/libxc/xc_nomigrate.c
@@ -23,7 +23,7 @@ 
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,
                    uint32_t max_factor, uint32_t flags,
                    struct save_callbacks* callbacks, int hvm,
-                   int checkpointed_stream)
+                   int checkpointed_stream, int back_fd)
 {
     errno = ENOSYS;
     return -1;
@@ -35,7 +35,7 @@  int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                       unsigned long *console_mfn, domid_t console_domid,
                       unsigned int hvm, unsigned int pae, int superpages,
                       int checkpointed_stream,
-                      struct restore_callbacks *callbacks)
+                      struct restore_callbacks *callbacks, int back_fd)
 {
     errno = ENOSYS;
     return -1;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 05159bb..d4dc501 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -722,7 +722,7 @@  int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                       unsigned long *console_gfn, domid_t console_domid,
                       unsigned int hvm, unsigned int pae, int superpages,
                       int checkpointed_stream,
-                      struct restore_callbacks *callbacks)
+                      struct restore_callbacks *callbacks, int back_fd)
 {
     struct xc_sr_context ctx =
         {
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 8ffd71d..a49d083 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -824,7 +824,7 @@  static int save(struct xc_sr_context *ctx, uint16_t guest_type)
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
                    uint32_t max_iters, uint32_t max_factor, uint32_t flags,
                    struct save_callbacks* callbacks, int hvm,
-                   int checkpointed_stream)
+                   int checkpointed_stream, int back_fd)
 {
     struct xc_sr_context ctx =
         {
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 416b318..631e3e2 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -27,7 +27,7 @@ 
  */
 static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
                        const char *mode_arg,
-                       int stream_fd,
+                       int stream_fd, int back_fd,
                        const int *preserve_fds, int num_preserve_fds,
                        const unsigned long *argnums, int num_argnums);
 
@@ -50,6 +50,7 @@  void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     /* Convenience aliases */
     const uint32_t domid = dcs->guest_domid;
     const int restore_fd = dcs->libxc_fd;
+    const int send_fd = dcs->send_fd;
     libxl__domain_build_state *const state = &dcs->build_state;
 
     unsigned cbflags =
@@ -71,7 +72,7 @@  void libxl__xc_domain_restore(libxl__egc *egc, libxl__domain_create_state *dcs,
     shs->caller_state = dcs;
     shs->need_results = 1;
 
-    run_helper(egc, shs, "--restore-domain", restore_fd, 0, 0,
+    run_helper(egc, shs, "--restore-domain", restore_fd, send_fd, 0, 0,
                argnums, ARRAY_SIZE(argnums));
 }
 
@@ -95,7 +96,7 @@  void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss,
     shs->caller_state = dss;
     shs->need_results = 0;
 
-    run_helper(egc, shs, "--save-domain", dss->fd,
+    run_helper(egc, shs, "--save-domain", dss->fd, dss->recv_fd,
                NULL, 0,
                argnums, ARRAY_SIZE(argnums));
     return;
@@ -118,14 +119,29 @@  void libxl__save_helper_init(libxl__save_helper_state *shs)
 }
 
 /*----- helper execution -----*/
+static int dup_fd_helper(libxl__gc *gc, int fd, const char *what)
+{
+    int dup_fd = fd;
+
+    if (fd <= 2) {
+        dup_fd = dup(fd);
+        if (dup_fd < 0) {
+            LOGE(ERROR,"dup %s", what);
+            exit(-1);
+        }
+    }
+    libxl_fd_set_cloexec(CTX, dup_fd, 0);
+
+    return dup_fd;
+}
 
 static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
-                       const char *mode_arg, int stream_fd,
+                       const char *mode_arg, int stream_fd, int back_fd,
                        const int *preserve_fds, int num_preserve_fds,
                        const unsigned long *argnums, int num_argnums)
 {
     STATE_AO_GC(shs->ao);
-    const char *args[4 + num_argnums];
+    const char *args[5 + num_argnums];
     const char **arg = args;
     int i, rc;
 
@@ -153,6 +169,7 @@  static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
     *arg++ = getenv("LIBXL_SAVE_HELPER") ?: LIBEXEC_BIN "/" "libxl-save-helper";
     *arg++ = mode_arg;
     const char **stream_fd_arg = arg++;
+    const char **back_fd_arg = arg++;
     for (i=0; i<num_argnums; i++)
         *arg++ = GCSPRINTF("%lu", argnums[i]);
     *arg++ = 0;
@@ -177,16 +194,12 @@  static void run_helper(libxl__egc *egc, libxl__save_helper_state *shs,
 
     pid_t pid = libxl__ev_child_fork(gc, &shs->child, helper_exited);
     if (!pid) {
-        if (stream_fd <= 2) {
-            stream_fd = dup(stream_fd);
-            if (stream_fd < 0) {
-                LOGE(ERROR,"dup migration stream fd");
-                exit(-1);
-            }
-        }
-        libxl_fd_set_cloexec(CTX, stream_fd, 0);
+        stream_fd = dup_fd_helper(gc, stream_fd, "migration stream fd");
         *stream_fd_arg = GCSPRINTF("%d", stream_fd);
 
+        back_fd = dup_fd_helper(gc, back_fd, "migration back channel fd");
+        *back_fd_arg = GCSPRINTF("%d", back_fd);
+
         for (i=0; i<num_preserve_fds; i++)
             if (preserve_fds[i] >= 0) {
                 assert(preserve_fds[i] > 2);
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 6bdcf13..9bdcf41 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -238,6 +238,7 @@  static struct restore_callbacks helper_restore_callbacks;
 int main(int argc, char **argv)
 {
     int r;
+    int back_fd;
 
 #define NEXTARG (++argv, assert(*argv), *argv)
 
@@ -247,6 +248,7 @@  int main(int argc, char **argv)
     if (!strcmp(mode,"--save-domain")) {
 
         io_fd =                    atoi(NEXTARG);
+        back_fd =                  atoi(NEXTARG);
         uint32_t dom =             strtoul(NEXTARG,0,10);
         uint32_t max_iters =       strtoul(NEXTARG,0,10);
         uint32_t max_factor =      strtoul(NEXTARG,0,10);
@@ -262,12 +264,14 @@  int main(int argc, char **argv)
         setup_signals(save_signal_handler);
 
         r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
-                           &helper_save_callbacks, hvm, checkpointed_stream);
+                           &helper_save_callbacks, hvm, checkpointed_stream,
+                           back_fd);
         complete(r);
 
     } else if (!strcmp(mode,"--restore-domain")) {
 
         io_fd =                    atoi(NEXTARG);
+        back_fd =                  atoi(NEXTARG);
         uint32_t dom =             strtoul(NEXTARG,0,10);
         unsigned store_evtchn =    strtoul(NEXTARG,0,10);
         domid_t store_domid =      strtoul(NEXTARG,0,10);
@@ -292,7 +296,7 @@  int main(int argc, char **argv)
                               store_domid, console_evtchn, &console_mfn,
                               console_domid, hvm, pae, superpages,
                               checkpointed,
-                              &helper_restore_callbacks);
+                              &helper_restore_callbacks, back_fd);
         helper_stub_restore_results(store_mfn,console_mfn,0);
         complete(r);