[v3,2/3] Introduce migration precopy policy

Message ID 1506365735-133776-3-git-send-email-Jennifer.Herbert@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jennifer Herbert Sept. 25, 2017, 6:55 p.m. UTC
This patch allows a migration precopy policy to be specified.

The precopy phase of the xc_domain_save() live migration algorithm has
historically been implemented to run until either a) (almost) no pages
are dirty or b) some fixed, hard-coded maximum number of precopy
iterations has been exceeded.  This policy and its implementation are
less than ideal for a few reasons:
- the logic of the policy is intertwined with the control flow of the
  mechanism of the precopy stage
- it can't take into account facts external to the immediate
  migration context, such as the state of external state transfer,
  interactive user input, or the passage of wall-clock time.
- it does not permit the user to change their mind, over time, about
  what to do at the end of the precopy (they get an unconditional
  transition into the stop-and-copy phase of the migration)

To permit callers to implement arbitrary higher-level policies governing
when the live migration precopy phase should end, and what should be
done next:
- add a precopy_policy() callback to the xc_domain_save() user-supplied
  callbacks
- during the precopy phase of live migrations, consult this policy after
  each batch of pages transmitted and take the dictated action, which
  may be to a) abort the migration entirely, b) continue with the
  precopy, or c) proceed to the stop-and-copy phase.
- provide an implementation of the old policy, used when the
  precopy_policy callback is not provided.
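
As an illustration, a caller might supply something like the following
(my_precopy_policy, my_policy_ctx and the thresholds are hypothetical
and not part of this series):

    #include <stdbool.h>
    #include <xenctrl.h>
    #include <xenguest.h>

    struct my_policy_ctx {
        bool cancel_requested;  /* set asynchronously by the toolstack */
    };

    static int my_precopy_policy(struct precopy_stats stats, void *user)
    {
        struct my_policy_ctx *pc = user;

        if ( pc->cancel_requested )
            return XGS_POLICY_ABORT;

        /* Stop once few pages remain dirty, or after 10 rounds. */
        if ( (stats.dirty_count >= 0 && stats.dirty_count < 100) ||
             stats.iteration >= 10 )
            return XGS_POLICY_STOP_AND_COPY;

        return XGS_POLICY_CONTINUE_PRECOPY;
    }

The callback is wired up through the .precopy_policy and .data members
of the struct save_callbacks passed to xc_domain_save().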

Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

---

v3:

Have changed/added comments, removed the iteration parameter from
update_progress_string() and made a few formatting corrections.

v2:

Have made a few formatting corrections and added a typedef as suggested.

This is an updated/modified subset of patch 7/20 of
Joshua Otto's "Add postcopy live migration support" series,
dated 27th March 2017.  As indicated on the original thread,
I wish to make use of this within the XenServer product.
I hope this will aid Josh in pushing the remainder of his series.
---
 tools/libxc/include/xenguest.h |  37 +++++++++++++--
 tools/libxc/xc_sr_common.h     |   6 +--
 tools/libxc/xc_sr_save.c       | 105 ++++++++++++++++++++++++++++-------------
 3 files changed, 109 insertions(+), 39 deletions(-)

Comments

Wei Liu Sept. 26, 2017, 8:36 a.m. UTC | #1
On Mon, Sep 25, 2017 at 07:55:34PM +0100, Jennifer Herbert wrote:
> This patch allows a migration precopy policy to be specified.
> 
> The precopy phase of the xc_domain_save() live migration algorithm has
> historically been implemented to run until either a) (almost) no pages
> are dirty or b) some fixed, hard-coded maximum number of precopy
> iterations has been exceeded.  This policy and its implementation are
> less than ideal for a few reasons:
> - the logic of the policy is intertwined with the control flow of the
>   mechanism of the precopy stage
> - it can't take into account facts external to the immediate
>   migration context, such as the state of external state transfer,
>   interactive user input, or the passage of wall-clock time.
> - it does not permit the user to change their mind, over time, about
>   what to do at the end of the precopy (they get an unconditional
>   transition into the stop-and-copy phase of the migration)
> 
> To permit callers to implement arbitrary higher-level policies governing
> when the live migration precopy phase should end, and what should be
> done next:
> - add a precopy_policy() callback to the xc_domain_save() user-supplied
>   callbacks
> - during the precopy phase of live migrations, consult this policy after
>   each batch of pages transmitted and take the dictated action, which
>   may be to a) abort the migration entirely, b) continue with the
>   precopy, or c) proceed to the stop-and-copy phase.
> - provide an implementation of the old policy, used when the
>   precopy_policy callback is not provided.
> 
> Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
> Signed-off-by: Joshua Otto <jtotto@uwaterloo.ca>
> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper Sept. 26, 2017, 10:58 a.m. UTC | #2
On 25/09/17 19:55, Jennifer Herbert wrote:
> +/*
> + * A precopy_policy callback may not be running in the same address
> + * space as libxc and so precopy_stats is passed by value.
> + */

Please take a step back and think about what is written here...

As I've said repeatedly, the structure vs pointer argument here is
entirely unrelated to the IPC boundary.

I am also unhappy that, after multiple review requests saying "turn this
into a pointer", it has remained passed by value.  It is not ok to
hack things like this up simply because it's the easier course of action.

~Andrew

> +typedef int (*precopy_policy_t)(struct precopy_stats, void *);
> +
>  /* callbacks provided by xc_domain_save */
>  struct save_callbacks {
>      /* Called after expiration of checkpoint interval,
>
Andrew Cooper Sept. 26, 2017, 11:02 a.m. UTC | #3
On 25/09/17 19:55, Jennifer Herbert wrote:
> @@ -46,7 +60,22 @@ struct save_callbacks {
>       */
>      int (*suspend)(void* data);
>  
> -    /* Called after the guest's dirty pages have been
> +    /*
> +     * Called before and after every batch of page data sent during
> +     * the precopy phase of a live migration to ask the caller what
> +     * to do next based on the current state of the precopy migration.

How is the callback supposed to determine whether it is before or after
the data batch?

As far as I can tell, it's not possible.

~Andrew

> +     *
> +     * Should return one of the values listed below:
> +     */
> +#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely
> +                                        * and tidy up. */
> +#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
> +#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
> +                                        * remaining dirty pages. */
> +    precopy_policy_t precopy_policy;
> +
> +    /*
> +     * Called after the guest's dirty pages have been
>       *  copied into an output buffer.
>       * Callback function resumes the guest & the device model,
>       *  returns to xc_domain_save.
>
Jennifer Herbert Sept. 26, 2017, 11:38 a.m. UTC | #4
On 26/09/17 11:58, Andrew Cooper wrote:
> On 25/09/17 19:55, Jennifer Herbert wrote:
>> +/*
>> + * A precopy_policy callback may not be running in the same address
>> + * space as libxc and so precopy_stats is passed by value.
>> + */
> Please take a step back and think about what is written here...
>
> As I've said repeatedly, the structure vs pointer argument here is
> entirely unrelated to the IPC boundary.
>
> I am also unhappy that, after multiple review requests saying "turn this
> into a pointer", it has remained passed by value.  It is not ok to
> hack things like this up simply because it's the easier course of action.

Passing tiny structures by value is not a hack - it is perfectly valid
C.  Bear in mind that a pointer is not a zero-sized entity - it itself
needs to be copied.  Passing by value here is quite possibly faster, and
uses less memory.  Furthermore, the multiple reviews concluded that this
was related to the IPC boundary, and that the tiny struct size was
acceptable.

Maybe you could elaborate on why you do not think it is related to the 
IPC boundary.

Instead, we could pass each element of the structure as an individual
parameter.  This would seem a retrograde step to me, but would address
your sensitivities.

-jenny
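
For reference, the two shapes under discussion look like this (the
series uses the by-value form; the by-pointer alternative is shown only
for comparison, and precopy_policy_ptr_t is a made-up name):

    /* By value, as in this series: the small struct (three scalar
     * fields, typically 16 bytes on LP64) is copied at the call site. */
    typedef int (*precopy_policy_t)(struct precopy_stats, void *);

    /* By pointer, as requested in review: the callee reads the
     * caller's stats through a pointer instead. */
    typedef int (*precopy_policy_ptr_t)(const struct precopy_stats *,
                                        void *);
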
Jennifer Herbert Sept. 26, 2017, 11:49 a.m. UTC | #5
On 26/09/17 12:02, Andrew Cooper wrote:
> On 25/09/17 19:55, Jennifer Herbert wrote:
>> @@ -46,7 +60,22 @@ struct save_callbacks {
>>        */
>>       int (*suspend)(void* data);
>>   
>> -    /* Called after the guest's dirty pages have been
>> +    /*
>> +     * Called before and after every batch of page data sent during
>> +     * the precopy phase of a live migration to ask the caller what
>> +     * to do next based on the current state of the precopy migration.
> How is the callback supposed to determine whether it is before or after
> the data batch?
>
> As far as I can tell, it's not possible.
>
> ~Andrew

You look at the dirty_count.  It works well for me.
I can add that info to the comment.

-jenny

>> +     *
>> +     * Should return one of the values listed below:
>> +     */
>> +#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely
>> +                                        * and tidy up. */
>> +#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
>> +#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
>> +                                        * remaining dirty pages. */
>> +    precopy_policy_t precopy_policy;
>> +
>> +    /*
>> +     * Called after the guest's dirty pages have been
>>        *  copied into an output buffer.
>>        * Callback function resumes the guest & the device model,
>>        *  returns to xc_domain_save.
>>
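
A sketch of what that looks like from the callback's side, going by the
dirty_count handling in send_memory_live() below (example_policy and its
thresholds are illustrative only): dirty_count carries a fresh count for
the call made before a batch is sent, and is reset to -1 for the call
made immediately afterwards, before the next XEN_DOMCTL_SHADOW_OP_CLEAN
refreshes it.

    #include <xenctrl.h>
    #include <xenguest.h>

    static int example_policy(struct precopy_stats stats, void *user)
    {
        (void)user;

        if ( stats.dirty_count < 0 )
        {
            /* Post-batch call: the next round's dirty count is not yet
             * known, so decide on iteration/total_written alone. */
            return (stats.iteration >= 8) ? XGS_POLICY_STOP_AND_COPY
                                          : XGS_POLICY_CONTINUE_PRECOPY;
        }

        /* Pre-batch call: a fresh dirty page count is available. */
        return (stats.dirty_count < 50) ? XGS_POLICY_STOP_AND_COPY
                                        : XGS_POLICY_CONTINUE_PRECOPY;
    }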

Patch

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 6626f0c..de97f1b 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -39,6 +39,20 @@ 
  */
 struct xenevtchn_handle;
 
+/* For save's precopy_policy(). */
+struct precopy_stats
+{
+    unsigned int iteration;
+    unsigned int total_written;
+    long dirty_count; /* -1 if unknown */
+};
+
+/*
+ * A precopy_policy callback may not be running in the same address
+ * space as libxc and so precopy_stats is passed by value.
+ */
+typedef int (*precopy_policy_t)(struct precopy_stats, void *);
+
 /* callbacks provided by xc_domain_save */
 struct save_callbacks {
     /* Called after expiration of checkpoint interval,
@@ -46,7 +60,22 @@  struct save_callbacks {
      */
     int (*suspend)(void* data);
 
-    /* Called after the guest's dirty pages have been
+    /*
+     * Called before and after every batch of page data sent during
+     * the precopy phase of a live migration to ask the caller what
+     * to do next based on the current state of the precopy migration.
+     *
+     * Should return one of the values listed below:
+     */
+#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely
+                                        * and tidy up. */
+#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
+#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
+                                        * remaining dirty pages. */
+    precopy_policy_t precopy_policy;
+
+    /*
+     * Called after the guest's dirty pages have been
      *  copied into an output buffer.
      * Callback function resumes the guest & the device model,
      *  returns to xc_domain_save.
@@ -55,7 +84,8 @@  struct save_callbacks {
      */
     int (*postcopy)(void* data);
 
-    /* Called after the memory checkpoint has been flushed
+    /*
+     * Called after the memory checkpoint has been flushed
      * out into the network. Typical actions performed in this
      * callback include:
      *   (a) send the saved device model state (for HVM guests),
@@ -65,7 +95,8 @@  struct save_callbacks {
      *
      * returns:
      * 0: terminate checkpointing gracefully
-     * 1: take another checkpoint */
+     * 1: take another checkpoint
+     */
     int (*checkpoint)(void* data);
 
     /*
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22a..3635704 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -198,12 +198,10 @@  struct xc_sr_context
             /* Further debugging information in the stream. */
             bool debug;
 
-            /* Parameters for tweaking live migration. */
-            unsigned max_iterations;
-            unsigned dirty_threshold;
-
             unsigned long p2m_size;
 
+            struct precopy_stats stats;
+
             xen_pfn_t *batch_pfns;
             unsigned nr_batch_pfns;
             unsigned long *deferred_pages;
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 1e7502d..5a40e58 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -446,14 +446,13 @@  static int enable_logdirty(struct xc_sr_context *ctx)
     return 0;
 }
 
-static int update_progress_string(struct xc_sr_context *ctx,
-                                  char **str, unsigned iter)
+static int update_progress_string(struct xc_sr_context *ctx, char **str)
 {
     xc_interface *xch = ctx->xch;
     char *new_str = NULL;
+    unsigned int iter = ctx->save.stats.iteration;
 
-    if ( asprintf(&new_str, "Frames iteration %u of %u",
-                  iter, ctx->save.max_iterations) == -1 )
+    if ( asprintf(&new_str, "Frames iteration %u", iter) == -1 )
     {
         PERROR("Unable to allocate new progress string");
         return -1;
@@ -467,6 +466,28 @@  static int update_progress_string(struct xc_sr_context *ctx,
 }
 
 /*
+ * This is the live migration precopy policy - it's called periodically during
+ * the precopy phase of live migrations, and is responsible for deciding when
+ * the precopy phase should terminate and what should be done next.
+ *
+ * The policy implemented here behaves identically to the policy previously
+ * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy phase of
+ * the live migration when there are either fewer than 50 dirty pages, or more
+ * than 5 precopy rounds have completed.
+ */
+#define SPP_MAX_ITERATIONS      5
+#define SPP_TARGET_DIRTY_COUNT 50
+
+static int simple_precopy_policy(struct precopy_stats stats, void *user)
+{
+    return ((stats.dirty_count >= 0 &&
+            stats.dirty_count < SPP_TARGET_DIRTY_COUNT) ||
+            stats.iteration >= SPP_MAX_ITERATIONS)
+        ? XGS_POLICY_STOP_AND_COPY
+        : XGS_POLICY_CONTINUE_PRECOPY;
+}
+
+/*
  * Send memory while guest is running.
  */
 static int send_memory_live(struct xc_sr_context *ctx)
@@ -474,21 +495,59 @@  static int send_memory_live(struct xc_sr_context *ctx)
     xc_interface *xch = ctx->xch;
     xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
     char *progress_str = NULL;
-    unsigned x;
+    unsigned int x = 0;
     int rc;
+    int policy_decision;
 
-    rc = update_progress_string(ctx, &progress_str, 0);
-    if ( rc )
-        goto out;
+    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                    &ctx->save.dirty_bitmap_hbuf);
 
-    rc = send_all_pages(ctx);
+    precopy_policy_t precopy_policy = ctx->save.callbacks->precopy_policy;
+    void *data = ctx->save.callbacks->data;
+
+    struct precopy_stats *policy_stats;
+
+    rc = update_progress_string(ctx, &progress_str);
     if ( rc )
         goto out;
 
-    for ( x = 1;
-          ((x < ctx->save.max_iterations) &&
-           (stats.dirty_count > ctx->save.dirty_threshold)); ++x )
+    ctx->save.stats = (struct precopy_stats)
+        { .dirty_count   = ctx->save.p2m_size };
+    policy_stats = &ctx->save.stats;
+
+    if ( precopy_policy == NULL )
+         precopy_policy = simple_precopy_policy;
+
+    bitmap_set(dirty_bitmap, ctx->save.p2m_size);
+
+    for ( ; ; )
     {
+        policy_decision = precopy_policy(*policy_stats, data);
+        x++;
+
+        if ( stats.dirty_count > 0 && policy_decision != XGS_POLICY_ABORT )
+        {
+            rc = update_progress_string(ctx, &progress_str);
+            if ( rc )
+                goto out;
+
+            rc = send_dirty_pages(ctx, stats.dirty_count);
+            if ( rc )
+                goto out;
+        }
+
+        if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
+            break;
+
+        policy_stats->iteration     = x;
+        policy_stats->total_written += policy_stats->dirty_count;
+        policy_stats->dirty_count   = -1;
+
+        policy_decision = precopy_policy(*policy_stats, data);
+
+        if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
+           break;
+
         if ( xc_shadow_control(
                  xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
                  &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_size,
@@ -499,16 +558,8 @@  static int send_memory_live(struct xc_sr_context *ctx)
             goto out;
         }
 
-        if ( stats.dirty_count == 0 )
-            break;
+        policy_stats->dirty_count = stats.dirty_count;
 
-        rc = update_progress_string(ctx, &progress_str, x);
-        if ( rc )
-            goto out;
-
-        rc = send_dirty_pages(ctx, stats.dirty_count);
-        if ( rc )
-            goto out;
     }
 
  out:
@@ -600,8 +651,7 @@  static int suspend_and_send_dirty(struct xc_sr_context *ctx)
 
     if ( ctx->save.live )
     {
-        rc = update_progress_string(ctx, &progress_str,
-                                    ctx->save.max_iterations);
+        rc = update_progress_string(ctx, &progress_str);
         if ( rc )
             goto out;
     }
@@ -937,15 +987,6 @@  int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
            stream_type == XC_MIG_STREAM_REMUS ||
            stream_type == XC_MIG_STREAM_COLO);
 
-    /*
-     * TODO: Find some time to better tweak the live migration algorithm.
-     *
-     * These parameters are better than the legacy algorithm especially for
-     * busy guests.
-     */
-    ctx.save.max_iterations = 5;
-    ctx.save.dirty_threshold = 50;
-
     /* Sanity checks for callbacks. */
     if ( hvm )
         assert(callbacks->switch_qemu_logdirty);