
[v4,1/5] remus: don't call stream_continue() when doing failover

Message ID 1453095622-14859-2-git-send-email-wency@cn.fujitsu.com
State New, archived

Commit Message

Wen Congyang Jan. 18, 2016, 5:40 a.m. UTC
stream_continue() is used for migration to read emulator
xenstore data and emulator context. For remus, if we do
failover, we have read it in the checkpoint cycle, and
we only need to complete the stream.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxl/libxl_stream_read.c | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)
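The decision this patch adds to libxl__xc_domain_restore_done() can be sketched as a small pure function. This is a simplified model under stated assumptions: the inputs are reduced to plain ints and bools, and the enum and next_step_after_restore() are hypothetical names, not libxl code. In the real function a non-zero rc takes the `err` path instead of returning a value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes; these are NOT libxl identifiers. */
enum restore_next_step {
    STEP_FAIL,            /* rc or retval non-zero: do not resume the guest */
    STEP_NOTHING,         /* stream no longer in use: do no further work */
    STEP_STREAM_COMPLETE, /* Remus failover: complete stream, resume guest */
    STEP_STREAM_CONTINUE  /* plain migration: read remaining libxl records */
};

/* Simplified model of the branch structure in the patched function. */
static enum restore_next_step
next_step_after_restore(int rc, int retval, bool stream_inuse,
                        bool checkpointed_stream)
{
    if (rc || retval)
        return STEP_FAIL;
    if (!stream_inuse)
        return STEP_NOTHING;
    /* Only a checkpointed (Remus) stream skips stream_continue(). */
    return checkpointed_stream ? STEP_STREAM_COMPLETE : STEP_STREAM_CONTINUE;
}
```

For an ordinary migration (checkpointed_stream false) the behaviour is unchanged: libxl keeps reading records from the stream.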

Comments

Ian Campbell Jan. 18, 2016, 4:45 p.m. UTC | #1
On Mon, 2016-01-18 at 13:40 +0800, Wen Congyang wrote:
> stream_continue() is used for migration to read emulator
> xenstore data and emulator context. For remus, if we do
> failover, we have read it in the checkpoint cycle, and
> we only need to complete the stream.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  tools/libxl/libxl_stream_read.c | 35 ++++++++++++++++++++++++++++++-----
>  1 file changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
> index 258dec4..24305f4 100644
> --- a/tools/libxl/libxl_stream_read.c
> +++ b/tools/libxl/libxl_stream_read.c
> @@ -101,6 +101,19 @@
>   *    - stream_write_emulator_done()
>   *    - stream_continue()
>   *
> + * 4) Failover for remus

I don't think this is really #4 in the list which precedes it. I think a
section "Failover for remus" would be absolutely fine right at the end of
this comment block though, i.e. right after the "Depending on the
contents..." paragraph.

Andy?

> + *    - we buffer all records until a CHECKPOINT_END record is received
> + *    - we will use the records when a CHECKPOINT_END record is received

"we will consume the buffered records..."


> + *    - if we find some internal error, the rc or retval is not 0 in

s/the/then/

> + *      libxl__xc_domain_restore_done(). In this case, we don't resume the
> + *      guest
> + *    - if we need to do failover from primary, the rc and retval are 0

s/the/then/ again and I would say "are both 0" for clarity (assuming that
is indeed the requirement).

> + *      in libxl__xc_domain_restore_done(). In this case, the buffered state
> + *      will be dropped, because we don't receive a CHECKPOINT_END record,

                                       haven't received

> + *      and it is a inconsistent state. In libxl__xc_domain_restore_done(),

"an inconsistent".

I think I would say "... and therefore the buffered state is inconsistent".

-        stream_continue(egc, stream);
> +        if (checkpointed_stream) {
> +            /*
> +             * Failover from primary. Domain state is currently at a
> +             * consistent checkpoint, complete the stream, and call
> +             * stream->completion_callback() to resume the guest.
> +             */
> +            stream_complete(egc, stream, 0);

Is it possible to get here having never received a single CHECKPOINT_END?

Ian.
Wen Congyang Jan. 19, 2016, 1:05 a.m. UTC | #2
On 01/19/2016 12:45 AM, Ian Campbell wrote:
> On Mon, 2016-01-18 at 13:40 +0800, Wen Congyang wrote:
>> stream_continue() is used for migration to read emulator
>> xenstore data and emulator context. For remus, if we do
>> failover, we have read it in the checkpoint cycle, and
>> we only need to complete the stream.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>>  tools/libxl/libxl_stream_read.c | 35 ++++++++++++++++++++++++++++++-----
>>  1 file changed, 30 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
>> index 258dec4..24305f4 100644
>> --- a/tools/libxl/libxl_stream_read.c
>> +++ b/tools/libxl/libxl_stream_read.c
>> @@ -101,6 +101,19 @@
>>   *    - stream_write_emulator_done()
>>   *    - stream_continue()
>>   *
>> + * 4) Failover for remus
> 
> I don't think this is really #4 in the list which precedes it. I think a
> section "Failover for remus" would be absolutely fine right at the end of
> this comment block though, i.e. right after the "Depending on the
> contents..." paragraph.
> 
> Andy?
> 
>> + *    - we buffer all records until a CHECKPOINT_END record is received
>> + *    - we will use the records when a CHECKPOINT_END record is received
> 
> "we will consume the buffered records..."
> 
> 
>> + *    - if we find some internal error, the rc or retval is not 0 in
> 
> s/the/then/
> 
>> + *      libxl__xc_domain_restore_done(). In this case, we don't resume the
>> + *      guest
>> + *    - if we need to do failover from primary, the rc and retval are 0
> 
> s/the/then/ again and I would say "are both 0" for clarity (assuming that
> is indeed the requirement).
> 
>> + *      in libxl__xc_domain_restore_done(). In this case, the buffered state
>> + *      will be dropped, because we don't receive a CHECKPOINT_END record,
> 
>                                        haven't received
> 
>> + *      and it is a inconsistent state. In libxl__xc_domain_restore_done(),
> 
> "an inconsistent".
> 
> I think I would say "... and therefore the buffered state is inconsistent".

Sorry for my poor English... I will fix it in the next version.

> 
> -        stream_continue(egc, stream);
>> +        if (checkpointed_stream) {
>> +            /*
>> +             * Failover from primary. Domain state is currently at a
>> +             * consistent checkpoint, complete the stream, and call
>> +             * stream->completion_callback() to resume the guest.
>> +             */
>> +            stream_complete(egc, stream, 0);
> 
> Is it possible to get here having never received a single CHECKPOINT_END?

I will check the code first. I think xc_domain_restore() should not return
0 if it doesn't receive a single CHECKPOINT_END.

Thanks
Wen Congyang

> 
> Ian.

Patch

diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c
index 258dec4..24305f4 100644
--- a/tools/libxl/libxl_stream_read.c
+++ b/tools/libxl/libxl_stream_read.c
@@ -101,6 +101,19 @@ 
  *    - stream_write_emulator_done()
  *    - stream_continue()
  *
+ * 4) Failover for remus
+ *    - we buffer all records until a CHECKPOINT_END record is received
+ *    - we will use the records when a CHECKPOINT_END record is received
+ *    - if we find some internal error, the rc or retval is not 0 in
+ *      libxl__xc_domain_restore_done(). In this case, we don't resume the
+ *      guest
+ *    - if we need to do failover from primary, the rc and retval are 0
+ *      in libxl__xc_domain_restore_done(). In this case, the buffered state
+ *      will be dropped, because we don't receive a CHECKPOINT_END record,
+ *      and it is a inconsistent state. In libxl__xc_domain_restore_done(),
+ *      we just complete the stream and stream->completion_callback() will
+ *      be called to resume the guest
+ *
  * Depending on the contents of the stream, there are likely to be several
  * parallel tasks being managed.  check_all_finished() is used to join all
  * tasks in both success and error cases.
@@ -758,6 +771,9 @@  void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
     libxl__stream_read_state *stream = &dcs->srs;
     STATE_AO_GC(dcs->ao);
 
+    /* convenience aliases */
+    const int checkpointed_stream = dcs->restore_params.checkpointed_stream;
+
     if (rc)
         goto err;
 
@@ -777,11 +793,20 @@  void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void,
      * If the stream is not still alive, we must not continue any work.
      */
     if (libxl__stream_read_inuse(stream)) {
-        /*
-         * Libxc has indicated that it is done with the stream.  Resume reading
-         * libxl records from it.
-         */
-        stream_continue(egc, stream);
+        if (checkpointed_stream) {
+            /*
+             * Failover from primary. Domain state is currently at a
+             * consistent checkpoint, complete the stream, and call
+             * stream->completion_callback() to resume the guest.
+             */
+            stream_complete(egc, stream, 0);
+        } else {
+            /*
+             * Libxc has indicated that it is done with the stream.
+             * Resume reading libxl records from it.
+             */
+            stream_continue(egc, stream);
+        }
     }
 }
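The buffering rules in the "Failover for remus" comment can be modelled with a small, self-contained C sketch. Everything here is hypothetical: records are plain ints, the fixed-size buffer and the struct and function names are invented for illustration, and "consuming" a checkpoint is reduced to counting records. failover() returns -1 when no CHECKPOINT_END has ever been received; this encodes, as an assumption, Wen's expectation that xc_domain_restore() should not report success in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BUFFERED 16 /* hypothetical fixed capacity for the sketch */

/* Hypothetical model of the receiver-side state; not libxl code. */
struct checkpoint_buffer {
    int pending[MAX_BUFFERED]; /* records since the last CHECKPOINT_END */
    size_t npending;
    size_t applied;            /* records consumed at CHECKPOINT_END */
    bool have_checkpoint;      /* at least one CHECKPOINT_END received */
};

/* Buffer a record; returns false if the fixed-size buffer is full. */
static bool buffer_record(struct checkpoint_buffer *b, int rec)
{
    if (b->npending == MAX_BUFFERED)
        return false;
    b->pending[b->npending++] = rec;
    return true;
}

/* CHECKPOINT_END: consume the buffered records as a consistent state. */
static void checkpoint_end(struct checkpoint_buffer *b)
{
    b->applied += b->npending;
    b->npending = 0;
    b->have_checkpoint = true;
}

/*
 * Failover: drop the partial (inconsistent) checkpoint. Returns 0 if the
 * guest can resume from the last consistent checkpoint, or -1 if no
 * CHECKPOINT_END was ever received.
 */
static int failover(struct checkpoint_buffer *b)
{
    b->npending = 0;
    return b->have_checkpoint ? 0 : -1;
}
```

Dropping the pending records on failover is what keeps the guest at the last consistent checkpoint; the have_checkpoint flag is the extra state needed to answer Ian's question about failing over before any CHECKPOINT_END.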