Message ID | 1453095622-14859-2-git-send-email-wency@cn.fujitsu.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, 2016-01-18 at 13:40 +0800, Wen Congyang wrote: > stream_continue() is used for migration to read emulator > xenstore data and emulator context. For remus, if we do > failover, we have read it in the checkpoint cycle, and > we only need to complete the stream. > > Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> > Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> > --- > tools/libxl/libxl_stream_read.c | 35 ++++++++++++++++++++++++++++++----- > 1 file changed, 30 insertions(+), 5 deletions(-) > > diff --git a/tools/libxl/libxl_stream_read.c > b/tools/libxl/libxl_stream_read.c > index 258dec4..24305f4 100644 > --- a/tools/libxl/libxl_stream_read.c > +++ b/tools/libxl/libxl_stream_read.c > @@ -101,6 +101,19 @@ > * - stream_write_emulator_done() > * - stream_continue() > * > + * 4) Failover for remus I don't think this is really #4 in the list which precedes it. I think a section "Failover for remus" would be absolutely fine right that the end of this comment block though, i.e. right after the "Depending on the contents..." paragraph. Andy? > + * - we buffer all records until a CHECKPOINT_END record is received > + * - we will use the records when a CHECKPOINT_END record is received "we will consume the buffered records..." > + * - if we find some internal error, the rc or retval is not 0 in s/the/then/ > + * libxl__xc_domain_restore_done(). In this case, we don't resume the > + * guest > + * - if we need to do failover from primary, the rc and retval are 0 s/the/then/ again and I would say "are both 0" for clarity (assuming that is indeed the requirement). > + * in libxl__xc_domain_restore_done(). In this case, the buffered state > + * will be dropped, because we don't receive a CHECKPOINT_END record, haven't received > + * and it is a inconsistent state. In libxl__xc_domain_restore_done(), "an inconsistent". I think I would say "... and therefore the buffered state is inconsistent". - stream_continue(egc, stream); > + if (checkpointed_stream) { > + /* > + * Failover from primary. Domain state is currently at a > + * consistent checkpoint, complete the stream, and call > + * stream->completion_callback() to resume the guest. > + */ > + stream_complete(egc, stream, 0); Is it possible to get here having never received a single CHECKPOINT_END? Ian.
On 01/19/2016 12:45 AM, Ian Campbell wrote: > On Mon, 2016-01-18 at 13:40 +0800, Wen Congyang wrote: >> stream_continue() is used for migration to read emulator >> xenstore data and emulator context. For remus, if we do >> failover, we have read it in the checkpoint cycle, and >> we only need to complete the stream. >> >> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> >> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> >> --- >> tools/libxl/libxl_stream_read.c | 35 ++++++++++++++++++++++++++++++----- >> 1 file changed, 30 insertions(+), 5 deletions(-) >> >> diff --git a/tools/libxl/libxl_stream_read.c >> b/tools/libxl/libxl_stream_read.c >> index 258dec4..24305f4 100644 >> --- a/tools/libxl/libxl_stream_read.c >> +++ b/tools/libxl/libxl_stream_read.c >> @@ -101,6 +101,19 @@ >> * - stream_write_emulator_done() >> * - stream_continue() >> * >> + * 4) Failover for remus > > I don't think this is really #4 in the list which precedes it. I think a > section "Failover for remus" would be absolutely fine right that the end of > this comment block though, i.e. right after the "Depending on the > contents..." paragraph. > > Andy? > >> + * - we buffer all records until a CHECKPOINT_END record is received >> + * - we will use the records when a CHECKPOINT_END record is received > > "we will consume the buffered records..." > > >> + * - if we find some internal error, the rc or retval is not 0 in > > s/the/then/ > >> + * libxl__xc_domain_restore_done(). In this case, we don't resume the >> + * guest >> + * - if we need to do failover from primary, the rc and retval are 0 > > s/the/then/ again and I would say "are both 0" for clarity (assuming that > is indeed the requirement). > >> + * in libxl__xc_domain_restore_done(). In this case, the buffered state >> + * will be dropped, because we don't receive a CHECKPOINT_END record, > > haven't received > >> + * and it is a inconsistent state. In libxl__xc_domain_restore_done(), > > "an inconsistent". > > I think I would say "... and therefore the buffered state is inconsistent". Sorry for may poor English... I will fix it in the next version. > > - stream_continue(egc, stream); >> + if (checkpointed_stream) { >> + /* >> + * Failover from primary. Domain state is currently at a >> + * consistent checkpoint, complete the stream, and call >> + * stream->completion_callback() to resume the guest. >> + */ >> + stream_complete(egc, stream, 0); > > Is it possible to get here having never received a single CHECKPOINT_END? I will check the code first. I think xc_domain_restore() should not return 0 if it doesn't receive a single CHECKPOINT_END. Thanks Wen Congyang > > Ian. > > > . >
diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c index 258dec4..24305f4 100644 --- a/tools/libxl/libxl_stream_read.c +++ b/tools/libxl/libxl_stream_read.c @@ -101,6 +101,19 @@ * - stream_write_emulator_done() * - stream_continue() * + * 4) Failover for remus + * - we buffer all records until a CHECKPOINT_END record is received + * - we will use the records when a CHECKPOINT_END record is received + * - if we find some internal error, the rc or retval is not 0 in + * libxl__xc_domain_restore_done(). In this case, we don't resume the + * guest + * - if we need to do failover from primary, the rc and retval are 0 + * in libxl__xc_domain_restore_done(). In this case, the buffered state + * will be dropped, because we don't receive a CHECKPOINT_END record, + * and it is a inconsistent state. In libxl__xc_domain_restore_done(), + * we just complete the stream and stream->completion_callback() will + * be called to resume the guest + * * Depending on the contents of the stream, there are likely to be several * parallel tasks being managed. check_all_finished() is used to join all * tasks in both success and error cases. @@ -758,6 +771,9 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void, libxl__stream_read_state *stream = &dcs->srs; STATE_AO_GC(dcs->ao); + /* convenience aliases */ + const int checkpointed_stream = dcs->restore_params.checkpointed_stream; + if (rc) goto err; @@ -777,11 +793,20 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void, * If the stream is not still alive, we must not continue any work. */ if (libxl__stream_read_inuse(stream)) { - /* - * Libxc has indicated that it is done with the stream. Resume reading - * libxl records from it. - */ - stream_continue(egc, stream); + if (checkpointed_stream) { + /* + * Failover from primary. Domain state is currently at a + * consistent checkpoint, complete the stream, and call + * stream->completion_callback() to resume the guest. + */ + stream_complete(egc, stream, 0); + } else { + /* + * Libxc has indicated that it is done with the stream. + * Resume reading libxl records from it. + */ + stream_continue(egc, stream); + } } }