[v4,2/5] remus: resume immediately if libxl__xc_domain_save_done() completes

Message ID 1453095622-14859-3-git-send-email-wency@cn.fujitsu.com (mailing list archive)
State New, archived

Commit Message

Wen Congyang Jan. 18, 2016, 5:40 a.m. UTC
For example, if the secondary host is down, we fail to send the data to
the secondary host, but xc_domain_save() returns 0. So in the function
libxl__xc_domain_save_done(), rc is 0 (the helper program exits normally)
and retval is 0 (it is xc_domain_save()'s return value). In such a case, we
just need to complete the stream.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxl/libxl.c              |  5 ++++-
 tools/libxl/libxl_stream_write.c | 14 ++++++++++++--
 2 files changed, 16 insertions(+), 3 deletions(-)

Comments

Ian Campbell Jan. 18, 2016, 4:51 p.m. UTC | #1
On Mon, 2016-01-18 at 13:40 +0800, Wen Congyang wrote:
> For example: if the secondary host is down, and we fail to send the data to
> the secondary host. xc_domain_save() returns 0. So in the function
> libxl__xc_domain_save_done(), rc is 0(the helper program exits normally),
> and retval is 0(it is xc_domain_save()'s return value). In such case, we
> just need to complete the stream.

What if the secondary host isn't actually down but just communication has
failed for some reason? Won't both primary and secondary start their
respective versions of the domain? What are the consequences of that?
(Corruption?)

I suppose this is a consequence of the lack of STONITH or splitbrain
handling within Remus. Are there any plans to address this?

> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  tools/libxl/libxl.c              |  5 ++++-
>  tools/libxl/libxl_stream_write.c | 14 ++++++++++++--
>  2 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index abb2845..d50c3fb 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -884,7 +884,10 @@ int libxl_domain_remus_start(libxl_ctx *ctx,
> libxl_domain_remus_info *info,
>  
>      assert(info);
>  
> -    /* Point of no return */
> +    /*
> +     * This function doesn't return until something is wrong, and
> +     * we need to do failover from secondary.

I was actually hoping for user/API documentation (i.e. in a public header)
rather than a code comment; I suppose this will do, though.


> +        if (dss->remus)
> +            /*
> +             * For remus, if libxl__xc_domain_save_done() completes,
> +             * there was an error sending data to the secondary.
> +             * Resume the primary ASAP. The caller doesn't care of the
> +             * return value(Please refer to libxl__remus_teardown())

There should usually be a space before a ( in text/prose (also in the
changelog).

> +             */
> +            stream_complete(egc, stream, 0);
> +        else
> +            write_emulator_xenstore_record(egc, stream);
> +    }
>  }
>  
>  static void write_emulator_xenstore_record(libxl__egc *egc,
Wen Congyang Jan. 19, 2016, 1:01 a.m. UTC | #2
On 01/19/2016 12:51 AM, Ian Campbell wrote:
> On Mon, 2016-01-18 at 13:40 +0800, Wen Congyang wrote:
>> For example: if the secondary host is down, and we fail to send the data to
>> the secondary host. xc_domain_save() returns 0. So in the function
>> libxl__xc_domain_save_done(), rc is 0(the helper program exits normally),
>> and retval is 0(it is xc_domain_save()'s return value). In such case, we
>> just need to complete the stream.
> 
> What if the secondary host isn't actually down but just communication has
> failed for some reason? Won't both primary and secondary start their
> respective versions of the domain? What are the consequences of that?
> (Corruption?)
> 
> I suppose this is a consequence of the lack of STONITH or splitbrain
> handling within Remus. Are there any plans to address this?

IIRC, Shriram Rajagopalan has some ideas about it (check the external heartbeat?).
There is no way to avoid splitbrain unless we have more than two hosts (at least
three hosts). If we want to avoid splitbrain, we may need to destroy both primary
and secondary guests.

An example:
            judge host
            /        \
           1          2
          /            \
primary host  <-3->  secondary host

If connection 3 has a problem:
case A: connections 1 and 2 are both OK. We can select one guest to run, and the
        other one will be destroyed (we have a judge host).
case B: only one of connections 1 and 2 is OK; the other has a problem. The
        guest on the host that can still connect to the judge host will continue
        to run. The other guest will be destroyed.
case C: both connections 1 and 2 have problems. Both guests will be destroyed.
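
The three cases above can be sketched as a small decision function. This is only an illustration of the arbitration idea, not code from libxl or Remus: the names `arbitration_result` and `arbitrate` are hypothetical, and preferring the primary in case A is an assumption (the description only says that one guest is selected).

```c
#include <stdbool.h>

/* Hypothetical result type: which guests keep running after arbitration. */
struct arbitration_result {
    bool primary_runs;
    bool secondary_runs;
};

/*
 * Decide which guest survives once connection 3 (primary <-> secondary)
 * has failed.
 *
 * link1_ok: the primary host can still reach the judge host (connection 1)
 * link2_ok: the secondary host can still reach the judge host (connection 2)
 */
static struct arbitration_result arbitrate(bool link1_ok, bool link2_ok)
{
    /* Default is case C: neither host reaches the judge, destroy both. */
    struct arbitration_result r = { false, false };

    if (link1_ok && link2_ok) {
        /* Case A: the judge picks one guest. Assumption: keep the primary. */
        r.primary_runs = true;
    } else if (link1_ok) {
        /* Case B: only the primary reaches the judge. */
        r.primary_runs = true;
    } else if (link2_ok) {
        /* Case B: only the secondary reaches the judge. */
        r.secondary_runs = true;
    }

    return r;
}
```

In every outcome at most one guest keeps running, which is the property needed to rule out splitbrain; the cost is that case C leaves no guest running at all.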

> 
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>>  tools/libxl/libxl.c              |  5 ++++-
>>  tools/libxl/libxl_stream_write.c | 14 ++++++++++++--
>>  2 files changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
>> index abb2845..d50c3fb 100644
>> --- a/tools/libxl/libxl.c
>> +++ b/tools/libxl/libxl.c
>> @@ -884,7 +884,10 @@ int libxl_domain_remus_start(libxl_ctx *ctx,
>> libxl_domain_remus_info *info,
>>  
>>      assert(info);
>>  
>> -    /* Point of no return */
>> +    /*
>> +     * This function doesn't return until something is wrong, and
>> +     * we need to do failover from secondary.
> 
> I was actually hoping for user/API documentation (i.e. in a public header)
> rather than a code comment, I suppose this will do though.

OK, I will fix it.

> 
> 
>> +        if (dss->remus)
>> +            /*
>> +             * For remus, if libxl__xc_domain_save_done() completes,
>> +             * there was an error sending data to the secondary.
>> +             * Resume the primary ASAP. The caller doesn't care of the
>> +             * return value(Please refer to libxl__remus_teardown())
> 
> There should usually be a space before a ( in text/prose (also in the
> changelog).

OK, I will fix it.

> 
>> +             */
>> +            stream_complete(egc, stream, 0);
>> +        else
>> +            write_emulator_xenstore_record(egc, stream);
>> +    }
>>  }
>>  
>>  static void write_emulator_xenstore_record(libxl__egc *egc,
Ian Campbell Jan. 19, 2016, 11:01 a.m. UTC | #3
On Tue, 2016-01-19 at 09:01 +0800, Wen Congyang wrote:
> On 01/19/2016 12:51 AM, Ian Campbell wrote:
> > On Mon, 2016-01-18 at 13:40 +0800, Wen Congyang wrote:
> > > For example: if the secondary host is down, and we fail to send the
> > > data to
> > > the secondary host. xc_domain_save() returns 0. So in the function
> > > libxl__xc_domain_save_done(), rc is 0(the helper program exits
> > > normally),
> > > and retval is 0(it is xc_domain_save()'s return value). In such case,
> > > we
> > > just need to complete the stream.
> > 
> > What if the secondary host isn't actually down but just communication
> > has
> > failed for some reason? Won't both primary and secondary start their
> > respective versions of the domain? What are the consequences of that?
> > (Corruption?)
> > 
> > I suppose this is a consequence of the lack of STONITH or splitbrain
> > handling within Remus. Are there any plans to address this?
> 
> IIRC, Shriram Rajagopalan has some ideas about it(check the external heartbeat?).
> There is no way to avoid splitbrain unless we have more than two hosts(at least
> three hosts). If we want to avoid splitbrain, we may need to destroy both primary
> and secondary guests.

I think there are plenty of existing systems for taking care of this side of
fault-tolerance/HA (e.g. linux-ha, Pacemaker, Corosync, etc.); we don't need
(or want) to reinvent that particular wheel here.

I think we just need a story on how one would integrate with such a system
in order to say that Remus is properly usable in real world scenarios (i.e.
before we can remove the "proof-of-concept" wording from the man page).

That might just be a documentation exercise, or it might require some hooks
etc adding to (lib)xl in order to allow such integrations, I'm not sure
what's needed.

IIRC Ian expressed a similar sentiment when Remus support was first added
to libxl.

Ian.

Patch

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index abb2845..d50c3fb 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -884,7 +884,10 @@  int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
 
     assert(info);
 
-    /* Point of no return */
+    /*
+     * This function doesn't return until something is wrong, and
+     * we need to do failover from secondary.
+     */
     libxl__remus_setup(egc, dss);
     return AO_INPROGRESS;
 
diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c
index 80d9208..2f077c5 100644
--- a/tools/libxl/libxl_stream_write.c
+++ b/tools/libxl/libxl_stream_write.c
@@ -354,8 +354,18 @@  void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
      * alive, and check_all_finished() may have torn it down around us.
      * If the stream is not still alive, we must not continue any work.
      */
-    if (libxl__stream_write_inuse(stream))
-        write_emulator_xenstore_record(egc, stream);
+    if (libxl__stream_write_inuse(stream)) {
+        if (dss->remus)
+            /*
+             * For remus, if libxl__xc_domain_save_done() completes,
+             * there was an error sending data to the secondary.
+             * Resume the primary ASAP. The caller doesn't care of the
+             * return value(Please refer to libxl__remus_teardown())
+             */
+            stream_complete(egc, stream, 0);
+        else
+            write_emulator_xenstore_record(egc, stream);
+    }
 }
 
 static void write_emulator_xenstore_record(libxl__egc *egc,