[RFC,XEN,15/16] tools/libxl: handle return code of libxl__qmp_initializations()

Message ID 20161010003235.4213-16-haozhong.zhang@intel.com (mailing list archive)
State New, archived

Commit Message

Haozhong Zhang Oct. 10, 2016, 12:32 a.m. UTC
If any error code is returned when creating a domain, stop the domain
creation.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_create.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Konrad Rzeszutek Wilk Jan. 27, 2017, 10:11 p.m. UTC | #1
On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
> If any error code is returned when creating a domain, stop the domain
> creation.

This looks like it is a bug-fix that can be spun off from this
patchset?

Haozhong Zhang Feb. 8, 2017, 6:07 a.m. UTC | #2
On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
>> If any error code is returned when creating a domain, stop the domain
>> creation.
>
>This looks like it is a bug-fix that can be spun off from this
>patchset?
>

Yes, if everyone considers it's really a bug and the fix does not
cause a compatibility problem (e.g. xl w/o this patch does not abort
domain creation if it fails to connect to the QEMU VNC port).

Thanks,
Haozhong

Wei Liu Feb. 8, 2017, 10:31 a.m. UTC | #3
On Wed, Feb 08, 2017 at 02:07:26PM +0800, Haozhong Zhang wrote:
> On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
> > On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
> > > If any error code is returned when creating a domain, stop the domain
> > > creation.
> > 
> > This looks like it is a bug-fix that can be spun off from this
> > patchset?
> > 
> 
> Yes, if everyone considers it's really a bug and the fix does not
> cause compatibility problem (e.g. xl w/o this patch does not abort the
> domain creation if it fails to connect to QEMU VNC port).
> 

I'm in two minds here. If the failure to connect is caused by some
temporary glitch in QEMU and we're sure it will eventually succeed,
there is no need to abort domain creation. If the failure to connect is
due to a permanent glitch, we should abort.

OOI how did you discover this issue? That could be the key to
understanding the issue here.

Wei.

Haozhong Zhang Feb. 9, 2017, 2:47 a.m. UTC | #4
On 02/08/17 10:31 +0000, Wei Liu wrote:
>On Wed, Feb 08, 2017 at 02:07:26PM +0800, Haozhong Zhang wrote:
>> On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
>> > On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
>> > > If any error code is returned when creating a domain, stop the domain
>> > > creation.
>> >
>> > This looks like it is a bug-fix that can be spun off from this
>> > patchset?
>> >
>>
>> Yes, if everyone considers it's really a bug and the fix does not
>> cause compatibility problem (e.g. xl w/o this patch does not abort the
>> domain creation if it fails to connect to QEMU VNC port).
>>
>
>I'm two minded here. If the failure to connect is caused by some
>temporary glitches in QEMU and we're sure it will eventually succeed,
>there is no need to abort domain creation. If failure to connect is due
>to permanent glitches, we should abort.
>

Sorry, I should have said "*query* QEMU VNC port" instead of *connect*.

libxl__qmp_initializations() currently performs the following tasks.
1/ Create a QMP socket.

   I think all failures in 1/ should be considered permanent. Such a
   failure not only breaks the following tasks, but also breaks device
   hotplug, which needs to cooperate with QEMU.

2/ If 1/ succeeds, query QMP for the parameters of the serial port and
   write them to xenstore.
3/ If 1/ and 2/ succeed, set and query via QMP the VNC parameters
   (password, address, port) and write them to xenstore.

   If we assume Xen always sends correct QMP commands and
   parameters, QMP failures in 2/ and 3/ will be caused by QMP
   socket errors (see qmp_next()), and it is hard to tell whether those
   are permanent or temporary. However, if a missing serial port
   or VNC is considered not to affect the execution of the guest
   domain, we may ignore failures here.
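
The ordering in 1/ to 3/ can be sketched roughly as below. This is only
an illustration, not libxl code: struct step_outcomes and
run_qmp_steps() are hypothetical names, and each step's result is
passed in as data instead of coming from a real QMP exchange.

```c
/* Hypothetical per-step outcome: 0 on success, non-zero on failure.
 * These fields only simulate the three tasks; they are not libxl state. */
struct step_outcomes {
    int socket;  /* 1/ create the QMP socket */
    int serial;  /* 2/ query serial port parameters */
    int vnc;     /* 3/ set and query VNC parameters */
};

/* Returns 0 if every step succeeds, otherwise the number (1..3) of the
 * first step that failed. Steps after a failure are not attempted,
 * which mirrors "if 1/ succeeds" and "if 1/ and 2/ succeed" above. */
static int run_qmp_steps(const struct step_outcomes *o)
{
    if (o->socket)
        return 1;
    if (o->serial)
        return 2;
    if (o->vnc)
        return 3;
    return 0;
}
```

With this shape, a caller can tell which step failed and decide
separately whether that failure should stop domain creation.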

>OOI how did you discover this issue? That could be the key to understand
>the issue here.

The next patch adds code in libxl__qmp_initializations() to query QMP
for vNVDIMM parameters (e.g. the base gpfn, which is calculated by
QEMU) and to return an error code if the query fails. While I was
developing that patch, I found that xl didn't stop even when bugs in my
QEMU patches made the code in my Xen patch fail.

Maybe we could let libxl__qmp_initializations() report whether a
failure can be tolerated. For non-tolerable failures (e.g. those in 1/),
xl should stop. For tolerable failures (e.g. those in 2/ and 3/), xl
can continue, but it needs to warn about those failures.
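
A minimal sketch of that idea, assuming a three-valued result. The
enum qmp_init_result and both functions are hypothetical and exist
only for illustration; libxl defines nothing by these names, and the
step outcomes are passed in as booleans rather than obtained over QMP.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical result type; not part of libxl. */
typedef enum {
    QMP_INIT_OK = 0,
    QMP_INIT_TOLERABLE, /* serial or VNC query failed (2/ or 3/) */
    QMP_INIT_FATAL      /* QMP socket could not be created (1/) */
} qmp_init_result;

/* Classifies simulated step outcomes into a severity. */
static qmp_init_result qmp_init_classify(bool socket_ok, bool serial_ok,
                                         bool vnc_ok)
{
    if (!socket_ok)
        return QMP_INIT_FATAL;
    if (!serial_ok || !vnc_ok)
        return QMP_INIT_TOLERABLE;
    return QMP_INIT_OK;
}

/* Caller side: returns true when domain creation should continue.
 * Tolerable failures are reported as warnings; fatal ones stop creation. */
static bool domain_creation_continues(qmp_init_result r)
{
    switch (r) {
    case QMP_INIT_FATAL:
        fprintf(stderr, "QMP initialization failed; stopping domain creation\n");
        return false;
    case QMP_INIT_TOLERABLE:
        fprintf(stderr, "warning: serial/VNC parameters unavailable; continuing\n");
        return true;
    case QMP_INIT_OK:
        break;
    }
    return true;
}
```

Keeping the classification separate from the caller's decision would
let a new failure class (for example a vNVDIMM query) be marked fatal
without changing how serial and VNC failures are handled.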

Thanks,
Haozhong

Wei Liu Feb. 9, 2017, 10:13 a.m. UTC | #5
On Thu, Feb 09, 2017 at 10:47:01AM +0800, Haozhong Zhang wrote:
> On 02/08/17 10:31 +0000, Wei Liu wrote:
> > On Wed, Feb 08, 2017 at 02:07:26PM +0800, Haozhong Zhang wrote:
> > > On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
> > > > > If any error code is returned when creating a domain, stop the domain
> > > > > creation.
> > > >
> > > > This looks like it is a bug-fix that can be spun off from this
> > > > patchset?
> > > >
> > > 
> > > Yes, if everyone considers it's really a bug and the fix does not
> > > cause compatibility problem (e.g. xl w/o this patch does not abort the
> > > domain creation if it fails to connect to QEMU VNC port).
> > > 
> > 
> > I'm two minded here. If the failure to connect is caused by some
> > temporary glitches in QEMU and we're sure it will eventually succeed,
> > there is no need to abort domain creation. If failure to connect is due
> > to permanent glitches, we should abort.
> > 
> 
> Sorry, I should say "*query* QEMU VNC port" instead of *connect*.
> 
> libxl__qmp_initializations() currently does following tasks.
> 1/ Create a QMP socket.
> 
>   I think all failures in 1/ should be considered as permanent. It
>   does not only fail the following tasks, but also fails the device
>   hotplug which needs to cooperate with QEMU.
> 
> 2/ If 1/ succeeds, query qmp about parameters of serial port and fill
>   them in xenstore.
> 3/ If 1/ and 2/ succeed, set and query qmp about parameters (password,
>   address, port) of VNC and fill them in xenstore.
> 
>   If we assume Xen always send the correct QMP commands and
>   parameters, the QMP failures in 2/ and 3/ will be caused by QMP
>   socket errors (see qmp_next()), which are hard to tell whether they
>   are permanent or temporal. However, if the missing of serial port
>   or VNC is considered as not affecting the execution of guest
>   domain, we may ignore failures here.
> 
> > OOI how did you discover this issue? That could be the key to understand
> > the issue here.
> 
> The next patch adds code in libxl__qmp_initialization() to query qmp
> about vNVDIMM parameters (e.g. the base gpfn which is calculated by
> QEMU) and return error code if it fails. While I was developing that
> patch, I found xl didn't stop even if bugs in my QEMU patches failed
> the code in my Xen patch.
> 

Right, this should definitely be fatal.

> Maybe we could let libxl__qmp_initializations() report whether a
> failure can be tolerant. For non-tolerant failures (e.g. those in 1/),
> xl should stop. For tolerant failures (e.g. those in 2/ and 3/), xl
> can continue, but it needs to warn those failures.
> 

Yes, we can do that. It's an internal function, we can change things as
we see fit.

I would suggest you only make vNVDIMM failure fatal as a start.

Wei.
Wei Liu Feb. 9, 2017, 10:16 a.m. UTC | #6
Hmm... not sure why my reply didn't have you in the To: field.

On Thu, Feb 09, 2017 at 10:13:13AM +0000, Wei Liu wrote:
> On Thu, Feb 09, 2017 at 10:47:01AM +0800, Haozhong Zhang wrote:
> > On 02/08/17 10:31 +0000, Wei Liu wrote:
> > > On Wed, Feb 08, 2017 at 02:07:26PM +0800, Haozhong Zhang wrote:
> > > > On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
> > > > > On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
> > > > > > If any error code is returned when creating a domain, stop the domain
> > > > > > creation.
> > > > >
> > > > > This looks like it is a bug-fix that can be spun off from this
> > > > > patchset?
> > > > >
> > > > 
> > > > Yes, if everyone considers it's really a bug and the fix does not
> > > > cause compatibility problem (e.g. xl w/o this patch does not abort the
> > > > domain creation if it fails to connect to QEMU VNC port).
> > > > 
> > > 
> > > I'm two minded here. If the failure to connect is caused by some
> > > temporary glitches in QEMU and we're sure it will eventually succeed,
> > > there is no need to abort domain creation. If failure to connect is due
> > > to permanent glitches, we should abort.
> > > 
> > 
> > Sorry, I should say "*query* QEMU VNC port" instead of *connect*.
> > 
> > libxl__qmp_initializations() currently does following tasks.
> > 1/ Create a QMP socket.
> > 
> >   I think all failures in 1/ should be considered as permanent. It
> >   does not only fail the following tasks, but also fails the device
> >   hotplug which needs to cooperate with QEMU.
> > 
> > 2/ If 1/ succeeds, query qmp about parameters of serial port and fill
> >   them in xenstore.
> > 3/ If 1/ and 2/ succeed, set and query qmp about parameters (password,
> >   address, port) of VNC and fill them in xenstore.
> > 
> >   If we assume Xen always send the correct QMP commands and
> >   parameters, the QMP failures in 2/ and 3/ will be caused by QMP
> >   socket errors (see qmp_next()), which are hard to tell whether they
> >   are permanent or temporal. However, if the missing of serial port
> >   or VNC is considered as not affecting the execution of guest
> >   domain, we may ignore failures here.
> > 
> > > OOI how did you discover this issue? That could be the key to understand
> > > the issue here.
> > 
> > The next patch adds code in libxl__qmp_initialization() to query qmp
> > about vNVDIMM parameters (e.g. the base gpfn which is calculated by
> > QEMU) and return error code if it fails. While I was developing that
> > patch, I found xl didn't stop even if bugs in my QEMU patches failed
> > the code in my Xen patch.
> > 
> 
> Right, this should definitely be fatal.
> 
> > Maybe we could let libxl__qmp_initializations() report whether a
> > failure can be tolerant. For non-tolerant failures (e.g. those in 1/),
> > xl should stop. For tolerant failures (e.g. those in 2/ and 3/), xl
> > can continue, but it needs to warn those failures.
> > 
> 
> Yes, we can do that. It's an internal function, we can change things as
> we see fit.
> 
> I would suggest you only make vNVDIMM failure fatal as a start.
> 
> Wei.
Haozhong Zhang Feb. 10, 2017, 2:37 a.m. UTC | #7
On 02/09/17 10:13 +0000, Wei Liu wrote:
>On Thu, Feb 09, 2017 at 10:47:01AM +0800, Haozhong Zhang wrote:
>> On 02/08/17 10:31 +0000, Wei Liu wrote:
>> > On Wed, Feb 08, 2017 at 02:07:26PM +0800, Haozhong Zhang wrote:
>> > > On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
>> > > > On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
>> > > > > If any error code is returned when creating a domain, stop the domain
>> > > > > creation.
>> > > >
>> > > > This looks like it is a bug-fix that can be spun off from this
>> > > > patchset?
>> > > >
>> > >
>> > > Yes, if everyone considers it's really a bug and the fix does not
>> > > cause compatibility problem (e.g. xl w/o this patch does not abort the
>> > > domain creation if it fails to connect to QEMU VNC port).
>> > >
>> >
>> > I'm two minded here. If the failure to connect is caused by some
>> > temporary glitches in QEMU and we're sure it will eventually succeed,
>> > there is no need to abort domain creation. If failure to connect is due
>> > to permanent glitches, we should abort.
>> >
>>
>> Sorry, I should say "*query* QEMU VNC port" instead of *connect*.
>>
>> libxl__qmp_initializations() currently does following tasks.
>> 1/ Create a QMP socket.
>>
>>   I think all failures in 1/ should be considered as permanent. It
>>   does not only fail the following tasks, but also fails the device
>>   hotplug which needs to cooperate with QEMU.
>>
>> 2/ If 1/ succeeds, query qmp about parameters of serial port and fill
>>   them in xenstore.
>> 3/ If 1/ and 2/ succeed, set and query qmp about parameters (password,
>>   address, port) of VNC and fill them in xenstore.
>>
>>   If we assume Xen always send the correct QMP commands and
>>   parameters, the QMP failures in 2/ and 3/ will be caused by QMP
>>   socket errors (see qmp_next()), which are hard to tell whether they
>>   are permanent or temporal. However, if the missing of serial port
>>   or VNC is considered as not affecting the execution of guest
>>   domain, we may ignore failures here.
>>
>> > OOI how did you discover this issue? That could be the key to understand
>> > the issue here.
>>
>> The next patch adds code in libxl__qmp_initialization() to query qmp
>> about vNVDIMM parameters (e.g. the base gpfn which is calculated by
>> QEMU) and return error code if it fails. While I was developing that
>> patch, I found xl didn't stop even if bugs in my QEMU patches failed
>> the code in my Xen patch.
>>
>
>Right, this should definitely be fatal.
>
>> Maybe we could let libxl__qmp_initializations() report whether a
>> failure can be tolerant. For non-tolerant failures (e.g. those in 1/),
>> xl should stop. For tolerant failures (e.g. those in 2/ and 3/), xl
>> can continue, but it needs to warn those failures.
>>
>
>Yes, we can do that. It's an internal function, we can change things as
>we see fit.
>
>I would suggest you only make vNVDIMM failure fatal as a start.
>

I'll send a patch, separate from this series, that implements the above
without the NVDIMM parts.

Thanks,
Haozhong
Wei Liu Feb. 10, 2017, 8:11 a.m. UTC | #8
On Fri, Feb 10, 2017 at 10:37:44AM +0800, Haozhong Zhang wrote:
> On 02/09/17 10:13 +0000, Wei Liu wrote:
> > On Thu, Feb 09, 2017 at 10:47:01AM +0800, Haozhong Zhang wrote:
> > > On 02/08/17 10:31 +0000, Wei Liu wrote:
> > > > On Wed, Feb 08, 2017 at 02:07:26PM +0800, Haozhong Zhang wrote:
> > > > > On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
> > > > > > > If any error code is returned when creating a domain, stop the domain
> > > > > > > creation.
> > > > > >
> > > > > > This looks like it is a bug-fix that can be spun off from this
> > > > > > patchset?
> > > > > >
> > > > >
> > > > > Yes, if everyone considers it's really a bug and the fix does not
> > > > > cause compatibility problem (e.g. xl w/o this patch does not abort the
> > > > > domain creation if it fails to connect to QEMU VNC port).
> > > > >
> > > >
> > > > I'm two minded here. If the failure to connect is caused by some
> > > > temporary glitches in QEMU and we're sure it will eventually succeed,
> > > > there is no need to abort domain creation. If failure to connect is due
> > > > to permanent glitches, we should abort.
> > > >
> > > 
> > > Sorry, I should say "*query* QEMU VNC port" instead of *connect*.
> > > 
> > > libxl__qmp_initializations() currently does following tasks.
> > > 1/ Create a QMP socket.
> > > 
> > >   I think all failures in 1/ should be considered as permanent. It
> > >   does not only fail the following tasks, but also fails the device
> > >   hotplug which needs to cooperate with QEMU.
> > > 
> > > 2/ If 1/ succeeds, query qmp about parameters of serial port and fill
> > >   them in xenstore.
> > > 3/ If 1/ and 2/ succeed, set and query qmp about parameters (password,
> > >   address, port) of VNC and fill them in xenstore.
> > > 
> > >   If we assume Xen always send the correct QMP commands and
> > >   parameters, the QMP failures in 2/ and 3/ will be caused by QMP
> > >   socket errors (see qmp_next()), which are hard to tell whether they
> > >   are permanent or temporal. However, if the missing of serial port
> > >   or VNC is considered as not affecting the execution of guest
> > >   domain, we may ignore failures here.
> > > 
> > > > OOI how did you discover this issue? That could be the key to understand
> > > > the issue here.
> > > 
> > > The next patch adds code in libxl__qmp_initialization() to query qmp
> > > about vNVDIMM parameters (e.g. the base gpfn which is calculated by
> > > QEMU) and return error code if it fails. While I was developing that
> > > patch, I found xl didn't stop even if bugs in my QEMU patches failed
> > > the code in my Xen patch.
> > > 
> > 
> > Right, this should definitely be fatal.
> > 
> > > Maybe we could let libxl__qmp_initializations() report whether a
> > > failure can be tolerant. For non-tolerant failures (e.g. those in 1/),
> > > xl should stop. For tolerant failures (e.g. those in 2/ and 3/), xl
> > > can continue, but it needs to warn those failures.
> > > 
> > 
> > Yes, we can do that. It's an internal function, we can change things as
> > we see fit.
> > 
> > I would suggest you only make vNVDIMM failure fatal as a start.
> > 
> 
> I'll send a patch out of this series to implement above w/o NVDIMM
> stuffs.
> 

Sorry, I'm not sure I follow, correct me if I'm wrong: I think we're
fine with this function as-is because we don't want to make VNC / serial
error fatal, right?

(not going to work today so please allow me some time to read your
reply)

Wei.



> Thanks,
> Haozhong
Wei Liu Feb. 10, 2017, 8:23 a.m. UTC | #9
On Fri, Feb 10, 2017 at 08:11:20AM +0000, Wei Liu wrote:
> On Fri, Feb 10, 2017 at 10:37:44AM +0800, Haozhong Zhang wrote:
> > On 02/09/17 10:13 +0000, Wei Liu wrote:
> > > On Thu, Feb 09, 2017 at 10:47:01AM +0800, Haozhong Zhang wrote:
> > > > On 02/08/17 10:31 +0000, Wei Liu wrote:
> > > > > On Wed, Feb 08, 2017 at 02:07:26PM +0800, Haozhong Zhang wrote:
> > > > > > On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
> > > > > > > > If any error code is returned when creating a domain, stop the domain
> > > > > > > > creation.
> > > > > > >
> > > > > > > This looks like it is a bug-fix that can be spun off from this
> > > > > > > patchset?
> > > > > > >
> > > > > >
> > > > > > Yes, if everyone considers it's really a bug and the fix does not
> > > > > > cause compatibility problem (e.g. xl w/o this patch does not abort the
> > > > > > domain creation if it fails to connect to QEMU VNC port).
> > > > > >
> > > > >
> > > > > I'm two minded here. If the failure to connect is caused by some
> > > > > temporary glitches in QEMU and we're sure it will eventually succeed,
> > > > > there is no need to abort domain creation. If failure to connect is due
> > > > > to permanent glitches, we should abort.
> > > > >
> > > > 
> > > > Sorry, I should say "*query* QEMU VNC port" instead of *connect*.
> > > > 
> > > > libxl__qmp_initializations() currently does following tasks.
> > > > 1/ Create a QMP socket.
> > > > 
> > > >   I think all failures in 1/ should be considered as permanent. It
> > > >   does not only fail the following tasks, but also fails the device
> > > >   hotplug which needs to cooperate with QEMU.
> > > > 
> > > > 2/ If 1/ succeeds, query qmp about parameters of serial port and fill
> > > >   them in xenstore.
> > > > 3/ If 1/ and 2/ succeed, set and query qmp about parameters (password,
> > > >   address, port) of VNC and fill them in xenstore.
> > > > 
> > > >   If we assume Xen always send the correct QMP commands and
> > > >   parameters, the QMP failures in 2/ and 3/ will be caused by QMP
> > > >   socket errors (see qmp_next()), which are hard to tell whether they
> > > >   are permanent or temporal. However, if the missing of serial port
> > > >   or VNC is considered as not affecting the execution of guest
> > > >   domain, we may ignore failures here.
> > > > 
> > > > > OOI how did you discover this issue? That could be the key to understand
> > > > > the issue here.
> > > > 
> > > > The next patch adds code in libxl__qmp_initialization() to query qmp
> > > > about vNVDIMM parameters (e.g. the base gpfn which is calculated by
> > > > QEMU) and return error code if it fails. While I was developing that
> > > > patch, I found xl didn't stop even if bugs in my QEMU patches failed
> > > > the code in my Xen patch.
> > > > 
> > > 
> > > Right, this should definitely be fatal.
> > > 
> > > > Maybe we could let libxl__qmp_initializations() report whether a
> > > > failure can be tolerant. For non-tolerant failures (e.g. those in 1/),
> > > > xl should stop. For tolerant failures (e.g. those in 2/ and 3/), xl
> > > > can continue, but it needs to warn those failures.
> > > > 
> > > 
> > > Yes, we can do that. It's an internal function, we can change things as
> > > we see fit.
> > > 
> > > I would suggest you only make vNVDIMM failure fatal as a start.
> > > 
> > 
> > I'll send a patch out of this series to implement above w/o NVDIMM
> > stuffs.
> > 
> 
> Sorry, I'm not sure I follow, correct me if I'm wrong: I think we're
> fine with this function as-is because we don't want to make VNC / serial
> error fatal, right?
> 
> (not going to work today so please allow me some time to read your
> reply)
> 
> Wei.
> 
> 
> 
> > Thanks,
> > Haozhong
Haozhong Zhang Feb. 10, 2017, 8:24 a.m. UTC | #10
On 02/10/17 08:11 +0000, Wei Liu wrote:
>On Fri, Feb 10, 2017 at 10:37:44AM +0800, Haozhong Zhang wrote:
>> On 02/09/17 10:13 +0000, Wei Liu wrote:
>> > On Thu, Feb 09, 2017 at 10:47:01AM +0800, Haozhong Zhang wrote:
>> > > On 02/08/17 10:31 +0000, Wei Liu wrote:
>> > > > On Wed, Feb 08, 2017 at 02:07:26PM +0800, Haozhong Zhang wrote:
>> > > > > On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
>> > > > > > On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
>> > > > > > > If any error code is returned when creating a domain, stop the domain
>> > > > > > > creation.
>> > > > > >
>> > > > > > This looks like it is a bug-fix that can be spun off from this
>> > > > > > patchset?
>> > > > > >
>> > > > >
>> > > > > Yes, if everyone considers it's really a bug and the fix does not
>> > > > > cause compatibility problem (e.g. xl w/o this patch does not abort the
>> > > > > domain creation if it fails to connect to QEMU VNC port).
>> > > > >
>> > > >
>> > > > I'm two minded here. If the failure to connect is caused by some
>> > > > temporary glitches in QEMU and we're sure it will eventually succeed,
>> > > > there is no need to abort domain creation. If failure to connect is due
>> > > > to permanent glitches, we should abort.
>> > > >
>> > >
>> > > Sorry, I should say "*query* QEMU VNC port" instead of *connect*.
>> > >
>> > > libxl__qmp_initializations() currently does following tasks.
>> > > 1/ Create a QMP socket.
>> > >
>> > >   I think all failures in 1/ should be considered as permanent. It
>> > >   does not only fail the following tasks, but also fails the device
>> > >   hotplug which needs to cooperate with QEMU.
>> > >
>> > > 2/ If 1/ succeeds, query qmp about parameters of serial port and fill
>> > >   them in xenstore.
>> > > 3/ If 1/ and 2/ succeed, set and query qmp about parameters (password,
>> > >   address, port) of VNC and fill them in xenstore.
>> > >
>> > >   If we assume Xen always send the correct QMP commands and
>> > >   parameters, the QMP failures in 2/ and 3/ will be caused by QMP
>> > >   socket errors (see qmp_next()), which are hard to tell whether they
>> > >   are permanent or temporal. However, if the missing of serial port
>> > >   or VNC is considered as not affecting the execution of guest
>> > >   domain, we may ignore failures here.
>> > >
>> > > > OOI how did you discover this issue? That could be the key to understand
>> > > > the issue here.
>> > >
>> > > The next patch adds code in libxl__qmp_initialization() to query qmp
>> > > about vNVDIMM parameters (e.g. the base gpfn which is calculated by
>> > > QEMU) and return error code if it fails. While I was developing that
>> > > patch, I found xl didn't stop even if bugs in my QEMU patches failed
>> > > the code in my Xen patch.
>> > >
>> >
>> > Right, this should definitely be fatal.
>> >
>> > > Maybe we could let libxl__qmp_initializations() report whether a
>> > > failure can be tolerant. For non-tolerant failures (e.g. those in 1/),
>> > > xl should stop. For tolerant failures (e.g. those in 2/ and 3/), xl
>> > > can continue, but it needs to warn those failures.
>> > >
>> >
>> > Yes, we can do that. It's an internal function, we can change things as
>> > we see fit.
>> >
>> > I would suggest you only make vNVDIMM failure fatal as a start.
>> >
>>
>> I'll send a patch out of this series to implement above w/o NVDIMM
>> stuffs.
>>
>
>Sorry, I'm not sure I follow, correct me if I'm wrong: I think we're
>fine with this function as-is because we don't want to make VNC / serial
>error fatal, right?
>

I had understood that xl should fail when it encounters errors in 1/,
but you now indicate it's fine to leave the function as-is, so no patch
will be needed until NVDIMM support is added.

Haozhong

>(not going to work today so please allow me some time to read your
>reply)
>
>Wei.

Patch

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index d986cd2..24e8368 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1499,7 +1499,9 @@  static void domcreate_devmodel_started(libxl__egc *egc,
     if (dcs->sdss.dm.guest_domid) {
         if (d_config->b_info.device_model_version
             == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
-            libxl__qmp_initializations(gc, domid, d_config);
+            ret = libxl__qmp_initializations(gc, domid, d_config);
+            if (ret)
+                goto error_out;
         }
     }