Message ID | 20161010003235.4213-16-haozhong.zhang@intel.com (mailing list archive) |
---|---|
State | New, archived |
On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
> If any error code is returned when creating a domain, stop the domain
> creation.

This looks like it is a bug-fix that can be spun off from this
patchset?

>
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/libxl/libxl_create.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index d986cd2..24e8368 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -1499,7 +1499,9 @@ static void domcreate_devmodel_started(libxl__egc *egc,
>      if (dcs->sdss.dm.guest_domid) {
>          if (d_config->b_info.device_model_version
>              == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
> -            libxl__qmp_initializations(gc, domid, d_config);
> +            ret = libxl__qmp_initializations(gc, domid, d_config);
> +            if (ret)
> +                goto error_out;
>          }
>      }
>
> --
> 2.10.1
On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
>On Mon, Oct 10, 2016 at 08:32:34AM +0800, Haozhong Zhang wrote:
>> If any error code is returned when creating a domain, stop the domain
>> creation.
>
>This looks like it is a bug-fix that can be spun off from this
>patchset?
>

Yes, if everyone considers it's really a bug and the fix does not
cause a compatibility problem (e.g. xl w/o this patch does not abort the
domain creation if it fails to connect to QEMU VNC port).

Thanks,
Haozhong
On Wed, Feb 08, 2017 at 02:07:26PM +0800, Haozhong Zhang wrote:
> On 01/27/17 17:11 -0500, Konrad Rzeszutek Wilk wrote:
> > This looks like it is a bug-fix that can be spun off from this
> > patchset?
> >
>
> Yes, if everyone considers it's really a bug and the fix does not
> cause a compatibility problem (e.g. xl w/o this patch does not abort the
> domain creation if it fails to connect to QEMU VNC port).
>

I'm two minded here. If the failure to connect is caused by some
temporary glitches in QEMU and we're sure it will eventually succeed,
there is no need to abort domain creation. If the failure to connect is
due to permanent glitches, we should abort.

OOI how did you discover this issue? That could be the key to
understanding the issue here.

Wei.
On 02/08/17 10:31 +0000, Wei Liu wrote:
>I'm two minded here. If the failure to connect is caused by some
>temporary glitches in QEMU and we're sure it will eventually succeed,
>there is no need to abort domain creation. If the failure to connect is
>due to permanent glitches, we should abort.
>

Sorry, I should say "*query* QEMU VNC port" instead of *connect*.

libxl__qmp_initializations() currently does the following tasks.

1/ Create a QMP socket.

   I think all failures in 1/ should be considered permanent. Such a
   failure not only fails the following tasks, but also fails the
   device hotplug, which needs to cooperate with QEMU.

2/ If 1/ succeeds, query QMP about the parameters of the serial port
   and fill them in xenstore.

3/ If 1/ and 2/ succeed, set and query QMP about the parameters
   (password, address, port) of VNC and fill them in xenstore.

   If we assume Xen always sends the correct QMP commands and
   parameters, the QMP failures in 2/ and 3/ will be caused by QMP
   socket errors (see qmp_next()), and it is hard to tell whether those
   are permanent or temporary. However, if a missing serial port or
   VNC is considered as not affecting the execution of the guest
   domain, we may ignore failures here.

>OOI how did you discover this issue? That could be the key to
>understanding the issue here.

The next patch adds code in libxl__qmp_initializations() to query QMP
about vNVDIMM parameters (e.g. the base gpfn, which is calculated by
QEMU) and return an error code if it fails. While I was developing that
patch, I found xl didn't stop even if bugs in my QEMU patches failed
the code in my Xen patch.

Maybe we could let libxl__qmp_initializations() report whether a
failure can be tolerated. For non-tolerable failures (e.g. those in 1/),
xl should stop. For tolerable failures (e.g. those in 2/ and 3/), xl
can continue, but it needs to warn about those failures.

Thanks,
Haozhong
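The tolerable / non-tolerable split proposed above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `init_severity`, `init_steps`, `classify_init()` and `should_continue()` are hypothetical names that do not exist in libxl, and the three result fields simply stand in for tasks 1/, 2/ and 3/.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical severity levels; not part of libxl. */
enum init_severity { INIT_OK = 0, INIT_WARN, INIT_FATAL };

/* Hypothetical per-task results (0 means success). */
struct init_steps {
    int socket_rc;  /* 1/ create the QMP socket */
    int serial_rc;  /* 2/ query serial port parameters */
    int vnc_rc;     /* 3/ set and query VNC parameters */
};

/* Map the task results to a severity the caller can act on. */
static enum init_severity classify_init(const struct init_steps *s)
{
    if (s->socket_rc)
        return INIT_FATAL;  /* without the socket nothing else can work */
    if (s->serial_rc || s->vnc_rc)
        return INIT_WARN;   /* the guest can still run, but warn */
    return INIT_OK;
}

/* Returns true if domain creation should go on. */
static bool should_continue(const struct init_steps *s)
{
    enum init_severity sev = classify_init(s);

    if (sev == INIT_WARN)
        fprintf(stderr, "warning: serial/VNC query failed, continuing\n");
    return sev != INIT_FATAL;
}
```

Returning a severity rather than a bare error code keeps the policy decision (abort or warn) in the caller, which is one possible reading of "report whether a failure can be tolerated".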
On Thu, Feb 09, 2017 at 10:47:01AM +0800, Haozhong Zhang wrote:
> The next patch adds code in libxl__qmp_initializations() to query QMP
> about vNVDIMM parameters (e.g. the base gpfn, which is calculated by
> QEMU) and return an error code if it fails. While I was developing that
> patch, I found xl didn't stop even if bugs in my QEMU patches failed
> the code in my Xen patch.
>

Right, this should definitely be fatal.

> Maybe we could let libxl__qmp_initializations() report whether a
> failure can be tolerated. For non-tolerable failures (e.g. those in 1/),
> xl should stop. For tolerable failures (e.g. those in 2/ and 3/), xl
> can continue, but it needs to warn about those failures.
>

Yes, we can do that. It's an internal function, we can change things as
we see fit.

I would suggest you only make vNVDIMM failure fatal as a start.

Wei.
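Wei's suggestion above (make only the vNVDIMM failure fatal as a start) might look roughly like the sketch below. Everything here is an assumption made for illustration: `qmp_queries`, `sketch_qmp_initializations()`, `SKETCH_ERROR_FAIL` and the `query_ok()` / `query_fail()` stubs are hypothetical and are not libxl code. The sketch also attempts each query independently, which simplifies the "if 1/ and 2/ succeed" ordering described earlier in the thread.

```c
#include <stdio.h>

/* Hypothetical error value standing in for a libxl error code. */
#define SKETCH_ERROR_FAIL (-3)

/* Hypothetical query callbacks; each returns 0 on success. */
struct qmp_queries {
    int (*serial)(void);
    int (*vnc)(void);
    int (*vnvdimm)(void);
};

/* Stubs so the sketch can be exercised without QEMU. */
static int query_ok(void)   { return 0; }
static int query_fail(void) { return -1; }

/*
 * Serial and VNC failures are only reported; a vNVDIMM failure is
 * returned to the caller so that domain creation can be aborted.
 */
static int sketch_qmp_initializations(const struct qmp_queries *q)
{
    if (q->serial() != 0)
        fprintf(stderr, "warning: serial port query failed\n");
    if (q->vnc() != 0)
        fprintf(stderr, "warning: VNC query failed\n");
    if (q->vnvdimm() != 0) {
        fprintf(stderr, "error: vNVDIMM query failed\n");
        return SKETCH_ERROR_FAIL;
    }
    return 0;
}
```

With this shape, existing guests that only hit a serial or VNC glitch keep starting as before, while a guest that depends on vNVDIMM parameters fails early instead of running with missing data.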
Hmm... not sure why my reply didn't have you in the To: field.

On Thu, Feb 09, 2017 at 10:13:13AM +0000, Wei Liu wrote:
> On Thu, Feb 09, 2017 at 10:47:01AM +0800, Haozhong Zhang wrote:
> > The next patch adds code in libxl__qmp_initializations() to query QMP
> > about vNVDIMM parameters (e.g. the base gpfn, which is calculated by
> > QEMU) and return an error code if it fails. While I was developing that
> > patch, I found xl didn't stop even if bugs in my QEMU patches failed
> > the code in my Xen patch.
> >
>
> Right, this should definitely be fatal.
>
> > Maybe we could let libxl__qmp_initializations() report whether a
> > failure can be tolerated. For non-tolerable failures (e.g. those in 1/),
> > xl should stop. For tolerable failures (e.g. those in 2/ and 3/), xl
> > can continue, but it needs to warn about those failures.
> >
>
> Yes, we can do that. It's an internal function, we can change things as
> we see fit.
>
> I would suggest you only make vNVDIMM failure fatal as a start.
>
> Wei.
On 02/09/17 10:13 +0000, Wei Liu wrote:
>On Thu, Feb 09, 2017 at 10:47:01AM +0800, Haozhong Zhang wrote:
>> Maybe we could let libxl__qmp_initializations() report whether a
>> failure can be tolerated. For non-tolerable failures (e.g. those in 1/),
>> xl should stop. For tolerable failures (e.g. those in 2/ and 3/), xl
>> can continue, but it needs to warn about those failures.
>>
>
>Yes, we can do that. It's an internal function, we can change things as
>we see fit.
>
>I would suggest you only make vNVDIMM failure fatal as a start.
>

I'll send a patch separate from this series to implement the above
w/o the NVDIMM stuff.

Thanks,
Haozhong
On Fri, Feb 10, 2017 at 10:37:44AM +0800, Haozhong Zhang wrote:
> On 02/09/17 10:13 +0000, Wei Liu wrote:
> > Yes, we can do that. It's an internal function, we can change things as
> > we see fit.
> >
> > I would suggest you only make vNVDIMM failure fatal as a start.
> >
>
> I'll send a patch separate from this series to implement the above
> w/o the NVDIMM stuff.
>

Sorry, I'm not sure I follow, correct me if I'm wrong: I think we're
fine with this function as-is because we don't want to make VNC / serial
error fatal, right?

(not going to work today so please allow me some time to read your
reply)

Wei.
On 02/10/17 08:11 +0000, Wei Liu wrote:
>Sorry, I'm not sure I follow, correct me if I'm wrong: I think we're
>fine with this function as-is because we don't want to make VNC / serial
>error fatal, right?
>

I had understood that xl should fail if it encounters errors in 1/, but
now you indicate it's fine to leave it as-is, so no patch will be needed
until NVDIMM support is added.

Haozhong

>(not going to work today so please allow me some time to read your
>reply)
>
>Wei.
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index d986cd2..24e8368 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1499,7 +1499,9 @@ static void domcreate_devmodel_started(libxl__egc *egc,
     if (dcs->sdss.dm.guest_domid) {
         if (d_config->b_info.device_model_version
             == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
-            libxl__qmp_initializations(gc, domid, d_config);
+            ret = libxl__qmp_initializations(gc, domid, d_config);
+            if (ret)
+                goto error_out;
         }
     }
If any error code is returned when creating a domain, stop the domain
creation.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_create.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
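The control flow this patch introduces (check the return value, then jump to a single error path) can be illustrated with a small standalone mock. This is a sketch only: `mock_qmp_initializations()` and `mock_devmodel_started()` are hypothetical stand-ins, not the libxl functions, and the real `error_out` path in libxl_create.c does more than print a message.

```c
#include <stdio.h>

/* Hypothetical stand-in for libxl__qmp_initializations(). */
static int mock_qmp_initializations(int simulate_failure)
{
    return simulate_failure ? -1 : 0;
}

/*
 * Mirrors the patched shape: keep the return value and bail out
 * through one error label instead of ignoring the result.
 * Returns 0 if creation would proceed.
 */
static int mock_devmodel_started(int simulate_failure)
{
    int ret;

    ret = mock_qmp_initializations(simulate_failure);
    if (ret)
        goto error_out;

    /* The remaining creation steps would follow here. */
    return 0;

error_out:
    fprintf(stderr, "domain creation aborted (ret=%d)\n", ret);
    return ret;
}
```

Before the patch, the equivalent of `mock_qmp_initializations()` was called with its result discarded, so the function would always fall through to the remaining steps.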