Message ID | 20170510151209.32767-1-pasic@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, 10 May 2017 17:12:09 +0200 Halil Pasic <pasic@linux.vnet.ibm.com> wrote: > Prior to the virtio-ccw-2.7 machine (and commit 2a79eb1a), our virtio > devices residing under the virtual-css bus do not have qdev_path based > migration stream identifiers (because their qdev_path is NULL). The ids > are instead generated when the device is registered as a composition of > the so called idstr, which takes the vmsd name as its value, and an > instance_id, which is which is calculated as a maximal instance_id > registered with the same idstr plus one, or zero (if none was registered > previously). > > That means, under certain circumstances, one device might try, and even > succeed, to load the state of a different device. This can lead to > trouble. > > Let us fail the migration if the above problem is detected during load. > > How to reproduce the problem: > 1) start qemu-system-s390x making sure you have the following devices > defined on your command line: > -device virtio-rng-ccw,id=rng1,devno=fe.0.0001 > -device virtio-rng-ccw,id=rng2,devno=fe.0.0002 > 2) detach the devices and reattach in reverse order using the monitor: > (qemu) device_del rng1 > (qemu) device_del rng2 > (qemu) device_add virtio-rng-ccw,id=rng2,devno=fe.0.0002 > (qemu) device_add virtio-rng-ccw,id=rng1,devno=fe.0.0001 > 3) save the state of the vm into a temporary file and quit QEMU: > (qemu) migrate "exec:gzip -c > /tmp/tmp_vmstate.gz" > (qemu) q > 4) use your command line from step 1 with > -incoming "exec:gzip -c -d /tmp/tmp_vmstate.gz" > appended to reproduce the problem (while trying to to load the saved vm) > > CC: qemu-stable@nongnu.org > Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> > Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> > --- > > Hi! > > I also wonder what is the best way to do this with vmstate. I know there > are VMSTATE_*_EQUAL macros for integers, and I have partially modelled my > patch after that, but there we only get a != b as error message, which is > satisfactory for detecting bugs which are supposed to get fixed. In this > particular case having a verbose error message should be really helpful > and thus important. > > I'm asking because I'm currently working on a vmstate conversion of the > s390x css and virtio-ccw stuff (find my latest patch set here > https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg01364.html). > > Regards, > Halil > --- > hw/s390x/css.c | 14 ++++++++++++++ > hw/s390x/virtio-ccw.c | 6 +++++- > 2 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/hw/s390x/css.c b/hw/s390x/css.c > index 15c4f4b..6cff3a3 100644 > --- a/hw/s390x/css.c > +++ b/hw/s390x/css.c > @@ -14,6 +14,7 @@ > #include "qapi/visitor.h" > #include "hw/qdev.h" > #include "qemu/bitops.h" > +#include "qemu/error-report.h" > #include "exec/address-spaces.h" > #include "cpu.h" > #include "hw/s390x/ioinst.h" > @@ -1721,13 +1722,26 @@ void subch_device_save(SubchDev *s, QEMUFile *f) > int subch_device_load(SubchDev *s, QEMUFile *f) > { > SubchDev *old_s; > + Error *err = NULL; > uint16_t old_schid = s->schid; > + uint16_t old_devno = s->devno; > int i; > > s->cssid = qemu_get_byte(f); > s->ssid = qemu_get_byte(f); > s->schid = qemu_get_be16(f); > s->devno = qemu_get_be16(f); > + if (s->devno != old_devno) { > + /* Only possible if machine < 2.7 (no css_dev_path) */ > + > + error_setg(&err, "%x != %x", old_devno, s->devno); > + error_append_hint(&err, "Devno mismatch, tried to load wrong section!" > + " Likely reason: some sequences of plug and unplug" > + " can break migration for machine versions prior" s/prior/prior to/ > + " 2.7 (known design flaw).\n"); > + error_report_err(err); > + return -EINVAL; > + } > /* Re-assign subch. */ > if (old_schid != s->schid) { > old_s = channel_subsys.css[s->cssid]->sch_set[s->ssid]->sch[old_schid]; > diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c > index e7167e3..4f7efa2 100644 > --- a/hw/s390x/virtio-ccw.c > +++ b/hw/s390x/virtio-ccw.c > @@ -1274,9 +1274,13 @@ static int virtio_ccw_load_config(DeviceState *d, QEMUFile *f) > SubchDev *s = ccw_dev->sch; > VirtIODevice *vdev = virtio_ccw_get_vdev(s); > int len; > + int ret; > > s->driver_data = dev; > - subch_device_load(s, f); > + ret = subch_device_load(s, f); > + if (ret) { > + return ret; > + } > /* Re-fill subch_id after loading the subchannel states.*/ > if (ck->refill_ids) { > ck->refill_ids(ccw_dev); Patch looks sane to me, but I second the question about how to handle this when using vmstates.
On 05/11/2017 01:02 PM, Cornelia Huck wrote: > On Wed, 10 May 2017 17:12:09 +0200 > Halil Pasic <pasic@linux.vnet.ibm.com> wrote: > >> Prior to the virtio-ccw-2.7 machine (and commit 2a79eb1a), our virtio >> devices residing under the virtual-css bus do not have qdev_path based >> migration stream identifiers (because their qdev_path is NULL). The ids >> are instead generated when the device is registered as a composition of >> the so called idstr, which takes the vmsd name as its value, and an >> instance_id, which is which is calculated as a maximal instance_id >> registered with the same idstr plus one, or zero (if none was registered >> previously). >> [..] >> diff --git a/hw/s390x/css.c b/hw/s390x/css.c >> index 15c4f4b..6cff3a3 100644 >> --- a/hw/s390x/css.c >> +++ b/hw/s390x/css.c [..] >> + /* Only possible if machine < 2.7 (no css_dev_path) */ >> + >> + error_setg(&err, "%x != %x", old_devno, s->devno); >> + error_append_hint(&err, "Devno mismatch, tried to load wrong section!" >> + " Likely reason: some sequences of plug and unplug" >> + " can break migration for machine versions prior" > > s/prior/prior to/ Fixed. Thanks! Will wait a couple of days, maybe somebody has something else to say, and then re-spin. Regards, Halil > >> + " 2.7 (known design flaw).\n"); >> + error_report_err(err); >> + return -EINVAL; >> + } >> /* Re-assign subch. */ >> if (old_schid != s->schid) { >> old_s = channel_subsys.css[s->cssid]->sch_set[s->ssid]->sch[old_schid]; >> diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c >> index e7167e3..4f7efa2 100644 >> --- a/hw/s390x/virtio-ccw.c >> +++ b/hw/s390x/virtio-ccw.c >> @@ -1274,9 +1274,13 @@ static int virtio_ccw_load_config(DeviceState *d, QEMUFile *f) >> SubchDev *s = ccw_dev->sch; >> VirtIODevice *vdev = virtio_ccw_get_vdev(s); >> int len; >> + int ret; >> >> s->driver_data = dev; >> - subch_device_load(s, f); >> + ret = subch_device_load(s, f); >> + if (ret) { >> + return ret; >> + } >> /* Re-fill subch_id after loading the subchannel states.*/ >> if (ck->refill_ids) { >> ck->refill_ids(ccw_dev); > > Patch looks sane to me, but I second the question about how to handle > this when using vmstates. >
* Halil Pasic (pasic@linux.vnet.ibm.com) wrote: > Prior to the virtio-ccw-2.7 machine (and commit 2a79eb1a), our virtio > devices residing under the virtual-css bus do not have qdev_path based > migration stream identifiers (because their qdev_path is NULL). The ids > are instead generated when the device is registered as a composition of > the so called idstr, which takes the vmsd name as its value, and an > instance_id, which is which is calculated as a maximal instance_id > registered with the same idstr plus one, or zero (if none was registered > previously). > > That means, under certain circumstances, one device might try, and even > succeed, to load the state of a different device. This can lead to > trouble. > > Let us fail the migration if the above problem is detected during load. > > How to reproduce the problem: > 1) start qemu-system-s390x making sure you have the following devices > defined on your command line: > -device virtio-rng-ccw,id=rng1,devno=fe.0.0001 > -device virtio-rng-ccw,id=rng2,devno=fe.0.0002 > 2) detach the devices and reattach in reverse order using the monitor: > (qemu) device_del rng1 > (qemu) device_del rng2 > (qemu) device_add virtio-rng-ccw,id=rng2,devno=fe.0.0002 > (qemu) device_add virtio-rng-ccw,id=rng1,devno=fe.0.0001 > 3) save the state of the vm into a temporary file and quit QEMU: > (qemu) migrate "exec:gzip -c > /tmp/tmp_vmstate.gz" > (qemu) q > 4) use your command line from step 1 with > -incoming "exec:gzip -c -d /tmp/tmp_vmstate.gz" > appended to reproduce the problem (while trying to to load the saved vm) > > CC: qemu-stable@nongnu.org > Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> > Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> > --- > > Hi! > > I also wonder what is the best way to do this with vmstate. I know there > are VMSTATE_*_EQUAL macros for integers, and I have partially modelled my > patch after that, but there we only get a != b as error message, which is > satisfactory for detecting bugs which are supposed to get fixed. In this > particular case having a verbose error message should be really helpful > and thus important. > > I'm asking because I'm currently working on a vmstate conversion of the > s390x css and virtio-ccw stuff (find my latest patch set here > https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg01364.html). I think the way to solve that problem will probably be adding a 'hint' parameter to the VMSTATE_*_EQUAL macros that is a constant string, stuff a pointer to that into a possibly new field in VMStateField, and then make the get_*_equal functions include that string in the message like you do. There's a lot of copy and paste but it's not too bad now that Jianjun's patch from a few months ago passed the VMStateField* to the .get/.put. Dave > Regards, > Halil > --- > hw/s390x/css.c | 14 ++++++++++++++ > hw/s390x/virtio-ccw.c | 6 +++++- > 2 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/hw/s390x/css.c b/hw/s390x/css.c > index 15c4f4b..6cff3a3 100644 > --- a/hw/s390x/css.c > +++ b/hw/s390x/css.c > @@ -14,6 +14,7 @@ > #include "qapi/visitor.h" > #include "hw/qdev.h" > #include "qemu/bitops.h" > +#include "qemu/error-report.h" > #include "exec/address-spaces.h" > #include "cpu.h" > #include "hw/s390x/ioinst.h" > @@ -1721,13 +1722,26 @@ void subch_device_save(SubchDev *s, QEMUFile *f) > int subch_device_load(SubchDev *s, QEMUFile *f) > { > SubchDev *old_s; > + Error *err = NULL; > uint16_t old_schid = s->schid; > + uint16_t old_devno = s->devno; > int i; > > s->cssid = qemu_get_byte(f); > s->ssid = qemu_get_byte(f); > s->schid = qemu_get_be16(f); > s->devno = qemu_get_be16(f); > + if (s->devno != old_devno) { > + /* Only possible if machine < 2.7 (no css_dev_path) */ > + > + error_setg(&err, "%x != %x", old_devno, s->devno); > + error_append_hint(&err, "Devno mismatch, tried to load wrong section!" > + " Likely reason: some sequences of plug and unplug" > + " can break migration for machine versions prior" > + " 2.7 (known design flaw).\n"); > + error_report_err(err); > + return -EINVAL; > + } > /* Re-assign subch. */ > if (old_schid != s->schid) { > old_s = channel_subsys.css[s->cssid]->sch_set[s->ssid]->sch[old_schid]; > diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c > index e7167e3..4f7efa2 100644 > --- a/hw/s390x/virtio-ccw.c > +++ b/hw/s390x/virtio-ccw.c > @@ -1274,9 +1274,13 @@ static int virtio_ccw_load_config(DeviceState *d, QEMUFile *f) > SubchDev *s = ccw_dev->sch; > VirtIODevice *vdev = virtio_ccw_get_vdev(s); > int len; > + int ret; > > s->driver_data = dev; > - subch_device_load(s, f); > + ret = subch_device_load(s, f); > + if (ret) { > + return ret; > + } > /* Re-fill subch_id after loading the subchannel states.*/ > if (ck->refill_ids) { > ck->refill_ids(ccw_dev); > -- > 2.10.2 > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 05/18/2017 07:47 PM, Dr. David Alan Gilbert wrote: >> Hi! >> >> I also wonder what is the best way to do this with vmstate. I know there >> are VMSTATE_*_EQUAL macros for integers, and I have partially modelled my >> patch after that, but there we only get a != b as error message, which is >> satisfactory for detecting bugs which are supposed to get fixed. In this >> particular case having a verbose error message should be really helpful >> and thus important. >> >> I'm asking because I'm currently working on a vmstate conversion of the >> s390x css and virtio-ccw stuff (find my latest patch set here >> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg01364.html). > I think the way to solve that problem will probably be adding a 'hint' > parameter to the VMSTATE_*_EQUAL macros that is a constant string, > stuff a pointer to that into a possibly new field in VMStateField, > and then make the get_*_equal functions include that string in the > message like you do. There's a lot of copy and paste but it's > not too bad now that Jianjun's patch from a few months ago passed > the VMStateField* to the .get/.put. > > Dave > > Thanks Dave! I read your reply like you are seeing this verbose message if VMSTATE_*_EQUAL feature something worth of inclusion. Am I right? If yes, I'm willing to implement it. Halil
* Halil Pasic (pasic@linux.vnet.ibm.com) wrote: > > > On 05/18/2017 07:47 PM, Dr. David Alan Gilbert wrote: > >> Hi! > >> > >> I also wonder what is the best way to do this with vmstate. I know there > >> are VMSTATE_*_EQUAL macros for integers, and I have partially modelled my > >> patch after that, but there we only get a != b as error message, which is > >> satisfactory for detecting bugs which are supposed to get fixed. In this > >> particular case having a verbose error message should be really helpful > >> and thus important. > >> > >> I'm asking because I'm currently working on a vmstate conversion of the > >> s390x css and virtio-ccw stuff (find my latest patch set here > >> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg01364.html). > > I think the way to solve that problem will probably be adding a 'hint' > > parameter to the VMSTATE_*_EQUAL macros that is a constant string, > > stuff a pointer to that into a possibly new field in VMStateField, > > and then make the get_*_equal functions include that string in the > > message like you do. There's a lot of copy and paste but it's > > not too bad now that Jianjun's patch from a few months ago passed > > the VMStateField* to the .get/.put. > > > > Dave > > > > > > Thanks Dave! I read your reply like you are seeing this verbose > message if VMSTATE_*_EQUAL feature something worth of inclusion. > Am I right? Yes. > If yes, I'm willing to implement it. Please do! Dave > Halil > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff --git a/hw/s390x/css.c b/hw/s390x/css.c index 15c4f4b..6cff3a3 100644 --- a/hw/s390x/css.c +++ b/hw/s390x/css.c @@ -14,6 +14,7 @@ #include "qapi/visitor.h" #include "hw/qdev.h" #include "qemu/bitops.h" +#include "qemu/error-report.h" #include "exec/address-spaces.h" #include "cpu.h" #include "hw/s390x/ioinst.h" @@ -1721,13 +1722,26 @@ void subch_device_save(SubchDev *s, QEMUFile *f) int subch_device_load(SubchDev *s, QEMUFile *f) { SubchDev *old_s; + Error *err = NULL; uint16_t old_schid = s->schid; + uint16_t old_devno = s->devno; int i; s->cssid = qemu_get_byte(f); s->ssid = qemu_get_byte(f); s->schid = qemu_get_be16(f); s->devno = qemu_get_be16(f); + if (s->devno != old_devno) { + /* Only possible if machine < 2.7 (no css_dev_path) */ + + error_setg(&err, "%x != %x", old_devno, s->devno); + error_append_hint(&err, "Devno mismatch, tried to load wrong section!" + " Likely reason: some sequences of plug and unplug" + " can break migration for machine versions prior" + " 2.7 (known design flaw).\n"); + error_report_err(err); + return -EINVAL; + } /* Re-assign subch. */ if (old_schid != s->schid) { old_s = channel_subsys.css[s->cssid]->sch_set[s->ssid]->sch[old_schid]; diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c index e7167e3..4f7efa2 100644 --- a/hw/s390x/virtio-ccw.c +++ b/hw/s390x/virtio-ccw.c @@ -1274,9 +1274,13 @@ static int virtio_ccw_load_config(DeviceState *d, QEMUFile *f) SubchDev *s = ccw_dev->sch; VirtIODevice *vdev = virtio_ccw_get_vdev(s); int len; + int ret; s->driver_data = dev; - subch_device_load(s, f); + ret = subch_device_load(s, f); + if (ret) { + return ret; + } /* Re-fill subch_id after loading the subchannel states.*/ if (ck->refill_ids) { ck->refill_ids(ccw_dev);