Message ID | 20181129152552.14363-1-liq3ea@163.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | hw: scsi: dc390: add device unrealize function | expand |
Li Qiang <liq3ea@163.com> writes: > Currently the dc390 device has no unrealize function. This > can cause memory leak when hotplug/unplug device. Also more > serious, it will trigger an assert when rehotplug. [...] When a hot-pluggable device doesn't have an ->unrealize() method, unplug is probably broken. I think we should track down such devices for review. Any takers?
On Thu, 29 Nov 2018 at 19:03, Markus Armbruster <armbru@redhat.com> wrote: > > Li Qiang <liq3ea@163.com> writes: > > > Currently the dc390 device has no unrealize function. This > > can cause memory leak when hotplug/unplug device. Also more > > serious, it will trigger an assert when rehotplug. > [...] > > When a hot-pluggable device doesn't have an ->unrealize() method, unplug > is probably broken. I think we should track down such devices for > review. Any takers? Add an assert somewhere and catch it with the usual "instantiate everything" qtest? More generally, what is causing dc390 to be hotpluggable? I can't see anything obvious in the class init. If we have APIs where the default is "you get this weird thing you weren't thinking about but it's broken because you weren't thinking about it" then we will have a whole class of bugs that we then need to squash device by device[*]. I think it would be better for devices to have to explicitly opt in to implementing things like hotplug -- that way the failure mode is just "this isn't hotpluggable", we can report that helpfully to the user, and if anybody cares they can add the support. [*] Also currently true for migration support. We should require devices to either provide a VMState or explicitly say they have no state needing migration or explicitly say they don't support migration, and then assert that they do one or the other of those, rather than having the default be "we'll allow migration but not migrate any of the device's state". thanks -- PMM
Juan, David, there's a migration bit for you towards the end. Peter Maydell <peter.maydell@linaro.org> writes: > On Thu, 29 Nov 2018 at 19:03, Markus Armbruster <armbru@redhat.com> wrote: >> >> Li Qiang <liq3ea@163.com> writes: >> >> > Currently the dc390 device has no unrealize function. This >> > can cause memory leak when hotplug/unplug device. Also more >> > serious, it will trigger an assert when rehotplug. >> [...] >> >> When a hot-pluggable device doesn't have an ->unrealize() method, unplug >> is probably broken. I think we should track down such devices for >> review. Any takers? > > Add an assert somewhere and catch it with the usual > "instantiate everything" qtest? Not as easy as it sounds. For unplug to work, the device and all its parents in the QOM type hierarchy up to "device" have to undo their ->realize() in their ->unrealize(). Example: "dc390" -> "am53c974" -> "pci-device" -> "device" * device: does not implement DeviceClass::realize(), nothing to do. * pci_device: implements DeviceClass::realize(), ::unrealize() as pci_qdev_realize(), pci_qdev_unrealize(). They call PCIDeviceClass::realize() and PCIDeviceClass::exit(), respectively. * am53c974: implements PCIDeviceClass::realize(), ::unrealize() as esp_pci_scsi_realize() and esp_pci_scsi_uninit(). * dc390: implements PCIDeviceClass::realize() as dc390_scsi_realize(), overriding esp_pci_scsi_realize(). dc390_scsi_realize() calls esp_pci_scsi_realize(). Bug: it fails to overide esp_pci_scsi_uninit(). Three patterns: (1) A class implements whatever realize interface its parent class demands. Example: am53c974. (2) A (generally abstract) class implements ::realize() and ::unrealize(), and its implementation calls methods its children may implement. Example: pci_device. (3) A class overrides its parent's implementation, and calls it. Example: dc390. How can we assert any realize-like thing has a matching unrealize-like thing? Any class that calls realize-/unrealize-like methods its children may implement needs to assert the child implements both or none. The troublemaker is (3), where we may end up with an overridden realize-like method and an non-matching, un-overridden unrealize-like method. Got any smart ideas? > More generally, what is causing dc390 to be hotpluggable? > I can't see anything obvious in the class init. It's a PCI device. These are all hot-pluggable by default. > If we > have APIs where the default is "you get this weird thing > you weren't thinking about but it's broken because > you weren't thinking about it" then we will have a whole > class of bugs that we then need to squash device by device[*]. > I think it would be better for devices to have to explicitly > opt in to implementing things like hotplug -- that way the > failure mode is just "this isn't hotpluggable", we can > report that helpfully to the user, and if anybody cares > they can add the support. I tend to agree, although for PCI devices, not being hot-pluggable is kind of weird, except for the ones that only occur soldered onto mainboards. > [*] Also currently true for migration support. We should > require devices to either provide a VMState or explicitly > say they have no state needing migration or explicitly > say they don't support migration, and then assert that > they do one or the other of those, rather than having the > default be "we'll allow migration but not migrate any > of the device's state". True. Cc: Juan & Dave.
On Fri, 30 Nov 2018 at 07:40, Markus Armbruster <armbru@redhat.com> wrote: > Peter Maydell <peter.maydell@linaro.org> writes: > > Add an assert somewhere and catch it with the usual > > "instantiate everything" qtest? > The troublemaker is (3), where we may end up with an overridden > realize-like method and an non-matching, un-overridden unrealize-like > method. Got any smart ideas? Mmm, I see the difficulty. I suppose in theory we could have a custom linter like clang-tidy and check that anything that sets a realize pointer also sets an unrealize, but that's pie-in-the-sky territory at the moment. Asserting that there is an unrealize would catch at least some cases of "forgot to handle hotplug", though, right? > > More generally, what is causing dc390 to be hotpluggable? > > I can't see anything obvious in the class init. > > It's a PCI device. These are all hot-pluggable by default. Maybe we should change that? Most people writing a PCI device are probably not thinking about hotplug. Though in the dc390 case it would probably not help to have to set a dc->hotpluggable flag, because it would inherit that from am53c974 too. I wonder if the dc390 could be structured in a way that it doesn't subclass a complete non-abstract pci device like that... > > If we > > have APIs where the default is "you get this weird thing > > you weren't thinking about but it's broken because > > you weren't thinking about it" then we will have a whole > > class of bugs that we then need to squash device by device[*]. > > I think it would be better for devices to have to explicitly > > opt in to implementing things like hotplug -- that way the > > failure mode is just "this isn't hotpluggable", we can > > report that helpfully to the user, and if anybody cares > > they can add the support. > > I tend to agree, although for PCI devices, not being hot-pluggable is > kind of weird, except for the ones that only occur soldered onto > mainboards. Real hardware PCI devices are not hot-pluggable by default as far as I'm aware; PCIe may be hot-pluggable but I think need to be designed to support it. This is a random non-hotplug-safe PCIe card: https://i.stack.imgur.com/tXug5.jpg PRSNT2# (pin 17 side B) is connected to Ground on pin 18 (these are the rightmost two connections visible); for hotplug it must be connected to PRSNT1# (which is pin 1 side A), AIUI. But the hardware situation is kind of a tangent from how we try to design our APIs. thanks -- PMM
Peter Maydell <peter.maydell@linaro.org> writes: > On Fri, 30 Nov 2018 at 07:40, Markus Armbruster <armbru@redhat.com> wrote: >> Peter Maydell <peter.maydell@linaro.org> writes: >> > Add an assert somewhere and catch it with the usual >> > "instantiate everything" qtest? > >> The troublemaker is (3), where we may end up with an overridden >> realize-like method and an non-matching, un-overridden unrealize-like >> method. Got any smart ideas? > > Mmm, I see the difficulty. I suppose in theory we could have > a custom linter like clang-tidy and check that anything that > sets a realize pointer also sets an unrealize, but that's > pie-in-the-sky territory at the moment. > > Asserting that there is an unrealize would catch at least > some cases of "forgot to handle hotplug", though, right? Yes. I figure it's worth doing. Another idea: track down all the instances of "parent class implements ::realize(), ::unrealize(), which in turn call methods its children may implement", and rename the children's methods to be called realize(), unrealize(), so that we can find offenders with a git-grep -E '->(un)?realize *='. >> > More generally, what is causing dc390 to be hotpluggable? >> > I can't see anything obvious in the class init. >> >> It's a PCI device. These are all hot-pluggable by default. > > Maybe we should change that? Most people writing a > PCI device are probably not thinking about hotplug. More on that below. > Though in the dc390 case it would probably not help to have > to set a dc->hotpluggable flag, because it would inherit that > from am53c974 too. > > I wonder if the dc390 could be structured in a way that > it doesn't subclass a complete non-abstract pci device like > that... Yes, a common abstract base class would be cleaner. >> > If we >> > have APIs where the default is "you get this weird thing >> > you weren't thinking about but it's broken because >> > you weren't thinking about it" then we will have a whole >> > class of bugs that we then need to squash device by device[*]. >> > I think it would be better for devices to have to explicitly >> > opt in to implementing things like hotplug -- that way the >> > failure mode is just "this isn't hotpluggable", we can >> > report that helpfully to the user, and if anybody cares >> > they can add the support. >> >> I tend to agree, although for PCI devices, not being hot-pluggable is >> kind of weird, except for the ones that only occur soldered onto >> mainboards. > > Real hardware PCI devices are not hot-pluggable by default > as far as I'm aware; PCIe may be hot-pluggable but I think > need to be designed to support it. > This is a random non-hotplug-safe PCIe card: > https://i.stack.imgur.com/tXug5.jpg > PRSNT2# (pin 17 side B) is connected to Ground on pin 18 > (these are the rightmost two connections visible); for > hotplug it must be connected to PRSNT1# (which is pin 1 side A), > AIUI. > > But the hardware situation is kind of a tangent from how > we try to design our APIs. It's best not to deviate from real hardware without a good reason. Making it harder to create the same stupid bugs over and over again is a good reason.
diff --git a/hw/scsi/esp-pci.c b/hw/scsi/esp-pci.c index 419fc668ac..09d1331395 100644 --- a/hw/scsi/esp-pci.c +++ b/hw/scsi/esp-pci.c @@ -463,6 +463,14 @@ static void dc390_write_config(PCIDevice *dev, } } +static void dc390_scsi_uninit(PCIDevice *dev) +{ + DC390State *pci = DC390(dev); + + eeprom93xx_free(&dev->qdev, pci->eeprom); + esp_pci_scsi_uninit(dev); +} + static void dc390_scsi_realize(PCIDevice *dev, Error **errp) { DC390State *pci = DC390(dev); @@ -510,6 +518,7 @@ static void dc390_class_init(ObjectClass *klass, void *data) PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); k->realize = dc390_scsi_realize; + k->exit = dc390_scsi_uninit; k->config_read = dc390_read_config; k->config_write = dc390_write_config; set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
Currently the dc390 device has no unrealize function. This can cause memory leak when hotplug/unplug device. Also more serious, it will trigger an assert when rehotplug. The backtrack is following: qemu-system-x86_64: migration/savevm.c:734: vmstate_register_with_alias_id: Assertion `!se->compat || se->instance_id == 0' failed. Thread 1 "qemu-system-x86" received signal SIGABRT, Aborted. [Switching to Thread 0x7ffff7fce280 (LWP 5721)] __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt 0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 1 0x00007ffff5a10801 in __GI_abort () at abort.c:79 2 0x00007ffff5a0039a in __assert_fail_base (fmt=0x7ffff5b877d8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x555555f67f68 "!se->compat || se->instance_id == 0", file=file@entry=0x555555f67d07 "migration/savevm.c", line=line@entry=734, function=function@entry=0x555555f68e50 <__PRETTY_FUNCTION__.32111> "vmstate_register_with_alias_id") at assert.c:92 3 0x00007ffff5a00412 in __GI___assert_fail (assertion=0x555555f67f68 "!se->compat || se->instance_id == 0", file=0x555555f67d07 "migration/savevm.c", line=734, function=0x555555f68e50 <__PRETTY_FUNCTION__.32111> "vmstate_register_with_alias_id") at assert.c:101 4 0x0000555555bfc1d2 in vmstate_register_with_alias_id (dev=0x5555577fff60, instance_id=-1, vmsd=0x555556533840 <vmstate_eeprom>, opaque=0x555556e82ad0, alias_id=-1, required_for_version=0, errp=0x0) at migration/savevm.c:734 5 0x0000555555b3ab79 in vmstate_register (dev=0x5555577fff60, instance_id=0, vmsd=0x555556533840 <vmstate_eeprom>, opaque=0x555556e82ad0) at /home/test/qemu/include/migration/vmstate.h:1067 6 0x0000555555b3b106 in eeprom93xx_new (dev=0x5555577fff60, nwords=64) at hw/nvram/eeprom93xx.c:323 7 0x0000555555b8b587 in dc390_scsi_realize (dev=0x5555577fff60, errp=0x7fffffffd9f8) at hw/scsi/esp-pci.c:482 8 0x0000555555b4b62e in pci_qdev_realize (qdev=0x5555577fff60, errp=0x7fffffffda70) at hw/pci/pci.c:2038 9 0x0000555555a89cb2 in device_set_realized (obj=0x5555577fff60, value=true, errp=0x7fffffffdc40) at hw/core/qdev.c:826 10 0x0000555555c7fbe8 in property_set_bool (obj=0x5555577fff60, v=0x5555567d9c50, name=0x555555edda12 "realized", opaque=0x5555575a6170, errp=0x7fffffffdc40) at qom/object.c:1991 11 0x0000555555c7de64 in object_property_set (obj=0x5555577fff60, v=0x5555567d9c50, name=0x555555edda12 "realized", errp=0x7fffffffdc40) at qom/object.c:1183 12 0x0000555555c80f2a in object_property_set_qobject (obj=0x5555577fff60, value=0x555556e84be0, name=0x555555edda12 "realized", errp=0x7fffffffdc40) at qom/qom-qobject.c:27 13 0x0000555555c7e149 in object_property_set_bool (obj=0x5555577fff60, value=true, name=0x555555edda12 "realized", errp=0x7fffffffdc40) at qom/object.c:1249 14 0x00005555559d3207 in qdev_device_add (opts=0x5555573aab00, errp=0x7fffffffdcb0) at qdev-monitor.c:642 15 0x00005555559d39f0 in qmp_device_add (qdict=0x555556a15000, ret_data=0x0, errp=0x7fffffffdcf0) ---Type <return> to continue, or q <return> to quit--- at qdev-monitor.c:822 16 0x0000555555a01e2a in hmp_device_add (mon=0x5555568b8000, qdict=0x555556a15000) at hmp.c:2067 Signed-off-by: Li Qiang <liq3ea@163.com> --- hw/scsi/esp-pci.c | 9 +++++++++ 1 file changed, 9 insertions(+)