Message ID | 20240116223157.73752-1-mjrosato@linux.ibm.com (mailing list archive) |
---|---|
Series | s390x/pci: fix ISM reset |
17.01.2024 01:31, Matthew Rosato:
> Commit ef1535901a0 (re-)introduced an issue where passthrough ISM devices
> on s390x would enter an error state after reboot. This was previously fixed
> by 03451953c79e, using device reset callbacks, however the change in
> ef1535901a0 effectively triggers a cold reset of the pci bus before the
> device reset callbacks are triggered.
>
> To resolve this, this series proposes to remove the use of the reset callback
> for ISM cleanup and instead trigger ISM reset from subsystem_reset before
> triggering bus resets. This has to happen before the bus resets because the
> reset of s390-pcihost will trigger reset of the PCI bus followed by the
> s390-pci bus, and the former will trigger vfio-pci reset / the aperture-wide
> unmap that ISM gets upset about.
>
>   /s390-pcihost (s390-pcihost)
>     /pci.0 (PCI)
>     /s390-pcibus.0 (s390-pcibus)
>
> While fixing this, it was also noted that kernel warnings could be seen that
> indicate a guest ISC reference count error. That's because in some reset
> cases we were not bothering to disable AIF, but would again re-enable it after
> the reset (causing the reference count to grow erroneously). This was a base
> issue that went unnoticed because the kernel previously did not detect and
> issue a warning for this scenario.

Is this -stable material, or not worth picking up for stable?

Thanks,

/mjt
On 18/01/2024 07.03, Michael Tokarev wrote:
> 17.01.2024 01:31, Matthew Rosato:
>> Commit ef1535901a0 (re-)introduced an issue where passthrough ISM devices
>> on s390x would enter an error state after reboot. This was previously fixed
>> by 03451953c79e, using device reset callbacks, however the change in
>> ef1535901a0 effectively triggers a cold reset of the pci bus before the
>> device reset callbacks are triggered.
>>
>> To resolve this, this series proposes to remove the use of the reset callback
>> for ISM cleanup and instead trigger ISM reset from subsystem_reset before
>> triggering bus resets. This has to happen before the bus resets because the
>> reset of s390-pcihost will trigger reset of the PCI bus followed by the
>> s390-pci bus, and the former will trigger vfio-pci reset / the aperture-wide
>> unmap that ISM gets upset about.
>>
>>   /s390-pcihost (s390-pcihost)
>>     /pci.0 (PCI)
>>     /s390-pcibus.0 (s390-pcibus)
>>
>> While fixing this, it was also noted that kernel warnings could be seen that
>> indicate a guest ISC reference count error. That's because in some reset
>> cases we were not bothering to disable AIF, but would again re-enable it after
>> the reset (causing the reference count to grow erroneously). This was a base
>> issue that went unnoticed because the kernel previously did not detect and
>> issue a warning for this scenario.
>
> Is this -stable material, or not worth picking up for stable?

It's definitely stable material, but IIUC there will be a v2 with some minor fixes.

 Thomas
18.01.2024 10:19, Thomas Huth:
>> Is this -stable material, or not worth picking up for stable?
>
> It's definitely stable material, but IIUC there will be a v2 with some minor fixes.

Yeah, I figured there would be a v2. Just a reminder: please add Cc: qemu-stable@
when appropriate (or mark it some other way, or just forward it to qemu-stable@,
whatever works, just so it won't get lost). There's no need to do that for this
patchset, as I have already noticed this one :)

Thank you for the comments!

/mjt
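
For readers following the thread, the ISC reference count problem described in the cover letter can be illustrated with a small toy model. This is only a sketch in Python, not QEMU or kernel code: the names `GuestIsc`, `ToyPciDevice`, `enable_aif`, `disable_aif`, and `refcount_after_reboots` are hypothetical, and the model assumes that each AIF enable takes exactly one reference on the guest ISC.

```python
class GuestIsc:
    """Hypothetical stand-in for a reference-counted guest ISC."""

    def __init__(self):
        self.refcount = 0


class ToyPciDevice:
    """Hypothetical device that takes one ISC reference while AIF is enabled."""

    def __init__(self, isc):
        self.isc = isc
        self.aif_enabled = False

    def enable_aif(self):
        # Assumption: every enable takes a reference, whether or not
        # an earlier reference was ever dropped.
        self.isc.refcount += 1
        self.aif_enabled = True

    def disable_aif(self):
        if self.aif_enabled:
            self.isc.refcount -= 1
            self.aif_enabled = False

    def reset(self, disable_first):
        if disable_first:
            # Reset path that disables AIF before the device state is cleared.
            self.disable_aif()
        else:
            # Reset path that skips the disable: the device state is
            # cleared, but the reference taken earlier is never dropped.
            self.aif_enabled = False


def refcount_after_reboots(reboots, disable_first):
    """Return the ISC reference count after the given number of reboots."""
    isc = GuestIsc()
    dev = ToyPciDevice(isc)
    dev.enable_aif()              # initial enable by the guest
    for _ in range(reboots):
        dev.reset(disable_first)  # reboot resets the device
        dev.enable_aif()          # guest re-enables AIF after the reset
    return isc.refcount


if __name__ == "__main__":
    print("without disable on reset:", refcount_after_reboots(3, False))
    print("with disable on reset:   ", refcount_after_reboots(3, True))
```

In this model the count grows by one per reboot when the reset path skips the disable, and stays at one when the disable happens first, which is the kind of growth the cover letter says the kernel warnings point to.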