mbox series

[v2,0/3] s390x/pci: fix ISM reset

Message ID 20240118185151.265329-1-mjrosato@linux.ibm.com (mailing list archive)
Headers show
Series s390x/pci: fix ISM reset | expand

Message

Matthew Rosato Jan. 18, 2024, 6:51 p.m. UTC
Commit ef1535901a0 (re-)introduced an issue where passthrough ISM devices
on s390x would enter an error state after reboot.  This was previously fixed
by 03451953c79e, using device reset callbacks, however the change in
ef1535901a0 effectively triggers a cold reset of the pci bus before the
device reset callbacks are triggered.

To resolve this, this series proposes to remove the use of the reset callback
for ISM cleanup and instead trigger ISM reset from subsystem_reset before 
triggering bus resets.  This has to happen before the bus resets because the
reset of s390-pcihost will trigger reset of the PCI bus followed by the
s390-pci bus, and the former will trigger vfio-pci reset / the aperture-wide
unmap that ISM gets upset about.
 
  /s390-pcihost (s390-pcihost)
    /pci.0 (PCI)
    /s390-pcibus.0 (s390-pcibus)
    
While fixing this, it was also noted that kernel warnings could be seen that
indicate a guest ISC reference count error.  That's because in some reset
cases we were not bothering to disable AIF, but would again re-enable it after
the reset (causing the reference count to grow erroneously).  This was a base
issue that went unnoticed because the kernel previously did not detect and
issue a warning for this scenario.


Changes for v2:
- Fold a typo fix from patch 2 into patch 1 where it belongs
- Add block comment re: timing of ISM reset
- Add review tags


Matthew Rosato (3):
  s390x/pci: avoid double enable/disable of aif
  s390x/pci: refresh fh before disabling aif
  s390x/pci: drive ISM reset from subsystem reset

 hw/s390x/s390-pci-bus.c         | 26 ++++++++++++++++---------
 hw/s390x/s390-pci-kvm.c         | 34 +++++++++++++++++++++++++++++++--
 hw/s390x/s390-virtio-ccw.c      |  8 ++++++++
 include/hw/s390x/s390-pci-bus.h |  2 ++
 4 files changed, 59 insertions(+), 11 deletions(-)

Comments

Cédric Le Goater Jan. 18, 2024, 7:43 p.m. UTC | #1
On 1/18/24 19:51, Matthew Rosato wrote:
> Commit ef1535901a0 (re-)introduced an issue where passthrough ISM devices
> on s390x would enter an error state after reboot.  This was previously fixed
> by 03451953c79e, using device reset callbacks, however the change in
> ef1535901a0 effectively triggers a cold reset of the pci bus before the
> device reset callbacks are triggered.
> 
> To resolve this, this series proposes to remove the use of the reset callback
> for ISM cleanup and instead trigger ISM reset from subsystem_reset before
> triggering bus resets.  This has to happen before the bus resets because the
> reset of s390-pcihost will trigger reset of the PCI bus followed by the
> s390-pci bus, and the former will trigger vfio-pci reset / the aperture-wide
> unmap that ISM gets upset about.
>   
>    /s390-pcihost (s390-pcihost)
>      /pci.0 (PCI)
>      /s390-pcibus.0 (s390-pcibus)
>      
> While fixing this, it was also noted that kernel warnings could be seen that
> indicate a guest ISC reference count error.  That's because in some reset
> cases we were not bothering to disable AIF, but would again re-enable it after
> the reset (causing the reference count to grow erroneously).  This was a base
> issue that went unnoticed because the kernel previously did not detect and
> issue a warning for this scenario.
> 
> 
> Changes for v2:
> - Fold a typo fix from patch 2 into patch 1 where it belongs
> - Add block comment re: timing of ISM reset
> - Add review tags


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.
Michael Tokarev Jan. 22, 2024, 10:18 a.m. UTC | #2
18.01.2024 21:51, Matthew Rosato :
> Commit ef1535901a0 (re-)introduced an issue where passthrough ISM devices
> on s390x would enter an error state after reboot.  This was previously fixed
> by 03451953c79e, using device reset callbacks, however the change in
> ef1535901a0 effectively triggers a cold reset of the pci bus before the
> device reset callbacks are triggered.
> 
> To resolve this, this series proposes to remove the use of the reset callback
> for ISM cleanup and instead trigger ISM reset from subsystem_reset before
> triggering bus resets.  This has to happen before the bus resets because the
> reset of s390-pcihost will trigger reset of the PCI bus followed by the
> s390-pci bus, and the former will trigger vfio-pci reset / the aperture-wide
> unmap that ISM gets upset about.
>   
>    /s390-pcihost (s390-pcihost)
>      /pci.0 (PCI)
>      /s390-pcibus.0 (s390-pcibus)
>      
> While fixing this, it was also noted that kernel warnings could be seen that
> indicate a guest ISC reference count error.  That's because in some reset
> cases we were not bothering to disable AIF, but would again re-enable it after
> the reset (causing the reference count to grow erroneously).  This was a base
> issue that went unnoticed because the kernel previously did not detect and
> issue a warning for this scenario.

> Matthew Rosato (3):
>    s390x/pci: avoid double enable/disable of aif
>    s390x/pci: refresh fh before disabling aif
>    s390x/pci: drive ISM reset from subsystem reset

Is it this a material for -stable, or there's no need to bother?
(changes 1 and 2 applies to 7.2 (while 2 fixes later change),
all 3 applies to 8.1 (while 3 fixes later change), and all 3 can be
picked up for 8.2, I guess).

Thanks,

/mjt
Michael Tokarev Jan. 22, 2024, 10:31 a.m. UTC | #3
22.01.2024 13:18, Michael Tokarev :
..
> Is it this a material for -stable, or there's no need to bother?

Actually it's been Cc'd to qemu-stable@ already, I haven't noticed.
Still there's a question which branches should get which patches.

> (changes 1 and 2 applies to 7.2 (while 2 fixes later change),
> all 3 applies to 8.1 (while 3 fixes later change), and all 3 can be
> picked up for 8.2, I guess).

07b2c8e034d80f s390x/pci: avoid double enable/disable of aif
  Fixes: v7.1.0-416-gd0bc7091c2 s390x/pci: enable adapter event notification for interpreted devices

30e35258e25c75  s390x/pci: refresh fh before disabling aif
  Fixes: v7.2.0-51-g03451953c7 s390x/pci: reset ISM passthrough devices on shutdown and system reset

68c691ca99a253 s390x/pci: drive ISM reset from subsystem reset
  Fixes: v8.1.0-654-gef1535901a s390x: do a subsystem reset before the unprotect on reboot
  Fixes: v7.2.0-51-g03451953c7 s390x/pci: reset ISM passthrough devices on shutdown and system reset

So all 3 are okay for 8.2.

What about 8.1 and 7.2 which are the current still-maintained stable branches?
(I think this 8.1 release will be the last in series).

Thanks,

/mjt
Thomas Huth Jan. 22, 2024, 10:49 a.m. UTC | #4
On 22/01/2024 11.31, Michael Tokarev wrote:
> 22.01.2024 13:18, Michael Tokarev :
> ..
>> Is it this a material for -stable, or there's no need to bother?
> 
> Actually it's been Cc'd to qemu-stable@ already, I haven't noticed.
> Still there's a question which branches should get which patches.
...
> So all 3 are okay for 8.2.
> 
> What about 8.1 and 7.2 which are the current still-maintained stable branches?
> (I think this 8.1 release will be the last in series).

IIUC the main issue that this series fixes is the bug that has been 
uncovered by commit ef1535901a07f2e49fa25c8bcee7f0b73801d824 ("s390x: do a 
subsystem reset before the unprotect on reboot") - and that one is in 8.2 
only. So I think it should be OK to just backport this to 8.2 and skip 8.1 
and 7.2.

  Thomas
Matthew Rosato Jan. 22, 2024, 3:06 p.m. UTC | #5
On 1/22/24 5:49 AM, Thomas Huth wrote:
> On 22/01/2024 11.31, Michael Tokarev wrote:
>> 22.01.2024 13:18, Michael Tokarev :
>> ..
>>> Is it this a material for -stable, or there's no need to bother?
>>
>> Actually it's been Cc'd to qemu-stable@ already, I haven't noticed.
>> Still there's a question which branches should get which patches.
> ...
>> So all 3 are okay for 8.2.
>>
>> What about 8.1 and 7.2 which are the current still-maintained stable branches?
>> (I think this 8.1 release will be the last in series).
> 
> IIUC the main issue that this series fixes is the bug that has been uncovered by commit ef1535901a07f2e49fa25c8bcee7f0b73801d824 ("s390x: do a subsystem reset before the unprotect on reboot") - and that one is in 8.2 only. So I think it should be OK to just backport this to 8.2 and skip 8.1 and 7.2.
> 

Agreed.

Thanks,
Matt