[0/7] scsi: EH rework main part

Message ID 20220502215953.5463-1-hare@suse.de (mailing list archive)

Hannes Reinecke May 2, 2022, 9:59 p.m. UTC
Hi all,

Now that the prep is done, we can convert the call sequence
of the SCSI EH callbacks to use the respective object
(i.e. struct Scsi_Host or struct scsi_device) instead of the SCSI command.
With that, we no longer have to allocate a 'fake' command for
ioctl reset.

As usual, comments and reviews are welcome.

Hannes Reinecke (7):
  scsi: Use Scsi_Host as argument for eh_host_reset_handler
  scsi: Use Scsi_Host and channel number as argument for
    eh_bus_reset_handler()
  scsi: Use scsi_target as argument for eh_target_reset_handler()
  scsi: Use scsi_device as argument to eh_device_reset_handler()
  scsi: Do not allocate scsi command in scsi_ioctl_reset()
  scsi: remove SUBMITTED_BY_SCSI_RESET_IOCTL
  scsi_error: streamline scsi_eh_bus_device_reset()

 Documentation/scsi/scsi_eh.rst                |  16 +-
 Documentation/scsi/scsi_mid_low_api.rst       |  31 +++-
 drivers/infiniband/ulp/srp/ib_srp.c           |  12 +-
 drivers/message/fusion/mptfc.c                |  25 ++-
 drivers/message/fusion/mptsas.c               |  10 +-
 drivers/message/fusion/mptscsih.c             |  86 ++++-----
 drivers/message/fusion/mptscsih.h             |   8 +-
 drivers/message/fusion/mptspi.c               |   8 +-
 drivers/s390/scsi/zfcp_scsi.c                 |  14 +-
 drivers/scsi/3w-9xxx.c                        |  11 +-
 drivers/scsi/3w-sas.c                         |  11 +-
 drivers/scsi/3w-xxxx.c                        |  11 +-
 drivers/scsi/53c700.c                         |  39 ++--
 drivers/scsi/BusLogic.c                       |  14 +-
 drivers/scsi/NCR5380.c                        |   3 +-
 drivers/scsi/a100u2w.c                        |  11 +-
 drivers/scsi/aacraid/linit.c                  |  35 ++--
 drivers/scsi/advansys.c                       |  26 +--
 drivers/scsi/aha152x.c                        |  10 +-
 drivers/scsi/aha1542.c                        |  30 +--
 drivers/scsi/aic7xxx/aic79xx_osm.c            |  37 ++--
 drivers/scsi/aic7xxx/aic7xxx_osm.c            |  10 +-
 drivers/scsi/arcmsr/arcmsr_hba.c              |   6 +-
 drivers/scsi/arm/acornscsi.c                  |   8 +-
 drivers/scsi/arm/fas216.c                     |  18 +-
 drivers/scsi/arm/fas216.h                     |  17 +-
 drivers/scsi/atari_scsi.c                     |   4 +-
 drivers/scsi/be2iscsi/be_main.c               |  12 +-
 drivers/scsi/bfa/bfad_im.c                    |   8 +-
 drivers/scsi/bnx2fc/bnx2fc.h                  |   4 +-
 drivers/scsi/bnx2fc/bnx2fc_io.c               |  10 +-
 drivers/scsi/csiostor/csio_scsi.c             |   3 +-
 drivers/scsi/cxlflash/main.c                  |  10 +-
 drivers/scsi/dc395x.c                         |  25 ++-
 drivers/scsi/dpt_i2o.c                        |  43 +++--
 drivers/scsi/dpti.h                           |   6 +-
 drivers/scsi/esas2r/esas2r.h                  |   8 +-
 drivers/scsi/esas2r/esas2r_main.c             |  55 +++---
 drivers/scsi/esp_scsi.c                       |   8 +-
 drivers/scsi/fdomain.c                        |   3 +-
 drivers/scsi/fnic/fnic.h                      |   4 +-
 drivers/scsi/fnic/fnic_scsi.c                 |   9 +-
 drivers/scsi/hpsa.c                           |  14 +-
 drivers/scsi/hptiop.c                         |   6 +-
 drivers/scsi/ibmvscsi/ibmvfc.c                |  12 +-
 drivers/scsi/ibmvscsi/ibmvscsi.c              |  23 +--
 drivers/scsi/imm.c                            |   4 +-
 drivers/scsi/initio.c                         |  11 +-
 drivers/scsi/ipr.c                            |  35 ++--
 drivers/scsi/ips.c                            |  22 +--
 drivers/scsi/libfc/fc_fcp.c                   |  16 +-
 drivers/scsi/libiscsi.c                       |  19 +-
 drivers/scsi/libsas/sas_scsi_host.c           |  21 ++-
 drivers/scsi/lpfc/lpfc_scsi.c                 |  23 ++-
 drivers/scsi/mac53c94.c                       |   8 +-
 drivers/scsi/megaraid.c                       |   4 +-
 drivers/scsi/megaraid.h                       |   2 +-
 drivers/scsi/megaraid/megaraid_mbox.c         |  14 +-
 drivers/scsi/megaraid/megaraid_sas.h          |   3 +-
 drivers/scsi/megaraid/megaraid_sas_base.c     |  44 ++---
 drivers/scsi/megaraid/megaraid_sas_fusion.c   |  56 +++---
 drivers/scsi/mesh.c                           |  10 +-
 drivers/scsi/mpi3mr/mpi3mr_os.c               | 123 ++++++------
 drivers/scsi/mpt3sas/mpt3sas_scsih.c          |  72 +++----
 drivers/scsi/mvumi.c                          |   7 +-
 drivers/scsi/myrb.c                           |   3 +-
 drivers/scsi/myrs.c                           |   3 +-
 drivers/scsi/ncr53c8xx.c                      |   4 +-
 drivers/scsi/nsp32.c                          |  12 +-
 drivers/scsi/pcmcia/nsp_cs.c                  |  10 +-
 drivers/scsi/pcmcia/nsp_cs.h                  |   6 +-
 drivers/scsi/pcmcia/qlogic_stub.c             |   4 +-
 drivers/scsi/pcmcia/sym53c500_cs.c            |   8 +-
 drivers/scsi/pmcraid.c                        |  27 ++-
 drivers/scsi/ppa.c                            |   4 +-
 drivers/scsi/qedf/qedf_main.c                 |  13 +-
 drivers/scsi/qedi/qedi_iscsi.c                |   3 +-
 drivers/scsi/qla1280.c                        |  36 ++--
 drivers/scsi/qla2xxx/qla_os.c                 |  83 ++++-----
 drivers/scsi/qla4xxx/ql4_os.c                 |  54 +++---
 drivers/scsi/qlogicfas408.c                   |  10 +-
 drivers/scsi/qlogicfas408.h                   |   2 +-
 drivers/scsi/qlogicpti.c                      |   3 +-
 drivers/scsi/scsi_debug.c                     |  78 +++-----
 drivers/scsi/scsi_error.c                     | 175 +++++++++---------
 drivers/scsi/scsi_lib.c                       |   2 -
 drivers/scsi/smartpqi/smartpqi_init.c         |  11 +-
 drivers/scsi/snic/snic.h                      |   5 +-
 drivers/scsi/snic/snic_scsi.c                 |  41 +---
 drivers/scsi/stex.c                           |   7 +-
 drivers/scsi/storvsc_drv.c                    |   4 +-
 drivers/scsi/sym53c8xx_2/sym_glue.c           |  13 +-
 drivers/scsi/ufs/ufshcd.c                     |  14 +-
 drivers/scsi/virtio_scsi.c                    |  12 +-
 drivers/scsi/vmw_pvscsi.c                     |  20 +-
 drivers/scsi/wd33c93.c                        |   5 +-
 drivers/scsi/wd33c93.h                        |   2 +-
 drivers/scsi/wd719x.c                         |  17 +-
 drivers/scsi/xen-scsifront.c                  |  23 ++-
 drivers/staging/rts5208/rtsx.c                |   6 +-
 .../staging/unisys/visorhba/visorhba_main.c   |  24 +--
 drivers/target/loopback/tcm_loop.c            |  17 +-
 drivers/usb/image/microtek.c                  |   4 +-
 drivers/usb/storage/scsiglue.c                |   8 +-
 drivers/usb/storage/uas.c                     |   3 +-
 include/scsi/libfc.h                          |   4 +-
 include/scsi/libiscsi.h                       |   4 +-
 include/scsi/libsas.h                         |   4 +-
 include/scsi/scsi_cmnd.h                      |   1 -
 include/scsi/scsi_host.h                      |   8 +-
 110 files changed, 989 insertions(+), 1076 deletions(-)
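
As a rough illustration of the conversion described in the cover letter, the toy model below contrasts a reset handler that takes a command with one that takes the object it operates on. It is a sketch only: the classes and function names (`Host`, `Device`, `Command`, `host_reset_old`, `host_reset_new`, `ioctl_reset_old`, `ioctl_reset_new`) are hypothetical stand-ins written in Python, not the kernel types or the signatures used in the series.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Toy stand-in for a SCSI host (hypothetical, not struct Scsi_Host)."""
    name: str
    resets: int = 0


@dataclass
class Device:
    """Toy stand-in for a SCSI device attached to a host."""
    host: Host
    lun: int


@dataclass
class Command:
    """Toy stand-in for a SCSI command; it only carries a device reference."""
    device: Device


def host_reset_old(cmd: Command) -> bool:
    # Command-based style: the handler digs the host out of the command.
    cmd.device.host.resets += 1
    return True


def host_reset_new(host: Host) -> bool:
    # Object-based style: the handler receives the host directly.
    host.resets += 1
    return True


def ioctl_reset_old(dev: Device) -> bool:
    # The ioctl path has no real command, so a placeholder is built
    # purely to satisfy the command-based handler.
    fake = Command(device=dev)
    return host_reset_old(fake)


def ioctl_reset_new(dev: Device) -> bool:
    # With an object-based handler no placeholder command is needed.
    return host_reset_new(dev.host)
```

In this model both ioctl paths end up resetting the same host; the only difference is that the object-based variant never has to construct a `Command`.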

Comments

chenxiang May 5, 2022, 2:27 a.m. UTC | #1
Hi Hannes and other guys,

For SCSI EH, I have a question (sorry, it is not related to this
patchset). In the current SCSI EH flow, if I/O to one disk fails while
many disks sit under the same SCSI host, all I/O on that host is blocked.

So during SCSI EH all I/O is blocked, even for disks that are healthy.
Our product line sometimes complains about this, because one bad disk
stalls the I/O workload of the healthy disks for the duration of SCSI EH.

Is it possible to split SCSI EH into two parts: recovering the disk and
recovering the SCSI host? At the beginning, only the I/O of the affected
disk would be blocked while disk-level recovery (such as aborting the I/O
or a LUN reset) is attempted. Only if that fails would all I/O be blocked
and host-level recovery (such as a host reset) be performed.


Best regards,

chenxiang


Hannes Reinecke May 5, 2022, 4:19 p.m. UTC | #2
On 5/4/22 19:27, chenxiang (M) wrote:
> Hi Hannes and other guys,
>
> For SCSI EH, I have a question (sorry, it is not related to this
> patchset). In the current SCSI EH flow, if I/O to one disk fails while
> many disks sit under the same SCSI host, all I/O on that host is blocked.
>
> So during SCSI EH all I/O is blocked, even for disks that are healthy.
> Our product line sometimes complains about this, because one bad disk
> stalls the I/O workload of the healthy disks for the duration of SCSI EH.
>
> Is it possible to split SCSI EH into two parts, the process of
> recovering the disk and the process of recovering the SCSI host, at the
> beginning
>
If only it were that easy.
The biggest problem we face in SCSI EH is that basically _all_
instances I've seen where EH got engaged were due to a command timeout.

That means we've sent a command to the HBA and never heard from it
again. It would be easy if only the command had vanished, but the
problem is that we don't know what happened.
The command might still be in transit, the drive might be unresponsive,
or the HBA might have gone off the rails altogether.
So until we've established where the command got lost, we have to assume
the worst and _have_ to treat the HBA as unreliable.
So initially we can't just isolate the device and hope the failure is
restricted to it.
Instead we have to stop I/O to the HBA, re-establish communication
(typically by sending a TMF), and only restart operations once we get a
response back from the HBA.

This is especially true for old parallel SCSI HBAs, where quite a lot of
state is kept in the HBA structure itself. If we were to send another
command, we would lose the state of the failed command and wouldn't be
able to figure out the root cause of why it failed.
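
The reasoning above can be sketched as a small toy model: three different faults all look like the same timeout, so the recovery path has to quiesce the whole HBA before it can learn anything. This is an illustration under stated assumptions, not kernel code; `Fault`, `symptom`, and `recover` are hypothetical names, and the "escalate" step for a silent HBA is an assumption rather than something spelled out in this message.

```python
from enum import Enum
from typing import List


class Fault(Enum):
    COMMAND_IN_TRANSIT = "command still in transit"
    DRIVE_UNRESPONSIVE = "drive unresponsive"
    HBA_FAILED = "HBA off the rails"


def symptom(fault: Fault) -> str:
    # Whatever actually went wrong, all the initiator observes is a timeout.
    return "timeout"


def recover(fault: Fault) -> List[str]:
    """Return the recovery steps taken for a fault the caller cannot see."""
    # The cause is unknown, so assume the worst: quiesce the HBA first.
    steps = ["stop I/O to the HBA", "send TMF"]
    if fault is Fault.HBA_FAILED:
        # Assumption: with no answer to the TMF, recovery has to escalate.
        steps.append("no response from HBA: escalate")
    else:
        steps.append("HBA responded: restart operations")
    return steps
```

Because `symptom` returns the same value for every fault, nothing available before the TMF lets `recover` choose a narrower scope than the whole HBA.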

Cheers,

Hannes
chenxiang May 6, 2022, 3:24 a.m. UTC | #3
Hi Hannes,


Thanks for your detailed comments.

On 2022/5/6 0:19, Hannes Reinecke wrote:
> If it were so easy.
> The biggest problem we're facing in SCSI EH is that basically _all_ 
> instances I've seen where EH got engaged were due to a command timeout.

Right, currently it is always a command timeout that engages EH. The
worse situation is when some I/Os fail with a response while other I/Os
time out. When the first I/O with a response completes, it tries to enter
EH (it just marks the host as SHOST_RECOVERY), and new I/O starts being
blocked. Normally EH is only entered once all of those I/Os have completed
(timed out or failed), which may take almost 30s. So the blocking time in
this situation is the wait for EH (max 30s) plus EH itself (several
seconds to 10+ seconds).
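
The blocking-time estimate above can be written out as a small calculation. This is a sketch of the arithmetic only: `blocked_seconds` is a hypothetical helper, and the sample numbers are illustrative values taken from the ranges mentioned in this message (up to 30s of waiting, roughly 10s of EH).

```python
from typing import Sequence


def blocked_seconds(remaining_s: Sequence[float], eh_duration_s: float) -> float:
    """Time the host stays blocked after it is marked as in recovery.

    remaining_s: for each outstanding I/O, seconds until it completes,
                 fails, or times out. EH starts only after the last one.
    eh_duration_s: how long EH itself runs once it starts.
    """
    wait_for_eh = max(remaining_s, default=0.0)
    return wait_for_eh + eh_duration_s


# One I/O has already failed (0s left), another is about to hit a 30s timeout.
print(blocked_seconds([0.0, 12.5, 30.0], 10.0))  # 40.0
```

The slowest outstanding I/O determines the wait, which is why a single command that runs into its full timeout dominates the total.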

>
> Which means that we've sent a command to the HBA, and never heard from 
> it again. Now, it were easy if it would just be the command which has 
> vanished, but the problem is that we don't know what happened.
> It might be the command being in transit, the drive might be 
> unresponsive, or the HBA has gone off the rails altogether.
> So until we've established where the command got lost, we have to 
> assume the worst and _have_ to treat the HBA as unreliable.
> So initially we shouldn't isolate the device, and hope the failure is 
> restricted to the device.
> Instead we have to stop I/O to the HBA, establish communication 
> (typically by sending a TMF), and only restart operations once we get 
> a response back from the HBA.

OK, but what we see is that hard disks break far more easily than HBAs,
and error handling is usually triggered by one bad disk while the other
disks are healthy.
Current SCSI EH is per SCSI host (there is one EH thread for every SCSI
host). I think that if SCSI EH were per SCSI device (one EH thread for
every SCSI device), then when an I/O to one disk fails or times out, we
could just mark that disk as RECOVERY and trigger EH for the disk. Only
if the device-level recovery also fails would we trigger EH for the SCSI
host. Maybe that would alleviate the issue.
Even if something is wrong with the HBA, once an I/O to a disk fails or
times out, I/O to that disk is stopped immediately and separately, so I
think it may not make much difference.
(In the current SCSI EH, I think it is also the case that many I/Os are
still sent to a broken HBA if the previous I/Os all time out.)
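
The two-stage idea proposed above can be sketched as a toy escalation loop: try device-level steps while only that disk is blocked, and widen the scope to the host only if they all fail. Everything here is hypothetical (`two_stage_eh`, the step names, and the final `"offline"` outcome are assumptions for illustration); it is not how the current SCSI EH is structured.

```python
from typing import Callable, List, Tuple

# A recovery step is a name plus an action that reports success or failure.
Step = Tuple[str, Callable[[], bool]]


def two_stage_eh(device_steps: List[Step],
                 host_steps: List[Step]) -> Tuple[str, List[str]]:
    """Return (widest scope that had to be blocked, names of steps tried)."""
    tried: List[str] = []
    # Stage 1: only the affected disk is blocked.
    for name, action in device_steps:
        tried.append(name)
        if action():
            return "device", tried
    # Stage 2: device-level recovery failed, so block the whole host.
    for name, action in host_steps:
        tried.append(name)
        if action():
            return "host", tried
    # Assumption: if nothing helps, the device is given up on.
    return "offline", tried
```

For example, with `[("abort", lambda: False), ("lun reset", lambda: True)]` as device steps, the function returns scope `"device"` and the host-level steps are never run. The sketch does not address the objection raised earlier in the thread, namely that after a timeout the state of the HBA is unknown while stage 1 is running.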

