[RFC,v2,00/18] scsi: scsi_error: Introduce new error handle mechanism

Message ID	20230901094127.2010873-1-haowenchao2@huawei.com (mailing list archive)
Headers	show Return-Path: <linux-scsi-owner@vger.kernel.org> From: Wenchao Hao <haowenchao2@huawei.com> To: "James E . J . Bottomley" <jejb@linux.ibm.com>, "Martin K . Petersen" <martin.petersen@oracle.com>, <linux-scsi@vger.kernel.org> CC: Hannes Reinecke <hare@suse.de>, <linux-kernel@vger.kernel.org>, <louhongxiang@huawei.com>, <lixiaokeng@huawei.com>, Wenchao Hao <haowenchao2@huawei.com> Subject: [RFC PATCH v2 00/18] scsi: scsi_error: Introduce new error handle mechanism Date: Fri, 1 Sep 2023 17:41:08 +0800 Message-ID: <20230901094127.2010873-1-haowenchao2@huawei.com> MIME-Version: 1.0 Content-Transfer-Encoding: 7BIT Content-Type: text/plain; charset=US-ASCII Precedence: bulk
Series	scsi: scsi_error: Introduce new error handle mechanism \| expand [RFC,v2,00/18] scsi: scsi_error: Introduce new error handle mechanism [RFC,v2,01/19] scsi: scsi_error: Define framework for LUN/target based error handle [RFC,v2,02/19] scsi: scsi_error: Move complete variable eh_action from shost to sdevice [RFC,v2,03/19] scsi: scsi_error: Check if to do reset in scsi_try_xxx_reset [RFC,v2,04/19] scsi: scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT [RFC,v2,05/19] scsi: scsi_error: Add helper scsi_eh_sdev_reset to do lun reset [RFC,v2,06/19] scsi: scsi_error: Add flags to mark error handle steps has done [RFC,v2,07/19] scsi: scsi_error: Add helper to handle scsi device's error command list [RFC,v2,08/19] scsi: scsi_error: Add a general LUN based error handler [RFC,v2,09/19] scsi: core: increase/decrease target_busy without check can_queue [RFC,v2,10/19] scsi: scsi_error: Add helper to handle scsi target's error command list [RFC,v2,11/19] scsi: scsi_error: Add a general target based error handler [RFC,v2,12/19] scsi: scsi_debug: Add param to control LUN bassed error handler [RFC,v2,13/19] scsi: scsi_debug: Add param to control target based error handle [RFC,v2,14/19] scsi: mpt3sas: Add param to control LUN based error handle [RFC,v2,15/19] scsi: mpt3sas: Add param to control target based error handle [RFC,v2,16/19] scsi: smartpqi: Add param to control LUN based error handle [RFC,v2,17/19] scsi: megaraid_sas: Add param to control target based error handle [RFC,v2,18/19] scsi: virtio_scsi: Add param to control LUN based error handle [RFC,v2,19/19] scsi: iscsi_tcp: Add param to control LUN based error handle

Wenchao Hao Sept. 1, 2023, 9:41 a.m. UTC

It's unbearable for systems with large scale scsi devices share HBAs to
block all devices' IOs when handle error commands, we need a new error
handle mechanism to address this issue.

I consulted about this issue a year ago, the discuss link can be found in
refenence. Hannes replied about why we have to block the SCSI host
then perform error recovery kindly. I think it's unnecessary to block
SCSI host for all drivers and can try a small level recovery(LUN based for
example) first to avoid block the SCSI host.

The new error handle mechanism introduced in this patchset has been
developed and tested with out self developed hardware since one year
ago, now we want this mechanism can be used by more drivers.

Drivers can decide if using the new error handle mechanism and how to
handle error commands when scsi_device are scanned,the new mechanism
makes SCSI error handle more flexible.

SCSI error recovery strategy after blocking host's IO is mainly
following steps:

- LUN reset
- Target reset
- Bus reset
- Host reset

Some drivers did not implement callbacks for host reset, it's unnecessary
to block host's IO for these drivers. For example, smartpqi only registered
device reset, if device reset failed, it's meaningless to fallback to target
reset, bus reset or host reset any more, because these steps would also
failed.

Here are some drivers we concerned:(there are too many kinds of drivers
to figure out, so here I just list some drivers I am familiar with)

+-------------+--------------+--------------+-----------+------------+
|  drivers    | device_reset | target_reset | bus_reset | host_reset |
+-------------+--------------+--------------+-----------+------------+
| mpt3sas     |     Y        |     Y        |    N      |    Y       |
+-------------+--------------+--------------+-----------+------------+
| smartpqi    |     Y        |     N        |    N      |    N       |
+-------------+--------------+--------------+-----------+------------+
| megaraidsas |     N        |     Y        |    N      |    Y       |
+-------------+--------------+--------------+-----------+------------+
| virtioscsi  |     Y        |     N        |    N      |    N       |
+-------------+--------------+--------------+-----------+------------+
| iscsi_tcp   |     Y        |     Y        |    N      |    N       |
+-------------+--------------+--------------+-----------+------------+
| hisisas     |     Y        |     Y        |    N      |    N       |
+-------------+--------------+--------------+-----------+------------+

For LUN based error handle, when scsi command is classified as error,
we would block the scsi device's IO and try to recover this scsi
device, if still can not recover all error commands, it might
fallback to target or host level recovery.

It's same for target based error handle, but target based error handle
would block the scsi target's IO then try to recover the error commands
of this target.

The first patch defines basic framework to support LUN/target based error
handle mechanism, three key operations are abstracted which are:
 - add error command
 - wake up error handle
 - block IOs when error command is added and recoverying.

Drivers can implement these three function callbacks and setup to SCSI
middle level; I also add a general LUN/target based error handle strategy
which can be called directly from drivers to implement LUN/tartget based
error handle.

The changes of SCSI middle level's error handle are tested with scsi_debug
which support single LUN error injection, the scsi_debug patches can be
found in reference, following scenarios are tested.

Scenario1: LUN based error handle is enabled:
+-----------+---------+-------------------------------------------------------+
| lun reset | TUR     | Desired result                                        |
+ --------- + ------- + ------------------------------------------------------+
| success   | success | retry or finish with  EIO(may offline disk)           |
+ --------- + ------- + ------------------------------------------------------+
| success   | fail    | fallback to host  recovery, retry or finish with      |
|           |         | EIO(may offline disk)                                 |
+ --------- + ------- + ------------------------------------------------------+
| fail      | NA      | fallback to host  recovery, retry or finish with      |
|           |         | EIO(may offline disk)                                 |
+ --------- + ------- + ------------------------------------------------------+

Scenario2: target based error handle is enabled:
+-----------+---------+--------------+---------+------------------------------+
| lun reset | TUR     | target reset | TUR     | Desired result               |
+-----------+---------+--------------+---------+------------------------------+
| success   | success | NA           | NA      | retry or finish with         |
|           |         |              |         | EIO(may offline disk)        |
+-----------+---------+--------------+---------+------------------------------+
| success   | fail    | success      | success | retry or finish with         |
|           |         |              |         | EIO(may offline disk)        |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | success | retry or finish with         |
|           |         |              |         | EIO(may offline disk)        |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | fail    | fallback to host recovery,   |
|           |         |              |         | retry or finish with EIO(may |
|           |         |              |         | offline disk)                |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | fail         | NA      | fallback to host  recovery,  |
|           |         |              |         | retry or finish with EIO(may |
|           |         |              |         | offline disk)                |
+-----------+---------+--------------+---------+------------------------------+

Scenario3: both LUN and target based error handle are enabled:
+-----------+---------+--------------+---------+------------------------------+
| lun reset | TUR     | target reset | TUR     | Desired result               |
+-----------+---------+--------------+---------+------------------------------+
| success   | success | NA           | NA      | retry or finish with         |
|           |         |              |         | EIO(may offline disk)        |
+-----------+---------+--------------+---------+------------------------------+
| success   | fail    | success      | success | lun recovery fallback to     |
|           |         |              |         | target recovery, retry or    |
|           |         |              |         | finish with EIO(may offline  |
|           |         |              |         | disk                         |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | success | lun recovery fallback to     |
|           |         |              |         | target recovery, retry or    |
|           |         |              |         | finish with EIO(may offline  |
|           |         |              |         | disk                         |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | fail    | lun recovery fallback to     |
|           |         |              |         | target recovery, then fall   |
|           |         |              |         | back to host recovery, retry |
|           |         |              |         | or fhinsi with EIO(may       |
|           |         |              |         | offline disk)                |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | fail         | NA      | lun recovery fallback to     |
|           |         |              |         | target recovery, then fall   |
|           |         |              |         | back to host recovery, retry |
|           |         |              |         | or fhinsi with EIO(may       |
|           |         |              |         | offline disk)                |
+-----------+---------+--------------+---------+------------------------------+

References: https://lore.kernel.org/linux-scsi/20230815122316.4129333-1-haowenchao2@huawei.com/
References: https://lore.kernel.org/linux-scsi/71e09bb4-ff0a-23fe-38b4-fe6425670efa@huawei.com/

Wenchao Hao (19):
  scsi: scsi_error: Define framework for LUN/target based error handle
  scsi: scsi_error: Move complete variable eh_action from shost to sdevice
  scsi: scsi_error: Check if to do reset in scsi_try_xxx_reset
  scsi: scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT
  scsi: scsi_error: Add helper scsi_eh_sdev_reset to do lun reset
  scsi: scsi_error: Add flags to mark error handle steps has done
  scsi: scsi_error: Add helper to handle scsi device's error command list
  scsi: scsi_error: Add a general LUN based error handler
  scsi: core: increase/decrease target_busy without check can_queue
  scsi: scsi_error: Add helper to handle scsi target's error command list
  scsi: scsi_error: Add a general target based error handler
  scsi: scsi_debug: Add param to control LUN bassed error handler
  scsi: scsi_debug: Add param to control target based error handle
  scsi: mpt3sas: Add param to control LUN based error handle
  scsi: mpt3sas: Add param to control target based error handle
  scsi: smartpqi: Add param to control LUN based error handle
  scsi: megaraid_sas: Add param to control target based error handle
  scsi: virtio_scsi: Add param to control LUN based error handle
  scsi: iscsi_tcp: Add param to control LUN based error handle

 drivers/scsi/iscsi_tcp.c                  |  20 +
 drivers/scsi/megaraid/megaraid_sas_base.c |  20 +
 drivers/scsi/mpt3sas/mpt3sas_scsih.c      |  28 +
 drivers/scsi/scsi_debug.c                 |  24 +
 drivers/scsi/scsi_error.c                 | 756 ++++++++++++++++++++--
 drivers/scsi/scsi_lib.c                   |  23 +-
 drivers/scsi/scsi_priv.h                  |  18 +
 drivers/scsi/smartpqi/smartpqi_init.c     |  14 +
 drivers/scsi/virtio_scsi.c                |  16 +-
 include/scsi/scsi_device.h                |  97 +++
 include/scsi/scsi_eh.h                    |   8 +
 include/scsi/scsi_host.h                  |   2 -
 12 files changed, 963 insertions(+), 63 deletions(-)

Wenchao Hao Sept. 5, 2023, 1:32 a.m. UTC | #1

On 2023/9/1 17:41, Wenchao Hao wrote:
> It's unbearable for systems with large scale scsi devices share HBAs to
> block all devices' IOs when handle error commands, we need a new error
> handle mechanism to address this issue.
> 
> I consulted about this issue a year ago, the discuss link can be found in
> refenence. Hannes replied about why we have to block the SCSI host
> then perform error recovery kindly. I think it's unnecessary to block
> SCSI host for all drivers and can try a small level recovery(LUN based for
> example) first to avoid block the SCSI host.
> 
> The new error handle mechanism introduced in this patchset has been
> developed and tested with out self developed hardware since one year
> ago, now we want this mechanism can be used by more drivers.
> 
> Drivers can decide if using the new error handle mechanism and how to
> handle error commands when scsi_device are scanned,the new mechanism
> makes SCSI error handle more flexible.

Hi, Hannes and Mike, would you help take a look at these changes?

> 
> SCSI error recovery strategy after blocking host's IO is mainly
> following steps:
> 
> - LUN reset
> - Target reset
> - Bus reset
> - Host reset
> 
> Some drivers did not implement callbacks for host reset, it's unnecessary
> to block host's IO for these drivers. For example, smartpqi only registered
> device reset, if device reset failed, it's meaningless to fallback to target
> reset, bus reset or host reset any more, because these steps would also
> failed.
> 
> Here are some drivers we concerned:(there are too many kinds of drivers
> to figure out, so here I just list some drivers I am familiar with)
> 
> +-------------+--------------+--------------+-----------+------------+
> |  drivers    | device_reset | target_reset | bus_reset | host_reset |
> +-------------+--------------+--------------+-----------+------------+
> | mpt3sas     |     Y        |     Y        |    N      |    Y       |
> +-------------+--------------+--------------+-----------+------------+
> | smartpqi    |     Y        |     N        |    N      |    N       |
> +-------------+--------------+--------------+-----------+------------+
> | megaraidsas |     N        |     Y        |    N      |    Y       |
> +-------------+--------------+--------------+-----------+------------+
> | virtioscsi  |     Y        |     N        |    N      |    N       |
> +-------------+--------------+--------------+-----------+------------+
> | iscsi_tcp   |     Y        |     Y        |    N      |    N       |
> +-------------+--------------+--------------+-----------+------------+
> | hisisas     |     Y        |     Y        |    N      |    N       |
> +-------------+--------------+--------------+-----------+------------+
> 
> For LUN based error handle, when scsi command is classified as error,
> we would block the scsi device's IO and try to recover this scsi
> device, if still can not recover all error commands, it might
> fallback to target or host level recovery.
> 
> It's same for target based error handle, but target based error handle
> would block the scsi target's IO then try to recover the error commands
> of this target.
> 
> The first patch defines basic framework to support LUN/target based error
> handle mechanism, three key operations are abstracted which are:
>   - add error command
>   - wake up error handle
>   - block IOs when error command is added and recoverying.
> 
> Drivers can implement these three function callbacks and setup to SCSI
> middle level; I also add a general LUN/target based error handle strategy
> which can be called directly from drivers to implement LUN/tartget based
> error handle.
> 
> The changes of SCSI middle level's error handle are tested with scsi_debug
> which support single LUN error injection, the scsi_debug patches can be
> found in reference, following scenarios are tested.
> 
> Scenario1: LUN based error handle is enabled:
> +-----------+---------+-------------------------------------------------------+
> | lun reset | TUR     | Desired result                                        |
> + --------- + ------- + ------------------------------------------------------+
> | success   | success | retry or finish with  EIO(may offline disk)           |
> + --------- + ------- + ------------------------------------------------------+
> | success   | fail    | fallback to host  recovery, retry or finish with      |
> |           |         | EIO(may offline disk)                                 |
> + --------- + ------- + ------------------------------------------------------+
> | fail      | NA      | fallback to host  recovery, retry or finish with      |
> |           |         | EIO(may offline disk)                                 |
> + --------- + ------- + ------------------------------------------------------+
> 
> Scenario2: target based error handle is enabled:
> +-----------+---------+--------------+---------+------------------------------+
> | lun reset | TUR     | target reset | TUR     | Desired result               |
> +-----------+---------+--------------+---------+------------------------------+
> | success   | success | NA           | NA      | retry or finish with         |
> |           |         |              |         | EIO(may offline disk)        |
> +-----------+---------+--------------+---------+------------------------------+
> | success   | fail    | success      | success | retry or finish with         |
> |           |         |              |         | EIO(may offline disk)        |
> +-----------+---------+--------------+---------+------------------------------+
> | fail      | NA      | success      | success | retry or finish with         |
> |           |         |              |         | EIO(may offline disk)        |
> +-----------+---------+--------------+---------+------------------------------+
> | fail      | NA      | success      | fail    | fallback to host recovery,   |
> |           |         |              |         | retry or finish with EIO(may |
> |           |         |              |         | offline disk)                |
> +-----------+---------+--------------+---------+------------------------------+
> | fail      | NA      | fail         | NA      | fallback to host  recovery,  |
> |           |         |              |         | retry or finish with EIO(may |
> |           |         |              |         | offline disk)                |
> +-----------+---------+--------------+---------+------------------------------+
> 
> Scenario3: both LUN and target based error handle are enabled:
> +-----------+---------+--------------+---------+------------------------------+
> | lun reset | TUR     | target reset | TUR     | Desired result               |
> +-----------+---------+--------------+---------+------------------------------+
> | success   | success | NA           | NA      | retry or finish with         |
> |           |         |              |         | EIO(may offline disk)        |
> +-----------+---------+--------------+---------+------------------------------+
> | success   | fail    | success      | success | lun recovery fallback to     |
> |           |         |              |         | target recovery, retry or    |
> |           |         |              |         | finish with EIO(may offline  |
> |           |         |              |         | disk                         |
> +-----------+---------+--------------+---------+------------------------------+
> | fail      | NA      | success      | success | lun recovery fallback to     |
> |           |         |              |         | target recovery, retry or    |
> |           |         |              |         | finish with EIO(may offline  |
> |           |         |              |         | disk                         |
> +-----------+---------+--------------+---------+------------------------------+
> | fail      | NA      | success      | fail    | lun recovery fallback to     |
> |           |         |              |         | target recovery, then fall   |
> |           |         |              |         | back to host recovery, retry |
> |           |         |              |         | or fhinsi with EIO(may       |
> |           |         |              |         | offline disk)                |
> +-----------+---------+--------------+---------+------------------------------+
> | fail      | NA      | fail         | NA      | lun recovery fallback to     |
> |           |         |              |         | target recovery, then fall   |
> |           |         |              |         | back to host recovery, retry |
> |           |         |              |         | or fhinsi with EIO(may       |
> |           |         |              |         | offline disk)                |
> +-----------+---------+--------------+---------+------------------------------+
> 
> References: https://lore.kernel.org/linux-scsi/20230815122316.4129333-1-haowenchao2@huawei.com/
> References: https://lore.kernel.org/linux-scsi/71e09bb4-ff0a-23fe-38b4-fe6425670efa@huawei.com/
> 
> Wenchao Hao (19):
>    scsi: scsi_error: Define framework for LUN/target based error handle
>    scsi: scsi_error: Move complete variable eh_action from shost to sdevice
>    scsi: scsi_error: Check if to do reset in scsi_try_xxx_reset
>    scsi: scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT
>    scsi: scsi_error: Add helper scsi_eh_sdev_reset to do lun reset
>    scsi: scsi_error: Add flags to mark error handle steps has done
>    scsi: scsi_error: Add helper to handle scsi device's error command list
>    scsi: scsi_error: Add a general LUN based error handler
>    scsi: core: increase/decrease target_busy without check can_queue
>    scsi: scsi_error: Add helper to handle scsi target's error command list
>    scsi: scsi_error: Add a general target based error handler
>    scsi: scsi_debug: Add param to control LUN bassed error handler
>    scsi: scsi_debug: Add param to control target based error handle
>    scsi: mpt3sas: Add param to control LUN based error handle
>    scsi: mpt3sas: Add param to control target based error handle
>    scsi: smartpqi: Add param to control LUN based error handle
>    scsi: megaraid_sas: Add param to control target based error handle
>    scsi: virtio_scsi: Add param to control LUN based error handle
>    scsi: iscsi_tcp: Add param to control LUN based error handle
> 
>   drivers/scsi/iscsi_tcp.c                  |  20 +
>   drivers/scsi/megaraid/megaraid_sas_base.c |  20 +
>   drivers/scsi/mpt3sas/mpt3sas_scsih.c      |  28 +
>   drivers/scsi/scsi_debug.c                 |  24 +
>   drivers/scsi/scsi_error.c                 | 756 ++++++++++++++++++++--
>   drivers/scsi/scsi_lib.c                   |  23 +-
>   drivers/scsi/scsi_priv.h                  |  18 +
>   drivers/scsi/smartpqi/smartpqi_init.c     |  14 +
>   drivers/scsi/virtio_scsi.c                |  16 +-
>   include/scsi/scsi_device.h                |  97 +++
>   include/scsi/scsi_eh.h                    |   8 +
>   include/scsi/scsi_host.h                  |   2 -
>   12 files changed, 963 insertions(+), 63 deletions(-)
>

Mike Christie Sept. 6, 2023, 12:22 a.m. UTC | #2

On 9/1/23 4:41 AM, Wenchao Hao wrote:
> It's unbearable for systems with large scale scsi devices share HBAs to
> block all devices' IOs when handle error commands, we need a new error
> handle mechanism to address this issue.
> 
> I consulted about this issue a year ago, the discuss link can be found in
> refenence. Hannes replied about why we have to block the SCSI host
> then perform error recovery kindly. I think it's unnecessary to block
> SCSI host for all drivers and can try a small level recovery(LUN based for
> example) first to avoid block the SCSI host.
> 
> The new error handle mechanism introduced in this patchset has been
> developed and tested with out self developed hardware since one year
> ago, now we want this mechanism can be used by more drivers.
> 
> Drivers can decide if using the new error handle mechanism and how to
> handle error commands when scsi_device are scanned,the new mechanism
> makes SCSI error handle more flexible.
> 
> SCSI error recovery strategy after blocking host's IO is mainly
> following steps:
> 
> - LUN reset
> - Target reset
> - Bus reset
> - Host reset
> 
> Some drivers did not implement callbacks for host reset, it's unnecessary
> to block host's IO for these drivers. For example, smartpqi only registered
> device reset, if device reset failed, it's meaningless to fallback to target
> reset, bus reset or host reset any more, because these steps would also
> failed.
> 
> Here are some drivers we concerned:(there are too many kinds of drivers
> to figure out, so here I just list some drivers I am familiar with)
> 
> +-------------+--------------+--------------+-----------+------------+
> |  drivers    | device_reset | target_reset | bus_reset | host_reset |
> +-------------+--------------+--------------+-----------+------------+
> | mpt3sas     |     Y        |     Y        |    N      |    Y       |
> +-------------+--------------+--------------+-----------+------------+
> | smartpqi    |     Y        |     N        |    N      |    N       |
> +-------------+--------------+--------------+-----------+------------+
> | megaraidsas |     N        |     Y        |    N      |    Y       |
> +-------------+--------------+--------------+-----------+------------+
> | virtioscsi  |     Y        |     N        |    N      |    N       |
> +-------------+--------------+--------------+-----------+------------+
> | iscsi_tcp   |     Y        |     Y        |    N      |    N       |
> +-------------+--------------+--------------+-----------+------------+
> | hisisas     |     Y        |     Y        |    N      |    N       |
> +-------------+--------------+--------------+-----------+------------+
> 
> For LUN based error handle, when scsi command is classified as error,
> we would block the scsi device's IO and try to recover this scsi
> device, if still can not recover all error commands, it might
> fallback to target or host level recovery.
> 
> It's same for target based error handle, but target based error handle
> would block the scsi target's IO then try to recover the error commands
> of this target.
> 
> The first patch defines basic framework to support LUN/target based error
> handle mechanism, three key operations are abstracted which are:
>  - add error command
>  - wake up error handle
>  - block IOs when error command is added and recoverying.
> 
> Drivers can implement these three function callbacks and setup to SCSI
> middle level; I also add a general LUN/target based error handle strategy
> which can be called directly from drivers to implement LUN/tartget based
> error handle.
> 
> The changes of SCSI middle level's error handle are tested with scsi_debug
> which support single LUN error injection, the scsi_debug patches can be
> found in reference, following scenarios are tested.
> 
> Scenario1: LUN based error handle is enabled:
> +-----------+---------+-------------------------------------------------------+
> | lun reset | TUR     | Desired result                                        |
> + --------- + ------- + ------------------------------------------------------+
> | success   | success | retry or finish with  EIO(may offline disk)           |
> + --------- + ------- + ------------------------------------------------------+
> | success   | fail    | fallback to host  recovery, retry or finish with      |
> |           |         | EIO(may offline disk)                                 |


I didn't get this part about when we offline the disk now. For this LUN case, is
it if the driver has called scsi_device_setup_eh with the fallback arg with false?
If so why doesn't it just try target reset -> bus reset -> host reset like before?


From a high level I have the following questions/comments:

1. The modparam and drivers calling scsi_device_setup_eh/scsi_device_clear_eh
seem uneeded.

If the driver supports performing multiple TMFs/resets in parallel then why
not always enable it?

2. You probably need to work more closely for some of the drivers. For the iscsi
patch, we would want to allocate a tmf per device and per target, or if a cmd
timesout on a second lun, that will always fail since the iscsi host has only
one tmf preallocated and it would be in use by the first reset.

For the other drivers, did you review what they support? If so, how many drivers
can you just turn this on for? Or what drivers did you test so far?

Wenchao Hao Sept. 6, 2023, 11:15 a.m. UTC | #3

On 2023/9/6 8:22, Mike Christie wrote:
> On 9/1/23 4:41 AM, Wenchao Hao wrote:
>> It's unbearable for systems with large scale scsi devices share HBAs to
>> block all devices' IOs when handle error commands, we need a new error
>> handle mechanism to address this issue.
>>
>> I consulted about this issue a year ago, the discuss link can be found in
>> refenence. Hannes replied about why we have to block the SCSI host
>> then perform error recovery kindly. I think it's unnecessary to block
>> SCSI host for all drivers and can try a small level recovery(LUN based for
>> example) first to avoid block the SCSI host.
>>
>> The new error handle mechanism introduced in this patchset has been
>> developed and tested with out self developed hardware since one year
>> ago, now we want this mechanism can be used by more drivers.
>>
>> Drivers can decide if using the new error handle mechanism and how to
>> handle error commands when scsi_device are scanned,the new mechanism
>> makes SCSI error handle more flexible.
>>
>> SCSI error recovery strategy after blocking host's IO is mainly
>> following steps:
>>
>> - LUN reset
>> - Target reset
>> - Bus reset
>> - Host reset
>>
>> Some drivers did not implement callbacks for host reset, it's unnecessary
>> to block host's IO for these drivers. For example, smartpqi only registered
>> device reset, if device reset failed, it's meaningless to fallback to target
>> reset, bus reset or host reset any more, because these steps would also
>> failed.
>>
>> Here are some drivers we concerned:(there are too many kinds of drivers
>> to figure out, so here I just list some drivers I am familiar with)
>>
>> +-------------+--------------+--------------+-----------+------------+
>> |  drivers    | device_reset | target_reset | bus_reset | host_reset |
>> +-------------+--------------+--------------+-----------+------------+
>> | mpt3sas     |     Y        |     Y        |    N      |    Y       |
>> +-------------+--------------+--------------+-----------+------------+
>> | smartpqi    |     Y        |     N        |    N      |    N       |
>> +-------------+--------------+--------------+-----------+------------+
>> | megaraidsas |     N        |     Y        |    N      |    Y       |
>> +-------------+--------------+--------------+-----------+------------+
>> | virtioscsi  |     Y        |     N        |    N      |    N       |
>> +-------------+--------------+--------------+-----------+------------+
>> | iscsi_tcp   |     Y        |     Y        |    N      |    N       |
>> +-------------+--------------+--------------+-----------+------------+
>> | hisisas     |     Y        |     Y        |    N      |    N       |
>> +-------------+--------------+--------------+-----------+------------+
>>
>> For LUN based error handle, when scsi command is classified as error,
>> we would block the scsi device's IO and try to recover this scsi
>> device, if still can not recover all error commands, it might
>> fallback to target or host level recovery.
>>
>> It's same for target based error handle, but target based error handle
>> would block the scsi target's IO then try to recover the error commands
>> of this target.
>>
>> The first patch defines basic framework to support LUN/target based error
>> handle mechanism, three key operations are abstracted which are:
>>   - add error command
>>   - wake up error handle
>>   - block IOs when error command is added and recoverying.
>>
>> Drivers can implement these three function callbacks and setup to SCSI
>> middle level; I also add a general LUN/target based error handle strategy
>> which can be called directly from drivers to implement LUN/tartget based
>> error handle.
>>
>> The changes of SCSI middle level's error handle are tested with scsi_debug
>> which support single LUN error injection, the scsi_debug patches can be
>> found in reference, following scenarios are tested.
>>
>> Scenario1: LUN based error handle is enabled:
>> +-----------+---------+-------------------------------------------------------+
>> | lun reset | TUR     | Desired result                                        |
>> + --------- + ------- + ------------------------------------------------------+
>> | success   | success | retry or finish with  EIO(may offline disk)           |
>> + --------- + ------- + ------------------------------------------------------+
>> | success   | fail    | fallback to host  recovery, retry or finish with      |
>> |           |         | EIO(may offline disk)                                 |
> 
> 
> I didn't get this part about when we offline the disk now. For this LUN case, is
> it if the driver has called scsi_device_setup_eh with the fallback arg with false?
> If so why doesn't it just try target reset -> bus reset -> host reset like before?
> 
> 

There are two scenarios to offline disk when LUN reset succeed and TUR failed:

1. The driver has called scsi_device_setup_eh with the fallback arg with false.
    Drivers who did not implement eh_xxx_reset for target reset, bus reset and
    host reset should set fallback to false because these steps would failed,
    so it's meaningless to try target reset -> bus reset -> host reset any more.

2. The disk would also be offlined by sd_eh_action() defined in sd.c when error
    recovery succeed but command times out contiuously.

int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp) {
	...
	/*
	* If the device keeps failing read/write commands but TEST UNIT
	* READY always completes successfully we assume that medium
	* access is no longer possible and take the device offline.
	*/
	if (sdkp->medium_access_timed_out >= sdkp->max_medium_access_timeouts) {
			scmd_printk(KERN_ERR, scmd,
						"Medium access timeout failure. Offlining disk!\n");
			mutex_lock(&sdev->state_mutex);
			scsi_device_set_state(sdev, SDEV_OFFLINE);
			mutex_unlock(&sdev->state_mutex);

			return SUCCESS;
	}
	...
}

>>From a high level I have the following questions/comments:
> 
> 1. The modparam and drivers calling scsi_device_setup_eh/scsi_device_clear_eh
> seem uneeded.
> 
> If the driver supports performing multiple TMFs/resets in parallel then why
> not always enable it?
> 

Not all hardware/drivers support performing multiple TMFs/resets in parallel,
so I think it is necessary to call scsi_device_setup_eh/scsi_device_clear_eh
in specific drivers.
As to modparam, my original intention is let administrators determine whether
to enable LUN/target based error handle since it's a feature for host with
large scale of scsi devices.
If most users think the modparam is unnecessary, I would remove it.

> 2. You probably need to work more closely for some of the drivers. For the iscsi
> patch, we would want to allocate a tmf per device and per target, or if a cmd
> timesout on a second lun, that will always fail since the iscsi host has only
> one tmf preallocated and it would be in use by the first reset.
> 

Sorry I did not check iscsi's device/target reset in detail.

> For the other drivers, did you review what they support? If so, how many drivers
> can you just turn this on for? Or what drivers did you test so far?
> 
I reviewed the drivers changed in this patchset briefly, and it seems ok for
these drivers to turn this on. The tests are in progress but not finished now.
I would post the results in future.

My intention of posting this patchset is want people discuss about this new
error recovery mechanism. As to drivers, we would test drivers one by one, and
we are welcome others join us to analyze and test more drivers with these

Mike Christie Sept. 6, 2023, 3:56 p.m. UTC | #4

On 9/6/23 6:15 AM, haowenchao (C) wrote:
>>
>> If the driver supports performing multiple TMFs/resets in parallel then why
>> not always enable it?
>>
> 
> Not all hardware/drivers support performing multiple TMFs/resets in parallel,
> so I think it is necessary to call scsi_device_setup_eh/scsi_device_clear_eh
> in specific drivers.

Ah shoot sorry. I edited my email before I sent it and dropped part of it.

For the scsi_device_setup_eh/scsi_device_clear_eh comment I just meant it could
be a scsi_host_template field. scsi-ml would then see it and do the
scsi_device_setup_eh/scsi_device_clear_eh calls for the driver. The drivers
then don't have to deal with doing slave callouts and handling errors.

Also for the error handling case I think we want to still proceed if
scsi_device_setup_eh fails. Just use the old EH in that case.

Wenchao Hao Sept. 7, 2023, 12:38 p.m. UTC | #5

On 2023/9/6 23:56, Mike Christie wrote:
> On 9/6/23 6:15 AM, haowenchao (C) wrote:
>>>
>>> If the driver supports performing multiple TMFs/resets in parallel then why
>>> not always enable it?
>>>
>>
>> Not all hardware/drivers support performing multiple TMFs/resets in parallel,
>> so I think it is necessary to call scsi_device_setup_eh/scsi_device_clear_eh
>> in specific drivers.
> 
> Ah shoot sorry. I edited my email before I sent it and dropped part of it.
> 
> For the scsi_device_setup_eh/scsi_device_clear_eh comment I just meant it could
> be a scsi_host_template field. scsi-ml would then see it and do the
> scsi_device_setup_eh/scsi_device_clear_eh calls for the driver. The drivers
> then don't have to deal with doing slave callouts and handling errors.
> 
> Also for the error handling case I think we want to still proceed if
> scsi_device_setup_eh fails. Just use the old EH in that case.

It looks better to add setup_eh/clear_eh in scsi_host_template, I would update in
next version.

Wenchao Hao Sept. 14, 2023, 6:20 a.m. UTC | #6

On 2023/9/1 17:41, Wenchao Hao wrote:
> It's unbearable for systems with large scale scsi devices share HBAs to
> block all devices' IOs when handle error commands, we need a new error
> handle mechanism to address this issue.
> 
> I consulted about this issue a year ago, the discuss link can be found in
> refenence. Hannes replied about why we have to block the SCSI host
> then perform error recovery kindly. I think it's unnecessary to block
> SCSI host for all drivers and can try a small level recovery(LUN based for
> example) first to avoid block the SCSI host.
> 
> The new error handle mechanism introduced in this patchset has been
> developed and tested with out self developed hardware since one year
> ago, now we want this mechanism can be used by more drivers.
> 
> Drivers can decide if using the new error handle mechanism and how to
> handle error commands when scsi_device are scanned,the new mechanism
> makes SCSI error handle more flexible.
> 
> SCSI error recovery strategy after blocking host's IO is mainly
> following steps:
> 
> - LUN reset
> - Target reset
> - Bus reset
> - Host reset
> 

Mike gave some suggestions and I found a bug in fallback logic, I would
address these and resend in next few days.

> Some drivers did not implement callbacks for host reset, it's unnecessary
> to block host's IO for these drivers. For example, smartpqi only registered
> device reset, if device reset failed, it's meaningless to fallback to target
> reset, bus reset or host reset any more, because these steps would also
> failed.
> 
> Here are some drivers we concerned:(there are too many kinds of drivers
> to figure out, so here I just list some drivers I am familiar with)
> 
> +-------------+--------------+--------------+-----------+------------+
> |  drivers    | device_reset | target_reset | bus_reset | host_reset |
> +-------------+--------------+--------------+-----------+------------+
> | mpt3sas     |     Y        |     Y        |    N      |    Y       |
> +-------------+--------------+--------------+-----------+------------+
> | smartpqi    |     Y        |     N        |    N      |    N       |
> +-------------+--------------+--------------+-----------+------------+
> | megaraidsas |     N        |     Y        |    N      |    Y       |
> +-------------+--------------+--------------+-----------+------------+
> | virtioscsi  |     Y        |     N        |    N      |    N       |
> +-------------+--------------+--------------+-----------+------------+
> | iscsi_tcp   |     Y        |     Y        |    N      |    N       |
> +-------------+--------------+--------------+-----------+------------+
> | hisisas     |     Y        |     Y        |    N      |    N       |
> +-------------+--------------+--------------+-----------+------------+
> 
> For LUN based error handle, when scsi command is classified as error,
> we would block the scsi device's IO and try to recover this scsi
> device, if still can not recover all error commands, it might
> fallback to target or host level recovery.
> 
> It's same for target based error handle, but target based error handle
> would block the scsi target's IO then try to recover the error commands
> of this target.
> 
> The first patch defines basic framework to support LUN/target based error
> handle mechanism, three key operations are abstracted which are:
>   - add error command
>   - wake up error handle
>   - block IOs when error command is added and recoverying.
> 
> Drivers can implement these three function callbacks and setup to SCSI
> middle level; I also add a general LUN/target based error handle strategy
> which can be called directly from drivers to implement LUN/tartget based
> error handle.
> 
> The changes of SCSI middle level's error handle are tested with scsi_debug
> which support single LUN error injection, the scsi_debug patches can be
> found in reference, following scenarios are tested.
> 
> Scenario1: LUN based error handle is enabled:
> +-----------+---------+-------------------------------------------------------+
> | lun reset | TUR     | Desired result                                        |
> + --------- + ------- + ------------------------------------------------------+
> | success   | success | retry or finish with  EIO(may offline disk)           |
> + --------- + ------- + ------------------------------------------------------+
> | success   | fail    | fallback to host  recovery, retry or finish with      |
> |           |         | EIO(may offline disk)                                 |
> + --------- + ------- + ------------------------------------------------------+
> | fail      | NA      | fallback to host  recovery, retry or finish with      |
> |           |         | EIO(may offline disk)                                 |
> + --------- + ------- + ------------------------------------------------------+
> 
> Scenario2: target based error handle is enabled:
> +-----------+---------+--------------+---------+------------------------------+
> | lun reset | TUR     | target reset | TUR     | Desired result               |
> +-----------+---------+--------------+---------+------------------------------+
> | success   | success | NA           | NA      | retry or finish with         |
> |           |         |              |         | EIO(may offline disk)        |
> +-----------+---------+--------------+---------+------------------------------+
> | success   | fail    | success      | success | retry or finish with         |
> |           |         |              |         | EIO(may offline disk)        |
> +-----------+---------+--------------+---------+------------------------------+
> | fail      | NA      | success      | success | retry or finish with         |
> |           |         |              |         | EIO(may offline disk)        |
> +-----------+---------+--------------+---------+------------------------------+
> | fail      | NA      | success      | fail    | fallback to host recovery,   |
> |           |         |              |         | retry or finish with EIO(may |
> |           |         |              |         | offline disk)                |
> +-----------+---------+--------------+---------+------------------------------+
> | fail      | NA      | fail         | NA      | fallback to host  recovery,  |
> |           |         |              |         | retry or finish with EIO(may |
> |           |         |              |         | offline disk)                |
> +-----------+---------+--------------+---------+------------------------------+
> 
> Scenario3: both LUN and target based error handle are enabled:
> +-----------+---------+--------------+---------+------------------------------+
> | lun reset | TUR     | target reset | TUR     | Desired result               |
> +-----------+---------+--------------+---------+------------------------------+
> | success   | success | NA           | NA      | retry or finish with         |
> |           |         |              |         | EIO(may offline disk)        |
> +-----------+---------+--------------+---------+------------------------------+
> | success   | fail    | success      | success | lun recovery fallback to     |
> |           |         |              |         | target recovery, retry or    |
> |           |         |              |         | finish with EIO(may offline  |
> |           |         |              |         | disk                         |
> +-----------+---------+--------------+---------+------------------------------+
> | fail      | NA      | success      | success | lun recovery fallback to     |
> |           |         |              |         | target recovery, retry or    |
> |           |         |              |         | finish with EIO(may offline  |
> |           |         |              |         | disk                         |
> +-----------+---------+--------------+---------+------------------------------+
> | fail      | NA      | success      | fail    | lun recovery fallback to     |
> |           |         |              |         | target recovery, then fall   |
> |           |         |              |         | back to host recovery, retry |
> |           |         |              |         | or fhinsi with EIO(may       |
> |           |         |              |         | offline disk)                |
> +-----------+---------+--------------+---------+------------------------------+
> | fail      | NA      | fail         | NA      | lun recovery fallback to     |
> |           |         |              |         | target recovery, then fall   |
> |           |         |              |         | back to host recovery, retry |
> |           |         |              |         | or fhinsi with EIO(may       |
> |           |         |              |         | offline disk)                |
> +-----------+---------+--------------+---------+------------------------------+
> 
> References: https://lore.kernel.org/linux-scsi/20230815122316.4129333-1-haowenchao2@huawei.com/
> References: https://lore.kernel.org/linux-scsi/71e09bb4-ff0a-23fe-38b4-fe6425670efa@huawei.com/
> 
> Wenchao Hao (19):
>    scsi: scsi_error: Define framework for LUN/target based error handle
>    scsi: scsi_error: Move complete variable eh_action from shost to sdevice
>    scsi: scsi_error: Check if to do reset in scsi_try_xxx_reset
>    scsi: scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT
>    scsi: scsi_error: Add helper scsi_eh_sdev_reset to do lun reset
>    scsi: scsi_error: Add flags to mark error handle steps has done
>    scsi: scsi_error: Add helper to handle scsi device's error command list
>    scsi: scsi_error: Add a general LUN based error handler
>    scsi: core: increase/decrease target_busy without check can_queue
>    scsi: scsi_error: Add helper to handle scsi target's error command list
>    scsi: scsi_error: Add a general target based error handler
>    scsi: scsi_debug: Add param to control LUN bassed error handler
>    scsi: scsi_debug: Add param to control target based error handle
>    scsi: mpt3sas: Add param to control LUN based error handle
>    scsi: mpt3sas: Add param to control target based error handle
>    scsi: smartpqi: Add param to control LUN based error handle
>    scsi: megaraid_sas: Add param to control target based error handle
>    scsi: virtio_scsi: Add param to control LUN based error handle
>    scsi: iscsi_tcp: Add param to control LUN based error handle
> 
>   drivers/scsi/iscsi_tcp.c                  |  20 +
>   drivers/scsi/megaraid/megaraid_sas_base.c |  20 +
>   drivers/scsi/mpt3sas/mpt3sas_scsih.c      |  28 +
>   drivers/scsi/scsi_debug.c                 |  24 +
>   drivers/scsi/scsi_error.c                 | 756 ++++++++++++++++++++--
>   drivers/scsi/scsi_lib.c                   |  23 +-
>   drivers/scsi/scsi_priv.h                  |  18 +
>   drivers/scsi/smartpqi/smartpqi_init.c     |  14 +
>   drivers/scsi/virtio_scsi.c                |  16 +-
>   include/scsi/scsi_device.h                |  97 +++
>   include/scsi/scsi_eh.h                    |   8 +
>   include/scsi/scsi_host.h                  |   2 -
>   12 files changed, 963 insertions(+), 63 deletions(-)
>

Christoph Hellwig Sept. 25, 2023, 2:55 p.m. UTC | #7

Before we add another new error handling mechanism we need to fix the
old one first.  Hannes' work on not passing the scsi_cmnd to the various
reset handlers hasn't made a lot of progress in the last five years and
we'll need to urgently fix that first before adding even more
complexity.

Wenchao Hao Sept. 25, 2023, 3:07 p.m. UTC | #8

On 2023/9/25 22:55, Christoph Hellwig wrote:
> Before we add another new error handling mechanism we need to fix the
> old one first.  Hannes' work on not passing the scsi_cmnd to the various
> reset handlers hasn't made a lot of progress in the last five years and
> we'll need to urgently fix that first before adding even more
> complexity.
> 
I observed Hannes's patches posted about one year ago, it has not been
applied yet. I don't know if he is still working on it.

My patches do not depend much on that work, I think the conflict can be
solved fast between two changes.

Mike Christie Sept. 25, 2023, 4:52 p.m. UTC | #9

On 9/14/23 1:20 AM, Wenchao Hao wrote:
> On 2023/9/1 17:41, Wenchao Hao wrote:
>> It's unbearable for systems with large scale scsi devices share HBAs to
>> block all devices' IOs when handle error commands, we need a new error
>> handle mechanism to address this issue.
>>
>> I consulted about this issue a year ago, the discuss link can be found in
>> refenence. Hannes replied about why we have to block the SCSI host
>> then perform error recovery kindly. I think it's unnecessary to block
>> SCSI host for all drivers and can try a small level recovery(LUN based for
>> example) first to avoid block the SCSI host.
>>
>> The new error handle mechanism introduced in this patchset has been
>> developed and tested with out self developed hardware since one year
>> ago, now we want this mechanism can be used by more drivers.
>>
>> Drivers can decide if using the new error handle mechanism and how to
>> handle error commands when scsi_device are scanned,the new mechanism
>> makes SCSI error handle more flexible.
>>
>> SCSI error recovery strategy after blocking host's IO is mainly
>> following steps:
>>
>> - LUN reset
>> - Target reset
>> - Bus reset
>> - Host reset
>>
> 
> Mike gave some suggestions and I found a bug in fallback logic, I would
> address these and resend in next few days.

Please wait to resend. I'm still reviewing the patches. When I commented
last time I just did a quick look over to get an idea for the design and
what your goals were.

Mike Christie Sept. 25, 2023, 5:54 p.m. UTC | #10

On 9/25/23 10:07 AM, Wenchao Hao wrote:
> On 2023/9/25 22:55, Christoph Hellwig wrote:
>> Before we add another new error handling mechanism we need to fix the
>> old one first.  Hannes' work on not passing the scsi_cmnd to the various
>> reset handlers hasn't made a lot of progress in the last five years and
>> we'll need to urgently fix that first before adding even more
>> complexity.
>>
> I observed Hannes's patches posted about one year ago, it has not been
> applied yet. I don't know if he is still working on it.
> 
> My patches do not depend much on that work, I think the conflict can be
> solved fast between two changes.

I think we want to figure out Hannes's patches first.

For a new EH design we will want to be able to do multiple TMFs in parallel
on the same host/target right? 

The problem is that we need to be able to make forward progress in the EH
path and not fail just because we can't allocate memory for a TMF related
struct. To accomplish this now, drivers will use mempools, preallocate TMF
related structs/mem/tags with their scsi_cmnd related structs, preallocate
per host/target/device related structs or ignore what I wrote above and just
fail.

Hannes's patches fix up the eh callouts so they don't pass in a scsi_cmnd
when it's not needed. That seems nice because after that, then for your new
EH we can begin to standardize on how to handle preallocation of drivers
resources needed to perform TMFs for your new EH. It could be a per
device/target/host callout to allow drivers to preallocate, then scsi-ml calls
into the drivers with that data. It doesn't have to be exactly like that or
anything close. It would be nice for drivers to not have to think about this
type of thing and scsi-ml just to handle the resource management for us when
there are multiple TMFs in progress.

Christoph Hellwig Sept. 26, 2023, 7:26 a.m. UTC | #11

On Mon, Sep 25, 2023 at 12:54:48PM -0500, Mike Christie wrote:
> I think we want to figure out Hannes's patches first.

Yes.

> For a new EH design we will want to be able to do multiple TMFs in parallel
> on the same host/target right? 
> 
> The problem is that we need to be able to make forward progress in the EH
> path and not fail just because we can't allocate memory for a TMF related
> struct. To accomplish this now, drivers will use mempools, preallocate TMF
> related structs/mem/tags with their scsi_cmnd related structs, preallocate
> per host/target/device related structs or ignore what I wrote above and just
> fail.
> 
> Hannes's patches fix up the eh callouts so they don't pass in a scsi_cmnd
> when it's not needed. That seems nice because after that, then for your new
> EH we can begin to standardize on how to handle preallocation of drivers
> resources needed to perform TMFs for your new EH. It could be a per
> device/target/host callout to allow drivers to preallocate, then scsi-ml calls
> into the drivers with that data. It doesn't have to be exactly like that or
> anything close. It would be nice for drivers to not have to think about this
> type of thing and scsi-ml just to handle the resource management for us when
> there are multiple TMFs in progress.

Exactly!

Wenchao Hao Sept. 26, 2023, 12:57 p.m. UTC | #12

On 2023/9/26 1:54, Mike Christie wrote:
> On 9/25/23 10:07 AM, Wenchao Hao wrote:
>> On 2023/9/25 22:55, Christoph Hellwig wrote:
>>> Before we add another new error handling mechanism we need to fix the
>>> old one first.  Hannes' work on not passing the scsi_cmnd to the various
>>> reset handlers hasn't made a lot of progress in the last five years and
>>> we'll need to urgently fix that first before adding even more
>>> complexity.
>>>
>> I observed Hannes's patches posted about one year ago, it has not been
>> applied yet. I don't know if he is still working on it.
>>
>> My patches do not depend much on that work, I think the conflict can be
>> solved fast between two changes.
> 
> I think we want to figure out Hannes's patches first.
> 
> For a new EH design we will want to be able to do multiple TMFs in parallel
> on the same host/target right?
> 

It's not necessary to do multiple TMFs in parallel, it's ok to make sure
each TMFs do not affect each other.

For example, we have two devices: 0:0:0:0 and 0:0:0:1

Both of them request device reset, they do not happened in parallel, but
would in serial. If 0:0:0:0 is performing device reset in progress, 0:0:0:1
just wait 0:0:0:0 to finish.

> The problem is that we need to be able to make forward progress in the EH
> path and not fail just because we can't allocate memory for a TMF related
> struct. To accomplish this now, drivers will use mempools, preallocate TMF
> related structs/mem/tags with their scsi_cmnd related structs, preallocate
> per host/target/device related structs or ignore what I wrote above and just
> fail.
> 
> Hannes's patches fix up the eh callouts so they don't pass in a scsi_cmnd
> when it's not needed. That seems nice because after that, then for your new
> EH we can begin to standardize on how to handle preallocation of drivers
> resources needed to perform TMFs for your new EH. It could be a per
> device/target/host callout to allow drivers to preallocate, then scsi-ml calls
> into the drivers with that data. It doesn't have to be exactly like that or
> anything close. It would be nice for drivers to not have to think about this
> type of thing and scsi-ml just to handle the resource management for us when
> there are multiple TMFs in progress.
>

Mike Christie Sept. 26, 2023, 5:37 p.m. UTC | #13

On 9/26/23 7:57 AM, Wenchao Hao wrote:
> On 2023/9/26 1:54, Mike Christie wrote:
>> On 9/25/23 10:07 AM, Wenchao Hao wrote:
>>> On 2023/9/25 22:55, Christoph Hellwig wrote:
>>>> Before we add another new error handling mechanism we need to fix the
>>>> old one first.  Hannes' work on not passing the scsi_cmnd to the various
>>>> reset handlers hasn't made a lot of progress in the last five years and
>>>> we'll need to urgently fix that first before adding even more
>>>> complexity.
>>>>
>>> I observed Hannes's patches posted about one year ago, it has not been
>>> applied yet. I don't know if he is still working on it.
>>>
>>> My patches do not depend much on that work, I think the conflict can be
>>> solved fast between two changes.
>>
>> I think we want to figure out Hannes's patches first.
>>
>> For a new EH design we will want to be able to do multiple TMFs in parallel
>> on the same host/target right?
>>
> 
> It's not necessary to do multiple TMFs in parallel, it's ok to make sure
> each TMFs do not affect each other.
> 
> For example, we have two devices: 0:0:0:0 and 0:0:0:1
> 
> Both of them request device reset, they do not happened in parallel, but
> would in serial. If 0:0:0:0 is performing device reset in progress, 0:0:0:1
> just wait 0:0:0:0 to finish.

I see. I guess we still get the benefit of not having to stop other devices
when doing TMFs.

I think we still want a common way to allocate/free and manage resources
drivers will use during this time. Maybe have a init_device/target/cmd/eh_priv
and exit_device/target/eh_priv (I'm not sure of the name, but something similar
to the init_cmd_priv/exit_cmd_priv we have for normal commands.

scsi-ml then calls into the new eh with the priv data. Drivers don't have to
do the preallocation and worry if it's per device/target/host.

I'm not 100% sure about the low level details. Check out how Hannes's is
handling tag management for TMFs as well.


> 
>> The problem is that we need to be able to make forward progress in the EH
>> path and not fail just because we can't allocate memory for a TMF related
>> struct. To accomplish this now, drivers will use mempools, preallocate TMF
>> related structs/mem/tags with their scsi_cmnd related structs, preallocate
>> per host/target/device related structs or ignore what I wrote above and just
>> fail.
>>
>> Hannes's patches fix up the eh callouts so they don't pass in a scsi_cmnd
>> when it's not needed. That seems nice because after that, then for your new
>> EH we can begin to standardize on how to handle preallocation of drivers
>> resources needed to perform TMFs for your new EH. It could be a per
>> device/target/host callout to allow drivers to preallocate, then scsi-ml calls
>> into the drivers with that data. It doesn't have to be exactly like that or
>> anything close. It would be nice for drivers to not have to think about this
>> type of thing and scsi-ml just to handle the resource management for us when
>> there are multiple TMFs in progress.
>>
>

Hannes Reinecke Sept. 27, 2023, 6:26 a.m. UTC | #14

On 9/26/23 09:26, Christoph Hellwig wrote:
> On Mon, Sep 25, 2023 at 12:54:48PM -0500, Mike Christie wrote:
>> I think we want to figure out Hannes's patches first.
> 
> Yes.
> 
>> For a new EH design we will want to be able to do multiple TMFs in parallel
>> on the same host/target right?
>>
>> The problem is that we need to be able to make forward progress in the EH
>> path and not fail just because we can't allocate memory for a TMF related
>> struct. To accomplish this now, drivers will use mempools, preallocate TMF
>> related structs/mem/tags with their scsi_cmnd related structs, preallocate
>> per host/target/device related structs or ignore what I wrote above and just
>> fail.
>>
>> Hannes's patches fix up the eh callouts so they don't pass in a scsi_cmnd
>> when it's not needed. That seems nice because after that, then for your new
>> EH we can begin to standardize on how to handle preallocation of drivers
>> resources needed to perform TMFs for your new EH. It could be a per
>> device/target/host callout to allow drivers to preallocate, then scsi-ml calls
>> into the drivers with that data. It doesn't have to be exactly like that or
>> anything close. It would be nice for drivers to not have to think about this
>> type of thing and scsi-ml just to handle the resource management for us when
>> there are multiple TMFs in progress.
> 
> Exactly!

Yeah, thanks for the vote of support.
Last time I tried the attempt got shot down, as it had been using the 
'wrong' interface. But seeing that there's renewed interest I'll be 
reposting them.

Cheers,

Hannes

Hannes Reinecke Sept. 27, 2023, 7:59 a.m. UTC | #15

On 9/26/23 14:57, Wenchao Hao wrote:
> On 2023/9/26 1:54, Mike Christie wrote:
>> On 9/25/23 10:07 AM, Wenchao Hao wrote:
>>> On 2023/9/25 22:55, Christoph Hellwig wrote:
>>>> Before we add another new error handling mechanism we need to fix the
>>>> old one first.  Hannes' work on not passing the scsi_cmnd to the 
>>>> various
>>>> reset handlers hasn't made a lot of progress in the last five years and
>>>> we'll need to urgently fix that first before adding even more
>>>> complexity.
>>>>
>>> I observed Hannes's patches posted about one year ago, it has not been
>>> applied yet. I don't know if he is still working on it.
>>>
>>> My patches do not depend much on that work, I think the conflict can be
>>> solved fast between two changes.
>>
>> I think we want to figure out Hannes's patches first.
>>
>> For a new EH design we will want to be able to do multiple TMFs in 
>> parallel
>> on the same host/target right?
>>
> 
> It's not necessary to do multiple TMFs in parallel, it's ok to make sure
> each TMFs do not affect each other.
> 
> For example, we have two devices: 0:0:0:0 and 0:0:0:1
> 
> Both of them request device reset, they do not happened in parallel, but
> would in serial. If 0:0:0:0 is performing device reset in progress, 0:0:0:1
> just wait 0:0:0:0 to finish.
> 
Well, not quite. Any higher-order TMFs are serialized by virtue of 
SCSI-EH, but command aborts (which also devolve down to TMFs on certain 
drivers) do run in parallel, and there we will be requiring multiple TMFs.

Cheers,

Hannes

Wenchao Hao Sept. 27, 2023, 9:39 a.m. UTC | #16

On 2023/9/27 1:37, Mike Christie wrote:
> On 9/26/23 7:57 AM, Wenchao Hao wrote:
>> On 2023/9/26 1:54, Mike Christie wrote:
>>> On 9/25/23 10:07 AM, Wenchao Hao wrote:
>>>> On 2023/9/25 22:55, Christoph Hellwig wrote:
>>>>> Before we add another new error handling mechanism we need to fix the
>>>>> old one first.  Hannes' work on not passing the scsi_cmnd to the various
>>>>> reset handlers hasn't made a lot of progress in the last five years and
>>>>> we'll need to urgently fix that first before adding even more
>>>>> complexity.
>>>>>
>>>> I observed Hannes's patches posted about one year ago, it has not been
>>>> applied yet. I don't know if he is still working on it.
>>>>
>>>> My patches do not depend much on that work, I think the conflict can be
>>>> solved fast between two changes.
>>>
>>> I think we want to figure out Hannes's patches first.
>>>
>>> For a new EH design we will want to be able to do multiple TMFs in parallel
>>> on the same host/target right?
>>>
>>
>> It's not necessary to do multiple TMFs in parallel, it's ok to make sure
>> each TMFs do not affect each other.
>>
>> For example, we have two devices: 0:0:0:0 and 0:0:0:1
>>
>> Both of them request device reset, they do not happened in parallel, but
>> would in serial. If 0:0:0:0 is performing device reset in progress, 0:0:0:1
>> just wait 0:0:0:0 to finish.
> 
> I see. I guess we still get the benefit of not having to stop other devices
> when doing TMFs.
> 

Yes, it's better to support multiple TMFs in parallel than just run in serial.
I would wait for Hannes's changes to be applied and send my change again.

> I think we still want a common way to allocate/free and manage resources
> drivers will use during this time. Maybe have a init_device/target/cmd/eh_priv
> and exit_device/target/eh_priv (I'm not sure of the name, but something similar
> to the init_cmd_priv/exit_cmd_priv we have for normal commands.
> 
> scsi-ml then calls into the new eh with the priv data. Drivers don't have to
> do the preallocation and worry if it's per device/target/host.
> 
> I'm not 100% sure about the low level details. Check out how Hannes's is
> handling tag management for TMFs as well.
> 
> 
>>
>>> The problem is that we need to be able to make forward progress in the EH
>>> path and not fail just because we can't allocate memory for a TMF related
>>> struct. To accomplish this now, drivers will use mempools, preallocate TMF
>>> related structs/mem/tags with their scsi_cmnd related structs, preallocate
>>> per host/target/device related structs or ignore what I wrote above and just
>>> fail.
>>>
>>> Hannes's patches fix up the eh callouts so they don't pass in a scsi_cmnd
>>> when it's not needed. That seems nice because after that, then for your new
>>> EH we can begin to standardize on how to handle preallocation of drivers
>>> resources needed to perform TMFs for your new EH. It could be a per
>>> device/target/host callout to allow drivers to preallocate, then scsi-ml calls
>>> into the drivers with that data. It doesn't have to be exactly like that or
>>> anything close. It would be nice for drivers to not have to think about this
>>> type of thing and scsi-ml just to handle the resource management for us when
>>> there are multiple TMFs in progress.
>>>
>>
>

Wenchao Hao Sept. 27, 2023, 9:41 a.m. UTC | #17

On 2023/9/27 15:59, Hannes Reinecke wrote:
> On 9/26/23 14:57, Wenchao Hao wrote:
>> On 2023/9/26 1:54, Mike Christie wrote:
>>> On 9/25/23 10:07 AM, Wenchao Hao wrote:
>>>> On 2023/9/25 22:55, Christoph Hellwig wrote:
>>>>> Before we add another new error handling mechanism we need to fix the
>>>>> old one first.  Hannes' work on not passing the scsi_cmnd to the various
>>>>> reset handlers hasn't made a lot of progress in the last five years and
>>>>> we'll need to urgently fix that first before adding even more
>>>>> complexity.
>>>>>
>>>> I observed Hannes's patches posted about one year ago, it has not been
>>>> applied yet. I don't know if he is still working on it.
>>>>
>>>> My patches do not depend much on that work, I think the conflict can be
>>>> solved fast between two changes.
>>>
>>> I think we want to figure out Hannes's patches first.
>>>
>>> For a new EH design we will want to be able to do multiple TMFs in parallel
>>> on the same host/target right?
>>>
>>
>> It's not necessary to do multiple TMFs in parallel, it's ok to make sure
>> each TMFs do not affect each other.
>>
>> For example, we have two devices: 0:0:0:0 and 0:0:0:1
>>
>> Both of them request device reset, they do not happened in parallel, but
>> would in serial. If 0:0:0:0 is performing device reset in progress, 0:0:0:1
>> just wait 0:0:0:0 to finish.
>>
> Well, not quite. Any higher-order TMFs are serialized by virtue of SCSI-EH, but command aborts (which also devolve down to TMFs on certain drivers) do run in parallel, and there we will be requiring multiple TMFs.
> 

It's best that multiple  TMFs can run in parallel, again, looking forwarding
to your changes.

> Cheers,
> 
> Hannes

[RFC,v2,00/18] scsi: scsi_error: Introduce new error handle mechanism

Message

Comments