[v4,0/7] libsas and drivers: NCQ error handling

Message ID: 1663840018-50161-1-git-send-email-john.garry@huawei.com

Message

John Garry Sept. 22, 2022, 9:46 a.m. UTC
As reported in [0], the pm8001 driver NCQ error handling more or less
duplicates what libata does in link error handling, as follows:
- abort all commands
- do autopsy with read log ext 10 command
- reset the target to recover, if necessary
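
The three steps above can be sketched as a small userspace model. This is
only an illustration: every mock_ type and function here is hypothetical and
is not libata, libsas or pm8001 code. The "log" is reduced to a single failed
tag, and "recovery" to a flag saying whether the device still needs a reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_TAGS 32

/* Hypothetical stand-in for a SATA device with outstanding NCQ commands. */
struct mock_dev {
	bool active[MOCK_MAX_TAGS];   /* commands in flight, indexed by tag */
	bool aborted[MOCK_MAX_TAGS];  /* commands aborted by error handling */
	int  log_failed_tag;          /* what the mock "read log ext 10" reports */
	bool needs_reset;             /* device stays in error state until reset */
	int  resets;                  /* number of resets issued so far */
};

/* Step 1: abort every outstanding command. Returns how many were aborted. */
static int mock_abort_all(struct mock_dev *dev)
{
	int n = 0;

	for (int tag = 0; tag < MOCK_MAX_TAGS; tag++) {
		if (dev->active[tag]) {
			dev->active[tag] = false;
			dev->aborted[tag] = true;
			n++;
		}
	}
	return n;
}

/* Step 2: autopsy. The mock log simply names the tag that failed. */
static int mock_autopsy(const struct mock_dev *dev)
{
	return dev->log_failed_tag;
}

/* Step 3: reset the target only if it has not recovered by itself. */
static void mock_recover(struct mock_dev *dev)
{
	if (dev->needs_reset) {
		dev->resets++;
		dev->needs_reset = false;
	}
}

/* Runs the three steps in order and returns the tag blamed for the error. */
static int mock_handle_ncq_error(struct mock_dev *dev)
{
	int failed_tag;

	assert(dev != NULL);
	mock_abort_all(dev);
	failed_tag = mock_autopsy(dev);
	mock_recover(dev);
	return failed_tag;
}
```

The point of the model is only the ordering: nothing is inspected until all
commands are aborted, and the reset is conditional.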

Indeed for the hisi_sas driver we want to add similar handling for NCQ
errors.

This series adds a new libsas API - sas_ata_device_link_abort() - to handle
host NCQ errors, and fixes up the pm8001 and hisi_sas drivers to use it.
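
As a rough picture of the helper's shape, going only by the changelog in this
letter (the EH reset flag is set in the helper, and the reset is optional),
here is a userspace sketch. The struct, the flag value and the function name
are all hypothetical; the real signature and body are in patch 1/7 and are not
reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value; not the libata definition. */
#define MOCK_EH_RESET (1u << 0)  /* ask error handling to reset the device */

/* Hypothetical stand-in for the per-link error handling state. */
struct mock_eh_info {
	unsigned int action;   /* actions requested from error handling */
	bool link_aborted;     /* set once the link's commands are aborted */
};

/*
 * Mock of a "device link abort" helper: optionally request a reset, then
 * abort the link so that error handling takes over.
 */
static void mock_device_link_abort(struct mock_eh_info *ehi, bool force_reset)
{
	assert(ehi != NULL);
	if (force_reset)
		ehi->action |= MOCK_EH_RESET;
	ehi->link_aborted = true;
}
```

A driver that knows the device needs a reset would pass true for the second
argument; otherwise the decision is left to error handling.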

A difference in the pm8001 driver NCQ error handling is that we send
SATA_ABORT per-task prior to read log ext 10, but I feel that this should
not make a difference to the error handling.

Damien kindly tested the previous series for pm8001, but any further pm8001
testing would be appreciated as I have since tweaked pm8001 handling again.
I ask because the pm8001 driver hangs on my arm64 machine on the read log
ext 10 command.

Finally, with these changes we can make the libsas task alloc/free APIs
private, which they should always have been.

Based on mkp-scsi @ 6.1/scsi-staging 7f615c1b5986 ("scsi:
scsi_transport_fc: Use %u for dev_loss_tmo")

[0] https://lore.kernel.org/linux-scsi/8fb3b093-55f0-1fab-81f4-e8519810a978@huawei.com/

Changes since v3:
- Add Damien's tags (thanks)
- Modify hisi_sas processing as follows:
  - use sas_task_abort() for rejected IO
  - Modify abort task processing to issue softreset in certain circumstances
- rebase

Changes since v2:
- Stop sending SATA_ABORT all for pm8001 handling
- Make "reset" optional in sas_ata_device_link_abort()
- Drop Jack's ACK

Changes since v1:
- Rename sas_ata_link_abort() -> sas_ata_device_link_abort()
- Set EH RESET flag in sas_ata_device_link_abort()
- Add Jack's Ack tags
- Rebase

John Garry (5):
  scsi: libsas: Add sas_ata_device_link_abort()
  scsi: hisi_sas: Move slot variable definition in hisi_sas_abort_task()
  scsi: pm8001: Modify task abort handling for SATA task
  scsi: pm8001: Use sas_ata_device_link_abort() to handle NCQ errors
  scsi: libsas: Make sas_{alloc, alloc_slow, free}_task() private

Xingui Yang (2):
  scsi: hisi_sas: Add SATA_DISK_ERR bit handling for v3 hw
  scsi: hisi_sas: Modify v3 HW SATA disk error state completion
    processing

 drivers/scsi/hisi_sas/hisi_sas.h       |   1 +
 drivers/scsi/hisi_sas/hisi_sas_main.c  |  26 +++-
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |  53 ++++++-
 drivers/scsi/libsas/sas_ata.c          |  12 ++
 drivers/scsi/libsas/sas_init.c         |   3 -
 drivers/scsi/libsas/sas_internal.h     |   4 +
 drivers/scsi/pm8001/pm8001_hwi.c       | 188 ++++---------------------
 drivers/scsi/pm8001/pm8001_sas.c       |   8 ++
 drivers/scsi/pm8001/pm8001_sas.h       |   4 -
 drivers/scsi/pm8001/pm80xx_hwi.c       | 177 +++--------------------
 include/scsi/libsas.h                  |   4 -
 include/scsi/sas_ata.h                 |   6 +
 12 files changed, 143 insertions(+), 343 deletions(-)

Comments

Martin K. Petersen Sept. 25, 2022, 6:02 p.m. UTC | #1
Hi John!

> Based on mkp-scsi @ 6.1/scsi-staging 7f615c1b5986 ("scsi:
> scsi_transport_fc: Use %u for dev_loss_tmo")

Can you please rebase on top of the latest staging? There are a couple
of pm8001 conflicts.

Thanks!

John Garry Sept. 26, 2022, 10:27 a.m. UTC | #2
On 25/09/2022 19:02, Martin K. Petersen wrote:

Hi Martin,

>> Based on mkp-scsi @ 6.1/scsi-staging 7f615c1b5986 ("scsi:
>> scsi_transport_fc: Use %u for dev_loss_tmo")
> Can you please rebase on top of the latest staging? There are a couple
> of pm8001 conflicts.

Sorry about that, I did test that it applied ok but I did not test 
building it again. Anyway, this is the only issue I saw:

drivers/scsi/pm8001/pm8001_hwi.c: In function ‘pm8001_mpi_task_abort_resp’:
drivers/scsi/pm8001/pm8001_hwi.c:3520:15: error: ‘pm8001_dev’ undeclared (first use in this function); did you mean ‘pm8001_dbg’?
  3520 |   atomic_dec(&pm8001_dev->running_req);
       |               ^~~~~~~~~~
       |               pm8001_dbg

Was there another issue?

Thanks,
John