Message ID | 20210609093631.2557822-1-yuyufen@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] scsi: libsas: add lun number check in .slave_alloc callback | expand |
hi, John On 2021/6/9 17:40, John Garry wrote: > On 09/06/2021 10:36, Yufen Yu wrote: >> We found that offline a ata device on hisi sas control and then >> scanning the host can probe 255 not exist devices into system. >> It can be reproduced easily as following: >> >> Some ata devices on hisi sas v3 control: >> [root@localhost ~]# lsscsi >> [2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda >> [2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb >> [2:0:2:0] disk SEAGATE ST600MM0006 B001 /dev/sdc >> >> 1) echo "offline" > /sys/block/sdb/device/state >> 2) echo "- - -" > /sys/class/scsi_host/host2/scan >> >> Then, we can see another 255 not exist devices in system: >> [root@localhost ~]# lsscsi >> [2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda >> [2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb >> [2:0:1:1] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdh >> ... >> [2:0:1:255] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdjb >> >> After REPORT LUN command issued to the offline device fail, it tries >> to do a sequential scan and probe all devices whose lun is not 0 >> successfully. >> >> To fix the problem, we try to do same things as commit 2fc62e2ac350 >> ("[SCSI] libsas: disable scanning lun > 0 on ata devices"), which >> will prevent the device whose lun number is not zero probe into system. >> >> Reported-by: Wu Bo <wubo40@huawei.com> >> Suggested-by: John Garry <john.garry@huawei.com> >> Signed-off-by: Yufen Yu <yuyufen@huawei.com> >> --- >> drivers/scsi/aic94xx/aic94xx_init.c | 1 + >> drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 1 + >> drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 1 + >> drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 1 + >> drivers/scsi/isci/init.c | 1 + >> drivers/scsi/libsas/sas_scsi_host.c | 9 +++++++++ >> drivers/scsi/mvsas/mv_init.c | 1 + >> drivers/scsi/pm8001/pm8001_init.c | 1 + > > Do we also need to modify aix79xx in a similar fashion? There is no aix79xx in directory drivers/scsi. I guess you mean aic79xxx? But it seems not need to modify. > > I just noticed that libsas.h already has a prototype for sas_slave_alloc() - any idea why? 'git log' shows that commit fa1c1e8f1ece "[SCSI] Add SATA support to libsas" have introduced sas_slave_alloc. And commit 9508a66f898d "[SCSI] libsas: async ata scanning" delete it while forgetting to get rid of the prototype. Yufen, Thanks
On 09/06/2021 13:01, Yufen Yu wrote: >> >> Do we also need to modify aix79xx in a similar fashion? > > There is no aix79xx in directory drivers/scsi. I guess you mean > aic79xxx? But it seems not need to modify. > So if you think that this HBA does not support SATA, then it would be good to mention it in the commit log. Some more comments: On 09/06/2021 10:36, Yufen Yu wrote: > We found that offline a ata device on hisi sas control and then /s/ata/SATA/ > scanning the host can probe 255 not exist devices into system. "can probe 255 non-existant" > It can be reproduced easily as following: > > Some ata devices on hisi sas v3 control: I don't know what this means, so please drop it. > [root@localhost ~]# lsscsi > [2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda > [2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb > [2:0:2:0] disk SEAGATE ST600MM0006 B001 /dev/sdc > > 1) echo "offline" > /sys/block/sdb/device/state > 2) echo "- - -" > /sys/class/scsi_host/host2/scan > > Then, we can see another 255 not exist devices in system: use "non-existant" > [root@localhost ~]# lsscsi > [2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda > [2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb > [2:0:1:1] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdh > ... > [2:0:1:255] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdjb > > After REPORT LUN command issued to the offline device fail, it tries > to do a sequential scan and probe all devices whose lun is not 0 > successfully. > > To fix the problem, we try to do same things as commit 2fc62e2ac350 > ("[SCSI] libsas: disable scanning lun > 0 on ata devices"), which > will prevent the device whose lun number is not zero probe into system. > Thanks, John
On 2021/6/9 20:09, John Garry wrote: > On 09/06/2021 13:01, Yufen Yu wrote: >>> >>> Do we also need to modify aix79xx in a similar fashion? >> >> There is no aix79xx in directory drivers/scsi. I guess you mean >> aic79xxx? But it seems not need to modify. >> > > So if you think that this HBA does not support SATA, then it would be good to mention it in the commit log. Maybe I didn't describe it clearly. I am not mean that aic79xxx dose not support SATA. But it is not libsas driver, so I think we don't need to modify it here. > > Some more comments: > > On 09/06/2021 10:36, Yufen Yu wrote: > > We found that offline a ata device on hisi sas control and then > > /s/ata/SATA/ > > > scanning the host can probe 255 not exist devices into system. > > "can probe 255 non-existant" > > > It can be reproduced easily as following: > > > > Some ata devices on hisi sas v3 control: > > I don't know what this means, so please drop it. > > > [root@localhost ~]# lsscsi > > [2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda > > [2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb > > [2:0:2:0] disk SEAGATE ST600MM0006 B001 /dev/sdc > > > > 1) echo "offline" > /sys/block/sdb/device/state > > 2) echo "- - -" > /sys/class/scsi_host/host2/scan > > > > Then, we can see another 255 not exist devices in system: > > use "non-existant" Thanks to point out these error. I will fix it in next version. Thanks, Yufen
在 2021/6/9 17:40, John Garry 写道: > On 09/06/2021 10:36, Yufen Yu wrote: >> We found that offline a ata device on hisi sas control and then >> scanning the host can probe 255 not exist devices into system. >> It can be reproduced easily as following: >> >> Some ata devices on hisi sas v3 control: >> [root@localhost ~]# lsscsi >> [2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda >> [2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb >> [2:0:2:0] disk SEAGATE ST600MM0006 B001 /dev/sdc >> >> 1) echo "offline" > /sys/block/sdb/device/state >> 2) echo "- - -" > /sys/class/scsi_host/host2/scan >> >> Then, we can see another 255 not exist devices in system: >> [root@localhost ~]# lsscsi >> [2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda >> [2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb >> [2:0:1:1] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdh >> ... >> [2:0:1:255] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdjb >> >> After REPORT LUN command issued to the offline device fail, it tries >> to do a sequential scan and probe all devices whose lun is not 0 >> successfully. >> >> To fix the problem, we try to do same things as commit 2fc62e2ac350 >> ("[SCSI] libsas: disable scanning lun > 0 on ata devices"), which >> will prevent the device whose lun number is not zero probe into system. >> >> Reported-by: Wu Bo <wubo40@huawei.com> >> Suggested-by: John Garry <john.garry@huawei.com> >> Signed-off-by: Yufen Yu <yuyufen@huawei.com> >> --- >> drivers/scsi/aic94xx/aic94xx_init.c | 1 + >> drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 1 + >> drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 1 + >> drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 1 + >> drivers/scsi/isci/init.c | 1 + >> drivers/scsi/libsas/sas_scsi_host.c | 9 +++++++++ >> drivers/scsi/mvsas/mv_init.c | 1 + >> drivers/scsi/pm8001/pm8001_init.c | 1 + > > Do we also need to modify aix79xx in a similar fashion? > > I just noticed that libsas.h already has a prototype for > sas_slave_alloc() - any idea why? sas_slave_alloc() was implemented in the history and it was removed in this commit: 9508a66f898d. And it seems the prototype was left over since then.
在 2021/6/9 21:13, John Garry 写道: > >>> I just noticed that libsas.h already has a prototype for >>> sas_slave_alloc() - any idea why? >> >> sas_slave_alloc() was implemented in the history and it was removed in >> this commit: 9508a66f898d. And it seems the prototype was left over >> since then. >> . > > ok, understood. > > So how about backporting this also? I have no idea when the regression > was introduced and prob cannot test as it predates hisi_sas support. > This function before did nothing but initialized the ata port. The commit removed the function just moved the initialization somewhere else. -int sas_slave_alloc(struct scsi_device *scsi_dev) -{ - struct domain_device *dev = sdev_to_domain_dev(scsi_dev); - - if (dev_is_sata(dev)) - return ata_sas_port_init(dev->sata_dev.ap); - - return 0; -} It looks like it's not related to this issue. And I guess it is not a regression. This issue only exists when user do a manual scan and at the same time the device is offlined. Few people will do that actually. > BTW, we also have a dangling prototype for sas_init_ex_attr(), if > someone wants to delete that... > > Thanks, > John > .
diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c index a195bfe9eccc..7a78606598c4 100644 --- a/drivers/scsi/aic94xx/aic94xx_init.c +++ b/drivers/scsi/aic94xx/aic94xx_init.c @@ -53,6 +53,7 @@ static struct scsi_host_template aic94xx_sht = { .max_sectors = SCSI_DEFAULT_MAX_SECTORS, .eh_device_reset_handler = sas_eh_device_reset_handler, .eh_target_reset_handler = sas_eh_target_reset_handler, + .slave_alloc = sas_slave_alloc, .target_destroy = sas_target_destroy, .ioctl = sas_ioctl, #ifdef CONFIG_COMPAT diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c index 3e359ac752fd..15eaac3a4eb6 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c @@ -1771,6 +1771,7 @@ static struct scsi_host_template sht_v1_hw = { .max_sectors = SCSI_DEFAULT_MAX_SECTORS, .eh_device_reset_handler = sas_eh_device_reset_handler, .eh_target_reset_handler = sas_eh_target_reset_handler, + .slave_alloc = sas_slave_alloc, .target_destroy = sas_target_destroy, .ioctl = sas_ioctl, #ifdef CONFIG_COMPAT diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c index 46f60fc2a069..9df1639ffa65 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c @@ -3584,6 +3584,7 @@ static struct scsi_host_template sht_v2_hw = { .max_sectors = SCSI_DEFAULT_MAX_SECTORS, .eh_device_reset_handler = sas_eh_device_reset_handler, .eh_target_reset_handler = sas_eh_target_reset_handler, + .slave_alloc = sas_slave_alloc, .target_destroy = sas_target_destroy, .ioctl = sas_ioctl, #ifdef CONFIG_COMPAT diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c index e95408314078..36ec3901cfd4 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c @@ -3155,6 +3155,7 @@ static struct scsi_host_template sht_v3_hw = { .max_sectors = SCSI_DEFAULT_MAX_SECTORS, .eh_device_reset_handler = sas_eh_device_reset_handler, .eh_target_reset_handler = sas_eh_target_reset_handler, + .slave_alloc = sas_slave_alloc, .target_destroy = sas_target_destroy, .ioctl = sas_ioctl, #ifdef CONFIG_COMPAT diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c index c452849e7bb4..ffd33e5decae 100644 --- a/drivers/scsi/isci/init.c +++ b/drivers/scsi/isci/init.c @@ -167,6 +167,7 @@ static struct scsi_host_template isci_sht = { .eh_abort_handler = sas_eh_abort_handler, .eh_device_reset_handler = sas_eh_device_reset_handler, .eh_target_reset_handler = sas_eh_target_reset_handler, + .slave_alloc = sas_slave_alloc, .target_destroy = sas_target_destroy, .ioctl = sas_ioctl, #ifdef CONFIG_COMPAT diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c index 1bf939818c98..ee44a0d7730b 100644 --- a/drivers/scsi/libsas/sas_scsi_host.c +++ b/drivers/scsi/libsas/sas_scsi_host.c @@ -911,6 +911,14 @@ void sas_task_abort(struct sas_task *task) blk_abort_request(sc->request); } +int sas_slave_alloc(struct scsi_device *sdev) +{ + if (dev_is_sata(sdev_to_domain_dev(sdev)) && sdev->lun) + return -ENXIO; + + return 0; +} + void sas_target_destroy(struct scsi_target *starget) { struct domain_device *found_dev = starget->hostdata; @@ -957,5 +965,6 @@ EXPORT_SYMBOL_GPL(sas_task_abort); EXPORT_SYMBOL_GPL(sas_phy_reset); EXPORT_SYMBOL_GPL(sas_eh_device_reset_handler); EXPORT_SYMBOL_GPL(sas_eh_target_reset_handler); +EXPORT_SYMBOL_GPL(sas_slave_alloc); EXPORT_SYMBOL_GPL(sas_target_destroy); EXPORT_SYMBOL_GPL(sas_ioctl); diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c index 6aa2697c4a15..b03c0f35d7b0 100644 --- a/drivers/scsi/mvsas/mv_init.c +++ b/drivers/scsi/mvsas/mv_init.c @@ -46,6 +46,7 @@ static struct scsi_host_template mvs_sht = { .max_sectors = SCSI_DEFAULT_MAX_SECTORS, .eh_device_reset_handler = sas_eh_device_reset_handler, .eh_target_reset_handler = sas_eh_target_reset_handler, + .slave_alloc = sas_slave_alloc, .target_destroy = sas_target_destroy, .ioctl = sas_ioctl, #ifdef CONFIG_COMPAT diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c index af09bd282cb9..313248c7bab9 100644 --- a/drivers/scsi/pm8001/pm8001_init.c +++ b/drivers/scsi/pm8001/pm8001_init.c @@ -101,6 +101,7 @@ static struct scsi_host_template pm8001_sht = { .max_sectors = SCSI_DEFAULT_MAX_SECTORS, .eh_device_reset_handler = sas_eh_device_reset_handler, .eh_target_reset_handler = sas_eh_target_reset_handler, + .slave_alloc = sas_slave_alloc, .target_destroy = sas_target_destroy, .ioctl = sas_ioctl, #ifdef CONFIG_COMPAT
We found that offline a ata device on hisi sas control and then scanning the host can probe 255 not exist devices into system. It can be reproduced easily as following: Some ata devices on hisi sas v3 control: [root@localhost ~]# lsscsi [2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda [2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb [2:0:2:0] disk SEAGATE ST600MM0006 B001 /dev/sdc 1) echo "offline" > /sys/block/sdb/device/state 2) echo "- - -" > /sys/class/scsi_host/host2/scan Then, we can see another 255 not exist devices in system: [root@localhost ~]# lsscsi [2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda [2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb [2:0:1:1] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdh ... [2:0:1:255] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdjb After REPORT LUN command issued to the offline device fail, it tries to do a sequential scan and probe all devices whose lun is not 0 successfully. To fix the problem, we try to do same things as commit 2fc62e2ac350 ("[SCSI] libsas: disable scanning lun > 0 on ata devices"), which will prevent the device whose lun number is not zero probe into system. Reported-by: Wu Bo <wubo40@huawei.com> Suggested-by: John Garry <john.garry@huawei.com> Signed-off-by: Yufen Yu <yuyufen@huawei.com> --- drivers/scsi/aic94xx/aic94xx_init.c | 1 + drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 1 + drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 1 + drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 1 + drivers/scsi/isci/init.c | 1 + drivers/scsi/libsas/sas_scsi_host.c | 9 +++++++++ drivers/scsi/mvsas/mv_init.c | 1 + drivers/scsi/pm8001/pm8001_init.c | 1 + 8 files changed, 16 insertions(+)