Message ID | 20220602225113.10218-1-tyler.erickson@seagate.com (mailing list archive) |
---|---|
Headers | show |
Series | ata,sd: Fix reading concurrent positioning ranges | expand |
On 6/3/22 07:51, Tyler Erickson wrote: > This patch series fixes reading the concurrent positioning ranges. > > The first patch fixes reading this in libata, where it was reading > more data than a drive necessarily supports, resulting in a > command abort. > > The second patch fixes the SCSI translated data to put the VPD page > length in the correct starting byte. > > The third patch in sd, the fix is adding 4 instead of 3 for the header > length. Using 3 will always result in an error and was likely used > incorrectly as T10 specifications list all tables starting at byte 0, > and byte 3 is the page length, which would mean there are 4 total > bytes before the remaining data that contains the ranges and other > information. > > Tyler Erickson (3): > libata: fix reading concurrent positioning ranges log > libata: fix translation of concurrent positioning ranges > scsi: sd: Fix interpretation of VPD B9h length > > drivers/ata/libata-core.c | 21 +++++++++++++-------- > drivers/ata/libata-scsi.c | 2 +- > drivers/scsi/sd.c | 2 +- > 3 files changed, 15 insertions(+), 10 deletions(-) > > > base-commit: 700170bf6b4d773e328fa54ebb70ba444007c702 Looks all good to me. I tested this and really wonder how I did not catch these mistakes earlier :) Using a tcmu emulator for various concurrent positioning range configs to test, I got a lockdep splat when unplugging the drive: [ 760.702859] ====================================================== [ 760.702863] WARNING: possible circular locking dependency detected [ 760.702868] 5.18.0+ #1509 Not tainted [ 760.702875] ------------------------------------------------------ [...] [ 760.702966] the existing dependency chain (in reverse order) is: [ 760.702969] [ 760.702969] -> #1 (&q->sysfs_lock){+.+.}-{3:3}: [ 760.702982] __mutex_lock+0x15b/0x1480 [ 760.702998] blk_ia_range_sysfs_show+0x41/0xc0 [ 760.703010] sysfs_kf_seq_show+0x1f2/0x360 [ 760.703022] seq_read_iter+0x40f/0x1130 [ 760.703036] new_sync_read+0x2d8/0x520 [ 760.703049] vfs_read+0x31a/0x450 [ 760.703060] ksys_read+0xf7/0x1d0 [ 760.703070] do_syscall_64+0x34/0x80 [ 760.703081] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 760.703093] [ 760.703093] -> #0 (kn->active#385){++++}-{0:0}: [ 760.703108] __lock_acquire+0x2ba6/0x6a20 [ 760.703125] lock_acquire+0x19f/0x510 [ 760.703136] __kernfs_remove+0x739/0x940 [ 760.703145] kernfs_remove_by_name_ns+0x90/0xe0 [ 760.703154] remove_files+0x8c/0x1a0 [ 760.703165] sysfs_remove_group+0x77/0x150 [ 760.703175] sysfs_remove_groups+0x4f/0x90 [ 760.703186] __kobject_del+0x7d/0x1b0 [ 760.703196] kobject_del+0x31/0x50 [ 760.703203] disk_unregister_independent_access_ranges+0x153/0x290 [ 760.703214] blk_unregister_queue+0x166/0x210 [ 760.703226] del_gendisk+0x2f8/0x7c0 [ 760.703233] sd_remove+0x5e/0xb0 [sd_mod] [ 760.703252] device_release_driver_internal+0x3ad/0x750 [ 760.703262] bus_remove_device+0x2a6/0x570 [ 760.703269] device_del+0x48f/0xb50 [ 760.703280] __scsi_remove_device+0x21b/0x2b0 [scsi_mod] [ 760.703339] scsi_remove_device+0x3a/0x50 [scsi_mod] [ 760.703391] tcm_loop_port_unlink+0xca/0x160 [tcm_loop] [ 760.703407] target_fabric_port_unlink+0xd5/0x120 [target_core_mod] [ 760.703494] configfs_unlink+0x37f/0x7a0 [ 760.703502] vfs_unlink+0x295/0x800 [ 760.703514] do_unlinkat+0x2d9/0x560 [ 760.703520] __x64_sys_unlink+0xa5/0xf0 [ 760.703528] do_syscall_64+0x34/0x80 [ 760.703537] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 760.703548] [ 760.703548] other info that might help us debug this: [ 760.703548] [ 760.703551] Possible unsafe locking scenario: [ 760.703551] [ 760.703554] CPU0 CPU1 [ 760.703556] ---- ---- [ 760.703558] lock(&q->sysfs_lock); [ 760.703565] lock(kn->active#385); [ 760.703573] lock(&q->sysfs_lock); [ 760.703579] lock(kn->active#385); [ 760.703587] [ 760.703587] *** DEADLOCK *** This needs to be checked too, but that is not related to your fixes. I will queue the libata patches for rc1 update. Martin, Do you want to take patch 3 or should I just take it ?
On Fri, Jun 03, 2022 at 10:30:04AM +0900, Damien Le Moal wrote: > Looks all good to me. I tested this and really wonder how I did not catch > these mistakes earlier :) > > Using a tcmu emulator for various concurrent positioning range configs to > test, I got a lockdep splat when unplugging the drive: You probably want something like this: --- From 4340b85be3532149310326b5f0caf329e1f4c748 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig <hch@lst.de> Date: Mon, 23 May 2022 09:18:44 +0200 Subject: block: don't take sysfs_lock in blk_ia_range_sysfs_show sysfs already synchronizes internally against kobject removal, so remove the extra lock. Signed-off-by: Christoph Hellwig <hch@lst.de> --- block/blk-ia-ranges.c | 8 +------- include/linux/blkdev.h | 1 - 2 files changed, 1 insertion(+), 8 deletions(-) diff --git a/block/blk-ia-ranges.c b/block/blk-ia-ranges.c index 853be76b9808b..e9e7ebf02737f 100644 --- a/block/blk-ia-ranges.c +++ b/block/blk-ia-ranges.c @@ -54,13 +54,8 @@ static ssize_t blk_ia_range_sysfs_show(struct kobject *kobj, container_of(attr, struct blk_ia_range_sysfs_entry, attr); struct blk_independent_access_range *iar = container_of(kobj, struct blk_independent_access_range, kobj); - ssize_t ret; - mutex_lock(&iar->queue->sysfs_lock); - ret = entry->show(iar, buf); - mutex_unlock(&iar->queue->sysfs_lock); - - return ret; + return entry->show(iar, buf); } static const struct sysfs_ops blk_ia_range_sysfs_ops = { @@ -149,7 +144,6 @@ int disk_register_independent_access_ranges(struct gendisk *disk, } for (i = 0; i < iars->nr_ia_ranges; i++) { - iars->ia_range[i].queue = q; ret = kobject_init_and_add(&iars->ia_range[i].kobj, &blk_ia_range_ktype, &iars->kobj, "%d", i); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index a72203ed25454..0ceb85ca52af4 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -348,7 +348,6 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev, */ struct blk_independent_access_range { struct kobject kobj; - struct request_queue *queue; sector_t sector; sector_t nr_sectors; };
On 6/3/22 14:23, Christoph Hellwig wrote: > On Fri, Jun 03, 2022 at 10:30:04AM +0900, Damien Le Moal wrote: >> Looks all good to me. I tested this and really wonder how I did not catch >> these mistakes earlier :) >> >> Using a tcmu emulator for various concurrent positioning range configs to >> test, I got a lockdep splat when unplugging the drive: > > You probably want something like this: > > --- > From 4340b85be3532149310326b5f0caf329e1f4c748 Mon Sep 17 00:00:00 2001 > From: Christoph Hellwig <hch@lst.de> > Date: Mon, 23 May 2022 09:18:44 +0200 > Subject: block: don't take sysfs_lock in blk_ia_range_sysfs_show > > sysfs already synchronizes internally against kobject removal, so remove > the extra lock. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > block/blk-ia-ranges.c | 8 +------- > include/linux/blkdev.h | 1 - > 2 files changed, 1 insertion(+), 8 deletions(-) > > diff --git a/block/blk-ia-ranges.c b/block/blk-ia-ranges.c > index 853be76b9808b..e9e7ebf02737f 100644 > --- a/block/blk-ia-ranges.c > +++ b/block/blk-ia-ranges.c > @@ -54,13 +54,8 @@ static ssize_t blk_ia_range_sysfs_show(struct kobject *kobj, > container_of(attr, struct blk_ia_range_sysfs_entry, attr); > struct blk_independent_access_range *iar = > container_of(kobj, struct blk_independent_access_range, kobj); > - ssize_t ret; > > - mutex_lock(&iar->queue->sysfs_lock); > - ret = entry->show(iar, buf); > - mutex_unlock(&iar->queue->sysfs_lock); > - > - return ret; > + return entry->show(iar, buf); I sent a patch that Jens applied doing exactly that. But forgot to add the removal of the queue pointer you have below. Sending an incremental patch for that. > } > > static const struct sysfs_ops blk_ia_range_sysfs_ops = { > @@ -149,7 +144,6 @@ int disk_register_independent_access_ranges(struct gendisk *disk, > } > > for (i = 0; i < iars->nr_ia_ranges; i++) { > - iars->ia_range[i].queue = q; > ret = kobject_init_and_add(&iars->ia_range[i].kobj, > &blk_ia_range_ktype, &iars->kobj, > "%d", i); > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index a72203ed25454..0ceb85ca52af4 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -348,7 +348,6 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev, > */ > struct blk_independent_access_range { > struct kobject kobj; > - struct request_queue *queue; > sector_t sector; > sector_t nr_sectors; > };
On Thu, 2 Jun 2022 16:51:10 -0600, Tyler Erickson wrote: > This patch series fixes reading the concurrent positioning ranges. > > The first patch fixes reading this in libata, where it was reading > more data than a drive necessarily supports, resulting in a > command abort. > > The second patch fixes the SCSI translated data to put the VPD page > length in the correct starting byte. > > [...] Applied to 5.19/scsi-fixes, thanks! [3/3] scsi: sd: Fix interpretation of VPD B9h length https://git.kernel.org/mkp/scsi/c/f92de9d11042
On 6/3/22 07:51, Tyler Erickson wrote: > This patch series fixes reading the concurrent positioning ranges. > > The first patch fixes reading this in libata, where it was reading > more data than a drive necessarily supports, resulting in a > command abort. > > The second patch fixes the SCSI translated data to put the VPD page > length in the correct starting byte. > > The third patch in sd, the fix is adding 4 instead of 3 for the header > length. Using 3 will always result in an error and was likely used > incorrectly as T10 specifications list all tables starting at byte 0, > and byte 3 is the page length, which would mean there are 4 total > bytes before the remaining data that contains the ranges and other > information. > > Tyler Erickson (3): > libata: fix reading concurrent positioning ranges log > libata: fix translation of concurrent positioning ranges > scsi: sd: Fix interpretation of VPD B9h length Applied 1-2 to for-5.19-fixes. Thanks ! > > drivers/ata/libata-core.c | 21 +++++++++++++-------- > drivers/ata/libata-scsi.c | 2 +- > drivers/scsi/sd.c | 2 +- > 3 files changed, 15 insertions(+), 10 deletions(-) > > > base-commit: 700170bf6b4d773e328fa54ebb70ba444007c702