Message ID | 20170421211302.2667649-1-songliubraving@fb.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote:
> When a device is deleted through sysfs handle "delete", [ ... ]
If I try to use that sysfs attribute then I encounter a deadlock (see
also the report below). How is it possible that you have not hit that
deadlock in your tests?
Bart.
======================================================
[ INFO: possible circular locking dependency detected ]
4.11.0-rc6-dbg+ #3 Tainted: G I
-------------------------------------------------------
bash/7858 is trying to acquire lock:
(&shost->scan_mutex){+.+.+.}, at: [<ffffffff814de090>] scsi_remove_device+0x20/0x40
but task is already holding lock:
(s_active#326){++++.+}, at: [<ffffffff81293e20>] kernfs_remove_self+0xe0/0x140
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (s_active#326){++++.+}:
lock_acquire+0xd5/0x1c0
__kernfs_remove+0x248/0x310
kernfs_remove_by_name_ns+0x45/0xa0
remove_files.isra.1+0x35/0x70
sysfs_remove_group+0x44/0x90
sysfs_remove_groups+0x2e/0x50
device_remove_attrs+0x5e/0x80
device_del+0x1fd/0x320
__scsi_remove_device+0xe9/0x120
scsi_forget_host+0x60/0x70
scsi_remove_host+0x71/0x110
0xffffffffa0703690
process_one_work+0x20b/0x6a0
worker_thread+0x4e/0x4a0
kthread+0x113/0x150
ret_from_fork+0x2e/0x40
-> #0 (&shost->scan_mutex){+.+.+.}:
__lock_acquire+0x1109/0x1280
lock_acquire+0xd5/0x1c0
__mutex_lock+0x83/0x980
mutex_lock_nested+0x1b/0x20
scsi_remove_device+0x20/0x40
sdev_store_delete+0x27/0x30
dev_attr_store+0x18/0x30
sysfs_kf_write+0x45/0x60
kernfs_fop_write+0x13c/0x1c0
__vfs_write+0x28/0x140
vfs_write+0xc8/0x1e0
SyS_write+0x49/0xa0
entry_SYSCALL_64_fastpath+0x18/0xad
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(s_active#326);
lock(&shost->scan_mutex);
lock(s_active#326);
lock(&shost->scan_mutex);
*** DEADLOCK ***
3 locks held by bash/7858:
#0: (sb_writers#4){.+.+.+}, at: [<ffffffff81206b45>] vfs_write+0x195/0x1e0
#1: (&of->mutex){+.+.+.}, at: [<ffffffff81294af6>] kernfs_fop_write+0x106/0x1c0
#2: (s_active#326){++++.+}, at: [<ffffffff81293e20>] kernfs_remove_self+0xe0/0x140
stack backtrace:
CPU: 3 PID: 7858 Comm: bash Tainted: G I 4.11.0-rc6-dbg+ #3
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
Call Trace:
dump_stack+0x68/0x93
print_circular_bug+0x1be/0x210
__lock_acquire+0x1109/0x1280
lock_acquire+0xd5/0x1c0
__mutex_lock+0x83/0x980
mutex_lock_nested+0x1b/0x20
scsi_remove_device+0x20/0x40
sdev_store_delete+0x27/0x30
dev_attr_store+0x18/0x30
sysfs_kf_write+0x45/0x60
kernfs_fop_write+0x13c/0x1c0
__vfs_write+0x28/0x140
vfs_write+0xc8/0x1e0
SyS_write+0x49/0xa0
entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x7fec0f748500
RSP: 002b:00007ffc1ddaec98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007fec0f748500
RDX: 0000000000000002 RSI: 0000000002012aa0 RDI: 0000000000000001
RBP: 00007ffc1ddaebf0 R08: 00007fec0fa0a740 R09: 00007fec10061100
R10: 0000000000000098 R11: 0000000000000246 R12: 00007fec10084bf0
R13: 0000000000000001 R14: 0000000000000000 R15: 00007ffc1ddaec18
On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote: > On the other hand, some devices do long latency IO during deletion, > for example, sd_shutdown() may do sync cache and/or start_stop. > It is not necessary for these commands to run in series. Have you noticed my patch series that makes sd_shutdown() submit the SYNCHRONIZE CACHE command asynchronously? Have you tried whether that patch series would be a good alternative? Thanks, Bart.
> On Apr 21, 2017, at 2:17 PM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote: > > On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote: >> When a device is deleted through sysfs handle "delete", [ ... ] > > If I try to use that sysfs attribute then I encounter a deadlock (see > also the report below). How is it possible that you have not hit that > deadlock in your tests? > > Bart. Hmm... Seems my test environment (VM) doesn't call the following path __scsi_remove_device+0xe9/0x120 scsi_forget_host+0x60/0x70 scsi_remove_host+0x71/0x110 Let me fix it. Thanks, Song
> On Apr 21, 2017, at 2:20 PM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote: > > On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote: >> On the other hand, some devices do long latency IO during deletion, >> for example, sd_shutdown() may do sync cache and/or start_stop. >> It is not necessary for these commands to run in series. > > Have you noticed my patch series that makes sd_shutdown() submit the > SYNCHRONIZE CACHE command asynchronously? Have you tried whether that > patch series would be a good alternative? > > Thanks, > > Bart. The asynchronous SYNCHRONIZE CACHE will not help our use case, where the latency comes from sd_start_stop_device(). Seems it is not easy to make the START STOP UNIT command async. Thanks, Song
On Fri, Apr 21, 2017 at 02:13:02PM -0700, Song Liu wrote: > When a device is deleted through sysfs handle "delete", the code > locks shost->scan_mutex. If multiple devices are deleted at the > same time, these deletes will be handled in series. > > On the other hand, some devices do long latency IO during deletion, > for example, sd_shutdown() may do sync cache and/or start_stop. > It is not necessary for these commands to run in series. > > To reduce latency of parallel "delete" requests, this patch reduces > the protection of scan_mutex. The only function with Scsi_Host > called in __scsi_remove_device() is the optional slave_destroy(). > Therefore, the protection of scan_mutex is only necessary for this > function. And I don't think it makes sense for slave_destroy either. Please do a quick audit of the instances and drop the lock for it, too. > > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c > index 82dfe07..e7a9e28 100644 > --- a/drivers/scsi/scsi_sysfs.c > +++ b/drivers/scsi/scsi_sysfs.c > @@ -610,7 +610,7 @@ static int scsi_sdev_check_buf_bit(const char *buf) > return 1; > else if (buf[0] == '0') > return 0; > - else > + else > return -EINVAL; Also the patch has a few odd whitespace changes like this. Please remove those before reposting.
On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote: > When a device is deleted through sysfs handle "delete", the code > locks shost->scan_mutex. If multiple devices are deleted at the > same time, these deletes will be handled in series. > > On the other hand, some devices do long latency IO during deletion, > for example, sd_shutdown() may do sync cache and/or start_stop. > It is not necessary for these commands to run in series. > > To reduce latency of parallel "delete" requests, this patch reduces > the protection of scan_mutex. The only function with Scsi_Host > called in __scsi_remove_device() is the optional slave_destroy(). > Therefore, the protection of scan_mutex is only necessary for this > function. Hello Song, What you wrote above is wrong. It is necessary to serialize SCSI scanning against SCSI device removal. That is why scan_mutex is held around the __scsi_remove_device() call. When adding a LUN the following (and several other) actions are performed: * Allocation of memory for struct scsi_device. * Allocation of a block layer request queue. * Initialization of the .sdev_gendev and .sdev_dev device structures. * Association of the transport layer driver with the SCSI device. * Association of a device handler with the SCSI device (e.g. ALUA). * Making the .sdev_gendev and .sdev_dev devices visible in sysfs. * Making the transport layer attributes visible in sysfs. * Creating a bsg device node in sysfs. * Association of an upper layer driver (e.g. sd or sr) with the SCSI LUN. Removal of a LUN means undoing all of the above. If adding and removing a LUN would not be not serialized then there would be a risk that removal and immediate reregistration of a LUN will fail due to the reregistration process trying to add sysfs attributes that were not yet removed by the removal step. Bart.
> On Apr 25, 2017, at 10:23 AM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote: > > On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote: >> When a device is deleted through sysfs handle "delete", the code >> locks shost->scan_mutex. If multiple devices are deleted at the >> same time, these deletes will be handled in series. >> >> On the other hand, some devices do long latency IO during deletion, >> for example, sd_shutdown() may do sync cache and/or start_stop. >> It is not necessary for these commands to run in series. >> >> To reduce latency of parallel "delete" requests, this patch reduces >> the protection of scan_mutex. The only function with Scsi_Host >> called in __scsi_remove_device() is the optional slave_destroy(). >> Therefore, the protection of scan_mutex is only necessary for this >> function. > > Hello Song, > > What you wrote above is wrong. It is necessary to serialize SCSI > scanning against SCSI device removal. That is why scan_mutex is held > around the __scsi_remove_device() call. When adding a LUN the following > (and several other) actions are performed: > * Allocation of memory for struct scsi_device. > * Allocation of a block layer request queue. > * Initialization of the .sdev_gendev and .sdev_dev device structures. > * Association of the transport layer driver with the SCSI device. > * Association of a device handler with the SCSI device (e.g. ALUA). > * Making the .sdev_gendev and .sdev_dev devices visible in sysfs. > * Making the transport layer attributes visible in sysfs. > * Creating a bsg device node in sysfs. > * Association of an upper layer driver > (e.g. sd or sr) with the SCSI LUN. > > Removal of a LUN means undoing all of the above. If adding and removing > a LUN would not be not serialized then there would be a risk that removal > and immediate reregistration of a LUN will fail due to the reregistration > process trying to add sysfs attributes that were not yet removed by the > removal step. > > Bart. Hello Bart, Thanks a lot for looking into this. I have been studying the code recently. I am wondering whether the following would work: 1. Introduce a new mutex for scsi_device to protect most operations in the list you gathered above; 2. For operations like host->slave_destroy(), ensure they access scsi_host data with host_lock (or another spin lock). I looked into all instances of slave_destroy, only 2 of them: dc395x_slave_destroy() and visorhba_slave_destroy() access scsi_host data without protection of spin lock. 3. Once 1 and 2 is ready, __scsi_remove_device() only need to hold the mutex for the scsi_device. scan_mutex is no longer required. Is this a valid path? Thanks again, Song
On Tue, 2017-04-25 at 17:42 +0000, Song Liu wrote: > I have been studying the code recently. I am wondering whether the following > would work: > > 1. Introduce a new mutex for scsi_device to protect most operations in the > list you gathered above; > > 2. For operations like host->slave_destroy(), ensure they access scsi_host > data with host_lock (or another spin lock). > > I looked into all instances of slave_destroy, only 2 of them: > dc395x_slave_destroy() and visorhba_slave_destroy() access scsi_host data > without protection of spin lock. > > 3. Once 1 and 2 is ready, __scsi_remove_device() only need to hold the mutex > for the scsi_device. scan_mutex is no longer required. > > Is this a valid path? Sorry but I don't think so. Unlocking and reacquiring scan_mutex would create the potential that LUN scanning occurs in the meantime and hence that it fails because LUN removal is incomplete. Bart.
On Fri, 2017-04-21 at 22:31 +0000, Song Liu wrote: > On Apr 21, 2017, at 2:20 PM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote: > > On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote: > > > On the other hand, some devices do long latency IO during deletion, > > > for example, sd_shutdown() may do sync cache and/or start_stop. > > > It is not necessary for these commands to run in series. > > > > Have you noticed my patch series that makes sd_shutdown() submit the > > SYNCHRONIZE CACHE command asynchronously? Have you tried whether that > > patch series would be a good alternative? > > The asynchronous SYNCHRONIZE CACHE will not help our use case, where the > latency comes from sd_start_stop_device(). Seems it is not easy to make > the START STOP UNIT command async. Hello Song, It should be possible to make the START STOP UNIT command asynchronous too by issuing it asynchronously from inside sd_sync_cache_done(). To avoid dereferencing a stale struct scsi_disk pointer you will either have to hold an additional reference as long as the SYNCHRONIZE CACHE command is in progress or copy the data needed from struct scsi_disk into the SCSI request. Bart.
> On Apr 25, 2017, at 10:52 AM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote: > > On Tue, 2017-04-25 at 17:42 +0000, Song Liu wrote: >> I have been studying the code recently. I am wondering whether the following >> would work: >> >> 1. Introduce a new mutex for scsi_device to protect most operations in the >> list you gathered above; >> >> 2. For operations like host->slave_destroy(), ensure they access scsi_host >> data with host_lock (or another spin lock). >> >> I looked into all instances of slave_destroy, only 2 of them: >> dc395x_slave_destroy() and visorhba_slave_destroy() access scsi_host data >> without protection of spin lock. >> >> 3. Once 1 and 2 is ready, __scsi_remove_device() only need to hold the mutex >> for the scsi_device. scan_mutex is no longer required. >> >> Is this a valid path? > > Sorry but I don't think so. Unlocking and reacquiring scan_mutex would create > the potential that LUN scanning occurs in the meantime and hence that it fails > because LUN removal is incomplete. > Hello Bart, I am not sure I fully understand the problem here. If I understand the logic correctly, when a device is being removed, it will stay in scsi_host->__devices until fully the remove routine is finished. And LUN scanning in parallel will find the device with scsi_device_lookup_by_target(), and thus it would not rescan the device until the device is fully removed? Did I miss anything here? Thanks, Song
> On Apr 25, 2017, at 1:59 PM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote: > > On Fri, 2017-04-21 at 22:31 +0000, Song Liu wrote: >> On Apr 21, 2017, at 2:20 PM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote: >>> On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote: >>>> On the other hand, some devices do long latency IO during deletion, >>>> for example, sd_shutdown() may do sync cache and/or start_stop. >>>> It is not necessary for these commands to run in series. >>> >>> Have you noticed my patch series that makes sd_shutdown() submit the >>> SYNCHRONIZE CACHE command asynchronously? Have you tried whether that >>> patch series would be a good alternative? >> >> The asynchronous SYNCHRONIZE CACHE will not help our use case, where the >> latency comes from sd_start_stop_device(). Seems it is not easy to make >> the START STOP UNIT command async. > > Hello Song, > > It should be possible to make the START STOP UNIT command asynchronous too > by issuing it asynchronously from inside sd_sync_cache_done(). To avoid > dereferencing a stale struct scsi_disk pointer you will either have to hold > an additional reference as long as the SYNCHRONIZE CACHE command is in > progress or copy the data needed from struct scsi_disk into the SCSI request. > Hello Bart, As you stated in the patch: blk_cleanup_queue() in __scsi_remove_device() will wait for async SYNCHRONIZE CACHE and START STOP UNIT command to complete. Since __scsi_remove_device() is called with scan_mutex held, simultaneous START STOP UNIT requires are still serialized by the scan_mutex. Song
On Tue, 2017-04-25 at 21:17 +0000, Song Liu wrote: > I am not sure I fully understand the problem here. If I understand the logic > correctly, when a device is being removed, it will stay in scsi_host->__devices > until fully the remove routine is finished. And LUN scanning in parallel will > find the device with scsi_device_lookup_by_target(), and thus it would not > rescan the device until the device is fully removed? Did I miss anything here? Hello Song, The SCSI core is already complicated enough. Please don't complicate it further by making subtle changes to the semantics of scan_mutex. Please also note that I have proposed an alternative, namely to make the START STOP UNIT command asynchronous. Bart.
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 82dfe07..e7a9e28 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -610,7 +610,7 @@ static int scsi_sdev_check_buf_bit(const char *buf) return 1; else if (buf[0] == '0') return 0; - else + else return -EINVAL; } else return -EINVAL; @@ -774,7 +774,7 @@ store_queue_type_field(struct device *dev, struct device_attribute *attr, if (!sdev->tagged_supported) return -EINVAL; - + sdev_printk(KERN_INFO, sdev, "ignoring write to deprecated queue_type attribute"); return count; @@ -1302,8 +1302,11 @@ void __scsi_remove_device(struct scsi_device *sdev) blk_cleanup_queue(sdev->request_queue); cancel_work_sync(&sdev->requeue_work); - if (sdev->host->hostt->slave_destroy) + if (sdev->host->hostt->slave_destroy) { + mutex_lock(&sdev->host->scan_mutex); sdev->host->hostt->slave_destroy(sdev); + mutex_unlock(&sdev->host->scan_mutex); + } transport_destroy_device(dev); /* @@ -1322,11 +1325,7 @@ void __scsi_remove_device(struct scsi_device *sdev) **/ void scsi_remove_device(struct scsi_device *sdev) { - struct Scsi_Host *shost = sdev->host; - - mutex_lock(&shost->scan_mutex); __scsi_remove_device(sdev); - mutex_unlock(&shost->scan_mutex); } EXPORT_SYMBOL(scsi_remove_device);
When a device is deleted through sysfs handle "delete", the code locks shost->scan_mutex. If multiple devices are deleted at the same time, these deletes will be handled in series. On the other hand, some devices do long latency IO during deletion, for example, sd_shutdown() may do sync cache and/or start_stop. It is not necessary for these commands to run in series. To reduce latency of parallel "delete" requests, this patch reduces the protection of scan_mutex. The only function with Scsi_Host called in __scsi_remove_device() is the optional slave_destroy(). Therefore, the protection of scan_mutex is only necessary for this function. Signed-off-by: Song Liu <songliubraving@fb.com> --- drivers/scsi/scsi_sysfs.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)