Message ID | 20220628175612.2157218-1-bvanassche@acm.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | scsi: core: Call blk_mq_free_tag_set() earlier | expand |
On Tue, Jun 28, 2022 at 10:56:12AM -0700, Bart Van Assche wrote: > There are two .exit_cmd_priv implementations. Both implementations use the > SCSI host pointer. Make sure that the SCSI host pointer is valid when > .exit_cmd_priv is called by moving the .exit_cmd_priv calls from > scsi_device_dev_release() to scsi_forget_host(). Moving > blk_mq_free_tag_set() from scsi_device_dev_release() to scsi_forget_host() > is safe because scsi_forget_host() drains all the request queues that use > the host tag set. This guarantees that no requests are in flight and also > that no new requests will be allocated from the host tag set. Not sure scsi_forget_host really drains all queues since it bypasses sdev which state is SDEV_DEL, so removal for this sdev could be in-progress, not done yet. Thanks, Ming
On 6/28/22 18:17, Ming Lei wrote: > On Tue, Jun 28, 2022 at 10:56:12AM -0700, Bart Van Assche wrote: >> There are two .exit_cmd_priv implementations. Both implementations use the >> SCSI host pointer. Make sure that the SCSI host pointer is valid when >> .exit_cmd_priv is called by moving the .exit_cmd_priv calls from >> scsi_device_dev_release() to scsi_forget_host(). Moving >> blk_mq_free_tag_set() from scsi_device_dev_release() to scsi_forget_host() >> is safe because scsi_forget_host() drains all the request queues that use >> the host tag set. This guarantees that no requests are in flight and also >> that no new requests will be allocated from the host tag set. > > Not sure scsi_forget_host really drains all queues since it bypasses > sdev which state is SDEV_DEL, so removal for this sdev could be > in-progress, not done yet. Ah, that's right. How about making scsi_forget_host() wait until all request activity on the associated queues has stopped, e.g. by replacing the current scsi_forget_host() implementation by the one below? /** * scsi_forget_host() - Remove all SCSI devices from a host. * @shost: SCSI host to remove devices from. * * Removes all SCSI devices that have not yet been removed. For the SCSI devices * for which removal started before scsi_forget_host(), wait until the * associated request queue has reached the "dead" state. In that state it is * guaranteed that no new requests will be allocated and also that no requests * are in progress anymore. */ void scsi_forget_host(struct Scsi_Host *shost) { struct scsi_device *sdev; might_sleep(); restart: spin_lock_irq(shost->host_lock); list_for_each_entry(sdev, &shost->__devices, siblings) { if (sdev->sdev_state == SDEV_DEL && blk_queue_dead(sdev->request_queue)) { continue; } if (sdev->sdev_state == SDEV_DEL) { get_device(&sdev->sdev_gendev); spin_unlock_irq(shost->host_lock); while (!blk_queue_dead(sdev->request_queue)) msleep(10); spin_lock_irq(shost->host_lock); put_device(&sdev->sdev_gendev); goto restart; } spin_unlock_irq(shost->host_lock); __scsi_remove_device(sdev); goto restart; } spin_unlock_irq(shost->host_lock); } Thanks, Bart.
Hi Bart, I'd rather to understand the issue first. On Wed, Jun 29, 2022 at 02:49:27PM -0700, Bart Van Assche wrote: > On 6/28/22 18:17, Ming Lei wrote: > > On Tue, Jun 28, 2022 at 10:56:12AM -0700, Bart Van Assche wrote: > > > There are two .exit_cmd_priv implementations. Both implementations use the > > > SCSI host pointer. Make sure that the SCSI host pointer is valid when > > > .exit_cmd_priv is called by moving the .exit_cmd_priv calls from > > > scsi_device_dev_release() to scsi_forget_host(). Moving .exit_cmd_priv is actually called from scsi_host_dev_release() instead of scsi_device_dev_release(). Both scsi host pointer and host->shost_data is still valid when calling .exit_cmd_priv via scsi_mq_destroy_tags(). Previously I fixed[1] one similar issue, and that is caused by early module unloading, and anywhere host->hostt is referred, the scsi driver module should be prevented from being unloaded. [1] f2b85040acec scsi: core: Put LLD module refcnt after SCSI device is released Thanks, Ming
On Thu, Jun 30, 2022 at 09:01:39AM +0800, Ming Lei wrote: > Hi Bart, > > I'd rather to understand the issue first. > > On Wed, Jun 29, 2022 at 02:49:27PM -0700, Bart Van Assche wrote: > > On 6/28/22 18:17, Ming Lei wrote: > > > On Tue, Jun 28, 2022 at 10:56:12AM -0700, Bart Van Assche wrote: > > > > There are two .exit_cmd_priv implementations. Both implementations use the > > > > SCSI host pointer. Make sure that the SCSI host pointer is valid when > > > > .exit_cmd_priv is called by moving the .exit_cmd_priv calls from > > > > scsi_device_dev_release() to scsi_forget_host(). Moving > > .exit_cmd_priv is actually called from scsi_host_dev_release() instead > of scsi_device_dev_release(). Both scsi host pointer and host->shost_data is > still valid when calling .exit_cmd_priv via scsi_mq_destroy_tags(). > > Previously I fixed[1] one similar issue, and that is caused by early module > unloading, and anywhere host->hostt is referred, the scsi driver module > should be prevented from being unloaded. > > > [1] f2b85040acec scsi: core: Put LLD module refcnt after SCSI device is released Hi Bart, BTW, Changhui reported one very similar issue when running elevator switch/scsi debug LUN hotplug. From Changhui's report, the issue is basically same with what f2b85040acec tried to address, but the try_module_get() in scsi_device_dev_release() may fail, so the scsi_debug module still can be unloaded. The thing is that sdev can be released in async style, and target/host release is triggered by scsi_device_dev_release_usercontext(). So after scsi_host_remove() returns, the shost may still be live from driver core/sysfs viewpoint, and its release handler can be called after the LLD module is unloaded. Then this kind of issue is triggered. Seems there are at least two approaches for fixing the issue: 1) the one suggested in this thread: - moving any reference to shost->hostt in host release handler into scsi_host_remove(), and scsi_mq_destroy_tags()/scsi_proc_hostdir_rm(shost->hostt)() should be covered at least 2) wait until all targets are released in scsi_host_remove() I am fine with either of the two approaches. Bart, please let me know if you are working towards the approach in 1). If not, I have one patch which implements 2). BTW, after either 1) or 2) is done, commit f2b85040acec can be reverted. Thanks, Ming
On 6/29/22 18:01, Ming Lei wrote: > .exit_cmd_priv is actually called from scsi_host_dev_release() instead > of scsi_device_dev_release(). Agreed, I will fix this in the patch description. > Both scsi host pointer and host->shost_data is > still valid when calling .exit_cmd_priv via scsi_mq_destroy_tags(). I will change the patch description to make it clear that both exit_cmd_priv implementations use resources associated with the SCSI host. Thanks, Bart.
On 6/30/22 01:57, Ming Lei wrote: > BTW, Changhui reported one very similar issue when running elevator > switch/scsi debug LUN hotplug. > >>From Changhui's report, the issue is basically same with what > f2b85040acec tried to address, but the try_module_get() in > scsi_device_dev_release() may fail, so the scsi_debug module > still can be unloaded. > > The thing is that sdev can be released in async style, and target/host > release is triggered by scsi_device_dev_release_usercontext(). > > So after scsi_host_remove() returns, the shost may still be live from > driver core/sysfs viewpoint, and its release handler can be called > after the LLD module is unloaded. Then this kind of issue is triggered. > > Seems there are at least two approaches for fixing the issue: > > 1) the one suggested in this thread: > - moving any reference to shost->hostt in host release handler into > scsi_host_remove(), and scsi_mq_destroy_tags()/scsi_proc_hostdir_rm(shost->hostt)() > should be covered at least > > 2) wait until all targets are released in scsi_host_remove() > > I am fine with either of the two approaches. > > Bart, please let me know if you are working towards the approach in 1). > If not, I have one patch which implements 2). > > BTW, after either 1) or 2) is done, commit f2b85040acec can be reverted. Hi Ming, All I need is that all activity on the host tag set has stopped by the time scsi_forget_host() returns. I do not need (1) or (2) to fix the bug reported in the description of the patch at the start of this thread. Thanks, Bart.
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index ef6c0e37acce..74bfa187fe19 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -182,6 +182,8 @@ void scsi_remove_host(struct Scsi_Host *shost) mutex_unlock(&shost->scan_mutex); scsi_proc_host_rm(shost); + scsi_mq_destroy_tags(shost); + spin_lock_irqsave(shost->host_lock, flags); if (scsi_host_set_state(shost, SHOST_DEL)) BUG_ON(scsi_host_set_state(shost, SHOST_DEL_RECOVERY)); @@ -295,8 +297,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, return error; /* - * Any host allocation in this function will be freed in - * scsi_host_dev_release(). + * Any resources associated with the SCSI host in this function except + * the tag set will be freed by scsi_host_dev_release(). */ out_del_dev: device_del(&shost->shost_dev); @@ -312,6 +314,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, pm_runtime_disable(&shost->shost_gendev); pm_runtime_set_suspended(&shost->shost_gendev); pm_runtime_put_noidle(&shost->shost_gendev); + scsi_mq_destroy_tags(shost); fail: return error; } @@ -345,9 +348,6 @@ static void scsi_host_dev_release(struct device *dev) kfree(dev_name(&shost->shost_dev)); } - if (shost->tag_set.tags) - scsi_mq_destroy_tags(shost); - kfree(shost->shost_data); ida_free(&host_index_ida, shost->host_no); diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 6ffc9e4258a8..1aa1a279f8f3 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1990,7 +1990,10 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost) void scsi_mq_destroy_tags(struct Scsi_Host *shost) { + if (!shost->tag_set.tags) + return; blk_mq_free_tag_set(&shost->tag_set); + WARN_ON_ONCE(shost->tag_set.tags); } /**