diff mbox series

[v3,1/2] scsi: core: Make sure that hosts outlive targets and devices

Message ID 20220707182122.3797-2-bvanassche@acm.org (mailing list archive)
State Superseded
Headers show
Series Call blk_mq_free_tag_set() earlier | expand

Commit Message

Bart Van Assche July 7, 2022, 6:21 p.m. UTC
From: Ming Lei <ming.lei@redhat.com>

Fix the race conditions between SCSI LLD kernel module unloading and SCSI
device and target removal by making sure that SCSI hosts are destroyed after
all associated target and device objects have been freed.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ bvanassche: Reworked Ming's patch ]
---
 drivers/scsi/hosts.c      | 9 +++++++++
 drivers/scsi/scsi.c       | 9 ++++++---
 drivers/scsi/scsi_scan.c  | 7 +++++++
 drivers/scsi/scsi_sysfs.c | 9 ---------
 include/scsi/scsi_host.h  | 3 +++
 5 files changed, 25 insertions(+), 12 deletions(-)

Comments

Mike Christie July 9, 2022, 3:57 p.m. UTC | #1
On 7/7/22 1:21 PM, Bart Van Assche wrote:
> From: Ming Lei <ming.lei@redhat.com>
> 
> Fix the race conditions between SCSI LLD kernel module unloading and SCSI
> device and target removal by making sure that SCSI hosts are destroyed after
> all associated target and device objects have been freed.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: John Garry <john.garry@huawei.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> [ bvanassche: Reworked Ming's patch ]
> ---
>  drivers/scsi/hosts.c      | 9 +++++++++
>  drivers/scsi/scsi.c       | 9 ++++++---
>  drivers/scsi/scsi_scan.c  | 7 +++++++
>  drivers/scsi/scsi_sysfs.c | 9 ---------
>  include/scsi/scsi_host.h  | 3 +++
>  5 files changed, 25 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index ef6c0e37acce..e0a56a8f1f74 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -168,6 +168,7 @@ void scsi_remove_host(struct Scsi_Host *shost)
>  
>  	mutex_lock(&shost->scan_mutex);
>  	spin_lock_irqsave(shost->host_lock, flags);
> +	/* Prevent that new SCSI targets or SCSI devices are added. */
>  	if (scsi_host_set_state(shost, SHOST_CANCEL))
>  		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY)) {
>  			spin_unlock_irqrestore(shost->host_lock, flags);
> @@ -190,6 +191,13 @@ void scsi_remove_host(struct Scsi_Host *shost)
>  	transport_unregister_device(&shost->shost_gendev);
>  	device_unregister(&shost->shost_dev);
>  	device_del(&shost->shost_gendev);
> +
> +	/*
> +	 * After scsi_remove_host() has returned the scsi LLD module can be
> +	 * unloaded and/or the host resources can be released. Hence wait until
> +	 * the dependent SCSI targets and devices are gone before returning.
> +	 */
> +	wait_event(shost->targets_wq, atomic_read(&shost->target_count) == 0);


Do we still have a possible use after free at the target removal level?

If you have a driver supports multiple targets and target removal (any of
he FC ones, HW iscsi, etc), then you can still hit:

1. thread1 does sysfs device delete. It's now waiting in blk_cleanup_queue
which is waiting on a cmd that has the SCSI error handler running on it or
for whatever reason.

2. thread2 decides to delete the target (dev loss tmo or user request). That
hits __scsi_remove_device for the device in #1 and sees it's in SDEV_DEL
state so it returns.

3. scsi_remove_target returns. The transport/driver then frees it's rport/session
for that target.

4. The thread1 in then makes progress in the EH callback and wants to reference
the rport/session struct we deleted in #3.

The drivers want to know that after scsi_remove_target has returned that nothing
is using devices under it similar to the scsi_remove_host case.

Every scsi_device has a scsi_target as a parent (scsi_device -> scsi_target) or
grandparent (scsi_device -> transport class struct like rport/session ->
scsi_target) now right. I was thinking maybe like 20 years ago when scsi_forget_host
was made we didn't?

If so, could we move what are doing in this patch down a level? Put the wait in
scsi_remove_target and wake in scsi_target_dev_release. Instead of a target_count
you have a scsi_target sdev_count.

scsi_forget_host would then need to loop over the targets and do
scsi_target_remove on them instead of doing it at the scsi_device level.
Bart Van Assche July 12, 2022, 10:29 p.m. UTC | #2
On 7/9/22 08:57, Mike Christie wrote:
> If so, could we move what are doing in this patch down a level? Put the wait in
> scsi_remove_target and wake in scsi_target_dev_release. Instead of a target_count
> you have a scsi_target sdev_count.
> 
> scsi_forget_host would then need to loop over the targets and do
> scsi_target_remove on them instead of doing it at the scsi_device level.

Hi Mike,

Thanks for having taken a look.

Could the approach outlined above have user-visible side effects?

A different approach has been selected for v4 of this patch series. Further feedback
is welcome.

Thanks,

Bart.
Mike Christie July 14, 2022, 3:32 p.m. UTC | #3
On 7/12/22 5:29 PM, Bart Van Assche wrote:
> On 7/9/22 08:57, Mike Christie wrote:
>> If so, could we move what are doing in this patch down a level? Put the wait in
>> scsi_remove_target and wake in scsi_target_dev_release. Instead of a target_count
>> you have a scsi_target sdev_count.
>>
>> scsi_forget_host would then need to loop over the targets and do
>> scsi_target_remove on them instead of doing it at the scsi_device level.
> 
> Hi Mike,
> 
> Thanks for having taken a look.
> 
> Could the approach outlined above have user-visible side effects?
> 

What kind of scenario are you thinking about?

I think we would have similar behavior today for the target behavior:

1. With the current code, if there is no sysfs deletion going on and
scsi_remove_target's call to __scsi_remove_device is the one that blocks
on blk_cleanup_queue then the target removal would wait.


2. With the current code we have:

	1. syfs deletion runs scsi_remove_target -> scsi_remove_device and
that takes the scan_mutex.
	2. scsi_remove_target passed the SDEV_DEL check and is calling
scsi_remove_device which is waiting on the scan_mutex.

So here scsi_remove_target waits for the deletion similar to what I described
above.

My suggestion just handles the case where

1. syfs deletion runs scsi_remove_target -> scsi_remove_device and
that takes the scan_mutex.

It then sets the state to SDEV_DEL.

2. scsi_remove_target runs and see the device is in the SDEV_DEL state.
It skips the device and then we return from scsi_remove_target without
having waited on that device's removal.




> A different approach has been selected for v4 of this patch series. Further feedback
> is welcome.
> 
> Thanks,
> 
> Bart.
Mike Christie July 14, 2022, 3:40 p.m. UTC | #4
On 7/14/22 10:32 AM, Mike Christie wrote:
> On 7/12/22 5:29 PM, Bart Van Assche wrote:
>> On 7/9/22 08:57, Mike Christie wrote:
>>> If so, could we move what are doing in this patch down a level? Put the wait in
>>> scsi_remove_target and wake in scsi_target_dev_release. Instead of a target_count
>>> you have a scsi_target sdev_count.
>>>
>>> scsi_forget_host would then need to loop over the targets and do
>>> scsi_target_remove on them instead of doing it at the scsi_device level.
>>
>> Hi Mike,
>>
>> Thanks for having taken a look.
>>
>> Could the approach outlined above have user-visible side effects?
>>
> 
> What kind of scenario are you thinking about?
> 
> I think we would have similar behavior today for the target behavior:
> 
> 1. With the current code, if there is no sysfs deletion going on and
> scsi_remove_target's call to __scsi_remove_device is the one that blocks
> on blk_cleanup_queue then the target removal would wait.
> 
> 
> 2. With the current code we have:
> 
> 	1. syfs deletion runs scsi_remove_target -> scsi_remove_device and
> that takes the scan_mutex.

Meant sdev_store_delete -> scsi_remove_device

> 	2. scsi_remove_target passed the SDEV_DEL check and is calling

Here I did mean scsi_remove_target. It's being called from something like
the fc or iscsi layer's transport threads.

> scsi_remove_device which is waiting on the scan_mutex.
> 
> So here scsi_remove_target waits for the deletion similar to what I described
> above.
> 
> My suggestion just handles the case where
> 
> 1. syfs deletion runs scsi_remove_target -> scsi_remove_device and
> that takes the scan_mutex.

Meant sdev_store_delete -> scsi_remove_device

> 
> It then sets the state to SDEV_DEL.
> 
> 2. scsi_remove_target runs and see the device is in the SDEV_DEL state.


Here I did mean scsi_remove_target. It's being called from something like
the fc or iscsi layer's transport threads.

> It skips the device and then we return from scsi_remove_target without
> having waited on that device's removal.
> 
> 
> 
> 
>> A different approach has been selected for v4 of this patch series. Further feedback
>> is welcome.
>>
>> Thanks,
>>
>> Bart.
>
Bart Van Assche July 14, 2022, 7:04 p.m. UTC | #5
On 7/9/22 08:57, Mike Christie wrote:
> If so, could we move what are doing in this patch down a level? Put the wait in
> scsi_remove_target and wake in scsi_target_dev_release. Instead of a target_count
> you have a scsi_target sdev_count.
> 
> scsi_forget_host would then need to loop over the targets and do
> scsi_target_remove on them instead of doing it at the scsi_device level.

I'm not sure how to implement the above since scsi_remove_target() skips 
targets that have one of the states STARGET_DEL, STARGET_REMOVE or 
STARGET_CREATED_REMOVE?

Thanks,

Bart.
diff mbox series

Patch

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index ef6c0e37acce..e0a56a8f1f74 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -168,6 +168,7 @@  void scsi_remove_host(struct Scsi_Host *shost)
 
 	mutex_lock(&shost->scan_mutex);
 	spin_lock_irqsave(shost->host_lock, flags);
+	/* Prevent that new SCSI targets or SCSI devices are added. */
 	if (scsi_host_set_state(shost, SHOST_CANCEL))
 		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY)) {
 			spin_unlock_irqrestore(shost->host_lock, flags);
@@ -190,6 +191,13 @@  void scsi_remove_host(struct Scsi_Host *shost)
 	transport_unregister_device(&shost->shost_gendev);
 	device_unregister(&shost->shost_dev);
 	device_del(&shost->shost_gendev);
+
+	/*
+	 * After scsi_remove_host() has returned the scsi LLD module can be
+	 * unloaded and/or the host resources can be released. Hence wait until
+	 * the dependent SCSI targets and devices are gone before returning.
+	 */
+	wait_event(shost->targets_wq, atomic_read(&shost->target_count) == 0);
 }
 EXPORT_SYMBOL(scsi_remove_host);
 
@@ -394,6 +402,7 @@  struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	INIT_LIST_HEAD(&shost->starved_list);
 	init_waitqueue_head(&shost->host_wait);
 	mutex_init(&shost->scan_mutex);
+	init_waitqueue_head(&shost->targets_wq);
 
 	index = ida_alloc(&host_index_ida, GFP_KERNEL);
 	if (index < 0) {
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index c59eac7a32f2..086ec5b5862d 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -586,10 +586,13 @@  EXPORT_SYMBOL(scsi_device_get);
  */
 void scsi_device_put(struct scsi_device *sdev)
 {
-	struct module *mod = sdev->host->hostt->module;
-
+	/*
+	 * Decreasing the module reference count before the device reference
+	 * count is safe since scsi_remove_host() only returns after all
+	 * devices have been removed.
+	 */
+	module_put(sdev->host->hostt->module);
 	put_device(&sdev->sdev_gendev);
-	module_put(mod);
 }
 EXPORT_SYMBOL(scsi_device_put);
 
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 91ac901a6682..00c5754f0081 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -406,9 +406,14 @@  static void scsi_target_destroy(struct scsi_target *starget)
 static void scsi_target_dev_release(struct device *dev)
 {
 	struct device *parent = dev->parent;
+	struct Scsi_Host *shost = dev_to_shost(parent);
 	struct scsi_target *starget = to_scsi_target(dev);
 
 	kfree(starget);
+
+	if (atomic_dec_return(&shost->target_count) == 0)
+		wake_up(&shost->targets_wq);
+
 	put_device(parent);
 }
 
@@ -521,6 +526,8 @@  static struct scsi_target *scsi_alloc_target(struct device *parent,
 	starget->state = STARGET_CREATED;
 	starget->scsi_level = SCSI_2;
 	starget->max_target_blocked = SCSI_DEFAULT_TARGET_BLOCKED;
+	atomic_inc(&shost->target_count);
+
  retry:
 	spin_lock_irqsave(shost->host_lock, flags);
 
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 43949798a2e4..72bf5131e9d8 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -450,12 +450,9 @@  static void scsi_device_dev_release_usercontext(struct work_struct *work)
 	struct scsi_vpd *vpd_pg0 = NULL, *vpd_pg89 = NULL;
 	struct scsi_vpd *vpd_pgb0 = NULL, *vpd_pgb1 = NULL, *vpd_pgb2 = NULL;
 	unsigned long flags;
-	struct module *mod;
 
 	sdev = container_of(work, struct scsi_device, ew.work);
 
-	mod = sdev->host->hostt->module;
-
 	scsi_dh_release_device(sdev);
 
 	parent = sdev->sdev_gendev.parent;
@@ -518,17 +515,11 @@  static void scsi_device_dev_release_usercontext(struct work_struct *work)
 
 	if (parent)
 		put_device(parent);
-	module_put(mod);
 }
 
 static void scsi_device_dev_release(struct device *dev)
 {
 	struct scsi_device *sdp = to_scsi_device(dev);
-
-	/* Set module pointer as NULL in case of module unloading */
-	if (!try_module_get(sdp->host->hostt->module))
-		sdp->host->hostt->module = NULL;
-
 	execute_in_process_context(scsi_device_dev_release_usercontext,
 				   &sdp->ew);
 }
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 667d889b92b5..339f975d356e 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -689,6 +689,9 @@  struct Scsi_Host {
 	/* ldm bits */
 	struct device		shost_gendev, shost_dev;
 
+	atomic_t		target_count;
+	wait_queue_head_t	targets_wq;
+
 	/*
 	 * Points to the transport data (if any) which is allocated
 	 * separately