diff mbox series

scsi: add non-sleeping variant of scsi_device_put() and use it in alua

Message ID 20230124120734.15806-1-mwilck@suse.com (mailing list archive)
State Superseded
Headers show
Series scsi: add non-sleeping variant of scsi_device_put() and use it in alua | expand

Commit Message

Martin Wilck Jan. 24, 2023, 12:07 p.m. UTC
From: Martin Wilck <mwilck@suse.com>

Since the might_sleep() annotation was added in scsi_device_put() and
alua_rtpg_queue(), we have seen repeated reports of "BUG: sleeping function
called from invalid context" [1], [2]. alua_rtpg_queue() is always called
from contexts where the caller must hold at least one reference to the
scsi device in question. This means that the reference taken by
alua_rtpg_queue() itself can't be the last one, and thus can be dropped
without entering the code path in which scsi_device_put() might actually
sleep.

Add a new helper function, scsi_device_put_nosleep() for cases like this,
where a device reference is put from atomic context, and at the same time
it is certain that this reference is not the last one, and use it from
alua_rtpg_queue().

[1] https://lore.kernel.org/linux-scsi/b49e37d5-edfb-4c56-3eeb-62c7d5855c00@linux.ibm.com/
[2] https://lore.kernel.org/linux-scsi/55c35e64-a7d4-9072-46fd-e8eae6a90e96@linux.ibm.com/

Fixes: f93ed747e2c7 ("scsi: core: Release SCSI devices synchronously")
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sachin Sant <sachinp@linux.ibm.com>
Cc: Benjamin Block <bblock@linux.ibm.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/device_handler/scsi_dh_alua.c |  4 +---
 drivers/scsi/scsi.c                        | 23 ++++++++++++++++++----
 include/scsi/scsi_device.h                 |  1 +
 3 files changed, 21 insertions(+), 7 deletions(-)

Comments

Steffen Maier Jan. 24, 2023, 1:06 p.m. UTC | #1
On 1/24/23 13:07, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Since the might_sleep() annotation was added in scsi_device_put() and
> alua_rtpg_queue(), we have seen repeated reports of "BUG: sleeping function
> called from invalid context" [1], [2]. alua_rtpg_queue() is always called
> from contexts where the caller must hold at least one reference to the
> scsi device in question. This means that the reference taken by
> alua_rtpg_queue() itself can't be the last one, and thus can be dropped
> without entering the code path in which scsi_device_put() might actually
> sleep.
> 
> Add a new helper function, scsi_device_put_nosleep() for cases like this,
> where a device reference is put from atomic context, and at the same time
> it is certain that this reference is not the last one, and use it from
> alua_rtpg_queue().
> 
> [1] https://lore.kernel.org/linux-scsi/b49e37d5-edfb-4c56-3eeb-62c7d5855c00@linux.ibm.com/
> [2] https://lore.kernel.org/linux-scsi/55c35e64-a7d4-9072-46fd-e8eae6a90e96@linux.ibm.com/
> 
> Fixes: f93ed747e2c7 ("scsi: core: Release SCSI devices synchronously")
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Sachin Sant <sachinp@linux.ibm.com>
> Cc: Benjamin Block <bblock@linux.ibm.com>
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Reported-by: Steffen Maier <maier@linux.ibm.com>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>   drivers/scsi/device_handler/scsi_dh_alua.c |  4 +---
>   drivers/scsi/scsi.c                        | 23 ++++++++++++++++++----
>   include/scsi/scsi_device.h                 |  1 +
>   3 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> index 49cc18a87473..bdfcea1c16cb 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -989,8 +989,6 @@ static bool alua_rtpg_queue(struct alua_port_group *pg,
>   	int start_queue = 0;
>   	unsigned long flags;
>   
> -	might_sleep();
> -
>   	if (WARN_ON_ONCE(!pg) || scsi_device_get(sdev))
>   		return false;
>   

I think Martin and James have already picked this hunk from Bart's last fixup 
[1] and sent a pull request including it to Linus [2].

At least, your patch did not apply cleanly for me, as I also had removed those 
two lines in my tree already.
Am applying the remaining parts to see how it performs in our CI (although this 
seems to occur seldomly so it'll be hard to tell whether it fixes the "problem").

[1] https://lore.kernel.org/linux-scsi/20230118180557.1212577-1-bvanassche@acm.org/
[2] 
https://lore.kernel.org/linux-scsi/87b5e16ec007de3523fd78534a48d6244bda3f46.camel@HansenPartnership.com/
Steffen Maier Jan. 24, 2023, 1:48 p.m. UTC | #2
On 1/24/23 13:07, mwilck@suse.com wrote:
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 1426b9b03612..eec52bb298a7 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -576,6 +576,24 @@ int scsi_device_get(struct scsi_device *sdev)
>   }
>   EXPORT_SYMBOL(scsi_device_get);
>   
> +/**
> + * scsi_device_put_nosleep  -  release a reference to a scsi_device
> + * @sdev:	device to release a reference on.
> + *
> + * Description: Release a reference to the scsi_device and decrements the use
> + * count of the underlying LLDD module. This function may only be called from
> + * a call context where it is certain that the reference dropped is not the
> + * last one.
> + */
> +void scsi_device_put_nosleep(struct scsi_device *sdev)
> +{
> +	struct module *mod = sdev->host->hostt->module;
> +
> +	put_device(&sdev->sdev_gendev);
> +	module_put(mod);
> +}
> +EXPORT_SYMBOL(scsi_device_put);

+EXPORT_SYMBOL(scsi_device_put_nosleep);


otherwise I get:

>   CC [M]  drivers/scsi/scsi.o
> In file included from ./include/linux/linkage.h:7,
>                  from ./include/linux/preempt.h:10,
>                  from ./arch/s390/include/asm/timex.h:13,
>                  from ./include/linux/timex.h:67,
>                  from ./include/linux/time32.h:13,
>                  from ./include/linux/time.h:60,
>                  from ./include/linux/stat.h:19,
>                  from ./include/linux/module.h:13,
>                  from drivers/scsi/scsi.c:41:
> ./include/linux/export.h:57:43: error: redefinition of ‘__ksymtab_scsi_device_put’
>    57 |         static const struct kernel_symbol __ksymtab_##sym               \
>       |                                           ^~~~~~~~~~
> ./include/linux/export.h:96:9: note: in expansion of macro ‘__KSYMTAB_ENTRY’
>    96 |         __KSYMTAB_ENTRY(sym, sec)
>       |         ^~~~~~~~~~~~~~~
> ./include/linux/export.h:140:41: note: in expansion of macro ‘___EXPORT_SYMBOL’
>   140 | #define __EXPORT_SYMBOL(sym, sec, ns)   ___EXPORT_SYMBOL(sym, sec, ns)
>       |                                         ^~~~~~~~~~~~~~~~
> ./include/linux/export.h:147:41: note: in expansion of macro ‘__EXPORT_SYMBOL’
>   147 | #define _EXPORT_SYMBOL(sym, sec)        __EXPORT_SYMBOL(sym, sec, "")
>       |                                         ^~~~~~~~~~~~~~~
> ./include/linux/export.h:150:41: note: in expansion of macro ‘_EXPORT_SYMBOL’
>   150 | #define EXPORT_SYMBOL(sym)              _EXPORT_SYMBOL(sym, "")
>       |                                         ^~~~~~~~~~~~~~
> drivers/scsi/scsi.c:611:1: note: in expansion of macro ‘EXPORT_SYMBOL’
>   611 | EXPORT_SYMBOL(scsi_device_put);
>       | ^~~~~~~~~~~~~
> ./include/linux/export.h:57:43: note: previous definition of ‘__ksymtab_scsi_device_put’ with type ‘const struct kernel_symbol’
>    57 |         static const struct kernel_symbol __ksymtab_##sym               \
>       |                                           ^~~~~~~~~~
> ./include/linux/export.h:96:9: note: in expansion of macro ‘__KSYMTAB_ENTRY’
>    96 |         __KSYMTAB_ENTRY(sym, sec)
>       |         ^~~~~~~~~~~~~~~
> ./include/linux/export.h:140:41: note: in expansion of macro ‘___EXPORT_SYMBOL’
>   140 | #define __EXPORT_SYMBOL(sym, sec, ns)   ___EXPORT_SYMBOL(sym, sec, ns)
>       |                                         ^~~~~~~~~~~~~~~~
> ./include/linux/export.h:147:41: note: in expansion of macro ‘__EXPORT_SYMBOL’
>   147 | #define _EXPORT_SYMBOL(sym, sec)        __EXPORT_SYMBOL(sym, sec, "")
>       |                                         ^~~~~~~~~~~~~~~
> ./include/linux/export.h:150:41: note: in expansion of macro ‘_EXPORT_SYMBOL’
>   150 | #define EXPORT_SYMBOL(sym)              _EXPORT_SYMBOL(sym, "")
>       |                                         ^~~~~~~~~~~~~~
> drivers/scsi/scsi.c:595:1: note: in expansion of macro ‘EXPORT_SYMBOL’
>   595 | EXPORT_SYMBOL(scsi_device_put);
>       | ^~~~~~~~~~~~~
diff mbox series

Patch

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 49cc18a87473..bdfcea1c16cb 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -989,8 +989,6 @@  static bool alua_rtpg_queue(struct alua_port_group *pg,
 	int start_queue = 0;
 	unsigned long flags;
 
-	might_sleep();
-
 	if (WARN_ON_ONCE(!pg) || scsi_device_get(sdev))
 		return false;
 
@@ -1031,7 +1029,7 @@  static bool alua_rtpg_queue(struct alua_port_group *pg,
 			kref_put(&pg->kref, release_port_group);
 	}
 	if (sdev)
-		scsi_device_put(sdev);
+		scsi_device_put_nosleep(sdev);
 
 	return true;
 }
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 1426b9b03612..eec52bb298a7 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -576,6 +576,24 @@  int scsi_device_get(struct scsi_device *sdev)
 }
 EXPORT_SYMBOL(scsi_device_get);
 
+/**
+ * scsi_device_put_nosleep  -  release a reference to a scsi_device
+ * @sdev:	device to release a reference on.
+ *
+ * Description: Release a reference to the scsi_device and decrements the use
+ * count of the underlying LLDD module. This function may only be called from
+ * a call context where it is certain that the reference dropped is not the
+ * last one.
+ */
+void scsi_device_put_nosleep(struct scsi_device *sdev)
+{
+	struct module *mod = sdev->host->hostt->module;
+
+	put_device(&sdev->sdev_gendev);
+	module_put(mod);
+}
+EXPORT_SYMBOL(scsi_device_put);
+
 /**
  * scsi_device_put  -  release a reference to a scsi_device
  * @sdev:	device to release a reference on.
@@ -586,12 +604,9 @@  EXPORT_SYMBOL(scsi_device_get);
  */
 void scsi_device_put(struct scsi_device *sdev)
 {
-	struct module *mod = sdev->host->hostt->module;
-
 	might_sleep();
 
-	put_device(&sdev->sdev_gendev);
-	module_put(mod);
+	scsi_device_put_nosleep(sdev);
 }
 EXPORT_SYMBOL(scsi_device_put);
 
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 3642b8e3928b..85cffa07f2c0 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -365,6 +365,7 @@  void scsi_attach_vpd(struct scsi_device *sdev);
 
 extern struct scsi_device *scsi_device_from_queue(struct request_queue *q);
 extern int __must_check scsi_device_get(struct scsi_device *);
+extern void scsi_device_put_nosleep(struct scsi_device *);
 extern void scsi_device_put(struct scsi_device *);
 extern struct scsi_device *scsi_device_lookup(struct Scsi_Host *,
 					      uint, uint, u64);