diff mbox series

scsi: core: Fix the scsi_device_put() might_sleep annotation

Message ID 20230125194311.249553-1-bvanassche@acm.org (mailing list archive)
State Accepted
Headers show
Series scsi: core: Fix the scsi_device_put() might_sleep annotation | expand

Commit Message

Bart Van Assche Jan. 25, 2023, 7:43 p.m. UTC
Although most calls of scsi_device_put() happen from non-atomic context,
alua_rtpg_queue() calls this function from atomic context if
alua_rtpg_queue() itself is called from atomic context. alua_rtpg_queue()
is always called from contexts where the caller must hold at least one
reference to the scsi device in question. This means that the reference
taken by alua_rtpg_queue() itself can't be the last one, and thus can be
dropped without entering the code path in which scsi_device_put() might
actually sleep. Hence move the might_sleep() annotation from
scsi_device_put() into scsi_device_dev_release().

[1] https://lore.kernel.org/linux-scsi/b49e37d5-edfb-4c56-3eeb-62c7d5855c00@linux.ibm.com/
[2] https://lore.kernel.org/linux-scsi/55c35e64-a7d4-9072-46fd-e8eae6a90e96@linux.ibm.com/

Note: a significant part of the above description was written by Martin
Wilck.

Fixes: f93ed747e2c7 ("scsi: core: Release SCSI devices synchronously")
Cc: Martin Wilck <mwilck@suse.com>
Cc: Steffen Maier <maier@linux.ibm.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sachin Sant <sachinp@linux.ibm.com>
Cc: Benjamin Block <bblock@linux.ibm.com>
Reported-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi.c       | 2 --
 drivers/scsi/scsi_sysfs.c | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Martin Wilck Jan. 25, 2023, 8:03 p.m. UTC | #1
On Wed, 2023-01-25 at 11:43 -0800, Bart Van Assche wrote:
> Although most calls of scsi_device_put() happen from non-atomic
> context,
> alua_rtpg_queue() calls this function from atomic context if
> alua_rtpg_queue() itself is called from atomic context.
> alua_rtpg_queue()
> is always called from contexts where the caller must hold at least
> one
> reference to the scsi device in question. This means that the
> reference
> taken by alua_rtpg_queue() itself can't be the last one, and thus can
> be
> dropped without entering the code path in which scsi_device_put()
> might
> actually sleep. Hence move the might_sleep() annotation from
> scsi_device_put() into scsi_device_dev_release().
> 
> [1]
> https://lore.kernel.org/linux-scsi/b49e37d5-edfb-4c56-3eeb-62c7d5855c00@linux.ibm.com/
> [2]
> https://lore.kernel.org/linux-scsi/55c35e64-a7d4-9072-46fd-e8eae6a90e96@linux.ibm.com/
> 
> Note: a significant part of the above description was written by
> Martin
> Wilck.
> 
> Fixes: f93ed747e2c7 ("scsi: core: Release SCSI devices
> synchronously")
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: Steffen Maier <maier@linux.ibm.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Sachin Sant <sachinp@linux.ibm.com>
> Cc: Benjamin Block <bblock@linux.ibm.com>
> Reported-by: Steffen Maier <maier@linux.ibm.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Martin Wilck <mwilck@suse.com>
Martin K. Petersen Jan. 27, 2023, 3:22 a.m. UTC | #2
On Wed, 25 Jan 2023 11:43:11 -0800, Bart Van Assche wrote:

> Although most calls of scsi_device_put() happen from non-atomic context,
> alua_rtpg_queue() calls this function from atomic context if
> alua_rtpg_queue() itself is called from atomic context. alua_rtpg_queue()
> is always called from contexts where the caller must hold at least one
> reference to the scsi device in question. This means that the reference
> taken by alua_rtpg_queue() itself can't be the last one, and thus can be
> dropped without entering the code path in which scsi_device_put() might
> actually sleep. Hence move the might_sleep() annotation from
> scsi_device_put() into scsi_device_dev_release().
> 
> [...]

Applied to 6.2/scsi-fixes, thanks!

[1/1] scsi: core: Fix the scsi_device_put() might_sleep annotation
      https://git.kernel.org/mkp/scsi/c/2542fc9578d4
Christoph Hellwig Jan. 30, 2023, 7:58 a.m. UTC | #3
On Thu, Jan 26, 2023 at 10:22:12PM -0500, Martin K. Petersen wrote:
> On Wed, 25 Jan 2023 11:43:11 -0800, Bart Van Assche wrote:
> 
> > Although most calls of scsi_device_put() happen from non-atomic context,
> > alua_rtpg_queue() calls this function from atomic context if
> > alua_rtpg_queue() itself is called from atomic context. alua_rtpg_queue()
> > is always called from contexts where the caller must hold at least one
> > reference to the scsi device in question. This means that the reference
> > taken by alua_rtpg_queue() itself can't be the last one, and thus can be
> > dropped without entering the code path in which scsi_device_put() might
> > actually sleep. Hence move the might_sleep() annotation from
> > scsi_device_put() into scsi_device_dev_release().
> > 
> > [...]
> 
> Applied to 6.2/scsi-fixes, thanks!

This is a really bad idea.  Instead of actually catching scsi_device_put
put from a wrong context all the time, it now limits to the final put
and thus making the annotation a lot less useful.
Martin Wilck Jan. 30, 2023, 5:04 p.m. UTC | #4
On Sun, 2023-01-29 at 23:58 -0800, Christoph Hellwig wrote:
> On Thu, Jan 26, 2023 at 10:22:12PM -0500, Martin K. Petersen wrote:
> > On Wed, 25 Jan 2023 11:43:11 -0800, Bart Van Assche wrote:
> > 
> > > Although most calls of scsi_device_put() happen from non-atomic
> > > context,
> > > alua_rtpg_queue() calls this function from atomic context if
> > > alua_rtpg_queue() itself is called from atomic context.
> > > alua_rtpg_queue()
> > > is always called from contexts where the caller must hold at
> > > least one
> > > reference to the scsi device in question. This means that the
> > > reference
> > > taken by alua_rtpg_queue() itself can't be the last one, and thus
> > > can be
> > > dropped without entering the code path in which scsi_device_put()
> > > might
> > > actually sleep. Hence move the might_sleep() annotation from
> > > scsi_device_put() into scsi_device_dev_release().
> > > 
> > > [...]
> > 
> > Applied to 6.2/scsi-fixes, thanks!
> 
> This is a really bad idea.  Instead of actually catching
> scsi_device_put
> put from a wrong context all the time, it now limits to the final put
> and thus making the annotation a lot less useful.

The other idea that we discussed was this:

https://lore.kernel.org/linux-scsi/53321793-fede-f84e-260a-9d6bc0bc2b6c@acm.org/T/#t

That approach would still catch other wrong call contexts (i.e. from
outside the ALUA code). But it requires adding a new exported function
in scsi.c. Would you prefer that, or do you have something completely
different in mind?

Idea #3 was to use a deferred call (e.g. a workqueue) for dropping the
additional ref that had been takein in alua_rtpg_queue().

Thanks,
Martin
diff mbox series

Patch

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 1426b9b03612..9feb0323bc44 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -588,8 +588,6 @@  void scsi_device_put(struct scsi_device *sdev)
 {
 	struct module *mod = sdev->host->hostt->module;
 
-	might_sleep();
-
 	put_device(&sdev->sdev_gendev);
 	module_put(mod);
 }
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 981d1bab2120..8ef9a5494340 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -451,6 +451,8 @@  static void scsi_device_dev_release(struct device *dev)
 	struct scsi_vpd *vpd_pgb0 = NULL, *vpd_pgb1 = NULL, *vpd_pgb2 = NULL;
 	unsigned long flags;
 
+	might_sleep();
+
 	scsi_dh_release_device(sdev);
 
 	parent = sdev->sdev_gendev.parent;