Message ID | 20230125194311.249553-1-bvanassche@acm.org (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | scsi: core: Fix the scsi_device_put() might_sleep annotation | expand |
On Wed, 2023-01-25 at 11:43 -0800, Bart Van Assche wrote: > Although most calls of scsi_device_put() happen from non-atomic > context, > alua_rtpg_queue() calls this function from atomic context if > alua_rtpg_queue() itself is called from atomic context. > alua_rtpg_queue() > is always called from contexts where the caller must hold at least > one > reference to the scsi device in question. This means that the > reference > taken by alua_rtpg_queue() itself can't be the last one, and thus can > be > dropped without entering the code path in which scsi_device_put() > might > actually sleep. Hence move the might_sleep() annotation from > scsi_device_put() into scsi_device_dev_release(). > > [1] > https://lore.kernel.org/linux-scsi/b49e37d5-edfb-4c56-3eeb-62c7d5855c00@linux.ibm.com/ > [2] > https://lore.kernel.org/linux-scsi/55c35e64-a7d4-9072-46fd-e8eae6a90e96@linux.ibm.com/ > > Note: a significant part of the above description was written by > Martin > Wilck. > > Fixes: f93ed747e2c7 ("scsi: core: Release SCSI devices > synchronously") > Cc: Martin Wilck <mwilck@suse.com> > Cc: Steffen Maier <maier@linux.ibm.com> > Cc: Hannes Reinecke <hare@suse.de> > Cc: Sachin Sant <sachinp@linux.ibm.com> > Cc: Benjamin Block <bblock@linux.ibm.com> > Reported-by: Steffen Maier <maier@linux.ibm.com> > Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Martin Wilck <mwilck@suse.com>
On Wed, 25 Jan 2023 11:43:11 -0800, Bart Van Assche wrote: > Although most calls of scsi_device_put() happen from non-atomic context, > alua_rtpg_queue() calls this function from atomic context if > alua_rtpg_queue() itself is called from atomic context. alua_rtpg_queue() > is always called from contexts where the caller must hold at least one > reference to the scsi device in question. This means that the reference > taken by alua_rtpg_queue() itself can't be the last one, and thus can be > dropped without entering the code path in which scsi_device_put() might > actually sleep. Hence move the might_sleep() annotation from > scsi_device_put() into scsi_device_dev_release(). > > [...] Applied to 6.2/scsi-fixes, thanks! [1/1] scsi: core: Fix the scsi_device_put() might_sleep annotation https://git.kernel.org/mkp/scsi/c/2542fc9578d4
On Thu, Jan 26, 2023 at 10:22:12PM -0500, Martin K. Petersen wrote: > On Wed, 25 Jan 2023 11:43:11 -0800, Bart Van Assche wrote: > > > Although most calls of scsi_device_put() happen from non-atomic context, > > alua_rtpg_queue() calls this function from atomic context if > > alua_rtpg_queue() itself is called from atomic context. alua_rtpg_queue() > > is always called from contexts where the caller must hold at least one > > reference to the scsi device in question. This means that the reference > > taken by alua_rtpg_queue() itself can't be the last one, and thus can be > > dropped without entering the code path in which scsi_device_put() might > > actually sleep. Hence move the might_sleep() annotation from > > scsi_device_put() into scsi_device_dev_release(). > > > > [...] > > Applied to 6.2/scsi-fixes, thanks! This is a really bad idea. Instead of actually catching scsi_device_put put from a wrong context all the time, it now limits to the final put and thus making the annotation a lot less useful.
On Sun, 2023-01-29 at 23:58 -0800, Christoph Hellwig wrote: > On Thu, Jan 26, 2023 at 10:22:12PM -0500, Martin K. Petersen wrote: > > On Wed, 25 Jan 2023 11:43:11 -0800, Bart Van Assche wrote: > > > > > Although most calls of scsi_device_put() happen from non-atomic > > > context, > > > alua_rtpg_queue() calls this function from atomic context if > > > alua_rtpg_queue() itself is called from atomic context. > > > alua_rtpg_queue() > > > is always called from contexts where the caller must hold at > > > least one > > > reference to the scsi device in question. This means that the > > > reference > > > taken by alua_rtpg_queue() itself can't be the last one, and thus > > > can be > > > dropped without entering the code path in which scsi_device_put() > > > might > > > actually sleep. Hence move the might_sleep() annotation from > > > scsi_device_put() into scsi_device_dev_release(). > > > > > > [...] > > > > Applied to 6.2/scsi-fixes, thanks! > > This is a really bad idea. Instead of actually catching > scsi_device_put > put from a wrong context all the time, it now limits to the final put > and thus making the annotation a lot less useful. The other idea that we discussed was this: https://lore.kernel.org/linux-scsi/53321793-fede-f84e-260a-9d6bc0bc2b6c@acm.org/T/#t That approach would still catch other wrong call contexts (i.e. from outside the ALUA code). But it requires adding a new exported function in scsi.c. Would you prefer that, or do you have something completely different in mind? Idea #3 was to use a deferred call (e.g. a workqueue) for dropping the additional ref that had been takein in alua_rtpg_queue(). Thanks, Martin
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 1426b9b03612..9feb0323bc44 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -588,8 +588,6 @@ void scsi_device_put(struct scsi_device *sdev) { struct module *mod = sdev->host->hostt->module; - might_sleep(); - put_device(&sdev->sdev_gendev); module_put(mod); } diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 981d1bab2120..8ef9a5494340 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -451,6 +451,8 @@ static void scsi_device_dev_release(struct device *dev) struct scsi_vpd *vpd_pgb0 = NULL, *vpd_pgb1 = NULL, *vpd_pgb2 = NULL; unsigned long flags; + might_sleep(); + scsi_dh_release_device(sdev); parent = sdev->sdev_gendev.parent;
Although most calls of scsi_device_put() happen from non-atomic context, alua_rtpg_queue() calls this function from atomic context if alua_rtpg_queue() itself is called from atomic context. alua_rtpg_queue() is always called from contexts where the caller must hold at least one reference to the scsi device in question. This means that the reference taken by alua_rtpg_queue() itself can't be the last one, and thus can be dropped without entering the code path in which scsi_device_put() might actually sleep. Hence move the might_sleep() annotation from scsi_device_put() into scsi_device_dev_release(). [1] https://lore.kernel.org/linux-scsi/b49e37d5-edfb-4c56-3eeb-62c7d5855c00@linux.ibm.com/ [2] https://lore.kernel.org/linux-scsi/55c35e64-a7d4-9072-46fd-e8eae6a90e96@linux.ibm.com/ Note: a significant part of the above description was written by Martin Wilck. Fixes: f93ed747e2c7 ("scsi: core: Release SCSI devices synchronously") Cc: Martin Wilck <mwilck@suse.com> Cc: Steffen Maier <maier@linux.ibm.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Sachin Sant <sachinp@linux.ibm.com> Cc: Benjamin Block <bblock@linux.ibm.com> Reported-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> --- drivers/scsi/scsi.c | 2 -- drivers/scsi/scsi_sysfs.c | 2 ++ 2 files changed, 2 insertions(+), 2 deletions(-)