diff mbox series

[v5,6/7] module: Improve support for asynchronous module exit code

Message ID 20220914225621.415631-7-bvanassche@acm.org (mailing list archive)
State New, archived
Headers show
Series None | expand

Commit Message

Bart Van Assche Sept. 14, 2022, 10:56 p.m. UTC
Some kernel modules call device_del() from their module exit code and
schedule asynchronous work from inside the .release callback without waiting
until that callback has finished. As an example, many SCSI LLD drivers call
scsi_remove_host() from their module exit code. scsi_remove_host() may
invoke scsi_device_dev_release_usercontext() asynchronously.
scsi_device_dev_release_usercontext() uses the host template pointer and
that pointer usually exists in static storage in the SCSI LLD. Support
using the module reference count to keep the module around until
asynchronous module exiting has completed by waiting in the delete_module()
system call until the module reference count drops to zero.

The following debug patch has been used to make the new wait_event()
call wait:

Comments

Bart Van Assche Sept. 20, 2022, 5:13 p.m. UTC | #1
On 9/14/22 15:56, Bart Van Assche wrote:
> Some kernel modules call device_del() from their module exit code and
> schedule asynchronous work from inside the .release callback without waiting
> until that callback has finished. As an example, many SCSI LLD drivers call
> scsi_remove_host() from their module exit code. scsi_remove_host() may
> invoke scsi_device_dev_release_usercontext() asynchronously.
> scsi_device_dev_release_usercontext() uses the host template pointer and
> that pointer usually exists in static storage in the SCSI LLD. Support
> using the module reference count to keep the module around until
> asynchronous module exiting has completed by waiting in the delete_module()
> system call until the module reference count drops to zero.

Hi Luis,

I'd like to know your opinion about this patch since you are the 
maintainer of the kernel module system.

Thanks,

Bart.
Luis Chamberlain Sept. 28, 2022, 12:02 a.m. UTC | #2
On Tue, Sep 20, 2022 at 10:13:40AM -0700, Bart Van Assche wrote:
> On 9/14/22 15:56, Bart Van Assche wrote:
> > Some kernel modules call device_del() from their module exit code and
> > schedule asynchronous work from inside the .release callback without waiting
> > until that callback has finished. As an example, many SCSI LLD drivers call
> > scsi_remove_host() from their module exit code. scsi_remove_host() may
> > invoke scsi_device_dev_release_usercontext() asynchronously.
> > scsi_device_dev_release_usercontext() uses the host template pointer and
> > that pointer usually exists in static storage in the SCSI LLD. Support
> > using the module reference count to keep the module around until
> > asynchronous module exiting has completed by waiting in the delete_module()
> > system call until the module reference count drops to zero.
> 
> Hi Luis,
> 
> I'd like to know your opinion about this patch since you are the maintainer
> of the kernel module system.

See this patch which extends the documentation of try_module_get():

https://lkml.kernel.org/r/20211029184500.2821444-7-mcgrof@kernel.org

You can ignore discussion around the thread as sadly it is just
irrelevant stuff not about that patch. But the logic it spells out
is still true.

So, in short, using try_module_get() on exit is actually the wrong
thing to do and it is no surprise it would fail. I haven't gotten
yet around to reviewing Mauro's driver API which let's you unbind
drivers, but it sounds related so I CC'd you on that.

So I'd like to ask instead if an alternative to using try_module_get()
on exit would be better here and for the future.

  Luis
Ming Lei Sept. 28, 2022, 1:09 a.m. UTC | #3
On Wed, Sep 14, 2022 at 03:56:20PM -0700, Bart Van Assche wrote:
> Some kernel modules call device_del() from their module exit code and
> schedule asynchronous work from inside the .release callback without waiting
> until that callback has finished. As an example, many SCSI LLD drivers call

It isn't only related with device, any kobject has such issue, or any
reference counter usage has similar potential risk, see previous discussion:

https://lore.kernel.org/lkml/YsZm7lSXYAHT14ui@T590/

IMO, it is one fundamental problem wrt. module vs. reference counting or
kobject uses at least, since the callback depends on module code
segment.

> scsi_remove_host() from their module exit code. scsi_remove_host() may
> invoke scsi_device_dev_release_usercontext() asynchronously.
> scsi_device_dev_release_usercontext() uses the host template pointer and
> that pointer usually exists in static storage in the SCSI LLD. Support
> using the module reference count to keep the module around until
> asynchronous module exiting has completed by waiting in the delete_module()
> system call until the module reference count drops to zero.

The issue can't be addressed by the normal mod->refcnt, since user need
to unload module when the device isn't used.


thanks,
Ming
Bart Van Assche Sept. 28, 2022, 6:17 p.m. UTC | #4
On 9/27/22 17:02, Luis Chamberlain wrote:
> On Tue, Sep 20, 2022 at 10:13:40AM -0700, Bart Van Assche wrote:
>> On 9/14/22 15:56, Bart Van Assche wrote:
>>> Some kernel modules call device_del() from their module exit code and
>>> schedule asynchronous work from inside the .release callback without waiting
>>> until that callback has finished. As an example, many SCSI LLD drivers call
>>> scsi_remove_host() from their module exit code. scsi_remove_host() may
>>> invoke scsi_device_dev_release_usercontext() asynchronously.
>>> scsi_device_dev_release_usercontext() uses the host template pointer and
>>> that pointer usually exists in static storage in the SCSI LLD. Support
>>> using the module reference count to keep the module around until
>>> asynchronous module exiting has completed by waiting in the delete_module()
>>> system call until the module reference count drops to zero.
>>
>> Hi Luis,
>>
>> I'd like to know your opinion about this patch since you are the maintainer
>> of the kernel module system.
> 
> See this patch which extends the documentation of try_module_get():
> 
> https://lkml.kernel.org/r/20211029184500.2821444-7-mcgrof@kernel.org
> 
> You can ignore discussion around the thread as sadly it is just
> irrelevant stuff not about that patch. But the logic it spells out
> is still true.
> 
> So, in short, using try_module_get() on exit is actually the wrong
> thing to do and it is no surprise it would fail. I haven't gotten
> yet around to reviewing Mauro's driver API which let's you unbind
> drivers, but it sounds related so I CC'd you on that.
> 
> So I'd like to ask instead if an alternative to using try_module_get()
> on exit would be better here and for the future.

Hi Luis,

The extended documentation of try_module_get() is very helpful. But 
please note that this patch is not related to try_module_get() at all. 
See also patch 7/7 in this series 
(https://lore.kernel.org/linux-scsi/20220914225621.415631-8-bvanassche@acm.org/).

Thanks,

Bart.
Bart Van Assche Sept. 28, 2022, 7:27 p.m. UTC | #5
On 9/27/22 18:09, Ming Lei wrote:
> On Wed, Sep 14, 2022 at 03:56:20PM -0700, Bart Van Assche wrote:
>> Some kernel modules call device_del() from their module exit code and
>> schedule asynchronous work from inside the .release callback without waiting
>> until that callback has finished. As an example, many SCSI LLD drivers call
> 
> It isn't only related with device, any kobject has such issue, or any
> reference counter usage has similar potential risk, see previous discussion:
> 
> https://lore.kernel.org/lkml/YsZm7lSXYAHT14ui@T590/
> 
> IMO, it is one fundamental problem wrt. module vs. reference counting or
> kobject uses at least, since the callback depends on module code
> segment.
> 
>> scsi_remove_host() from their module exit code. scsi_remove_host() may
>> invoke scsi_device_dev_release_usercontext() asynchronously.
>> scsi_device_dev_release_usercontext() uses the host template pointer and
>> that pointer usually exists in static storage in the SCSI LLD. Support
>> using the module reference count to keep the module around until
>> asynchronous module exiting has completed by waiting in the delete_module()
>> system call until the module reference count drops to zero.
> 
> The issue can't be addressed by the normal mod->refcnt, since user need
> to unload module when the device isn't used.

Hi Ming,

How about removing support for calling scsi_device_put() from atomic context
as is done in the untested patch below?

Thanks,

Bart.

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index c59eac7a32f2..661753a10b47 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -561,6 +561,8 @@ EXPORT_SYMBOL(scsi_report_opcode);
   */
  int scsi_device_get(struct scsi_device *sdev)
  {
+	might_sleep();
+
  	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
  		goto fail;
  	if (!get_device(&sdev->sdev_gendev))
@@ -588,6 +590,7 @@ void scsi_device_put(struct scsi_device *sdev)
  {
  	struct module *mod = sdev->host->hostt->module;

+	might_sleep();
  	put_device(&sdev->sdev_gendev);
  	module_put(mod);
  }
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index a3aaafdeac1d..4cfc9317b4ad 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -441,7 +441,7 @@ static void scsi_device_cls_release(struct device *class_dev)
  	put_device(&sdev->sdev_gendev);
  }

-static void scsi_device_dev_release_usercontext(struct work_struct *work)
+static void scsi_device_dev_release(struct device *dev)
  {
  	struct scsi_device *sdev;
  	struct device *parent;
@@ -450,11 +450,8 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
  	struct scsi_vpd *vpd_pg0 = NULL, *vpd_pg89 = NULL;
  	struct scsi_vpd *vpd_pgb0 = NULL, *vpd_pgb1 = NULL, *vpd_pgb2 = NULL;
  	unsigned long flags;
-	struct module *mod;
-
-	sdev = container_of(work, struct scsi_device, ew.work);

-	mod = sdev->host->hostt->module;
+	sdev = to_scsi_device(dev);

  	parent = sdev->sdev_gendev.parent;

@@ -516,19 +513,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)

  	if (parent)
  		put_device(parent);
-	module_put(mod);
-}
-
-static void scsi_device_dev_release(struct device *dev)
-{
-	struct scsi_device *sdp = to_scsi_device(dev);
-
-	/* Set module pointer as NULL in case of module unloading */
-	if (!try_module_get(sdp->host->hostt->module))
-		sdp->host->hostt->module = NULL;
-
-	execute_in_process_context(scsi_device_dev_release_usercontext,
-				   &sdp->ew);
  }

  static struct class sdev_class = {
Ming Lei Sept. 29, 2022, 1:10 a.m. UTC | #6
On Wed, Sep 28, 2022 at 12:27:07PM -0700, Bart Van Assche wrote:
> On 9/27/22 18:09, Ming Lei wrote:
> > On Wed, Sep 14, 2022 at 03:56:20PM -0700, Bart Van Assche wrote:
> > > Some kernel modules call device_del() from their module exit code and
> > > schedule asynchronous work from inside the .release callback without waiting
> > > until that callback has finished. As an example, many SCSI LLD drivers call
> > 
> > It isn't only related with device, any kobject has such issue, or any
> > reference counter usage has similar potential risk, see previous discussion:
> > 
> > https://lore.kernel.org/lkml/YsZm7lSXYAHT14ui@T590/
> > 
> > IMO, it is one fundamental problem wrt. module vs. reference counting or
> > kobject uses at least, since the callback depends on module code
> > segment.
> > 
> > > scsi_remove_host() from their module exit code. scsi_remove_host() may
> > > invoke scsi_device_dev_release_usercontext() asynchronously.
> > > scsi_device_dev_release_usercontext() uses the host template pointer and
> > > that pointer usually exists in static storage in the SCSI LLD. Support
> > > using the module reference count to keep the module around until
> > > asynchronous module exiting has completed by waiting in the delete_module()
> > > system call until the module reference count drops to zero.
> > 
> > The issue can't be addressed by the normal mod->refcnt, since user need
> > to unload module when the device isn't used.
> 
> Hi Ming,
> 
> How about removing support for calling scsi_device_put() from atomic context
> as is done in the untested patch below?

That can't work.

The problem is that no existed mechanism can guarantee that kobject reference
drops to zero inside module_exit().


Thanks,
Ming
Bart Van Assche Sept. 29, 2022, 5:27 p.m. UTC | #7
On 9/28/22 18:10, Ming Lei wrote:
> On Wed, Sep 28, 2022 at 12:27:07PM -0700, Bart Van Assche wrote:
>> How about removing support for calling scsi_device_put() from atomic context
>> as is done in the untested patch below?
> 
> That can't work.
> 
> The problem is that no existed mechanism can guarantee that kobject reference
> drops to zero inside module_exit().

Hi Ming,

I agree that the patch in my previous email won't address potential calls of
.release functions while a module is being unloaded or after a module has been
unloaded. However, that's not the purpose of that patch. The purpose of that
patch is to rework all code that modifies members of the scsi host template.

Thanks,

Bart.
Luis Chamberlain Sept. 30, 2022, 7:39 p.m. UTC | #8
On Wed, Sep 28, 2022 at 11:17:02AM -0700, Bart Van Assche wrote:
> On 9/27/22 17:02, Luis Chamberlain wrote:
> > On Tue, Sep 20, 2022 at 10:13:40AM -0700, Bart Van Assche wrote:
> > > On 9/14/22 15:56, Bart Van Assche wrote:
> > > > Some kernel modules call device_del() from their module exit code and
> > > > schedule asynchronous work from inside the .release callback without waiting
> > > > until that callback has finished. As an example, many SCSI LLD drivers call
> > > > scsi_remove_host() from their module exit code. scsi_remove_host() may
> > > > invoke scsi_device_dev_release_usercontext() asynchronously.
> > > > scsi_device_dev_release_usercontext() uses the host template pointer and
> > > > that pointer usually exists in static storage in the SCSI LLD. Support
> > > > using the module reference count to keep the module around until
> > > > asynchronous module exiting has completed by waiting in the delete_module()
> > > > system call until the module reference count drops to zero.
> > > 
> > > Hi Luis,
> > > 
> > > I'd like to know your opinion about this patch since you are the maintainer
> > > of the kernel module system.
> > 
> > See this patch which extends the documentation of try_module_get():
> > 
> > https://lkml.kernel.org/r/20211029184500.2821444-7-mcgrof@kernel.org
> > 
> > You can ignore discussion around the thread as sadly it is just
> > irrelevant stuff not about that patch. But the logic it spells out
> > is still true.
> > 
> > So, in short, using try_module_get() on exit is actually the wrong
> > thing to do and it is no surprise it would fail. I haven't gotten
> > yet around to reviewing Mauro's driver API which let's you unbind
> > drivers, but it sounds related so I CC'd you on that.
> > 
> > So I'd like to ask instead if an alternative to using try_module_get()
> > on exit would be better here and for the future.
> 
> Hi Luis,
> 
> The extended documentation of try_module_get() is very helpful. But please
> note that this patch is not related to try_module_get() at all. See also
> patch 7/7 in this series (https://lore.kernel.org/linux-scsi/20220914225621.415631-8-bvanassche@acm.org/).

I cannot see how this patch set is no way related to try_module_get()
given the 7/7 patch you posted replaces try_module_get() with __module_get().
My point, and hint, is that the original construct that added try_module_get()
on removal was flawed and I'm not sure trying to expand on that idea would
or even *should* be fruitful given the issues / tribal knowledge I tried
extending documentation for.

It would beg the question if instead re-evaluating the goal could be
done in such a way that the new documentation I suggested on try_module_get()
would be seriously taken into account.

  Luis
Luis Chamberlain Oct. 3, 2022, 11:56 p.m. UTC | #9
On Fri, Sep 30, 2022 at 12:39:27PM -0700, Luis Chamberlain wrote:
> On Wed, Sep 28, 2022 at 11:17:02AM -0700, Bart Van Assche wrote:
> > On 9/27/22 17:02, Luis Chamberlain wrote:
> > > On Tue, Sep 20, 2022 at 10:13:40AM -0700, Bart Van Assche wrote:
> > > > On 9/14/22 15:56, Bart Van Assche wrote:
> > > > > Some kernel modules call device_del() from their module exit code and
> > > > > schedule asynchronous work from inside the .release callback without waiting
> > > > > until that callback has finished. As an example, many SCSI LLD drivers call
> > > > > scsi_remove_host() from their module exit code. scsi_remove_host() may
> > > > > invoke scsi_device_dev_release_usercontext() asynchronously.
> > > > > scsi_device_dev_release_usercontext() uses the host template pointer and
> > > > > that pointer usually exists in static storage in the SCSI LLD. Support
> > > > > using the module reference count to keep the module around until
> > > > > asynchronous module exiting has completed by waiting in the delete_module()
> > > > > system call until the module reference count drops to zero.
> > > > 
> > > > Hi Luis,
> > > > 
> > > > I'd like to know your opinion about this patch since you are the maintainer
> > > > of the kernel module system.
> > > 
> > > See this patch which extends the documentation of try_module_get():
> > > 
> > > https://lkml.kernel.org/r/20211029184500.2821444-7-mcgrof@kernel.org
> > > 
> > > You can ignore discussion around the thread as sadly it is just
> > > irrelevant stuff not about that patch. But the logic it spells out
> > > is still true.
> > > 
> > > So, in short, using try_module_get() on exit is actually the wrong
> > > thing to do and it is no surprise it would fail. I haven't gotten
> > > yet around to reviewing Mauro's driver API which let's you unbind
> > > drivers, but it sounds related so I CC'd you on that.
> > > 
> > > So I'd like to ask instead if an alternative to using try_module_get()
> > > on exit would be better here and for the future.
> > 
> > Hi Luis,
> > 
> > The extended documentation of try_module_get() is very helpful. But please
> > note that this patch is not related to try_module_get() at all. See also
> > patch 7/7 in this series (https://lore.kernel.org/linux-scsi/20220914225621.415631-8-bvanassche@acm.org/).
> 
> I cannot see how this patch set is no way related to try_module_get()
> given the 7/7 patch you posted replaces try_module_get() with __module_get().
> My point, and hint, is that the original construct that added try_module_get()
> on removal was flawed and I'm not sure trying to expand on that idea would
> or even *should* be fruitful given the issues / tribal knowledge I tried
> extending documentation for.
> 
> It would beg the question if instead re-evaluating the goal could be
> done in such a way that the new documentation I suggested on try_module_get()
> would be seriously taken into account.

Yeah I've gone ahead and re-read your original patch again the issue
with that is it waits *once* for the refcnt to go to 0, but that does
not forbit it from going back up, at which point you have a race which
can still create the situation. Every subsystem is different, but for
instance simply running a loop opening a device block file should
suffice to bump the refcnt of a respective block driver module. So
at least the patch itself won't ultimately address this issue I'm
afraid.

  Luis
Bart Van Assche Oct. 4, 2022, 12:24 a.m. UTC | #10
On 10/3/22 16:56, Luis Chamberlain wrote:
> Yeah I've gone ahead and re-read your original patch again the issue
> with that is it waits *once* for the refcnt to go to 0, but that does
> not forbit it from going back up, at which point you have a race which
> can still create the situation. Every subsystem is different, but for
> instance simply running a loop opening a device block file should
> suffice to bump the refcnt of a respective block driver module. So
> at least the patch itself won't ultimately address this issue I'm
> afraid.

Hi Luis,

Thanks for the feedback. I will try to find a solution that does not 
require to modify the kernel module code.

Bart.
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 8be8e08fb67d..fead694ff95a 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -14,6 +14,7 @@ 
 #include <linux/device.h>
 #include <linux/pm_runtime.h>
 #include <linux/bsg.h>
+#include <linux/delay.h>

 #include <scsi/scsi.h>
 #include <scsi/scsi_device.h>
@@ -518,6 +519,7 @@  static void scsi_device_dev_release_usercontext(struct work_struct *work)

 	if (parent)
 		put_device(parent);
+	msleep(100);
 	module_put(mod);
 }

diff --git a/kernel/module/main.c b/kernel/module/main.c
index a271126d7d59..0bf75ec3f5a8 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -756,8 +756,10 @@  SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
 	 * unloading is not forced, wait for the module reference count to drop
 	 * to zero again.
 	 */
-	if (!forced)
+	if (!forced) {
+		WARN_ON_ONCE(atomic_read(&mod->refcnt));
 		wait_event(mod->refcnt_wq, atomic_read(&mod->refcnt) == 0);
+	}
 	blocking_notifier_call_chain(&module_notify_list,
 				     MODULE_STATE_GOING, mod);
 	klp_module_going(mod);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index aeea9731ef80..f021625f2caa 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3355,7 +3355,7 @@  int schedule_on_each_cpu(work_func_t func)
  */
 int execute_in_process_context(work_func_t fn, struct execute_work *ew)
 {
-	if (!in_interrupt()) {
+	if (false && !in_interrupt()) {
 		fn(&ew->work);
 		return 0;
 	}

Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.garry@huawei.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 include/linux/module.h |  1 +
 kernel/module/main.c   | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/module.h b/include/linux/module.h
index 518296ea7f73..3a77d2bd4198 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -533,6 +533,7 @@  struct module {
 	/* Destruction function. */
 	void (*exit)(void);
 
+	wait_queue_head_t refcnt_wq;
 	atomic_t refcnt;
 #endif
 
diff --git a/kernel/module/main.c b/kernel/module/main.c
index a4e4d84b6f4e..a271126d7d59 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -550,6 +550,7 @@  static int module_unload_init(struct module *mod)
 
 	/* Hold reference count during initialization. */
 	atomic_inc(&mod->refcnt);
+	init_waitqueue_head(&mod->refcnt_wq);
 
 	return 0;
 }
@@ -750,6 +751,13 @@  SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
 	/* Final destruction now no one is using it. */
 	if (mod->exit != NULL)
 		mod->exit();
+	/*
+	 * If the module reference count was increased by mod->exit() and if
+	 * unloading is not forced, wait for the module reference count to drop
+	 * to zero again.
+	 */
+	if (!forced)
+		wait_event(mod->refcnt_wq, atomic_read(&mod->refcnt) == 0);
 	blocking_notifier_call_chain(&module_notify_list,
 				     MODULE_STATE_GOING, mod);
 	klp_module_going(mod);
@@ -854,6 +862,8 @@  void module_put(struct module *module)
 		WARN_ON(ret < 0);	/* Failed to put refcount */
 		trace_module_put(module, _RET_IP_);
 		preempt_enable();
+		if (ret == 0)
+			wake_up(&module->refcnt_wq);
 	}
 }
 EXPORT_SYMBOL(module_put);