diff mbox series

[net-next] netdevsim: Block until all devices are released

Message ID 20231026083343.890689-1-idosch@nvidia.com (mailing list archive)
State Accepted
Commit 6aff7cbfe7bfafbed86a6948e660e280848c4d97
Delegated to: Netdev Maintainers
Headers show
Series [net-next] netdevsim: Block until all devices are released | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1363 this patch: 1363
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 1388 this patch: 1388
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1388 this patch: 1388
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 59 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Ido Schimmel Oct. 26, 2023, 8:33 a.m. UTC
Like other buses, devices on the netdevsim bus have a release callback
that is invoked when the reference count of the device drops to zero.
However, unlike other buses such as PCI, the release callback is not
necessarily built into the kernel, as netdevsim can be built as a
module.

The above is problematic as nothing prevents the module from being
unloaded before the release callback has been invoked, which can happen
asynchronously. One such example can be found in commit a380687200e0
("devlink: take device reference for devlink object") where devlink
calls put_device() from an RCU callback.

The issue is not theoretical and the reproducer in [1] can reliably
crash the kernel. The conclusion of this discussion was that the issue
should be solved in netdevsim, which is what this patch is trying to do.

Add a reference count that is increased when a device is added to the
bus and decreased when a device is released. Signal a completion when
the reference count drops to zero and wait for the completion when
unloading the module so that the module will not be unloaded before all
the devices were released. The reference count is initialized to one so
that completion is only signaled when unloading the module.

With this patch, the reproducer in [1] no longer crashes the kernel.

[1] https://lore.kernel.org/netdev/20230619125015.1541143-2-idosch@nvidia.com/

Fixes: a380687200e0 ("devlink: take device reference for devlink object")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
Original posting is here:
https://lore.kernel.org/netdev/20231017074257.3389177-2-idosch@nvidia.com/
Had to reword the commit message to reflect the fact that the bug is
already present in net-next, which is why I removed Jakub's tag.
---
 drivers/net/netdevsim/bus.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Jiri Pirko Oct. 26, 2023, 10:03 a.m. UTC | #1
Thu, Oct 26, 2023 at 10:33:43AM CEST, idosch@nvidia.com wrote:
>Like other buses, devices on the netdevsim bus have a release callback
>that is invoked when the reference count of the device drops to zero.
>However, unlike other buses such as PCI, the release callback is not
>necessarily built into the kernel, as netdevsim can be built as a
>module.
>
>The above is problematic as nothing prevents the module from being
>unloaded before the release callback has been invoked, which can happen
>asynchronously. One such example can be found in commit a380687200e0
>("devlink: take device reference for devlink object") where devlink
>calls put_device() from an RCU callback.
>
>The issue is not theoretical and the reproducer in [1] can reliably
>crash the kernel. The conclusion of this discussion was that the issue
>should be solved in netdevsim, which is what this patch is trying to do.
>
>Add a reference count that is increased when a device is added to the
>bus and decreased when a device is released. Signal a completion when
>the reference count drops to zero and wait for the completion when
>unloading the module so that the module will not be unloaded before all
>the devices were released. The reference count is initialized to one so
>that completion is only signaled when unloading the module.
>
>With this patch, the reproducer in [1] no longer crashes the kernel.
>
>[1] https://lore.kernel.org/netdev/20230619125015.1541143-2-idosch@nvidia.com/
>
>Fixes: a380687200e0 ("devlink: take device reference for devlink object")
>Signed-off-by: Ido Schimmel <idosch@nvidia.com>

Reviewed-by: Jiri Pirko <jiri@nvidia.com>

Thanks!
patchwork-bot+netdevbpf@kernel.org Oct. 27, 2023, 10 p.m. UTC | #2
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 26 Oct 2023 11:33:43 +0300 you wrote:
> Like other buses, devices on the netdevsim bus have a release callback
> that is invoked when the reference count of the device drops to zero.
> However, unlike other buses such as PCI, the release callback is not
> necessarily built into the kernel, as netdevsim can be built as a
> module.
> 
> The above is problematic as nothing prevents the module from being
> unloaded before the release callback has been invoked, which can happen
> asynchronously. One such example can be found in commit a380687200e0
> ("devlink: take device reference for devlink object") where devlink
> calls put_device() from an RCU callback.
> 
> [...]

Here is the summary with links:
  - [net-next] netdevsim: Block until all devices are released
    https://git.kernel.org/netdev/net-next/c/6aff7cbfe7bf

You are awesome, thank you!
diff mbox series

Patch

diff --git a/drivers/net/netdevsim/bus.c b/drivers/net/netdevsim/bus.c
index 0787ad252dd9..bcbc1e19edde 100644
--- a/drivers/net/netdevsim/bus.c
+++ b/drivers/net/netdevsim/bus.c
@@ -3,11 +3,13 @@ 
  * Copyright (C) 2019 Mellanox Technologies. All rights reserved
  */
 
+#include <linux/completion.h>
 #include <linux/device.h>
 #include <linux/idr.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/mutex.h>
+#include <linux/refcount.h>
 #include <linux/slab.h>
 #include <linux/sysfs.h>
 
@@ -17,6 +19,8 @@  static DEFINE_IDA(nsim_bus_dev_ids);
 static LIST_HEAD(nsim_bus_dev_list);
 static DEFINE_MUTEX(nsim_bus_dev_list_lock);
 static bool nsim_bus_enable;
+static refcount_t nsim_bus_devs; /* Including the bus itself. */
+static DECLARE_COMPLETION(nsim_bus_devs_released);
 
 static struct nsim_bus_dev *to_nsim_bus_dev(struct device *dev)
 {
@@ -121,6 +125,8 @@  static void nsim_bus_dev_release(struct device *dev)
 
 	nsim_bus_dev = container_of(dev, struct nsim_bus_dev, dev);
 	kfree(nsim_bus_dev);
+	if (refcount_dec_and_test(&nsim_bus_devs))
+		complete(&nsim_bus_devs_released);
 }
 
 static struct device_type nsim_bus_dev_type = {
@@ -170,6 +176,7 @@  new_device_store(const struct bus_type *bus, const char *buf, size_t count)
 		goto err;
 	}
 
+	refcount_inc(&nsim_bus_devs);
 	/* Allow using nsim_bus_dev */
 	smp_store_release(&nsim_bus_dev->init, true);
 
@@ -326,6 +333,7 @@  int nsim_bus_init(void)
 	err = driver_register(&nsim_driver);
 	if (err)
 		goto err_bus_unregister;
+	refcount_set(&nsim_bus_devs, 1);
 	/* Allow using resources */
 	smp_store_release(&nsim_bus_enable, true);
 	return 0;
@@ -341,6 +349,8 @@  void nsim_bus_exit(void)
 
 	/* Disallow using resources */
 	smp_store_release(&nsim_bus_enable, false);
+	if (refcount_dec_and_test(&nsim_bus_devs))
+		complete(&nsim_bus_devs_released);
 
 	mutex_lock(&nsim_bus_dev_list_lock);
 	list_for_each_entry_safe(nsim_bus_dev, tmp, &nsim_bus_dev_list, list) {
@@ -349,6 +359,8 @@  void nsim_bus_exit(void)
 	}
 	mutex_unlock(&nsim_bus_dev_list_lock);
 
+	wait_for_completion(&nsim_bus_devs_released);
+
 	driver_unregister(&nsim_driver);
 	bus_unregister(&nsim_bus);
 }