
[net] bonding: fix oops during rmmod

Message ID 641f914f-3216-4eeb-87dd-91b78aa97773@cybernetics.com (mailing list archive)
State Accepted
Commit a45835a0bb6ef7d5ddbc0714dd760de979cb6ece
Delegated to: Netdev Maintainers
Series [net] bonding: fix oops during rmmod

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 925 this patch: 925
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 7 of 7 maintainers
netdev/build_clang              success  Errors and warnings before: 936 this patch: 936
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 936 this patch: 936
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 44 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 13 this patch: 13
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-05-17--00-00 (tests: 1034)

Commit Message

Tony Battersby May 14, 2024, 7:57 p.m. UTC
"rmmod bonding" causes an oops ever since commit cc317ea3d927 ("bonding:
remove redundant NULL check in debugfs function").  Here are the relevant
functions being called:

bonding_exit()
  bond_destroy_debugfs()
    debugfs_remove_recursive(bonding_debug_root);
    bonding_debug_root = NULL; <--------- SET TO NULL HERE
  bond_netlink_fini()
    rtnl_link_unregister()
      __rtnl_link_unregister()
        unregister_netdevice_many_notify()
          bond_uninit()
            bond_debug_unregister()
              (commit removed check for bonding_debug_root == NULL)
              debugfs_remove()
                simple_recursive_removal()
                  down_write() -> OOPS

However, reverting the bad commit does not solve the problem completely
because the original code contains a race that could cause the same
oops, although it was much less likely to be triggered unintentionally:

CPU1
  rmmod bonding
    bonding_exit()
      bond_destroy_debugfs()
        debugfs_remove_recursive(bonding_debug_root);

CPU2
  echo -bond0 > /sys/class/net/bonding_masters
    bond_uninit()
      bond_debug_unregister()
        if (!bonding_debug_root)

CPU1
        bonding_debug_root = NULL;

So do NOT revert the bad commit (since the removed checks were racy
anyway), and instead change the order of actions taken during module
removal.  The same oops can also happen if there is an error during
module init, so apply the same fix there.

Fixes: cc317ea3d927 ("bonding: remove redundant NULL check in debugfs function")
Cc: stable@vger.kernel.org
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---
 drivers/net/bonding/bond_main.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)


base-commit: a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6

Comments

Simon Horman May 15, 2024, 11:44 a.m. UTC | #1
On Tue, May 14, 2024 at 03:57:29PM -0400, Tony Battersby wrote:
> [...]

Reviewed-by: Simon Horman <horms@kernel.org>
Jay Vosburgh May 15, 2024, 12:44 p.m. UTC | #2
Tony Battersby <tonyb@cybernetics.com> wrote:

>[...]

Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>

patchwork-bot+netdevbpf@kernel.org May 17, 2024, 2:40 a.m. UTC | #3
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 14 May 2024 15:57:29 -0400 you wrote:
> [...]

Here is the summary with links:
  - [net] bonding: fix oops during rmmod
    https://git.kernel.org/netdev/net/c/a45835a0bb6e

You are awesome, thank you!

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2c5ed0a7cb18..bceda85f0dcf 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -6477,16 +6477,16 @@ static int __init bonding_init(void)
 	if (res)
 		goto out;
 
+	bond_create_debugfs();
+
 	res = register_pernet_subsys(&bond_net_ops);
 	if (res)
-		goto out;
+		goto err_net_ops;
 
 	res = bond_netlink_init();
 	if (res)
 		goto err_link;
 
-	bond_create_debugfs();
-
 	for (i = 0; i < max_bonds; i++) {
 		res = bond_create(&init_net, NULL);
 		if (res)
@@ -6501,10 +6501,11 @@ static int __init bonding_init(void)
 out:
 	return res;
 err:
-	bond_destroy_debugfs();
 	bond_netlink_fini();
 err_link:
 	unregister_pernet_subsys(&bond_net_ops);
+err_net_ops:
+	bond_destroy_debugfs();
 	goto out;
 
 }
@@ -6513,11 +6514,11 @@ static void __exit bonding_exit(void)
 {
 	unregister_netdevice_notifier(&bond_netdev_notifier);
 
-	bond_destroy_debugfs();
-
 	bond_netlink_fini();
 	unregister_pernet_subsys(&bond_net_ops);
 
+	bond_destroy_debugfs();
+
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	/* Make sure we don't have an imbalance on our netpoll blocking */
 	WARN_ON(atomic_read(&netpoll_block_tx));