[net] team: fix possible deadlock in team_port_change_check

Message ID 20240801111842.50031-1-aha310510@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 42 this patch: 42
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 43 this patch: 43
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 43 this patch: 43
netdev/checkpatch fail CHECK: Unbalanced braces around else statement ERROR: else should follow close brace '}' ERROR: space required before the open brace '{' WARNING: Missing a blank line after declarations
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2024-08-01--12-00 (tests: 701)

Commit Message

Jeongjun Park Aug. 1, 2024, 11:18 a.m. UTC
In do_setlink(), do_set_master() is called while the IFF_UP flag is not set
in dev->flags. team_add_slave() then acquires 'team->lock' and calls
dev_open(), which generates the NETDEV_UP event. The notifier path ends up
in team_port_change_check(), which tries to acquire 'team->lock' again and
deadlocks.

To solve this, I modified team_port_change_check() to check whether
'team->lock' is already held by the current task and to skip acquiring it
again if the calling path already holds it.
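
The re-entry described above can be reproduced in miniature in userspace. The following is only a sketch, not kernel code: it assumes POSIX threads, and every demo_* name is hypothetical. It uses an error-checking mutex so that a second lock attempt by the owning thread returns EDEADLK instead of hanging, which is roughly the condition lockdep reports in the splat below.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct team; only the lock matters here. */
struct demo_team {
	pthread_mutex_t lock;
};

/* Set up the lock as error-checking so that a relock by the owning
 * thread returns EDEADLK instead of blocking forever. */
static int demo_team_init(struct demo_team *team)
{
	pthread_mutexattr_t attr;
	int err;

	err = pthread_mutexattr_init(&attr);
	if (err)
		return err;
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&team->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}

/* Stand-in for the notifier path: takes the lock unconditionally. */
static int demo_port_change_check(struct demo_team *team)
{
	int err = pthread_mutex_lock(&team->lock);

	if (err)
		return err;	/* EDEADLK when the caller already owns it */
	pthread_mutex_unlock(&team->lock);
	return 0;
}

/* Stand-in for the add-port path: holds the lock while the event fires. */
static int demo_add_port(struct demo_team *team)
{
	int err = pthread_mutex_lock(&team->lock);

	if (err)
		return err;
	err = demo_port_change_check(team);	/* re-enters while locked */
	pthread_mutex_unlock(&team->lock);
	return err;
}
```

Calling demo_add_port() on an initialised demo_team returns EDEADLK, while calling demo_port_change_check() on its own returns 0. With a default (non-error-checking) mutex the nested call would typically block instead.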

============================================
WARNING: possible recursive locking detected
6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0 Not tainted
--------------------------------------------
syz.0.15/5889 is trying to acquire lock:
ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_port_change_check drivers/net/team/team_core.c:2950 [inline]
ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973

but task is already holding lock:
ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(team->team_lock_key#2);
  lock(team->team_lock_key#2);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by syz.0.15/5889:
 #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x372/0xea0 net/core/rtnetlink.c:6644
 #1: ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975

stack backtrace:
CPU: 1 UID: 0 PID: 5889 Comm: syz.0.15 Not tainted 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:93 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:119
 check_deadlock kernel/locking/lockdep.c:3061 [inline]
 validate_chain kernel/locking/lockdep.c:3855 [inline]
 __lock_acquire+0x2167/0x3cb0 kernel/locking/lockdep.c:5142
 lock_acquire kernel/locking/lockdep.c:5759 [inline]
 lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5724
 __mutex_lock_common kernel/locking/mutex.c:608 [inline]
 __mutex_lock+0x175/0x9c0 kernel/locking/mutex.c:752
 team_port_change_check drivers/net/team/team_core.c:2950 [inline]
 team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973
 notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
 call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994
 call_netdevice_notifiers_extack net/core/dev.c:2032 [inline]
 call_netdevice_notifiers net/core/dev.c:2046 [inline]
 __dev_notify_flags+0x12d/0x2e0 net/core/dev.c:8876
 dev_change_flags+0x10c/0x160 net/core/dev.c:8914
 vlan_device_event+0xdfc/0x2120 net/8021q/vlan.c:468
 notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
 call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994
 call_netdevice_notifiers_extack net/core/dev.c:2032 [inline]
 call_netdevice_notifiers net/core/dev.c:2046 [inline]
 dev_open net/core/dev.c:1515 [inline]
 dev_open+0x144/0x160 net/core/dev.c:1503
 team_port_add drivers/net/team/team_core.c:1216 [inline]
 team_add_slave+0xacd/0x20e0 drivers/net/team/team_core.c:1976
 do_set_master+0x1bc/0x230 net/core/rtnetlink.c:2701
 do_setlink+0x306d/0x4060 net/core/rtnetlink.c:2907
 __rtnl_newlink+0xc35/0x1960 net/core/rtnetlink.c:3696
 rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3743
 rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6647
 netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2550
 netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
 netlink_unicast+0x544/0x830 net/netlink/af_netlink.c:1357
 netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg net/socket.c:745 [inline]
 ____sys_sendmsg+0xab5/0xc90 net/socket.c:2597
 ___sys_sendmsg+0x135/0x1e0 net/socket.c:2651
 __sys_sendmsg+0x117/0x1f0 net/socket.c:2680
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fc07ed77299
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fc07fb7f048 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fc07ef05f80 RCX: 00007fc07ed77299
RDX: 0000000000000000 RSI: 0000000020000600 RDI: 0000000000000012
RBP: 00007fc07ede48e6 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fc07ef05f80 R15: 00007ffeb5c0d528

Reported-by: syzbot+b668da2bc4cb9670bf58@syzkaller.appspotmail.com
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
---
 drivers/net/team/team_core.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

--

Comments

Jakub Kicinski Aug. 1, 2024, 2:28 p.m. UTC | #1
On Thu,  1 Aug 2024 20:18:42 +0900 Jeongjun Park wrote:
>  	struct team *team = port->team;
> +	bool flag = true;
>  
> -	mutex_lock(&team->lock);
> +	if (mutex_is_locked(&team->lock)){
> +		unsigned long owner, curr = (unsigned long)current;
> +		owner = atomic_long_read(&team->lock.owner);
> +		if (owner != curr)
> +			mutex_lock(&team->lock);
> +		else
> +			flag = false;
> +	}
> +	else{
> +		mutex_lock(&team->lock);
> +	}
>  	__team_port_change_check(port, linkup);
> -	mutex_unlock(&team->lock);
> +	if (flag)
> +		mutex_unlock(&team->lock);

You didn't even run this thru checkpatch, let alone the fact that it's
reimplementing nested locks (or trying to) :(

Some of the syzbot reports are not fixed because they are either hard
or because there is a long standing disagreement on how to solve them.
Please keep that in mind.

Jeongjun Park Aug. 1, 2024, 10:51 p.m. UTC | #2
Jakub Kicinski wrote:
>
> On Thu,  1 Aug 2024 20:18:42 +0900 Jeongjun Park wrote:
> >       struct team *team = port->team;
> > +     bool flag = true;
> >
> > -     mutex_lock(&team->lock);
> > +     if (mutex_is_locked(&team->lock)){
> > +             unsigned long owner, curr = (unsigned long)current;
> > +             owner = atomic_long_read(&team->lock.owner);
> > +             if (owner != curr)
> > +                     mutex_lock(&team->lock);
> > +             else
> > +                     flag = false;
> > +     }
> > +     else{
> > +             mutex_lock(&team->lock);
> > +     }
> >       __team_port_change_check(port, linkup);
> > -     mutex_unlock(&team->lock);
> > +     if (flag)
> > +             mutex_unlock(&team->lock);
>
> You didn't even run this thru checkpatch, let alone the fact that it's
> reimplementing nested locks (or trying to) :(
>
> Some of the syzbot reports are not fixed because they are either hard
> or because there is a long standing disagreement on how to solve them.
> Please keep that in mind.

Okay, but I have a question. Is it true that team devices can also be
protected through rtnl? As far as I know, rtnl only protects net_device,
so I didn't think about removing the lock for team->lock.

Regards,
Jeongjun Park

Jakub Kicinski Aug. 2, 2024, 12:49 a.m. UTC | #3
On Fri, 2 Aug 2024 07:51:19 +0900 Jeongjun Park wrote:
> > You didn't even run this thru checkpatch, let alone the fact that it's
> > reimplementing nested locks (or trying to) :(
> >
> > Some of the syzbot reports are not fixed because they are either hard
> > or because there is a long standing disagreement on how to solve them.
> > Please keep that in mind.  
> 
> Okay, but I have a question. Is it true that team devices can also be
> protected through rtnl? As far as I know, rtnl only protects net_device,
> so I didn't think about removing the lock for team->lock.

Yes, but I think that gets us into the "long standing disagreement"
territory :) You may be able to find a previous attempt to remove
team->lock in the mailing list archive.
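
For contrast with the owner check in the patch, here is a minimal userspace sketch of the split that the existing team_port_change_check() / __team_port_change_check() pair already follows: one variant that expects the lock to be held and one that takes it. This is not the upstream resolution of this report; it assumes POSIX threads, and the demo_* names and the _locked suffix are hypothetical. It also only helps where the call site knows whether the lock is held, which is not obviously the case for the notifier path in the trace.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical miniature of a team port; names are illustrative only. */
struct demo_port {
	pthread_mutex_t *team_lock;	/* lock shared by all ports of a team */
	bool linkup;
	unsigned int changes;		/* number of state changes recorded */
};

/* Caller must already hold *port->team_lock. */
static void demo_port_change_check_locked(struct demo_port *port, bool linkup)
{
	if (port->linkup != linkup) {
		port->linkup = linkup;
		port->changes++;
	}
}

/* Entry point for callers that do not hold the lock. */
static void demo_port_change_check(struct demo_port *port, bool linkup)
{
	pthread_mutex_lock(port->team_lock);
	demo_port_change_check_locked(port, linkup);
	pthread_mutex_unlock(port->team_lock);
}

/* A path that already holds the lock calls the _locked variant directly,
 * so the lock is never taken twice by the same thread. */
static void demo_port_add(struct demo_port *port)
{
	pthread_mutex_lock(port->team_lock);
	port->linkup = false;
	port->changes = 0;
	demo_port_change_check_locked(port, true);
	pthread_mutex_unlock(port->team_lock);
}
```

The design choice here is that lock ownership is decided by which function is called rather than by inspecting the mutex at run time, so no code has to read the lock's owner field.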

Patch

diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
index ab1935a4aa2c..4ac6e55998ec 100644
--- a/drivers/net/team/team_core.c
+++ b/drivers/net/team/team_core.c
@@ -2946,10 +2946,22 @@  static void __team_port_change_port_removed(struct team_port *port)
 static void team_port_change_check(struct team_port *port, bool linkup)
 {
 	struct team *team = port->team;
+	bool flag = true;
 
-	mutex_lock(&team->lock);
+	if (mutex_is_locked(&team->lock)){
+		unsigned long owner, curr = (unsigned long)current;
+		owner = atomic_long_read(&team->lock.owner);
+		if (owner != curr)
+			mutex_lock(&team->lock);
+		else
+			flag = false;
+	}
+	else{
+		mutex_lock(&team->lock);
+	}
 	__team_port_change_check(port, linkup);
-	mutex_unlock(&team->lock);
+	if (flag)
+		mutex_unlock(&team->lock);
 }