diff mbox series

[net,v2] rtnetlink: fix possible deadlock in team_port_change_check

Message ID 20240731150940.14106-1-aha310510@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net,v2] rtnetlink: fix possible deadlock in team_port_change_check | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 42 this patch: 42
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 43 this patch: 43
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 43 this patch: 43
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-08-01--06-00 (tests: 707)

Commit Message

Jeongjun Park July 31, 2024, 3:09 p.m. UTC
In do_setlink() , do_set_master() is called when dev->flags does not have
the IFF_UP flag set, so 'team->lock' is acquired and dev_open() is called,
which generates the NETDEV_UP event. This causes a deadlock as it tries to
acquire 'team->lock' again.

To solve this, we need to unlock 'team->lock' before calling dev_open()
in team_port_add() and then reacquire the lock when dev_open() returns.
Since the implementation acquires the lock in advance when the team
structure is used inside dev_open(), data races will not occur even if it
is briefly unlocked.

============================================
WARNING: possible recursive locking detected
6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0 Not tainted
--------------------------------------------
syz.0.15/5889 is trying to acquire lock:
ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_port_change_check drivers/net/team/team_core.c:2950 [inline]
ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973

but task is already holding lock:
ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(team->team_lock_key#2);
  lock(team->team_lock_key#2);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by syz.0.15/5889:
 #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x372/0xea0 net/core/rtnetlink.c:6644
 #1: ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975

stack backtrace:
CPU: 1 UID: 0 PID: 5889 Comm: syz.0.15 Not tainted 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:93 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:119
 check_deadlock kernel/locking/lockdep.c:3061 [inline]
 validate_chain kernel/locking/lockdep.c:3855 [inline]
 __lock_acquire+0x2167/0x3cb0 kernel/locking/lockdep.c:5142
 lock_acquire kernel/locking/lockdep.c:5759 [inline]
 lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5724
 __mutex_lock_common kernel/locking/mutex.c:608 [inline]
 __mutex_lock+0x175/0x9c0 kernel/locking/mutex.c:752
 team_port_change_check drivers/net/team/team_core.c:2950 [inline]
 team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973
 notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
 call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994
 call_netdevice_notifiers_extack net/core/dev.c:2032 [inline]
 call_netdevice_notifiers net/core/dev.c:2046 [inline]
 __dev_notify_flags+0x12d/0x2e0 net/core/dev.c:8876
 dev_change_flags+0x10c/0x160 net/core/dev.c:8914
 vlan_device_event+0xdfc/0x2120 net/8021q/vlan.c:468
 notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
 call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994
 call_netdevice_notifiers_extack net/core/dev.c:2032 [inline]
 call_netdevice_notifiers net/core/dev.c:2046 [inline]
 dev_open net/core/dev.c:1515 [inline]
 dev_open+0x144/0x160 net/core/dev.c:1503
 team_port_add drivers/net/team/team_core.c:1216 [inline]
 team_add_slave+0xacd/0x20e0 drivers/net/team/team_core.c:1976
 do_set_master+0x1bc/0x230 net/core/rtnetlink.c:2701
 do_setlink+0x306d/0x4060 net/core/rtnetlink.c:2907
 __rtnl_newlink+0xc35/0x1960 net/core/rtnetlink.c:3696
 rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3743
 rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6647
 netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2550
 netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
 netlink_unicast+0x544/0x830 net/netlink/af_netlink.c:1357
 netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg net/socket.c:745 [inline]
 ____sys_sendmsg+0xab5/0xc90 net/socket.c:2597
 ___sys_sendmsg+0x135/0x1e0 net/socket.c:2651
 __sys_sendmsg+0x117/0x1f0 net/socket.c:2680
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fc07ed77299
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fc07fb7f048 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fc07ef05f80 RCX: 00007fc07ed77299
RDX: 0000000000000000 RSI: 0000000020000600 RDI: 0000000000000012
RBP: 00007fc07ede48e6 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fc07ef05f80 R15: 00007ffeb5c0d528

Reported-by: syzbot+b668da2bc4cb9670bf58@syzkaller.appspotmail.com
Fixes: ec4ffd100ffb ("Revert "net: rtnetlink: Enslave device before bringing it up"")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
---
 drivers/net/team/team_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--

Comments

Hangbin Liu Aug. 1, 2024, 2:16 a.m. UTC | #1
On Thu, Aug 01, 2024 at 12:09:40AM +0900, Jeongjun Park wrote:
> In do_setlink() , do_set_master() is called when dev->flags does not have
> the IFF_UP flag set, so 'team->lock' is acquired and dev_open() is called,
> which generates the NETDEV_UP event. This causes a deadlock as it tries to
> acquire 'team->lock' again.
> 
> To solve this, we need to unlock 'team->lock' before calling dev_open()
> in team_port_add() and then reacquire the lock when dev_open() returns.
> Since the implementation acquires the lock in advance when the team
> structure is used inside dev_open(), data races will not occur even if it
> is briefly unlocked.
> 
> 
> Reported-by: syzbot+b668da2bc4cb9670bf58@syzkaller.appspotmail.com
> Fixes: ec4ffd100ffb ("Revert "net: rtnetlink: Enslave device before bringing it up"")

The fixes tag shouldn't be ec4ffd100ffb, as the issue exists before
ec4ffd100ffb. I think it should be

3d249d4ca7d0 ("net: introduce ethernet teaming device")

Jiri, what do you think?

Thanks
Hangbin
Eric Dumazet Aug. 1, 2024, 6:28 a.m. UTC | #2
On Wed, Jul 31, 2024 at 5:10 PM Jeongjun Park <aha310510@gmail.com> wrote:
>
> In do_setlink() , do_set_master() is called when dev->flags does not have
> the IFF_UP flag set, so 'team->lock' is acquired and dev_open() is called,
> which generates the NETDEV_UP event. This causes a deadlock as it tries to
> acquire 'team->lock' again.
>
> To solve this, we need to unlock 'team->lock' before calling dev_open()
> in team_port_add() and then reacquire the lock when dev_open() returns.
> Since the implementation acquires the lock in advance when the team
> structure is used inside dev_open(), data races will not occur even if it
> is briefly unlocked.
>
> ============================================
> WARNING: possible recursive locking detected
> 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0 Not tainted
> --------------------------------------------
> syz.0.15/5889 is trying to acquire lock:
> ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_port_change_check drivers/net/team/team_core.c:2950 [inline]
> ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973
>
> but task is already holding lock:
> ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(team->team_lock_key#2);
>   lock(team->team_lock_key#2);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 2 locks held by syz.0.15/5889:
>  #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
>  #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x372/0xea0 net/core/rtnetlink.c:6644
>  #1: ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975
>
> stack backtrace:
> CPU: 1 UID: 0 PID: 5889 Comm: syz.0.15 Not tainted 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:93 [inline]
>  dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:119
>  check_deadlock kernel/locking/lockdep.c:3061 [inline]
>  validate_chain kernel/locking/lockdep.c:3855 [inline]
>  __lock_acquire+0x2167/0x3cb0 kernel/locking/lockdep.c:5142
>  lock_acquire kernel/locking/lockdep.c:5759 [inline]
>  lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5724
>  __mutex_lock_common kernel/locking/mutex.c:608 [inline]
>  __mutex_lock+0x175/0x9c0 kernel/locking/mutex.c:752
>  team_port_change_check drivers/net/team/team_core.c:2950 [inline]
>  team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973
>  notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
>  call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994
>  call_netdevice_notifiers_extack net/core/dev.c:2032 [inline]
>  call_netdevice_notifiers net/core/dev.c:2046 [inline]
>  __dev_notify_flags+0x12d/0x2e0 net/core/dev.c:8876
>  dev_change_flags+0x10c/0x160 net/core/dev.c:8914
>  vlan_device_event+0xdfc/0x2120 net/8021q/vlan.c:468
>  notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
>  call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994
>  call_netdevice_notifiers_extack net/core/dev.c:2032 [inline]
>  call_netdevice_notifiers net/core/dev.c:2046 [inline]
>  dev_open net/core/dev.c:1515 [inline]
>  dev_open+0x144/0x160 net/core/dev.c:1503
>  team_port_add drivers/net/team/team_core.c:1216 [inline]
>  team_add_slave+0xacd/0x20e0 drivers/net/team/team_core.c:1976
>  do_set_master+0x1bc/0x230 net/core/rtnetlink.c:2701
>  do_setlink+0x306d/0x4060 net/core/rtnetlink.c:2907
>  __rtnl_newlink+0xc35/0x1960 net/core/rtnetlink.c:3696
>  rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3743
>  rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6647
>  netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2550
>  netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
>  netlink_unicast+0x544/0x830 net/netlink/af_netlink.c:1357
>  netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
>  sock_sendmsg_nosec net/socket.c:730 [inline]
>  __sock_sendmsg net/socket.c:745 [inline]
>  ____sys_sendmsg+0xab5/0xc90 net/socket.c:2597
>  ___sys_sendmsg+0x135/0x1e0 net/socket.c:2651
>  __sys_sendmsg+0x117/0x1f0 net/socket.c:2680
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fc07ed77299
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fc07fb7f048 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 00007fc07ef05f80 RCX: 00007fc07ed77299
> RDX: 0000000000000000 RSI: 0000000020000600 RDI: 0000000000000012
> RBP: 00007fc07ede48e6 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 000000000000000b R14: 00007fc07ef05f80 R15: 00007ffeb5c0d528
>
> Reported-by: syzbot+b668da2bc4cb9670bf58@syzkaller.appspotmail.com
> Fixes: ec4ffd100ffb ("Revert "net: rtnetlink: Enslave device before bringing it up"")
> Signed-off-by: Jeongjun Park <aha310510@gmail.com>
> ---
>  drivers/net/team/team_core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
> index ab1935a4aa2c..ee595c3c6624 100644
> --- a/drivers/net/team/team_core.c
> +++ b/drivers/net/team/team_core.c
> @@ -1212,8 +1212,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev,
>                            portname);
>                 goto err_port_enter;
>         }
> -
> +       mutex_unlock(&team->lock);

Why would this be safe ?

All checks done in team_port_add() before this point would need to be
redone after mutex_lock() ?

If another mutex (rtnl ?) is already protecting this path, this would
suggest team->lock should be removed,
and RTNL should be used in all needed paths.

>         err = dev_open(port_dev, extack);
> +       mutex_lock(&team->lock);



>         if (err) {
>                 netdev_dbg(dev, "Device %s opening failed\n",
>                            portname);
> --
Jeongjun Park Aug. 1, 2024, 11:04 a.m. UTC | #3
Eric Dumazet wrote:
>
> On Wed, Jul 31, 2024 at 5:10 PM Jeongjun Park <aha310510@gmail.com> wrote:
> >
> > In do_setlink() , do_set_master() is called when dev->flags does not have
> > the IFF_UP flag set, so 'team->lock' is acquired and dev_open() is called,
> > which generates the NETDEV_UP event. This causes a deadlock as it tries to
> > acquire 'team->lock' again.
> >
> > To solve this, we need to unlock 'team->lock' before calling dev_open()
> > in team_port_add() and then reacquire the lock when dev_open() returns.
> > Since the implementation acquires the lock in advance when the team
> > structure is used inside dev_open(), data races will not occur even if it
> > is briefly unlocked.
> >
> > ============================================
> > WARNING: possible recursive locking detected
> > 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0 Not tainted
> > --------------------------------------------
> > syz.0.15/5889 is trying to acquire lock:
> > ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_port_change_check drivers/net/team/team_core.c:2950 [inline]
> > ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973
> >
> > but task is already holding lock:
> > ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975
> >
> > other info that might help us debug this:
> >  Possible unsafe locking scenario:
> >
> >        CPU0
> >        ----
> >   lock(team->team_lock_key#2);
> >   lock(team->team_lock_key#2);
> >
> >  *** DEADLOCK ***
> >
> >  May be due to missing lock nesting notation
> >
> > 2 locks held by syz.0.15/5889:
> >  #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
> >  #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x372/0xea0 net/core/rtnetlink.c:6644
> >  #1: ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975
> >
> > stack backtrace:
> > CPU: 1 UID: 0 PID: 5889 Comm: syz.0.15 Not tainted 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> > Call Trace:
> >  <TASK>
> >  __dump_stack lib/dump_stack.c:93 [inline]
> >  dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:119
> >  check_deadlock kernel/locking/lockdep.c:3061 [inline]
> >  validate_chain kernel/locking/lockdep.c:3855 [inline]
> >  __lock_acquire+0x2167/0x3cb0 kernel/locking/lockdep.c:5142
> >  lock_acquire kernel/locking/lockdep.c:5759 [inline]
> >  lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5724
> >  __mutex_lock_common kernel/locking/mutex.c:608 [inline]
> >  __mutex_lock+0x175/0x9c0 kernel/locking/mutex.c:752
> >  team_port_change_check drivers/net/team/team_core.c:2950 [inline]
> >  team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973
> >  notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
> >  call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994
> >  call_netdevice_notifiers_extack net/core/dev.c:2032 [inline]
> >  call_netdevice_notifiers net/core/dev.c:2046 [inline]
> >  __dev_notify_flags+0x12d/0x2e0 net/core/dev.c:8876
> >  dev_change_flags+0x10c/0x160 net/core/dev.c:8914
> >  vlan_device_event+0xdfc/0x2120 net/8021q/vlan.c:468
> >  notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
> >  call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994
> >  call_netdevice_notifiers_extack net/core/dev.c:2032 [inline]
> >  call_netdevice_notifiers net/core/dev.c:2046 [inline]
> >  dev_open net/core/dev.c:1515 [inline]
> >  dev_open+0x144/0x160 net/core/dev.c:1503
> >  team_port_add drivers/net/team/team_core.c:1216 [inline]
> >  team_add_slave+0xacd/0x20e0 drivers/net/team/team_core.c:1976
> >  do_set_master+0x1bc/0x230 net/core/rtnetlink.c:2701
> >  do_setlink+0x306d/0x4060 net/core/rtnetlink.c:2907
> >  __rtnl_newlink+0xc35/0x1960 net/core/rtnetlink.c:3696
> >  rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3743
> >  rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6647
> >  netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2550
> >  netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
> >  netlink_unicast+0x544/0x830 net/netlink/af_netlink.c:1357
> >  netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
> >  sock_sendmsg_nosec net/socket.c:730 [inline]
> >  __sock_sendmsg net/socket.c:745 [inline]
> >  ____sys_sendmsg+0xab5/0xc90 net/socket.c:2597
> >  ___sys_sendmsg+0x135/0x1e0 net/socket.c:2651
> >  __sys_sendmsg+0x117/0x1f0 net/socket.c:2680
> >  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >  do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7fc07ed77299
> > Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> > RSP: 002b:00007fc07fb7f048 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> > RAX: ffffffffffffffda RBX: 00007fc07ef05f80 RCX: 00007fc07ed77299
> > RDX: 0000000000000000 RSI: 0000000020000600 RDI: 0000000000000012
> > RBP: 00007fc07ede48e6 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 000000000000000b R14: 00007fc07ef05f80 R15: 00007ffeb5c0d528
> >
> > Reported-by: syzbot+b668da2bc4cb9670bf58@syzkaller.appspotmail.com
> > Fixes: ec4ffd100ffb ("Revert "net: rtnetlink: Enslave device before bringing it up"")
> > Signed-off-by: Jeongjun Park <aha310510@gmail.com>
> > ---
> >  drivers/net/team/team_core.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
> > index ab1935a4aa2c..ee595c3c6624 100644
> > --- a/drivers/net/team/team_core.c
> > +++ b/drivers/net/team/team_core.c
> > @@ -1212,8 +1212,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev,
> >                            portname);
> >                 goto err_port_enter;
> >         }
> > -
> > +       mutex_unlock(&team->lock);
>
> Why would this be safe ?
>
> All checks done in team_port_add() before this point would need to be
> redone after mutex_lock() ?
>
> If another mutex (rtnl ?) is already protecting this path, this would
> suggest team->lock should be removed,
> and RTNL should be used in all needed paths.
>

If so, I will rewrite the patch and send by modifying
team_port_change_check.

Regards,
Jeongjun Park
Jiri Pirko Aug. 1, 2024, 2:31 p.m. UTC | #4
Thu, Aug 01, 2024 at 08:28:20AM CEST, edumazet@google.com wrote:
>On Wed, Jul 31, 2024 at 5:10 PM Jeongjun Park <aha310510@gmail.com> wrote:
>>
>> In do_setlink() , do_set_master() is called when dev->flags does not have
>> the IFF_UP flag set, so 'team->lock' is acquired and dev_open() is called,
>> which generates the NETDEV_UP event. This causes a deadlock as it tries to
>> acquire 'team->lock' again.
>>
>> To solve this, we need to unlock 'team->lock' before calling dev_open()
>> in team_port_add() and then reacquire the lock when dev_open() returns.
>> Since the implementation acquires the lock in advance when the team
>> structure is used inside dev_open(), data races will not occur even if it
>> is briefly unlocked.
>>
>> ============================================
>> WARNING: possible recursive locking detected
>> 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0 Not tainted
>> --------------------------------------------
>> syz.0.15/5889 is trying to acquire lock:
>> ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_port_change_check drivers/net/team/team_core.c:2950 [inline]
>> ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973
>>
>> but task is already holding lock:
>> ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975
>>
>> other info that might help us debug this:
>>  Possible unsafe locking scenario:
>>
>>        CPU0
>>        ----
>>   lock(team->team_lock_key#2);
>>   lock(team->team_lock_key#2);
>>
>>  *** DEADLOCK ***
>>
>>  May be due to missing lock nesting notation
>>
>> 2 locks held by syz.0.15/5889:
>>  #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
>>  #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x372/0xea0 net/core/rtnetlink.c:6644
>>  #1: ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975
>>
>> stack backtrace:
>> CPU: 1 UID: 0 PID: 5889 Comm: syz.0.15 Not tainted 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0
>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
>> Call Trace:
>>  <TASK>
>>  __dump_stack lib/dump_stack.c:93 [inline]
>>  dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:119
>>  check_deadlock kernel/locking/lockdep.c:3061 [inline]
>>  validate_chain kernel/locking/lockdep.c:3855 [inline]
>>  __lock_acquire+0x2167/0x3cb0 kernel/locking/lockdep.c:5142
>>  lock_acquire kernel/locking/lockdep.c:5759 [inline]
>>  lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5724
>>  __mutex_lock_common kernel/locking/mutex.c:608 [inline]
>>  __mutex_lock+0x175/0x9c0 kernel/locking/mutex.c:752
>>  team_port_change_check drivers/net/team/team_core.c:2950 [inline]
>>  team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973
>>  notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
>>  call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994
>>  call_netdevice_notifiers_extack net/core/dev.c:2032 [inline]
>>  call_netdevice_notifiers net/core/dev.c:2046 [inline]
>>  __dev_notify_flags+0x12d/0x2e0 net/core/dev.c:8876
>>  dev_change_flags+0x10c/0x160 net/core/dev.c:8914
>>  vlan_device_event+0xdfc/0x2120 net/8021q/vlan.c:468
>>  notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
>>  call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994
>>  call_netdevice_notifiers_extack net/core/dev.c:2032 [inline]
>>  call_netdevice_notifiers net/core/dev.c:2046 [inline]
>>  dev_open net/core/dev.c:1515 [inline]
>>  dev_open+0x144/0x160 net/core/dev.c:1503
>>  team_port_add drivers/net/team/team_core.c:1216 [inline]
>>  team_add_slave+0xacd/0x20e0 drivers/net/team/team_core.c:1976
>>  do_set_master+0x1bc/0x230 net/core/rtnetlink.c:2701
>>  do_setlink+0x306d/0x4060 net/core/rtnetlink.c:2907
>>  __rtnl_newlink+0xc35/0x1960 net/core/rtnetlink.c:3696
>>  rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3743
>>  rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6647
>>  netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2550
>>  netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
>>  netlink_unicast+0x544/0x830 net/netlink/af_netlink.c:1357
>>  netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
>>  sock_sendmsg_nosec net/socket.c:730 [inline]
>>  __sock_sendmsg net/socket.c:745 [inline]
>>  ____sys_sendmsg+0xab5/0xc90 net/socket.c:2597
>>  ___sys_sendmsg+0x135/0x1e0 net/socket.c:2651
>>  __sys_sendmsg+0x117/0x1f0 net/socket.c:2680
>>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>>  do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
>>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x7fc07ed77299
>> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
>> RSP: 002b:00007fc07fb7f048 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
>> RAX: ffffffffffffffda RBX: 00007fc07ef05f80 RCX: 00007fc07ed77299
>> RDX: 0000000000000000 RSI: 0000000020000600 RDI: 0000000000000012
>> RBP: 00007fc07ede48e6 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>> R13: 000000000000000b R14: 00007fc07ef05f80 R15: 00007ffeb5c0d528
>>
>> Reported-by: syzbot+b668da2bc4cb9670bf58@syzkaller.appspotmail.com
>> Fixes: ec4ffd100ffb ("Revert "net: rtnetlink: Enslave device before bringing it up"")
>> Signed-off-by: Jeongjun Park <aha310510@gmail.com>
>> ---
>>  drivers/net/team/team_core.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
>> index ab1935a4aa2c..ee595c3c6624 100644
>> --- a/drivers/net/team/team_core.c
>> +++ b/drivers/net/team/team_core.c
>> @@ -1212,8 +1212,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev,
>>                            portname);
>>                 goto err_port_enter;
>>         }
>> -
>> +       mutex_unlock(&team->lock);
>
>Why would this be safe ?
>
>All checks done in team_port_add() before this point would need to be
>redone after mutex_lock() ?
>
>If another mutex (rtnl ?) is already protecting this path, this would
>suggest team->lock should be removed,
>and RTNL should be used in all needed paths.

I agree. The problem is, not all other paths rely on rtnl, they rely on
team->lock instead. I believe that the best solution is to remove
team->lock entirely.

I looked into it, the only problem I see is team_nl_fill_one_option_get()
function. That operates without rtnl being held. Taking rtnl here is too
heavy, given that it may be repeatedly used to fetch hash stats.
It's read only, could be probably converted to RCU. It's in my todo list,
once I have some spare cycles.




>
>>         err = dev_open(port_dev, extack);
>> +       mutex_lock(&team->lock);
>
>
>
>>         if (err) {
>>                 netdev_dbg(dev, "Device %s opening failed\n",
>>                            portname);
>> --
diff mbox series

Patch

diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c
index ab1935a4aa2c..ee595c3c6624 100644
--- a/drivers/net/team/team_core.c
+++ b/drivers/net/team/team_core.c
@@ -1212,8 +1212,9 @@  static int team_port_add(struct team *team, struct net_device *port_dev,
 			   portname);
 		goto err_port_enter;
 	}
-
+	mutex_unlock(&team->lock);
 	err = dev_open(port_dev, extack);
+	mutex_lock(&team->lock);
 	if (err) {
 		netdev_dbg(dev, "Device %s opening failed\n",
 			   portname);