Message ID | 20240731150940.14106-1-aha310510@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2] rtnetlink: fix possible deadlock in team_port_change_check | expand |
On Thu, Aug 01, 2024 at 12:09:40AM +0900, Jeongjun Park wrote: > In do_setlink() , do_set_master() is called when dev->flags does not have > the IFF_UP flag set, so 'team->lock' is acquired and dev_open() is called, > which generates the NETDEV_UP event. This causes a deadlock as it tries to > acquire 'team->lock' again. > > To solve this, we need to unlock 'team->lock' before calling dev_open() > in team_port_add() and then reacquire the lock when dev_open() returns. > Since the implementation acquires the lock in advance when the team > structure is used inside dev_open(), data races will not occur even if it > is briefly unlocked. > > > Reported-by: syzbot+b668da2bc4cb9670bf58@syzkaller.appspotmail.com > Fixes: ec4ffd100ffb ("Revert "net: rtnetlink: Enslave device before bringing it up"") The fixes tag shouldn't be ec4ffd100ffb, as the issue exists before ec4ffd100ffb. I think it should be 3d249d4ca7d0 ("net: introduce ethernet teaming device") Jiri, what do you think? Thanks Hangbin
On Wed, Jul 31, 2024 at 5:10 PM Jeongjun Park <aha310510@gmail.com> wrote: > > In do_setlink() , do_set_master() is called when dev->flags does not have > the IFF_UP flag set, so 'team->lock' is acquired and dev_open() is called, > which generates the NETDEV_UP event. This causes a deadlock as it tries to > acquire 'team->lock' again. > > To solve this, we need to unlock 'team->lock' before calling dev_open() > in team_port_add() and then reacquire the lock when dev_open() returns. > Since the implementation acquires the lock in advance when the team > structure is used inside dev_open(), data races will not occur even if it > is briefly unlocked. > > ============================================ > WARNING: possible recursive locking detected > 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0 Not tainted > -------------------------------------------- > syz.0.15/5889 is trying to acquire lock: > ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_port_change_check drivers/net/team/team_core.c:2950 [inline] > ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973 > > but task is already holding lock: > ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975 > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(team->team_lock_key#2); > lock(team->team_lock_key#2); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 2 locks held by syz.0.15/5889: > #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline] > #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x372/0xea0 net/core/rtnetlink.c:6644 > #1: ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975 > > stack backtrace: > CPU: 1 UID: 0 PID: 5889 Comm: syz.0.15 Not tainted 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 > Call Trace: > <TASK> > __dump_stack lib/dump_stack.c:93 [inline] > dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:119 > check_deadlock kernel/locking/lockdep.c:3061 [inline] > validate_chain kernel/locking/lockdep.c:3855 [inline] > __lock_acquire+0x2167/0x3cb0 kernel/locking/lockdep.c:5142 > lock_acquire kernel/locking/lockdep.c:5759 [inline] > lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5724 > __mutex_lock_common kernel/locking/mutex.c:608 [inline] > __mutex_lock+0x175/0x9c0 kernel/locking/mutex.c:752 > team_port_change_check drivers/net/team/team_core.c:2950 [inline] > team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973 > notifier_call_chain+0xb9/0x410 kernel/notifier.c:93 > call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994 > call_netdevice_notifiers_extack net/core/dev.c:2032 [inline] > call_netdevice_notifiers net/core/dev.c:2046 [inline] > __dev_notify_flags+0x12d/0x2e0 net/core/dev.c:8876 > dev_change_flags+0x10c/0x160 net/core/dev.c:8914 > vlan_device_event+0xdfc/0x2120 net/8021q/vlan.c:468 > notifier_call_chain+0xb9/0x410 kernel/notifier.c:93 > call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994 > call_netdevice_notifiers_extack net/core/dev.c:2032 [inline] > call_netdevice_notifiers net/core/dev.c:2046 [inline] > dev_open net/core/dev.c:1515 [inline] > dev_open+0x144/0x160 net/core/dev.c:1503 > team_port_add drivers/net/team/team_core.c:1216 [inline] > team_add_slave+0xacd/0x20e0 drivers/net/team/team_core.c:1976 > do_set_master+0x1bc/0x230 net/core/rtnetlink.c:2701 > do_setlink+0x306d/0x4060 net/core/rtnetlink.c:2907 > __rtnl_newlink+0xc35/0x1960 net/core/rtnetlink.c:3696 > rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3743 > rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6647 > netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2550 > netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline] > netlink_unicast+0x544/0x830 net/netlink/af_netlink.c:1357 > netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901 > sock_sendmsg_nosec net/socket.c:730 [inline] > __sock_sendmsg net/socket.c:745 [inline] > ____sys_sendmsg+0xab5/0xc90 net/socket.c:2597 > ___sys_sendmsg+0x135/0x1e0 net/socket.c:2651 > __sys_sendmsg+0x117/0x1f0 net/socket.c:2680 > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > RIP: 0033:0x7fc07ed77299 > Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 > RSP: 002b:00007fc07fb7f048 EFLAGS: 00000246 ORIG_RAX: 000000000000002e > RAX: ffffffffffffffda RBX: 00007fc07ef05f80 RCX: 00007fc07ed77299 > RDX: 0000000000000000 RSI: 0000000020000600 RDI: 0000000000000012 > RBP: 00007fc07ede48e6 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > R13: 000000000000000b R14: 00007fc07ef05f80 R15: 00007ffeb5c0d528 > > Reported-by: syzbot+b668da2bc4cb9670bf58@syzkaller.appspotmail.com > Fixes: ec4ffd100ffb ("Revert "net: rtnetlink: Enslave device before bringing it up"") > Signed-off-by: Jeongjun Park <aha310510@gmail.com> > --- > drivers/net/team/team_core.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c > index ab1935a4aa2c..ee595c3c6624 100644 > --- a/drivers/net/team/team_core.c > +++ b/drivers/net/team/team_core.c > @@ -1212,8 +1212,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev, > portname); > goto err_port_enter; > } > - > + mutex_unlock(&team->lock); Why would this be safe ? All checks done in team_port_add() before this point would need to be redone after mutex_lock() ? If another mutex (rtnl ?) is already protecting this path, this would suggest team->lock should be removed, and RTNL should be used in all needed paths. > err = dev_open(port_dev, extack); > + mutex_lock(&team->lock); > if (err) { > netdev_dbg(dev, "Device %s opening failed\n", > portname); > --
Eric Dumazet wrote: > > On Wed, Jul 31, 2024 at 5:10 PM Jeongjun Park <aha310510@gmail.com> wrote: > > > > In do_setlink() , do_set_master() is called when dev->flags does not have > > the IFF_UP flag set, so 'team->lock' is acquired and dev_open() is called, > > which generates the NETDEV_UP event. This causes a deadlock as it tries to > > acquire 'team->lock' again. > > > > To solve this, we need to unlock 'team->lock' before calling dev_open() > > in team_port_add() and then reacquire the lock when dev_open() returns. > > Since the implementation acquires the lock in advance when the team > > structure is used inside dev_open(), data races will not occur even if it > > is briefly unlocked. > > > > ============================================ > > WARNING: possible recursive locking detected > > 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0 Not tainted > > -------------------------------------------- > > syz.0.15/5889 is trying to acquire lock: > > ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_port_change_check drivers/net/team/team_core.c:2950 [inline] > > ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973 > > > > but task is already holding lock: > > ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975 > > > > other info that might help us debug this: > > Possible unsafe locking scenario: > > > > CPU0 > > ---- > > lock(team->team_lock_key#2); > > lock(team->team_lock_key#2); > > > > *** DEADLOCK *** > > > > May be due to missing lock nesting notation > > > > 2 locks held by syz.0.15/5889: > > #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline] > > #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x372/0xea0 net/core/rtnetlink.c:6644 > > #1: ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975 > > > > stack backtrace: > > CPU: 1 UID: 0 PID: 5889 Comm: syz.0.15 Not tainted 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0 > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 > > Call Trace: > > <TASK> > > __dump_stack lib/dump_stack.c:93 [inline] > > dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:119 > > check_deadlock kernel/locking/lockdep.c:3061 [inline] > > validate_chain kernel/locking/lockdep.c:3855 [inline] > > __lock_acquire+0x2167/0x3cb0 kernel/locking/lockdep.c:5142 > > lock_acquire kernel/locking/lockdep.c:5759 [inline] > > lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5724 > > __mutex_lock_common kernel/locking/mutex.c:608 [inline] > > __mutex_lock+0x175/0x9c0 kernel/locking/mutex.c:752 > > team_port_change_check drivers/net/team/team_core.c:2950 [inline] > > team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973 > > notifier_call_chain+0xb9/0x410 kernel/notifier.c:93 > > call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994 > > call_netdevice_notifiers_extack net/core/dev.c:2032 [inline] > > call_netdevice_notifiers net/core/dev.c:2046 [inline] > > __dev_notify_flags+0x12d/0x2e0 net/core/dev.c:8876 > > dev_change_flags+0x10c/0x160 net/core/dev.c:8914 > > vlan_device_event+0xdfc/0x2120 net/8021q/vlan.c:468 > > notifier_call_chain+0xb9/0x410 kernel/notifier.c:93 > > call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994 > > call_netdevice_notifiers_extack net/core/dev.c:2032 [inline] > > call_netdevice_notifiers net/core/dev.c:2046 [inline] > > dev_open net/core/dev.c:1515 [inline] > > dev_open+0x144/0x160 net/core/dev.c:1503 > > team_port_add drivers/net/team/team_core.c:1216 [inline] > > team_add_slave+0xacd/0x20e0 drivers/net/team/team_core.c:1976 > > do_set_master+0x1bc/0x230 net/core/rtnetlink.c:2701 > > do_setlink+0x306d/0x4060 net/core/rtnetlink.c:2907 > > __rtnl_newlink+0xc35/0x1960 net/core/rtnetlink.c:3696 > > rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3743 > > rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6647 > > netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2550 > > netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline] > > netlink_unicast+0x544/0x830 net/netlink/af_netlink.c:1357 > > netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901 > > sock_sendmsg_nosec net/socket.c:730 [inline] > > __sock_sendmsg net/socket.c:745 [inline] > > ____sys_sendmsg+0xab5/0xc90 net/socket.c:2597 > > ___sys_sendmsg+0x135/0x1e0 net/socket.c:2651 > > __sys_sendmsg+0x117/0x1f0 net/socket.c:2680 > > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > > do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 > > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > RIP: 0033:0x7fc07ed77299 > > Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 > > RSP: 002b:00007fc07fb7f048 EFLAGS: 00000246 ORIG_RAX: 000000000000002e > > RAX: ffffffffffffffda RBX: 00007fc07ef05f80 RCX: 00007fc07ed77299 > > RDX: 0000000000000000 RSI: 0000000020000600 RDI: 0000000000000012 > > RBP: 00007fc07ede48e6 R08: 0000000000000000 R09: 0000000000000000 > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > > R13: 000000000000000b R14: 00007fc07ef05f80 R15: 00007ffeb5c0d528 > > > > Reported-by: syzbot+b668da2bc4cb9670bf58@syzkaller.appspotmail.com > > Fixes: ec4ffd100ffb ("Revert "net: rtnetlink: Enslave device before bringing it up"") > > Signed-off-by: Jeongjun Park <aha310510@gmail.com> > > --- > > drivers/net/team/team_core.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c > > index ab1935a4aa2c..ee595c3c6624 100644 > > --- a/drivers/net/team/team_core.c > > +++ b/drivers/net/team/team_core.c > > @@ -1212,8 +1212,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev, > > portname); > > goto err_port_enter; > > } > > - > > + mutex_unlock(&team->lock); > > Why would this be safe ? > > All checks done in team_port_add() before this point would need to be > redone after mutex_lock() ? > > If another mutex (rtnl ?) is already protecting this path, this would > suggest team->lock should be removed, > and RTNL should be used in all needed paths. > If so, I will rewrite the patch and send by modifying team_port_change_check. Regards, Jeongjun Park
Thu, Aug 01, 2024 at 08:28:20AM CEST, edumazet@google.com wrote: >On Wed, Jul 31, 2024 at 5:10 PM Jeongjun Park <aha310510@gmail.com> wrote: >> >> In do_setlink() , do_set_master() is called when dev->flags does not have >> the IFF_UP flag set, so 'team->lock' is acquired and dev_open() is called, >> which generates the NETDEV_UP event. This causes a deadlock as it tries to >> acquire 'team->lock' again. >> >> To solve this, we need to unlock 'team->lock' before calling dev_open() >> in team_port_add() and then reacquire the lock when dev_open() returns. >> Since the implementation acquires the lock in advance when the team >> structure is used inside dev_open(), data races will not occur even if it >> is briefly unlocked. >> >> ============================================ >> WARNING: possible recursive locking detected >> 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0 Not tainted >> -------------------------------------------- >> syz.0.15/5889 is trying to acquire lock: >> ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_port_change_check drivers/net/team/team_core.c:2950 [inline] >> ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973 >> >> but task is already holding lock: >> ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975 >> >> other info that might help us debug this: >> Possible unsafe locking scenario: >> >> CPU0 >> ---- >> lock(team->team_lock_key#2); >> lock(team->team_lock_key#2); >> >> *** DEADLOCK *** >> >> May be due to missing lock nesting notation >> >> 2 locks held by syz.0.15/5889: >> #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline] >> #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x372/0xea0 net/core/rtnetlink.c:6644 >> #1: ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975 >> >> stack backtrace: >> CPU: 1 UID: 0 PID: 5889 Comm: syz.0.15 Not tainted 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0 >> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 >> Call Trace: >> <TASK> >> __dump_stack lib/dump_stack.c:93 [inline] >> dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:119 >> check_deadlock kernel/locking/lockdep.c:3061 [inline] >> validate_chain kernel/locking/lockdep.c:3855 [inline] >> __lock_acquire+0x2167/0x3cb0 kernel/locking/lockdep.c:5142 >> lock_acquire kernel/locking/lockdep.c:5759 [inline] >> lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5724 >> __mutex_lock_common kernel/locking/mutex.c:608 [inline] >> __mutex_lock+0x175/0x9c0 kernel/locking/mutex.c:752 >> team_port_change_check drivers/net/team/team_core.c:2950 [inline] >> team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973 >> notifier_call_chain+0xb9/0x410 kernel/notifier.c:93 >> call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994 >> call_netdevice_notifiers_extack net/core/dev.c:2032 [inline] >> call_netdevice_notifiers net/core/dev.c:2046 [inline] >> __dev_notify_flags+0x12d/0x2e0 net/core/dev.c:8876 >> dev_change_flags+0x10c/0x160 net/core/dev.c:8914 >> vlan_device_event+0xdfc/0x2120 net/8021q/vlan.c:468 >> notifier_call_chain+0xb9/0x410 kernel/notifier.c:93 >> call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994 >> call_netdevice_notifiers_extack net/core/dev.c:2032 [inline] >> call_netdevice_notifiers net/core/dev.c:2046 [inline] >> dev_open net/core/dev.c:1515 [inline] >> dev_open+0x144/0x160 net/core/dev.c:1503 >> team_port_add drivers/net/team/team_core.c:1216 [inline] >> team_add_slave+0xacd/0x20e0 drivers/net/team/team_core.c:1976 >> do_set_master+0x1bc/0x230 net/core/rtnetlink.c:2701 >> do_setlink+0x306d/0x4060 net/core/rtnetlink.c:2907 >> __rtnl_newlink+0xc35/0x1960 net/core/rtnetlink.c:3696 >> rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3743 >> rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6647 >> netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2550 >> netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline] >> netlink_unicast+0x544/0x830 net/netlink/af_netlink.c:1357 >> netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901 >> sock_sendmsg_nosec net/socket.c:730 [inline] >> __sock_sendmsg net/socket.c:745 [inline] >> ____sys_sendmsg+0xab5/0xc90 net/socket.c:2597 >> ___sys_sendmsg+0x135/0x1e0 net/socket.c:2651 >> __sys_sendmsg+0x117/0x1f0 net/socket.c:2680 >> do_syscall_x64 arch/x86/entry/common.c:52 [inline] >> do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 >> entry_SYSCALL_64_after_hwframe+0x77/0x7f >> RIP: 0033:0x7fc07ed77299 >> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 >> RSP: 002b:00007fc07fb7f048 EFLAGS: 00000246 ORIG_RAX: 000000000000002e >> RAX: ffffffffffffffda RBX: 00007fc07ef05f80 RCX: 00007fc07ed77299 >> RDX: 0000000000000000 RSI: 0000000020000600 RDI: 0000000000000012 >> RBP: 00007fc07ede48e6 R08: 0000000000000000 R09: 0000000000000000 >> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 >> R13: 000000000000000b R14: 00007fc07ef05f80 R15: 00007ffeb5c0d528 >> >> Reported-by: syzbot+b668da2bc4cb9670bf58@syzkaller.appspotmail.com >> Fixes: ec4ffd100ffb ("Revert "net: rtnetlink: Enslave device before bringing it up"") >> Signed-off-by: Jeongjun Park <aha310510@gmail.com> >> --- >> drivers/net/team/team_core.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c >> index ab1935a4aa2c..ee595c3c6624 100644 >> --- a/drivers/net/team/team_core.c >> +++ b/drivers/net/team/team_core.c >> @@ -1212,8 +1212,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev, >> portname); >> goto err_port_enter; >> } >> - >> + mutex_unlock(&team->lock); > >Why would this be safe ? > >All checks done in team_port_add() before this point would need to be >redone after mutex_lock() ? > >If another mutex (rtnl ?) is already protecting this path, this would >suggest team->lock should be removed, >and RTNL should be used in all needed paths. I agree. The problem is, not all other paths rely on rtnl, they rely on team->lock instead. I believe that the best solution is to remove team->lock entirely. I looked into it, the only problem I see is team_nl_fill_one_option_get() function. That operates without rtnl being held. Taking rtnl here is too heavy, given that it may be repeatedly used to fetch hash stats. It's read only, could be probably converted to RCU. It's in my todo list, once I have some spare cycles. > >> err = dev_open(port_dev, extack); >> + mutex_lock(&team->lock); > > > >> if (err) { >> netdev_dbg(dev, "Device %s opening failed\n", >> portname); >> --
diff --git a/drivers/net/team/team_core.c b/drivers/net/team/team_core.c index ab1935a4aa2c..ee595c3c6624 100644 --- a/drivers/net/team/team_core.c +++ b/drivers/net/team/team_core.c @@ -1212,8 +1212,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev, portname); goto err_port_enter; } - + mutex_unlock(&team->lock); err = dev_open(port_dev, extack); + mutex_lock(&team->lock); if (err) { netdev_dbg(dev, "Device %s opening failed\n", portname);
In do_setlink() , do_set_master() is called when dev->flags does not have the IFF_UP flag set, so 'team->lock' is acquired and dev_open() is called, which generates the NETDEV_UP event. This causes a deadlock as it tries to acquire 'team->lock' again. To solve this, we need to unlock 'team->lock' before calling dev_open() in team_port_add() and then reacquire the lock when dev_open() returns. Since the implementation acquires the lock in advance when the team structure is used inside dev_open(), data races will not occur even if it is briefly unlocked. ============================================ WARNING: possible recursive locking detected 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0 Not tainted -------------------------------------------- syz.0.15/5889 is trying to acquire lock: ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_port_change_check drivers/net/team/team_core.c:2950 [inline] ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973 but task is already holding lock: ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(team->team_lock_key#2); lock(team->team_lock_key#2); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by syz.0.15/5889: #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline] #0: ffffffff8fa1f4e8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x372/0xea0 net/core/rtnetlink.c:6644 #1: ffff8880231e4d40 (team->team_lock_key#2){+.+.}-{3:3}, at: team_add_slave+0x9c/0x20e0 drivers/net/team/team_core.c:1975 stack backtrace: CPU: 1 UID: 0 PID: 5889 Comm: syz.0.15 Not tainted 6.11.0-rc1-syzkaller-ge4fc196f5ba3-dirty #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:119 check_deadlock kernel/locking/lockdep.c:3061 [inline] validate_chain kernel/locking/lockdep.c:3855 [inline] __lock_acquire+0x2167/0x3cb0 kernel/locking/lockdep.c:5142 lock_acquire kernel/locking/lockdep.c:5759 [inline] lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5724 __mutex_lock_common kernel/locking/mutex.c:608 [inline] __mutex_lock+0x175/0x9c0 kernel/locking/mutex.c:752 team_port_change_check drivers/net/team/team_core.c:2950 [inline] team_device_event+0x2c7/0x770 drivers/net/team/team_core.c:2973 notifier_call_chain+0xb9/0x410 kernel/notifier.c:93 call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994 call_netdevice_notifiers_extack net/core/dev.c:2032 [inline] call_netdevice_notifiers net/core/dev.c:2046 [inline] __dev_notify_flags+0x12d/0x2e0 net/core/dev.c:8876 dev_change_flags+0x10c/0x160 net/core/dev.c:8914 vlan_device_event+0xdfc/0x2120 net/8021q/vlan.c:468 notifier_call_chain+0xb9/0x410 kernel/notifier.c:93 call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1994 call_netdevice_notifiers_extack net/core/dev.c:2032 [inline] call_netdevice_notifiers net/core/dev.c:2046 [inline] dev_open net/core/dev.c:1515 [inline] dev_open+0x144/0x160 net/core/dev.c:1503 team_port_add drivers/net/team/team_core.c:1216 [inline] team_add_slave+0xacd/0x20e0 drivers/net/team/team_core.c:1976 do_set_master+0x1bc/0x230 net/core/rtnetlink.c:2701 do_setlink+0x306d/0x4060 net/core/rtnetlink.c:2907 __rtnl_newlink+0xc35/0x1960 net/core/rtnetlink.c:3696 rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3743 rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6647 netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2550 netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline] netlink_unicast+0x544/0x830 net/netlink/af_netlink.c:1357 netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0xab5/0xc90 net/socket.c:2597 ___sys_sendmsg+0x135/0x1e0 net/socket.c:2651 __sys_sendmsg+0x117/0x1f0 net/socket.c:2680 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fc07ed77299 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fc07fb7f048 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fc07ef05f80 RCX: 00007fc07ed77299 RDX: 0000000000000000 RSI: 0000000020000600 RDI: 0000000000000012 RBP: 00007fc07ede48e6 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007fc07ef05f80 R15: 00007ffeb5c0d528 Reported-by: syzbot+b668da2bc4cb9670bf58@syzkaller.appspotmail.com Fixes: ec4ffd100ffb ("Revert "net: rtnetlink: Enslave device before bringing it up"") Signed-off-by: Jeongjun Park <aha310510@gmail.com> --- drivers/net/team/team_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --