Message ID | 20220519020148.1058344-1-liuhangbin@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 9b80ccda233fa6c59de411bf889cc4d0e028f2c7 |
Delegated to: | Netdev Maintainers |
Series | [PATCHv3,net] bonding: fix missed rcu protection |
On Thu, May 19, 2022 at 10:01:48AM +0800, Hangbin Liu wrote:
> When removing the rcu_read_lock in bond_ethtool_get_ts_info() as
> discussed [1], I didn't notice it could be called via setsockopt,
> which doesn't hold rcu lock, as syzbot pointed:
>
> stack backtrace:
> CPU: 0 PID: 3599 Comm: syz-executor317 Not tainted 5.18.0-rc5-syzkaller-01392-g01f4685797a5 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
>  bond_option_active_slave_get_rcu include/net/bonding.h:353 [inline]
>  bond_ethtool_get_ts_info+0x32c/0x3a0 drivers/net/bonding/bond_main.c:5595
>  __ethtool_get_ts_info+0x173/0x240 net/ethtool/common.c:554
>  ethtool_get_phc_vclocks+0x99/0x110 net/ethtool/common.c:568
>  sock_timestamping_bind_phc net/core/sock.c:869 [inline]
>  sock_set_timestamping+0x3a3/0x7e0 net/core/sock.c:916
>  sock_setsockopt+0x543/0x2ec0 net/core/sock.c:1221
>  __sys_setsockopt+0x55e/0x6a0 net/socket.c:2223
>  __do_sys_setsockopt net/socket.c:2238 [inline]
>  __se_sys_setsockopt net/socket.c:2235 [inline]
>  __x64_sys_setsockopt+0xba/0x150 net/socket.c:2235
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f8902c8eb39
>
> Fix it by adding rcu_read_lock and take a ref on the real_dev.
> Since dev_hold() and dev_put() can take NULL these days, we can
> skip checking if real_dev exist.
>
> [1] https://lore.kernel.org/netdev/27565.1642742439@famine/
>
> Reported-by: syzbot+92beb3d46aab498710fa@syzkaller.appspotmail.com
> Fixes: aa6034678e87 ("bonding: use rcu_dereference_rtnl when get bonding active slave")
> Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 19 May 2022 10:01:48 +0800 you wrote:
> When removing the rcu_read_lock in bond_ethtool_get_ts_info() as
> discussed [1], I didn't notice it could be called via setsockopt,
> which doesn't hold rcu lock, as syzbot pointed:
>
> stack backtrace:
> CPU: 0 PID: 3599 Comm: syz-executor317 Not tainted 5.18.0-rc5-syzkaller-01392-g01f4685797a5 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
>  bond_option_active_slave_get_rcu include/net/bonding.h:353 [inline]
>  bond_ethtool_get_ts_info+0x32c/0x3a0 drivers/net/bonding/bond_main.c:5595
>  __ethtool_get_ts_info+0x173/0x240 net/ethtool/common.c:554
>  ethtool_get_phc_vclocks+0x99/0x110 net/ethtool/common.c:568
>  sock_timestamping_bind_phc net/core/sock.c:869 [inline]
>  sock_set_timestamping+0x3a3/0x7e0 net/core/sock.c:916
>  sock_setsockopt+0x543/0x2ec0 net/core/sock.c:1221
>  __sys_setsockopt+0x55e/0x6a0 net/socket.c:2223
>  __do_sys_setsockopt net/socket.c:2238 [inline]
>  __se_sys_setsockopt net/socket.c:2235 [inline]
>  __x64_sys_setsockopt+0xba/0x150 net/socket.c:2235
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f8902c8eb39
>
> [...]

Here is the summary with links:
  - [PATCHv3,net] bonding: fix missed rcu protection
    https://git.kernel.org/netdev/net/c/9b80ccda233f

You are awesome, thank you!
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 38e152548126..b5c5196e03ee 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5591,16 +5591,23 @@ static int bond_ethtool_get_ts_info(struct net_device *bond_dev,
 	const struct ethtool_ops *ops;
 	struct net_device *real_dev;
 	struct phy_device *phydev;
+	int ret = 0;
 
+	rcu_read_lock();
 	real_dev = bond_option_active_slave_get_rcu(bond);
+	dev_hold(real_dev);
+	rcu_read_unlock();
+
 	if (real_dev) {
 		ops = real_dev->ethtool_ops;
 		phydev = real_dev->phydev;
 
 		if (phy_has_tsinfo(phydev)) {
-			return phy_ts_info(phydev, info);
+			ret = phy_ts_info(phydev, info);
+			goto out;
 		} else if (ops->get_ts_info) {
-			return ops->get_ts_info(real_dev, info);
+			ret = ops->get_ts_info(real_dev, info);
+			goto out;
 		}
 	}
 
@@ -5608,7 +5615,9 @@ static int bond_ethtool_get_ts_info(struct net_device *bond_dev,
 				SOF_TIMESTAMPING_SOFTWARE;
 	info->phc_index = -1;
 
-	return 0;
+out:
+	dev_put(real_dev);
+	return ret;
 }
 
 static const struct ethtool_ops bond_ethtool_ops = {
When removing the rcu_read_lock in bond_ethtool_get_ts_info() as
discussed [1], I didn't notice it could be called via setsockopt,
which doesn't hold rcu lock, as syzbot pointed:

stack backtrace:
CPU: 0 PID: 3599 Comm: syz-executor317 Not tainted 5.18.0-rc5-syzkaller-01392-g01f4685797a5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 bond_option_active_slave_get_rcu include/net/bonding.h:353 [inline]
 bond_ethtool_get_ts_info+0x32c/0x3a0 drivers/net/bonding/bond_main.c:5595
 __ethtool_get_ts_info+0x173/0x240 net/ethtool/common.c:554
 ethtool_get_phc_vclocks+0x99/0x110 net/ethtool/common.c:568
 sock_timestamping_bind_phc net/core/sock.c:869 [inline]
 sock_set_timestamping+0x3a3/0x7e0 net/core/sock.c:916
 sock_setsockopt+0x543/0x2ec0 net/core/sock.c:1221
 __sys_setsockopt+0x55e/0x6a0 net/socket.c:2223
 __do_sys_setsockopt net/socket.c:2238 [inline]
 __se_sys_setsockopt net/socket.c:2235 [inline]
 __x64_sys_setsockopt+0xba/0x150 net/socket.c:2235
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f8902c8eb39

Fix it by adding rcu_read_lock and take a ref on the real_dev.
Since dev_hold() and dev_put() can take NULL these days, we can
skip checking if real_dev exist.

[1] https://lore.kernel.org/netdev/27565.1642742439@famine/

Reported-by: syzbot+92beb3d46aab498710fa@syzkaller.appspotmail.com
Fixes: aa6034678e87 ("bonding: use rcu_dereference_rtnl when get bonding active slave")
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
v3: skip checking if real_dev exist since dev_hold/put could take NULL.
v2: add ref on the real_dev as Jakub and Paolo suggested.
---
 drivers/net/bonding/bond_main.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
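
The locking pattern the commit message describes (look the pointer up inside an RCU read-side section, take a reference before leaving it, and release that reference on a single exit path) can be sketched in userspace as follows. This is only an illustrative model, not kernel code: struct net_device, rcu_read_lock(), rcu_read_unlock(), dev_hold() and dev_put() are simplified stand-ins written for this sketch, and active_slave, active_slave_get_rcu(), get_ts_info(), rcu_depth and has_ts_info are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins; NOT the kernel implementations. */
struct net_device {
	int refcnt;       /* models the device reference count */
	int has_ts_info;  /* hypothetical: device can report timestamping info */
};

static int rcu_depth; /* models RCU read-side nesting depth */

static void rcu_read_lock(void) { rcu_depth++; }

static void rcu_read_unlock(void)
{
	assert(rcu_depth > 0);
	rcu_depth--;
}

/* As the commit message notes for the real helpers, both accept NULL. */
static void dev_hold(struct net_device *dev)
{
	if (dev)
		dev->refcnt++;
}

static void dev_put(struct net_device *dev)
{
	if (dev) {
		assert(dev->refcnt > 0);
		dev->refcnt--;
	}
}

/* Hypothetical RCU-protected pointer to the active slave device. */
static struct net_device *active_slave;

static struct net_device *active_slave_get_rcu(void)
{
	/* Stands in for the check that fired in the syzbot report. */
	assert(rcu_depth > 0);
	return active_slave;
}

/* Returns 1 when the device supplied the info, 0 for the software fallback. */
static int get_ts_info(void)
{
	struct net_device *real_dev;
	int ret = 0;

	rcu_read_lock();
	real_dev = active_slave_get_rcu();
	dev_hold(real_dev);     /* pin the device before leaving the section */
	rcu_read_unlock();

	if (real_dev && real_dev->has_ts_info) {
		ret = 1;        /* device-provided path */
		goto out;
	}
	/* software-only fallback would fill in defaults here */
out:
	dev_put(real_dev);      /* single exit path drops the reference */
	return ret;
}
```

Routing every return through the out label is what lets one dev_put() balance the dev_hold(), and NULL-tolerant hold/put helpers keep the lookup free of an extra branch when there is no active slave.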