Message ID | 20250116092159.50890-1-atenart@kernel.org (mailing list archive) |
---|---|
State | New |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2] net: avoid race between device unregistration and ethnl ops | expand |
On 1/16/25 10:21, Antoine Tenart wrote: > The following trace can be seen if a device is being unregistered while > its number of channels are being modified. > > DEBUG_LOCKS_WARN_ON(lock->magic != lock) > WARNING: CPU: 3 PID: 3754 at kernel/locking/mutex.c:564 __mutex_lock+0xc8a/0x1120 > CPU: 3 UID: 0 PID: 3754 Comm: ethtool Not tainted 6.13.0-rc6+ #771 > RIP: 0010:__mutex_lock+0xc8a/0x1120 > Call Trace: > <TASK> > ethtool_check_max_channel+0x1ea/0x880 > ethnl_set_channels+0x3c3/0xb10 > ethnl_default_set_doit+0x306/0x650 > genl_family_rcv_msg_doit+0x1e3/0x2c0 > genl_rcv_msg+0x432/0x6f0 > netlink_rcv_skb+0x13d/0x3b0 > genl_rcv+0x28/0x40 > netlink_unicast+0x42e/0x720 > netlink_sendmsg+0x765/0xc20 > __sys_sendto+0x3ac/0x420 > __x64_sys_sendto+0xe0/0x1c0 > do_syscall_64+0x95/0x180 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > This is because unregister_netdevice_many_notify might run before the > rtnl lock section of ethnl operations, eg. set_channels in the above > example. In this example the rss lock would be destroyed by the device > unregistration path before being used again, but in general running > ethnl operations while dismantle has started is not a good idea. > > Fix this by denying any operation on devices being unregistered. A check > was already there in ethnl_ops_begin, but not wide enough. > > Note that the same issue cannot be seen on the ioctl version > (__dev_ethtool) because the device reference is retrieved from within > the rtnl lock section there. Once dismantle started, the net device is > unlisted and no reference will be found. > > Fixes: dde91ccfa25f ("ethtool: do not perform operations on net devices being unregistered") > Signed-off-by: Antoine Tenart <atenart@kernel.org> > --- for future submissions, please add a changelog and a link to previous revisions > net/ethtool/netlink.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c > index e3f0ef6b851b..4d18dc29b304 100644 > --- a/net/ethtool/netlink.c > +++ b/net/ethtool/netlink.c > @@ -90,7 +90,7 @@ int ethnl_ops_begin(struct net_device *dev) > pm_runtime_get_sync(dev->dev.parent); > > if (!netif_device_present(dev) || > - dev->reg_state == NETREG_UNREGISTERING) { > + dev->reg_state >= NETREG_UNREGISTERING) { looks good, but I would add a comment above enum netdev_reg_state definition, to avoid any new state added "at the end" what about NETREG_DUMMY? you want to cover it here too? > ret = -ENODEV; > goto err; > }
Quoting Przemek Kitszel (2025-01-16 10:44:40) > On 1/16/25 10:21, Antoine Tenart wrote: > > The following trace can be seen if a device is being unregistered while > > its number of channels are being modified. > > > > DEBUG_LOCKS_WARN_ON(lock->magic != lock) > > WARNING: CPU: 3 PID: 3754 at kernel/locking/mutex.c:564 __mutex_lock+0xc8a/0x1120 > > CPU: 3 UID: 0 PID: 3754 Comm: ethtool Not tainted 6.13.0-rc6+ #771 > > RIP: 0010:__mutex_lock+0xc8a/0x1120 > > Call Trace: > > <TASK> > > ethtool_check_max_channel+0x1ea/0x880 > > ethnl_set_channels+0x3c3/0xb10 > > ethnl_default_set_doit+0x306/0x650 > > genl_family_rcv_msg_doit+0x1e3/0x2c0 > > genl_rcv_msg+0x432/0x6f0 > > netlink_rcv_skb+0x13d/0x3b0 > > genl_rcv+0x28/0x40 > > netlink_unicast+0x42e/0x720 > > netlink_sendmsg+0x765/0xc20 > > __sys_sendto+0x3ac/0x420 > > __x64_sys_sendto+0xe0/0x1c0 > > do_syscall_64+0x95/0x180 > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > > This is because unregister_netdevice_many_notify might run before the > > rtnl lock section of ethnl operations, eg. set_channels in the above > > example. In this example the rss lock would be destroyed by the device > > unregistration path before being used again, but in general running > > ethnl operations while dismantle has started is not a good idea. > > > > Fix this by denying any operation on devices being unregistered. A check > > was already there in ethnl_ops_begin, but not wide enough. > > > > Note that the same issue cannot be seen on the ioctl version > > (__dev_ethtool) because the device reference is retrieved from within > > the rtnl lock section there. Once dismantle started, the net device is > > unlisted and no reference will be found. > > > > Fixes: dde91ccfa25f ("ethtool: do not perform operations on net devices being unregistered") > > Signed-off-by: Antoine Tenart <atenart@kernel.org> > > --- > > for future submissions, please add a changelog and a link to previous > revisions This one was a bit special as v2 is completely different from v1, not much to describe. But sure, at least a link could help. > > net/ethtool/netlink.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c > > index e3f0ef6b851b..4d18dc29b304 100644 > > --- a/net/ethtool/netlink.c > > +++ b/net/ethtool/netlink.c > > @@ -90,7 +90,7 @@ int ethnl_ops_begin(struct net_device *dev) > > pm_runtime_get_sync(dev->dev.parent); > > > > if (!netif_device_present(dev) || > > - dev->reg_state == NETREG_UNREGISTERING) { > > + dev->reg_state >= NETREG_UNREGISTERING) { > > looks good, but I would add a comment above enum netdev_reg_state > definition, to avoid any new state added "at the end" > > what about NETREG_DUMMY? you want to cover it here too? I'm not super familiar with NETREG_DUMMY but my understanding is those devices aren't listed and aren't accessible through ethnl. Having said that I do agree the checks on reg_state could be consolidated, eg. reusing and improving dev_isalive(). I actually planned to have a look at if this would make sense later on. tl;dr; I don't think there's an issue in practice but we could probably consolidate the code to make things easier to maintain and to read. Thanks, Antoine
On 1/16/25 14:47, Antoine Tenart wrote: > Quoting Przemek Kitszel (2025-01-16 10:44:40) >> On 1/16/25 10:21, Antoine Tenart wrote: >>> The following trace can be seen if a device is being unregistered while >>> its number of channels are being modified. >>> >>> DEBUG_LOCKS_WARN_ON(lock->magic != lock) >>> WARNING: CPU: 3 PID: 3754 at kernel/locking/mutex.c:564 __mutex_lock+0xc8a/0x1120 >>> CPU: 3 UID: 0 PID: 3754 Comm: ethtool Not tainted 6.13.0-rc6+ #771 >>> RIP: 0010:__mutex_lock+0xc8a/0x1120 >>> Call Trace: >>> <TASK> >>> ethtool_check_max_channel+0x1ea/0x880 >>> ethnl_set_channels+0x3c3/0xb10 >>> ethnl_default_set_doit+0x306/0x650 >>> genl_family_rcv_msg_doit+0x1e3/0x2c0 >>> genl_rcv_msg+0x432/0x6f0 >>> netlink_rcv_skb+0x13d/0x3b0 >>> genl_rcv+0x28/0x40 >>> netlink_unicast+0x42e/0x720 >>> netlink_sendmsg+0x765/0xc20 >>> __sys_sendto+0x3ac/0x420 >>> __x64_sys_sendto+0xe0/0x1c0 >>> do_syscall_64+0x95/0x180 >>> entry_SYSCALL_64_after_hwframe+0x76/0x7e >>> >>> This is because unregister_netdevice_many_notify might run before the >>> rtnl lock section of ethnl operations, eg. set_channels in the above >>> example. In this example the rss lock would be destroyed by the device >>> unregistration path before being used again, but in general running >>> ethnl operations while dismantle has started is not a good idea. >>> >>> Fix this by denying any operation on devices being unregistered. A check >>> was already there in ethnl_ops_begin, but not wide enough. >>> >>> Note that the same issue cannot be seen on the ioctl version >>> (__dev_ethtool) because the device reference is retrieved from within >>> the rtnl lock section there. Once dismantle started, the net device is >>> unlisted and no reference will be found. >>> >>> Fixes: dde91ccfa25f ("ethtool: do not perform operations on net devices being unregistered") >>> Signed-off-by: Antoine Tenart <atenart@kernel.org> >>> --- >> >> for future submissions, please add a changelog and a link to previous >> revisions > > This one was a bit special as v2 is completely different from v1, not > much to describe. But sure, at least a link could help. > >>> net/ethtool/netlink.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c >>> index e3f0ef6b851b..4d18dc29b304 100644 >>> --- a/net/ethtool/netlink.c >>> +++ b/net/ethtool/netlink.c >>> @@ -90,7 +90,7 @@ int ethnl_ops_begin(struct net_device *dev) >>> pm_runtime_get_sync(dev->dev.parent); >>> >>> if (!netif_device_present(dev) || >>> - dev->reg_state == NETREG_UNREGISTERING) { >>> + dev->reg_state >= NETREG_UNREGISTERING) { >> >> looks good, but I would add a comment above enum netdev_reg_state >> definition, to avoid any new state added "at the end" with your interest of more improvements in the area, current patch is fine for me as a fix, so: Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> >> >> what about NETREG_DUMMY? you want to cover it here too? > > I'm not super familiar with NETREG_DUMMY but my understanding is those > devices aren't listed and aren't accessible through ethnl. > > Having said that I do agree the checks on reg_state could be > consolidated, eg. reusing and improving dev_isalive(). I actually > planned to have a look at if this would make sense later on. > > tl;dr; I don't think there's an issue in practice but we could probably > consolidate the code to make things easier to maintain and to read. agree > > Thanks, > Antoine
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c index e3f0ef6b851b..4d18dc29b304 100644 --- a/net/ethtool/netlink.c +++ b/net/ethtool/netlink.c @@ -90,7 +90,7 @@ int ethnl_ops_begin(struct net_device *dev) pm_runtime_get_sync(dev->dev.parent); if (!netif_device_present(dev) || - dev->reg_state == NETREG_UNREGISTERING) { + dev->reg_state >= NETREG_UNREGISTERING) { ret = -ENODEV; goto err; }
The following trace can be seen if a device is being unregistered while its number of channels are being modified. DEBUG_LOCKS_WARN_ON(lock->magic != lock) WARNING: CPU: 3 PID: 3754 at kernel/locking/mutex.c:564 __mutex_lock+0xc8a/0x1120 CPU: 3 UID: 0 PID: 3754 Comm: ethtool Not tainted 6.13.0-rc6+ #771 RIP: 0010:__mutex_lock+0xc8a/0x1120 Call Trace: <TASK> ethtool_check_max_channel+0x1ea/0x880 ethnl_set_channels+0x3c3/0xb10 ethnl_default_set_doit+0x306/0x650 genl_family_rcv_msg_doit+0x1e3/0x2c0 genl_rcv_msg+0x432/0x6f0 netlink_rcv_skb+0x13d/0x3b0 genl_rcv+0x28/0x40 netlink_unicast+0x42e/0x720 netlink_sendmsg+0x765/0xc20 __sys_sendto+0x3ac/0x420 __x64_sys_sendto+0xe0/0x1c0 do_syscall_64+0x95/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e This is because unregister_netdevice_many_notify might run before the rtnl lock section of ethnl operations, eg. set_channels in the above example. In this example the rss lock would be destroyed by the device unregistration path before being used again, but in general running ethnl operations while dismantle has started is not a good idea. Fix this by denying any operation on devices being unregistered. A check was already there in ethnl_ops_begin, but not wide enough. Note that the same issue cannot be seen on the ioctl version (__dev_ethtool) because the device reference is retrieved from within the rtnl lock section there. Once dismantle started, the net device is unlisted and no reference will be found. Fixes: dde91ccfa25f ("ethtool: do not perform operations on net devices being unregistered") Signed-off-by: Antoine Tenart <atenart@kernel.org> --- net/ethtool/netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)