Message ID | 20230410195220.1335670-1-vladimir.oltean@nxp.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [v2,net] net: don't omit syncing RX filters to devices that are down |
On Mon, Apr 10, 2023 at 10:52:20PM +0300, Vladimir Oltean wrote:
> There are 2 possible ways to solve the issue.
>
> Alternatively, we could remove the check/optimization and thus make
> dev_mc_del() always propagate down to the ndo_set_rx_mode() of the
> device. This would implicitly solve the IGMP/IGMP6 code paths with DSA,
> as well as any other potential issues of this kind with address deletion
> not being synced prior to device removal.

Self NACK. After a more careful inspection of dmesg, I now notice this
WARN_ON during probe time:

[    7.710448] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[    7.735401]
[    7.736921] ============================================
[    7.742266] WARNING: possible recursive locking detected
[    7.747610] 6.3.0-rc5-01277-g8ec1b4985857 #77 Not tainted
[    7.753048] --------------------------------------------
[    7.758391] kworker/u4:0/8 is trying to acquire lock:
[    7.763477] ffff5c348439a280 (_xmit_ETHER){+...}-{3:3}, at: dev_mc_add+0x40/0xa0
[    7.770991]
[    7.770991] but task is already holding lock:
[    7.776859] ffff5c34843e1280 (_xmit_ETHER){+...}-{3:3}, at: dev_mc_add+0x40/0xa0
[    7.784358]
[    7.784358] other info that might help us debug this:
[    7.790924]  Possible unsafe locking scenario:
[    7.790924]
[    7.796876]        CPU0
[    7.799340]        ----
[    7.801803]   lock(_xmit_ETHER);
[    7.805073]   lock(_xmit_ETHER);
[    7.808342]
[    7.808342]  *** DEADLOCK ***
[    7.808342]
[    7.814295]  May be due to missing lock nesting notation
[    7.814295]
[    7.821119] 7 locks held by kworker/u4:0/8:
[    7.825334]  #0: ffff5c3480007948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1f8/0x568
[    7.835549]  #1: ffff80000856bd58 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x224/0x568
[    7.844886]  #2: ffff5c34826a71c0 (&dev->mutex){....}-{4:4}, at: __device_attach+0x48/0x1a0
[    7.853358]  #3: ffffb9e51aa8db80 (dsa2_mutex){+.+.}-{4:4}, at: dsa_register_switch+0x50/0x1188
[    7.862182]  #4: ffffb9e51aa70788 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x28/0x40
[    7.869956]  #5: ffff5c34843f95b0 (&idev->mc_lock){+.+.}-{4:4}, at: __ipv6_dev_mc_inc+0xa8/0x498
[    7.878864]  #6: ffff5c34843e1280 (_xmit_ETHER){+...}-{3:3}, at: dev_mc_add+0x40/0xa0
[    7.886803]
[    7.886803] stack backtrace:
[    7.891188] CPU: 1 PID: 8 Comm: kworker/u4:0 Not tainted 6.3.0-rc5-01277-g8ec1b4985857 #77
[    7.904249] Workqueue: events_unbound deferred_probe_work_func
[    7.910146] Call trace:
[    7.912611]  dump_backtrace+0x108/0x130
[    7.916486]  show_stack+0x24/0x30
[    7.919832]  dump_stack_lvl+0x60/0x80
[    7.923531]  dump_stack+0x18/0x28
[    7.926880]  __lock_acquire+0x7e8/0x2fc8
[    7.930850]  lock_acquire+0x118/0x260
[    7.934555]  _raw_spin_lock_nested+0x68/0xb0
[    7.938871]  dev_mc_add+0x40/0xa0
[    7.942218]  dsa_slave_sync_mc+0x68/0x180
[    7.946264]  __hw_addr_sync_dev+0x138/0x158
[    7.950483]  dsa_slave_set_rx_mode+0x3c/0x70
[    7.954796]  __dev_set_rx_mode+0x80/0xa0
[    7.958762]  dev_mc_add+0x74/0xa0
[    7.962109]  igmp6_group_added+0x78/0x128
[    7.966162]  __ipv6_dev_mc_inc+0x278/0x498
[    7.970299]  ipv6_dev_mc_inc+0x20/0x38
[    7.974087]  ipv6_add_dev+0x3f0/0x4d0
[    7.977791]  addrconf_notify+0x1b0/0x4a8
[    7.981757]  raw_notifier_call_chain+0x50/0x88
[    7.986254]  call_netdevice_notifiers+0x74/0xd0
[    7.990823]  register_netdevice+0x4f0/0x600
[    7.995054]  dsa_slave_create+0x3f8/0x620
[    7.999099]  dsa_port_setup+0x10c/0x158
[    8.002978]  dsa_register_switch+0xe18/0x1188
[    8.007378]  felix_pci_probe+0x120/0x1f0
[    8.011345]  pci_device_probe+0x1b0/0x278
[    8.015394]  really_probe+0x13c/0x2f8
[    8.019097]  __driver_probe_device+0xc0/0xf8
[    8.023409]  driver_probe_device+0x48/0x218
[    8.027634]  __device_attach_driver+0x128/0x158
[    8.032209]  bus_for_each_drv+0x12c/0x160
[    8.036257]  __device_attach+0xcc/0x1a0
[    8.040131]  device_initial_probe+0x20/0x38
[    8.044356]  bus_probe_device+0xa0/0x118
[    8.048316]  deferred_probe_work_func+0x98/0xe0
[    8.052890]  process_one_work+0x290/0x568
[    8.056935]  worker_thread+0x238/0x4a8
[    8.060716]  kthread+0x108/0x130
[    8.063983]  ret_from_fork+0x10/0x20

which appears to be a false positive caused by the newly opened time
window in which DSA user ports have net_devices registered, but they
aren't yet upper devices of the DSA master, so their dev->nested_level
is the same.

It seems like I will return to the targeted fix for v3, after some more
investigation here tomorrow.
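To illustrate why an equal dev->nested_level matters, here is a rough userspace sketch in C. It is not lockdep's implementation: it only models the idea that taking a second lock of the same class with the same nesting subclass is reported, while a different subclass is not. All model_* names are hypothetical, and the nesting levels 0 and 1 are illustrative values, not taken from the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_HELD 8

/* One entry of a task's held-lock stack: lock class plus nesting subclass. */
struct model_held_lock {
	int lock_class;
	unsigned int subclass;
};

struct model_task {
	struct model_held_lock held[MODEL_MAX_HELD];
	size_t depth;
};

/*
 * Record an acquisition. Returns true if the task already holds a lock of
 * the same class and subclass, which is the situation this simplified
 * model treats as "possible recursive locking".
 */
static bool model_acquire(struct model_task *t, int lock_class,
			  unsigned int subclass)
{
	for (size_t i = 0; i < t->depth; i++)
		if (t->held[i].lock_class == lock_class &&
		    t->held[i].subclass == subclass)
			return true;

	assert(t->depth < MODEL_MAX_HELD);
	t->held[t->depth++] = (struct model_held_lock){ lock_class, subclass };
	return false;
}

/*
 * Models dev_mc_add() on a user port calling dev_mc_add() on the master:
 * both address-list locks share one class (_xmit_ETHER in the log) and are
 * told apart only by each device's nesting level.
 */
static bool model_user_then_master(unsigned int user_level,
				   unsigned int master_level)
{
	struct model_task t = { .depth = 0 };
	const int xmit_ether = 1; /* hypothetical class id */

	model_acquire(&t, xmit_ether, user_level);
	return model_acquire(&t, xmit_ether, master_level);
}
```

In this model, model_user_then_master(0, 0) reports a problem, matching the window where the user port is not yet an upper of the master, while model_user_then_master(1, 0) does not.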
diff --git a/net/core/dev.c b/net/core/dev.c
index 480600a075ce..a83f725c76d8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8448,10 +8448,6 @@ void __dev_set_rx_mode(struct net_device *dev)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 
-	/* dev_open will call this function so the list will stay sane. */
-	if (!(dev->flags&IFF_UP))
-		return;
-
 	if (!netif_device_present(dev))
 		return;
 
During NETDEV_DOWN netdev events, ipv4 devinet calls ip_mc_down(), and
ipv6 calls addrconf_ifdown(), and both of these eventually result in
calls to dev_mc_del(), either through igmp_group_dropped() or
igmp6_group_dropped().

During the handling of that notifier, the IFF_UP dev->flags bit of the
device is unset.

The problem is that, although dev_mc_del() does call
__dev_set_rx_mode(), this will not propagate all the way to the
ndo_set_rx_mode() of the device, because of the IFF_UP check removed by
this change.

DSA does some processing in its dsa_slave_set_rx_mode(), and assumes
that all addresses that were synced by higher layers are also unsynced
by the time the device driver is unregistered. That unsatisfied
assumption triggers the WARN_ON(!list_empty(&dp->mdbs)) call from
dsa_switch_release_ports(), and we leak memory corresponding to the
multicast addresses that were never unsynced.

Minimal reproducer:
ip link set swp0 up
ip link set swp0 down
echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind

There are 2 possible ways to solve the issue. One would be to change
devinet and addrconf to react to the earlier NETDEV_GOING_DOWN event
(which is emitted while dev->flags still has IFF_UP set). That would
work, but it would require paying attention in the future to other call
paths that would also potentially need the same change.

Alternatively, we could remove the check/optimization and thus make
dev_mc_del() always propagate down to the ndo_set_rx_mode() of the
device. This would implicitly solve the IGMP/IGMP6 code paths with DSA,
as well as any other potential issues of this kind with address deletion
not being synced prior to device removal.

Fixes: 5e8a1e03aa4d ("net: dsa: install secondary unicast and multicast addresses as host FDB/MDB")
Link: https://lore.kernel.org/netdev/ZDP2bxXGbHX8C4BC@shredder/
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
v1 is in the Link. A different approach was used here.

 net/core/dev.c | 4 ----
 1 file changed, 4 deletions(-)
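The failure mode described in the commit message can be sketched as a small userspace model in C. This is an illustration under simplifying assumptions, not kernel code: every model_* name is hypothetical, addresses are plain integers, and "syncing" just mirrors the stack's list into a driver-side list. The skip_when_down flag stands in for the IFF_UP check that the patch removes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IFF_UP    0x1u
#define MODEL_MAX_ADDRS 8

/* Hypothetical stand-in for a net_device plus the driver's own state. */
struct model_dev {
	unsigned int flags;
	bool skip_when_down;              /* true models the pre-patch check */
	int stack_addrs[MODEL_MAX_ADDRS]; /* addresses the stack wants */
	size_t n_stack;
	int hw_addrs[MODEL_MAX_ADDRS];    /* addresses the driver has synced */
	size_t n_hw;
};

/* Models __dev_set_rx_mode(): optionally bail out when the device is down. */
static void model_set_rx_mode(struct model_dev *dev)
{
	if (dev->skip_when_down && !(dev->flags & MODEL_IFF_UP))
		return;

	/* Models the driver's rx-mode hook: mirror the stack's list. */
	dev->n_hw = dev->n_stack;
	for (size_t i = 0; i < dev->n_stack; i++)
		dev->hw_addrs[i] = dev->stack_addrs[i];
}

static void model_mc_add(struct model_dev *dev, int addr)
{
	assert(dev->n_stack < MODEL_MAX_ADDRS);
	dev->stack_addrs[dev->n_stack++] = addr;
	model_set_rx_mode(dev);
}

static void model_mc_del(struct model_dev *dev, int addr)
{
	for (size_t i = 0; i < dev->n_stack; i++) {
		if (dev->stack_addrs[i] == addr) {
			/* Swap the last entry into the freed slot. */
			dev->stack_addrs[i] = dev->stack_addrs[--dev->n_stack];
			break;
		}
	}
	model_set_rx_mode(dev);
}

/*
 * Add an address while up, clear IFF_UP, then delete the address, as in
 * the NETDEV_DOWN path. Returns how many addresses the driver still holds.
 */
static size_t model_leaked_after_down(bool skip_when_down)
{
	struct model_dev dev = {
		.flags = MODEL_IFF_UP,
		.skip_when_down = skip_when_down,
	};

	model_mc_add(&dev, 1);
	dev.flags &= ~MODEL_IFF_UP;
	model_mc_del(&dev, 1);
	return dev.n_hw;
}
```

With the check in place, model_leaked_after_down(true) leaves one stale address on the driver side; without it, model_leaked_after_down(false) leaves none.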