[v2,net] net: don't omit syncing RX filters to devices that are down

Message ID 20230410195220.1335670-1-vladimir.oltean@nxp.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers

Checks

Context                          Check    Description
netdev/series_format             success  Single patches do not need cover letters
netdev/tree_selection            success  Clearly marked for net
netdev/fixes_present             success  Fixes tag present in non-next series
netdev/header_inline             success  No static functions without inline keyword in header files
netdev/build_32bit               success  Errors and warnings before: 26 this patch: 26
netdev/cc_maintainers            success  CCed 6 of 6 maintainers
netdev/build_clang               success  Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff          success  Signed-off-by tag matches author and committer
netdev/deprecated_api            success  None detected
netdev/check_selftest            success  No net selftest shell script
netdev/verify_fixes              success  Fixes tag looks correct
netdev/build_allmodconfig_warn   success  Errors and warnings before: 26 this patch: 26
netdev/checkpatch                success  total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/kdoc                      success  Errors and warnings before: 0 this patch: 0
netdev/source_inline             success  Was 0 now: 0

Commit Message

Vladimir Oltean April 10, 2023, 7:52 p.m. UTC
During NETDEV_DOWN netdev events, ipv4 devinet calls ip_mc_down(),
and ipv6 calls addrconf_ifdown(), and both of these eventually result in
calls to dev_mc_del(), either through igmp_group_dropped() or
igmp6_group_dropped(). During the handling of that notifier, the IFF_UP
dev->flags bit of the device is unset.

The problem is that, although dev_mc_del() does call __dev_set_rx_mode(),
this will not propagate all the way to the ndo_set_rx_mode() of the
device, because of the IFF_UP check removed by this change.

DSA does some processing in its dsa_slave_set_rx_mode(), and assumes
that all addresses that were synced by higher layers are also unsynced
by the time the device driver is unregistered.

That unsatisfied assumption triggers the WARN_ON(!list_empty(&dp->mdbs))
call from dsa_switch_release_ports(), and we leak memory corresponding
to the multicast addresses that were never unsynced.
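
To make the failure mode above concrete, here is a minimal userspace sketch,
not kernel code. Every model_* and MODEL_* name is hypothetical, the address
lists are plain arrays, and the single join/leave sequence is an assumed
simplification of what devinet/addrconf do. It only shows how an early return
on a cleared IFF_UP bit can leave the driver's view out of sync after a
deletion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IFF_UP 0x1u
#define MODEL_MAX_ADDRS 8

/* Hypothetical stand-in for struct net_device; not the kernel's layout. */
struct model_dev {
	unsigned int flags;            /* MODEL_IFF_UP is set while the device is up */
	int sw_addrs[MODEL_MAX_ADDRS]; /* addresses held by the upper layers */
	size_t sw_count;
	int hw_addrs[MODEL_MAX_ADDRS]; /* addresses the driver was told about */
	size_t hw_count;
	bool skip_when_down;           /* true models the IFF_UP early return */
};

/* Driver callback: make the driver's view equal to the stack's list. */
static void model_ndo_set_rx_mode(struct model_dev *dev)
{
	dev->hw_count = dev->sw_count;
	for (size_t i = 0; i < dev->sw_count; i++)
		dev->hw_addrs[i] = dev->sw_addrs[i];
}

/* Models the propagation step, with or without the IFF_UP check. */
static void model_dev_set_rx_mode(struct model_dev *dev)
{
	if (dev->skip_when_down && !(dev->flags & MODEL_IFF_UP))
		return;
	model_ndo_set_rx_mode(dev);
}

static void model_mc_add(struct model_dev *dev, int addr)
{
	assert(dev->sw_count < MODEL_MAX_ADDRS);
	dev->sw_addrs[dev->sw_count++] = addr;
	model_dev_set_rx_mode(dev);
}

static void model_mc_del(struct model_dev *dev, int addr)
{
	for (size_t i = 0; i < dev->sw_count; i++) {
		if (dev->sw_addrs[i] == addr) {
			/* Unordered removal: move the last entry into the hole. */
			dev->sw_addrs[i] = dev->sw_addrs[--dev->sw_count];
			break;
		}
	}
	model_dev_set_rx_mode(dev);
}

/* Up, join one group, go down, leave the group.
 * Returns how many entries the driver still holds afterwards. */
static size_t model_leaked_after_down(bool skip_when_down)
{
	struct model_dev dev = { .skip_when_down = skip_when_down };

	dev.flags |= MODEL_IFF_UP;
	model_mc_add(&dev, 42);
	dev.flags &= ~MODEL_IFF_UP; /* the flag is already clear during NETDEV_DOWN */
	model_mc_del(&dev, 42);
	return dev.hw_count;
}
```

In this model, model_leaked_after_down(true) leaves one stale entry on the
driver side, while model_leaked_after_down(false) leaves none. The stale entry
plays the role of the MDB entry that the WARN_ON complains about.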

Minimal reproducer:
ip link set swp0 up
ip link set swp0 down
echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind

There are 2 possible ways to solve the issue.

One would be to change devinet and addrconf to react to the earlier
NETDEV_GOING_DOWN event (which is emitted while dev->flags still has
IFF_UP set). That would work, but it would require paying attention in
the future to other call paths that would also potentially need the same
change.

Alternatively, we could remove the check/optimization and thus make
dev_mc_del() always propagate down to the ndo_set_rx_mode() of the
device. This would implicitly solve the IGMP/IGMP6 code paths with DSA,
as well as any other potential issues of this kind with address deletion
not being synced prior to device removal.

Fixes: 5e8a1e03aa4d ("net: dsa: install secondary unicast and multicast addresses as host FDB/MDB")
Link: https://lore.kernel.org/netdev/ZDP2bxXGbHX8C4BC@shredder/
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
v1 is in the Link. A different approach was used here.

 net/core/dev.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Vladimir Oltean April 10, 2023, 11:52 p.m. UTC | #1
On Mon, Apr 10, 2023 at 10:52:20PM +0300, Vladimir Oltean wrote:
> There are 2 possible ways to solve the issue.
> 
> Alternatively, we could remove the check/optimization and thus make
> dev_mc_del() always propagate down to the ndo_set_rx_mode() of the
> device. This would implicitly solve the IGMP/IGMP6 code paths with DSA,
> as well as any other potential issues of this kind with address deletion
> not being synced prior to device removal.

Self NACK.

After a more careful inspection of dmesg, I now notice this lockdep
warning at probe time:

[    7.710448] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[    7.735401]
[    7.736921] ============================================
[    7.742266] WARNING: possible recursive locking detected
[    7.747610] 6.3.0-rc5-01277-g8ec1b4985857 #77 Not tainted
[    7.753048] --------------------------------------------
[    7.758391] kworker/u4:0/8 is trying to acquire lock:
[    7.763477] ffff5c348439a280 (_xmit_ETHER){+...}-{3:3}, at: dev_mc_add+0x40/0xa0
[    7.770991]
[    7.770991] but task is already holding lock:
[    7.776859] ffff5c34843e1280 (_xmit_ETHER){+...}-{3:3}, at: dev_mc_add+0x40/0xa0
[    7.784358]
[    7.784358] other info that might help us debug this:
[    7.790924]  Possible unsafe locking scenario:
[    7.790924]
[    7.796876]        CPU0
[    7.799340]        ----
[    7.801803]   lock(_xmit_ETHER);
[    7.805073]   lock(_xmit_ETHER);
[    7.808342]
[    7.808342]  *** DEADLOCK ***
[    7.808342]
[    7.814295]  May be due to missing lock nesting notation
[    7.814295]
[    7.821119] 7 locks held by kworker/u4:0/8:
[    7.825334]  #0: ffff5c3480007948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1f8/0x568
[    7.835549]  #1: ffff80000856bd58 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x224/0x568
[    7.844886]  #2: ffff5c34826a71c0 (&dev->mutex){....}-{4:4}, at: __device_attach+0x48/0x1a0
[    7.853358]  #3: ffffb9e51aa8db80 (dsa2_mutex){+.+.}-{4:4}, at: dsa_register_switch+0x50/0x1188
[    7.862182]  #4: ffffb9e51aa70788 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x28/0x40
[    7.869956]  #5: ffff5c34843f95b0 (&idev->mc_lock){+.+.}-{4:4}, at: __ipv6_dev_mc_inc+0xa8/0x498
[    7.878864]  #6: ffff5c34843e1280 (_xmit_ETHER){+...}-{3:3}, at: dev_mc_add+0x40/0xa0
[    7.886803]
[    7.886803] stack backtrace:
[    7.891188] CPU: 1 PID: 8 Comm: kworker/u4:0 Not tainted 6.3.0-rc5-01277-g8ec1b4985857 #77
[    7.904249] Workqueue: events_unbound deferred_probe_work_func
[    7.910146] Call trace:
[    7.912611]  dump_backtrace+0x108/0x130
[    7.916486]  show_stack+0x24/0x30
[    7.919832]  dump_stack_lvl+0x60/0x80
[    7.923531]  dump_stack+0x18/0x28
[    7.926880]  __lock_acquire+0x7e8/0x2fc8
[    7.930850]  lock_acquire+0x118/0x260
[    7.934555]  _raw_spin_lock_nested+0x68/0xb0
[    7.938871]  dev_mc_add+0x40/0xa0
[    7.942218]  dsa_slave_sync_mc+0x68/0x180
[    7.946264]  __hw_addr_sync_dev+0x138/0x158
[    7.950483]  dsa_slave_set_rx_mode+0x3c/0x70
[    7.954796]  __dev_set_rx_mode+0x80/0xa0
[    7.958762]  dev_mc_add+0x74/0xa0
[    7.962109]  igmp6_group_added+0x78/0x128
[    7.966162]  __ipv6_dev_mc_inc+0x278/0x498
[    7.970299]  ipv6_dev_mc_inc+0x20/0x38
[    7.974087]  ipv6_add_dev+0x3f0/0x4d0
[    7.977791]  addrconf_notify+0x1b0/0x4a8
[    7.981757]  raw_notifier_call_chain+0x50/0x88
[    7.986254]  call_netdevice_notifiers+0x74/0xd0
[    7.990823]  register_netdevice+0x4f0/0x600
[    7.995054]  dsa_slave_create+0x3f8/0x620
[    7.999099]  dsa_port_setup+0x10c/0x158
[    8.002978]  dsa_register_switch+0xe18/0x1188
[    8.007378]  felix_pci_probe+0x120/0x1f0
[    8.011345]  pci_device_probe+0x1b0/0x278
[    8.015394]  really_probe+0x13c/0x2f8
[    8.019097]  __driver_probe_device+0xc0/0xf8
[    8.023409]  driver_probe_device+0x48/0x218
[    8.027634]  __device_attach_driver+0x128/0x158
[    8.032209]  bus_for_each_drv+0x12c/0x160
[    8.036257]  __device_attach+0xcc/0x1a0
[    8.040131]  device_initial_probe+0x20/0x38
[    8.044356]  bus_probe_device+0xa0/0x118
[    8.048316]  deferred_probe_work_func+0x98/0xe0
[    8.052890]  process_one_work+0x290/0x568
[    8.056935]  worker_thread+0x238/0x4a8
[    8.060716]  kthread+0x108/0x130
[    8.063983]  ret_from_fork+0x10/0x20

which appears to be a false positive caused by the newly opened time
window in which DSA user ports have net_devices registered, but they
aren't yet upper devices of the DSA master, so their dev->nested_level
is the same.
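
For readers less familiar with the nesting annotation, here is a rough
userspace sketch of the idea, not lockdep itself. All model_* names are
hypothetical, and the rule it applies (a report fires when a held lock has
the same class and the same subclass as the one being taken) is an assumed
simplification of what lockdep checks. The subclass stands in for
dev->nested_level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_HELD 8

struct model_held_lock {
	int class_id;          /* e.g. one id shared by all _xmit_ETHER locks */
	unsigned int subclass; /* models the dev->nested_level annotation */
};

struct model_task {
	struct model_held_lock held[MODEL_MAX_HELD];
	size_t depth;
};

/* Records the acquisition and returns true if it would be reported as
 * possible recursive locking under the simplified rule above. */
static bool model_acquire(struct model_task *t, int class_id,
			  unsigned int subclass)
{
	bool recursive = false;

	for (size_t i = 0; i < t->depth; i++) {
		if (t->held[i].class_id == class_id &&
		    t->held[i].subclass == subclass)
			recursive = true;
	}

	assert(t->depth < MODEL_MAX_HELD);
	t->held[t->depth].class_id = class_id;
	t->held[t->depth].subclass = subclass;
	t->depth++;
	return recursive;
}

/* Take the user port's address lock, then the master's from within the
 * rx_mode callback, as in the dev_mc_add() -> dev_mc_add() trace. */
static bool model_nested_sync_reports(unsigned int user_level,
				      unsigned int master_level)
{
	struct model_task t = { .depth = 0 };
	const int xmit_ether = 1; /* arbitrary id for the shared lock class */

	(void)model_acquire(&t, xmit_ether, user_level);
	return model_acquire(&t, xmit_ether, master_level);
}
```

With equal levels, model_nested_sync_reports(0, 0) returns true, which mirrors
the splat above even though two different devices are involved. With distinct
levels, for example model_nested_sync_reports(1, 0), nothing is reported.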

It seems like I will return to the targeted fix for v3, after some more
investigation here tomorrow.

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 480600a075ce..a83f725c76d8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8448,10 +8448,6 @@  void __dev_set_rx_mode(struct net_device *dev)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 
-	/* dev_open will call this function so the list will stay sane. */
-	if (!(dev->flags&IFF_UP))
-		return;
-
 	if (!netif_device_present(dev))
 		return;