Message ID | 20210111230943.3701806-1-olteanv@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: dsa: unbind all switches from tree when DSA master unbinds | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | success | CCed 7 of 7 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 18 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
On 1/11/21 3:09 PM, Vladimir Oltean wrote: > From: Vladimir Oltean <vladimir.oltean@nxp.com> > > Currently the following happens when a DSA master driver unbinds while > there are DSA switches attached to it: > > $ echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 392 at net/core/dev.c:9507 > Call trace: > rollback_registered_many+0x5fc/0x688 > unregister_netdevice_queue+0x98/0x120 > dsa_slave_destroy+0x4c/0x88 > dsa_port_teardown.part.16+0x78/0xb0 > dsa_tree_teardown_switches+0x58/0xc0 > dsa_unregister_switch+0x104/0x1b8 > felix_pci_remove+0x24/0x48 > pci_device_remove+0x48/0xf0 > device_release_driver_internal+0x118/0x1e8 > device_driver_detach+0x28/0x38 > unbind_store+0xd0/0x100 > > Located at the above location is this WARN_ON: > > /* Notifier chain MUST detach us all upper devices. */ > WARN_ON(netdev_has_any_upper_dev(dev)); > > Other stacked interfaces, like VLAN, do indeed listen for > NETDEV_UNREGISTER on the real_dev and also unregister themselves at that > time, which is clearly the behavior that rollback_registered_many > expects. But DSA interfaces are not VLAN. They have backing hardware > (platform devices, PCI devices, MDIO, SPI etc) which have a life cycle > of their own and we can't just trigger an unregister from the DSA > framework when we receive a netdev notifier that the master unregisters. > > Luckily, there is something we can do, and that is to inform the driver > core that we have a runtime dependency to the DSA master interface's > device, and create a device link where that is the supplier and we are > the consumer. Having this device link will make the DSA switch unbind > before the DSA master unbinds, which is enough to avoid the WARN_ON from > rollback_registered_many. > > Note that even before the blamed commit, DSA did nothing intelligent > when the master interface got unregistered either. See the discussion > here: > https://lore.kernel.org/netdev/20200505210253.20311-1-f.fainelli@gmail.com/ > But this time, at least the WARN_ON is loud enough that the > upper_dev_link commit can be blamed. > > The advantage with this approach vs dev_hold(master) in the attached > link is that the latter is not meant for long term reference counting. > With dev_hold, the only thing that will happen is that when the user > attempts an unbind of the DSA master, netdev_wait_allrefs will keep > waiting and waiting, due to DSA keeping the refcount forever. DSA would > not access freed memory corresponding to the master interface, but the > unbind would still result in a freeze. Whereas with device links, > graceful teardown is ensured. It even works with cascaded DSA trees. > > $ echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind > [ 1818.797546] device swp0 left promiscuous mode > [ 1819.301112] sja1105 spi2.0: Link is Down > [ 1819.307981] DSA: tree 1 torn down > [ 1819.312408] device eno2 left promiscuous mode > [ 1819.656803] mscc_felix 0000:00:00.5: Link is Down > [ 1819.667194] DSA: tree 0 torn down > [ 1819.711557] fsl_enetc 0000:00:00.2 eno2: Link is Down > > This approach allows us to keep the DSA framework absolutely unchanged, > and the driver core will just know to unbind us first when the master > goes away - as opposed to the large (and probably impossible) rework > required if attempting to listen for NETDEV_UNREGISTER. > > As per the documentation at Documentation/driver-api/device_link.rst, > specifying the DL_FLAG_AUTOREMOVE_CONSUMER flag causes the device link > to be automatically purged when the consumer fails to probe or later > unbinds. So we don't need to keep the consumer_link variable in struct > dsa_switch. > > Fixes: 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings") > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Thanks!
On Tue, 12 Jan 2021 10:51:39 -0800 Florian Fainelli wrote: > On 1/11/21 3:09 PM, Vladimir Oltean wrote: > > From: Vladimir Oltean <vladimir.oltean@nxp.com> > > > > Currently the following happens when a DSA master driver unbinds while > > there are DSA switches attached to it: > > > > $ echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind > > ------------[ cut here ]------------ > > WARNING: CPU: 0 PID: 392 at net/core/dev.c:9507 > > Call trace: > > rollback_registered_many+0x5fc/0x688 > > unregister_netdevice_queue+0x98/0x120 > > dsa_slave_destroy+0x4c/0x88 > > dsa_port_teardown.part.16+0x78/0xb0 > > dsa_tree_teardown_switches+0x58/0xc0 > > dsa_unregister_switch+0x104/0x1b8 > > felix_pci_remove+0x24/0x48 > > pci_device_remove+0x48/0xf0 > > device_release_driver_internal+0x118/0x1e8 > > device_driver_detach+0x28/0x38 > > unbind_store+0xd0/0x100 > > [...] > > > > Fixes: 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings") > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> > Tested-by: Florian Fainelli <f.fainelli@gmail.com> Applied, thanks!
diff --git a/net/dsa/master.c b/net/dsa/master.c index 5a0f6fec4271..cb3a5cf99b25 100644 --- a/net/dsa/master.c +++ b/net/dsa/master.c @@ -309,8 +309,18 @@ static struct lock_class_key dsa_master_addr_list_lock_key; int dsa_master_setup(struct net_device *dev, struct dsa_port *cpu_dp) { int mtu = ETH_DATA_LEN + cpu_dp->tag_ops->overhead; + struct dsa_switch *ds = cpu_dp->ds; + struct device_link *consumer_link; int ret; + /* The DSA master must use SET_NETDEV_DEV for this to work. */ + consumer_link = device_link_add(ds->dev, dev->dev.parent, + DL_FLAG_AUTOREMOVE_CONSUMER); + if (!consumer_link) + netdev_err(dev, + "Failed to create a device link to DSA switch %s\n", + dev_name(ds->dev)); + rtnl_lock(); ret = dev_set_mtu(dev, mtu); rtnl_unlock();