Message ID | 20230803134253.2711124-1-vladimir.oltean@nxp.com (mailing list archive) |
---|---|
State | Accepted |
Commit | a94c16a2fda010866b8858a386a8bfbeba4f72c5 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: dsa: ocelot: call dsa_tag_8021q_unregister() under rtnl_lock() on driver remove | expand |
On Thu, Aug 03, 2023 at 04:42:53PM +0300, Vladimir Oltean wrote: > When the tagging protocol in current use is "ocelot-8021q" and we unbind > the driver, we see this splat: > > $ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind > mscc_felix 0000:00:00.5 swp0: left promiscuous mode > sja1105 spi2.0: Link is Down > DSA: tree 1 torn down > mscc_felix 0000:00:00.5 swp2: left promiscuous mode > sja1105 spi2.2: Link is Down > DSA: tree 3 torn down > fsl_enetc 0000:00:00.2 eno2: left promiscuous mode > mscc_felix 0000:00:00.5: Link is Down > ------------[ cut here ]------------ > RTNL: assertion failed at net/dsa/tag_8021q.c (409) > WARNING: CPU: 1 PID: 329 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x12c/0x1a0 > Modules linked in: > CPU: 1 PID: 329 Comm: bash Not tainted 6.5.0-rc3+ #771 > pc : dsa_tag_8021q_unregister+0x12c/0x1a0 > lr : dsa_tag_8021q_unregister+0x12c/0x1a0 > Call trace: > dsa_tag_8021q_unregister+0x12c/0x1a0 > felix_tag_8021q_teardown+0x130/0x150 > felix_teardown+0x3c/0xd8 > dsa_tree_teardown_switches+0xbc/0xe0 > dsa_unregister_switch+0x168/0x260 > felix_pci_remove+0x30/0x60 > pci_device_remove+0x4c/0x100 > device_release_driver_internal+0x188/0x288 > device_links_unbind_consumers+0xfc/0x138 > device_release_driver_internal+0xe0/0x288 > device_driver_detach+0x24/0x38 > unbind_store+0xd8/0x108 > drv_attr_store+0x30/0x50 > ---[ end trace 0000000000000000 ]--- > ------------[ cut here ]------------ > RTNL: assertion failed at net/8021q/vlan_core.c (376) > WARNING: CPU: 1 PID: 329 at net/8021q/vlan_core.c:376 vlan_vid_del+0x1b8/0x1f0 > CPU: 1 PID: 329 Comm: bash Tainted: G W 6.5.0-rc3+ #771 > pc : vlan_vid_del+0x1b8/0x1f0 > lr : vlan_vid_del+0x1b8/0x1f0 > dsa_tag_8021q_unregister+0x8c/0x1a0 > felix_tag_8021q_teardown+0x130/0x150 > felix_teardown+0x3c/0xd8 > dsa_tree_teardown_switches+0xbc/0xe0 > dsa_unregister_switch+0x168/0x260 > felix_pci_remove+0x30/0x60 > pci_device_remove+0x4c/0x100 > device_release_driver_internal+0x188/0x288 > device_links_unbind_consumers+0xfc/0x138 > device_release_driver_internal+0xe0/0x288 > device_driver_detach+0x24/0x38 > unbind_store+0xd8/0x108 > drv_attr_store+0x30/0x50 > DSA: tree 0 torn down > > This was somewhat not so easy to spot, because "ocelot-8021q" is not the > default tagging protocol, and thus, not everyone who tests the unbinding > path may have switched to it beforehand. The default > felix_tag_npi_teardown() does not require rtnl_lock() to be held. > > Fixes: 7c83a7c539ab ("net: dsa: add a second tagger for Ocelot switches based on tag_8021q") > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > --- > drivers/net/dsa/ocelot/felix.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c > index fd7eb4a52918..9a3e5ec16972 100644 > --- a/drivers/net/dsa/ocelot/felix.c > +++ b/drivers/net/dsa/ocelot/felix.c > @@ -1619,8 +1619,10 @@ static void felix_teardown(struct dsa_switch *ds) > struct felix *felix = ocelot_to_felix(ocelot); > struct dsa_port *dp; > > + rtnl_lock(); > if (felix->tag_proto_ops) > felix->tag_proto_ops->teardown(ds); > + rtnl_unlock(); Hi Vladimir, I am curious to know if RTNL could be taken in felix_tag_8021q_teardown() instead. > > dsa_switch_for_each_available_port(dp, ds) > ocelot_deinit_port(ocelot, dp->index); > -- > 2.34.1 > >
On Thu, Aug 03, 2023 at 09:29:38PM +0200, Simon Horman wrote: > On Thu, Aug 03, 2023 at 04:42:53PM +0300, Vladimir Oltean wrote: > > diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c > > index fd7eb4a52918..9a3e5ec16972 100644 > > --- a/drivers/net/dsa/ocelot/felix.c > > +++ b/drivers/net/dsa/ocelot/felix.c > > @@ -1619,8 +1619,10 @@ static void felix_teardown(struct dsa_switch *ds) > > struct felix *felix = ocelot_to_felix(ocelot); > > struct dsa_port *dp; > > > > + rtnl_lock(); > > if (felix->tag_proto_ops) > > felix->tag_proto_ops->teardown(ds); > > + rtnl_unlock(); > > Hi Vladimir, > > I am curious to know if RTNL could be taken in > felix_tag_8021q_teardown() instead. Negative. This call path also exists: dsa_tree_change_tag_proto() -> rtnl_trylock() -> dsa_tree_notify(dst, DSA_NOTIFIER_TAG_PROTO, &info) -> dsa_switch_change_tag_proto() -> ds->ops->change_tag_protocol() -> felix_change_tag_protocol() -> old_proto_ops->teardown() -> felix_tag_8021q_teardown() where the rtnl_mutex is already held.
Hi Vladimir, On Thu, Aug 03, 2023 at 04:42:53PM +0300, Vladimir Oltean wrote: > When the tagging protocol in current use is "ocelot-8021q" and we unbind > the driver, we see this splat: > > $ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind > mscc_felix 0000:00:00.5 swp0: left promiscuous mode > sja1105 spi2.0: Link is Down > DSA: tree 1 torn down > mscc_felix 0000:00:00.5 swp2: left promiscuous mode > sja1105 spi2.2: Link is Down > DSA: tree 3 torn down > fsl_enetc 0000:00:00.2 eno2: left promiscuous mode > mscc_felix 0000:00:00.5: Link is Down > ------------[ cut here ]------------ > RTNL: assertion failed at net/dsa/tag_8021q.c (409) > WARNING: CPU: 1 PID: 329 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x12c/0x1a0 > Modules linked in: > CPU: 1 PID: 329 Comm: bash Not tainted 6.5.0-rc3+ #771 > pc : dsa_tag_8021q_unregister+0x12c/0x1a0 > lr : dsa_tag_8021q_unregister+0x12c/0x1a0 > Call trace: > dsa_tag_8021q_unregister+0x12c/0x1a0 > felix_tag_8021q_teardown+0x130/0x150 > felix_teardown+0x3c/0xd8 > dsa_tree_teardown_switches+0xbc/0xe0 > dsa_unregister_switch+0x168/0x260 > felix_pci_remove+0x30/0x60 > pci_device_remove+0x4c/0x100 > device_release_driver_internal+0x188/0x288 > device_links_unbind_consumers+0xfc/0x138 > device_release_driver_internal+0xe0/0x288 > device_driver_detach+0x24/0x38 > unbind_store+0xd8/0x108 > drv_attr_store+0x30/0x50 > ---[ end trace 0000000000000000 ]--- > ------------[ cut here ]------------ > RTNL: assertion failed at net/8021q/vlan_core.c (376) > WARNING: CPU: 1 PID: 329 at net/8021q/vlan_core.c:376 vlan_vid_del+0x1b8/0x1f0 > CPU: 1 PID: 329 Comm: bash Tainted: G W 6.5.0-rc3+ #771 > pc : vlan_vid_del+0x1b8/0x1f0 > lr : vlan_vid_del+0x1b8/0x1f0 > dsa_tag_8021q_unregister+0x8c/0x1a0 > felix_tag_8021q_teardown+0x130/0x150 > felix_teardown+0x3c/0xd8 > dsa_tree_teardown_switches+0xbc/0xe0 > dsa_unregister_switch+0x168/0x260 > felix_pci_remove+0x30/0x60 > pci_device_remove+0x4c/0x100 > device_release_driver_internal+0x188/0x288 > device_links_unbind_consumers+0xfc/0x138 > device_release_driver_internal+0xe0/0x288 > device_driver_detach+0x24/0x38 > unbind_store+0xd8/0x108 > drv_attr_store+0x30/0x50 > DSA: tree 0 torn down > > This was somewhat not so easy to spot, because "ocelot-8021q" is not the > default tagging protocol, and thus, not everyone who tests the unbinding > path may have switched to it beforehand. The default > felix_tag_npi_teardown() does not require rtnl_lock() to be held. I ran this unbind test (with just ocelot tagging) on my currently running system (6.5.1-rc1 + 8). This doesn't include your patch, but I suspect this is entirely different because I'm not using ocelot-8021q. # echo spi0.0 > /sys/bus/spi/drivers/ocelot-soc/unbind br0: port 1(swp1) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto swp1 (unregistering): left allmulticast mode ocelot-ext-switch ocelot-ext-switch.5.auto swp1 (unregistering): left promiscuous mode br0: port 1(swp1) entered disabled state br0: port 2(swp2) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto swp2 (unregistering): left allmulticast mode ocelot-ext-switch ocelot-ext-switch.5.auto swp2 (unregistering): left promiscuous mode br0: port 2(swp2) entered disabled state br0: port 3(swp3) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto swp3 (unregistering): left allmulticast mode ocelot-ext-switch ocelot-ext-switch.5.auto swp3 (unregistering): left promiscuous mode br0: port 3(swp3) entered disabled state br0: port 4(swp4) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto swp4 (unregistering): left allmulticast mode ocelot-ext-switch ocelot-ext-switch.5.auto swp4 (unregistering): left promiscuous mode br0: port 4(swp4) entered disabled state br0: port 5(swp5) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto swp5 (unregistering): left allmulticast mode ocelot-ext-switch ocelot-ext-switch.5.auto swp5 (unregistering): left promiscuous mode br0: port 5(swp5) entered disabled state br0: port 6(swp6) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto swp6 (unregistering): left allmulticast mode ocelot-ext-switch ocelot-ext-switch.5.auto swp6 (unregistering): left promiscuous mode br0: port 6(swp6) entered disabled state br0: port 7(swp7) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto swp7 (unregistering): left allmulticast mode cpsw-switch 4a100000.switch eth0: left allmulticast mode ocelot-ext-switch ocelot-ext-switch.5.auto swp7 (unregistering): left promiscuous mode cpsw-switch 4a100000.switch eth0: left promiscuous mode br0: port 7(swp7) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto: Link is Down DSA: tree 0 torn down ------------[ cut here ]------------ WARNING: CPU: 0 PID: 157 at net/dsa/dsa.c:1490 dsa_switch_release_ports+0x104/0x12c Modules linked in: CPU: 0 PID: 157 Comm: bash Not tainted 6.5.0-rc1-00008-ga5ed09af118a #1324 Hardware name: Generic AM33XX (Flattened Device Tree) Backtrace: dump_backtrace from show_stack+0x20/0x24 r7:00000009 r6:00000000 r5:c18c0a8c r4:000e0113 show_stack from dump_stack_lvl+0x60/0x78 dump_stack_lvl from dump_stack+0x18/0x1c r7:00000009 r6:c1186e10 r5:000005d2 r4:c1a06270 dump_stack from __warn+0x88/0x160 __warn from warn_slowpath_fmt+0xe4/0x1e0 r8:00000009 r7:000005d2 r6:c1a06270 r5:c1d05590 r4:c1c978a4 warn_slowpath_fmt from dsa_switch_release_ports+0x104/0x12c r10:c1ea8b7c r9:c4290da8 r8:00000100 r7:c1a06270 r6:c4288380 r5:c427f800 r4:c427f600 dsa_switch_release_ports from dsa_unregister_switch+0x38/0x18c r9:c4290da8 r8:00000044 r7:c4255c54 r6:c4290db0 r5:c4290d80 r4:c4288380 dsa_unregister_switch from ocelot_ext_remove+0x28/0x40 r9:c1f6ec1c r8:00000044 r7:c4255c54 r6:c1ec5454 r5:00000000 r4:c26db800 ocelot_ext_remove from platform_remove+0x50/0x6c r5:00000000 r4:c4255c10 platform_remove from device_remove+0x50/0x74 r5:00000000 r4:c4255c10 device_remove from device_release_driver_internal+0x190/0x204 r5:00000000 r4:c4255c10 device_release_driver_internal from device_release_driver+0x20/0x24 r9:c1f6ec1c r8:c2146940 r7:c2146938 r6:c214690c r5:c4255c10 r4:c2146930 device_release_driver from bus_remove_device+0xd0/0xf4 bus_remove_device from device_del+0x164/0x454 r9:c1f6ec1c r8:c424d800 r7:c47b4700 r6:00000000 r5:c4255c10 r4:c4255c54 device_del from platform_device_del.part.0+0x20/0x84 r10:c1ea8b7c r9:c4292e80 r8:00000100 r7:00000122 r6:c4255c00 r5:c4255c00 r4:c4255c00 platform_device_del.part.0 from platform_device_unregister+0x28/0x34 r5:c4255c10 r4:c4255c00 platform_device_unregister from mfd_remove_devices_fn+0xe8/0xf4 r5:c4255c10 r4:c1ea8b7c mfd_remove_devices_fn from device_for_each_child_reverse+0x80/0xc8 r10:c47b4700 r9:c1d04d5c r8:c1f099a8 r7:c424d800 r6:c0a98f74 r5:e0c55d78 r4:00000000 r3:00000001 device_for_each_child_reverse from devm_mfd_dev_release+0x40/0x68 r6:e0c55dd4 r5:c4270e00 r4:c4270f00 devm_mfd_dev_release from release_nodes+0x78/0x104 release_nodes from devres_release_all+0x90/0xe0 r10:c4b05b10 r9:00000000 r8:c424d444 r7:c424d9b0 r6:80030013 r5:00000039 r4:c424d800 devres_release_all from device_unbind_cleanup+0x1c/0x70 r7:c424d844 r6:c1ea8b94 r5:c424d400 r4:c424d800 device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204 r5:c424d400 r4:c424d800 device_release_driver_internal from device_driver_detach+0x20/0x24 r9:00000000 r8:00000000 r7:c1ea8b94 r6:00000007 r5:c424d800 r4:c1eb9108 device_driver_detach from unbind_store+0x64/0xa0 unbind_store from drv_attr_store+0x34/0x40 r7:e0c55f08 r6:c4b05b00 r5:c471d040 r4:c0a53410 drv_attr_store from sysfs_kf_write+0x48/0x54 r5:c471d040 r4:c0a5266c sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc r5:c471d040 r4:00000007 kernfs_fop_write_iter from vfs_write+0x2d0/0x41c r10:00000000 r9:00004004 r8:00000000 r7:00000007 r6:005c9ef8 r5:e0c55f68 r4:c4958cc0 vfs_write from ksys_write+0x70/0xf4 r10:00000004 r9:c47b4700 r8:c03002f4 r7:00000000 r6:00000000 r5:c4958cc0 r4:c4958cc0 ksys_write from sys_write+0x18/0x1c r7:00000004 r6:b6fad550 r5:005c9ef8 r4:00000007 sys_write from ret_fast_syscall+0x0/0x1c Exception stack(0xe0c55fa8 to 0xe0c55ff0) 5fa0: 00000007 005c9ef8 00000001 005c9ef8 00000007 00000000 5fc0: 00000007 005c9ef8 b6fad550 00000004 00000007 00000001 00000000 be8e4a6c 5fe0: 00000004 be8e49c8 b6e56767 b6de1e06 ---[ end trace 0000000000000000 ]--- gpio_stub_drv gpiochip6: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED BUG: scheduling while atomic: bash/157/0x00000002 Modules linked in: Preemption disabled at: [<c03b8f98>] __wake_up_klogd.part.0+0x20/0xb4 CPU: 0 PID: 157 Comm: bash Tainted: G W 6.5.0-rc1-00008-ga5ed09af118a #1324 Hardware name: Generic AM33XX (Flattened Device Tree) Backtrace: dump_backtrace from show_stack+0x20/0x24 r7:c47b4700 r6:00000000 r5:c18c0a8c r4:000e0113 show_stack from dump_stack_lvl+0x60/0x78 dump_stack_lvl from dump_stack+0x18/0x1c r7:c47b4700 r6:c47b4700 r5:c03b8f98 r4:c47b4700 dump_stack from __schedule_bug+0x94/0xa4 __schedule_bug from __schedule+0x8fc/0xc48 r5:00000000 r4:df99a400 __schedule from schedule+0x60/0xf4 r10:e0c55ab4 r9:00000002 r8:e0c55a3c r7:c47b4700 r6:e0c55ab0 r5:e0c55aac r4:c47b4700 schedule from schedule_timeout+0xd8/0x190 r5:e0c55aac r4:7fffffff schedule_timeout from wait_for_completion+0xa0/0x124 r8:e0c55a3c r7:c47b4700 r6:e0c55ab0 r5:e0c55aac r4:7fffffff wait_for_completion from devtmpfs_submit_req+0x70/0x80 r10:c47b4700 r9:c1f6ec1c r8:c424e810 r7:00000000 r6:e0c55aac r5:e0c55aa8 r4:c1f6ed78 devtmpfs_submit_req from devtmpfs_delete_node+0x84/0xb4 r7:c47b4700 r6:c4250264 r5:c4250000 r4:00000000 devtmpfs_delete_node from device_del+0x3b8/0x454 r5:c4250000 r4:c4250044 device_del from cdev_device_del+0x24/0x54 r10:c47b4700 r9:c1d04d5c r8:00000040 r7:c4250234 r6:c4250264 r5:c42501e0 r4:c4250000 cdev_device_del from gpiolib_cdev_unregister+0x20/0x24 r5:c4250000 r4:00000000 gpiolib_cdev_unregister from gpiochip_remove+0x100/0x130 gpiochip_remove from devm_gpio_chip_release+0x18/0x1c r9:c1d04d5c r8:c1f099a8 r7:c424e810 r6:e0c55bf4 r5:c427e700 r4:c427ea80 devm_gpio_chip_release from devm_action_release+0x1c/0x20 devm_action_release from release_nodes+0x78/0x104 release_nodes from devres_release_all+0x90/0xe0 r10:c1ea8b7c r9:c1f6ec1c r8:00000044 r7:c424e9c0 r6:800e0113 r5:00000093 r4:c424e810 devres_release_all from device_unbind_cleanup+0x1c/0x70 r7:c424e854 r6:c1dd9a80 r5:00000000 r4:c424e810 device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204 r5:00000000 r4:c424e810 device_release_driver_internal from device_release_driver+0x20/0x24 r9:c1f6ec1c r8:c2146940 r7:c2146938 r6:c214690c r5:c424e810 r4:c2146930 device_release_driver from bus_remove_device+0xd0/0xf4 bus_remove_device from device_del+0x164/0x454 r9:c1f6ec1c r8:c424d800 r7:c47b4700 r6:00000000 r5:c424e810 r4:c424e854 device_del from platform_device_del.part.0+0x20/0x84 r10:c1ea8b7c r9:c4274f00 r8:00000100 r7:00000122 r6:c424e800 r5:c424e800 r4:c424e800 platform_device_del.part.0 from platform_device_unregister+0x28/0x34 r5:c424e810 r4:c424e800 platform_device_unregister from mfd_remove_devices_fn+0xe8/0xf4 r5:c424e810 r4:c1ea8b7c mfd_remove_devices_fn from device_for_each_child_reverse+0x80/0xc8 r10:c47b4700 r9:c1d04d5c r8:c1f099a8 r7:c424d800 r6:c0a98f74 r5:e0c55d78 r4:00000000 r3:00000001 device_for_each_child_reverse from devm_mfd_dev_release+0x40/0x68 r6:e0c55dd4 r5:c4270e00 r4:c4270f00 devm_mfd_dev_release from release_nodes+0x78/0x104 release_nodes from devres_release_all+0x90/0xe0 r10:c4b05b10 r9:00000000 r8:c424d444 r7:c424d9b0 r6:80030013 r5:00000039 r4:c424d800 devres_release_all from device_unbind_cleanup+0x1c/0x70 r7:c424d844 r6:c1ea8b94 r5:c424d400 r4:c424d800 device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204 r5:c424d400 r4:c424d800 device_release_driver_internal from device_driver_detach+0x20/0x24 r9:00000000 r8:00000000 r7:c1ea8b94 r6:00000007 r5:c424d800 r4:c1eb9108 device_driver_detach from unbind_store+0x64/0xa0 unbind_store from drv_attr_store+0x34/0x40 r7:e0c55f08 r6:c4b05b00 r5:c471d040 r4:c0a53410 drv_attr_store from sysfs_kf_write+0x48/0x54 r5:c471d040 r4:c0a5266c sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc r5:c471d040 r4:00000007 kernfs_fop_write_iter from vfs_write+0x2d0/0x41c r10:00000000 r9:00004004 r8:00000000 r7:00000007 r6:005c9ef8 r5:e0c55f68 r4:c4958cc0 vfs_write from ksys_write+0x70/0xf4 r10:00000004 r9:c47b4700 r8:c03002f4 r7:00000000 r6:00000000 r5:c4958cc0 r4:c4958cc0 ksys_write from sys_write+0x18/0x1c r7:00000004 r6:b6fad550 r5:005c9ef8 r4:00000007 sys_write from ret_fast_syscall+0x0/0x1c Exception stack(0xe0c55fa8 to 0xe0c55ff0) 5fa0: 00000007 005c9ef8 00000001 005c9ef8 00000007 00000000 5fc0: 00000007 005c9ef8 b6fad550 00000004 00000007 00000001 00000000 be8e4a6c 5fe0: 00000004 be8e49c8 b6e56767 b6de1e06 cpsw-switch 4a100000.switch eth0: Link is Down It looks to me like I have some things to fix :) Is it worth me still trying to recreate / test? I haven't used ocelot-8021q really at all. Colin
Hi Colin, On Fri, Aug 04, 2023 at 09:03:33AM -0700, Colin Foster wrote: > On Thu, Aug 03, 2023 at 04:42:53PM +0300, Vladimir Oltean wrote: > I ran this unbind test (with just ocelot tagging) on my currently > running system (6.5.1-rc1 + 8). This doesn't include your patch, but I > suspect this is entirely different because I'm not using ocelot-8021q. > > # echo spi0.0 > /sys/bus/spi/drivers/ocelot-soc/unbind > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 157 at net/dsa/dsa.c:1490 dsa_switch_release_ports+0x104/0x12c > Modules linked in: > CPU: 0 PID: 157 Comm: bash Not tainted 6.5.0-rc1-00008-ga5ed09af118a #1324 > Hardware name: Generic AM33XX (Flattened Device Tree) > Backtrace: > __warn from warn_slowpath_fmt+0xe4/0x1e0 > warn_slowpath_fmt from dsa_switch_release_ports+0x104/0x12c > dsa_switch_release_ports from dsa_unregister_switch+0x38/0x18c > dsa_unregister_switch from ocelot_ext_remove+0x28/0x40 > ocelot_ext_remove from platform_remove+0x50/0x6c > platform_remove from device_remove+0x50/0x74 > device_remove from device_release_driver_internal+0x190/0x204 > device_release_driver_internal from device_release_driver+0x20/0x24 > device_release_driver from bus_remove_device+0xd0/0xf4 > bus_remove_device from device_del+0x164/0x454 > device_del from platform_device_del.part.0+0x20/0x84 > platform_device_del.part.0 from platform_device_unregister+0x28/0x34 > platform_device_unregister from mfd_remove_devices_fn+0xe8/0xf4 > mfd_remove_devices_fn from device_for_each_child_reverse+0x80/0xc8 > device_for_each_child_reverse from devm_mfd_dev_release+0x40/0x68 > devm_mfd_dev_release from release_nodes+0x78/0x104 > release_nodes from devres_release_all+0x90/0xe0 > devres_release_all from device_unbind_cleanup+0x1c/0x70 > device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204 > device_release_driver_internal from device_driver_detach+0x20/0x24 > device_driver_detach from unbind_store+0x64/0xa0 > unbind_store from drv_attr_store+0x34/0x40 > drv_attr_store from sysfs_kf_write+0x48/0x54 > sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc > kernfs_fop_write_iter from vfs_write+0x2d0/0x41c > vfs_write from ksys_write+0x70/0xf4 > ksys_write from sys_write+0x18/0x1c > sys_write from ret_fast_syscall+0x0/0x1c > Exception stack(0xe0c55fa8 to 0xe0c55ff0) > 5fa0: 00000007 005c9ef8 00000001 005c9ef8 00000007 00000000 > 5fc0: 00000007 005c9ef8 b6fad550 00000004 00000007 00000001 00000000 be8e4a6c > 5fe0: 00000004 be8e49c8 b6e56767 b6de1e06 > ---[ end trace 0000000000000000 ]--- > gpio_stub_drv gpiochip6: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED > BUG: scheduling while atomic: bash/157/0x00000002 > Modules linked in: > Preemption disabled at: > [<c03b8f98>] __wake_up_klogd.part.0+0x20/0xb4 > CPU: 0 PID: 157 Comm: bash Tainted: G W 6.5.0-rc1-00008-ga5ed09af118a #1324 > Hardware name: Generic AM33XX (Flattened Device Tree) > Backtrace: > __schedule_bug from __schedule+0x8fc/0xc48 > __schedule from schedule+0x60/0xf4 > schedule from schedule_timeout+0xd8/0x190 > schedule_timeout from wait_for_completion+0xa0/0x124 > wait_for_completion from devtmpfs_submit_req+0x70/0x80 > devtmpfs_submit_req from devtmpfs_delete_node+0x84/0xb4 > devtmpfs_delete_node from device_del+0x3b8/0x454 > device_del from cdev_device_del+0x24/0x54 > cdev_device_del from gpiolib_cdev_unregister+0x20/0x24 > gpiolib_cdev_unregister from gpiochip_remove+0x100/0x130 > gpiochip_remove from devm_gpio_chip_release+0x18/0x1c > devm_gpio_chip_release from devm_action_release+0x1c/0x20 > devm_action_release from release_nodes+0x78/0x104 > release_nodes from devres_release_all+0x90/0xe0 > devres_release_all from device_unbind_cleanup+0x1c/0x70 > device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204 > device_release_driver_internal from device_release_driver+0x20/0x24 > device_release_driver from bus_remove_device+0xd0/0xf4 > bus_remove_device from device_del+0x164/0x454 > device_del from platform_device_del.part.0+0x20/0x84 > platform_device_del.part.0 from platform_device_unregister+0x28/0x34 > platform_device_unregister from mfd_remove_devices_fn+0xe8/0xf4 > mfd_remove_devices_fn from device_for_each_child_reverse+0x80/0xc8 > device_for_each_child_reverse from devm_mfd_dev_release+0x40/0x68 > devm_mfd_dev_release from release_nodes+0x78/0x104 > release_nodes from devres_release_all+0x90/0xe0 > devres_release_all from device_unbind_cleanup+0x1c/0x70 > device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204 > device_release_driver_internal from device_driver_detach+0x20/0x24 > device_driver_detach from unbind_store+0x64/0xa0 > unbind_store from drv_attr_store+0x34/0x40 > drv_attr_store from sysfs_kf_write+0x48/0x54 > sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc > kernfs_fop_write_iter from vfs_write+0x2d0/0x41c > vfs_write from ksys_write+0x70/0xf4 > ksys_write from sys_write+0x18/0x1c > sys_write from ret_fast_syscall+0x0/0x1c > Exception stack(0xe0c55fa8 to 0xe0c55ff0) > 5fa0: 00000007 005c9ef8 00000001 005c9ef8 00000007 00000000 > 5fc0: 00000007 005c9ef8 b6fad550 00000004 00000007 00000001 00000000 be8e4a6c > 5fe0: 00000004 be8e49c8 b6e56767 b6de1e06 > cpsw-switch 4a100000.switch eth0: Link is Down > > > It looks to me like I have some things to fix :) > > > Is it worth me still trying to recreate / test? I haven't used > ocelot-8021q really at all. The WARN_ON() in dsa_switch_release_ports() is different, and I tried to fix it here: https://patchwork.kernel.org/project/netdevbpf/patch/20230411144955.1604591-1-vladimir.oltean@nxp.com/ but judging by the fact that that was in April and now we're in August, obviously I didn't succeed. What's worse is the other one, the "scheduling while atomic" bug in the gpiochip removal path from ocelot-pinctrl.c. I'm not sure, at first glance, what causes the calling context to be atomic. Presumably some kind of spinlock which should be tracked down. Unfortunately I'm not very good with kernel debugging the way it should be done, so what I would advise you to do is to walk the stack trace down, from device_del() or so, and sprinkle a few might_sleep() calls until you figure out who's forcing atomic context and why. Otherwise, can't you just unbind the driver from the ethernet-switch child of the SPI device, rather than the entire SPI device? That should avoid the gpiochip/pinctrl bug. And the other one is ignorable for the intents and purposes here (that is, unless you want to take care of it, of course).
Hi Vladimir, On Fri, Aug 04, 2023 at 08:09:58PM +0300, Vladimir Oltean wrote: > Hi Colin, > > The WARN_ON() in dsa_switch_release_ports() is different, and I tried to fix it here: > https://patchwork.kernel.org/project/netdevbpf/patch/20230411144955.1604591-1-vladimir.oltean@nxp.com/ > but judging by the fact that that was in April and now we're in August, > obviously I didn't succeed. > > What's worse is the other one, the "scheduling while atomic" bug in the > gpiochip removal path from ocelot-pinctrl.c. I'm not sure, at first glance, > what causes the calling context to be atomic. Presumably some kind of > spinlock which should be tracked down. > > Unfortunately I'm not very good with kernel debugging the way it should > be done, so what I would advise you to do is to walk the stack trace > down, from device_del() or so, and sprinkle a few might_sleep() calls > until you figure out who's forcing atomic context and why. I'm not super confident in tracking down these types of bugs either, but you already knew that ;-) Thanks for these debugging tips. This is absolutely a problem I want to understand! When I was _really_ early in my SGPIO LED bring-up, I saw a scheduling while atomic warning when I set one of those LEDs to trigger on heartbeat. I have since tested that many times and never saw it again. > Otherwise, can't you just unbind the driver from the ethernet-switch > child of the SPI device, rather than the entire SPI device? That should > avoid the gpiochip/pinctrl bug. And the other one is ignorable for the > intents and purposes here (that is, unless you want to take care of it, > of course). Ok, I might have it recreated. My method: (updated to 6.5-rc4) Set dsa up for ocelot-8021q: echo ocelot-8021q > /sys/class/net/eth0/dsa/tagging Bring up the ports in my jumpered bridge configuration. (Bridge swp1-swp7 and have a bunch of short cables to test RSTP.) ** Side note, in 8021q mode I can ping through the bridge's swp1 to my desktop, but RSTP forwards on all ports. I'm adding this to my list. ** Unbind the ocelot-ext driver: echo ocelot-ext-switch.5.auto > \ /sys/bus/platform/drivers/ocelot-ext-switch/unbind br0: port 1(swp1) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto swp1 (unregistering): left allmulticast mode ocelot-ext-switch ocelot-ext-switch.5.auto swp1 (unregistering): left promiscuous mode br0: port 1(swp1) entered disabled state br0: port 2(swp2) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto swp2 (unregistering): left allmulticast mode ocelot-ext-switch ocelot-ext-switch.5.auto swp2 (unregistering): left promiscuous mode br0: port 2(swp2) entered disabled state br0: port 3(swp3) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto swp3 (unregistering): left allmulticast mode ocelot-ext-switch ocelot-ext-switch.5.auto swp3 (unregistering): left promiscuous mode br0: port 3(swp3) entered disabled state br0: port 4(swp4) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto swp4 (unregistering): left allmulticast mode ocelot-ext-switch ocelot-ext-switch.5.auto swp4 (unregistering): left promiscuous mode br0: port 4(swp4) entered disabled state br0: port 5(swp5) entered disabled state br0: port 5(swp5) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto swp5 (unregistering): left allmulticast mode ocelot-ext-switch ocelot-ext-switch.5.auto swp5 (unregistering): left promiscuous mode br0: port 5(swp5) entered disabled state br0: port 6(swp6) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto swp6 (unregistering): left allmulticast mode ocelot-ext-switch ocelot-ext-switch.5.auto swp6 (unregistering): left promiscuous mode br0: port 6(swp6) entered disabled state br0: port 7(swp7) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto swp7 (unregistering): left allmulticast mode cpsw-switch 4a100000.switch eth0: left allmulticast mode ocelot-ext-switch ocelot-ext-switch.5.auto swp7 (unregistering): left promiscuous mode cpsw-switch 4a100000.switch eth0: left promiscuous mode br0: port 7(swp7) entered disabled state ocelot-ext-switch ocelot-ext-switch.5.auto: Link is Down ------------[ cut here ]------------ WARNING: CPU: 0 PID: 157 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x1a8/0x1ac RTNL: assertion failed at net/dsa/tag_8021q.c (409) Modules linked in: CPU: 0 PID: 157 Comm: bash Not tainted 6.5.0-rc4-00008-gda8b39e58927 #1325 Hardware name: Generic AM33XX (Flattened Device Tree) Backtrace: dump_backtrace from show_stack+0x20/0x24 r7:00000009 r6:00000000 r5:c18c0c5c r4:00030113 show_stack from dump_stack_lvl+0x60/0x78 dump_stack_lvl from dump_stack+0x18/0x1c r7:00000009 r6:c1197930 r5:00000199 r4:c1a06dfc dump_stack from __warn+0x88/0x160 __warn from warn_slowpath_fmt+0x150/0x1e0 r8:00000009 r7:00000199 r6:c1a06dfc r5:c1d05590 r4:c1c978a4 warn_slowpath_fmt from dsa_tag_8021q_unregister+0x1a8/0x1ac r10:c2182b90 r9:00000000 r8:c424d844 r7:0100003e r6:c46ec180 r5:c4284a30 r4:c4298300 dsa_tag_8021q_unregister from felix_tag_8021q_teardown+0x94/0x140 r10:c2182b90 r9:00000000 r8:c424d844 r7:0100003e r6:c26db808 r5:c4284a30 r4:c4298300 felix_tag_8021q_teardown from felix_teardown+0x30/0xc0 r7:00000000 r6:c26db808 r5:c4298300 r4:c4298300 felix_teardown from dsa_tree_teardown_switches+0xa4/0xd4 r7:00000000 r6:c4284b48 r5:c4282600 r4:c4298300 dsa_tree_teardown_switches from dsa_unregister_switch+0xe4/0x18c r7:c4255c54 r6:c1ec5414 r5:c4284b40 r4:c4298300 dsa_unregister_switch from ocelot_ext_remove+0x28/0x40 r9:00000000 r8:c424d844 r7:c4255c54 r6:c1ec5414 r5:c424d800 r4:c26db800 ocelot_ext_remove from platform_remove+0x50/0x6c r5:c424d800 r4:c4255c10 platform_remove from device_remove+0x50/0x74 r5:c424d800 r4:c4255c10 device_remove from device_release_driver_internal+0x190/0x204 r5:c424d800 r4:c4255c10 device_release_driver_internal from device_driver_detach+0x20/0x24 r9:00000000 r8:00000000 r7:c1ec5414 r6:00000019 r5:c4255c10 r4:c1ea59e0 device_driver_detach from unbind_store+0x64/0xa0 unbind_store from drv_attr_store+0x34/0x40 r7:e0c55f08 r6:c2182b80 r5:c471a100 r4:c0a53740 drv_attr_store from sysfs_kf_write+0x48/0x54 r5:c471a100 r4:c0a5299c sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc r5:c471a100 r4:00000019 kernfs_fop_write_iter from vfs_write+0x2d0/0x41c r10:00000000 r9:00004004 r8:00000000 r7:00000019 r6:00579a60 r5:e0c55f68 r4:c4714f00 vfs_write from ksys_write+0x70/0xf4 r10:00000004 r9:c42ad8c0 r8:c03002f4 r7:00000000 r6:00000000 r5:c4714f00 r4:c4714f02 ksys_write from sys_write+0x18/0x1c r7:00000004 r6:b6f86550 r5:00579a60 r4:00000019 sys_write from ret_fast_syscall+0x0/0x1c Exception stack(0xe0c55fa8 to 0xe0c55ff0) 5fa0: 00000019 00579a60 00000001 00579a60 00000019 00000000 5fc0: 00000019 00579a60 b6f86550 00000004 00000019 00000001 00000000 bed4ea6c 5fe0: 00000004 bed4e9c8 b6e2f767 b6dbae06 ---[ end trace 0000000000000000 ]--- cpsw-switch 4a100000.switch eth0: Link is Down ------------[ cut here ]------------ WARNING: CPU: 0 PID: 157 at net/8021q/vlan_core.c:376 vlan_vid_del+0x14c/0x16c RTNL: assertion failed at net/8021q/vlan_core.c (376) Modules linked in: CPU: 0 PID: 157 Comm: bash Tainted: G W 6.5.0-rc4-00008-gda8b39e58927 #1325 Hardware name: Generic AM33XX (Flattened Device Tree) Backtrace: dump_backtrace from show_stack+0x20/0x24 r7:00000009 r6:00000000 r5:c18c0c5c r4:000f0013 show_stack from dump_stack_lvl+0x60/0x78 dump_stack_lvl from dump_stack+0x18/0x1c r7:00000009 r6:c11a5554 r5:00000178 r4:c1a078b4 dump_stack from __warn+0x88/0x160 __warn from warn_slowpath_fmt+0x150/0x1e0 r8:00000009 r7:00000178 r6:c1a078b4 r5:c1d05590 r4:c1c978a4 warn_slowpath_fmt from vlan_vid_del+0x14c/0x16c r10:00000c01 r9:c46ec180 r8:c241f000 r7:c241f000 r6:00000c01 r5:0000a888 r4:c4298300 vlan_vid_del from dsa_tag_8021q_unregister+0x164/0x1ac r9:c46ec180 r8:c241f000 r7:00000000 r6:c46ec180 r5:00000001 r4:c4298300 dsa_tag_8021q_unregister from felix_tag_8021q_teardown+0x94/0x140 r10:c2182b90 r9:00000000 r8:c424d844 r7:0100003e r6:c26db808 r5:c4284a30 r4:c4298300 felix_tag_8021q_teardown from felix_teardown+0x30/0xc0 r7:00000000 r6:c26db808 r5:c4298300 r4:c4298300 felix_teardown from dsa_tree_teardown_switches+0xa4/0xd4 r7:00000000 r6:c4284b48 r5:c4282600 r4:c4298300 dsa_tree_teardown_switches from dsa_unregister_switch+0xe4/0x18c r7:c4255c54 r6:c1ec5414 r5:c4284b40 r4:c4298300 dsa_unregister_switch from ocelot_ext_remove+0x28/0x40 r9:00000000 r8:c424d844 r7:c4255c54 r6:c1ec5414 r5:c424d800 r4:c26db800 ocelot_ext_remove from platform_remove+0x50/0x6c r5:c424d800 r4:c4255c10 platform_remove from device_remove+0x50/0x74 r5:c424d800 r4:c4255c10 device_remove from device_release_driver_internal+0x190/0x204 r5:c424d800 r4:c4255c10 device_release_driver_internal from device_driver_detach+0x20/0x24 r9:00000000 r8:00000000 r7:c1ec5414 r6:00000019 r5:c4255c10 r4:c1ea59e0 device_driver_detach from unbind_store+0x64/0xa0 unbind_store from drv_attr_store+0x34/0x40 r7:e0c55f08 r6:c2182b80 r5:c471a100 r4:c0a53740 drv_attr_store from sysfs_kf_write+0x48/0x54 r5:c471a100 r4:c0a5299c sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc r5:c471a100 r4:00000019 kernfs_fop_write_iter from vfs_write+0x2d0/0x41c r10:00000000 r9:00004004 r8:00000000 r7:00000019 r6:00579a60 r5:e0c55f68 r4:c4714f00 vfs_write from ksys_write+0x70/0xf4 r10:00000004 r9:c42ad8c0 r8:c03002f4 r7:00000000 r6:00000000 r5:c4714f00 r4:c4714f02 ksys_write from sys_write+0x18/0x1c r7:00000004 r6:b6f86550 r5:00579a60 r4:00000019 sys_write from ret_fast_syscall+0x0/0x1c Exception stack(0xe0c55fa8 to 0xe0c55ff0) 5fa0: 00000019 00579a60 00000001 00579a60 00000019 00000000 5fc0: 00000019 00579a60 b6f86550 00000004 00000019 00000001 00000000 bed4ea6c 5fe0: 00000004 bed4e9c8 b6e2f767 b6dbae06 ---[ end trace 0000000000000000 ]--- DSA: tree 0 torn down ------------[ cut here ]------------ WARNING: CPU: 0 PID: 157 at net/dsa/dsa.c:1490 dsa_switch_release_ports+0x104/0x12c Modules linked in: CPU: 0 PID: 157 Comm: bash Tainted: G W 6.5.0-rc4-00008-gda8b39e58927 #1325 Hardware name: Generic AM33XX (Flattened Device Tree) Backtrace: dump_backtrace from show_stack+0x20/0x24 r7:00000009 r6:00000000 r5:c18c0c5c r4:000d0013 show_stack from dump_stack_lvl+0x60/0x78 dump_stack_lvl from dump_stack+0x18/0x1c r7:00000009 r6:c1187600 r5:000005d2 r4:c1a06454 dump_stack from __warn+0x88/0x160 __warn from warn_slowpath_fmt+0xe4/0x1e0 r8:00000009 r7:000005d2 r6:c1a06454 r5:c1d05590 r4:c1c978a4 warn_slowpath_fmt from dsa_switch_release_ports+0x104/0x12c r10:c2182b90 r9:c4284b68 r8:00000100 r7:c1a06454 r6:c4298300 r5:c4282800 r4:c4282600 dsa_switch_release_ports from dsa_unregister_switch+0x38/0x18c r9:c4284b68 r8:c424d844 r7:c4255c54 r6:c4284b70 r5:c4284b40 r4:c4298300 dsa_unregister_switch from ocelot_ext_remove+0x28/0x40 r9:00000000 r8:c424d844 r7:c4255c54 r6:c1ec5414 r5:c424d800 r4:c26db800 ocelot_ext_remove from platform_remove+0x50/0x6c r5:c424d800 r4:c4255c10 platform_remove from device_remove+0x50/0x74 r5:c424d800 r4:c4255c10 device_remove from device_release_driver_internal+0x190/0x204 r5:c424d800 r4:c4255c10 device_release_driver_internal from device_driver_detach+0x20/0x24 r9:00000000 r8:00000000 r7:c1ec5414 r6:00000019 r5:c4255c10 r4:c1ea59e0 device_driver_detach from unbind_store+0x64/0xa0 unbind_store from drv_attr_store+0x34/0x40 r7:e0c55f08 r6:c2182b80 r5:c471a100 r4:c0a53740 drv_attr_store from sysfs_kf_write+0x48/0x54 r5:c471a100 r4:c0a5299c sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc r5:c471a100 r4:00000019 kernfs_fop_write_iter from vfs_write+0x2d0/0x41c r10:00000000 r9:00004004 r8:00000000 r7:00000019 r6:00579a60 r5:e0c55f68 r4:c4714f00 vfs_write from ksys_write+0x70/0xf4 r10:00000004 r9:c42ad8c0 r8:c03002f4 r7:00000000 r6:00000000 r5:c4714f00 r4:c4714f02 ksys_write from sys_write+0x18/0x1c r7:00000004 r6:b6f86550 r5:00579a60 r4:00000019 sys_write from ret_fast_syscall+0x0/0x1c Exception stack(0xe0c55fa8 to 0xe0c55ff0) 5fa0: 00000019 00579a60 00000001 00579a60 00000019 00000000 5fc0: 00000019 00579a60 b6f86550 00000004 00000019 00000001 00000000 bed4ea6c 5fe0: 00000004 bed4e9c8 b6e2f767 b6dbae06 ---[ end trace 0000000000000000 ]--- BUG: scheduling while atomic: bash/157/0x00000002 Modules linked in: Preemption disabled at: [<c03b8ff0>] __wake_up_klogd.part.0+0x20/0xb4 CPU: 0 PID: 157 Comm: bash Tainted: G W 6.5.0-rc4-00008-gda8b39e58927 #1325 Hardware name: Generic AM33XX (Flattened Device Tree) Backtrace: dump_backtrace from show_stack+0x20/0x24 r7:00000002 r6:00000080 r5:c18c0c5c r4:000d0093 show_stack from dump_stack_lvl+0x60/0x78 dump_stack_lvl from dump_stack+0x18/0x1c r7:00000002 r6:c42ad8c0 r5:c03b8ff0 r4:c42ad8c0 dump_stack from __schedule_bug+0x94/0xa4 __schedule_bug from __schedule+0x8fc/0xc48 r5:00000000 r4:df99a400 __schedule from schedule+0x60/0xf4 r10:5ac3c35a r9:befffd07 r8:fffffe30 r7:00000002 r6:c03002f4 r5:e0c55fb0 r4:c42ad8c0 schedule from do_work_pending+0x84/0x568 r5:e0c55fb0 r4:c42ad8c0 do_work_pending from slow_work_pending+0xc/0x20 Exception stack(0xe0c55fb0 to 0xe0c55ff8) 5fa0: 00000019 00579a60 00000019 00000000 5fc0: 00000019 00579a60 b6f86550 00000004 00000019 00000001 00000000 bed4ea6c 5fe0: 00000004 bed4e9c8 b6e2f767 b6dbae06 40010030 00000001 r10:00000004 r9:c42ad8c0 r8:c03002f4 r7:00000004 r6:b6f86550 r5:00579a60 r4:00000019 I repeat with your patch: ... ocelot-ext-switch ocelot-ext-switch.5.auto: Link is Down cpsw-switch 4a100000.switch eth0: Link is Down DSA: tree 0 torn down ------------[ cut here ]------------ WARNING: CPU: 0 PID: 157 at net/dsa/dsa.c:1490 dsa_switch_release_ports+0x104/0x12c Modules linked in: CPU: 0 PID: 157 Comm: bash Not tainted 6.5.0-rc4-00008-gda8b39e58927-dirty #1326 Hardware name: Generic AM33XX (Flattened Device Tree) Backtrace: dump_backtrace from show_stack+0x20/0x24 r7:00000009 r6:00000000 r5:c18c0c64 r4:00000013 show_stack from dump_stack_lvl+0x60/0x78 dump_stack_lvl from dump_stack+0x18/0x1c r7:00000009 r6:c1187600 r5:000005d2 r4:c1a06470 dump_stack from __warn+0x88/0x160 __warn from warn_slowpath_fmt+0xe4/0x1e0 r8:00000009 r7:000005d2 r6:c1a06470 r5:c1d05590 r4:c1c978a4 warn_slowpath_fmt from dsa_switch_release_ports+0x104/0x12c r10:c4afc890 r9:c4284b68 r8:00000100 r7:c1a06470 r6:c4298300 r5:c4282800 r4:c4282600 dsa_switch_release_ports from dsa_unregister_switch+0x38/0x18c r9:c4284b68 r8:c424d844 r7:c4255c54 r6:c4284b70 r5:c4284b40 r4:c4298300 dsa_unregister_switch from ocelot_ext_remove+0x28/0x40 r9:00000000 r8:c424d844 r7:c4255c54 r6:c1ec5414 r5:c424d800 r4:c26db800 ocelot_ext_remove from platform_remove+0x50/0x6c r5:c424d800 r4:c4255c10 platform_remove from device_remove+0x50/0x74 r5:c424d800 r4:c4255c10 device_remove from device_release_driver_internal+0x190/0x204 r5:c424d800 r4:c4255c10 device_release_driver_internal from device_driver_detach+0x20/0x24 r9:00000000 r8:00000000 r7:c1ec5414 r6:00000019 r5:c4255c10 r4:c1ea59e0 device_driver_detach from unbind_store+0x64/0xa0 unbind_store from drv_attr_store+0x34/0x40 r7:e0c55f08 r6:c4afc880 r5:c471a600 r4:c0a53740 drv_attr_store from sysfs_kf_write+0x48/0x54 r5:c471a600 r4:c0a5299c sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc r5:c471a600 r4:00000019 kernfs_fop_write_iter from vfs_write+0x2d0/0x41c r10:00000000 r9:00004004 r8:00000000 r7:00000019 r6:005598a8 r5:e0c55f68 r4:c4a819c0 vfs_write from ksys_write+0x70/0xf4 r10:00000004 r9:c47d0000 r8:c03002f4 r7:00000000 r6:00000000 r5:c4a819c0 r4:c4a819c2 ksys_write from sys_write+0x18/0x1c r7:00000004 r6:b6f1d550 r5:005598a8 r4:00000019 sys_write from ret_fast_syscall+0x0/0x1c Exception stack(0xe0c55fa8 to 0xe0c55ff0) 5fa0: 00000019 005598a8 00000001 005598a8 00000019 00000000 5fc0: 00000019 005598a8 b6f1d550 00000004 00000019 00000001 00000000 be9f8a6c 5fe0: 00000004 be9f89c8 b6dc6767 b6d51e06 ---[ end trace 0000000000000000 ]--- BUG: scheduling while atomic: bash/157/0x00000002 Modules linked in: Preemption disabled at: [<c03b8ff0>] __wake_up_klogd.part.0+0x20/0xb4 CPU: 0 PID: 157 Comm: bash Tainted: G W 6.5.0-rc4-00008-gda8b39e58927-dirty #1326 Hardware name: Generic AM33XX (Flattened Device Tree) Backtrace: dump_backtrace from show_stack+0x20/0x24 r7:00000002 r6:00000080 r5:c18c0c64 r4:00000093 show_stack from dump_stack_lvl+0x60/0x78 dump_stack_lvl from dump_stack+0x18/0x1c r7:00000002 r6:c47d0000 r5:c03b8ff0 r4:c47d0000 dump_stack from __schedule_bug+0x94/0xa4 __schedule_bug from __schedule+0x8fc/0xc48 r5:00000000 r4:df99a400 __schedule from schedule+0x60/0xf4 r10:5ac3c35a r9:befffd07 r8:fffffe30 r7:00000002 r6:c03002f4 r5:e0c55fb0 r4:c47d0000 schedule from do_work_pending+0x84/0x568 r5:e0c55fb0 r4:c47d0000 do_work_pending from slow_work_pending+0xc/0x20 Exception stack(0xe0c55fb0 to 0xe0c55ff8) 5fa0: 00000019 005598a8 00000019 00000000 5fc0: 00000019 005598a8 b6f1d550 00000004 00000019 00000001 00000000 be9f8a6c 5fe0: 00000004 be9f89c8 b6dc6767 b6d51e06 40010030 00000001 r10:00000004 r9:c47d0000 r8:c03002f4 r7:00000004 r6:b6f1d550 r5:005598a8 So if your patch is confirming the existence and removal of the warnings: WARNING: CPU: 0 PID: 157 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x1a8/0x1ac WARNING: CPU: 0 PID: 157 at net/8021q/vlan_core.c:376 vlan_vid_del+0x14c/0x16c ... and you think my testing methods are valid, I can add Acked / Tested if you like. ("scheduling while atomic" and "dsa_switch_release_ports" I see when unbinding, regardless of ocelot or ocelot-8021q tagging protocols) Hope this helps, and let me know if there's anything else you want me to test.
Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Thu, 3 Aug 2023 16:42:53 +0300 you wrote: > When the tagging protocol in current use is "ocelot-8021q" and we unbind > the driver, we see this splat: > > $ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind > mscc_felix 0000:00:00.5 swp0: left promiscuous mode > sja1105 spi2.0: Link is Down > DSA: tree 1 torn down > mscc_felix 0000:00:00.5 swp2: left promiscuous mode > sja1105 spi2.2: Link is Down > DSA: tree 3 torn down > fsl_enetc 0000:00:00.2 eno2: left promiscuous mode > mscc_felix 0000:00:00.5: Link is Down > > [...] Here is the summary with links: - [net] net: dsa: ocelot: call dsa_tag_8021q_unregister() under rtnl_lock() on driver remove https://git.kernel.org/netdev/net/c/a94c16a2fda0 You are awesome, thank you!
On Fri, Aug 04, 2023 at 02:10:45PM +0300, Vladimir Oltean wrote: > On Thu, Aug 03, 2023 at 09:29:38PM +0200, Simon Horman wrote: > > On Thu, Aug 03, 2023 at 04:42:53PM +0300, Vladimir Oltean wrote: > > > diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c > > > index fd7eb4a52918..9a3e5ec16972 100644 > > > --- a/drivers/net/dsa/ocelot/felix.c > > > +++ b/drivers/net/dsa/ocelot/felix.c > > > @@ -1619,8 +1619,10 @@ static void felix_teardown(struct dsa_switch *ds) > > > struct felix *felix = ocelot_to_felix(ocelot); > > > struct dsa_port *dp; > > > > > > + rtnl_lock(); > > > if (felix->tag_proto_ops) > > > felix->tag_proto_ops->teardown(ds); > > > + rtnl_unlock(); > > > > Hi Vladimir, > > > > I am curious to know if RTNL could be taken in > > felix_tag_8021q_teardown() instead. > > Negative. This call path also exists: > > dsa_tree_change_tag_proto() > -> rtnl_trylock() > -> dsa_tree_notify(dst, DSA_NOTIFIER_TAG_PROTO, &info) > -> dsa_switch_change_tag_proto() > -> ds->ops->change_tag_protocol() > -> felix_change_tag_protocol() > -> old_proto_ops->teardown() > -> felix_tag_8021q_teardown() > > where the rtnl_mutex is already held. Ack. Thanks for confirming. In that case this looks good to me. Reviewed-by: Simon Horman <horms@kernel.org>
diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c index fd7eb4a52918..9a3e5ec16972 100644 --- a/drivers/net/dsa/ocelot/felix.c +++ b/drivers/net/dsa/ocelot/felix.c @@ -1619,8 +1619,10 @@ static void felix_teardown(struct dsa_switch *ds) struct felix *felix = ocelot_to_felix(ocelot); struct dsa_port *dp; + rtnl_lock(); if (felix->tag_proto_ops) felix->tag_proto_ops->teardown(ds); + rtnl_unlock(); dsa_switch_for_each_available_port(dp, ds) ocelot_deinit_port(ocelot, dp->index);
When the tagging protocol in current use is "ocelot-8021q" and we unbind the driver, we see this splat: $ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind mscc_felix 0000:00:00.5 swp0: left promiscuous mode sja1105 spi2.0: Link is Down DSA: tree 1 torn down mscc_felix 0000:00:00.5 swp2: left promiscuous mode sja1105 spi2.2: Link is Down DSA: tree 3 torn down fsl_enetc 0000:00:00.2 eno2: left promiscuous mode mscc_felix 0000:00:00.5: Link is Down ------------[ cut here ]------------ RTNL: assertion failed at net/dsa/tag_8021q.c (409) WARNING: CPU: 1 PID: 329 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x12c/0x1a0 Modules linked in: CPU: 1 PID: 329 Comm: bash Not tainted 6.5.0-rc3+ #771 pc : dsa_tag_8021q_unregister+0x12c/0x1a0 lr : dsa_tag_8021q_unregister+0x12c/0x1a0 Call trace: dsa_tag_8021q_unregister+0x12c/0x1a0 felix_tag_8021q_teardown+0x130/0x150 felix_teardown+0x3c/0xd8 dsa_tree_teardown_switches+0xbc/0xe0 dsa_unregister_switch+0x168/0x260 felix_pci_remove+0x30/0x60 pci_device_remove+0x4c/0x100 device_release_driver_internal+0x188/0x288 device_links_unbind_consumers+0xfc/0x138 device_release_driver_internal+0xe0/0x288 device_driver_detach+0x24/0x38 unbind_store+0xd8/0x108 drv_attr_store+0x30/0x50 ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ RTNL: assertion failed at net/8021q/vlan_core.c (376) WARNING: CPU: 1 PID: 329 at net/8021q/vlan_core.c:376 vlan_vid_del+0x1b8/0x1f0 CPU: 1 PID: 329 Comm: bash Tainted: G W 6.5.0-rc3+ #771 pc : vlan_vid_del+0x1b8/0x1f0 lr : vlan_vid_del+0x1b8/0x1f0 dsa_tag_8021q_unregister+0x8c/0x1a0 felix_tag_8021q_teardown+0x130/0x150 felix_teardown+0x3c/0xd8 dsa_tree_teardown_switches+0xbc/0xe0 dsa_unregister_switch+0x168/0x260 felix_pci_remove+0x30/0x60 pci_device_remove+0x4c/0x100 device_release_driver_internal+0x188/0x288 device_links_unbind_consumers+0xfc/0x138 device_release_driver_internal+0xe0/0x288 device_driver_detach+0x24/0x38 unbind_store+0xd8/0x108 drv_attr_store+0x30/0x50 DSA: tree 0 torn down This was somewhat not so easy to spot, because "ocelot-8021q" is not the default tagging protocol, and thus, not everyone who tests the unbinding path may have switched to it beforehand. The default felix_tag_npi_teardown() does not require rtnl_lock() to be held. Fixes: 7c83a7c539ab ("net: dsa: add a second tagger for Ocelot switches based on tag_8021q") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> --- drivers/net/dsa/ocelot/felix.c | 2 ++ 1 file changed, 2 insertions(+)