diff mbox series

[net] net: dsa: ocelot: call dsa_tag_8021q_unregister() under rtnl_lock() on driver remove

Message ID 20230803134253.2711124-1-vladimir.oltean@nxp.com (mailing list archive)
State Accepted
Commit a94c16a2fda010866b8858a386a8bfbeba4f72c5
Delegated to: Netdev Maintainers
Headers show
Series [net] net: dsa: ocelot: call dsa_tag_8021q_unregister() under rtnl_lock() on driver remove | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1328 this patch: 1328
netdev/cc_maintainers success CCed 11 of 11 maintainers
netdev/build_clang success Errors and warnings before: 1351 this patch: 1351
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1351 this patch: 1351
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Vladimir Oltean Aug. 3, 2023, 1:42 p.m. UTC
When the tagging protocol in current use is "ocelot-8021q" and we unbind
the driver, we see this splat:

$ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind
mscc_felix 0000:00:00.5 swp0: left promiscuous mode
sja1105 spi2.0: Link is Down
DSA: tree 1 torn down
mscc_felix 0000:00:00.5 swp2: left promiscuous mode
sja1105 spi2.2: Link is Down
DSA: tree 3 torn down
fsl_enetc 0000:00:00.2 eno2: left promiscuous mode
mscc_felix 0000:00:00.5: Link is Down
------------[ cut here ]------------
RTNL: assertion failed at net/dsa/tag_8021q.c (409)
WARNING: CPU: 1 PID: 329 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x12c/0x1a0
Modules linked in:
CPU: 1 PID: 329 Comm: bash Not tainted 6.5.0-rc3+ #771
pc : dsa_tag_8021q_unregister+0x12c/0x1a0
lr : dsa_tag_8021q_unregister+0x12c/0x1a0
Call trace:
 dsa_tag_8021q_unregister+0x12c/0x1a0
 felix_tag_8021q_teardown+0x130/0x150
 felix_teardown+0x3c/0xd8
 dsa_tree_teardown_switches+0xbc/0xe0
 dsa_unregister_switch+0x168/0x260
 felix_pci_remove+0x30/0x60
 pci_device_remove+0x4c/0x100
 device_release_driver_internal+0x188/0x288
 device_links_unbind_consumers+0xfc/0x138
 device_release_driver_internal+0xe0/0x288
 device_driver_detach+0x24/0x38
 unbind_store+0xd8/0x108
 drv_attr_store+0x30/0x50
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
RTNL: assertion failed at net/8021q/vlan_core.c (376)
WARNING: CPU: 1 PID: 329 at net/8021q/vlan_core.c:376 vlan_vid_del+0x1b8/0x1f0
CPU: 1 PID: 329 Comm: bash Tainted: G        W          6.5.0-rc3+ #771
pc : vlan_vid_del+0x1b8/0x1f0
lr : vlan_vid_del+0x1b8/0x1f0
 dsa_tag_8021q_unregister+0x8c/0x1a0
 felix_tag_8021q_teardown+0x130/0x150
 felix_teardown+0x3c/0xd8
 dsa_tree_teardown_switches+0xbc/0xe0
 dsa_unregister_switch+0x168/0x260
 felix_pci_remove+0x30/0x60
 pci_device_remove+0x4c/0x100
 device_release_driver_internal+0x188/0x288
 device_links_unbind_consumers+0xfc/0x138
 device_release_driver_internal+0xe0/0x288
 device_driver_detach+0x24/0x38
 unbind_store+0xd8/0x108
 drv_attr_store+0x30/0x50
DSA: tree 0 torn down

This was somewhat not so easy to spot, because "ocelot-8021q" is not the
default tagging protocol, and thus, not everyone who tests the unbinding
path may have switched to it beforehand. The default
felix_tag_npi_teardown() does not require rtnl_lock() to be held.

Fixes: 7c83a7c539ab ("net: dsa: add a second tagger for Ocelot switches based on tag_8021q")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/ocelot/felix.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Simon Horman Aug. 3, 2023, 7:29 p.m. UTC | #1
On Thu, Aug 03, 2023 at 04:42:53PM +0300, Vladimir Oltean wrote:
> When the tagging protocol in current use is "ocelot-8021q" and we unbind
> the driver, we see this splat:
> 
> $ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind
> mscc_felix 0000:00:00.5 swp0: left promiscuous mode
> sja1105 spi2.0: Link is Down
> DSA: tree 1 torn down
> mscc_felix 0000:00:00.5 swp2: left promiscuous mode
> sja1105 spi2.2: Link is Down
> DSA: tree 3 torn down
> fsl_enetc 0000:00:00.2 eno2: left promiscuous mode
> mscc_felix 0000:00:00.5: Link is Down
> ------------[ cut here ]------------
> RTNL: assertion failed at net/dsa/tag_8021q.c (409)
> WARNING: CPU: 1 PID: 329 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x12c/0x1a0
> Modules linked in:
> CPU: 1 PID: 329 Comm: bash Not tainted 6.5.0-rc3+ #771
> pc : dsa_tag_8021q_unregister+0x12c/0x1a0
> lr : dsa_tag_8021q_unregister+0x12c/0x1a0
> Call trace:
>  dsa_tag_8021q_unregister+0x12c/0x1a0
>  felix_tag_8021q_teardown+0x130/0x150
>  felix_teardown+0x3c/0xd8
>  dsa_tree_teardown_switches+0xbc/0xe0
>  dsa_unregister_switch+0x168/0x260
>  felix_pci_remove+0x30/0x60
>  pci_device_remove+0x4c/0x100
>  device_release_driver_internal+0x188/0x288
>  device_links_unbind_consumers+0xfc/0x138
>  device_release_driver_internal+0xe0/0x288
>  device_driver_detach+0x24/0x38
>  unbind_store+0xd8/0x108
>  drv_attr_store+0x30/0x50
> ---[ end trace 0000000000000000 ]---
> ------------[ cut here ]------------
> RTNL: assertion failed at net/8021q/vlan_core.c (376)
> WARNING: CPU: 1 PID: 329 at net/8021q/vlan_core.c:376 vlan_vid_del+0x1b8/0x1f0
> CPU: 1 PID: 329 Comm: bash Tainted: G        W          6.5.0-rc3+ #771
> pc : vlan_vid_del+0x1b8/0x1f0
> lr : vlan_vid_del+0x1b8/0x1f0
>  dsa_tag_8021q_unregister+0x8c/0x1a0
>  felix_tag_8021q_teardown+0x130/0x150
>  felix_teardown+0x3c/0xd8
>  dsa_tree_teardown_switches+0xbc/0xe0
>  dsa_unregister_switch+0x168/0x260
>  felix_pci_remove+0x30/0x60
>  pci_device_remove+0x4c/0x100
>  device_release_driver_internal+0x188/0x288
>  device_links_unbind_consumers+0xfc/0x138
>  device_release_driver_internal+0xe0/0x288
>  device_driver_detach+0x24/0x38
>  unbind_store+0xd8/0x108
>  drv_attr_store+0x30/0x50
> DSA: tree 0 torn down
> 
> This was somewhat not so easy to spot, because "ocelot-8021q" is not the
> default tagging protocol, and thus, not everyone who tests the unbinding
> path may have switched to it beforehand. The default
> felix_tag_npi_teardown() does not require rtnl_lock() to be held.
> 
> Fixes: 7c83a7c539ab ("net: dsa: add a second tagger for Ocelot switches based on tag_8021q")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  drivers/net/dsa/ocelot/felix.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
> index fd7eb4a52918..9a3e5ec16972 100644
> --- a/drivers/net/dsa/ocelot/felix.c
> +++ b/drivers/net/dsa/ocelot/felix.c
> @@ -1619,8 +1619,10 @@ static void felix_teardown(struct dsa_switch *ds)
>  	struct felix *felix = ocelot_to_felix(ocelot);
>  	struct dsa_port *dp;
>  
> +	rtnl_lock();
>  	if (felix->tag_proto_ops)
>  		felix->tag_proto_ops->teardown(ds);
> +	rtnl_unlock();

Hi Vladimir,

I am curious to know if RTNL could be taken in
felix_tag_8021q_teardown() instead.

>  
>  	dsa_switch_for_each_available_port(dp, ds)
>  		ocelot_deinit_port(ocelot, dp->index);
> -- 
> 2.34.1
> 
>
Vladimir Oltean Aug. 4, 2023, 11:10 a.m. UTC | #2
On Thu, Aug 03, 2023 at 09:29:38PM +0200, Simon Horman wrote:
> On Thu, Aug 03, 2023 at 04:42:53PM +0300, Vladimir Oltean wrote:
> > diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
> > index fd7eb4a52918..9a3e5ec16972 100644
> > --- a/drivers/net/dsa/ocelot/felix.c
> > +++ b/drivers/net/dsa/ocelot/felix.c
> > @@ -1619,8 +1619,10 @@ static void felix_teardown(struct dsa_switch *ds)
> >  	struct felix *felix = ocelot_to_felix(ocelot);
> >  	struct dsa_port *dp;
> >  
> > +	rtnl_lock();
> >  	if (felix->tag_proto_ops)
> >  		felix->tag_proto_ops->teardown(ds);
> > +	rtnl_unlock();
> 
> Hi Vladimir,
> 
> I am curious to know if RTNL could be taken in
> felix_tag_8021q_teardown() instead.

Negative. This call path also exists:

dsa_tree_change_tag_proto()
-> rtnl_trylock()
-> dsa_tree_notify(dst, DSA_NOTIFIER_TAG_PROTO, &info)
   -> dsa_switch_change_tag_proto()
      -> ds->ops->change_tag_protocol()
         -> felix_change_tag_protocol()
            -> old_proto_ops->teardown()
               -> felix_tag_8021q_teardown()

where the rtnl_mutex is already held.
Colin Foster Aug. 4, 2023, 4:03 p.m. UTC | #3
Hi Vladimir,

On Thu, Aug 03, 2023 at 04:42:53PM +0300, Vladimir Oltean wrote:
> When the tagging protocol in current use is "ocelot-8021q" and we unbind
> the driver, we see this splat:
> 
> $ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind
> mscc_felix 0000:00:00.5 swp0: left promiscuous mode
> sja1105 spi2.0: Link is Down
> DSA: tree 1 torn down
> mscc_felix 0000:00:00.5 swp2: left promiscuous mode
> sja1105 spi2.2: Link is Down
> DSA: tree 3 torn down
> fsl_enetc 0000:00:00.2 eno2: left promiscuous mode
> mscc_felix 0000:00:00.5: Link is Down
> ------------[ cut here ]------------
> RTNL: assertion failed at net/dsa/tag_8021q.c (409)
> WARNING: CPU: 1 PID: 329 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x12c/0x1a0
> Modules linked in:
> CPU: 1 PID: 329 Comm: bash Not tainted 6.5.0-rc3+ #771
> pc : dsa_tag_8021q_unregister+0x12c/0x1a0
> lr : dsa_tag_8021q_unregister+0x12c/0x1a0
> Call trace:
>  dsa_tag_8021q_unregister+0x12c/0x1a0
>  felix_tag_8021q_teardown+0x130/0x150
>  felix_teardown+0x3c/0xd8
>  dsa_tree_teardown_switches+0xbc/0xe0
>  dsa_unregister_switch+0x168/0x260
>  felix_pci_remove+0x30/0x60
>  pci_device_remove+0x4c/0x100
>  device_release_driver_internal+0x188/0x288
>  device_links_unbind_consumers+0xfc/0x138
>  device_release_driver_internal+0xe0/0x288
>  device_driver_detach+0x24/0x38
>  unbind_store+0xd8/0x108
>  drv_attr_store+0x30/0x50
> ---[ end trace 0000000000000000 ]---
> ------------[ cut here ]------------
> RTNL: assertion failed at net/8021q/vlan_core.c (376)
> WARNING: CPU: 1 PID: 329 at net/8021q/vlan_core.c:376 vlan_vid_del+0x1b8/0x1f0
> CPU: 1 PID: 329 Comm: bash Tainted: G        W          6.5.0-rc3+ #771
> pc : vlan_vid_del+0x1b8/0x1f0
> lr : vlan_vid_del+0x1b8/0x1f0
>  dsa_tag_8021q_unregister+0x8c/0x1a0
>  felix_tag_8021q_teardown+0x130/0x150
>  felix_teardown+0x3c/0xd8
>  dsa_tree_teardown_switches+0xbc/0xe0
>  dsa_unregister_switch+0x168/0x260
>  felix_pci_remove+0x30/0x60
>  pci_device_remove+0x4c/0x100
>  device_release_driver_internal+0x188/0x288
>  device_links_unbind_consumers+0xfc/0x138
>  device_release_driver_internal+0xe0/0x288
>  device_driver_detach+0x24/0x38
>  unbind_store+0xd8/0x108
>  drv_attr_store+0x30/0x50
> DSA: tree 0 torn down
> 
> This was somewhat not so easy to spot, because "ocelot-8021q" is not the
> default tagging protocol, and thus, not everyone who tests the unbinding
> path may have switched to it beforehand. The default
> felix_tag_npi_teardown() does not require rtnl_lock() to be held.

I ran this unbind test (with just ocelot tagging) on my currently
running system (6.5.1-rc1 + 8). This doesn't include your patch, but I
suspect this is entirely different because I'm not using ocelot-8021q.

# echo spi0.0 > /sys/bus/spi/drivers/ocelot-soc/unbind
br0: port 1(swp1) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp1 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp1 (unregistering): left promiscuous mode
br0: port 1(swp1) entered disabled state
br0: port 2(swp2) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp2 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp2 (unregistering): left promiscuous mode
br0: port 2(swp2) entered disabled state
br0: port 3(swp3) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp3 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp3 (unregistering): left promiscuous mode
br0: port 3(swp3) entered disabled state
br0: port 4(swp4) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp4 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp4 (unregistering): left promiscuous mode
br0: port 4(swp4) entered disabled state
br0: port 5(swp5) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp5 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp5 (unregistering): left promiscuous mode
br0: port 5(swp5) entered disabled state
br0: port 6(swp6) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp6 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp6 (unregistering): left promiscuous mode
br0: port 6(swp6) entered disabled state
br0: port 7(swp7) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp7 (unregistering): left allmulticast mode
cpsw-switch 4a100000.switch eth0: left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp7 (unregistering): left promiscuous mode
cpsw-switch 4a100000.switch eth0: left promiscuous mode
br0: port 7(swp7) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto: Link is Down
DSA: tree 0 torn down
------------[ cut here ]------------
WARNING: CPU: 0 PID: 157 at net/dsa/dsa.c:1490 dsa_switch_release_ports+0x104/0x12c
Modules linked in:
CPU: 0 PID: 157 Comm: bash Not tainted 6.5.0-rc1-00008-ga5ed09af118a #1324
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
 dump_backtrace from show_stack+0x20/0x24
 r7:00000009 r6:00000000 r5:c18c0a8c r4:000e0113
 show_stack from dump_stack_lvl+0x60/0x78
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:00000009 r6:c1186e10 r5:000005d2 r4:c1a06270
 dump_stack from __warn+0x88/0x160
 __warn from warn_slowpath_fmt+0xe4/0x1e0
 r8:00000009 r7:000005d2 r6:c1a06270 r5:c1d05590 r4:c1c978a4
 warn_slowpath_fmt from dsa_switch_release_ports+0x104/0x12c
 r10:c1ea8b7c r9:c4290da8 r8:00000100 r7:c1a06270 r6:c4288380 r5:c427f800
 r4:c427f600
 dsa_switch_release_ports from dsa_unregister_switch+0x38/0x18c
 r9:c4290da8 r8:00000044 r7:c4255c54 r6:c4290db0 r5:c4290d80 r4:c4288380
 dsa_unregister_switch from ocelot_ext_remove+0x28/0x40
 r9:c1f6ec1c r8:00000044 r7:c4255c54 r6:c1ec5454 r5:00000000 r4:c26db800
 ocelot_ext_remove from platform_remove+0x50/0x6c
 r5:00000000 r4:c4255c10
 platform_remove from device_remove+0x50/0x74
 r5:00000000 r4:c4255c10
 device_remove from device_release_driver_internal+0x190/0x204
 r5:00000000 r4:c4255c10
 device_release_driver_internal from device_release_driver+0x20/0x24
 r9:c1f6ec1c r8:c2146940 r7:c2146938 r6:c214690c r5:c4255c10 r4:c2146930
 device_release_driver from bus_remove_device+0xd0/0xf4
 bus_remove_device from device_del+0x164/0x454
 r9:c1f6ec1c r8:c424d800 r7:c47b4700 r6:00000000 r5:c4255c10 r4:c4255c54
 device_del from platform_device_del.part.0+0x20/0x84
 r10:c1ea8b7c r9:c4292e80 r8:00000100 r7:00000122 r6:c4255c00 r5:c4255c00
 r4:c4255c00
 platform_device_del.part.0 from platform_device_unregister+0x28/0x34
 r5:c4255c10 r4:c4255c00
 platform_device_unregister from mfd_remove_devices_fn+0xe8/0xf4
 r5:c4255c10 r4:c1ea8b7c
 mfd_remove_devices_fn from device_for_each_child_reverse+0x80/0xc8
 r10:c47b4700 r9:c1d04d5c r8:c1f099a8 r7:c424d800 r6:c0a98f74 r5:e0c55d78
 r4:00000000 r3:00000001
 device_for_each_child_reverse from devm_mfd_dev_release+0x40/0x68
 r6:e0c55dd4 r5:c4270e00 r4:c4270f00
 devm_mfd_dev_release from release_nodes+0x78/0x104
 release_nodes from devres_release_all+0x90/0xe0
 r10:c4b05b10 r9:00000000 r8:c424d444 r7:c424d9b0 r6:80030013 r5:00000039
 r4:c424d800
 devres_release_all from device_unbind_cleanup+0x1c/0x70
 r7:c424d844 r6:c1ea8b94 r5:c424d400 r4:c424d800
 device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
 r5:c424d400 r4:c424d800
 device_release_driver_internal from device_driver_detach+0x20/0x24
 r9:00000000 r8:00000000 r7:c1ea8b94 r6:00000007 r5:c424d800 r4:c1eb9108
 device_driver_detach from unbind_store+0x64/0xa0
 unbind_store from drv_attr_store+0x34/0x40
 r7:e0c55f08 r6:c4b05b00 r5:c471d040 r4:c0a53410
 drv_attr_store from sysfs_kf_write+0x48/0x54
 r5:c471d040 r4:c0a5266c
 sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
 r5:c471d040 r4:00000007
 kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
 r10:00000000 r9:00004004 r8:00000000 r7:00000007 r6:005c9ef8 r5:e0c55f68
 r4:c4958cc0
 vfs_write from ksys_write+0x70/0xf4
 r10:00000004 r9:c47b4700 r8:c03002f4 r7:00000000 r6:00000000 r5:c4958cc0
 r4:c4958cc0
 ksys_write from sys_write+0x18/0x1c
 r7:00000004 r6:b6fad550 r5:005c9ef8 r4:00000007
 sys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xe0c55fa8 to 0xe0c55ff0)
5fa0:                   00000007 005c9ef8 00000001 005c9ef8 00000007 00000000
5fc0: 00000007 005c9ef8 b6fad550 00000004 00000007 00000001 00000000 be8e4a6c
5fe0: 00000004 be8e49c8 b6e56767 b6de1e06
---[ end trace 0000000000000000 ]---
gpio_stub_drv gpiochip6: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED
BUG: scheduling while atomic: bash/157/0x00000002
Modules linked in:
Preemption disabled at:
[<c03b8f98>] __wake_up_klogd.part.0+0x20/0xb4
CPU: 0 PID: 157 Comm: bash Tainted: G        W          6.5.0-rc1-00008-ga5ed09af118a #1324
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
 dump_backtrace from show_stack+0x20/0x24
 r7:c47b4700 r6:00000000 r5:c18c0a8c r4:000e0113
 show_stack from dump_stack_lvl+0x60/0x78
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:c47b4700 r6:c47b4700 r5:c03b8f98 r4:c47b4700
 dump_stack from __schedule_bug+0x94/0xa4
 __schedule_bug from __schedule+0x8fc/0xc48
 r5:00000000 r4:df99a400
 __schedule from schedule+0x60/0xf4
 r10:e0c55ab4 r9:00000002 r8:e0c55a3c r7:c47b4700 r6:e0c55ab0 r5:e0c55aac
 r4:c47b4700
 schedule from schedule_timeout+0xd8/0x190
 r5:e0c55aac r4:7fffffff
 schedule_timeout from wait_for_completion+0xa0/0x124
 r8:e0c55a3c r7:c47b4700 r6:e0c55ab0 r5:e0c55aac r4:7fffffff
 wait_for_completion from devtmpfs_submit_req+0x70/0x80
 r10:c47b4700 r9:c1f6ec1c r8:c424e810 r7:00000000 r6:e0c55aac r5:e0c55aa8
 r4:c1f6ed78
 devtmpfs_submit_req from devtmpfs_delete_node+0x84/0xb4
 r7:c47b4700 r6:c4250264 r5:c4250000 r4:00000000
 devtmpfs_delete_node from device_del+0x3b8/0x454
 r5:c4250000 r4:c4250044
 device_del from cdev_device_del+0x24/0x54
 r10:c47b4700 r9:c1d04d5c r8:00000040 r7:c4250234 r6:c4250264 r5:c42501e0
 r4:c4250000
 cdev_device_del from gpiolib_cdev_unregister+0x20/0x24
 r5:c4250000 r4:00000000
 gpiolib_cdev_unregister from gpiochip_remove+0x100/0x130
 gpiochip_remove from devm_gpio_chip_release+0x18/0x1c
 r9:c1d04d5c r8:c1f099a8 r7:c424e810 r6:e0c55bf4 r5:c427e700 r4:c427ea80
 devm_gpio_chip_release from devm_action_release+0x1c/0x20
 devm_action_release from release_nodes+0x78/0x104
 release_nodes from devres_release_all+0x90/0xe0
 r10:c1ea8b7c r9:c1f6ec1c r8:00000044 r7:c424e9c0 r6:800e0113 r5:00000093
 r4:c424e810
 devres_release_all from device_unbind_cleanup+0x1c/0x70
 r7:c424e854 r6:c1dd9a80 r5:00000000 r4:c424e810
 device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
 r5:00000000 r4:c424e810
 device_release_driver_internal from device_release_driver+0x20/0x24
 r9:c1f6ec1c r8:c2146940 r7:c2146938 r6:c214690c r5:c424e810 r4:c2146930
 device_release_driver from bus_remove_device+0xd0/0xf4
 bus_remove_device from device_del+0x164/0x454
 r9:c1f6ec1c r8:c424d800 r7:c47b4700 r6:00000000 r5:c424e810 r4:c424e854
 device_del from platform_device_del.part.0+0x20/0x84
 r10:c1ea8b7c r9:c4274f00 r8:00000100 r7:00000122 r6:c424e800 r5:c424e800
 r4:c424e800
 platform_device_del.part.0 from platform_device_unregister+0x28/0x34
 r5:c424e810 r4:c424e800
 platform_device_unregister from mfd_remove_devices_fn+0xe8/0xf4
 r5:c424e810 r4:c1ea8b7c
 mfd_remove_devices_fn from device_for_each_child_reverse+0x80/0xc8
 r10:c47b4700 r9:c1d04d5c r8:c1f099a8 r7:c424d800 r6:c0a98f74 r5:e0c55d78
 r4:00000000 r3:00000001
 device_for_each_child_reverse from devm_mfd_dev_release+0x40/0x68
 r6:e0c55dd4 r5:c4270e00 r4:c4270f00
 devm_mfd_dev_release from release_nodes+0x78/0x104
 release_nodes from devres_release_all+0x90/0xe0
 r10:c4b05b10 r9:00000000 r8:c424d444 r7:c424d9b0 r6:80030013 r5:00000039
 r4:c424d800
 devres_release_all from device_unbind_cleanup+0x1c/0x70
 r7:c424d844 r6:c1ea8b94 r5:c424d400 r4:c424d800
 device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
 r5:c424d400 r4:c424d800
 device_release_driver_internal from device_driver_detach+0x20/0x24
 r9:00000000 r8:00000000 r7:c1ea8b94 r6:00000007 r5:c424d800 r4:c1eb9108
 device_driver_detach from unbind_store+0x64/0xa0
 unbind_store from drv_attr_store+0x34/0x40
 r7:e0c55f08 r6:c4b05b00 r5:c471d040 r4:c0a53410
 drv_attr_store from sysfs_kf_write+0x48/0x54
 r5:c471d040 r4:c0a5266c
 sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
 r5:c471d040 r4:00000007
 kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
 r10:00000000 r9:00004004 r8:00000000 r7:00000007 r6:005c9ef8 r5:e0c55f68
 r4:c4958cc0
 vfs_write from ksys_write+0x70/0xf4
 r10:00000004 r9:c47b4700 r8:c03002f4 r7:00000000 r6:00000000 r5:c4958cc0
 r4:c4958cc0
 ksys_write from sys_write+0x18/0x1c
 r7:00000004 r6:b6fad550 r5:005c9ef8 r4:00000007
 sys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xe0c55fa8 to 0xe0c55ff0)
5fa0:                   00000007 005c9ef8 00000001 005c9ef8 00000007 00000000
5fc0: 00000007 005c9ef8 b6fad550 00000004 00000007 00000001 00000000 be8e4a6c
5fe0: 00000004 be8e49c8 b6e56767 b6de1e06
cpsw-switch 4a100000.switch eth0: Link is Down


It looks to me like I have some things to fix :)


Is it worth me still trying to recreate / test? I haven't used
ocelot-8021q really at all.


Colin
Vladimir Oltean Aug. 4, 2023, 5:09 p.m. UTC | #4
Hi Colin,

On Fri, Aug 04, 2023 at 09:03:33AM -0700, Colin Foster wrote:
> On Thu, Aug 03, 2023 at 04:42:53PM +0300, Vladimir Oltean wrote:
> I ran this unbind test (with just ocelot tagging) on my currently
> running system (6.5.1-rc1 + 8). This doesn't include your patch, but I
> suspect this is entirely different because I'm not using ocelot-8021q.
> 
> # echo spi0.0 > /sys/bus/spi/drivers/ocelot-soc/unbind
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 157 at net/dsa/dsa.c:1490 dsa_switch_release_ports+0x104/0x12c
> Modules linked in:
> CPU: 0 PID: 157 Comm: bash Not tainted 6.5.0-rc1-00008-ga5ed09af118a #1324
> Hardware name: Generic AM33XX (Flattened Device Tree)
> Backtrace:
>  __warn from warn_slowpath_fmt+0xe4/0x1e0
>  warn_slowpath_fmt from dsa_switch_release_ports+0x104/0x12c
>  dsa_switch_release_ports from dsa_unregister_switch+0x38/0x18c
>  dsa_unregister_switch from ocelot_ext_remove+0x28/0x40
>  ocelot_ext_remove from platform_remove+0x50/0x6c
>  platform_remove from device_remove+0x50/0x74
>  device_remove from device_release_driver_internal+0x190/0x204
>  device_release_driver_internal from device_release_driver+0x20/0x24
>  device_release_driver from bus_remove_device+0xd0/0xf4
>  bus_remove_device from device_del+0x164/0x454
>  device_del from platform_device_del.part.0+0x20/0x84
>  platform_device_del.part.0 from platform_device_unregister+0x28/0x34
>  platform_device_unregister from mfd_remove_devices_fn+0xe8/0xf4
>  mfd_remove_devices_fn from device_for_each_child_reverse+0x80/0xc8
>  device_for_each_child_reverse from devm_mfd_dev_release+0x40/0x68
>  devm_mfd_dev_release from release_nodes+0x78/0x104
>  release_nodes from devres_release_all+0x90/0xe0
>  devres_release_all from device_unbind_cleanup+0x1c/0x70
>  device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
>  device_release_driver_internal from device_driver_detach+0x20/0x24
>  device_driver_detach from unbind_store+0x64/0xa0
>  unbind_store from drv_attr_store+0x34/0x40
>  drv_attr_store from sysfs_kf_write+0x48/0x54
>  sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
>  kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
>  vfs_write from ksys_write+0x70/0xf4
>  ksys_write from sys_write+0x18/0x1c
>  sys_write from ret_fast_syscall+0x0/0x1c
> Exception stack(0xe0c55fa8 to 0xe0c55ff0)
> 5fa0:                   00000007 005c9ef8 00000001 005c9ef8 00000007 00000000
> 5fc0: 00000007 005c9ef8 b6fad550 00000004 00000007 00000001 00000000 be8e4a6c
> 5fe0: 00000004 be8e49c8 b6e56767 b6de1e06
> ---[ end trace 0000000000000000 ]---
> gpio_stub_drv gpiochip6: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED
> BUG: scheduling while atomic: bash/157/0x00000002
> Modules linked in:
> Preemption disabled at:
> [<c03b8f98>] __wake_up_klogd.part.0+0x20/0xb4
> CPU: 0 PID: 157 Comm: bash Tainted: G        W          6.5.0-rc1-00008-ga5ed09af118a #1324
> Hardware name: Generic AM33XX (Flattened Device Tree)
> Backtrace:
>  __schedule_bug from __schedule+0x8fc/0xc48
>  __schedule from schedule+0x60/0xf4
>  schedule from schedule_timeout+0xd8/0x190
>  schedule_timeout from wait_for_completion+0xa0/0x124
>  wait_for_completion from devtmpfs_submit_req+0x70/0x80
>  devtmpfs_submit_req from devtmpfs_delete_node+0x84/0xb4
>  devtmpfs_delete_node from device_del+0x3b8/0x454
>  device_del from cdev_device_del+0x24/0x54
>  cdev_device_del from gpiolib_cdev_unregister+0x20/0x24
>  gpiolib_cdev_unregister from gpiochip_remove+0x100/0x130
>  gpiochip_remove from devm_gpio_chip_release+0x18/0x1c
>  devm_gpio_chip_release from devm_action_release+0x1c/0x20
>  devm_action_release from release_nodes+0x78/0x104
>  release_nodes from devres_release_all+0x90/0xe0
>  devres_release_all from device_unbind_cleanup+0x1c/0x70
>  device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
>  device_release_driver_internal from device_release_driver+0x20/0x24
>  device_release_driver from bus_remove_device+0xd0/0xf4
>  bus_remove_device from device_del+0x164/0x454
>  device_del from platform_device_del.part.0+0x20/0x84
>  platform_device_del.part.0 from platform_device_unregister+0x28/0x34
>  platform_device_unregister from mfd_remove_devices_fn+0xe8/0xf4
>  mfd_remove_devices_fn from device_for_each_child_reverse+0x80/0xc8
>  device_for_each_child_reverse from devm_mfd_dev_release+0x40/0x68
>  devm_mfd_dev_release from release_nodes+0x78/0x104
>  release_nodes from devres_release_all+0x90/0xe0
>  devres_release_all from device_unbind_cleanup+0x1c/0x70
>  device_unbind_cleanup from device_release_driver_internal+0x1c0/0x204
>  device_release_driver_internal from device_driver_detach+0x20/0x24
>  device_driver_detach from unbind_store+0x64/0xa0
>  unbind_store from drv_attr_store+0x34/0x40
>  drv_attr_store from sysfs_kf_write+0x48/0x54
>  sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
>  kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
>  vfs_write from ksys_write+0x70/0xf4
>  ksys_write from sys_write+0x18/0x1c
>  sys_write from ret_fast_syscall+0x0/0x1c
> Exception stack(0xe0c55fa8 to 0xe0c55ff0)
> 5fa0:                   00000007 005c9ef8 00000001 005c9ef8 00000007 00000000
> 5fc0: 00000007 005c9ef8 b6fad550 00000004 00000007 00000001 00000000 be8e4a6c
> 5fe0: 00000004 be8e49c8 b6e56767 b6de1e06
> cpsw-switch 4a100000.switch eth0: Link is Down
> 
> 
> It looks to me like I have some things to fix :)
> 
> 
> Is it worth me still trying to recreate / test? I haven't used
> ocelot-8021q really at all.

The WARN_ON() in dsa_switch_release_ports() is different, and I tried to fix it here:
https://patchwork.kernel.org/project/netdevbpf/patch/20230411144955.1604591-1-vladimir.oltean@nxp.com/
but judging by the fact that that was in April and now we're in August,
obviously I didn't succeed.

What's worse is the other one, the "scheduling while atomic" bug in the
gpiochip removal path from ocelot-pinctrl.c. I'm not sure, at first glance,
what causes the calling context to be atomic. Presumably some kind of
spinlock which should be tracked down.

Unfortunately I'm not very good with kernel debugging the way it should
be done, so what I would advise you to do is to walk the stack trace
down, from device_del() or so, and sprinkle a few might_sleep() calls
until you figure out who's forcing atomic context and why.

Otherwise, can't you just unbind the driver from the ethernet-switch
child of the SPI device, rather than the entire SPI device? That should
avoid the gpiochip/pinctrl bug. And the other one is ignorable for the
intents and purposes here (that is, unless you want to take care of it,
of course).
Colin Foster Aug. 4, 2023, 9:49 p.m. UTC | #5
Hi Vladimir,

On Fri, Aug 04, 2023 at 08:09:58PM +0300, Vladimir Oltean wrote:
> Hi Colin,
> 
> The WARN_ON() in dsa_switch_release_ports() is different, and I tried to fix it here:
> https://patchwork.kernel.org/project/netdevbpf/patch/20230411144955.1604591-1-vladimir.oltean@nxp.com/
> but judging by the fact that that was in April and now we're in August,
> obviously I didn't succeed.
> 
> What's worse is the other one, the "scheduling while atomic" bug in the
> gpiochip removal path from ocelot-pinctrl.c. I'm not sure, at first glance,
> what causes the calling context to be atomic. Presumably some kind of
> spinlock which should be tracked down.
> 
> Unfortunately I'm not very good with kernel debugging the way it should
> be done, so what I would advise you to do is to walk the stack trace
> down, from device_del() or so, and sprinkle a few might_sleep() calls
> until you figure out who's forcing atomic context and why.

I'm not super confident in tracking down these types of bugs either, but
you already knew that ;-)  Thanks for these debugging tips. This is
absolutely a problem I want to understand!

When I was _really_ early in my SGPIO LED bring-up, I saw a scheduling
while atomic warning when I set one of those LEDs to trigger on
heartbeat. I have since tested that many times and never saw it again.

> Otherwise, can't you just unbind the driver from the ethernet-switch
> child of the SPI device, rather than the entire SPI device? That should
> avoid the gpiochip/pinctrl bug. And the other one is ignorable for the
> intents and purposes here (that is, unless you want to take care of it,
> of course).

Ok, I might have it recreated. My method:

(updated to 6.5-rc4)
Set dsa up for ocelot-8021q:

echo ocelot-8021q > /sys/class/net/eth0/dsa/tagging

Bring up the ports in my jumpered bridge configuration. (Bridge
swp1-swp7 and have a bunch of short cables to test RSTP.)

**
Side note, in 8021q mode I can ping through the bridge's swp1 to my
desktop, but RSTP forwards on all ports. I'm adding this to my list.
**

Unbind the ocelot-ext driver:

echo ocelot-ext-switch.5.auto > \
/sys/bus/platform/drivers/ocelot-ext-switch/unbind

br0: port 1(swp1) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp1 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp1 (unregistering): left promiscuous mode
br0: port 1(swp1) entered disabled state
br0: port 2(swp2) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp2 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp2 (unregistering): left promiscuous mode
br0: port 2(swp2) entered disabled state
br0: port 3(swp3) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp3 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp3 (unregistering): left promiscuous mode
br0: port 3(swp3) entered disabled state
br0: port 4(swp4) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp4 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp4 (unregistering): left promiscuous mode
br0: port 4(swp4) entered disabled state
br0: port 5(swp5) entered disabled state
br0: port 5(swp5) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp5 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp5 (unregistering): left promiscuous mode
br0: port 5(swp5) entered disabled state
br0: port 6(swp6) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp6 (unregistering): left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp6 (unregistering): left promiscuous mode
br0: port 6(swp6) entered disabled state
br0: port 7(swp7) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto swp7 (unregistering): left allmulticast mode
cpsw-switch 4a100000.switch eth0: left allmulticast mode
ocelot-ext-switch ocelot-ext-switch.5.auto swp7 (unregistering): left promiscuous mode
cpsw-switch 4a100000.switch eth0: left promiscuous mode
br0: port 7(swp7) entered disabled state
ocelot-ext-switch ocelot-ext-switch.5.auto: Link is Down
------------[ cut here ]------------
WARNING: CPU: 0 PID: 157 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x1a8/0x1ac
RTNL: assertion failed at net/dsa/tag_8021q.c (409)
Modules linked in:
CPU: 0 PID: 157 Comm: bash Not tainted 6.5.0-rc4-00008-gda8b39e58927 #1325
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
 dump_backtrace from show_stack+0x20/0x24
 r7:00000009 r6:00000000 r5:c18c0c5c r4:00030113
 show_stack from dump_stack_lvl+0x60/0x78
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:00000009 r6:c1197930 r5:00000199 r4:c1a06dfc
 dump_stack from __warn+0x88/0x160
 __warn from warn_slowpath_fmt+0x150/0x1e0
 r8:00000009 r7:00000199 r6:c1a06dfc r5:c1d05590 r4:c1c978a4
 warn_slowpath_fmt from dsa_tag_8021q_unregister+0x1a8/0x1ac
 r10:c2182b90 r9:00000000 r8:c424d844 r7:0100003e r6:c46ec180 r5:c4284a30
 r4:c4298300
 dsa_tag_8021q_unregister from felix_tag_8021q_teardown+0x94/0x140
 r10:c2182b90 r9:00000000 r8:c424d844 r7:0100003e r6:c26db808 r5:c4284a30
 r4:c4298300
 felix_tag_8021q_teardown from felix_teardown+0x30/0xc0
 r7:00000000 r6:c26db808 r5:c4298300 r4:c4298300
 felix_teardown from dsa_tree_teardown_switches+0xa4/0xd4
 r7:00000000 r6:c4284b48 r5:c4282600 r4:c4298300
 dsa_tree_teardown_switches from dsa_unregister_switch+0xe4/0x18c
 r7:c4255c54 r6:c1ec5414 r5:c4284b40 r4:c4298300
 dsa_unregister_switch from ocelot_ext_remove+0x28/0x40
 r9:00000000 r8:c424d844 r7:c4255c54 r6:c1ec5414 r5:c424d800 r4:c26db800
 ocelot_ext_remove from platform_remove+0x50/0x6c
 r5:c424d800 r4:c4255c10
 platform_remove from device_remove+0x50/0x74
 r5:c424d800 r4:c4255c10
 device_remove from device_release_driver_internal+0x190/0x204
 r5:c424d800 r4:c4255c10
 device_release_driver_internal from device_driver_detach+0x20/0x24
 r9:00000000 r8:00000000 r7:c1ec5414 r6:00000019 r5:c4255c10 r4:c1ea59e0
 device_driver_detach from unbind_store+0x64/0xa0
 unbind_store from drv_attr_store+0x34/0x40
 r7:e0c55f08 r6:c2182b80 r5:c471a100 r4:c0a53740
 drv_attr_store from sysfs_kf_write+0x48/0x54
 r5:c471a100 r4:c0a5299c
 sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
 r5:c471a100 r4:00000019
 kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
 r10:00000000 r9:00004004 r8:00000000 r7:00000019 r6:00579a60 r5:e0c55f68
 r4:c4714f00
 vfs_write from ksys_write+0x70/0xf4
 r10:00000004 r9:c42ad8c0 r8:c03002f4 r7:00000000 r6:00000000 r5:c4714f00
 r4:c4714f02
 ksys_write from sys_write+0x18/0x1c
 r7:00000004 r6:b6f86550 r5:00579a60 r4:00000019
 sys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xe0c55fa8 to 0xe0c55ff0)
5fa0:                   00000019 00579a60 00000001 00579a60 00000019 00000000
5fc0: 00000019 00579a60 b6f86550 00000004 00000019 00000001 00000000 bed4ea6c
5fe0: 00000004 bed4e9c8 b6e2f767 b6dbae06
---[ end trace 0000000000000000 ]---
cpsw-switch 4a100000.switch eth0: Link is Down
------------[ cut here ]------------
WARNING: CPU: 0 PID: 157 at net/8021q/vlan_core.c:376 vlan_vid_del+0x14c/0x16c
RTNL: assertion failed at net/8021q/vlan_core.c (376)
Modules linked in:
CPU: 0 PID: 157 Comm: bash Tainted: G        W          6.5.0-rc4-00008-gda8b39e58927 #1325
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
 dump_backtrace from show_stack+0x20/0x24
 r7:00000009 r6:00000000 r5:c18c0c5c r4:000f0013
 show_stack from dump_stack_lvl+0x60/0x78
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:00000009 r6:c11a5554 r5:00000178 r4:c1a078b4
 dump_stack from __warn+0x88/0x160
 __warn from warn_slowpath_fmt+0x150/0x1e0
 r8:00000009 r7:00000178 r6:c1a078b4 r5:c1d05590 r4:c1c978a4
 warn_slowpath_fmt from vlan_vid_del+0x14c/0x16c
 r10:00000c01 r9:c46ec180 r8:c241f000 r7:c241f000 r6:00000c01 r5:0000a888
 r4:c4298300
 vlan_vid_del from dsa_tag_8021q_unregister+0x164/0x1ac
 r9:c46ec180 r8:c241f000 r7:00000000 r6:c46ec180 r5:00000001 r4:c4298300
 dsa_tag_8021q_unregister from felix_tag_8021q_teardown+0x94/0x140
 r10:c2182b90 r9:00000000 r8:c424d844 r7:0100003e r6:c26db808 r5:c4284a30
 r4:c4298300
 felix_tag_8021q_teardown from felix_teardown+0x30/0xc0
 r7:00000000 r6:c26db808 r5:c4298300 r4:c4298300
 felix_teardown from dsa_tree_teardown_switches+0xa4/0xd4
 r7:00000000 r6:c4284b48 r5:c4282600 r4:c4298300
 dsa_tree_teardown_switches from dsa_unregister_switch+0xe4/0x18c
 r7:c4255c54 r6:c1ec5414 r5:c4284b40 r4:c4298300
 dsa_unregister_switch from ocelot_ext_remove+0x28/0x40
 r9:00000000 r8:c424d844 r7:c4255c54 r6:c1ec5414 r5:c424d800 r4:c26db800
 ocelot_ext_remove from platform_remove+0x50/0x6c
 r5:c424d800 r4:c4255c10
 platform_remove from device_remove+0x50/0x74
 r5:c424d800 r4:c4255c10
 device_remove from device_release_driver_internal+0x190/0x204
 r5:c424d800 r4:c4255c10
 device_release_driver_internal from device_driver_detach+0x20/0x24
 r9:00000000 r8:00000000 r7:c1ec5414 r6:00000019 r5:c4255c10 r4:c1ea59e0
 device_driver_detach from unbind_store+0x64/0xa0
 unbind_store from drv_attr_store+0x34/0x40
 r7:e0c55f08 r6:c2182b80 r5:c471a100 r4:c0a53740
 drv_attr_store from sysfs_kf_write+0x48/0x54
 r5:c471a100 r4:c0a5299c
 sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
 r5:c471a100 r4:00000019
 kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
 r10:00000000 r9:00004004 r8:00000000 r7:00000019 r6:00579a60 r5:e0c55f68
 r4:c4714f00
 vfs_write from ksys_write+0x70/0xf4
 r10:00000004 r9:c42ad8c0 r8:c03002f4 r7:00000000 r6:00000000 r5:c4714f00
 r4:c4714f02
 ksys_write from sys_write+0x18/0x1c
 r7:00000004 r6:b6f86550 r5:00579a60 r4:00000019
 sys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xe0c55fa8 to 0xe0c55ff0)
5fa0:                   00000019 00579a60 00000001 00579a60 00000019 00000000
5fc0: 00000019 00579a60 b6f86550 00000004 00000019 00000001 00000000 bed4ea6c
5fe0: 00000004 bed4e9c8 b6e2f767 b6dbae06
---[ end trace 0000000000000000 ]---
DSA: tree 0 torn down
------------[ cut here ]------------
WARNING: CPU: 0 PID: 157 at net/dsa/dsa.c:1490 dsa_switch_release_ports+0x104/0x12c
Modules linked in:
CPU: 0 PID: 157 Comm: bash Tainted: G        W          6.5.0-rc4-00008-gda8b39e58927 #1325
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
 dump_backtrace from show_stack+0x20/0x24
 r7:00000009 r6:00000000 r5:c18c0c5c r4:000d0013
 show_stack from dump_stack_lvl+0x60/0x78
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:00000009 r6:c1187600 r5:000005d2 r4:c1a06454
 dump_stack from __warn+0x88/0x160
 __warn from warn_slowpath_fmt+0xe4/0x1e0
 r8:00000009 r7:000005d2 r6:c1a06454 r5:c1d05590 r4:c1c978a4
 warn_slowpath_fmt from dsa_switch_release_ports+0x104/0x12c
 r10:c2182b90 r9:c4284b68 r8:00000100 r7:c1a06454 r6:c4298300 r5:c4282800
 r4:c4282600
 dsa_switch_release_ports from dsa_unregister_switch+0x38/0x18c
 r9:c4284b68 r8:c424d844 r7:c4255c54 r6:c4284b70 r5:c4284b40 r4:c4298300
 dsa_unregister_switch from ocelot_ext_remove+0x28/0x40
 r9:00000000 r8:c424d844 r7:c4255c54 r6:c1ec5414 r5:c424d800 r4:c26db800
 ocelot_ext_remove from platform_remove+0x50/0x6c
 r5:c424d800 r4:c4255c10
 platform_remove from device_remove+0x50/0x74
 r5:c424d800 r4:c4255c10
 device_remove from device_release_driver_internal+0x190/0x204
 r5:c424d800 r4:c4255c10
 device_release_driver_internal from device_driver_detach+0x20/0x24
 r9:00000000 r8:00000000 r7:c1ec5414 r6:00000019 r5:c4255c10 r4:c1ea59e0
 device_driver_detach from unbind_store+0x64/0xa0
 unbind_store from drv_attr_store+0x34/0x40
 r7:e0c55f08 r6:c2182b80 r5:c471a100 r4:c0a53740
 drv_attr_store from sysfs_kf_write+0x48/0x54
 r5:c471a100 r4:c0a5299c
 sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
 r5:c471a100 r4:00000019
 kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
 r10:00000000 r9:00004004 r8:00000000 r7:00000019 r6:00579a60 r5:e0c55f68
 r4:c4714f00
 vfs_write from ksys_write+0x70/0xf4
 r10:00000004 r9:c42ad8c0 r8:c03002f4 r7:00000000 r6:00000000 r5:c4714f00
 r4:c4714f02
 ksys_write from sys_write+0x18/0x1c
 r7:00000004 r6:b6f86550 r5:00579a60 r4:00000019
 sys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xe0c55fa8 to 0xe0c55ff0)
5fa0:                   00000019 00579a60 00000001 00579a60 00000019 00000000
5fc0: 00000019 00579a60 b6f86550 00000004 00000019 00000001 00000000 bed4ea6c
5fe0: 00000004 bed4e9c8 b6e2f767 b6dbae06
---[ end trace 0000000000000000 ]---
BUG: scheduling while atomic: bash/157/0x00000002
Modules linked in:
Preemption disabled at:
[<c03b8ff0>] __wake_up_klogd.part.0+0x20/0xb4
CPU: 0 PID: 157 Comm: bash Tainted: G        W          6.5.0-rc4-00008-gda8b39e58927 #1325
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
 dump_backtrace from show_stack+0x20/0x24
 r7:00000002 r6:00000080 r5:c18c0c5c r4:000d0093
 show_stack from dump_stack_lvl+0x60/0x78
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:00000002 r6:c42ad8c0 r5:c03b8ff0 r4:c42ad8c0
 dump_stack from __schedule_bug+0x94/0xa4
 __schedule_bug from __schedule+0x8fc/0xc48
 r5:00000000 r4:df99a400
 __schedule from schedule+0x60/0xf4
 r10:5ac3c35a r9:befffd07 r8:fffffe30 r7:00000002 r6:c03002f4 r5:e0c55fb0
 r4:c42ad8c0
 schedule from do_work_pending+0x84/0x568
 r5:e0c55fb0 r4:c42ad8c0
 do_work_pending from slow_work_pending+0xc/0x20
Exception stack(0xe0c55fb0 to 0xe0c55ff8)
5fa0:                                     00000019 00579a60 00000019 00000000
5fc0: 00000019 00579a60 b6f86550 00000004 00000019 00000001 00000000 bed4ea6c
5fe0: 00000004 bed4e9c8 b6e2f767 b6dbae06 40010030 00000001
 r10:00000004 r9:c42ad8c0 r8:c03002f4 r7:00000004 r6:b6f86550 r5:00579a60
 r4:00000019



I repeat with your patch:

...

ocelot-ext-switch ocelot-ext-switch.5.auto: Link is Down
cpsw-switch 4a100000.switch eth0: Link is Down
DSA: tree 0 torn down
------------[ cut here ]------------
WARNING: CPU: 0 PID: 157 at net/dsa/dsa.c:1490 dsa_switch_release_ports+0x104/0x12c
Modules linked in:
CPU: 0 PID: 157 Comm: bash Not tainted 6.5.0-rc4-00008-gda8b39e58927-dirty #1326
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
 dump_backtrace from show_stack+0x20/0x24
 r7:00000009 r6:00000000 r5:c18c0c64 r4:00000013
 show_stack from dump_stack_lvl+0x60/0x78
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:00000009 r6:c1187600 r5:000005d2 r4:c1a06470
 dump_stack from __warn+0x88/0x160
 __warn from warn_slowpath_fmt+0xe4/0x1e0
 r8:00000009 r7:000005d2 r6:c1a06470 r5:c1d05590 r4:c1c978a4
 warn_slowpath_fmt from dsa_switch_release_ports+0x104/0x12c
 r10:c4afc890 r9:c4284b68 r8:00000100 r7:c1a06470 r6:c4298300 r5:c4282800
 r4:c4282600
 dsa_switch_release_ports from dsa_unregister_switch+0x38/0x18c
 r9:c4284b68 r8:c424d844 r7:c4255c54 r6:c4284b70 r5:c4284b40 r4:c4298300
 dsa_unregister_switch from ocelot_ext_remove+0x28/0x40
 r9:00000000 r8:c424d844 r7:c4255c54 r6:c1ec5414 r5:c424d800 r4:c26db800
 ocelot_ext_remove from platform_remove+0x50/0x6c
 r5:c424d800 r4:c4255c10
 platform_remove from device_remove+0x50/0x74
 r5:c424d800 r4:c4255c10
 device_remove from device_release_driver_internal+0x190/0x204
 r5:c424d800 r4:c4255c10
 device_release_driver_internal from device_driver_detach+0x20/0x24
 r9:00000000 r8:00000000 r7:c1ec5414 r6:00000019 r5:c4255c10 r4:c1ea59e0
 device_driver_detach from unbind_store+0x64/0xa0
 unbind_store from drv_attr_store+0x34/0x40
 r7:e0c55f08 r6:c4afc880 r5:c471a600 r4:c0a53740
 drv_attr_store from sysfs_kf_write+0x48/0x54
 r5:c471a600 r4:c0a5299c
 sysfs_kf_write from kernfs_fop_write_iter+0x11c/0x1dc
 r5:c471a600 r4:00000019
 kernfs_fop_write_iter from vfs_write+0x2d0/0x41c
 r10:00000000 r9:00004004 r8:00000000 r7:00000019 r6:005598a8 r5:e0c55f68
 r4:c4a819c0
 vfs_write from ksys_write+0x70/0xf4
 r10:00000004 r9:c47d0000 r8:c03002f4 r7:00000000 r6:00000000 r5:c4a819c0
 r4:c4a819c2
 ksys_write from sys_write+0x18/0x1c
 r7:00000004 r6:b6f1d550 r5:005598a8 r4:00000019
 sys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xe0c55fa8 to 0xe0c55ff0)
5fa0:                   00000019 005598a8 00000001 005598a8 00000019 00000000
5fc0: 00000019 005598a8 b6f1d550 00000004 00000019 00000001 00000000 be9f8a6c
5fe0: 00000004 be9f89c8 b6dc6767 b6d51e06
---[ end trace 0000000000000000 ]---
BUG: scheduling while atomic: bash/157/0x00000002
Modules linked in:
Preemption disabled at:
[<c03b8ff0>] __wake_up_klogd.part.0+0x20/0xb4
CPU: 0 PID: 157 Comm: bash Tainted: G        W          6.5.0-rc4-00008-gda8b39e58927-dirty #1326
Hardware name: Generic AM33XX (Flattened Device Tree)
Backtrace:
 dump_backtrace from show_stack+0x20/0x24
 r7:00000002 r6:00000080 r5:c18c0c64 r4:00000093
 show_stack from dump_stack_lvl+0x60/0x78
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:00000002 r6:c47d0000 r5:c03b8ff0 r4:c47d0000
 dump_stack from __schedule_bug+0x94/0xa4
 __schedule_bug from __schedule+0x8fc/0xc48
 r5:00000000 r4:df99a400
 __schedule from schedule+0x60/0xf4
 r10:5ac3c35a r9:befffd07 r8:fffffe30 r7:00000002 r6:c03002f4 r5:e0c55fb0
 r4:c47d0000
 schedule from do_work_pending+0x84/0x568
 r5:e0c55fb0 r4:c47d0000
 do_work_pending from slow_work_pending+0xc/0x20
Exception stack(0xe0c55fb0 to 0xe0c55ff8)
5fa0:                                     00000019 005598a8 00000019 00000000
5fc0: 00000019 005598a8 b6f1d550 00000004 00000019 00000001 00000000 be9f8a6c
5fe0: 00000004 be9f89c8 b6dc6767 b6d51e06 40010030 00000001
 r10:00000004 r9:c47d0000 r8:c03002f4 r7:00000004 r6:b6f1d550 r5:005598a8





So if your patch is confirming the existence and removal of the warnings:
WARNING: CPU: 0 PID: 157 at net/dsa/tag_8021q.c:409 dsa_tag_8021q_unregister+0x1a8/0x1ac
WARNING: CPU: 0 PID: 157 at net/8021q/vlan_core.c:376 vlan_vid_del+0x14c/0x16c

... and you think my testing methods are valid, I can add Acked / Tested
if you like.

("scheduling while atomic" and "dsa_switch_release_ports" I see when
unbinding, regardless of ocelot or ocelot-8021q tagging protocols)


Hope this helps, and let me know if there's anything else you want me to
test.
patchwork-bot+netdevbpf@kernel.org Aug. 4, 2023, 10:40 p.m. UTC | #6
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu,  3 Aug 2023 16:42:53 +0300 you wrote:
> When the tagging protocol in current use is "ocelot-8021q" and we unbind
> the driver, we see this splat:
> 
> $ echo '0000:00:00.2' > /sys/bus/pci/drivers/fsl_enetc/unbind
> mscc_felix 0000:00:00.5 swp0: left promiscuous mode
> sja1105 spi2.0: Link is Down
> DSA: tree 1 torn down
> mscc_felix 0000:00:00.5 swp2: left promiscuous mode
> sja1105 spi2.2: Link is Down
> DSA: tree 3 torn down
> fsl_enetc 0000:00:00.2 eno2: left promiscuous mode
> mscc_felix 0000:00:00.5: Link is Down
> 
> [...]

Here is the summary with links:
  - [net] net: dsa: ocelot: call dsa_tag_8021q_unregister() under rtnl_lock() on driver remove
    https://git.kernel.org/netdev/net/c/a94c16a2fda0

You are awesome, thank you!
Simon Horman Aug. 5, 2023, 8:10 a.m. UTC | #7
On Fri, Aug 04, 2023 at 02:10:45PM +0300, Vladimir Oltean wrote:
> On Thu, Aug 03, 2023 at 09:29:38PM +0200, Simon Horman wrote:
> > On Thu, Aug 03, 2023 at 04:42:53PM +0300, Vladimir Oltean wrote:
> > > diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
> > > index fd7eb4a52918..9a3e5ec16972 100644
> > > --- a/drivers/net/dsa/ocelot/felix.c
> > > +++ b/drivers/net/dsa/ocelot/felix.c
> > > @@ -1619,8 +1619,10 @@ static void felix_teardown(struct dsa_switch *ds)
> > >  	struct felix *felix = ocelot_to_felix(ocelot);
> > >  	struct dsa_port *dp;
> > >  
> > > +	rtnl_lock();
> > >  	if (felix->tag_proto_ops)
> > >  		felix->tag_proto_ops->teardown(ds);
> > > +	rtnl_unlock();
> > 
> > Hi Vladimir,
> > 
> > I am curious to know if RTNL could be taken in
> > felix_tag_8021q_teardown() instead.
> 
> Negative. This call path also exists:
> 
> dsa_tree_change_tag_proto()
> -> rtnl_trylock()
> -> dsa_tree_notify(dst, DSA_NOTIFIER_TAG_PROTO, &info)
>    -> dsa_switch_change_tag_proto()
>       -> ds->ops->change_tag_protocol()
>          -> felix_change_tag_protocol()
>             -> old_proto_ops->teardown()
>                -> felix_tag_8021q_teardown()
> 
> where the rtnl_mutex is already held.

Ack. Thanks for confirming.
In that case this looks good to me.

Reviewed-by: Simon Horman <horms@kernel.org>
diff mbox series

Patch

diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index fd7eb4a52918..9a3e5ec16972 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -1619,8 +1619,10 @@  static void felix_teardown(struct dsa_switch *ds)
 	struct felix *felix = ocelot_to_felix(ocelot);
 	struct dsa_port *dp;
 
+	rtnl_lock();
 	if (felix->tag_proto_ops)
 		felix->tag_proto_ops->teardown(ds);
+	rtnl_unlock();
 
 	dsa_switch_for_each_available_port(dp, ds)
 		ocelot_deinit_port(ocelot, dp->index);