[0/3] Fix for KSZ DSA switch shutdown

Message ID 20210909095324.12978-1-LinoSanfilippo@gmx.de (mailing list archive)

Message

Lino Sanfilippo Sept. 9, 2021, 9:53 a.m. UTC
This patch series fixes a system hang I got each time I tried to shut down
or reboot a system that uses a KSZ9897 as a DSA switch with a Broadcom
GENET network device as the DSA master device. At the time the system hangs,
the message "unregister_netdevice: waiting for eth0 to become free. Usage
count = 2." is dumped periodically to the console.

After some investigation I found the reason to be unreleased references to
the master device which are still held by the slave devices at the time the
system is shut down (I have two slave devices in use).

While these references are supposed to be released in ksz_switch_remove(),
this function never gets the chance to be called because the system hangs
at the master device deregistration, which happens before
ksz_switch_remove() is called.

The fix is to make sure that the master device references are already
released when the device is unregistered. For this reason PATCH 1 provides
a new function dsa_tree_shutdown() that can be called by DSA drivers to
tear down the DSA switch at shutdown. PATCH 2 uses this function in a new
helper function for KSZ switches to properly shut down the KSZ switch.
PATCH 3 uses the new helper function in the KSZ9477 shutdown handler.

These patches have been tested on a Raspberry Pi 5.10 kernel with a
KSZ9897. The patches have been adjusted to apply against net-next and are
compile tested with net-next.

Lino Sanfilippo (3):
  net: dsa: introduce function dsa_tree_shutdown()
  net: dsa: microchip: provide the function ksz_switch_shutdown()
  net: dsa: microchip: tear down DSA tree at system shutdown

 drivers/net/dsa/microchip/ksz9477.c    | 12 +++++++++++-
 drivers/net/dsa/microchip/ksz_common.c | 13 +++++++++++++
 drivers/net/dsa/microchip/ksz_common.h |  1 +
 include/net/dsa.h                      |  1 +
 net/dsa/dsa2.c                         |  8 ++++++++
 5 files changed, 34 insertions(+), 1 deletion(-)


base-commit: 626bf91a292e2035af5b9d9cce35c5c138dfe06d
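
The hang described in the cover letter can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: every toy_* name below is hypothetical, and the real reference counting in the networking core is more involved. It shows just the one idea from the description, that unregistering the master cannot complete while the slave devices still hold references to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a net_device; only the usage count is modelled. */
struct toy_netdev {
	const char *name;
	int refcnt;	/* references held by other devices */
};

/* Hypothetical stand-in for a DSA slave device that pins its master. */
struct toy_slave {
	struct toy_netdev *master;
};

static void toy_hold(struct toy_netdev *dev) { dev->refcnt++; }
static void toy_put(struct toy_netdev *dev)  { dev->refcnt--; }

/* Creating a slave takes one reference on the master. */
static void toy_slave_create(struct toy_slave *s, struct toy_netdev *master)
{
	s->master = master;
	toy_hold(master);
}

/* Destroying a slave drops that reference again (at most once). */
static void toy_slave_destroy(struct toy_slave *s)
{
	if (s->master) {
		toy_put(s->master);
		s->master = NULL;
	}
}

/*
 * Returns true if unregistration could complete. Returns false where the
 * real system would keep waiting and printing the console message.
 */
static bool toy_unregister(const struct toy_netdev *dev)
{
	if (dev->refcnt != 0) {
		printf("waiting for %s to become free. Usage count = %d\n",
		       dev->name, dev->refcnt);
		return false;
	}
	return true;
}
```

With two slaves created on a toy master, toy_unregister() reports a usage count of 2 and fails until both slaves have been destroyed, which mirrors the situation with two slave devices in use.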

Comments

Vladimir Oltean Sept. 9, 2021, 10:14 a.m. UTC | #1
On Thu, Sep 09, 2021 at 11:53:21AM +0200, Lino Sanfilippo wrote:
> This patch series fixes a system hang I got each time i tried to shutdown
> or reboot a system that uses a KSZ9897 as a DSA switch with a broadcom
> GENET network device as the DSA master device. At the time the system hangs
> the message "unregister_netdevice: waiting for eth0 to become free. Usage
> count = 2." is dumped periodically to the console.
> 
> After some investigation I found the reason to be unreleased references to
> the master device which are still held by the slave devices at the time the
> system is shut down (I have two slave devices in use).
> 
> While these references are supposed to be released in ksz_switch_remove()
> this function never gets the chance to be called due to the system hang at
> the master device deregistration which happens before ksz_switch_remove()
> is called.
> 
> The fix is to make sure that the master device references are already
> released when the device is unregistered. For this reason PATCH1 provides
> a new function dsa_tree_shutdown() that can be called by DSA drivers to
> untear the DSA switch at shutdown. PATCH2 uses this function in a new
> helper function for KSZ switches to properly shutdown the KSZ switch.
> PATCH 3 uses the new helper function in the KSZ9477 shutdown handler.
> 
> Theses patches have been tested on a Raspberry PI 5.10 kernel with a
> KSZ9897. The patches have been adjusted to apply against net-next and are
> compile tested with next-next.

Can you try this patch?

commit 07b90056cb15ff9877dca0d8f1b6583d1051f724
Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Tue Jan 12 01:09:43 2021 +0200

    net: dsa: unbind all switches from tree when DSA master unbinds

    Currently the following happens when a DSA master driver unbinds while
    there are DSA switches attached to it:
Lino Sanfilippo Sept. 9, 2021, 11:08 a.m. UTC | #2
Hi,

On 09.09.21 at 12:14, Vladimir Oltean wrote:
>
> Can you try this patch
>
> commit 07b90056cb15ff9877dca0d8f1b6583d1051f724
> Author: Vladimir Oltean <vladimir.oltean@nxp.com>
> Date:   Tue Jan 12 01:09:43 2021 +0200
>
>     net: dsa: unbind all switches from tree when DSA master unbinds
>
>     Currently the following happens when a DSA master driver unbinds while
>     there are DSA switches attached to it:
>

This patch is already part of the kernel which shows the described shutdown issues.

Regards,
Lino
Vladimir Oltean Sept. 9, 2021, 11:42 a.m. UTC | #3
On Thu, Sep 09, 2021 at 01:08:26PM +0200, Lino Sanfilippo wrote:
> Hi,
> 
> On 09.09.21 at 12:14, Vladimir Oltean wrote:
> >
> > Can you try this patch
> >
> > commit 07b90056cb15ff9877dca0d8f1b6583d1051f724
> > Author: Vladimir Oltean <vladimir.oltean@nxp.com>
> > Date:   Tue Jan 12 01:09:43 2021 +0200
> >
> >     net: dsa: unbind all switches from tree when DSA master unbinds
> >
> >     Currently the following happens when a DSA master driver unbinds while
> >     there are DSA switches attached to it:
> >
> 
> This patch is already part of the kernel which shows the described shutdown issues.

How can I reproduce this issue?

When I test with sysrq-o, I do see the various DSA trees getting torn
down:

[   16.731468] sysrq: HELP : loglevel(0-9) reboot(b) crash(c) show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) show-blocked-tasks(w) dump-ftrace-buffer(z)
[   29.912535] sysrq: Power Off
[   29.917806] kvm: exiting hardware virtualization
[   29.988036] device swp0 left promiscuous mode
[   30.370424] sja1105 spi2.0: Link is Down
[   30.402790] DSA: tree 1 torn down
[   30.495096] device swp2 left promiscuous mode
[   31.011576] sja1105 spi2.1: Link is Down
[   31.032925] DSA: tree 2 torn down
[   31.074226] reboot: Power down

I feel that something is missing in your system. Is the device link
created? Is it deleted before going into effect on shutdown?
Andrew Lunn Sept. 9, 2021, 12:40 p.m. UTC | #4
On Thu, Sep 09, 2021 at 11:53:21AM +0200, Lino Sanfilippo wrote:
> This patch series fixes a system hang I got each time i tried to shutdown
> or reboot a system that uses a KSZ9897 as a DSA switch with a broadcom
> GENET network device as the DSA master device. At the time the system hangs
> the message "unregister_netdevice: waiting for eth0 to become free. Usage
> count = 2." is dumped periodically to the console.
> 
> After some investigation I found the reason to be unreleased references to
> the master device which are still held by the slave devices at the time the
> system is shut down (I have two slave devices in use).
> 
> While these references are supposed to be released in ksz_switch_remove()
> this function never gets the chance to be called due to the system hang at
> the master device deregistration which happens before ksz_switch_remove()
> is called.
> 
> The fix is to make sure that the master device references are already
> released when the device is unregistered. For this reason PATCH1 provides
> a new function dsa_tree_shutdown() that can be called by DSA drivers to
> untear the DSA switch at shutdown. PATCH2 uses this function in a new
> helper function for KSZ switches to properly shutdown the KSZ switch.
> PATCH 3 uses the new helper function in the KSZ9477 shutdown handler.

I agree with Vladimir here. Shutdown works without issue on mv88e6xxx,
I do it frequently. I'm sure other developers shut down their devices
at the end of the edit/compile/test cycle. If there was a generic
problem, we would probably know about it. So it seems like there is
something specific to your system which breaks the reference
counting. We need to understand that first, then we can see how we fix
it.

> 
> Theses patches have been tested on a Raspberry PI 5.10 kernel with a
> KSZ9897. The patches have been adjusted to apply against net-next and are
> compile tested with next-next.

Is the switch on a hat? Are you using DT overlays?

   Andrew
Vladimir Oltean Sept. 9, 2021, 12:56 p.m. UTC | #5
On Thu, Sep 09, 2021 at 02:42:48PM +0300, Vladimir Oltean wrote:
> I feel that something is missing in your system. Is the device link
> created? Is it deleted before going into effect on shutdown?

So in case my questions were confusing, you can check the presence of
the device links via sysfs.

On my board, eno2 is the top-level DSA master, there is a switch which
is PCIe PF 0000:00:00.5 which is its consumer:

ls -la /sys/class/net/eno2/device/consumer\:pci\:0000\:00\:00.5
lrwxrwxrwx    1 root     root             0 Jan  1 00:00 /sys/class/net/eno2/device/consumer:pci:0000:00:00.5 -> ../../../../../virtual/devlink/pci:0000:00:00.2--pci:0000:00:00.5

In turn, that switch is a DSA master on two ports for SPI-attached switches:

ls -la /sys/class/net/swp0/device/consumer\:spi\:spi2.*
lrwxrwxrwx    1 root     root             0 Jan  1 00:04 /sys/class/net/swp0/device/consumer:spi:spi2.0 -> ../../../../../virtual/devlink/pci:0000:00:00.5--spi:spi2.0
lrwxrwxrwx    1 root     root             0 Jan  1 00:04 /sys/class/net/swp0/device/consumer:spi:spi2.1 -> ../../../../../virtual/devlink/pci:0000:00:00.5--spi:spi2.1

Do you see similar things on your 5.10 kernel?

Please note that I don't think that particular patch with device links
was backported to v5.10, at least I don't see it when I run:

git tag --contains  07b90056cb15f

So how did it reach your tree?
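
For reference, the sysfs check above could also be done programmatically. The following is a minimal user-space sketch, assuming only what the ls output shows: device link symlinks appear under /sys/class/net/<ifname>/device with names starting with "consumer:". The helper names is_consumer_link() and list_consumer_links() are made up for this illustration.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* True if a directory entry name looks like a device-link consumer symlink. */
static int is_consumer_link(const char *name)
{
	return strncmp(name, "consumer:", strlen("consumer:")) == 0;
}

/*
 * Print the consumer links found under /sys/class/net/<ifname>/device.
 * Returns the number of links found, or -1 if the directory cannot be
 * opened (for example because the interface does not exist).
 */
static int list_consumer_links(const char *ifname)
{
	char path[256];
	struct dirent *ent;
	DIR *dir;
	int count = 0;

	snprintf(path, sizeof(path), "/sys/class/net/%s/device", ifname);
	dir = opendir(path);
	if (!dir)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		if (is_consumer_link(ent->d_name)) {
			printf("%s/%s\n", path, ent->d_name);
			count++;
		}
	}
	closedir(dir);
	return count;
}
```

Calling list_consumer_links("eno2") on the board described above would be expected to list the PCIe switch as a consumer; a result of 0 would mean that no device link is present for that master.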
Lino Sanfilippo Sept. 9, 2021, 1:19 p.m. UTC | #6
Hi Vladimir, Andrew,

sorry for the late response.

> Sent: Thursday, September 9, 2021 at 14:56
> From: "Vladimir Oltean" <olteanv@gmail.com>
> To: "Lino Sanfilippo" <LinoSanfilippo@gmx.de>
> Cc: p.rosenberger@kunbus.com, woojung.huh@microchip.com, UNGLinuxDriver@microchip.com, andrew@lunn.ch, vivien.didelot@gmail.com, f.fainelli@gmail.com, davem@davemloft.net, kuba@kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 0/3] Fix for KSZ DSA switch shutdown
>
> On Thu, Sep 09, 2021 at 02:42:48PM +0300, Vladimir Oltean wrote:
> > I feel that something is missing in your system. Is the device link
> > created? Is it deleted before going into effect on shutdown?
>
> So in case my questions were confusing, you can check the presence of
> the device links via sysfs.
>
> On my board, eno2 is the top-level DSA master, there is a switch which
> is PCIe PF 0000:00:00.5 which is its consumer:
>
> ls -la /sys/class/net/eno2/device/consumer\:pci\:0000\:00\:00.5
> lrwxrwxrwx    1 root     root             0 Jan  1 00:00 /sys/class/net/eno2/device/consumer:pci:0000:00:00.5 -> ../../../../../virtual/devlink/pci:0000:00:00.2--pci:0000:00:00.5
>
> In turn, that switch is a DSA master on two ports for SPI-attached switches:
>
> ls -la /sys/class/net/swp0/device/consumer\:spi\:spi2.*
> lrwxrwxrwx    1 root     root             0 Jan  1 00:04 /sys/class/net/swp0/device/consumer:spi:spi2.0 -> ../../../../../virtual/devlink/pci:0000:00:00.5--spi:spi2.0
> lrwxrwxrwx    1 root     root             0 Jan  1 00:04 /sys/class/net/swp0/device/consumer:spi:spi2.1 -> ../../../../../virtual/devlink/pci:0000:00:00.5--spi:spi2.1
>
> Do you see similar things on your 5.10 kernel?

For the master device I see

lrwxrwxrwx 1 root root 0 Sep  9 14:10 /sys/class/net/eth0/device/consumer:spi:spi3.0 -> ../../../virtual/devlink/platform:fd580000.ethernet--spi:spi3.0


>
> Please note that I don't think that particular patch with device links
> was backported to v5.10, at least I don't see it when I run:


> git tag --contains  07b90056cb15f
>
> So how did it reach your tree?
>

The kernel I use is the Raspberry Pi 5.10 kernel. The commit number in this kernel is d0b97c8cd63e37e6d4dc9fefd6381b09f6c31a67

Andrew: the switch is not on a hat, the device tree part I use is:



       spi@7e204c00 {
            cs-gpios = <0x0000000f 0x00000010 0x00000001 0x0000000f 0x00000012 0x00000001>;
            pinctrl-0 = <0x000000e5 0x000000e6>;
            pinctrl-names = "default";
            compatible = "brcm,bcm2835-spi";
            reg = <0x7e204c00 0x00000200>;
            interrupts = <0x00000000 0x00000076 0x00000004>;
            clocks = <0x00000007 0x00000014>;
            #address-cells = <0x00000001>;
            #size-cells = <0x00000000>;
            status = "okay";
            phandle = <0x000000be>;
            tpm@1 {
                phandle = <0x000000ed>;
                status = "okay";
                interrupts = <0x0000000a 0x00000008>;
                #interrupt-cells = <0x00000002>;
                interrupt-parent = <0x0000000f>;
                spi-max-frequency = <0x000f4240>;
                reg = <0x00000001>;
                pinctrl-0 = <0x000000e7>;
                pinctrl-names = "default";
                compatible = "infineon,slb9670";
            };
            ksz9897@0 {
                phandle = <0x000000ec>;
                status = "okay";
                reset-gpios = <0x000000e1 0x0000000d 0x00000001>;
                spi-cpol;
                spi-cpha;
                spi-max-frequency = <0x01f78a40>;
                reg = <0x00000000>;
                compatible = "microchip,ksz9897";
                ports {
                    #size-cells = <0x00000000>;
                    #address-cells = <0x00000001>;
                    port@2 {
                        label = "piright";
                        reg = <0x00000002>;
                    };
                    port@1 {
                        label = "pileft";
                        reg = <0x00000001>;
                    };
                    port@0 {
                        ethernet = <0x000000d7>;
                        label = "cpu";
                        reg = <0x00000000>;
                    };
                };
            };

Regards,
Lino
Lino Sanfilippo Sept. 9, 2021, 2:29 p.m. UTC | #7
On 09.09.21 at 15:19, Lino Sanfilippo wrote:

>>
>
> The kernel I use is the Raspberry Pi 5.10 kernel. The commit number in this kernel is d0b97c8cd63e37e6d4dc9fefd6381b09f6c31a67
>

This is not correct. The kernel I use right now is based on Greg's stable linux-5.10.y.
The commit number is correct here. Sorry for the confusion.

Regards,
Lino
Andrew Lunn Sept. 9, 2021, 3:11 p.m. UTC | #8
> Andrew: the switch is not on a hat, the device tree part I use is:

And this is not an overlay. It is all there at boot?

I was just thinking that maybe the Ethernet interface gets opened at
boot, an overlay is loaded, and the interface is opened a second
time. I don't know of anybody using DSA with overlays, so that could
have been the key difference which breaks it for you.

Your decompiled DT blob looks O.K.

    Andrew
Andrew Lunn Sept. 9, 2021, 3:17 p.m. UTC | #9
> This is not correct. The kernel I use right now is based on Gregs stable linux-5.10.y.
> The commit number is correct here. Sorry for the confusion.

Can you use 5.14.2?

When we understand the problem, the fixes will need to be for
net-next, which will be based on 5.15-rcX. They will then be
backported to 5.10. So you need to do some testing on a newer
kernel. Such testing will also help us figure out if it is a new
problem, or a backporting problem.

	 Andrew
Vladimir Oltean Sept. 9, 2021, 3:47 p.m. UTC | #10
On Thu, Sep 09, 2021 at 03:19:52PM +0200, Lino Sanfilippo wrote:
> > Do you see similar things on your 5.10 kernel?
> 
> For the master device is see
> 
> lrwxrwxrwx 1 root root 0 Sep  9 14:10 /sys/class/net/eth0/device/consumer:spi:spi3.0 -> ../../../virtual/devlink/platform:fd580000.ethernet--spi:spi3.0

So this is the worst of the worst, we have a device link but it doesn't help.

Where the device link helps is here:

__device_release_driver
	while (device_links_busy(dev))
		device_links_unbind_consumers(dev);

but during dev_shutdown, device_links_unbind_consumers does not get called
(actually I am not even sure whether it should).

I've reproduced your issue by making this very simple change:

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index 60d94e0a07d6..ec00f34cac47 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -1372,6 +1372,7 @@ static struct pci_driver enetc_pf_driver = {
 	.id_table = enetc_pf_id_table,
 	.probe = enetc_pf_probe,
 	.remove = enetc_pf_remove,
+	.shutdown = enetc_pf_remove,
 #ifdef CONFIG_PCI_IOV
 	.sriov_configure = enetc_sriov_configure,
 #endif

on my DSA master driver. This is what the genet driver has "special".

I was led into grave error by Documentation/driver-api/device_link.rst,
which I've based my patch on, where it clearly says that device links
are supposed to help with shutdown ordering (how?!).

So the question is, why did my DSA trees get torn down on shutdown?
Basically the short answer is that my SPI controller driver does
implement .shutdown, and calls the same code path as the .remove code,
which calls spi_unregister_controller which removes all SPI children.

When I added this device link, one of the main objectives was to not
modify all DSA drivers. I was certain based on the documentation that
device links would help, now I'm not so sure anymore.

So what happens is that the DSA master attempts to unregister its net
device on .shutdown, but DSA does not implement .shutdown, so it just
sits there holding a reference (supposedly via dev_hold, but where from?!)
to the master, which makes netdev_wait_allrefs wait and wait.

I need more time for the denial phase to pass, and to understand what
can actually be done. I will also be away from the keyboard for the next
few days, so it might take a while. Your patches obviously offer a
solution only for KSZ switches, we need something more general. If I
understand your solution, it works not by virtue of there being any
shutdown ordering guarantee at all, but simply due to the fact that
DSA's .shutdown hook gets called eventually, and the reference to the
master gets freed eventually, which unblocks the unregister_netdevice
call from the master. I don't yet understand why DSA holds a long-term
reference to the master, that's one thing I need to figure out.
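
The situation described in this message can be sketched as a small user-space model. It is a simplification under stated assumptions and not the driver core's actual logic: all toy_* names are hypothetical, devices are simply visited in array order, and the switch is assumed to be visited before the master. The point it illustrates is that a device without a .shutdown hook keeps its references, so the master's hook cannot complete.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev;
typedef int (*toy_shutdown_fn)(struct toy_dev *dev);

/* Hypothetical device: an optional shutdown hook plus held references. */
struct toy_dev {
	const char *name;
	toy_shutdown_fn shutdown;	/* NULL: driver implements no .shutdown */
	int *master_refs;		/* shared count of references on the master */
	int held;			/* references this device holds on the master */
};

/* Switch-side hook: drop every reference this device holds on the master. */
static int toy_switch_shutdown(struct toy_dev *dev)
{
	*dev->master_refs -= dev->held;
	dev->held = 0;
	return 0;
}

/* Master-side hook: unregistering only completes with no references left. */
static int toy_master_shutdown(struct toy_dev *dev)
{
	return *dev->master_refs == 0 ? 0 : -1;
}

/*
 * Visit the devices in array order and call each hook that exists.
 * Returns the index of the device whose hook could not complete (the
 * modelled hang), or -1 if every hook completed.
 */
static int toy_device_shutdown(struct toy_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].shutdown && devs[i].shutdown(&devs[i]) != 0)
			return (int)i;
	}
	return -1;
}
```

With a switch entry whose hook is NULL and that holds two references, toy_device_shutdown() stops at the master. Once the switch entry is given toy_switch_shutdown, the same walk completes, because the references are released before the master's hook runs.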
Florian Fainelli Sept. 9, 2021, 4 p.m. UTC | #11
+Saravana,

On 9/9/2021 8:47 AM, Vladimir Oltean wrote:
> On Thu, Sep 09, 2021 at 03:19:52PM +0200, Lino Sanfilippo wrote:
>>> Do you see similar things on your 5.10 kernel?
>>
>> For the master device is see
>>
>> lrwxrwxrwx 1 root root 0 Sep  9 14:10 /sys/class/net/eth0/device/consumer:spi:spi3.0 -> ../../../virtual/devlink/platform:fd580000.ethernet--spi:spi3.0
> 
> So this is the worst of the worst, we have a device link but it doesn't help.
> 
> Where the device link helps is here:
> 
> __device_release_driver
> 	while (device_links_busy(dev))
> 		device_links_unbind_consumers(dev);
> 
> but during dev_shutdown, device_links_unbind_consumers does not get called
> (actually I am not even sure whether it should).
> 
> I've reproduced your issue by making this very simple change:
> 
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> index 60d94e0a07d6..ec00f34cac47 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> @@ -1372,6 +1372,7 @@ static struct pci_driver enetc_pf_driver = {
>   	.id_table = enetc_pf_id_table,
>   	.probe = enetc_pf_probe,
>   	.remove = enetc_pf_remove,
> +	.shutdown = enetc_pf_remove,
>   #ifdef CONFIG_PCI_IOV
>   	.sriov_configure = enetc_sriov_configure,
>   #endif
> 
> on my DSA master driver. This is what the genet driver has "special".
> 
> I was led into grave error by Documentation/driver-api/device_link.rst,
> which I've based my patch on, where it clearly says that device links
> are supposed to help with shutdown ordering (how?!).

I was also under the impression that device links were supposed to help 
with shutdown ordering, because it does matter a lot. One thing that I 
had to work on before (and which seems to have come back recently) is the 
shutdown ordering between gpio_keys.c and the GPIO controller. If you 
suspend the GPIO controller first, gpio_keys.c never gets a chance to 
keep the GPIO pin configured for a wake-up interrupt, therefore no 
wake-up event happens on key presses, whoops.

> 
> So the question is, why did my DSA trees get torn down on shutdown?
> Basically the short answer is that my SPI controller driver does
> implement .shutdown, and calls the same code path as the .remove code,
> which calls spi_unregister_controller which removes all SPI children..
> 
> When I added this device link, one of the main objectives was to not
> modify all DSA drivers. I was certain based on the documentation that
> device links would help, now I'm not so sure anymore.
> 
> So what happens is that the DSA master attempts to unregister its net
> device on .shutdown, but DSA does not implement .shutdown, so it just
> sits there holding a reference (supposedly via dev_hold, but where from?!)
> to the master, which makes netdev_wait_allrefs to wait and wait.

It's not coming from of_find_net_device_by_node() that's for sure and 
with OF we don't go through the code path calling 
dsa_dev_to_net_device() which does call dev_hold() and then shortly 
thereafter the caller calls dev_put() anyway.

> 
> I need more time for the denial phase to pass, and to understand what
> can actually be done. I will also be away from the keyboard for the next
> few days, so it might take a while. Your patches obviously offer a
> solution only for KSZ switches, we need something more general. If I
> understand your solution, it works not by virtue of there being any
> shutdown ordering guarantee at all, but simply due to the fact that
> DSA's .shutdown hook gets called eventually, and the reference to the
> master gets freed eventually, which unblocks the unregister_netdevice
> call from the master. I don't yet understand why DSA holds a long-term
> reference to the master, that's one thing I need to figure out.
> 

Agreed.
Lino Sanfilippo Sept. 9, 2021, 4:37 p.m. UTC | #12
On 09.09.21 at 17:47, Vladimir Oltean wrote:
> On Thu, Sep 09, 2021 at 03:19:52PM +0200, Lino Sanfilippo wrote:
>>> Do you see similar things on your 5.10 kernel?
>>
>> For the master device is see
>>
>> lrwxrwxrwx 1 root root 0 Sep  9 14:10 /sys/class/net/eth0/device/consumer:spi:spi3.0 -> ../../../virtual/devlink/platform:fd580000.ethernet--spi:spi3.0
>
> So this is the worst of the worst, we have a device link but it doesn't help.
>
> Where the device link helps is here:
>
> __device_release_driver
> 	while (device_links_busy(dev))
> 		device_links_unbind_consumers(dev);
>
> but during dev_shutdown, device_links_unbind_consumers does not get called
> (actually I am not even sure whether it should).
>
> I've reproduced your issue by making this very simple change:
>
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> index 60d94e0a07d6..ec00f34cac47 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> @@ -1372,6 +1372,7 @@ static struct pci_driver enetc_pf_driver = {
>  	.id_table = enetc_pf_id_table,
>  	.probe = enetc_pf_probe,
>  	.remove = enetc_pf_remove,
> +	.shutdown = enetc_pf_remove,
>  #ifdef CONFIG_PCI_IOV
>  	.sriov_configure = enetc_sriov_configure,
>  #endif
>
> on my DSA master driver. This is what the genet driver has "special".
>

Ah, that is interesting.

> I was led into grave error by Documentation/driver-api/device_link.rst,
> which I've based my patch on, where it clearly says that device links
> are supposed to help with shutdown ordering (how?!).
>
> So the question is, why did my DSA trees get torn down on shutdown?
> Basically the short answer is that my SPI controller driver does
> implement .shutdown, and calls the same code path as the .remove code,
> which calls spi_unregister_controller which removes all SPI children..
>
> When I added this device link, one of the main objectives was to not
> modify all DSA drivers. I was certain based on the documentation that
> device links would help, now I'm not so sure anymore.
>
> So what happens is that the DSA master attempts to unregister its net
> device on .shutdown, but DSA does not implement .shutdown, so it just
> sits there holding a reference (supposedly via dev_hold, but where from?!)
> to the master, which makes netdev_wait_allrefs to wait and wait.
>

Right, that was also my conclusion.

> I need more time for the denial phase to pass, and to understand what
> can actually be done. I will also be away from the keyboard for the next
> few days, so it might take a while. Your patches obviously offer a
> solution only for KSZ switches, we need something more general. If I
> understand your solution, it works not by virtue of there being any
> shutdown ordering guarantee at all, but simply due to the fact that
> DSA's .shutdown hook gets called eventually, and the reference to the
> master gets freed eventually, which unblocks the unregister_netdevice
> call from the master.

Well actually the SPI shutdown hook gets called which then calls ksz9477_shutdown
(formerly ksz9477_reset_switch) which then shuts down the switch by
stopping the worker thread and tearing down the DSA tree (via dsa_tree_shutdown()).

While it is right that the patch series only fixes the KSZ case for now, the idea was that
other drivers could use a similar approach by calling the new function dsa_tree_shutdown()
in their shutdown handler to make sure that all refs to the master device are released.


Regards,
Lino
Lino Sanfilippo Sept. 9, 2021, 4:41 p.m. UTC | #13
On 09.09.21 at 17:17, Andrew Lunn wrote:
>> This is not correct. The kernel I use right now is based on Gregs stable linux-5.10.y.
>> The commit number is correct here. Sorry for the confusion.
>
> Can you use 5.14.2?
>
> When we understand the problem, the fixes will need to be for
> net-next, which will be based on 5.15-rcX. They will then be
> backported to 5.10. So you need to do some testing on a newer
> kernel. Such testing will also help us figure out if it is a new
> problem, or a backporting problem.
>
> 	 Andrew
>


You are right, I will try to switch to a newer kernel to test future DSA issues like that
(I have the impression that DSA is an area in which a lot of progress is made from
one kernel version to the next).

Regards,
Lino
Florian Fainelli Sept. 9, 2021, 4:44 p.m. UTC | #14
On 9/9/2021 9:37 AM, Lino Sanfilippo wrote:
> On 09.09.21 at 17:47, Vladimir Oltean wrote:
>> On Thu, Sep 09, 2021 at 03:19:52PM +0200, Lino Sanfilippo wrote:
>>>> Do you see similar things on your 5.10 kernel?
>>>
>>> For the master device is see
>>>
>>> lrwxrwxrwx 1 root root 0 Sep  9 14:10 /sys/class/net/eth0/device/consumer:spi:spi3.0 -> ../../../virtual/devlink/platform:fd580000.ethernet--spi:spi3.0
>>
>> So this is the worst of the worst, we have a device link but it doesn't help.
>>
>> Where the device link helps is here:
>>
>> __device_release_driver
>> 	while (device_links_busy(dev))
>> 		device_links_unbind_consumers(dev);
>>
>> but during dev_shutdown, device_links_unbind_consumers does not get called
>> (actually I am not even sure whether it should).
>>
>> I've reproduced your issue by making this very simple change:
>>
>> diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
>> index 60d94e0a07d6..ec00f34cac47 100644
>> --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
>> +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
>> @@ -1372,6 +1372,7 @@ static struct pci_driver enetc_pf_driver = {
>>   	.id_table = enetc_pf_id_table,
>>   	.probe = enetc_pf_probe,
>>   	.remove = enetc_pf_remove,
>> +	.shutdown = enetc_pf_remove,
>>   #ifdef CONFIG_PCI_IOV
>>   	.sriov_configure = enetc_sriov_configure,
>>   #endif
>>
>> on my DSA master driver. This is what the genet driver has "special".
>>
> 
> Ah, that is interesting.
> 
>> I was led into grave error by Documentation/driver-api/device_link.rst,
>> which I've based my patch on, where it clearly says that device links
>> are supposed to help with shutdown ordering (how?!).
>>
>> So the question is, why did my DSA trees get torn down on shutdown?
>> Basically the short answer is that my SPI controller driver does
>> implement .shutdown, and calls the same code path as the .remove code,
>> which calls spi_unregister_controller which removes all SPI children..
>>
>> When I added this device link, one of the main objectives was to not
>> modify all DSA drivers. I was certain based on the documentation that
>> device links would help, now I'm not so sure anymore.
>>
>> So what happens is that the DSA master attempts to unregister its net
>> device on .shutdown, but DSA does not implement .shutdown, so it just
>> sits there holding a reference (supposedly via dev_hold, but where from?!)
>> to the master, which makes netdev_wait_allrefs to wait and wait.
>>
> 
> Right, that was also my conclusion.
> 
>> I need more time for the denial phase to pass, and to understand what
>> can actually be done. I will also be away from the keyboard for the next
>> few days, so it might take a while. Your patches obviously offer a
>> solution only for KSZ switches, we need something more general. If I
>> understand your solution, it works not by virtue of there being any
>> shutdown ordering guarantee at all, but simply due to the fact that
>> DSA's .shutdown hook gets called eventually, and the reference to the
>> master gets freed eventually, which unblocks the unregister_netdevice
>> call from the master.
> 
> Well actually the SPI shutdown hook gets called which then calls ksz9477_shutdown
> (formerly ksz9477_reset_switch) which then shuts down the switch by
> stopping the worker thread and tearing down the DSA tree (via dsa_tree_shutdown()).
> 
> While it is right that the patch series only fixes the KSZ case for now, the idea was that
> other drivers could use a similar approach in by calling the new function dsa_tree_shutdown()
> in their shutdown handler to make sure that all refs to the master device are released.

It does not scale really well to have individual drivers call 
dsa_tree_shutdown() in their respective .shutdown callback, and in a 
multi-switch configuration, I am not sure what the results would look like.

In principle, each driver ought to be able to call 
dsa_unregister_switch(), along with all of the driver specific shutdown 
and eventually, given proper device ordering the DSA tree would get 
automatically torn down, and then the DSA master's .shutdown() callback 
would be called.

FWIW, the reason why we call .shutdown() in bcmgenet is to turn off DMA 
and clocks, which matters for kexec (DMA) as well as power savings (S5 
mode).
Lino Sanfilippo Sept. 9, 2021, 4:46 p.m. UTC | #15
On 09.09.21 at 17:11, Andrew Lunn wrote:
>> Andrew: the switch is not on a hat, the device tree part I use is:
>
> And this is not an overlay. It is all there at boot?
>

Well actually we DO use an overlay. The device tree snippet I posted was an excerpt from
fdtdump. The relevant fragment looks like this in the overlay file:


        fragment@4 {
                target = <&spi6>;
                __overlay__ {
                        #address-cells = <1>;
                        #size-cells = <0>;
                        pinctrl-names = "default";
                        pinctrl-0 = <&spi6_pins>, <&spi6_cs_pins>;
                        cs-gpios = <&gpio 16 GPIO_ACTIVE_LOW>,
                                   <&gpio 18 GPIO_ACTIVE_LOW>;
                        status = "okay";

                        ksz9897: ksz9897@0 {
                                compatible = "microchip,ksz9897";
                                reg = <0>;
                                spi-max-frequency = <50000000>;
                                spi-cpha;
                                spi-cpol;
                                reset-gpios = <&expander_core 13 GPIO_ACTIVE_LOW>;
                                status = "okay";

                                ports {
                                        #address-cells = <1>;
                                        #size-cells = <0>;
                                        /* PORT 1 */
                                        port@0 {
                                                reg = <0>;
                                                label = "cpu";
                                                ethernet = <&genet>;
                                        };
                                        /* PORT 2 */
                                        port@1 {
                                                reg = <1>;
                                                label = "pileft";
                                        };
                                        /* PORT 3 */
                                        port@2 {
                                                reg = <2>;
                                                label = "piright";
                                        };
                                        /*
                                         * PORT 4-7 unused
                                         */
                                };
                        };

                        tpm: tpm@1 {
                                compatible = "infineon,slb9670";
                                pinctrl-names = "default";
                                pinctrl-0 = <&tpm_pins>;
                                reg = <1>;
                                spi-max-frequency = <1000000>;
                                interrupt-parent = <&gpio>;
                                #interrupt-cells = <2>;
                                interrupts = <10 IRQ_TYPE_LEVEL_LOW>;
                                status = "okay";
                        };
                };
        };


But probably this does not matter any more now
that Vladimir was able to reproduce the issue.

> I was just thinking that maybe the Ethernet interface gets opened at
> boot, and overlay is loaded, and the interface is opened a second
> time. I don't know of anybody using DSA with overlays, so that could
> of been the key difference which breaks it for you.
>
> Your decompiled DT blob looks O.K.
>
>     Andrew
>
Lino Sanfilippo Sept. 9, 2021, 5:07 p.m. UTC | #16
On 09.09.21 at 18:44, Florian Fainelli wrote:
>
>
> On 9/9/2021 9:37 AM, Lino Sanfilippo wrote:
>> On 09.09.21 at 17:47, Vladimir Oltean wrote:
>>> On Thu, Sep 09, 2021 at 03:19:52PM +0200, Lino Sanfilippo wrote:
>>>>> Do you see similar things on your 5.10 kernel?
>>>>
>>>> For the master device is see
>>>>
>>>> lrwxrwxrwx 1 root root 0 Sep  9 14:10 /sys/class/net/eth0/device/consumer:spi:spi3.0 -> ../../../virtual/devlink/platform:fd580000.ethernet--spi:spi3.0
>>>
>>> So this is the worst of the worst, we have a device link but it doesn't help.
>>>
>>> Where the device link helps is here:
>>>
>>> __device_release_driver
>>>     while (device_links_busy(dev))
>>>         device_links_unbind_consumers(dev);
>>>
>>> but during dev_shutdown, device_links_unbind_consumers does not get called
>>> (actually I am not even sure whether it should).
>>>
>>> I've reproduced your issue by making this very simple change:
>>>
>>> diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
>>> index 60d94e0a07d6..ec00f34cac47 100644
>>> --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
>>> +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
>>> @@ -1372,6 +1372,7 @@ static struct pci_driver enetc_pf_driver = {
>>>       .id_table = enetc_pf_id_table,
>>>       .probe = enetc_pf_probe,
>>>       .remove = enetc_pf_remove,
>>> +    .shutdown = enetc_pf_remove,
>>>   #ifdef CONFIG_PCI_IOV
>>>       .sriov_configure = enetc_sriov_configure,
>>>   #endif
>>>
>>> on my DSA master driver. This is what the genet driver has "special".
>>>
>>
>> Ah, that is interesting.
>>
>>> I was led into grave error by Documentation/driver-api/device_link.rst,
>>> which I've based my patch on, where it clearly says that device links
>>> are supposed to help with shutdown ordering (how?!).
>>>
>>> So the question is, why did my DSA trees get torn down on shutdown?
>>> Basically the short answer is that my SPI controller driver does
>>> implement .shutdown, and calls the same code path as the .remove code,
>>> which calls spi_unregister_controller which removes all SPI children..
>>>
>>> When I added this device link, one of the main objectives was to not
>>> modify all DSA drivers. I was certain based on the documentation that
>>> device links would help, now I'm not so sure anymore.
>>>
>>> So what happens is that the DSA master attempts to unregister its net
>>> device on .shutdown, but DSA does not implement .shutdown, so it just
>>> sits there holding a reference (supposedly via dev_hold, but where from?!)
>>> to the master, which makes netdev_wait_allrefs to wait and wait.
>>>
>>
>> Right, that was also my conclusion.
>>
>>> I need more time for the denial phase to pass, and to understand what
>>> can actually be done. I will also be away from the keyboard for the next
>>> few days, so it might take a while. Your patches obviously offer a
>>> solution only for KSZ switches, we need something more general. If I
>>> understand your solution, it works not by virtue of there being any
>>> shutdown ordering guarantee at all, but simply due to the fact that
>>> DSA's .shutdown hook gets called eventually, and the reference to the
>>> master gets freed eventually, which unblocks the unregister_netdevice
>>> call from the master.
>>
>> Well actually the SPI shutdown hook gets called which then calls ksz9477_shutdown
>> (formerly ksz9477_reset_switch) which then shuts down the switch by
>> stopping the worker thread and tearing down the DSA tree (via dsa_tree_shutdown()).
>>
>> While it is right that the patch series only fixes the KSZ case for now, the idea was that
>> other drivers could use a similar approach in by calling the new function dsa_tree_shutdown()
>> in their shutdown handler to make sure that all refs to the master device are released.
> It does not scale really well to have individual drivers call dsa_tree_shutdown() in their respective .shutdown callback, and in a multi-switch configuration, I am not sure what the results would look like.
>
> In premise, each driver ought to be able to call dsa_unregister_switch(), along with all of the driver specific shutdown and eventually, given proper device ordering the DSA tree would get automatically torn down, and then the DSA master's .shutdown() callback would be called.
>
> FWIW, the reason why we call .shutdown() in bcmgenet is to turn off DMA and clocks, which matters for kexec (DMA) as well as power savings (S5 mode).

I agree on the scalability concern. As for the multi-switch case, I don't know about the possible issues (I am quite new to working with DSA).
So let's wait for Vladimir's solution.

Regards,
Lino
Andrew Lunn Sept. 9, 2021, 5:55 p.m. UTC | #17
On Thu, Sep 09, 2021 at 06:46:49PM +0200, Lino Sanfilippo wrote:
> On 09.09.21 at 17:11, Andrew Lunn wrote:
> >> Andrew: the switch is not on a hat, the device tree part I use is:
> >
> > And this is not an overlay. It is all there at boot?
> >
> 
> Well actually we DO use an overlay. The dev tree snipped I posted was an excerpt form
> fdtdump. The concerning fragment looks like this in the overlay file:

Thanks for the information. Good to know somebody is using DSA like
this. The device tree description can be quite complex, especially for
some of the other switches.

> But probably this does not matter any more now that Vladimir was
> able to reproduce the issue.

Agreed.

	Andrew
Vladimir Oltean Sept. 9, 2021, 10:54 p.m. UTC | #18
On Thu, Sep 09, 2021 at 07:07:33PM +0200, Lino Sanfilippo wrote:
> > It does not scale really well to have individual drivers call
> > dsa_tree_shutdown() in their respective .shutdown callback, and in a
> > multi-switch configuration, I am not sure what the results would
> > look like.
> >
> > In premise, each driver ought to be able to call
> > dsa_unregister_switch(), along with all of the driver specific
> > shutdown and eventually, given proper device ordering the DSA tree
> > would get automatically torn down, and then the DSA master's
> > .shutdown() callback would be called.
> >
> > FWIW, the reason why we call .shutdown() in bcmgenet is to turn off
> > DMA and clocks, which matters for kexec (DMA) as well as power
> > savings (S5 mode).
>
> I agree with the scalability. Concerning the multi-switch case I dont
> know about the possible issues (I am quite new to working with DSA).
> So lets wait for Vladimirs solution.

I'm back for now and was able to spend a bit more time and understand
what is happening.

So first things first: why does DSA call dev_hold long-term on the
master, and where from?

Answer: it does so since commit 2f1e8ea726e9 ("net: dsa: link interfaces
with the DSA master to get rid of lockdep warnings"), see this call path:

dsa_slave_create
-> netdev_upper_dev_link
   -> __netdev_upper_dev_link
      -> __netdev_adjacent_dev_insert
         -> dev_hold

Ok, so since DSA holds a reference to the master interface, it is
natural that unregister_netdevice() will not finish, and it will hang
the system.
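
To make the reference problem concrete, here is a minimal userspace sketch
(not kernel code). It assumes a simplified model in which unregistration may
only finish once the usage count has dropped to zero; every toy_* name is
hypothetical and only loosely mirrors dev_hold(), dev_put() and the
upper/lower linking done from dsa_slave_create().

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_netdev {
	int refcnt;	/* stands in for the net device usage count */
};

/* Loosely mirrors dev_hold(): take one reference. */
void toy_dev_hold(struct toy_netdev *dev)
{
	dev->refcnt++;
}

/* Loosely mirrors dev_put(): drop one reference. */
void toy_dev_put(struct toy_netdev *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

/* Linking an upper (a DSA slave) takes a long-term reference on the master. */
void toy_upper_link(struct toy_netdev *master)
{
	toy_dev_hold(master);
}

/* Unlinking the upper releases that reference again. */
void toy_upper_unlink(struct toy_netdev *master)
{
	toy_dev_put(master);
}

/*
 * Simplifying assumption: unregistration can only complete once nobody
 * holds a reference any more. In the real kernel the caller would keep
 * waiting and periodically print the "waiting for eth0 to become free"
 * message instead of returning false.
 */
bool toy_unregister_can_finish(const struct toy_netdev *master)
{
	return master->refcnt == 0;
}
```

With two slave interfaces linked, the model's usage count is 2, matching the
"Usage count = 2" message from the cover letter, and unregistration cannot
finish until both uppers have been unlinked.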

Question 2: why does bcmgenet need to unregister the net device on
shutdown?

See Florian's answer, it doesn't, strictly speaking, it just needs to
turn off the DMA and some clocks.

Question 3: can we revert commit 2f1e8ea726e9?

Answer: not so easily, we are looking at >10 commits to revert, and find
other solutions to some problems. We have built in the meantime on top
of the fact that there is an upper/lower relationship between DSA user
ports and the DSA master.

Question 4: how do other stacked interfaces deal with this?

Answer: as I said in the commit message of 2f1e8ea726e9, DSA is not
VLAN, DSA has unique challenges of its own, like a tree of struct
devices to manage, with their own lifetime. So what other drivers do is
not really relevant. Anyway, to entertain the question: VLAN watches the
NETDEV_UNREGISTER event emitted on the netdev notifier chain for its
real_dev, and effectively unregisters itself. Now this is exactly why it
is irrelevant, we can watch for NETDEV_UNREGISTER on the DSA master, but
then what? There is nothing sensible to do. Consider that in the master
unbind case (not shutdown), both the NETDEV_UNREGISTER code path will
execute, and the unbind of the DSA switch itself, due to that device
link. But let's say we delete the device link and leave only the
NETDEV_UNREGISTER code path to do something. What?
device_release_driver(ds->dev), most probably. That would effectively
force the DSA unbind path. But surprise: the DSA unbind path takes the
rtnl_mutex from quite a couple of places, and we are already under the
rtnl_lock (held by the netdev notifier chain). So, unless we schedule
the DSA device driver detach, there is an impending deadlock.

Ok, let's entertain even that: detach the DSA driver in a scheduled work
item, with the rtnl_lock not held. First off, we will trigger again the
WARN_ON solved by commit 2f1e8ea726e9 (because the unregistering of the
DSA master has "completed", but it still has an upper interface - us),
and secondly, the unregister_netdev function will have already deleted
stuff belonging to the DSA master, namely its sysfs entries. But DSA
also touches the master's sysfs, namely the "tagging" file. So NULL
pointer dereference on the master's sysfs.

So very simply put, DSA cannot unbind itself from the switch device when
the master net device unregisters. The best case scenario would be for
DSA to unbind _before_ the net device even unregisters. That was the
whole point of my attempt with the device links, to ensure shutdown
_ordering_.

Question 5: can the device core actually be patched to call
device_links_unbind_consumers() from device_shutdown()? This would
actually simplify DSA's options, and make the device links live up to
their documented expectations.

Answer: yes and no, technically it can, but it is an invasive change
which will certainly introduce regressions. See the answer to question 2
for an example. Technically .shutdown exists so that drivers can do
something lightweight to quiesce the hardware, without really caring too
much about data structure integrity (hey, the kernel is going to die
soon anyway). But some drivers, like bcmgenet, do the same thing in
.remove and .shutdown, which blurs the lines quite a lot. If the device
links were to start calling .remove at shutdown time, potentially after
.shutdown was already called, bcmgenet would effectively unregister its
net device twice. Yikes.

Question 6: How about a patch on the device core that is more lightweight?
Wouldn't it be sensible for device_shutdown() to just call ->remove if
the device's bus has no ->shutdown, and the device's driver doesn't have
a ->shutdown either?

Answer: This would sometimes work, the vast majority of DSA switch
drivers, and Ethernet controllers (in this case used as DSA masters) do
not have a .shutdown method implemented. But their bus does: PCI does,
SPI controllers do, most of the time. So it would work for limited
scenarios, but would be ineffective in the general sense.
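
The limitation can be sketched with a small userspace model (not kernel
code) of the dispatch order being discussed: the bus callback is checked
first, then the driver callback, and only then would the hypothetical
->remove fallback from question 6 be reached. The toy_* names and the
boolean fields are assumptions made purely for illustration.

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_action {
	TOY_BUS_SHUTDOWN,	/* the bus ->shutdown would be called */
	TOY_DRIVER_SHUTDOWN,	/* the driver ->shutdown would be called */
	TOY_REMOVE_FALLBACK,	/* proposed: fall back to ->remove */
};

struct toy_dev {
	bool bus_has_shutdown;		/* e.g. PCI, or SPI most of the time */
	bool driver_has_shutdown;
};

/* Decide which callback the modelled shutdown path would pick for a device. */
enum toy_action toy_shutdown_dispatch(const struct toy_dev *dev)
{
	assert(dev != NULL);

	if (dev->bus_has_shutdown)
		return TOY_BUS_SHUTDOWN;
	if (dev->driver_has_shutdown)
		return TOY_DRIVER_SHUTDOWN;
	/* Only reached when neither the bus nor the driver has ->shutdown. */
	return TOY_REMOVE_FALLBACK;
}
```

In this model a switch driver without its own ->shutdown that sits on a bus
with a ->shutdown callback never reaches the fallback, which is the "limited
scenarios" point made above.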

Question 7: I said that .shutdown, as opposed to .remove, doesn't really
care so much about the integrity of data structures. So how far should
we really go to fix this issue? Should we even bother to unbind the
whole DSA tree, when the sole problem is that we are the DSA master's
upper, and that is keeping a reference on it?

Answer: Well, any solution that does unnecessary data structure teardown
only delays the reboot for nothing. Lino's patch just bluntly calls
dsa_tree_teardown() from the switch .shutdown method, and this leaks
memory, namely dst->ports. But does this really matter? Nope, so let's
extrapolate. In this case, IMO, the simplest possible solution would be
to patch bcmgenet to not unregister the net device. Then treat every
other DSA master driver in the same way as they come, one by one.
Do you need to unregister_netdevice() at shutdown? No. Then don't.
Is it nice? Probably not, but I'm not seeing alternatives.

Also, unless I'm missing something, Lino probably still sees the WARN_ON
in bcmgenet's unregister_netdevice() about eth0 getting unregistered
while having an upper interface. If not, it's by sheer luck that the DSA
switch's ->shutdown gets called before bcmgenet's ->shutdown. But for
this reason, it isn't a great solution either. If the device links can't
guarantee us some sort of shutdown ordering (what we ideally want, as
mentioned, is for the DSA switch driver to get _unbound_ (->remove)
before the DSA master gets unbound or shut down).
Vladimir Oltean Sept. 9, 2021, 11:23 p.m. UTC | #19
On Fri, Sep 10, 2021 at 01:54:57AM +0300, Vladimir Oltean wrote:
> On Thu, Sep 09, 2021 at 07:07:33PM +0200, Lino Sanfilippo wrote:
> > > It does not scale really well to have individual drivers call
> > > dsa_tree_shutdown() in their respective .shutdown callback, and in a
> > > multi-switch configuration, I am not sure what the results would
> > > look like.
> > >
> > > In premise, each driver ought to be able to call
> > > dsa_unregister_switch(), along with all of the driver specific
> > > shutdown and eventually, given proper device ordering the DSA tree
> > > would get automatically torn down, and then the DSA master's
> > > .shutdown() callback would be called.
> > >
> > > FWIW, the reason why we call .shutdown() in bcmgenet is to turn off
> > > DMA and clocks, which matters for kexec (DMA) as well as power
> > > savings (S5 mode).
> >
> > I agree with the scalability. Concerning the multi-switch case I dont
> > know about the possible issues (I am quite new to working with DSA).
> > So lets wait for Vladimirs solution.
> 
> I'm back for now and was able to spend a bit more time and understand
> what is happening.
> 
> So first things first: why does DSA call dev_hold long-term on the
> master, and where from?
> 
> Answer: it does so since commit 2f1e8ea726e9 ("net: dsa: link interfaces
> with the DSA master to get rid of lockdep warnings"), see this call path:
> 
> dsa_slave_create
> -> netdev_upper_dev_link
>    -> __netdev_upper_dev_link
>       -> __netdev_adjacent_dev_insert
>          -> dev_hold
> 
> Ok, so since DSA holds a reference to the master interface, it is
> natural that unregister_netdevice() will not finish, and it will hang
> the system.
> 
> Question 2: why does bcmgenet need to unregister the net device on
> shutdown?
> 
> See Florian's answer, it doesn't, strictly speaking, it just needs to
> turn off the DMA and some clocks.
> 
> Question 3: can we revert commit 2f1e8ea726e9?
> 
> Answer: not so easily, we are looking at >10 commits to revert, and find
> other solutions to some problems. We have built in the meantime on top
> of the fact that there is an upper/lower relationship between DSA user
> ports and the DSA master.
> 
> Question 4: how do other stacked interfaces deal with this?
> 
> Answer: as I said in the commit message of 2f1e8ea726e9, DSA is not
> VLAN, DSA has unique challenges of its own, like a tree of struct
> devices to manage, with their own lifetime. So what other drivers do is
> not really relevant. Anyway, to entertain the question: VLAN watches the
> NETDEV_UNREGISTER event emitted on the netdev notifier chain for its
> real_dev, and effectively unregisters itself. Now this is exactly why it
> is irrelevant, we can watch for NETDEV_UNREGISTER on the DSA master, but
> then what? There is nothing sensible to do. Consider that in the master
> unbind case (not shutdown), both the NETDEV_UNREGISTER code path will
> execute, and the unbind of the DSA switch itself, due to that device
> link. But let's say we delete the device link and leave only the
> NETDEV_UNREGISTER code path to do something. What?
> device_release_driver(ds->dev), most probably. That would effectively
> force the DSA unbind path. But surprise: the DSA unbind path takes the
> rtnl_mutex from quite a couple of places, and we are already under the
> rtnl_lock (held by the netdev notifier chain). So, unless we schedule
> the DSA device driver detach, there is an impending deadlock.
> Ok, let's entertain even that: detach the DSA driver in a scheduled work
> item, with the rtnl_lock not held. First off, we will trigger again the
> WARN_ON solved by commit 2f1e8ea726e9 (because the unregistering of the
> DSA master has "completed", but it still has an upper interface - us),
> and secondly, the unregister_netdev function will have already deleted
> stuff belonging to the DSA master, namely its sysfs entries. But DSA
> also touches the master's sysfs, namely the "tagging" file. So NULL
> pointer dereference on the master's sysfs.
> So very simply put, DSA cannot unbind itself from the switch device when
> the master net device unregisters. The best case scenario would be for
> DSA to unbind _before_ the net device even unregisters. That was the
> whole point of my attempt with the device links, to ensure shutdown
> _ordering_.
> 
> Question 5: can the device core actually be patched to call
> device_links_unbind_consumers() from device_shutdown()? This would
> actually simplify DSA's options, and make the device links live up to
> their documented expectations.
> 
> Answer: yes and no, technically it can, but it is an invasive change
> which will certainly introduce regressions. See the answer to question 2
> for an example. Technically .shutdown exists so that drivers can do
> something lightweight to quiesce the hardware, without really caring too
> much about data structure integrity (hey, the kernel is going to die
> soon anyway). But some drivers, like bcmgenet, do the same thing in
> .resume and .shutdown, which blurs the lines quite a lot. If the device
> links were to start calling .remove at shutdown time, potentially after
> .shutdown was already called, bcmgenet would effectively unregister its
> net device twice. Yikes.
> 
> Question 6: How about a patch on the device core that is more lightweight?
> Wouldn't it be sensible for device_shutdown() to just call ->remove if
> the device's bus has no ->shutdown, and the device's driver doesn't have
> a ->shutdown either?
> 
> Answer: This would sometimes work, the vast majority of DSA switch
> drivers, and Ethernet controllers (in this case used as DSA masters) do
> not have a .shutdown method implemented. But their bus does: PCI does,
> SPI controllers do, most of the time. So it would work for limited
> scenarios, but would be ineffective in the general sense.
> 
> Question 7: I said that .shutdown, as opposed to .remove, doesn't really
> care so much about the integrity of data structures. So how far should
> we really go to fix this issue? Should we even bother to unbind the
> whole DSA tree, when the sole problem is that we are the DSA master's
> upper, and that is keeping a reference on it?
> 
> Answer: Well, any solution that does unnecessary data structure teardown
> only delays the reboot for nothing. Lino's patch just bluntly calls
> dsa_tree_teardown() from the switch .shutdown method, and this leaks
> memory, namely dst->ports. But does this really matter? Nope, so let's
> extrapolate. In this case, IMO, the simplest possible solution would be
> to patch bcmgenet to not unregister the net device. Then treat every
> other DSA master driver in the same way as they come, one by one.
> Do you need to unregister_netdevice() at shutdown? No. Then don't.
> Is it nice? Probably not, but I'm not seeing alternatives.
> 
> Also, unless I'm missing something, Lino probably still sees the WARN_ON
> in bcmgenet's unregister_netdevice() about eth0 getting unregistered
> while having an upper interface. If not, it's by sheer luck that the DSA
> switch's ->shutdown gets called before bcmgenet's ->shutdown. But for
> this reason, it isn't a great solution either. If the device links can't
> guarantee us some sort of shutdown ordering (what we ideally want, as
> mentioned, is for the DSA switch driver to get _unbound_ (->remove)
> before the DSA master gets unbound or shut down).

I forgot about this, for completeness:

Question 8: Ok, so this is an even more lightweight variant of question 6.
To patch device_shutdown here:

 		if (dev->bus && dev->bus->shutdown) {
 			if (initcall_debug)
 				dev_info(dev, "shutdown\n");
 			dev->bus->shutdown(dev);
 		} else if (dev->driver && dev->driver->shutdown) {
 			if (initcall_debug)
 				dev_info(dev, "shutdown\n");
 			dev->driver->shutdown(dev);
+		} else {
+			__device_release_driver(dev, parent);
 		}

would go towards helping DSA in general, but it wouldn't help the situation at hand,
and it would introduce regressions.

So what about patching bcmgenet (and other drivers) to implement .shutdown in the following way:

	device_release_driver(&pdev->dev);

basically this should force-unbind the driver from the device, which
would quite nicely make the device link go into action and make DSA
unbind too.

Answer: device_release_driver calls device_lock(dev), and device_shutdown
also holds that lock when it calls our ->shutdown method. So unless the
device core would be so nice as to provide a generic shutdown method
that just unbinds the device using an unlocked version of device_release_driver,
we are back to square one with this solution. Anything that needs to patch
the device core is more or less disqualified, especially for a bug fix.
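
The locking problem can be sketched in userspace as follows (not kernel
code). The sketch assumes a non-recursive per-device lock and, instead of
actually hanging, reports -EDEADLK when the holder tries to take the lock a
second time. All toy_* names are hypothetical stand-ins for device_lock(),
device_release_driver(), a driver's ->shutdown and device_shutdown().

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_device {
	bool locked;	/* models the per-device lock being held */
	bool bound;	/* true while a driver is bound to the device */
};

/*
 * Non-recursive lock. A second acquisition by the holder would block
 * forever in reality; the model returns -EDEADLK instead of hanging.
 */
int toy_device_lock(struct toy_device *dev)
{
	if (dev->locked)
		return -EDEADLK;
	dev->locked = true;
	return 0;
}

void toy_device_unlock(struct toy_device *dev)
{
	assert(dev->locked);
	dev->locked = false;
}

/* Stand-in for device_release_driver(): it takes the device lock itself. */
int toy_release_driver(struct toy_device *dev)
{
	int err = toy_device_lock(dev);

	if (err)
		return err;
	dev->bound = false;
	toy_device_unlock(dev);
	return 0;
}

/* Stand-in for a ->shutdown callback that tries to force-unbind the driver. */
int toy_driver_shutdown(struct toy_device *dev)
{
	return toy_release_driver(dev);
}

/* Stand-in for device_shutdown(): calls ->shutdown with the lock held. */
int toy_device_shutdown(struct toy_device *dev)
{
	int err = toy_device_lock(dev);

	if (err)
		return err;
	err = toy_driver_shutdown(dev);
	toy_device_unlock(dev);
	return err;
}
```

Calling toy_device_shutdown() on a bound device returns -EDEADLK and leaves
the driver bound, whereas calling toy_release_driver() directly, without the
lock already held, succeeds. That is the model's version of why an unlocked
variant would be needed.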
Vladimir Oltean Sept. 10, 2021, 1:08 a.m. UTC | #20
On Fri, Sep 10, 2021 at 02:23:58AM +0300, Vladimir Oltean wrote:
> On Fri, Sep 10, 2021 at 01:54:57AM +0300, Vladimir Oltean wrote:
> > On Thu, Sep 09, 2021 at 07:07:33PM +0200, Lino Sanfilippo wrote:
> > > > It does not scale really well to have individual drivers call
> > > > dsa_tree_shutdown() in their respective .shutdown callback, and in a
> > > > multi-switch configuration, I am not sure what the results would
> > > > look like.
> > > >
> > > > In premise, each driver ought to be able to call
> > > > dsa_unregister_switch(), along with all of the driver specific
> > > > shutdown and eventually, given proper device ordering the DSA tree
> > > > would get automatically torn down, and then the DSA master's
> > > > .shutdown() callback would be called.
> > > >
> > > > FWIW, the reason why we call .shutdown() in bcmgenet is to turn off
> > > > DMA and clocks, which matters for kexec (DMA) as well as power
> > > > savings (S5 mode).
> > >
> > > I agree with the scalability. Concerning the multi-switch case I dont
> > > know about the possible issues (I am quite new to working with DSA).
> > > So lets wait for Vladimirs solution.
> > 
> > I'm back for now and was able to spend a bit more time and understand
> > what is happening.
> > 
> > So first things first: why does DSA call dev_hold long-term on the
> > master, and where from?
> > 
> > Answer: it does so since commit 2f1e8ea726e9 ("net: dsa: link interfaces
> > with the DSA master to get rid of lockdep warnings"), see this call path:
> > 
> > dsa_slave_create
> > -> netdev_upper_dev_link
> >    -> __netdev_upper_dev_link
> >       -> __netdev_adjacent_dev_insert
> >          -> dev_hold
> > 
> > Ok, so since DSA holds a reference to the master interface, it is
> > natural that unregister_netdevice() will not finish, and it will hang
> > the system.
> > 
> > Question 2: why does bcmgenet need to unregister the net device on
> > shutdown?
> > 
> > See Florian's answer, it doesn't, strictly speaking, it just needs to
> > turn off the DMA and some clocks.
> > 
> > Question 3: can we revert commit 2f1e8ea726e9?
> > 
> > Answer: not so easily, we are looking at >10 commits to revert, and find
> > other solutions to some problems. We have built in the meantime on top
> > of the fact that there is an upper/lower relationship between DSA user
> > ports and the DSA master.
> > 
> > Question 4: how do other stacked interfaces deal with this?
> > 
> > Answer: as I said in the commit message of 2f1e8ea726e9, DSA is not
> > VLAN, DSA has unique challenges of its own, like a tree of struct
> > devices to manage, with their own lifetime. So what other drivers do is
> > not really relevant. Anyway, to entertain the question: VLAN watches the
> > NETDEV_UNREGISTER event emitted on the netdev notifier chain for its
> > real_dev, and effectively unregisters itself. Now this is exactly why it
> > is irrelevant, we can watch for NETDEV_UNREGISTER on the DSA master, but
> > then what? There is nothing sensible to do. Consider that in the master
> > unbind case (not shutdown), both the NETDEV_UNREGISTER code path will
> > execute, and the unbind of the DSA switch itself, due to that device
> > link. But let's say we delete the device link and leave only the
> > NETDEV_UNREGISTER code path to do something. What?
> > device_release_driver(ds->dev), most probably. That would effectively
> > force the DSA unbind path. But surprise: the DSA unbind path takes the
> > rtnl_mutex from quite a couple of places, and we are already under the
> > rtnl_lock (held by the netdev notifier chain). So, unless we schedule
> > the DSA device driver detach, there is an impending deadlock.
> > Ok, let's entertain even that: detach the DSA driver in a scheduled work
> > item, with the rtnl_lock not held. First off, we will trigger again the
> > WARN_ON solved by commit 2f1e8ea726e9 (because the unregistering of the
> > DSA master has "completed", but it still has an upper interface - us),
> > and secondly, the unregister_netdev function will have already deleted
> > stuff belonging to the DSA master, namely its sysfs entries. But DSA
> > also touches the master's sysfs, namely the "tagging" file. So NULL
> > pointer dereference on the master's sysfs.
> > So very simply put, DSA cannot unbind itself from the switch device when
> > the master net device unregisters. The best case scenario would be for
> > DSA to unbind _before_ the net device even unregisters. That was the
> > whole point of my attempt with the device links, to ensure shutdown
> > _ordering_.
> > 
> > Question 5: can the device core actually be patched to call
> > device_links_unbind_consumers() from device_shutdown()? This would
> > actually simplify DSA's options, and make the device links live up to
> > their documented expectations.
> > 
> > Answer: yes and no, technically it can, but it is an invasive change
> > which will certainly introduce regressions. See the answer to question 2
> > for an example. Technically .shutdown exists so that drivers can do
> > something lightweight to quiesce the hardware, without really caring too
> > much about data structure integrity (hey, the kernel is going to die
> > soon anyway). But some drivers, like bcmgenet, do the same thing in
> > .resume and .shutdown, which blurs the lines quite a lot. If the device
> > links were to start calling .remove at shutdown time, potentially after
> > .shutdown was already called, bcmgenet would effectively unregister its
> > net device twice. Yikes.
> > 
> > Question 6: How about a patch on the device core that is more lightweight?
> > Wouldn't it be sensible for device_shutdown() to just call ->remove if
> > the device's bus has no ->shutdown, and the device's driver doesn't have
> > a ->shutdown either?
> > 
> > Answer: This would sometimes work, the vast majority of DSA switch
> > drivers, and Ethernet controllers (in this case used as DSA masters) do
> > not have a .shutdown method implemented. But their bus does: PCI does,
> > SPI controllers do, most of the time. So it would work for limited
> > scenarios, but would be ineffective in the general sense.
> > 
> > Question 7: I said that .shutdown, as opposed to .remove, doesn't really
> > care so much about the integrity of data structures. So how far should
> > we really go to fix this issue? Should we even bother to unbind the
> > whole DSA tree, when the sole problem is that we are the DSA master's
> > upper, and that is keeping a reference on it?
> > 
> > Answer: Well, any solution that does unnecessary data structure teardown
> > only delays the reboot for nothing. Lino's patch just bluntly calls
> > dsa_tree_teardown() from the switch .shutdown method, and this leaks
> > memory, namely dst->ports. But does this really matter? Nope, so let's
> > extrapolate. In this case, IMO, the simplest possible solution would be
> > to patch bcmgenet to not unregister the net device. Then treat every
> > other DSA master driver in the same way as they come, one by one.
> > Do you need to unregister_netdevice() at shutdown? No. Then don't.
> > Is it nice? Probably not, but I'm not seeing alternatives.
> > 
> > Also, unless I'm missing something, Lino probably still sees the WARN_ON
> > in bcmgenet's unregister_netdevice() about eth0 getting unregistered
> > while having an upper interface. If not, it's by sheer luck that the DSA
> > switch's ->shutdown gets called before bcmgenet's ->shutdown. But for
> > this reason, it isn't a great solution either. If the device links can't
> > guarantee us some sort of shutdown ordering (what we ideally want, as
> > mentioned, is for the DSA switch driver to get _unbound_ (->remove)
> > before the DSA master gets unbound or shut down).
> 
> I forgot about this, for completeness:
> 
> Question 8: Ok, so this is an even more lightweight variant of question 6.
> To patch device_shutdown here:
> 
>  		if (dev->bus && dev->bus->shutdown) {
>  			if (initcall_debug)
>  				dev_info(dev, "shutdown\n");
>  			dev->bus->shutdown(dev);
>  		} else if (dev->driver && dev->driver->shutdown) {
>  			if (initcall_debug)
>  				dev_info(dev, "shutdown\n");
>  			dev->driver->shutdown(dev);
> +		} else {
> +			__device_release_driver(dev, parent);
>  		}
> 
> would go towards helping DSA in general, but it wouldn't help the situation at hand,
> and it would introduce regressions.
> 
> So what about patching bcmgenet (and other drivers) to implement .shutdown in the following way:
> 
> 	device_release_driver(&pdev->dev);
> 
> basically this should force-unbind the driver from the device, which
> would quite nicely make the device link go into action and make DSA
> unbind too.
> 
> Answer: device_release_driver calls device_lock(dev), and device_shutdown
> also holds that lock when it calls our ->shutdown method. So unless the
> device core would be so nice so as to provide a generic shutdown method
> that just unbinds the device using an unlocked version of device_release_driver,
> we are back to square one with this solution. Anything that needs to patch
> the device core is more or less disqualified, especially for a bug fix.

Question 9: can Lino's patch set be generalized to all DSA switches,
i.e. can all DSA drivers redirect their ->shutdown method to their
->remove method?

Answer: Apart from the fact that mdio_driver_register() does not provide
for a ->shutdown() method, which can be trivially addressed, we still
have issues. The fundamental problem is that some bus device drivers
implement ->shutdown as ->remove for their own device, see dspi_shutdown()
and dspi_remove() for an example.
When that happens, the DSA switch device attached to that bus will be
(a) unbound from its driver once (by dspi_shutdown -> dspi_remove ->
spi_unregister_controller -> device_for_each_child(...unregister)), and
(b) shut down once (by its own ->shutdown method).

With the "ds" structure being naturally destroyed by the first call to
the ->remove function, a second call to ->remove, this time from the
device's ->shutdown path, cannot succeed without tripping over some NULL
pointer. In short, DSA is not designed to support an unbalanced number
of calls to dsa_register_switch and dsa_unregister_switch.

In fact, if Lino's ksz9897 switch was attached to the dspi driver for an
SPI controller, even his own patches would not work and would result in
the imbalance of calls that I mentioned earlier. If we're going to fix it,
let's think of something that covers all cases at least.

Simply put, it looks like we need some guidance (again) from driver core
maintainers as to what a ->shutdown method should _not_ do. Specifically,
whether it is okay to redirect ->shutdown to ->remove. It looks like in the
case of buses doing that, and child devices doing that too, the results
are not quite sane. And maybe that would give us some clue as to what to
do about the genet driver which does the same thing.
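
To make the imbalance concrete, here is a minimal user-space sketch in C.
It is not kernel code and not what any of the patches under discussion do:
every name in it (toy_device, toy_probe, toy_teardown, toy_remove,
toy_shutdown) is hypothetical. It only models a child device whose
teardown can be reached twice, once through the bus unbinding it and once
through its own ->shutdown, and shows one possible way to make the second
call harmless by clearing the driver data on the first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-switch private data ("ds"). */
struct toy_switch {
	int holds_master_ref;	/* 1 while the master is still referenced */
};

/* Hypothetical stand-in for a device with driver data attached. */
struct toy_device {
	struct toy_switch *drvdata;
	int teardown_runs;	/* how many times real teardown work ran */
};

/* Model of probe: allocate "ds" and take the master reference. */
int toy_probe(struct toy_device *dev)
{
	struct toy_switch *ds;

	assert(dev != NULL);
	ds = calloc(1, sizeof(*ds));
	if (!ds)
		return -1;
	ds->holds_master_ref = 1;
	dev->drvdata = ds;
	return 0;
}

/* Shared teardown: a NULL drvdata means the other path already ran. */
void toy_teardown(struct toy_device *dev)
{
	struct toy_switch *ds = dev->drvdata;

	if (!ds)
		return;
	ds->holds_master_ref = 0;	/* drop the master reference */
	free(ds);
	dev->drvdata = NULL;		/* makes a second call a no-op */
	dev->teardown_runs++;
}

/* Reached when the bus unregisters its children. */
void toy_remove(struct toy_device *dev)
{
	toy_teardown(dev);
}

/* Reached from the device's own shutdown callback. */
void toy_shutdown(struct toy_device *dev)
{
	toy_teardown(dev);
}
```

Calling toy_remove() followed by toy_shutdown() on the same device runs
the teardown work only once in this model. Without the NULL check, the
second call would dereference freed data, which is the failure mode
described above.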
Saravana Kannan Sept. 10, 2021, 1:32 a.m. UTC | #21
On Thu, Sep 9, 2021 at 9:00 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> +Saravana,
>
> On 9/9/2021 8:47 AM, Vladimir Oltean wrote:
> > On Thu, Sep 09, 2021 at 03:19:52PM +0200, Lino Sanfilippo wrote:
> >>> Do you see similar things on your 5.10 kernel?
> >>
> >> For the master device I see
> >>
> >> lrwxrwxrwx 1 root root 0 Sep  9 14:10 /sys/class/net/eth0/device/consumer:spi:spi3.0 -> ../../../virtual/devlink/platform:fd580000.ethernet--spi:spi3.0
> >
> > So this is the worst of the worst, we have a device link but it doesn't help.
> >
> > Where the device link helps is here:
> >
> > __device_release_driver
> >       while (device_links_busy(dev))
> >               device_links_unbind_consumers(dev);
> >
> > but during dev_shutdown, device_links_unbind_consumers does not get called
> > (actually I am not even sure whether it should).
> >
> > I've reproduced your issue by making this very simple change:
> >
> > diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> > index 60d94e0a07d6..ec00f34cac47 100644
> > --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> > +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> > @@ -1372,6 +1372,7 @@ static struct pci_driver enetc_pf_driver = {
> >       .id_table = enetc_pf_id_table,
> >       .probe = enetc_pf_probe,
> >       .remove = enetc_pf_remove,
> > +     .shutdown = enetc_pf_remove,
> >   #ifdef CONFIG_PCI_IOV
> >       .sriov_configure = enetc_sriov_configure,
> >   #endif
> >
> > on my DSA master driver. This is what the genet driver has "special".
> >
> > I was led into grave error by Documentation/driver-api/device_link.rst,
> > which I've based my patch on, where it clearly says that device links
> > are supposed to help with shutdown ordering (how?!).
>
> I was also under the impression that device links were supposed to help
> with shutdown ordering, because it does matter a lot. One thing that I
> had to work before (and seems like it came back recently) is the
> shutdown ordering between gpio_keys.c and the GPIO controller. If you
> suspend the GPIO controller first, gpio_keys.c never gets a chance to
> keep the GPIO pin configured for a wake-up interrupt, therefore no
> wake-up event happens on key presses, whoops.

This is more of a Rafael question. Adding him. I haven't looked too
closely at device links and shutdown.

-Saravana

>
> >
> > So the question is, why did my DSA trees get torn down on shutdown?
> > Basically the short answer is that my SPI controller driver does
> > implement .shutdown, and calls the same code path as the .remove code,
> > which calls spi_unregister_controller which removes all SPI children..
> >
> > When I added this device link, one of the main objectives was to not
> > modify all DSA drivers. I was certain based on the documentation that
> > device links would help, now I'm not so sure anymore.
> >
> > So what happens is that the DSA master attempts to unregister its net
> > device on .shutdown, but DSA does not implement .shutdown, so it just
> > sits there holding a reference (supposedly via dev_hold, but where from?!)
> > to the master, which makes netdev_wait_allrefs to wait and wait.
>
> It's not coming from of_find_net_device_by_node() that's for sure and
> with OF we don't go through the code path calling
> dsa_dev_to_net_device() which does call dev_hold() and then shortly
> thereafter the caller calls dev_put() anyway.
>
> >
> > I need more time for the denial phase to pass, and to understand what
> > can actually be done. I will also be away from the keyboard for the next
> > few days, so it might take a while. Your patches obviously offer a
> > solution only for KSZ switches, we need something more general. If I
> > understand your solution, it works not by virtue of there being any
> > shutdown ordering guarantee at all, but simply due to the fact that
> > DSA's .shutdown hook gets called eventually, and the reference to the
> > master gets freed eventually, which unblocks the unregister_netdevice
> > call from the master. I don't yet understand why DSA holds a long-term
> > reference to the master, that's one thing I need to figure out.
> >
>
> Agreed.
> --
> Florian
Florian Fainelli Sept. 10, 2021, 2:15 a.m. UTC | #22
On 9/9/2021 3:54 PM, Vladimir Oltean wrote:
[snip]
> Question 6: How about a patch on the device core that is more lightweight?
> Wouldn't it be sensible for device_shutdown() to just call ->remove if
> the device's bus has no ->shutdown, and the device's driver doesn't have
> a ->shutdown either?
> 
> Answer: This would sometimes work, the vast majority of DSA switch
> drivers, and Ethernet controllers (in this case used as DSA masters) do
> not have a .shutdown method implemented. But their bus does: PCI does,
> SPI controllers do, most of the time. So it would work for limited
> scenarios, but would be ineffective in the general sense.

Having wondered about that question as well, I don't really see a 
compelling reason as to why we do not default to calling .remove() when 
.shutdown() is not implemented. In almost all of the cases the semantics 
of .remove() are superior to those required by .shutdown().

> 
> Question 7: I said that .shutdown, as opposed to .remove, doesn't really
> care so much about the integrity of data structures. So how far should
> we really go to fix this issue? Should we even bother to unbind the
> whole DSA tree, when the sole problem is that we are the DSA master's
> upper, and that is keeping a reference on it?
> 
> Answer: Well, any solution that does unnecessary data structure teardown
> only delays the reboot for nothing. Lino's patch just bluntly calls
> dsa_tree_teardown() from the switch .shutdown method, and this leaks
> memory, namely dst->ports. But does this really matter? Nope, so let's
> extrapolate. In this case, IMO, the simplest possible solution would be
> to patch bcmgenet to not unregister the net device. Then treat every
> other DSA master driver in the same way as they come, one by one.
> Do you need to unregister_netdevice() at shutdown? No. Then don't.
> Is it nice? Probably not, but I'm not seeing alternatives.

It does not really scale but we also don't have that many DSA masters to 
support, I believe I can name them all: bcmgenet, stmmac, bcmsysport, 
enetc, mv643xx_eth, cpsw, macb. If you want me to patch bcmgenet, give 
me a few days to test and make sure there is no power management 
regression, that's the primary concern I have.

> 
> Also, unless I'm missing something, Lino probably still sees the WARN_ON
> in bcmgenet's unregister_netdevice() about eth0 getting unregistered
> while having an upper interface. If not, it's by sheer luck that the DSA
> switch's ->shutdown gets called before bcmgenet's ->shutdown. But for
> this reason, it isn't a great solution either. If the device links can't
> guarantee us some sort of shutdown ordering (what we ideally want, as
> mentioned, is for the DSA switch driver to get _unbound_ (->remove)
> before the DSA master gets unbound or shut down).
> 

All of your questions are good and I don't have answers to any of them, 
however I would like you and others to reason about .shutdown() not just 
in the context of a reboot, or kexec'd kernel but also in the context of 
putting the system into ACPI S5 (via poweroff). In that case the goal is 
not only to quiesce the device, the goal is also to put it in a low 
power mode.

For bcmgenet specifically the code path that leads to a driver remove is 
well tested and guarantees that the network device is unregistered, thus 
putting the PHY into suspend, shutting down DMAs, turning off clocks. 
This is a big hammer, but it gets the job done and does not introduce 
yet another code path to test, it's the same as the module removal.
Andrew Lunn Sept. 10, 2021, 11:51 a.m. UTC | #23
> It does not really scale but we also don't have that many DSA masters to
> support, I believe I can name them all: bcmgenet, stmmac, bcmsysport, enetc,
> mv643xx_eth, cpsw, macb.

fec, mvneta, mvpp2, i210/igb.

     Andrew
Vladimir Oltean Sept. 10, 2021, 2:58 p.m. UTC | #24
On Fri, Sep 10, 2021 at 01:51:56PM +0200, Andrew Lunn wrote:
> > It does not really scale but we also don't have that many DSA masters to
> > support, I believe I can name them all: bcmgenet, stmmac, bcmsysport, enetc,
> > mv643xx_eth, cpsw, macb.
> 
> fec, mvneta, mvpp2, i210/igb.

I can probably double that list with Freescale/NXP Ethernet drivers
alone, some of which are not even submitted to mainline. To name some
mainline drivers: gianfar, dpaa-eth, dpaa2-eth, dpaa2-switch, ucc_geth.
Also consider that DSA/switchdev drivers can also be DSA masters of
their own, we have boards doing that too.

Anyway, I've decided to at least try and accept the fact that DSA
masters will unregister their net_device on shutdown, and attempt to do
something sane for all DSA switches in that case.

Attached are two patches (they are fairly big so I won't paste them
inline, and I would like initial feedback before posting them to the
list).

As mentioned in those patches, the shutdown ordering guarantee is still
very important, I still have no clue what goes on there, what we need to
do, etc.
Vladimir Oltean Sept. 11, 2021, 11:44 a.m. UTC | #25
On Fri, Sep 10, 2021 at 05:58:52PM +0300, Vladimir Oltean wrote:
> On Fri, Sep 10, 2021 at 01:51:56PM +0200, Andrew Lunn wrote:
> > > It does not really scale but we also don't have that many DSA masters to
> > > support, I believe I can name them all: bcmgenet, stmmac, bcmsysport, enetc,
> > > mv643xx_eth, cpsw, macb.
> > 
> > fec, mvneta, mvpp2, i210/igb.
> 
> I can probably double that list only with Freescale/NXP Ethernet
> drivers, some of which are not even submitted to mainline. To name some
> mainline drivers: gianfar, dpaa-eth, dpaa2-eth, dpaa2-switch, ucc_geth.
> Also consider that DSA/switchdev drivers can also be DSA masters of
> their own, we have boards doing that too.
> 
> Anyway, I've decided to at least try and accept the fact that DSA
> masters will unregister their net_device on shutdown, and attempt to do
> something sane for all DSA switches in that case.
> 
> Attached are two patches (they are fairly big so I won't paste them
> inline, and I would like initial feedback before posting them to the
> list).
> 
> As mentioned in those patches, the shutdown ordering guarantee is still
> very important, I still have no clue what goes on there, what we need to
> do, etc.

So to answer my own question, there is a comment above device_link_add:

 * A side effect of the link creation is re-ordering of dpm_list and the
 * devices_kset list by moving the consumer device and all devices depending
 * on it to the ends of these lists (that does not happen to devices that have
 * not been registered when this function is called).

so the fact that DSA uses device_link_add towards its master is not
exactly for nothing. device_shutdown() walks devices_kset from the back,
so this is our guarantee that DSA's shutdown happens before the master's
shutdown.

So these patches should be okay. Any other comments? If not, I will
formally submit them tomorrow towards the net tree.
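
The ordering argument can be pictured with a small user-space model in C.
This is only a sketch under simplifying assumptions, not the driver core:
the names (model_list, model_register, model_link_add,
model_shutdown_order) are made up, and the model moves only the consumer
itself to the end of the list, whereas the comment quoted above says the
real code also moves every device depending on it.

```c
#include <assert.h>
#include <string.h>

#define MODEL_MAX_DEVS 8

/* Toy stand-in for the ordered list that shutdown walks. */
struct model_list {
	const char *names[MODEL_MAX_DEVS];
	int count;
};

/* Registration appends the device at the end of the list. */
void model_register(struct model_list *l, const char *name)
{
	assert(l->count < MODEL_MAX_DEVS);
	l->names[l->count++] = name;
}

/* Return the position of a device, or -1 if it is not registered. */
int model_index(const struct model_list *l, const char *name)
{
	for (int i = 0; i < l->count; i++)
		if (strcmp(l->names[i], name) == 0)
			return i;
	return -1;
}

/* Side effect of link creation: move the consumer to the end. */
void model_link_add(struct model_list *l, const char *consumer)
{
	int idx = model_index(l, consumer);
	const char *moved;

	if (idx < 0)
		return;		/* unregistered devices are not reordered */
	moved = l->names[idx];
	for (int i = idx; i < l->count - 1; i++)
		l->names[i] = l->names[i + 1];
	l->names[l->count - 1] = moved;
}

/* Shutdown walks the list from the back; record the visiting order. */
int model_shutdown_order(const struct model_list *l, const char **out)
{
	int n = 0;

	for (int i = l->count - 1; i >= 0; i--)
		out[n++] = l->names[i];
	return n;
}
```

If "switch" is registered before "master" and then linked as a consumer,
the model visits "switch" first and "master" second during shutdown,
which is the property relied upon above.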
Vladimir Oltean Sept. 12, 2021, 8:29 p.m. UTC | #26
On Sun, Sep 12, 2021 at 10:19:24PM +0200, Lino Sanfilippo wrote:
> 
> Hi,
> 
> On 10.09.21 at 16:58, Vladimir Oltean wrote:
> > On Fri, Sep 10, 2021 at 01:51:56PM +0200, Andrew Lunn wrote:
> >>> It does not really scale but we also don't have that many DSA masters to
> >>> support, I believe I can name them all: bcmgenet, stmmac, bcmsysport, enetc,
> >>> mv643xx_eth, cpsw, macb.
> >>
> >> fec, mvneta, mvpp2, i210/igb.
> >
> > I can probably double that list only with Freescale/NXP Ethernet
> > drivers, some of which are not even submitted to mainline. To name some
> > mainline drivers: gianfar, dpaa-eth, dpaa2-eth, dpaa2-switch, ucc_geth.
> > Also consider that DSA/switchdev drivers can also be DSA masters of
> > their own, we have boards doing that too.
> >
> > Anyway, I've decided to at least try and accept the fact that DSA
> > masters will unregister their net_device on shutdown, and attempt to do
> > something sane for all DSA switches in that case.
> >
> > Attached are two patches (they are fairly big so I won't paste them
> > inline, and I would like initial feedback before posting them to the
> > list).
> >
> > As mentioned in those patches, the shutdown ordering guarantee is still
> > very important, I still have no clue what goes on there, what we need to
> > do, etc.
> >
> 
> I tested these patches with my 5.10 kernel (based on Gregs 5.10.27 stable
> kernel) and while I do not see the message "unregister_netdevice: waiting
> for eth0 to become free. Usage count = 2." any more the shutdown/reboot hangs, too.
> After a few attempts without any error messages on the console I was able to get a
>  stack trace. Something still seems to go wrong in bcm2835_spi_shutdown() (see attachment).
> I have not had the time yet to investigate this further (or to test the patches
>  with a newer kernel).

Could you post the full kernel output? The picture you've posted is
truncated and only shows a WARN_ON in rpi_firmware_transaction and is
probably a symptom and not the issue (which is above and not shown).
Lino Sanfilippo Sept. 13, 2021, 10:32 a.m. UTC | #27
Hi,

> Sent: Sunday, September 12, 2021 at 22:29
> From: "Vladimir Oltean" <olteanv@gmail.com>
> To: "Lino Sanfilippo" <LinoSanfilippo@gmx.de>
> Cc: "Andrew Lunn" <andrew@lunn.ch>, "Florian Fainelli" <f.fainelli@gmail.com>, "Saravana Kannan" <saravanak@google.com>, "Rafael J. Wysocki" <rafael@kernel.org>, p.rosenberger@kunbus.com, woojung.huh@microchip.com, UNGLinuxDriver@microchip.com, vivien.didelot@gmail.com, davem@davemloft.net, kuba@kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 0/3] Fix for KSZ DSA switch shutdown
>
> On Sun, Sep 12, 2021 at 10:19:24PM +0200, Lino Sanfilippo wrote:
> >
> > Hi,
> >
> > I tested these patches with my 5.10 kernel (based on Gregs 5.10.27 stable
> > kernel) and while I do not see the message "unregister_netdevice: waiting
> > for eth0 to become free. Usage count = 2." any more the shutdown/reboot hangs, too.
> > After a few attempts without any error messages on the console I was able to get a
> >  stack trace. Something still seems to go wrong in bcm2835_spi_shutdown() (see attachment).
> > I have not had the time yet to investigate this further (or to test the patches
> >  with a newer kernel).
>
> Could you post the full kernel output? The picture you've posted is
> truncated and only shows a WARN_ON in rpi_firmware_transaction and is
> probably a symptom and not the issue (which is above and not shown).
>

Unfortunately I don't see anything in the kernel log. The console output is all I get,
that's why I took the photo.

Regards,
Lino
Vladimir Oltean Sept. 13, 2021, 10:44 a.m. UTC | #28
On Mon, Sep 13, 2021 at 12:32:14PM +0200, Lino Sanfilippo wrote:
> Hi,
> 
> > Sent: Sunday, September 12, 2021 at 22:29
> > From: "Vladimir Oltean" <olteanv@gmail.com>
> > To: "Lino Sanfilippo" <LinoSanfilippo@gmx.de>
> > Cc: "Andrew Lunn" <andrew@lunn.ch>, "Florian Fainelli" <f.fainelli@gmail.com>, "Saravana Kannan" <saravanak@google.com>, "Rafael J. Wysocki" <rafael@kernel.org>, p.rosenberger@kunbus.com, woojung.huh@microchip.com, UNGLinuxDriver@microchip.com, vivien.didelot@gmail.com, davem@davemloft.net, kuba@kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH 0/3] Fix for KSZ DSA switch shutdown
> >
> > On Sun, Sep 12, 2021 at 10:19:24PM +0200, Lino Sanfilippo wrote:
> > >
> > > Hi,
> > >
> > > I tested these patches with my 5.10 kernel (based on Gregs 5.10.27 stable
> > > kernel) and while I do not see the message "unregister_netdevice: waiting
> > > for eth0 to become free. Usage count = 2." any more the shutdown/reboot hangs, too.
> > > After a few attempts without any error messages on the console I was able to get a
> > >  stack trace. Something still seems to go wrong in bcm2835_spi_shutdown() (see attachment).
> > > I have not had the time yet to investigate this further (or to test the patches
> > >  with a newer kernel).
> >
> > Could you post the full kernel output? The picture you've posted is
> > truncated and only shows a WARN_ON in rpi_firmware_transaction and is
> > probably a symptom and not the issue (which is above and not shown).
> >
> 
> Unfortunately I dont see anything in the kernel log. The console output is all I get,
> thats why I made the photo.

To clarify, are you saying nothing above this line gets printed? Because
the part of the log you've posted in the picture is pretty much
unworkable:

[   99.375389] [<bf0dc56c>] (bcm2835_spi_shutdown [spi_bcm2835]) from [<c0863ca0>] (platform_drv_shutdown+0x2c/0x30)

How do you access the device's serial console? Use a program with a
scrollback buffer like GNU screen or something.
Lino Sanfilippo Sept. 13, 2021, 11:01 a.m. UTC | #29
> Sent: Monday, September 13, 2021 at 12:44
> From: "Vladimir Oltean" <olteanv@gmail.com>
> To: "Lino Sanfilippo" <LinoSanfilippo@gmx.de>
> Cc: "Andrew Lunn" <andrew@lunn.ch>, "Florian Fainelli" <f.fainelli@gmail.com>, "Saravana Kannan" <saravanak@google.com>, "Rafael J. Wysocki" <rafael@kernel.org>, p.rosenberger@kunbus.com, woojung.huh@microchip.com, UNGLinuxDriver@microchip.com, vivien.didelot@gmail.com, davem@davemloft.net, kuba@kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 0/3] Fix for KSZ DSA switch shutdown
>
> On Mon, Sep 13, 2021 at 12:32:14PM +0200, Lino Sanfilippo wrote:
> > Hi,
> >
> > > Sent: Sunday, September 12, 2021 at 22:29
> > > From: "Vladimir Oltean" <olteanv@gmail.com>
> > > To: "Lino Sanfilippo" <LinoSanfilippo@gmx.de>
> > > Cc: "Andrew Lunn" <andrew@lunn.ch>, "Florian Fainelli" <f.fainelli@gmail.com>, "Saravana Kannan" <saravanak@google.com>, "Rafael J. Wysocki" <rafael@kernel.org>, p.rosenberger@kunbus.com, woojung.huh@microchip.com, UNGLinuxDriver@microchip.com, vivien.didelot@gmail.com, davem@davemloft.net, kuba@kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
> > > Subject: Re: [PATCH 0/3] Fix for KSZ DSA switch shutdown
> > >
> > > On Sun, Sep 12, 2021 at 10:19:24PM +0200, Lino Sanfilippo wrote:
> > > >
> > > > Hi,
> > > >
> > > > I tested these patches with my 5.10 kernel (based on Gregs 5.10.27 stable
> > > > kernel) and while I do not see the message "unregister_netdevice: waiting
> > > > for eth0 to become free. Usage count = 2." any more the shutdown/reboot hangs, too.
> > > > After a few attempts without any error messages on the console I was able to get a
> > > >  stack trace. Something still seems to go wrong in bcm2835_spi_shutdown() (see attachment).
> > > > I have not had the time yet to investigate this further (or to test the patches
> > > >  with a newer kernel).
> > >
> > > Could you post the full kernel output? The picture you've posted is
> > > truncated and only shows a WARN_ON in rpi_firmware_transaction and is
> > > probably a symptom and not the issue (which is above and not shown).
> > >
> >
> > Unfortunately I dont see anything in the kernel log. The console output is all I get,
> > thats why I made the photo.
>
> To clarify, are you saying nothing above this line gets printed? Because
> the part of the log you've posted in the picture is pretty much
> unworkable:
>
> [   99.375389] [<bf0dc56c>] (bcm2835_spi_shutdown [spi_bcm2835]) from [<c0863ca0>] (platform_drv_shutdown+0x2c/0x30)
>
> How do you access the device's serial console? Use a program with a
> scrollback buffer like GNU screen or something.
>

Ah no, this is not over a serial console. This is what I see via HDMI. I do not
have a working serial connection yet. Sorry, I know this part of the trace is
not very useful, I will try to get a full dump.
Vladimir Oltean Sept. 14, 2021, 6:48 p.m. UTC | #30
On Mon, Sep 13, 2021 at 01:01:20PM +0200, Lino Sanfilippo wrote:
> > > > Could you post the full kernel output? The picture you've posted is
> > > > truncated and only shows a WARN_ON in rpi_firmware_transaction and is
> > > > probably a symptom and not the issue (which is above and not shown).
> > > >
> > >
> > > Unfortunately I dont see anything in the kernel log. The console output is all I get,
> > > thats why I made the photo.
> >
> > To clarify, are you saying nothing above this line gets printed? Because
> > the part of the log you've posted in the picture is pretty much
> > unworkable:
> >
> > [   99.375389] [<bf0dc56c>] (bcm2835_spi_shutdown [spi_bcm2835]) from [<c0863ca0>] (platform_drv_shutdown+0x2c/0x30)
> >
> > How do you access the device's serial console? Use a program with a
> > scrollback buffer like GNU screen or something.
> >
> 
> Ah no, this is not over a serial console. This is what I see via hdmi. I do not have a working serial connection yet.
> Sorry I know this trace part is not very useful, I will try to get a full dump.

Lino, are you going to provide a kernel output so I could look at your new breakage?
If you could set up a pstore logger with a ramoops region, you could
dump the log after the fact. Or if HDMI is all you have, you could use
an HDMI capture card to record it. Or just record the screen you're
looking at, as long as you don't have very shaky hands, whatever...
Lino Sanfilippo Sept. 15, 2021, 5:42 a.m. UTC | #31
Hi,

On 14.09.21 at 20:48, Vladimir Oltean wrote:
> On Mon, Sep 13, 2021 at 01:01:20PM +0200, Lino Sanfilippo wrote:
>>>>> Could you post the full kernel output? The picture you've posted is
>>>>> truncated and only shows a WARN_ON in rpi_firmware_transaction and is
>>>>> probably a symptom and not the issue (which is above and not shown).
>>>>>
>>>>
>>>> Unfortunately I dont see anything in the kernel log. The console output is all I get,
>>>> thats why I made the photo.
>>>
>>> To clarify, are you saying nothing above this line gets printed? Because
>>> the part of the log you've posted in the picture is pretty much
>>> unworkable:
>>>
>>> [   99.375389] [<bf0dc56c>] (bcm2835_spi_shutdown [spi_bcm2835]) from [<c0863ca0>] (platform_drv_shutdown+0x2c/0x30)
>>>
>>> How do you access the device's serial console? Use a program with a
>>> scrollback buffer like GNU screen or something.
>>>
>>
>> Ah no, this is not over a serial console. This is what I see via hdmi. I do not have a working serial connection yet.
>> Sorry I know this trace part is not very useful, I will try to get a full dump.
>
> Lino, are you going to provide a kernel output so I could look at your new breakage?
> If you could set up a pstore logger with a ramoops region, you could
> dump the log after the fact. Or if HDMI is all you have, you could use
> an HDMI capture card to record it. Or just record the screen you're
> looking at, as long as you don't have very shaky hands, whatever...
>

Yes, I will try to get something useful. I have already set up a serial connection
now. I still see the shutdown stopping with your patch but I have not seen the
kernel dump any more. I will try further and provide a dump as soon as I am successful.

Regards,
Lino
Lino Sanfilippo Sept. 18, 2021, 7:37 p.m. UTC | #32
Hi Vladimir,

On 15.09.21 at 07:42, Lino Sanfilippo wrote:
> On 14.09.21 at 20:48, Vladimir Oltean wrote:
>> On Mon, Sep 13, 2021 at 01:01:20PM +0200, Lino Sanfilippo wrote:
>>>>>> Could you post the full kernel output? The picture you've posted is
>>>>>> truncated and only shows a WARN_ON in rpi_firmware_transaction and is
>>>>>> probably a symptom and not the issue (which is above and not shown).
>>>>>>
>>>>>
>>>>> Unfortunately I dont see anything in the kernel log. The console output is all I get,
>>>>> thats why I made the photo.
>>>>
>>>> To clarify, are you saying nothing above this line gets printed? Because
>>>> the part of the log you've posted in the picture is pretty much
>>>> unworkable:
>>>>
>>>> [   99.375389] [<bf0dc56c>] (bcm2835_spi_shutdown [spi_bcm2835]) from [<c0863ca0>] (platform_drv_shutdown+0x2c/0x30)
>>>>
>>>> How do you access the device's serial console? Use a program with a
>>>> scrollback buffer like GNU screen or something.
>>>>
>>>
>>> Ah no, this is not over a serial console. This is what I see via hdmi. I do not have a working serial connection yet.
>>> Sorry I know this trace part is not very useful, I will try to get a full dump.
>>
>> Lino, are you going to provide a kernel output so I could look at your new breakage?
>> If you could set up a pstore logger with a ramoops region, you could
>> dump the log after the fact. Or if HDMI is all you have, you could use
>> an HDMI capture card to record it. Or just record the screen you're
>> looking at, as long as you don't have very shaky hands, whatever...
>>
>
> Yes, I will try to get something useful. I have already set up a serial connection
> now. I still see the shutdown stopping with your patch but I have not seen the
> kernel dump any more. I will try further and provide a dump as soon as I am successful.
>

Sorry for the delay. I was finally able to run some tests and capture a dump via the serial console.
I tested with the latest Raspberry Pi 5.10.y kernel, based on commit
4117cba235d24a7c4630dc38cb55cc80a04f5cf3. With your patches applied I got the following result
at shutdown:

raspberrypi login: [   58.754533] ------------[ cut here ]------------
[   58.760053] kernel BUG at drivers/net/phy/mdio_bus.c:651!
[   58.766361] Internal error: Oops - BUG: 0 [#1] SMP ARM
[   58.772376] Modules linked in: 8021q garp at24 tag_ksz tpm_tis_spi ksz9477_spi tpm_tis_core ksz9477 ksz_common tpm rts
[   58.837539] CPU: 3 PID: 1 Comm: systemd-shutdow Tainted: G         C        5.10.63-RP_PURE_510_VLADFIX+ #3
[   58.848388] Hardware name: BCM2711
[   58.852875] PC is at mdiobus_free+0x4c/0x50
[   58.858143] LR is at devm_mdiobus_free+0x1c/0x20
[   58.863853] pc : [<c08c9218>]    lr : [<c08c1898>]    psr: 80000013
[   58.871212] sp : c18fdc38  ip : c18fdc48  fp : c18fdc44
[   58.877505] r10: 00000000  r9 : c0867104  r8 : c18fdc5c
[   58.883823] r7 : 00000013  r6 : c31c8000  r5 : c3a50000  r4 : c379db80
[   58.891442] r3 : c2ab4000  r2 : 00000002  r1 : c379dbc0  r0 : c2ab4000
[   58.899037] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   58.907297] Control: 30c5383d  Table: 03ac92c0  DAC: 55555555
[   58.914139] Process systemd-shutdow (pid: 1, stack limit = 0xff8113c1)
[   58.921774] Stack: (0xc18fdc38 to 0xc18fe000)
[   58.927285] dc20:                                                       c18fdc54 c18fdc48
[   58.936601] dc40: c08c1898 c08c91d8 c18fdc94 c18fdc58 c0866dac c08c1888 c31c819c c3527180
[   58.945921] dc60: c332d200 c1405048 c32f8800 c31c8000 00000000 bf191010 00000000 c32f8800
[   58.955289] dc80: c1095f3c c1aa6454 c18fdcac c18fdc98 c086715c c0866be8 c31c8000 00000000
[   58.964644] dca0: c18fdccc c18fdcb0 c0862c7c c0867128 c1a42e30 c31c8000 c14f7cf0 00000000
[   58.974018] dcc0: c18fdcdc c18fdcd0 c0862d40 c0862b68 c18fdcfc c18fdce0 c08613dc c0862d2c
[   58.983391] dce0: c31c8000 00000a68 c08ba6cc 00000000 c18fdd44 c18fdd00 c085c710 c086130c
[   58.992778] dd00: c0331394 c0332604 60000013 c18fdd74 c3656294 c1405048 c31c8000 c31c8000
[   59.002140] dd20: 00000000 c08ba6cc c160657c c155c018 c1095f3c c1aa6454 c18fdd5c c18fdd48
[   59.011521] dd40: c08ba6a8 c085c58c 00000000 00000000 c18fdd6c c18fdd60 c08ba6e4 c08ba670
[   59.020921] dd60: c18fdd9c c18fdd70 c085bc84 c08ba6d8 c18fdd8c c3656200 c3656394 c1405048
[   59.030334] dd80: c18fdda4 c32f8800 c32f8800 00000003 c18fddbc c18fdda0 c08bab7c c085bc20
[   59.039737] dda0: c32f8b80 c32f8800 00000000 c160657c c18fdddc c18fddc0 bf182554 c08bab4c
[   59.049164] ddc0: c1aa6400 c1a6e810 c1aa6410 c160657c c18fddf4 c18fdde0 bf1825a8 bf18252c
[   59.058602] dde0: c1aa6414 c1a6e810 c18fde04 c18fddf8 c0863dec bf182598 c18fde3c c18fde08
[   59.068057] de00: c085fd9c c0863dcc c18fde3c c1095f2c c024865c 00000000 00000000 620bef00
[   59.077487] de20: c140f510 fee1dead c18fc000 00000058 c18fde4c c18fde40 c0249c84 c085fc0c
[   59.086920] de40: c18fde64 c18fde50 c0249d74 c0249c4c 01234567 00000000 c18fdf94 c18fde68
[   59.096386] de60: c024a018 c0249d64 c18fded4 c31b0c00 00000024 c18fdf58 00000005 c0441cec
[   59.105852] de80: c18fdec4 c18fde90 c0441b30 c049852c 00000000 c18fdea0 c073ad04 00000024
[   59.115330] dea0: c31b0c00 c18fdf58 c18fded4 c31b0c00 00000005 00000000 c18fdf4c c18fdec8
[   59.124821] dec0: c0441cec c0425cb0 c18fded0 c18fded4 00000000 00000005 00000000 00000024
[   59.134317] dee0: c18fdeec 00000005 c0200074 bec45250 00000004 bec45f62 00000010 bec45264
[   59.143792] df00: 00000005 bec4531c 0000000a b6d10040 00000001 c0200e70 ffffe000 c1546a80
[   59.153282] df20: 00000000 c0467268 c18fdf4c c1405048 c31b0c00 bec4528c 00000000 00000000
[   59.162787] df40: c18fdf94 c18fdf50 c0441e6c c0441c50 00000000 00000000 00000000 00000000
[   59.172269] df60: c18fdf94 c1405048 c0331394 c1405048 bec4531c 00000000 00000000 00000000
[   59.181763] df80: 00000058 c0200204 c18fdfa4 c18fdf98 c024a16c c0249f10 00000000 c18fdfa8
[   59.191250] dfa0: c0200040 c024a160 00000000 00000000 fee1dead 28121969 01234567 620bef00
[   59.200735] dfc0: 00000000 00000000 00000000 00000058 00000fff bec45be8 00000000 00476b80
[   59.210245] dfe0: 00488e3c bec45b68 004734a8 b6e4ca38 60000010 fee1dead 00000000 00000000
[   59.219759] Backtrace:
[   59.223546] [<c08c91cc>] (mdiobus_free) from [<c08c1898>] (devm_mdiobus_free+0x1c/0x20)
[   59.232909] [<c08c187c>] (devm_mdiobus_free) from [<c0866dac>] (release_nodes+0x1d0/0x220)
[   59.242551] [<c0866bdc>] (release_nodes) from [<c086715c>] (devres_release_all+0x40/0x60)
[   59.252132]  r10:c1aa6454 r9:c1095f3c r8:c32f8800 r7:00000000 r6:bf191010 r5:00000000
[   59.261338]  r4:c31c8000
[   59.265239] [<c086711c>] (devres_release_all) from [<c0862c7c>] (device_release_driver_internal+0x120/0x1c4)
[   59.276479]  r5:00000000 r4:c31c8000
[   59.281440] [<c0862b5c>] (device_release_driver_internal) from [<c0862d40>] (device_release_driver+0x20/0x24)
[   59.292802]  r7:00000000 r6:c14f7cf0 r5:c31c8000 r4:c1a42e30
[   59.299900] [<c0862d20>] (device_release_driver) from [<c08613dc>] (bus_remove_device+0xdc/0x108)
[   59.310267] [<c0861300>] (bus_remove_device) from [<c085c710>] (device_del+0x190/0x428)
[   59.319748]  r7:00000000 r6:c08ba6cc r5:00000a68 r4:c31c8000
[   59.326896] [<c085c580>] (device_del) from [<c08ba6a8>] (spi_unregister_device+0x44/0x68)
[   59.336583]  r10:c1aa6454 r9:c1095f3c r8:c155c018 r7:c160657c r6:c08ba6cc r5:00000000
[   59.345924]  r4:c31c8000
[   59.349971] [<c08ba664>] (spi_unregister_device) from [<c08ba6e4>] (__unregister+0x18/0x20)
[   59.359870]  r5:00000000 r4:00000000
[   59.364972] [<c08ba6cc>] (__unregister) from [<c085bc84>] (device_for_each_child+0x70/0xb4)
[   59.374899] [<c085bc14>] (device_for_each_child) from [<c08bab7c>] (spi_unregister_controller+0x3c/0x128)
[   59.385979]  r6:00000003 r5:c32f8800 r4:c32f8800
[   59.392086] [<c08bab40>] (spi_unregister_controller) from [<bf182554>] (bcm2835_spi_remove+0x34/0x6c [spi_bcm2835])
[   59.404000]  r7:c160657c r6:00000000 r5:c32f8800 r4:c32f8b80
[   59.411084] [<bf182520>] (bcm2835_spi_remove [spi_bcm2835]) from [<bf1825a8>] (bcm2835_spi_shutdown+0x1c/0x38 [spi_bc)
[   59.423755]  r7:c160657c r6:c1aa6410 r5:c1a6e810 r4:c1aa6400
[   59.430847] [<bf18258c>] (bcm2835_spi_shutdown [spi_bcm2835]) from [<c0863dec>] (platform_drv_shutdown+0x2c/0x30)
[   59.442613]  r5:c1a6e810 r4:c1aa6414
[   59.447635] [<c0863dc0>] (platform_drv_shutdown) from [<c085fd9c>] (device_shutdown+0x19c/0x24c)
[   59.457932] [<c085fc00>] (device_shutdown) from [<c0249c84>] (kernel_restart_prepare+0x44/0x48)
[   59.468135]  r10:00000058 r9:c18fc000 r8:fee1dead r7:c140f510 r6:620bef00 r5:00000000
[   59.477470]  r4:00000000
[   59.481509] [<c0249c40>] (kernel_restart_prepare) from [<c0249d74>] (kernel_restart+0x1c/0x60)
[   59.491653] [<c0249d58>] (kernel_restart) from [<c024a018>] (__do_sys_reboot+0x114/0x1f8)
[   59.501359]  r5:00000000 r4:01234567
[   59.506447] [<c0249f04>] (__do_sys_reboot) from [<c024a16c>] (sys_reboot+0x18/0x1c)
[   59.515628]  r8:c0200204 r7:00000058 r6:00000000 r5:00000000 r4:00000000
[   59.523857] [<c024a154>] (sys_reboot) from [<c0200040>] (ret_fast_syscall+0x0/0x28)
[   59.533038] Exception stack(0xc18fdfa8 to 0xc18fdff0)
[   59.539607] dfa0:                   00000000 00000000 fee1dead 28121969 01234567 620bef00
[   59.549318] dfc0: 00000000 00000000 00000000 00000058 00000fff bec45be8 00000000 00476b80
[   59.559026] dfe0: 00488e3c bec45b68 004734a8 b6e4ca38
[   59.565596] Code: ebfe49f5 e89da800 ebed72a3 e89da800 (e7f001f2)
[   59.573246] ---[ end trace 7d800ce7b5664bb6 ]---
[   59.579413] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   59.588634] Rebooting in 10 seconds..

The source line in question (line 651 of drivers/net/phy/mdio_bus.c) is, in my case:

void mdiobus_free(struct mii_bus *bus)
{
	/* For compatibility with error handling in drivers. */
	if (bus->state == MDIOBUS_ALLOCATED) {
		kfree(bus);
		return;
	}

651<	BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
	bus->state = MDIOBUS_RELEASED;

	put_device(&bus->dev);
}
EXPORT_SYMBOL(mdiobus_free);

I tested with both versions of your patchset, with the same result. I also tested
with a Raspberry Pi 5.14 kernel (the latest RP kernel), but for some reason I did not
see the original issue (i.e. the system hang) there.

I then tried to get the net-next kernel running on my system, but without success so far. So for
now the result with the RP 5.10 kernel is all I can offer. I hope that helps a bit nevertheless.

Regards,
Lino
Vladimir Oltean Sept. 18, 2021, 10:04 p.m. UTC | #33
On Sat, Sep 18, 2021 at 09:37:17PM +0200, Lino Sanfilippo wrote:
> I tested with both versions of your patchset, with the same result. I also tested
> with a RP 5.14 kernel (the latest RP kernel) but I did not see the original issue
> (i.e. the system hang) here for some reason.
> 
> I then tried to get the net-next kernel running on my system but without success so far. So for
> now the result with the RP 5.10 is all I can offer. I hope that helps a bit nevertheless.

Thank you Lino, this is a very valuable report. I will send a v3 soon (not sure if today).
Vladimir Oltean Sept. 19, 2021, 12:29 a.m. UTC | #34
On Sun, Sep 19, 2021 at 01:04:12AM +0300, Vladimir Oltean wrote:
> Thank you Lino, this is a very valuable report. I will send a v3 soon (not sure if today).

Actually, no, I will not send a v3, because the fact that devres can now
call devm_mdiobus_free without devm_mdiobus_unregister has nothing to do
with this series and touches much more than DSA. It is an issue introduced by
commit ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()").

I will deal with it separately, the basic idea being that
devm_mdiobus_alloc + plain mdiobus_register is bonkers, and Bartosz
didn't care enough to fix the existing users of that pattern before he
went ahead and made _devm_mdiobus_free stop calling mdiobus_unregister.

In fact, this patch really got me wondering at the time: was the rtl8366
driver really that broken?
https://patchwork.kernel.org/project/netdevbpf/patch/20210822193145.1312668-2-alvin@pqrs.dk/
No, it stems from the exact same patch from Bartosz: devres used to
unregister MDIO buses when freeing them, and now it doesn't.

So while I don't disagree with Bartosz' overall idea, the execution kinda sucks.
In your case, what happens is that your driver's ->shutdown method gets
called, and this (as discussed) means that your driver's ->remove method
will be a no-op (to avoid unregistering DSA structures twice). But since
SPI bus drivers which implement their own ->shutdown as ->remove are a
thing, the fact that you don't run anything on ->remove means you won't
unregister an MDIO bus structure that was allocated under devres. Even
worse, the ksz9477_spi driver does _not_ allocate any MDIO bus structure
under devres, but the DSA core does, it's that pesky ds->slave_mii_bus
for which DSA is too helpful for its own good. Unregistering that MDIO
bus happens only from the dsa_unregister_switch call path, not from
dsa_switch_shutdown. But even if your SPI device driver does a no-op on
->remove, it doesn't mean the device_del on it won't be called. And we
can't stop the devres callbacks from running, which inevitably means
that an MDIO bus structure which is still registered will get freed.
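
To make the sequence above concrete, here is a small user-space model of it.
It is only a sketch: the model_* names and the BUS_* enum are made up for
illustration and are not kernel APIs, and model_free() returns false at the
point where the real mdiobus_free() would hit the BUG_ON quoted earlier in
this thread.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the MDIO bus lifecycle states. */
enum bus_state {
	BUS_ALLOCATED,
	BUS_REGISTERED,
	BUS_UNREGISTERED,
	BUS_RELEASED,
};

struct model_bus {
	enum bus_state state;
};

static void model_alloc(struct model_bus *bus)
{
	bus->state = BUS_ALLOCATED;
}

static void model_register(struct model_bus *bus)
{
	bus->state = BUS_REGISTERED;
}

static void model_unregister(struct model_bus *bus)
{
	if (bus->state == BUS_REGISTERED)
		bus->state = BUS_UNREGISTERED;
}

/* Returns false where the real mdiobus_free() would trip its BUG_ON. */
static bool model_free(struct model_bus *bus)
{
	if (bus->state == BUS_ALLOCATED) {
		/* Never registered: freeing directly is allowed. */
		bus->state = BUS_RELEASED;
		return true;
	}
	if (bus->state != BUS_UNREGISTERED)
		return false;
	bus->state = BUS_RELEASED;
	return true;
}

/* Normal unbind: ->remove unregisters the bus, then devres frees it. */
bool model_remove_path(void)
{
	struct model_bus bus;

	model_alloc(&bus);
	model_register(&bus);
	model_unregister(&bus);
	return model_free(&bus);
}

/*
 * Shutdown path described above: ->shutdown does not unregister the bus,
 * ->remove is a no-op, and devres still frees the bus on device_del.
 */
bool model_shutdown_path(void)
{
	struct model_bus bus;

	model_alloc(&bus);
	model_register(&bus);
	/* ->shutdown ran, ->remove did nothing: bus is still registered */
	return model_free(&bus);
}
```

In this model, model_remove_path() succeeds while model_shutdown_path()
reports the failed check, which corresponds to the oops Lino posted.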

My personal conclusion is that either you go with devres all the way
(devm_mdiobus_alloc + devm_mdiobus_register) or with no devres all the
way (mdiobus_alloc + mdiobus_register), otherwise you've just signed up
to learning things you never really wanted to learn. The pattern of
devres for the mdiobus_alloc and then no devres for the mdiobus_register
is very old, which explains why devm_mdiobus_register was only added
very recently (between kernel v5.7 and v5.8). It used to be fine in the
sense that it worked, but now it's completely broken. Some poor soul
needs to audit that pattern across the whole kernel after Bartosz'
patch, and it looks like that somebody is me...
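
The difference between the two pairings can be sketched with a toy model of
a devres action list, assuming only that managed release actions run in
reverse order of registration. Everything below (model_devres, release_free,
release_unregister and the two scenario functions) is hypothetical
illustration code, not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ACTIONS 4

struct model_bus {
	bool registered;
	bool freed;
	bool freed_while_registered;
};

typedef void (*release_fn)(struct model_bus *bus);

/* Toy devres list: actions are released last-in, first-out. */
struct model_devres {
	release_fn actions[MAX_ACTIONS];
	size_t count;
};

static void model_devres_add(struct model_devres *dr, release_fn fn)
{
	if (dr->count < MAX_ACTIONS)
		dr->actions[dr->count++] = fn;
}

static void model_devres_release_all(struct model_devres *dr,
				     struct model_bus *bus)
{
	while (dr->count > 0)
		dr->actions[--dr->count](bus);
}

static void release_free(struct model_bus *bus)
{
	if (bus->registered)
		bus->freed_while_registered = true;
	bus->freed = true;
}

static void release_unregister(struct model_bus *bus)
{
	bus->registered = false;
}

/* Managed alloc + plain register, and ->remove never unregisters. */
bool mixed_pairing_frees_registered_bus(void)
{
	struct model_devres dr = { .count = 0 };
	struct model_bus bus = { 0 };

	model_devres_add(&dr, release_free);	/* managed alloc */
	bus.registered = true;			/* plain register */
	model_devres_release_all(&dr, &bus);
	return bus.freed_while_registered;
}

/* Managed alloc + managed register: unregister runs before free. */
bool full_devres_pairing_frees_registered_bus(void)
{
	struct model_devres dr = { .count = 0 };
	struct model_bus bus = { 0 };

	model_devres_add(&dr, release_free);		/* managed alloc */
	bus.registered = true;
	model_devres_add(&dr, release_unregister);	/* managed register */
	model_devres_release_all(&dr, &bus);
	return bus.freed_while_registered;
}
```

With the mixed pairing the only managed action is the free, so a skipped
manual unregister leaves a registered bus to be freed. With both steps
managed, the unregister action was added last and therefore runs first.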