diff mbox series

Broken networking on iMX8QXP after suspend after upgrading from 5.10 to 6.1

Message ID 1f45bdbe-eab1-4e59-8f24-add177590d27@actia.se (mailing list archive)
State RFC
Headers show
Series Broken networking on iMX8QXP after suspend after upgrading from 5.10 to 6.1 | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

John Ernberg Feb. 8, 2024, 1:16 p.m. UTC
Hi,

We just upgraded vendor kernel from 5.10 to 6.1 and ended up with broken
networking on our board  unless we bring the PHY up before the first
suspend of the system.

The link is brought up via external signal, so it is not guaranteed to
have been UP before the first system suspend.

We'd like to run the mainline kernel but we're not in a position to do so
yet. But we hope we can get some advice on this problem anyway.

We have a permanently powered Microchip LAN8700R (microchip_t1.c) connected
to an iMX8QXP (fec), to be able to wake the system via network if the link
is up.

This setup was working fine in 5.10.

The offending commit as far as we could bisect it is:
557d5dc83f68 ("net: fec: use mac-managed PHY PM")
And somewhat:
fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages 
PHY PM")

If the interface has not been brought UP before the system suspends we can
see this in dmesg:

     fec 5b040000.ethernet eth0: MDIO read timeout
     Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: 
dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
     Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: 
error -110

In this state it is impossible to bring the link up before removing all
power from the board and then plugging it in again, since the PHY is
permanently powered.

My understanding here is that since the link has never been UP,
fec_enet_open() has never executed, therefor mac_managed_pm is not true.
This in turn makes us take the normal PM flow.
Likewise in fec_resume() if the interface is not running, the MAC isn't
enabled because on the iMX8QXP the FEC is powered down in the suspend path
but never re-initialized and enabled in the resume path, so the MAC is
powered back up, but still disabled.

Adding the following seems to fix the issue, but I personally don't like
this, because we just allow the non-mac_managed_pm flow to run longer by
enabling the MAC again rather than letting the MAC do the PM as configured
in fec_enet_open().
What would be the correct thing to do here?

--------- >8 ------
  drivers/net/ethernet/freescale/fec_main.c | 2 ++
  1 file changed, 2 insertions(+)


--

Thanks! // John Ernberg

Comments

Wei Fang Feb. 19, 2024, 2:25 a.m. UTC | #1
Hi John,

> -----Original Message-----
> From: John Ernberg <john.ernberg@actia.se>
> Sent: 2024年2月8日 21:17
> To: netdev@vger.kernel.org
> Cc: Jonas Blixt <jonas.blixt@actia.se>; Wei Fang <wei.fang@nxp.com>;
> Shenwei Wang <shenwei.wang@nxp.com>; Clark Wang
> <xiaoning.wang@nxp.com>; Andrew Lunn <andrew@lunn.ch>; Heiner
> Kallweit <hkallweit1@gmail.com>; Russell King <linux@armlinux.org.uk>
> Subject: Broken networking on iMX8QXP after suspend after upgrading from
> 5.10 to 6.1
> 
> Hi,
> 
> We just upgraded vendor kernel from 5.10 to 6.1 and ended up with broken
> networking on our board  unless we bring the PHY up before the first
> suspend of the system.
> 
> The link is brought up via external signal, so it is not guaranteed to have been
> UP before the first system suspend.
> 
> We'd like to run the mainline kernel but we're not in a position to do so yet.
> But we hope we can get some advice on this problem anyway.
> 
> We have a permanently powered Microchip LAN8700R (microchip_t1.c)
> connected to an iMX8QXP (fec), to be able to wake the system via network if
> the link is up.
> 
> This setup was working fine in 5.10.
> 
> The offending commit as far as we could bisect it is:
> 557d5dc83f68 ("net: fec: use mac-managed PHY PM") And somewhat:
> fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages
> PHY PM")
> 
> If the interface has not been brought UP before the system suspends we can
> see this in dmesg:
> 
>      fec 5b040000.ethernet eth0: MDIO read timeout
>      Microchip LAN87xx T1 5b040000.ethernet-1:04: PM:
> dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
>      Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume:
> error -110
> 
> In this state it is impossible to bring the link up before removing all power
> from the board and then plugging it in again, since the PHY is permanently
> powered.
> 
> My understanding here is that since the link has never been UP,
> fec_enet_open() has never executed, therefor mac_managed_pm is not true.
> This in turn makes us take the normal PM flow.
> Likewise in fec_resume() if the interface is not running, the MAC isn't enabled
> because on the iMX8QXP the FEC is powered down in the suspend path but
> never re-initialized and enabled in the resume path, so the MAC is powered
> back up, but still disabled.
> 
> Adding the following seems to fix the issue, but I personally don't like this,
> because we just allow the non-mac_managed_pm flow to run longer by
> enabling the MAC again rather than letting the MAC do the PM as configured
> in fec_enet_open().
> What would be the correct thing to do here?

Sorry for the delayed response.
Have you tried setting mac_management_pm to true after mdiobus registration?
Just like below:

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 47693ba58ea4..449f56eea2e7 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2401,8 +2401,6 @@ static int fec_enet_mii_probe(struct net_device *ndev)
        fep->link = 0;
        fep->full_duplex = 0;

-       phy_dev->mac_managed_pm = true;
-
        phy_attached_info(phy_dev);

        return 0;
@@ -2415,10 +2413,12 @@ static int fec_enet_mii_init(struct platform_device *pdev)
        struct net_device *ndev = platform_get_drvdata(pdev);
        struct fec_enet_private *fep = netdev_priv(ndev);
        bool suppress_preamble = false;
+       struct phy_device *phydev;
        struct device_node *node;
        int err = -ENXIO;
        u32 mii_speed, holdtime;
        u32 bus_freq;
+       int addr;

        /*
         * The i.MX28 dual fec interfaces are not equal.
@@ -2533,6 +2533,13 @@ static int fec_enet_mii_init(struct platform_device *pdev)
                goto err_out_free_mdiobus;
        of_node_put(node);

+       /* find all the PHY devices on the bus and set mac_managed_pm to true*/
+       for (addr = 0; addr < PHY_MAX_ADDR; addr++) {
+               phydev = mdiobus_get_phy(fep->mii_bus, addr);
+               if (phydev)
+                       phydev->mac_managed_pm = true;
+       }
+
        mii_cnt++;

> 
> --------- >8 ------
>   drivers/net/ethernet/freescale/fec_main.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c
> b/drivers/net/ethernet/freescale/fec_main.c
> index a9b61fcf9b5c..6be5f3883582 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -4690,6 +4690,8 @@ static int __maybe_unused fec_resume(struct
> device
> *dev)
>                  ret = fec_restore_mii_bus(ndev);
>                  if (ret < 0)
>                          return ret;
> +       } else {
> +               fec_restart(ndev);
>          }
>          rtnl_unlock();
> 
> --
> 
> Thanks! // John Ernberg
John Ernberg Feb. 28, 2024, 7:59 a.m. UTC | #2
Hi Wei,

On 2/19/24 03:25, Wei Fang wrote:
> Hi John,
> 
>> -----Original Message-----
>> From: John Ernberg <john.ernberg@actia.se>
>> Sent: 2024年2月8日 21:17
>> To: netdev@vger.kernel.org
>> Cc: Jonas Blixt <jonas.blixt@actia.se>; Wei Fang <wei.fang@nxp.com>;
>> Shenwei Wang <shenwei.wang@nxp.com>; Clark Wang
>> <xiaoning.wang@nxp.com>; Andrew Lunn <andrew@lunn.ch>; Heiner
>> Kallweit <hkallweit1@gmail.com>; Russell King <linux@armlinux.org.uk>
>> Subject: Broken networking on iMX8QXP after suspend after upgrading from
>> 5.10 to 6.1
>>
>> Hi,
>>
>> We just upgraded vendor kernel from 5.10 to 6.1 and ended up with broken
>> networking on our board  unless we bring the PHY up before the first
>> suspend of the system.
>>
>> The link is brought up via external signal, so it is not guaranteed to have been
>> UP before the first system suspend.
>>
>> We'd like to run the mainline kernel but we're not in a position to do so yet.
>> But we hope we can get some advice on this problem anyway.
>>
>> We have a permanently powered Microchip LAN8700R (microchip_t1.c)
>> connected to an iMX8QXP (fec), to be able to wake the system via network if
>> the link is up.
>>
>> This setup was working fine in 5.10.
>>
>> The offending commit as far as we could bisect it is:
>> 557d5dc83f68 ("net: fec: use mac-managed PHY PM") And somewhat:
>> fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages
>> PHY PM")
>>
>> If the interface has not been brought UP before the system suspends we can
>> see this in dmesg:
>>
>>       fec 5b040000.ethernet eth0: MDIO read timeout
>>       Microchip LAN87xx T1 5b040000.ethernet-1:04: PM:
>> dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
>>       Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume:
>> error -110
>>
>> In this state it is impossible to bring the link up before removing all power
>> from the board and then plugging it in again, since the PHY is permanently
>> powered.
>>
>> My understanding here is that since the link has never been UP,
>> fec_enet_open() has never executed, therefor mac_managed_pm is not true.
>> This in turn makes us take the normal PM flow.
>> Likewise in fec_resume() if the interface is not running, the MAC isn't enabled
>> because on the iMX8QXP the FEC is powered down in the suspend path but
>> never re-initialized and enabled in the resume path, so the MAC is powered
>> back up, but still disabled.
>>
>> Adding the following seems to fix the issue, but I personally don't like this,
>> because we just allow the non-mac_managed_pm flow to run longer by
>> enabling the MAC again rather than letting the MAC do the PM as configured
>> in fec_enet_open().
>> What would be the correct thing to do here?
> 
> Sorry for the delayed response.

I must equally apologize for the delayed response.

Your patch helped greatly finding the actual root cause of the problem
(which pre-dates 5.10):

f166f890c8f0 ("net: ethernet: fec: Replace interrupt driven MDIO with 
polled IO")

How 5.10 worked for us is a mystery, because a suspend-resume cycle before
link up writes to MII_DATA register before fec_restart() is called, which
restores the MII_SPEED register, triggering the MII_EVENT quirk.

> Have you tried setting mac_management_pm to true after mdiobus registration?
> Just like below:

I have tested your patch and it does fix my issue, with your patch I also
realized a side-effect of mac_managed_pm in the FEC driver. The PHY will
never suspend due to the current implementation of fec_suspend() and
fec_resume().

phy_suspend() and phy_resume() are never called from FEC code.

May I pick up your patch with a signed-off from you? I would like to make
it a small series adding also suspend/resume of the PHY.

If you want to send it yourself instead, please pick up these tags:
Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM")
Closes: 
https://lore.kernel.org/netdev/1f45bdbe-eab1-4e59-8f24-add177590d27@actia.se/
Reported-by: John Ernberg <john.ernberg@actia.se>
Tested-by: John Ernberg <john.ernberg@actia.se>

And then I send a separate patch with yours as a dependency.

Thanks! // John Ernberg
Wei Fang Feb. 28, 2024, 8:34 a.m. UTC | #3
> -----Original Message-----
> From: John Ernberg <john.ernberg@actia.se>
> Sent: 2024年2月28日 15:59
> To: Wei Fang <wei.fang@nxp.com>; netdev@vger.kernel.org
> Cc: Jonas Blixt <jonas.blixt@actia.se>; Shenwei Wang
> <shenwei.wang@nxp.com>; Clark Wang <xiaoning.wang@nxp.com>; Andrew
> Lunn <andrew@lunn.ch>; Heiner Kallweit <hkallweit1@gmail.com>; Russell
> King <linux@armlinux.org.uk>
> Subject: Re: Broken networking on iMX8QXP after suspend after upgrading
> from 5.10 to 6.1
>
> >
> > Sorry for the delayed response.
>
> I must equally apologize for the delayed response.
>
> Your patch helped greatly finding the actual root cause of the problem
> (which pre-dates 5.10):
>
> f166f890c8f0 ("net: ethernet: fec: Replace interrupt driven MDIO with polled
> IO")
>
> How 5.10 worked for us is a mystery, because a suspend-resume cycle before
> link up writes to MII_DATA register before fec_restart() is called, which
> restores the MII_SPEED register, triggering the MII_EVENT quirk.
>
> > Have you tried setting mac_management_pm to true after mdiobus
> registration?
> > Just like below:
>
> I have tested your patch and it does fix my issue, with your patch I also
> realized a side-effect of mac_managed_pm in the FEC driver. The PHY will
> never suspend due to the current implementation of fec_suspend() and
> fec_resume().
>
> phy_suspend() and phy_resume() are never called from FEC code.
>
> May I pick up your patch with a signed-off from you? I would like to make it a
> small series adding also suspend/resume of the PHY.
>

Yes, you can pick up my patch as long as you want. :)

> If you want to send it yourself instead, please pick up these tags:
> Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM")
> Closes:
> https://lore.k/
> ernel.org%2Fnetdev%2F1f45bdbe-eab1-4e59-8f24-add177590d27%40actia.s
> e%2F&data=05%7C02%7Cwei.fang%40nxp.com%7Cb0ae4b334bae44f50d34
> 08dc3833314f%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C63
> 8447039719541704%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwM
> DAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&
> sdata=b2Q4eYhXuNHwRfG7kIIteIgUVtVxxo%2BAwqyyt8OdPiU%3D&reserved
> =0
> Reported-by: John Ernberg <john.ernberg@actia.se>
> Tested-by: John Ernberg <john.ernberg@actia.se>
>
> And then I send a separate patch with yours as a dependency.
>
> Thanks! // John Ernberg
diff mbox series

Patch

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index a9b61fcf9b5c..6be5f3883582 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -4690,6 +4690,8 @@  static int __maybe_unused fec_resume(struct device 
*dev)
                 ret = fec_restore_mii_bus(ndev);
                 if (ret < 0)
                         return ret;
+       } else {
+               fec_restart(ndev);
         }
         rtnl_unlock();