Message ID | 20250402150859.1365568-2-vladimir.oltean@nxp.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v2,net,1/2] net: phy: move phy_link_change() prior to mdio_bus_phy_may_suspend() | expand |
On Wed, Apr 02, 2025 at 06:08:59PM +0300, Vladimir Oltean wrote: > DSA has 2 kinds of drivers: > > 1. Those who call dsa_switch_suspend() and dsa_switch_resume() from > their device PM ops: qca8k-8xxx, bcm_sf2, microchip ksz > 2. Those who don't: all others. The above methods should be optional. > > For type 1, dsa_switch_suspend() calls dsa_user_suspend() -> phylink_stop(), > and dsa_switch_resume() calls dsa_user_resume() -> phylink_start(). > These seem good candidates for setting mac_managed_pm = true because > that is essentially its definition, but that does not seem to be the > biggest problem for now, and is not what this change focuses on. > > Talking strictly about the 2nd category of drivers here, I have noticed > that these also trigger the > > WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY && > phydev->state != PHY_UP); > > from mdio_bus_phy_resume(), because the PHY state machine is running. > > It's running as a result of a previous dsa_user_open() -> ... -> > phylink_start() -> phy_start(), and AFAICS, mdio_bus_phy_suspend() was > supposed to have called phy_stop_machine(), but it didn't. So this is > why the PHY is in state PHY_NOLINK by the time mdio_bus_phy_resume() > runs. > > mdio_bus_phy_suspend() did not call phy_stop_machine() because for > phylink, the phydev->adjust_link function pointer is NULL. This seems a > technicality introduced by commit fddd91016d16 ("phylib: fix PAL state > machine restart on resume"). That commit was written before phylink > existed, and was intended to avoid crashing with consumer drivers which > don't use the PHY state machine - phylink does. I think this is a historical bug (as I believe I previously mentioned, suspend/resume support wasn't tested with phylink as none of the platforms it was developed against had suspend/resume support.) phylink should be no differnet from a MAC that uses phylib and supplies an adjust_link function. The exception is when mac_managed_pm has been set. Reading commit fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") which introduced mac_managed_pm, this flag should be set whenever a MAC driver causes phy_start() to be called via its resume path. That means for a phylink using MAC driver calling either phylink_start() or phylink_resume() from its resume method. Setting that flag will have the effect that, in those cases, mdio_bus_phy_suspend() and mdio_bus_phy_resume() become no-ops. Now, if the MAC driver does not call either of those two phylink functions, then mac_managed_pm should not be set, and the mdio bus PHY suspend/resume should happen in full - because a phylink driver should be no different from a phylib driver that's supplied an adjust_link callback. So yes, I think the principle of your patch is correct, and I agree that this is needed (just coming to the same conclusion from a different direction.) > +static bool phy_uses_state_machine(struct phy_device *phydev) > +{ > + if (phydev->phy_link_change == phy_link_change) > + return phydev->attached_dev && phydev->adjust_link; > + > + return phydev->phy_link_change; I think this can be simplified to: return phydev->phy_link_change != phy_link_change || (phydev->attached_dev && phydev->adjust_link); since phydev->phy_link_change should never be NULL (the hook exists specifically for phylink's usage, and is either set to phy_link_change or phylink_phy_change.) If it were to be NULL, then anything which calls phy_link_(up|down) will oops the kernel - not just the state machine, but cable testing, loopback, and changing EEE settings.
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index bd1aa58720a5..b01123a24283 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -256,6 +256,32 @@ static void phy_link_change(struct phy_device *phydev, bool up) phydev->mii_ts->link_state(phydev->mii_ts, phydev); } +/** + * phy_uses_state_machine - test whether consumer driver uses PAL state machine + * @phydev: the target PHY device structure + * + * Ultimately, this aims to indirectly determine whether the PHY is attached + * to a consumer which uses the state machine by calling phy_start() and + * phy_stop(). + * + * When the PHY driver consumer uses phylib, it must have previously called + * phy_connect_direct() or one of its derivatives, so that phy_prepare_link() + * has set up a hook for monitoring state changes. + * + * When the PHY driver is used by the MAC driver consumer through phylink (the + * only other provider of a phy_link_change() method), using the PHY state + * machine is not optional. + * + * Return: true if consumer calls phy_start() and phy_stop(), false otherwise. + */ +static bool phy_uses_state_machine(struct phy_device *phydev) +{ + if (phydev->phy_link_change == phy_link_change) + return phydev->attached_dev && phydev->adjust_link; + + return phydev->phy_link_change; +} + static bool mdio_bus_phy_may_suspend(struct phy_device *phydev) { struct device_driver *drv = phydev->mdio.dev.driver; @@ -322,7 +348,7 @@ static __maybe_unused int mdio_bus_phy_suspend(struct device *dev) * may call phy routines that try to grab the same lock, and that may * lead to a deadlock. */ - if (phydev->attached_dev && phydev->adjust_link) + if (phy_uses_state_machine(phydev)) phy_stop_machine(phydev); if (!mdio_bus_phy_may_suspend(phydev)) @@ -376,7 +402,7 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev) } } - if (phydev->attached_dev && phydev->adjust_link) + if (phy_uses_state_machine(phydev)) phy_start_machine(phydev); return 0;
DSA has 2 kinds of drivers: 1. Those who call dsa_switch_suspend() and dsa_switch_resume() from their device PM ops: qca8k-8xxx, bcm_sf2, microchip ksz 2. Those who don't: all others. The above methods should be optional. For type 1, dsa_switch_suspend() calls dsa_user_suspend() -> phylink_stop(), and dsa_switch_resume() calls dsa_user_resume() -> phylink_start(). These seem good candidates for setting mac_managed_pm = true because that is essentially its definition, but that does not seem to be the biggest problem for now, and is not what this change focuses on. Talking strictly about the 2nd category of drivers here, I have noticed that these also trigger the WARN_ON(phydev->state != PHY_HALTED && phydev->state != PHY_READY && phydev->state != PHY_UP); from mdio_bus_phy_resume(), because the PHY state machine is running. It's running as a result of a previous dsa_user_open() -> ... -> phylink_start() -> phy_start(), and AFAICS, mdio_bus_phy_suspend() was supposed to have called phy_stop_machine(), but it didn't. So this is why the PHY is in state PHY_NOLINK by the time mdio_bus_phy_resume() runs. mdio_bus_phy_suspend() did not call phy_stop_machine() because for phylink, the phydev->adjust_link function pointer is NULL. This seems a technicality introduced by commit fddd91016d16 ("phylib: fix PAL state machine restart on resume"). That commit was written before phylink existed, and was intended to avoid crashing with consumer drivers which don't use the PHY state machine - phylink does. Make the conditions dependent on the PHY device having a phydev->phy_link_change() implementation equal to the default phy_link_change() provided by phylib. Otherwise, just check that the custom phydev->phy_link_change() has been provided and is non-NULL. Phylink provides phylink_phy_change(). Thus, we will stop the state machine even for phylink-controlled PHYs when using the MDIO bus PM ops. Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") Reported-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> --- v1->v2: - code movement split out - rename phy_has_attached_dev() to phy_uses_state_machine(), provide kernel-doc v1 at: https://lore.kernel.org/netdev/20250225153156.3589072-1-vladimir.oltean@nxp.com/ drivers/net/phy/phy_device.c | 30 ++++++++++++++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-)