Message ID | 20240105085242.1471050-1-claudiu.beznea.uj@bp.renesas.com (mailing list archive) |
---|---|
State | Accepted |
Commit | e398822c4751017fe401f57409488f5948d12fb5 |
Delegated to: | Netdev Maintainers |
Series | [net] net: phy: micrel: populate .soft_reset for KSZ9131 |

On Fri, Jan 05, 2024 at 10:52:42AM +0200, Claudiu wrote:
> The order of PHY-related operations in ravb_open() is as follows:
> ravb_open() ->
>   ravb_phy_start() ->
>     ravb_phy_init() ->
>       of_phy_connect() ->
>         phy_connect_direct() ->
>           phy_attach_direct() ->
>             phy_init_hw() ->
>               phydev->drv->soft_reset()
>               phydev->drv->config_init()
>               phydev->drv->config_intr()
>             phy_resume()
>               kszphy_resume()
>
> The order of PHY-related operations in ravb_close is as follows:
> ravb_close() ->
>   phy_stop() ->
>     phy_suspend() ->
>       kszphy_suspend() ->
>         genphy_suspend()
>           // set BMCR_PDOWN bit in MII_BMCR

Andrew,

This looks wrong to me - shouldn't we be resuming the PHY before
attempting to configure it?

Hi Claudiu,

On Fri, 5 Jan 2024 10:52:42 +0200
Claudiu <claudiu.beznea@tuxon.dev> wrote:

> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>
> The RZ/G3S SMARC Module has 2 KSZ9131 PHYs. In this setup, the KSZ9131 PHY
> is used with the ravb Ethernet driver. It has been discovered that when
> bringing the Ethernet interface down/up continuously, e.g., with the
> following sh script:
>
> $ while :; do ifconfig eth0 down; ifconfig eth0 up; done
>
> the link speed and duplex are wrong after interrupting the bring down/up
> operation even though the Ethernet interface is up. To recover from this
> state the following configuration sequence is necessary (executed
> manually):
>
> $ ifconfig eth0 down
> $ ifconfig eth0 up
>
> The behavior has been identified also on the Microchip SAMA7G5-EK board
> which runs the macb driver and uses the same PHY.
>
> The order of PHY-related operations in ravb_open() is as follows:
> ravb_open() ->
>   ravb_phy_start() ->
>     ravb_phy_init() ->
>       of_phy_connect() ->
>         phy_connect_direct() ->
>           phy_attach_direct() ->
>             phy_init_hw() ->
>               phydev->drv->soft_reset()
>               phydev->drv->config_init()
>               phydev->drv->config_intr()
>             phy_resume()
>               kszphy_resume()
>
> The order of PHY-related operations in ravb_close is as follows:
> ravb_close() ->
>   phy_stop() ->
>     phy_suspend() ->
>       kszphy_suspend() ->
>         genphy_suspend()
>           // set BMCR_PDOWN bit in MII_BMCR
>
> In genphy_suspend() setting the BMCR_PDOWN bit in MII_BMCR switches the PHY
> to Software Power-Down (SPD) mode (according to the KSZ9131 datasheet).
> Thus, when opening the interface after it has been previously closed (via
> ravb_close()), the phydev->drv->config_init() and
> phydev->drv->config_intr() reach the KSZ9131 PHY driver via the
> ksz9131_config_init() and kszphy_config_intr() functions.
>
> KSZ9131 specifies that the MII management interface remains operational
> during SPD (Software Power-Down), but (according to the manual):
> - Only access to the standard registers (0 through 31) is supported.
> - Access to MMD address spaces other than MMD address space 1 is possible
>   if the spd_clock_gate_override bit is set.
> - Access to MMD address space 1 is not possible.
>
> The spd_clock_gate_override bit is not used in the KSZ9131 driver.
>
> ksz9131_config_init() configures RGMII delay, pad skews and LEDs by
> accessing MMD registers other than those in address space 1.
>
> The datasheet for the KSZ9131 does not specify what happens if registers
> from an unsupported address space are accessed while the PHY is in SPD.
>
> To fix the issue the .soft_reset method has been instantiated for KSZ9131,
> too. This resets the PHY to the default state before doing any
> configurations to it, thus switching it out of SPD.
>
> Fixes: bff5b4b37372 ("net: phy: micrel: add Microchip KSZ9131 initial driver")
> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> ---
>  drivers/net/phy/micrel.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
> index 08e3915001c3..f31f03dd87dd 100644
> --- a/drivers/net/phy/micrel.c
> +++ b/drivers/net/phy/micrel.c
> @@ -4842,6 +4842,7 @@ static struct phy_driver ksphy_driver[] = {
>  	.flags		= PHY_POLL_CABLE_TEST,
>  	.driver_data	= &ksz9131_type,
>  	.probe		= kszphy_probe,
> +	.soft_reset	= genphy_soft_reset,
>  	.config_init	= ksz9131_config_init,
>  	.config_intr	= kszphy_config_intr,
>  	.config_aneg	= ksz9131_config_aneg,

This looks good to me. Thanks for the detailed analysis,

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Maxime

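As an illustration of the failure mode described in the commit message, here is a minimal C sketch of a toy PHY model. It is not driver code: every `toy_` name is hypothetical, and the assumption that MMD writes are silently dropped during SPD is just one possible behaviour, since the datasheet leaves this case unspecified. The two BMCR bit values follow the standard register layout (bit 0.15 reset, bit 0.11 power-down).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BMCR_RESET 0x8000u /* bit 0.15: soft reset */
#define BMCR_PDOWN 0x0800u /* bit 0.11: power-down */

/* Hypothetical toy model, not the real KSZ9131 register map. */
struct toy_phy {
	uint16_t bmcr;     /* standard register 0 */
	uint16_t mmd2_reg; /* stands in for one MMD register outside MMD 1 */
};

/* Standard registers stay writable in power-down. */
static void toy_write_bmcr(struct toy_phy *phy, uint16_t val)
{
	if (val & BMCR_RESET) {
		/* Soft reset: back to defaults, which also leaves power-down. */
		phy->bmcr = 0;
		phy->mmd2_reg = 0;
		return;
	}
	phy->bmcr = val;
}

/*
 * ASSUMPTION: an MMD write is dropped while the PHY is in SPD.
 * Returns true only if the write took effect.
 */
static bool toy_write_mmd(struct toy_phy *phy, uint16_t val)
{
	if (phy->bmcr & BMCR_PDOWN)
		return false;
	phy->mmd2_reg = val;
	return true;
}

/* Modelled on the close path: set the power-down bit. */
static void toy_suspend(struct toy_phy *phy)
{
	toy_write_bmcr(phy, phy->bmcr | BMCR_PDOWN);
}

/* Modelled on the open path: optional soft reset, then configuration. */
static bool toy_init_hw(struct toy_phy *phy, bool have_soft_reset,
			uint16_t skew)
{
	if (have_soft_reset)
		toy_write_bmcr(phy, BMCR_RESET);
	return toy_write_mmd(phy, skew);
}
```

In this model, calling `toy_suspend()` followed by `toy_init_hw(&phy, false, ...)` leaves the MMD register untouched, while passing `true` lets the write through, which is the effect the patch aims for.
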
On Fri, Jan 05, 2024 at 09:43:22AM +0000, Russell King (Oracle) wrote:
> On Fri, Jan 05, 2024 at 10:52:42AM +0200, Claudiu wrote:
> > The order of PHY-related operations in ravb_open() is as follows:
> > ravb_open() ->
> >   ravb_phy_start() ->
> >     ravb_phy_init() ->
> >       of_phy_connect() ->
> >         phy_connect_direct() ->
> >           phy_attach_direct() ->
> >             phy_init_hw() ->
> >               phydev->drv->soft_reset()
> >               phydev->drv->config_init()
> >               phydev->drv->config_intr()
> >             phy_resume()
> >               kszphy_resume()
> >
> > The order of PHY-related operations in ravb_close is as follows:
> > ravb_close() ->
> >   phy_stop() ->
> >     phy_suspend() ->
> >       kszphy_suspend() ->
> >         genphy_suspend()
> >           // set BMCR_PDOWN bit in MII_BMCR
>
> Andrew,
>
> This looks wrong to me - shouldn't we be resuming the PHY before
> attempting to configure it?

Hummm. The opposite of phy_stop() is phy_start(). So it would be the
logical order to perform the resume as the first action of
phy_start(), not phy_attach_direct().

In phy_connect_direct(), we don't need the PHY to be operational
yet. That happens with phy_start().

The standard says:

  22.2.4.1.5 Power down

  The PHY may be placed in a low-power consumption state by setting
  bit 0.11 to a logic one. Clearing bit 0.11 to zero allows normal
  operation. The specific behavior of a PHY in the power-down state is
  implementation specific. While in the power-down state, the PHY
  shall respond to management transactions.

So i would say this PHY is broken, it's not responding to all
management transactions. So in that respect, Claudiu's fix is correct.

But i also somewhat agree with you, this looks wrong, but in a
different way to how you see it. However, moving the phy_resume() to
phy_start() seems a bit risky. So i'm not sure we should actually do
that.

	Andrew

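For readers unfamiliar with the register notation, "bit 0.11" means bit 11 of management register 0 (BMCR). The C sketch below shows the bit manipulation only. The macro values mirror the ones in the kernel's `include/uapi/linux/mii.h`; the helper names are made up for illustration and are not phylib functions.

```c
#include <assert.h>
#include <stdint.h>

#define MII_BMCR   0x00   /* basic mode control register (register 0) */
#define BMCR_PDOWN 0x0800 /* bit 0.11: power-down */

/* Hypothetical helpers operating on a BMCR value already read from the PHY. */
static inline uint16_t bmcr_power_down(uint16_t bmcr)
{
	return (uint16_t)(bmcr | BMCR_PDOWN);
}

static inline uint16_t bmcr_power_up(uint16_t bmcr)
{
	return (uint16_t)(bmcr & ~BMCR_PDOWN);
}

static inline int bmcr_is_powered_down(uint16_t bmcr)
{
	return (bmcr & BMCR_PDOWN) != 0;
}
```

Only bit 11 changes in either direction; all other BMCR bits (autonegotiation enable, speed selection and so on) are preserved.
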
On Fri, 5 Jan 2024 15:36:29 +0100
Andrew Lunn <andrew@lunn.ch> wrote:

> [...]
> So i would say this PHY is broken, it's not responding to all
> management transactions. So in that respect, Claudiu's fix is correct.
>
> But i also somewhat agree with you, this looks wrong, but in a
> different way to how you see it. However, moving the phy_resume() to
> phy_start() seems a bit risky. So i'm not sure we should actually do
> that.

Looking at other PHYs similar to it like the 9031, the .soft_reset() was
added to fix some similar issues:

Issue: https://lore.kernel.org/netdev/a63ca542-db96-40ed-201d-59c609f565ce@gmail.com/
Fix: https://lore.kernel.org/netdev/6d3b1dce-7633-51a1-0556-97cd03304c2c@gmail.com/

We couldn't get a proper explanation back then. Could it be that they
suffer from the same problem, but that it was more clearly documented
for the 9131?

Maxime

Hi, Andrew, Russell,

On 05.01.2024 16:36, Andrew Lunn wrote:
> [...]
> So i would say this PHY is broken, it's not responding to all
> management transactions. So in that respect, Claudiu's fix is correct.
>
> But i also somewhat agree with you, this looks wrong, but in a
> different way to how you see it. However, moving the phy_resume() to
> phy_start() seems a bit risky. So i'm not sure we should actually do
> that.

It's not clear to me if you both agree with this fix. Could you please let
me know?

Thank you,
Claudiu Beznea

On Wed, Jan 10, 2024 at 03:20:19PM +0200, claudiu beznea wrote:
> Hi, Andrew, Russell,
>
> On 05.01.2024 16:36, Andrew Lunn wrote:
> > [...]
> > But i also somewhat agree with you, this looks wrong, but in a
> > different way to how you see it. However, moving the phy_resume() to
> > phy_start() seems a bit risky. So i'm not sure we should actually do
> > that.
>
> It's not clear to me if you both agree with this fix. Could you please let
> me know?

Hi Claudiu

I think this is a valid workaround for the broken hardware.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

There might be further discussion about if suspend and resume are
being done at the correct time, but i think that is orthogonal.

	Andrew

Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Fri, 5 Jan 2024 10:52:42 +0200 you wrote:
> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
>
> The RZ/G3S SMARC Module has 2 KSZ9131 PHYs. In this setup, the KSZ9131 PHY
> is used with the ravb Ethernet driver. It has been discovered that when
> bringing the Ethernet interface down/up continuously, e.g., with the
> following sh script:
>
> [...]

Here is the summary with links:
  - [net] net: phy: micrel: populate .soft_reset for KSZ9131
    https://git.kernel.org/netdev/net/c/e398822c4751

You are awesome, thank you!

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 08e3915001c3..f31f03dd87dd 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -4842,6 +4842,7 @@ static struct phy_driver ksphy_driver[] = {
 	.flags		= PHY_POLL_CABLE_TEST,
 	.driver_data	= &ksz9131_type,
 	.probe		= kszphy_probe,
+	.soft_reset	= genphy_soft_reset,
 	.config_init	= ksz9131_config_init,
 	.config_intr	= kszphy_config_intr,
 	.config_aneg	= ksz9131_config_aneg,
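
The one-line change takes effect because the soft-reset callback is an optional entry in the driver structure: when it is populated, it runs before `config_init()` in the sequence listed in the commit message. The following C sketch shows that optional-hook pattern in isolation. It is a simplified illustration with hypothetical `toy_` types, not the kernel's `phy_init_hw()`, and it records the call order in a small trace string instead of touching hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_phydev;

/* Hypothetical, reduced stand-in for a PHY driver's callback table. */
struct toy_phy_driver {
	int (*soft_reset)(struct toy_phydev *phydev);  /* optional */
	int (*config_init)(struct toy_phydev *phydev); /* optional */
};

struct toy_phydev {
	const struct toy_phy_driver *drv;
	char trace[32]; /* records which callbacks ran, in order */
};

static int toy_soft_reset(struct toy_phydev *phydev)
{
	strcat(phydev->trace, "R");
	return 0;
}

static int toy_config_init(struct toy_phydev *phydev)
{
	strcat(phydev->trace, "C");
	return 0;
}

/* Run the populated hooks in order: soft reset first, then configuration. */
static int toy_init_hw(struct toy_phydev *phydev)
{
	int ret;

	if (phydev->drv->soft_reset) {
		ret = phydev->drv->soft_reset(phydev);
		if (ret < 0)
			return ret;
	}

	if (phydev->drv->config_init) {
		ret = phydev->drv->config_init(phydev);
		if (ret < 0)
			return ret;
	}

	return 0;
}
```

A driver table that leaves `.soft_reset` unset produces the trace "C", while one that sets it produces "RC", so the reset always precedes the configuration writes.
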