diff mbox series

[RFC] net: phy: micrel: reset KSZ9x31 when resuming

Message ID 20240109205223.40219-1-wsa+renesas@sang-engineering.com (mailing list archive)
State Under Review
Delegated to: Geert Uytterhoeven
Headers show
Series [RFC] net: phy: micrel: reset KSZ9x31 when resuming | expand

Commit Message

Wolfram Sang Jan. 9, 2024, 8:52 p.m. UTC
On a Renesas Ebisu board, the KSZ9031 PHY is stalled after resuming if
the interface has not been brought up before suspending. If it had been
brought up before, phylib ensures that reset is asserted before
suspending. But if it had never been brought up, there is no instance
which could ensure that reset is asserted. And upon resume, the PHY is
in an unknown state without reset being asserted. To bring it back to a
known state, simply reset it when it is about to be resumed.

This likely also helps another issue [1] where a KSZ9131 can be powered
using regulators. After switching power on again in resume, a reset is
also needed.

[1] https://patchwork.kernel.org/project/netdevbpf/patch/20211214121638.138784-4-philippe.schenker@toradex.com/

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---

This is a different solution to a problem I already tried to solve
here[2]. Back then, I added code to the MAC, but I now believe it should
be solved on PHY level. We never saw the problem with other PHYs.
Looking at [1], it seems that KSZ9x31 is also sensitive to being
powered down without reset being asserted. I know it is not a perfect
proof, but I guess these assumptions are all we have.

Philippe, Francesco: do you still have access to machines with this
issue? Could you kindly test if so?

Patch is based on 6.7. Looking forward for comments if this is the
correct layer to tackle the problem. Thanks!


[2] https://lore.kernel.org/all/20230321103357.18940-1-wsa+renesas@sang-engineering.com/

 drivers/net/phy/micrel.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Andrew Lunn Jan. 9, 2024, 9:04 p.m. UTC | #1
> +static int ksz9x31_resume(struct phy_device *phydev)
> +{
> +	phy_device_reset(phydev, 1);
> +	phy_device_reset(phydev, 0);
> +
> +	return kszphy_resume(phydev);
> +}
> +
>  static int kszphy_probe(struct phy_device *phydev)
>  {
>  	const struct kszphy_type *type = phydev->drv->driver_data;
> @@ -4778,7 +4786,7 @@ static struct phy_driver ksphy_driver[] = {
>  	.get_strings	= kszphy_get_strings,
>  	.get_stats	= kszphy_get_stats,
>  	.suspend	= kszphy_suspend,
> -	.resume		= kszphy_resume,
> +	.resume		= ksz9x31_resume,

Humm, i'm not so sure about this.

phy_resume() is called by mdio_bus_phy_resume(). That first does a
call to phy_init_hw(), which will perform a soft reset on the PHY,
call the drivers config_init() callback, and the config_intr()
callback. Then it calls phy_resume().

Does phy_resume() hitting it with a reset clear out the configuration
done by config_init() and the interrupt configuration performed by
config_intr()?

	Andrew
Francesco Dolcini Jan. 9, 2024, 11:28 p.m. UTC | #2
- Philippe, email address is no longer valid.

On Tue, Jan 09, 2024 at 09:52:22PM +0100, Wolfram Sang wrote:
> On a Renesas Ebisu board, the KSZ9031 PHY is stalled after resuming if
> the interface has not been brought up before suspending. If it had been
> brought up before, phylib ensures that reset is asserted before
> suspending. But if it had never been brought up, there is no instance
> which could ensure that reset is asserted. And upon resume, the PHY is
> in an unknown state without reset being asserted. To bring it back to a
> known state, simply reset it when it is about to be resumed.
> 
> This likely also helps another issue [1] where a KSZ9131 can be powered
> using regulators. After switching power on again in resume, a reset is
> also needed.
> 
> [1] https://patchwork.kernel.org/project/netdevbpf/patch/20211214121638.138784-4-philippe.schenker@toradex.com/
> 
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> ---
> 
> This is a different solution to a problem I already tried to solve
> here[2]. Back then, I added code to the MAC, but I now believe it should
> be solved on PHY level. We never saw the problem with other PHYs.
> Looking at [1], it seems that KSZ9x31 is also sensitive to being
> powered down without reset being asserted. I know it is not a perfect
> proof, but I guess these assumptions are all we have.
> 
> Philippe, Francesco: do you still have access to machines with this
> issue? Could you kindly test if so?

I have access, however
 - Philippe is long gone from Toradex and he was the one looking into
   this topic
 - we did solve the issue in a different way, e.g. we no longer
   power-off the phy in suspend

Therefore is not straightforward to provide valuable feedback to you
now.


> 
> Patch is based on 6.7. Looking forward for comments if this is the
> correct layer to tackle the problem. Thanks!
> 
> 
> [2] https://lore.kernel.org/all/20230321103357.18940-1-wsa+renesas@sang-engineering.com/
> 
>  drivers/net/phy/micrel.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
> index 08e3915001c3..c38d7982c06c 100644
> --- a/drivers/net/phy/micrel.c
> +++ b/drivers/net/phy/micrel.c
> @@ -1984,6 +1984,14 @@ static int kszphy_resume(struct phy_device *phydev)
>  	return 0;
>  }
>  
> +static int ksz9x31_resume(struct phy_device *phydev)
> +{
> +	phy_device_reset(phydev, 1);
> +	phy_device_reset(phydev, 0);

Is something like that fine?
Don't we need to reconfigure the ethernet phy completely on resume
if we do reset it? kszphy_config_reset() is taking care of something,
but I think that the phy being reset on resume is not handled
correctly.

Francesco
Wolfram Sang Jan. 17, 2024, 1:33 p.m. UTC | #3
Hi Francesco,

> - Philippe, email address is no longer valid.

OK, thanks for the heads up.

> Therefore is not straightforward to provide valuable feedback to you
> now.

Thanks for answering anyhow.

> > +static int ksz9x31_resume(struct phy_device *phydev)
> > +{
> > +	phy_device_reset(phydev, 1);
> > +	phy_device_reset(phydev, 0);
> 
> Is something like that fine?
> Don't we need to reconfigure the ethernet phy completely on resume
> if we do reset it? kszphy_config_reset() is taking care of something,
> but I think that the phy being reset on resume is not handled
> correctly.

If the interface is up before suspending, then suspend will assert the
reset-line. If the interface is disabled before suspending, then close
will assert the reset line. Deassertion will happen on resume/open.

So, unless I am overlooking something, my code does not really add
something new. It only makes sure that the reset line always gets
asserted before deasserting. Because in the case that the interface has
never been up before, there is no instance which could assert reset. Or,
at least, I could not find one.

Makes sense? Tests work fine here, at least.

All the best,

   Wolfram
Wolfram Sang Jan. 17, 2024, 1:59 p.m. UTC | #4
> Does phy_resume() hitting it with a reset clear out the configuration
> done by config_init() and the interrupt configuration performed by
> config_intr()?

Hmm, I should have answered your mail first. Instrumentation says you
are correct. Back to the drawing board :/
Francesco Dolcini Feb. 6, 2024, 5:26 p.m. UTC | #5
On Wed, Jan 17, 2024 at 02:33:53PM +0100, Wolfram Sang wrote:
> > > +static int ksz9x31_resume(struct phy_device *phydev)
> > > +{
> > > +	phy_device_reset(phydev, 1);
> > > +	phy_device_reset(phydev, 0);
> > 
> > Is something like that fine?
> > Don't we need to reconfigure the ethernet phy completely on resume
> > if we do reset it? kszphy_config_reset() is taking care of something,
> > but I think that the phy being reset on resume is not handled
> > correctly.
> 
> If the interface is up before suspending, then suspend will assert the
> reset-line. If the interface is disabled before suspending, then close
> will assert the reset line. Deassertion will happen on resume/open.
> 
> So, unless I am overlooking something, my code does not really add
> something new. It only makes sure that the reset line always gets
> asserted before deasserting. Because in the case that the interface has
> never been up before, there is no instance which could assert reset. Or,
> at least, I could not find one.
> 
> Makes sense? Tests work fine here, at least.

What I do not know, is what happen to any configuration that was done to
the phy before.

What if you have disabled gigabit ethernet from auto negotiation before
suspend, it will be enabled again after the phy get out of reset.

Is the ethernet phy subsystem taking care of this? Ensuring that this
configuration is restored into the phy?

I was starting to debug something around this a few days ago, with the
difference that in that case the reset was asserted/de-asserted by the
hardware and the end results was not really the expected one ...

Francesco
Andrew Lunn Feb. 6, 2024, 5:57 p.m. UTC | #6
On Tue, Feb 06, 2024 at 06:26:45PM +0100, Francesco Dolcini wrote:
> On Wed, Jan 17, 2024 at 02:33:53PM +0100, Wolfram Sang wrote:
> > > > +static int ksz9x31_resume(struct phy_device *phydev)
> > > > +{
> > > > +	phy_device_reset(phydev, 1);
> > > > +	phy_device_reset(phydev, 0);
> > > 
> > > Is something like that fine?
> > > Don't we need to reconfigure the ethernet phy completely on resume
> > > if we do reset it? kszphy_config_reset() is taking care of something,
> > > but I think that the phy being reset on resume is not handled
> > > correctly.
> > 
> > If the interface is up before suspending, then suspend will assert the
> > reset-line. If the interface is disabled before suspending, then close
> > will assert the reset line. Deassertion will happen on resume/open.
> > 
> > So, unless I am overlooking something, my code does not really add
> > something new. It only makes sure that the reset line always gets
> > asserted before deasserting. Because in the case that the interface has
> > never been up before, there is no instance which could assert reset. Or,
> > at least, I could not find one.
> > 
> > Makes sense? Tests work fine here, at least.
> 
> What I do not know, is what happen to any configuration that was done to
> the phy before.

I'm assuming here WoL was not enabled, so the PHY did actually
suspend.

mdio_bus_phy_suspend() calls phy_stop_machine() which will set the
state of the PHY to UP.

During resume mdio_bus_phy_resume() calls phy_init_hw(). That should
do a soft reset, call the config_init() callback, and configure
interrupts. After that phy_resume() is called and then
phy_state_machine(). Do to setting the state to UP, the state machine
will kick off auto-negotiation, which will cause any auto-neg
parameters to be written to the PHY.

> What if you have disabled gigabit ethernet from auto negotiation before
> suspend, it will be enabled again after the phy get out of reset.

If you have set in fixed mode, the wrongly named phy_config_aneg()
will set the fixed modes, same as it would set the auto-neg modes. So
they should be preserved over suspend/resume.
 
	Andrew
Francesco Dolcini Feb. 6, 2024, 7:09 p.m. UTC | #7
On Tue, Feb 06, 2024 at 06:57:37PM +0100, Andrew Lunn wrote:
> On Tue, Feb 06, 2024 at 06:26:45PM +0100, Francesco Dolcini wrote:
> > On Wed, Jan 17, 2024 at 02:33:53PM +0100, Wolfram Sang wrote:
> > > Makes sense? Tests work fine here, at least.
> > 
> > What I do not know, is what happen to any configuration that was done to
> > the phy before.
> 
> I'm assuming here WoL was not enabled, so the PHY did actually
> suspend.
> 
> mdio_bus_phy_suspend() calls phy_stop_machine() which will set the
> state of the PHY to UP.
> 
> During resume mdio_bus_phy_resume() calls phy_init_hw(). That should
> do a soft reset, call the config_init() callback, and configure
> interrupts. After that phy_resume() is called and then
> phy_state_machine(). Do to setting the state to UP, the state machine
> will kick off auto-negotiation, which will cause any auto-neg
> parameters to be written to the PHY.
> 
> > What if you have disabled gigabit ethernet from auto negotiation before
> > suspend, it will be enabled again after the phy get out of reset.
> 
> If you have set in fixed mode, the wrongly named phy_config_aneg()
> will set the fixed modes, same as it would set the auto-neg modes. So
> they should be preserved over suspend/resume.

Thanks for the detailed explanation Andrew.

What if the configuration was done using ethtool?

Francesco
Andrew Lunn Feb. 6, 2024, 7:19 p.m. UTC | #8
> What if the configuration was done using ethtool?

The MAC driver when handling ethtool requests should call
phy_ethtool_ksettings_set(). That manipulates phydev->advertising,
phydev->autoneg etc. These are the values passed by phy_config_aneg()
to the PHY driver.

   Andrew
diff mbox series

Patch

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 08e3915001c3..c38d7982c06c 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -1984,6 +1984,14 @@  static int kszphy_resume(struct phy_device *phydev)
 	return 0;
 }
 
+static int ksz9x31_resume(struct phy_device *phydev)
+{
+	phy_device_reset(phydev, 1);
+	phy_device_reset(phydev, 0);
+
+	return kszphy_resume(phydev);
+}
+
 static int kszphy_probe(struct phy_device *phydev)
 {
 	const struct kszphy_type *type = phydev->drv->driver_data;
@@ -4778,7 +4786,7 @@  static struct phy_driver ksphy_driver[] = {
 	.get_strings	= kszphy_get_strings,
 	.get_stats	= kszphy_get_stats,
 	.suspend	= kszphy_suspend,
-	.resume		= kszphy_resume,
+	.resume		= ksz9x31_resume,
 	.cable_test_start	= ksz9x31_cable_test_start,
 	.cable_test_get_status	= ksz9x31_cable_test_get_status,
 }, {
@@ -4851,7 +4859,7 @@  static struct phy_driver ksphy_driver[] = {
 	.get_strings	= kszphy_get_strings,
 	.get_stats	= kszphy_get_stats,
 	.suspend	= kszphy_suspend,
-	.resume		= kszphy_resume,
+	.resume		= ksz9x31_resume,
 	.cable_test_start	= ksz9x31_cable_test_start,
 	.cable_test_get_status	= ksz9x31_cable_test_get_status,
 	.get_features	= ksz9477_get_features,