diff mbox series

[net] net: phy: micrel: populate .soft_reset for KSZ9131

Message ID 20240105085242.1471050-1-claudiu.beznea.uj@bp.renesas.com (mailing list archive)
State Accepted
Commit e398822c4751017fe401f57409488f5948d12fb5
Delegated to: Netdev Maintainers
Headers show
Series [net] net: phy: micrel: populate .soft_reset for KSZ9131 | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1113 this patch: 1113
netdev/cc_maintainers success CCed 0 of 0 maintainers
netdev/build_clang success Errors and warnings before: 1140 this patch: 1140
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1140 this patch: 1140
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 7 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Claudiu Jan. 5, 2024, 8:52 a.m. UTC
From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>

The RZ/G3S SMARC Module has 2 KSZ9131 PHYs. In this setup, the KSZ9131 PHY
is used with the ravb Ethernet driver. It has been discovered that when
bringing the Ethernet interface down/up continuously, e.g., with the
following sh script:

$ while :; do ifconfig eth0 down; ifconfig eth0 up; done

the link speed and duplex are wrong after interrupting the bring down/up
operation even though the Ethernet interface is up. To recover from this
state the following configuration sequence is necessary (executed
manually):

$ ifconfig eth0 down
$ ifconfig eth0 up

The behavior has been identified also on the Microchip SAMA7G5-EK board
which runs the macb driver and uses the same PHY.

The order of PHY-related operations in ravb_open() is as follows:
ravb_open() ->
  ravb_phy_start() ->
    ravb_phy_init() ->
      of_phy_connect() ->
        phy_connect_direct() ->
	  phy_attach_direct() ->
	    phy_init_hw() ->
	      phydev->drv->soft_reset()
	      phydev->drv->config_init()
	      phydev->drv->config_intr()
	    phy_resume()
	      kszphy_resume()

The order of PHY-related operations in ravb_close is as follows:
ravb_close() ->
  phy_stop() ->
    phy_suspend() ->
      kszphy_suspend() ->
        genphy_suspend()
	  // set BMCR_PDOWN bit in MII_BMCR

In genphy_suspend() setting the BMCR_PDWN bit in MII_BMCR switches the PHY
to Software Power-Down (SPD) mode (according to the KSZ9131 datasheet).
Thus, when opening the interface after it has been  previously closed (via
ravb_close()), the phydev->drv->config_init() and
phydev->drv->config_intr() reach the KSZ9131 PHY driver via the
ksz9131_config_init() and kszphy_config_intr() functions.

KSZ9131 specifies that the MII management interface remains operational
during SPD (Software Power-Down), but (according to manual):
- Only access to the standard registers (0 through 31) is supported.
- Access to MMD address spaces other than MMD address space 1 is possible
  if the spd_clock_gate_override bit is set.
- Access to MMD address space 1 is not possible.

The spd_clock_gate_override bit is not used in the KSZ9131 driver.

ksz9131_config_init() configures RGMII delay, pad skews and LEDs by
accessesing MMD registers other than those in address space 1.

The datasheet for the KSZ9131 does not specify what happens if registers
from an unsupported address space are accessed while the PHY is in SPD.

To fix the issue the .soft_reset method has been instantiated for KSZ9131,
too. This resets the PHY to the default state before doing any
configurations to it, thus switching it out of SPD.

Fixes: bff5b4b37372 ("net: phy: micrel: add Microchip KSZ9131 initial driver")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
---
 drivers/net/phy/micrel.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Russell King (Oracle) Jan. 5, 2024, 9:43 a.m. UTC | #1
On Fri, Jan 05, 2024 at 10:52:42AM +0200, Claudiu wrote:
> The order of PHY-related operations in ravb_open() is as follows:
> ravb_open() ->
>   ravb_phy_start() ->
>     ravb_phy_init() ->
>       of_phy_connect() ->
>         phy_connect_direct() ->
> 	  phy_attach_direct() ->
> 	    phy_init_hw() ->
> 	      phydev->drv->soft_reset()
> 	      phydev->drv->config_init()
> 	      phydev->drv->config_intr()
> 	    phy_resume()
> 	      kszphy_resume()
> 
> The order of PHY-related operations in ravb_close is as follows:
> ravb_close() ->
>   phy_stop() ->
>     phy_suspend() ->
>       kszphy_suspend() ->
>         genphy_suspend()
> 	  // set BMCR_PDOWN bit in MII_BMCR

Andrew,

This looks wrong to me - shouldn't we be resuming the PHY before
attempting to configure it?
Maxime Chevallier Jan. 5, 2024, 9:43 a.m. UTC | #2
Hi Claudiu,

On Fri,  5 Jan 2024 10:52:42 +0200
Claudiu <claudiu.beznea@tuxon.dev> wrote:

> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> 
> The RZ/G3S SMARC Module has 2 KSZ9131 PHYs. In this setup, the KSZ9131 PHY
> is used with the ravb Ethernet driver. It has been discovered that when
> bringing the Ethernet interface down/up continuously, e.g., with the
> following sh script:
> 
> $ while :; do ifconfig eth0 down; ifconfig eth0 up; done
> 
> the link speed and duplex are wrong after interrupting the bring down/up
> operation even though the Ethernet interface is up. To recover from this
> state the following configuration sequence is necessary (executed
> manually):
> 
> $ ifconfig eth0 down
> $ ifconfig eth0 up
> 
> The behavior has been identified also on the Microchip SAMA7G5-EK board
> which runs the macb driver and uses the same PHY.
> 
> The order of PHY-related operations in ravb_open() is as follows:
> ravb_open() ->
>   ravb_phy_start() ->
>     ravb_phy_init() ->
>       of_phy_connect() ->
>         phy_connect_direct() ->
> 	  phy_attach_direct() ->
> 	    phy_init_hw() ->
> 	      phydev->drv->soft_reset()
> 	      phydev->drv->config_init()
> 	      phydev->drv->config_intr()
> 	    phy_resume()
> 	      kszphy_resume()
> 
> The order of PHY-related operations in ravb_close is as follows:
> ravb_close() ->
>   phy_stop() ->
>     phy_suspend() ->
>       kszphy_suspend() ->
>         genphy_suspend()
> 	  // set BMCR_PDOWN bit in MII_BMCR
> 
> In genphy_suspend() setting the BMCR_PDWN bit in MII_BMCR switches the PHY
> to Software Power-Down (SPD) mode (according to the KSZ9131 datasheet).
> Thus, when opening the interface after it has been  previously closed (via
> ravb_close()), the phydev->drv->config_init() and
> phydev->drv->config_intr() reach the KSZ9131 PHY driver via the
> ksz9131_config_init() and kszphy_config_intr() functions.
> 
> KSZ9131 specifies that the MII management interface remains operational
> during SPD (Software Power-Down), but (according to manual):
> - Only access to the standard registers (0 through 31) is supported.
> - Access to MMD address spaces other than MMD address space 1 is possible
>   if the spd_clock_gate_override bit is set.
> - Access to MMD address space 1 is not possible.
> 
> The spd_clock_gate_override bit is not used in the KSZ9131 driver.
> 
> ksz9131_config_init() configures RGMII delay, pad skews and LEDs by
> accessesing MMD registers other than those in address space 1.
> 
> The datasheet for the KSZ9131 does not specify what happens if registers
> from an unsupported address space are accessed while the PHY is in SPD.
> 
> To fix the issue the .soft_reset method has been instantiated for KSZ9131,
> too. This resets the PHY to the default state before doing any
> configurations to it, thus switching it out of SPD.
> 
> Fixes: bff5b4b37372 ("net: phy: micrel: add Microchip KSZ9131 initial driver")
> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> ---
>  drivers/net/phy/micrel.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
> index 08e3915001c3..f31f03dd87dd 100644
> --- a/drivers/net/phy/micrel.c
> +++ b/drivers/net/phy/micrel.c
> @@ -4842,6 +4842,7 @@ static struct phy_driver ksphy_driver[] = {
>  	.flags		= PHY_POLL_CABLE_TEST,
>  	.driver_data	= &ksz9131_type,
>  	.probe		= kszphy_probe,
> +	.soft_reset	= genphy_soft_reset,
>  	.config_init	= ksz9131_config_init,
>  	.config_intr	= kszphy_config_intr,
>  	.config_aneg	= ksz9131_config_aneg,

This looks good to me. Thanks for the detailed analysis,

Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Maxime
Andrew Lunn Jan. 5, 2024, 2:36 p.m. UTC | #3
On Fri, Jan 05, 2024 at 09:43:22AM +0000, Russell King (Oracle) wrote:
> On Fri, Jan 05, 2024 at 10:52:42AM +0200, Claudiu wrote:
> > The order of PHY-related operations in ravb_open() is as follows:
> > ravb_open() ->
> >   ravb_phy_start() ->
> >     ravb_phy_init() ->
> >       of_phy_connect() ->
> >         phy_connect_direct() ->
> > 	  phy_attach_direct() ->
> > 	    phy_init_hw() ->
> > 	      phydev->drv->soft_reset()
> > 	      phydev->drv->config_init()
> > 	      phydev->drv->config_intr()
> > 	    phy_resume()
> > 	      kszphy_resume()
> > 
> > The order of PHY-related operations in ravb_close is as follows:
> > ravb_close() ->
> >   phy_stop() ->
> >     phy_suspend() ->
> >       kszphy_suspend() ->
> >         genphy_suspend()
> > 	  // set BMCR_PDOWN bit in MII_BMCR
> 
> Andrew,
> 
> This looks wrong to me - shouldn't we be resuming the PHY before
> attempting to configure it?

Hummm. The opposite of phy_stop() is phy_start(). So it would be the
logical order to perform the resume as the first action of
phy_start(), not phy_attach_direct().

In phy_connect_direct(), we don't need the PHY to be operational
yet. That happens with phy_start().

The standard says:

  22.2.4.1.5 Power down

  The PHY may be placed in a low-power consumption state by setting
  bit 0.11 to a logic one. Clearing bit 0.11 to zero allows normal
  operation. The specific behavior of a PHY in the power-down state is
  implementation specific. While in the power-down state, the PHY
  shall respond to management transactions.

So i would say this PHY is broken, its not responding to all
management transactions. So in that respect, Claudiu fix is correct.

But i also somewhat agree with you, this looks wrong, but in a
different way to how you see it. However, moving the phy_resume() to
phy_start() seems a bit risky. So i'm not sure we should actually do
that.

	Andrew
Maxime Chevallier Jan. 5, 2024, 3:53 p.m. UTC | #4
On Fri, 5 Jan 2024 15:36:29 +0100
Andrew Lunn <andrew@lunn.ch> wrote:

> On Fri, Jan 05, 2024 at 09:43:22AM +0000, Russell King (Oracle) wrote:
> > On Fri, Jan 05, 2024 at 10:52:42AM +0200, Claudiu wrote:  
> > > The order of PHY-related operations in ravb_open() is as follows:
> > > ravb_open() ->
> > >   ravb_phy_start() ->
> > >     ravb_phy_init() ->
> > >       of_phy_connect() ->
> > >         phy_connect_direct() ->
> > > 	  phy_attach_direct() ->
> > > 	    phy_init_hw() ->
> > > 	      phydev->drv->soft_reset()
> > > 	      phydev->drv->config_init()
> > > 	      phydev->drv->config_intr()
> > > 	    phy_resume()
> > > 	      kszphy_resume()
> > > 
> > > The order of PHY-related operations in ravb_close is as follows:
> > > ravb_close() ->
> > >   phy_stop() ->
> > >     phy_suspend() ->
> > >       kszphy_suspend() ->
> > >         genphy_suspend()
> > > 	  // set BMCR_PDOWN bit in MII_BMCR  
> > 
> > Andrew,
> > 
> > This looks wrong to me - shouldn't we be resuming the PHY before
> > attempting to configure it?  
> 
> Hummm. The opposite of phy_stop() is phy_start(). So it would be the
> logical order to perform the resume as the first action of
> phy_start(), not phy_attach_direct().
> 
> In phy_connect_direct(), we don't need the PHY to be operational
> yet. That happens with phy_start().
> 
> The standard says:
> 
>   22.2.4.1.5 Power down
> 
>   The PHY may be placed in a low-power consumption state by setting
>   bit 0.11 to a logic one. Clearing bit 0.11 to zero allows normal
>   operation. The specific behavior of a PHY in the power-down state is
>   implementation specific. While in the power-down state, the PHY
>   shall respond to management transactions.
> 
> So i would say this PHY is broken, its not responding to all
> management transactions. So in that respect, Claudiu fix is correct.
> 
> But i also somewhat agree with you, this looks wrong, but in a
> different way to how you see it. However, moving the phy_resume() to
> phy_start() seems a bit risky. So i'm not sure we should actually do
> that.

Looking at other PHYs similar to it like the 9031, the .soft_reset()
was added to fix some similar issues :

Issue :
https://lore.kernel.org/netdev/a63ca542-db96-40ed-201d-59c609f565ce@gmail.com/

Fix :
https://lore.kernel.org/netdev/6d3b1dce-7633-51a1-0556-97cd03304c2c@gmail.com/

We couldn't get a proper explanation back then. Could it be that they
suffer from the same problem, but that it was more clearly documented
for the 9131 ?

Maxime
Claudiu Jan. 10, 2024, 1:20 p.m. UTC | #5
Hi, Andrew, Russell,

On 05.01.2024 16:36, Andrew Lunn wrote:
> On Fri, Jan 05, 2024 at 09:43:22AM +0000, Russell King (Oracle) wrote:
>> On Fri, Jan 05, 2024 at 10:52:42AM +0200, Claudiu wrote:
>>> The order of PHY-related operations in ravb_open() is as follows:
>>> ravb_open() ->
>>>   ravb_phy_start() ->
>>>     ravb_phy_init() ->
>>>       of_phy_connect() ->
>>>         phy_connect_direct() ->
>>> 	  phy_attach_direct() ->
>>> 	    phy_init_hw() ->
>>> 	      phydev->drv->soft_reset()
>>> 	      phydev->drv->config_init()
>>> 	      phydev->drv->config_intr()
>>> 	    phy_resume()
>>> 	      kszphy_resume()
>>>
>>> The order of PHY-related operations in ravb_close is as follows:
>>> ravb_close() ->
>>>   phy_stop() ->
>>>     phy_suspend() ->
>>>       kszphy_suspend() ->
>>>         genphy_suspend()
>>> 	  // set BMCR_PDOWN bit in MII_BMCR
>>
>> Andrew,
>>
>> This looks wrong to me - shouldn't we be resuming the PHY before
>> attempting to configure it?
> 
> Hummm. The opposite of phy_stop() is phy_start(). So it would be the
> logical order to perform the resume as the first action of
> phy_start(), not phy_attach_direct().
> 
> In phy_connect_direct(), we don't need the PHY to be operational
> yet. That happens with phy_start().
> 
> The standard says:
> 
>   22.2.4.1.5 Power down
> 
>   The PHY may be placed in a low-power consumption state by setting
>   bit 0.11 to a logic one. Clearing bit 0.11 to zero allows normal
>   operation. The specific behavior of a PHY in the power-down state is
>   implementation specific. While in the power-down state, the PHY
>   shall respond to management transactions.
> 
> So i would say this PHY is broken, its not responding to all
> management transactions. So in that respect, Claudiu fix is correct.
> 
> But i also somewhat agree with you, this looks wrong, but in a
> different way to how you see it. However, moving the phy_resume() to
> phy_start() seems a bit risky. So i'm not sure we should actually do
> that.

It's not clear to me if you both agree with this fix. Could you please let
me know?

Thank you,
Claudiu Beznea

> 
> 	Andrew
Andrew Lunn Jan. 12, 2024, 1:30 a.m. UTC | #6
On Wed, Jan 10, 2024 at 03:20:19PM +0200, claudiu beznea wrote:
> Hi, Andrew, Russell,
> 
> On 05.01.2024 16:36, Andrew Lunn wrote:
> > On Fri, Jan 05, 2024 at 09:43:22AM +0000, Russell King (Oracle) wrote:
> >> On Fri, Jan 05, 2024 at 10:52:42AM +0200, Claudiu wrote:
> >>> The order of PHY-related operations in ravb_open() is as follows:
> >>> ravb_open() ->
> >>>   ravb_phy_start() ->
> >>>     ravb_phy_init() ->
> >>>       of_phy_connect() ->
> >>>         phy_connect_direct() ->
> >>> 	  phy_attach_direct() ->
> >>> 	    phy_init_hw() ->
> >>> 	      phydev->drv->soft_reset()
> >>> 	      phydev->drv->config_init()
> >>> 	      phydev->drv->config_intr()
> >>> 	    phy_resume()
> >>> 	      kszphy_resume()
> >>>
> >>> The order of PHY-related operations in ravb_close is as follows:
> >>> ravb_close() ->
> >>>   phy_stop() ->
> >>>     phy_suspend() ->
> >>>       kszphy_suspend() ->
> >>>         genphy_suspend()
> >>> 	  // set BMCR_PDOWN bit in MII_BMCR
> >>
> >> Andrew,
> >>
> >> This looks wrong to me - shouldn't we be resuming the PHY before
> >> attempting to configure it?
> > 
> > Hummm. The opposite of phy_stop() is phy_start(). So it would be the
> > logical order to perform the resume as the first action of
> > phy_start(), not phy_attach_direct().
> > 
> > In phy_connect_direct(), we don't need the PHY to be operational
> > yet. That happens with phy_start().
> > 
> > The standard says:
> > 
> >   22.2.4.1.5 Power down
> > 
> >   The PHY may be placed in a low-power consumption state by setting
> >   bit 0.11 to a logic one. Clearing bit 0.11 to zero allows normal
> >   operation. The specific behavior of a PHY in the power-down state is
> >   implementation specific. While in the power-down state, the PHY
> >   shall respond to management transactions.
> > 
> > So i would say this PHY is broken, its not responding to all
> > management transactions. So in that respect, Claudiu fix is correct.
> > 
> > But i also somewhat agree with you, this looks wrong, but in a
> > different way to how you see it. However, moving the phy_resume() to
> > phy_start() seems a bit risky. So i'm not sure we should actually do
> > that.
> 
> It's not clear to me if you both agree with this fix. Could you please let
> me know?

Hi Claudiu

I think this is a valid workaround for the broken hardware.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

There might be further discussion about if suspend and resume are
being done at the correct time, but i think that is orthogonal.

    Andrew
patchwork-bot+netdevbpf@kernel.org Jan. 12, 2024, 11:10 a.m. UTC | #7
Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Fri,  5 Jan 2024 10:52:42 +0200 you wrote:
> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> 
> The RZ/G3S SMARC Module has 2 KSZ9131 PHYs. In this setup, the KSZ9131 PHY
> is used with the ravb Ethernet driver. It has been discovered that when
> bringing the Ethernet interface down/up continuously, e.g., with the
> following sh script:
> 
> [...]

Here is the summary with links:
  - [net] net: phy: micrel: populate .soft_reset for KSZ9131
    https://git.kernel.org/netdev/net/c/e398822c4751

You are awesome, thank you!
diff mbox series

Patch

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 08e3915001c3..f31f03dd87dd 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -4842,6 +4842,7 @@  static struct phy_driver ksphy_driver[] = {
 	.flags		= PHY_POLL_CABLE_TEST,
 	.driver_data	= &ksz9131_type,
 	.probe		= kszphy_probe,
+	.soft_reset	= genphy_soft_reset,
 	.config_init	= ksz9131_config_init,
 	.config_intr	= kszphy_config_intr,
 	.config_aneg	= ksz9131_config_aneg,