diff mbox series

[2/2] net: dsa: microchip: Provide Module 4 KSZ9477 errata (DS80000754C)

Message ID 20230824154827.166274-2-lukma@denx.de (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [1/2] net: dsa: microchip: KSZ9477: Provide functions to access MMD registers | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1332 this patch: 1330
netdev/cc_maintainers success CCed 10 of 10 maintainers
netdev/build_clang success Errors and warnings before: 1356 this patch: 1353
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1355 this patch: 1353
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 52 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Lukasz Majewski Aug. 24, 2023, 3:48 p.m. UTC
The KSZ9477 errata points out the link up/down problem when EEE is enabled
in the device to which the KSZ9477 tries to auto negotiate.

The suggested workaround is to clear advertisement EEE registers
(accessed as per port MMD one).

Signed-off-by: Lukasz Majewski <lukma@denx.de>
---
 drivers/net/dsa/microchip/ksz9477.c | 40 ++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

Comments

Florian Fainelli Aug. 24, 2023, 3:54 p.m. UTC | #1
On 8/24/2023 8:48 AM, Lukasz Majewski wrote:
> The KSZ9477 errata points out the link up/down problem when EEE is enabled
> in the device to which the KSZ9477 tries to auto negotiate.
> 
> The suggested workaround is to clear advertisement EEE registers
> (accessed as per port MMD one).
> 
> Signed-off-by: Lukasz Majewski <lukma@denx.de>
> ---
>   drivers/net/dsa/microchip/ksz9477.c | 40 ++++++++++++++++++++++++++++-
>   1 file changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
> index cb6aa7c668a8..563f497ba656 100644
> --- a/drivers/net/dsa/microchip/ksz9477.c
> +++ b/drivers/net/dsa/microchip/ksz9477.c
> @@ -1128,6 +1128,44 @@ int ksz9477_enable_stp_addr(struct ksz_device *dev)
>   	return 0;
>   }
>   
> +static int ksz9477_errata(struct dsa_switch *ds)
> +{
> +	struct ksz_device *dev = ds->priv;
> +	u16 val;
> +	int p;
> +
> +	/* KSZ9477 Errata DS80000754C
> +	 *
> +	 * Module 4: Energy Efficient Ethernet (EEE) feature select must be
> +	 * manually disabled
> +	 *   The EEE feature is enabled by default, but it is not fully
> +	 *   operational. It must be manually disabled through register
> +	 *   controls. If not disabled, the PHY ports can auto-negotiate
> +	 *   to enable EEE, and this feature can cause link drops when linked
> +	 *   to another device supporting EEE.
> +	 *
> +	 *   Only PHY ports (dsa user) [0-4] need to have the EEE advertisement
> +	 *   bits cleared.
> +	 */
> +
> +	for (p = 0; p < ds->num_ports; p++) {
> +		if (!dsa_is_user_port(ds, p))
> +			continue;
> +
> +		ksz9477_port_mmd_read(dev, p, MMD_DEVICE_ID_EEE_ADV,
> +				      MMD_EEE_ADV, &val, 1);
> +
> +		pr_err("%s: PORT: %d val: 0x%x pc: %d\n", __func__, p, val,
> +		       ds->num_ports);

Left over debugging?

> +
> +		val &= ~(EEE_ADV_100MBIT | EEE_ADV_1GBIT);
> +		ksz9477_port_mmd_write(dev, p, MMD_DEVICE_ID_EEE_ADV,
> +				       MMD_EEE_ADV, &val, 1);
> +	}
> +
> +	return 0;

You don't propagate any error, so make this return void?
Tristram.Ha@microchip.com Aug. 25, 2023, 1:12 a.m. UTC | #2
> +static int ksz9477_errata(struct dsa_switch *ds)
> +{
> +       struct ksz_device *dev = ds->priv;
> +       u16 val;
> +       int p;
> +
> +       /* KSZ9477 Errata DS80000754C
> +        *
> +        * Module 4: Energy Efficient Ethernet (EEE) feature select must be
> +        * manually disabled
> +        *   The EEE feature is enabled by default, but it is not fully
> +        *   operational. It must be manually disabled through register
> +        *   controls. If not disabled, the PHY ports can auto-negotiate
> +        *   to enable EEE, and this feature can cause link drops when linked
> +        *   to another device supporting EEE.
> +        *
> +        *   Only PHY ports (dsa user) [0-4] need to have the EEE advertisement
> +        *   bits cleared.
> +        */
> +
> +       for (p = 0; p < ds->num_ports; p++) {
> +               if (!dsa_is_user_port(ds, p))
> +                       continue;
> +
> +               ksz9477_port_mmd_read(dev, p, MMD_DEVICE_ID_EEE_ADV,
> +                                     MMD_EEE_ADV, &val, 1);
> +
> +               pr_err("%s: PORT: %d val: 0x%x pc: %d\n", __func__, p, val,
> +                      ds->num_ports);
> +
> +               val &= ~(EEE_ADV_100MBIT | EEE_ADV_1GBIT);
> +               ksz9477_port_mmd_write(dev, p, MMD_DEVICE_ID_EEE_ADV,
> +                                      MMD_EEE_ADV, &val, 1);
> +       }
> +
> +       return 0;
> +}
> +
>  int ksz9477_setup(struct dsa_switch *ds)
>  {
>         struct ksz_device *dev = ds->priv;
> @@ -1157,7 +1195,7 @@ int ksz9477_setup(struct dsa_switch *ds)
>         /* enable global MIB counter freeze function */
>         ksz_cfg(dev, REG_SW_MAC_CTRL_6, SW_MIB_COUNTER_FREEZE, true);
> 
> -       return 0;
> +       return ksz9477_errata(ds);
>  }

I would prefer to execute the code in ksz9477_config_cpu_port(), as at
the end there is already a loop to do something to each port.  The check
to disable EEE or not should be dev->info->internal_phy[port], as one of
the user ports can be RGMII or SGMII, which does not have a PHY that can
be accessed inside the switch.

As the EEE register value is simply 6 it should be enough to just set
the register to zero.  If so we do not need to add back those
ksz9477_port_mmd_setup functions and just use ksz_pwrite16() to write to
the MMD register.
Lukasz Majewski Aug. 25, 2023, 7:42 a.m. UTC | #3
On Thu, 24 Aug 2023 08:54:37 -0700
Florian Fainelli <f.fainelli@gmail.com> wrote:

> On 8/24/2023 8:48 AM, Lukasz Majewski wrote:
> > The KSZ9477 errata points out the link up/down problem when EEE is
> > enabled in the device to which the KSZ9477 tries to auto negotiate.
> > 
> > The suggested workaround is to clear advertisement EEE registers
> > (accessed as per port MMD one).
> > 
> > Signed-off-by: Lukasz Majewski <lukma@denx.de>
> > ---
> >   drivers/net/dsa/microchip/ksz9477.c | 40
> > ++++++++++++++++++++++++++++- 1 file changed, 39 insertions(+), 1
> > deletion(-)
> > 
> > diff --git a/drivers/net/dsa/microchip/ksz9477.c
> > b/drivers/net/dsa/microchip/ksz9477.c index
> > cb6aa7c668a8..563f497ba656 100644 ---
> > a/drivers/net/dsa/microchip/ksz9477.c +++
> > b/drivers/net/dsa/microchip/ksz9477.c @@ -1128,6 +1128,44 @@ int
> > ksz9477_enable_stp_addr(struct ksz_device *dev) return 0;
> >   }
> >   
> > +static int ksz9477_errata(struct dsa_switch *ds)
> > +{
> > +	struct ksz_device *dev = ds->priv;
> > +	u16 val;
> > +	int p;
> > +
> > +	/* KSZ9477 Errata DS80000754C
> > +	 *
> > +	 * Module 4: Energy Efficient Ethernet (EEE) feature
> > select must be
> > +	 * manually disabled
> > +	 *   The EEE feature is enabled by default, but it is not
> > fully
> > +	 *   operational. It must be manually disabled through
> > register
> > +	 *   controls. If not disabled, the PHY ports can
> > auto-negotiate
> > +	 *   to enable EEE, and this feature can cause link drops
> > when linked
> > +	 *   to another device supporting EEE.
> > +	 *
> > +	 *   Only PHY ports (dsa user) [0-4] need to have the EEE
> > advertisement
> > +	 *   bits cleared.
> > +	 */
> > +
> > +	for (p = 0; p < ds->num_ports; p++) {
> > +		if (!dsa_is_user_port(ds, p))
> > +			continue;
> > +
> > +		ksz9477_port_mmd_read(dev, p,
> > MMD_DEVICE_ID_EEE_ADV,
> > +				      MMD_EEE_ADV, &val, 1);
> > +
> > +		pr_err("%s: PORT: %d val: 0x%x pc: %d\n",
> > __func__, p, val,
> > +		       ds->num_ports);  
> 
> Left over debugging?
> 

Yes, correct - I will fix it.

> > +
> > +		val &= ~(EEE_ADV_100MBIT | EEE_ADV_1GBIT);
> > +		ksz9477_port_mmd_write(dev, p,
> > MMD_DEVICE_ID_EEE_ADV,
> > +				       MMD_EEE_ADV, &val, 1);
> > +	}
> > +
> > +	return 0;  
> 
> You don't propagate any error, so make this return void?

I will fix this too.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Lukasz Majewski Aug. 25, 2023, 8:39 a.m. UTC | #4
Hi Tristram,

> > +static int ksz9477_errata(struct dsa_switch *ds)
> > +{
> > +       struct ksz_device *dev = ds->priv;
> > +       u16 val;
> > +       int p;
> > +
> > +       /* KSZ9477 Errata DS80000754C
> > +        *
> > +        * Module 4: Energy Efficient Ethernet (EEE) feature select
> > must be
> > +        * manually disabled
> > +        *   The EEE feature is enabled by default, but it is not
> > fully
> > +        *   operational. It must be manually disabled through
> > register
> > +        *   controls. If not disabled, the PHY ports can
> > auto-negotiate
> > +        *   to enable EEE, and this feature can cause link drops
> > when linked
> > +        *   to another device supporting EEE.
> > +        *
> > +        *   Only PHY ports (dsa user) [0-4] need to have the EEE
> > advertisement
> > +        *   bits cleared.
> > +        */
> > +
> > +       for (p = 0; p < ds->num_ports; p++) {
> > +               if (!dsa_is_user_port(ds, p))
> > +                       continue;
> > +
> > +               ksz9477_port_mmd_read(dev, p, MMD_DEVICE_ID_EEE_ADV,
> > +                                     MMD_EEE_ADV, &val, 1);
> > +
> > +               pr_err("%s: PORT: %d val: 0x%x pc: %d\n", __func__,
> > p, val,
> > +                      ds->num_ports);
> > +
> > +               val &= ~(EEE_ADV_100MBIT | EEE_ADV_1GBIT);
> > +               ksz9477_port_mmd_write(dev, p,
> > MMD_DEVICE_ID_EEE_ADV,
> > +                                      MMD_EEE_ADV, &val, 1);
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >  int ksz9477_setup(struct dsa_switch *ds)
> >  {
> >         struct ksz_device *dev = ds->priv;
> > @@ -1157,7 +1195,7 @@ int ksz9477_setup(struct dsa_switch *ds)
> >         /* enable global MIB counter freeze function */
> >         ksz_cfg(dev, REG_SW_MAC_CTRL_6, SW_MIB_COUNTER_FREEZE,
> > true);
> > 
> > -       return 0;
> > +       return ksz9477_errata(ds);
> >  }  
> 
> I would prefer to execute the code in ksz9477_config_cpu_port(), as at
> the end there is already a loop to do something to each port. 

Just some explanation of the taken approach:

1. I've followed already in-mainline code for ksz8795.c
(ksz8_handle_global_errata(ds)) which is executed in ksz8_setup
function. 

2. I do believe, that separate "errata" function would be more
readable, as KSZ9477 has many more erratas to be added.

> The
> check to disable EEE or not should be dev->info->internal_phy[port],
> as one of the user ports can be RGMII or SGMII, which does not have a
> PHY that can be accessed inside the switch.

Yes, this would be better solution. Thanks for the suggestion.

> 
> As the EEE register value is simply 6 it should be enough to just set
> the register to zero.  If so we do not need to add back those
> ksz9477_port_mmd_setup functions and just use ksz_pwrite16() to write
> to the MMD register.
> 

IMHO adding functions to MMD modification would facilitate further
development (for example LED setup).

However, if we only plan to fix this errata, then the MMD access
functions are not required.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Florian Fainelli Aug. 25, 2023, 3:26 p.m. UTC | #5
On 8/25/2023 1:39 AM, Lukasz Majewski wrote:
> Hi Tristram,
> 
>>> +static int ksz9477_errata(struct dsa_switch *ds)
>>> +{
>>> +       struct ksz_device *dev = ds->priv;
>>> +       u16 val;
>>> +       int p;
>>> +
>>> +       /* KSZ9477 Errata DS80000754C
>>> +        *
>>> +        * Module 4: Energy Efficient Ethernet (EEE) feature select
>>> must be
>>> +        * manually disabled
>>> +        *   The EEE feature is enabled by default, but it is not
>>> fully
>>> +        *   operational. It must be manually disabled through
>>> register
>>> +        *   controls. If not disabled, the PHY ports can
>>> auto-negotiate
>>> +        *   to enable EEE, and this feature can cause link drops
>>> when linked
>>> +        *   to another device supporting EEE.
>>> +        *
>>> +        *   Only PHY ports (dsa user) [0-4] need to have the EEE
>>> advertisement
>>> +        *   bits cleared.
>>> +        */
>>> +
>>> +       for (p = 0; p < ds->num_ports; p++) {
>>> +               if (!dsa_is_user_port(ds, p))
>>> +                       continue;
>>> +
>>> +               ksz9477_port_mmd_read(dev, p, MMD_DEVICE_ID_EEE_ADV,
>>> +                                     MMD_EEE_ADV, &val, 1);
>>> +
>>> +               pr_err("%s: PORT: %d val: 0x%x pc: %d\n", __func__,
>>> p, val,
>>> +                      ds->num_ports);
>>> +
>>> +               val &= ~(EEE_ADV_100MBIT | EEE_ADV_1GBIT);
>>> +               ksz9477_port_mmd_write(dev, p,
>>> MMD_DEVICE_ID_EEE_ADV,
>>> +                                      MMD_EEE_ADV, &val, 1);
>>> +       }
>>> +
>>> +       return 0;
>>> +}
>>> +
>>>   int ksz9477_setup(struct dsa_switch *ds)
>>>   {
>>>          struct ksz_device *dev = ds->priv;
>>> @@ -1157,7 +1195,7 @@ int ksz9477_setup(struct dsa_switch *ds)
>>>          /* enable global MIB counter freeze function */
>>>          ksz_cfg(dev, REG_SW_MAC_CTRL_6, SW_MIB_COUNTER_FREEZE,
>>> true);
>>>
>>> -       return 0;
>>> +       return ksz9477_errata(ds);
>>>   }
>>
>> I would prefer to execute the code in ksz9477_config_cpu_port(), as at
>> the end there is already a loop to do something to each port.
> 
> Just some explanation of the taken approach:
> 
> 1. I've followed already in-mainline code for ksz8795.c
> (ksz8_handle_global_errata(ds)) which is executed in ksz8_setup
> function.
> 
> 2. I do believe, that separate "errata" function would be more
> readable, as KSZ9477 has many more erratas to be added.
> 
>> The
>> check to disable EEE or not should be dev->info->internal_phy[port],
>> as one of the user ports can be RGMII or SGMII, which does not have a
>> PHY that can be accessed inside the switch.
> 
> Yes, this would be better solution. Thanks for the suggestion.
> 
>>
>> As the EEE register value is simply 6 it should be enough to just set
>> the register to zero.  If so we do not need to add back those
>> ksz9477_port_mmd_setup functions and just use ksz_pwrite16() to write
>> to the MMD register.
>>
> 
> IMHO adding functions to MMD modification would facilitate further
> development (for example LED setup).

We already have some KSZ9477 specific initialization done in the Micrel 
PHY driver under drivers/net/phy/micrel.c, can we converge on the PHY 
driver which has a reasonable amount of infrastructure for dealing with 
workarounds, indirect or direct MMD accesses etc.?
Tristram.Ha@microchip.com Aug. 25, 2023, 6:48 p.m. UTC | #6
> > IMHO adding functions to MMD modification would facilitate further
> > development (for example LED setup).
> 
> We already have some KSZ9477 specific initialization done in the Micrel
> PHY driver under drivers/net/phy/micrel.c, can we converge on the PHY
> driver which has a reasonable amount of infrastructure for dealing with
> workarounds, indirect or direct MMD accesses etc.?

Actually the internal PHY used in the KSZ9897/KSZ9477/KSZ9893 switches
are special and only used inside those switches.  Putting all the switch
related code in Micrel PHY driver does not really help.  When the switch
is reset all those PHY registers need to be set again, but the PHY driver
only executes those code during PHY initialization.  I do not know if
there is a good way to tell the PHY to re-initialize again.
There is also a typo in one of those registers when the code was moved
to the Micrel PHY driver.  A fix will be needed.
Vladimir Oltean Aug. 26, 2023, 10:49 a.m. UTC | #7
On Fri, Aug 25, 2023 at 06:48:41PM +0000, Tristram.Ha@microchip.com wrote:
> > > IMHO adding functions to MMD modification would facilitate further
> > > development (for example LED setup).
> > 
> > We already have some KSZ9477 specific initialization done in the Micrel
> > PHY driver under drivers/net/phy/micrel.c, can we converge on the PHY
> > driver which has a reasonable amount of infrastructure for dealing with
> > workarounds, indirect or direct MMD accesses etc.?
> 
> Actually the internal PHY used in the KSZ9897/KSZ9477/KSZ9893 switches
> are special and only used inside those switches.  Putting all the switch
> related code in Micrel PHY driver does not really help.  When the switch
> is reset all those PHY registers need to be set again, but the PHY driver
> only executes those code during PHY initialization.  I do not know if
> there is a good way to tell the PHY to re-initialize again.

Suppose there was a method to tell the PHY driver to re-initialize itself.
What would be the key points in which the DSA switch driver would need
to trigger that method? Where is the switch reset at runtime?
Lukasz Majewski Aug. 29, 2023, 8:35 a.m. UTC | #8
Hi Vladimir,

> On Fri, Aug 25, 2023 at 06:48:41PM +0000, Tristram.Ha@microchip.com
> wrote:
> > > > IMHO adding functions to MMD modification would facilitate
> > > > further development (for example LED setup).  
> > > 
> > > We already have some KSZ9477 specific initialization done in the
> > > Micrel PHY driver under drivers/net/phy/micrel.c, can we converge
> > > on the PHY driver which has a reasonable amount of infrastructure
> > > for dealing with workarounds, indirect or direct MMD accesses
> > > etc.?  
> > 
> > Actually the internal PHY used in the KSZ9897/KSZ9477/KSZ9893
> > switches are special and only used inside those switches.  Putting
> > all the switch related code in Micrel PHY driver does not really
> > help.  When the switch is reset all those PHY registers need to be
> > set again, but the PHY driver only executes those code during PHY
> > initialization.  I do not know if there is a good way to tell the
> > PHY to re-initialize again.  
> 
> Suppose there was a method to tell the PHY driver to re-initialize
> itself. What would be the key points in which the DSA switch driver
> would need to trigger that method? Where is the switch reset at
> runtime?

Tristam has explained why adding the internal switch PHY errata to
generic PHY code is not optimal.

If adding MMD generic code is a problem - then I'm fine with just
clearing proper bits with just two indirect writes in the
drivers/net/dsa/microchip/ksz9477.c

I would also prefer to keep the separate ksz9477_errata() function, so
we could add other errata code there.

Just informative - without this patch the KSZ9477-EVB board's network
is useless when the other peer has EEE enabled by default (like almost
all non managed ETH switches).



Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Vladimir Oltean Aug. 29, 2023, 10:18 a.m. UTC | #9
Hi Lukasz,

On Tue, Aug 29, 2023 at 10:35:33AM +0200, Lukasz Majewski wrote:
> Hi Vladimir,
> 
> > On Fri, Aug 25, 2023 at 06:48:41PM +0000, Tristram.Ha@microchip.com
> > wrote:
> > > > > IMHO adding functions to MMD modification would facilitate
> > > > > further development (for example LED setup).  
> > > > 
> > > > We already have some KSZ9477 specific initialization done in the
> > > > Micrel PHY driver under drivers/net/phy/micrel.c, can we converge
> > > > on the PHY driver which has a reasonable amount of infrastructure
> > > > for dealing with workarounds, indirect or direct MMD accesses
> > > > etc.?  
> > > 
> > > Actually the internal PHY used in the KSZ9897/KSZ9477/KSZ9893
> > > switches are special and only used inside those switches.  Putting
> > > all the switch related code in Micrel PHY driver does not really
> > > help.  When the switch is reset all those PHY registers need to be
> > > set again, but the PHY driver only executes those code during PHY
> > > initialization.  I do not know if there is a good way to tell the
> > > PHY to re-initialize again.  
> > 
> > Suppose there was a method to tell the PHY driver to re-initialize
> > itself. What would be the key points in which the DSA switch driver
> > would need to trigger that method? Where is the switch reset at
> > runtime?
> 
> Tristam has explained why adding the internal switch PHY errata to
> generic PHY code is not optimal.

Yes, and I didn't understand that explanation, so I asked a
clarification question.

> If adding MMD generic code is a problem - then I'm fine with just
> clearing proper bits with just two indirect writes in the
> drivers/net/dsa/microchip/ksz9477.c
> 
> I would also prefer to keep the separate ksz9477_errata() function, so
> we could add other errata code there.
> 
> Just informative - without this patch the KSZ9477-EVB board's network
> is useless when the other peer has EEE enabled by default (like almost
> all non managed ETH switches).

No, adding direct PHY MMD access code to the ksz9477 switch driver is
not even the biggest problem - even though, IIUC, the "workaround" to
disable EEE advertisement could be moved to ksz9477_get_features() in
drivers/net/phy/micrel.c, where phydev->supported_eee could be cleared.

The biggest problem that I see is that Oleksij Rempel has "just" added
EEE support to the KSZ9477 earlier this year, with an ack from Arun
Ramadoss: 69d3b36ca045 ("net: dsa: microchip: enable EEE support").
I'm not understanding why the erratum wasn't a discussion topic then.

I am currently on vacation and won't be able to look very deeply into
the problem, but IIUC, your patch undoes that work, and so, it needs an
ACK from Oleksij.
Lukasz Majewski Aug. 29, 2023, 11:24 a.m. UTC | #10
Hi Vladimir,

> Hi Lukasz,
> 
> On Tue, Aug 29, 2023 at 10:35:33AM +0200, Lukasz Majewski wrote:
> > Hi Vladimir,
> >   
> > > On Fri, Aug 25, 2023 at 06:48:41PM +0000,
> > > Tristram.Ha@microchip.com wrote:  
> > > > > > IMHO adding functions to MMD modification would facilitate
> > > > > > further development (for example LED setup).    
> > > > > 
> > > > > We already have some KSZ9477 specific initialization done in
> > > > > the Micrel PHY driver under drivers/net/phy/micrel.c, can we
> > > > > converge on the PHY driver which has a reasonable amount of
> > > > > infrastructure for dealing with workarounds, indirect or
> > > > > direct MMD accesses etc.?    
> > > > 
> > > > Actually the internal PHY used in the KSZ9897/KSZ9477/KSZ9893
> > > > switches are special and only used inside those switches.
> > > > Putting all the switch related code in Micrel PHY driver does
> > > > not really help.  When the switch is reset all those PHY
> > > > registers need to be set again, but the PHY driver only
> > > > executes those code during PHY initialization.  I do not know
> > > > if there is a good way to tell the PHY to re-initialize again.
> > > >   
> > > 
> > > Suppose there was a method to tell the PHY driver to re-initialize
> > > itself. What would be the key points in which the DSA switch
> > > driver would need to trigger that method? Where is the switch
> > > reset at runtime?  
> > 
> > Tristam has explained why adding the internal switch PHY errata to
> > generic PHY code is not optimal.  
> 
> Yes, and I didn't understand that explanation, so I asked a
> clarification question.

Ok. Let's wait for Tristram's answer.

> 
> > If adding MMD generic code is a problem - then I'm fine with just
> > clearing proper bits with just two indirect writes in the
> > drivers/net/dsa/microchip/ksz9477.c
> > 
> > I would also prefer to keep the separate ksz9477_errata() function,
> > so we could add other errata code there.
> > 
> > Just informative - without this patch the KSZ9477-EVB board's
> > network is useless when the other peer has EEE enabled by default
> > (like almost all non managed ETH switches).  
> 
> No, adding direct PHY MMD access code to the ksz9477 switch driver is
> not even the biggest problem - even though, IIUC, the "workaround" to
> disable EEE advertisement could be moved to ksz9477_get_features() in
> drivers/net/phy/micrel.c, where phydev->supported_eee could be
> cleared.

To be even more interesting (after looking into the PHY micrel.c code):
https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/micrel.c#L1804

The errata from this patch is already present.

The issue is that ksz9477_config_init() (drivers/net/phy/micrel.c) is
executed AFTER generic phy_probe():
https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/phy_device.c#L3256
in which the EEE advertisement registers are read.

Hence, those registers needs to be cleared earlier - as I do in
ksz9477_setup() in drivers/net/dsa/microchip/ksz9477.

Here the precedence matters ...

> 
> The biggest problem that I see is that Oleksij Rempel has "just" added
> EEE support to the KSZ9477 earlier this year, with an ack from Arun
> Ramadoss: 69d3b36ca045 ("net: dsa: microchip: enable EEE support").
> I'm not understanding why the erratum wasn't a discussion topic then.

+1

> 
> I am currently on vacation and won't be able to look very deeply into
> the problem, but IIUC, your patch undoes that work, and so, it needs
> an ACK from Oleksij.

Ok.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Oleksij Rempel Aug. 29, 2023, 11:47 a.m. UTC | #11
Hi Lukasz,

On Tue, Aug 29, 2023 at 01:24:29PM +0200, Lukasz Majewski wrote:
> Hi Vladimir,
> 
> > Hi Lukasz,
> > 
> > On Tue, Aug 29, 2023 at 10:35:33AM +0200, Lukasz Majewski wrote:
> > > Hi Vladimir,
> > >   
> > > > On Fri, Aug 25, 2023 at 06:48:41PM +0000,
> > > > Tristram.Ha@microchip.com wrote:  
> > > > > > > IMHO adding functions to MMD modification would facilitate
> > > > > > > further development (for example LED setup).    
> > > > > > 
> > > > > > We already have some KSZ9477 specific initialization done in
> > > > > > the Micrel PHY driver under drivers/net/phy/micrel.c, can we
> > > > > > converge on the PHY driver which has a reasonable amount of
> > > > > > infrastructure for dealing with workarounds, indirect or
> > > > > > direct MMD accesses etc.?    
> > > > > 
> > > > > Actually the internal PHY used in the KSZ9897/KSZ9477/KSZ9893
> > > > > switches are special and only used inside those switches.
> > > > > Putting all the switch related code in Micrel PHY driver does
> > > > > not really help.  When the switch is reset all those PHY
> > > > > registers need to be set again, but the PHY driver only
> > > > > executes those code during PHY initialization.  I do not know
> > > > > if there is a good way to tell the PHY to re-initialize again.
> > > > >   
> > > > 
> > > > Suppose there was a method to tell the PHY driver to re-initialize
> > > > itself. What would be the key points in which the DSA switch
> > > > driver would need to trigger that method? Where is the switch
> > > > reset at runtime?  
> > > 
> > > Tristam has explained why adding the internal switch PHY errata to
> > > generic PHY code is not optimal.  
> > 
> > Yes, and I didn't understand that explanation, so I asked a
> > clarification question.
> 
> Ok. Let's wait for Tristram's answer.
> 
> > 
> > > If adding MMD generic code is a problem - then I'm fine with just
> > > clearing proper bits with just two indirect writes in the
> > > drivers/net/dsa/microchip/ksz9477.c
> > > 
> > > I would also prefer to keep the separate ksz9477_errata() function,
> > > so we could add other errata code there.
> > > 
> > > Just informative - without this patch the KSZ9477-EVB board's
> > > network is useless when the other peer has EEE enabled by default
> > > (like almost all non managed ETH switches).  
> > 
> > No, adding direct PHY MMD access code to the ksz9477 switch driver is
> > not even the biggest problem - even though, IIUC, the "workaround" to
> > disable EEE advertisement could be moved to ksz9477_get_features() in
> > drivers/net/phy/micrel.c, where phydev->supported_eee could be
> > cleared.
> 
> To be even more interesting (after looking into the PHY micrel.c code):
> https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/micrel.c#L1804
> 
> The errata from this patch is already present.
> 
> The issue is that ksz9477_config_init() (drivers/net/phy/micrel.c) is
> executed AFTER generic phy_probe():
> https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/phy_device.c#L3256
> in which the EEE advertisement registers are read.
> 
> Hence, those registers needs to be cleared earlier - as I do in
> ksz9477_setup() in drivers/net/dsa/microchip/ksz9477.
> 
> Here the precedence matters ...
> > 
> > The biggest problem that I see is that Oleksij Rempel has "just" added
> > EEE support to the KSZ9477 earlier this year, with an ack from Arun
> > Ramadoss: 69d3b36ca045 ("net: dsa: microchip: enable EEE support").
> > I'm not understanding why the erratum wasn't a discussion topic then.
> 
> +1

As this erratum states:  "this feature _can_ cause link drops".
For example I was indeed able to have EEE relates issue between this
switch and a link partner with AR8035 PHY. Following patch addressing
this issue:
https://lore.kernel.org/all/20230327142202.3754446-8-o.rempel@pengutronix.de/
So, in this case KSZ9477 was not the bad side.

Since this erratum do not describe exact cause of this issue or specific
link partners where this functionality is not working, I would prefer to
give the user the freedom of choice.

The same issue we have with Pause Frame support. It is not always a good
choice, but user has freedom to configure it.

Today I wont to create a test setup with different EEE capable link
partners on one side and KSZ9477 on other side and let it run some days.
Just to make sure.

Beside, are you able to reproduce this issue?

Regards,
Oleksij
Lukasz Majewski Aug. 29, 2023, 12:38 p.m. UTC | #12
Hi Oleksij,

> Hi Lukasz,
> 
> On Tue, Aug 29, 2023 at 01:24:29PM +0200, Lukasz Majewski wrote:
> > Hi Vladimir,
> >   
> > > Hi Lukasz,
> > > 
> > > On Tue, Aug 29, 2023 at 10:35:33AM +0200, Lukasz Majewski wrote:  
> > > > Hi Vladimir,
> > > >     
> > > > > On Fri, Aug 25, 2023 at 06:48:41PM +0000,
> > > > > Tristram.Ha@microchip.com wrote:    
> > > > > > > > IMHO adding functions to MMD modification would
> > > > > > > > facilitate further development (for example LED setup).
> > > > > > > >      
> > > > > > > 
> > > > > > > We already have some KSZ9477 specific initialization done
> > > > > > > in the Micrel PHY driver under drivers/net/phy/micrel.c,
> > > > > > > can we converge on the PHY driver which has a reasonable
> > > > > > > amount of infrastructure for dealing with workarounds,
> > > > > > > indirect or direct MMD accesses etc.?      
> > > > > > 
> > > > > > Actually the internal PHY used in the
> > > > > > KSZ9897/KSZ9477/KSZ9893 switches are special and only used
> > > > > > inside those switches. Putting all the switch related code
> > > > > > in Micrel PHY driver does not really help.  When the switch
> > > > > > is reset all those PHY registers need to be set again, but
> > > > > > the PHY driver only executes those code during PHY
> > > > > > initialization.  I do not know if there is a good way to
> > > > > > tell the PHY to re-initialize again. 
> > > > > 
> > > > > Suppose there was a method to tell the PHY driver to
> > > > > re-initialize itself. What would be the key points in which
> > > > > the DSA switch driver would need to trigger that method?
> > > > > Where is the switch reset at runtime?    
> > > > 
> > > > Tristam has explained why adding the internal switch PHY errata
> > > > to generic PHY code is not optimal.    
> > > 
> > > Yes, and I didn't understand that explanation, so I asked a
> > > clarification question.  
> > 
> > Ok. Let's wait for Tristram's answer.
> >   
> > >   
> > > > If adding MMD generic code is a problem - then I'm fine with
> > > > just clearing proper bits with just two indirect writes in the
> > > > drivers/net/dsa/microchip/ksz9477.c
> > > > 
> > > > I would also prefer to keep the separate ksz9477_errata()
> > > > function, so we could add other errata code there.
> > > > 
> > > > Just informative - without this patch the KSZ9477-EVB board's
> > > > network is useless when the other peer has EEE enabled by
> > > > default (like almost all non managed ETH switches).    
> > > 
> > > No, adding direct PHY MMD access code to the ksz9477 switch
> > > driver is not even the biggest problem - even though, IIUC, the
> > > "workaround" to disable EEE advertisement could be moved to
> > > ksz9477_get_features() in drivers/net/phy/micrel.c, where
> > > phydev->supported_eee could be cleared.  
> > 
> > To be even more interesting (after looking into the PHY micrel.c
> > code):
> > https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/micrel.c#L1804
> > 
> > The errata from this patch is already present.
> > 
> > The issue is that ksz9477_config_init() (drivers/net/phy/micrel.c)
> > is executed AFTER generic phy_probe():
> > https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/phy_device.c#L3256
> > in which the EEE advertisement registers are read.
> > 
> > Hence, those registers needs to be cleared earlier - as I do in
> > ksz9477_setup() in drivers/net/dsa/microchip/ksz9477.
> > 
> > Here the precedence matters ...  
> > > 
> > > The biggest problem that I see is that Oleksij Rempel has "just"
> > > added EEE support to the KSZ9477 earlier this year, with an ack
> > > from Arun Ramadoss: 69d3b36ca045 ("net: dsa: microchip: enable
> > > EEE support"). I'm not understanding why the erratum wasn't a
> > > discussion topic then.  
> > 
> > +1  
> 
> As this erratum states:  "this feature _can_ cause link drops".
> For example I was indeed able to have EEE relates issue between this
> switch and a link partner with AR8035 PHY. Following patch addressing
> this issue:
> https://lore.kernel.org/all/20230327142202.3754446-8-o.rempel@pengutronix.de/
> So, in this case KSZ9477 was not the bad side.
> 

The errata: http://ww1.microchip.com/downloads/jp/DeviceDoc/jp599888.pdf

Module 4, "End user implications":
--------8<----------
If the link partner is not known, or if the link partner is EEE
capable, then the EEE feature should be manually disabled to avoid link
drop problems.
-------->8----------

> Since this erratum do not describe exact cause of this issue

IMHO, it does - "The EEE feature is enabled by default, but it is not
fully operational. "

It looks like some silicon issue - which in details is probably only
known to Micrel/Microchip.

> or
> specific link partners where this functionality is not working, I
> would prefer to give the user the freedom of choice.

The problem is that - the user - would encounter broken network when
connected to per advertising EEE. 

Hence, I would prefer to apply the Errata and then somebody, who would
like to enable EEE can try if it works for him.

IMHO, code to fix erratas shall be added unconditionally, without any
"freedom of choice".

> 
> The same issue we have with Pause Frame support. It is not always a
> good choice, but user has freedom to configure it.
> 
> Today I wont to create a test setup with different EEE capable link
> partners on one side and KSZ9477 on other side and let it run some
> days. Just to make sure.
> 
> Beside, are you able to reproduce this issue?
> 

Yes, I can reproduce the issue. I do use two Microchip's development
boards (KSZ9477-EVB [1]) connected together to test HSR as well as
communication with HOST PC.

The network on this board without this patch is not usable (continually
I do encounter link up/downs).

Another test scenario is to connect this board to non-managed ETH
switch (which shall have the EEE advertised by default).


Please be also aware, that this errata fix is (implicitly I think)
already present in the kernel:
https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/micrel.c#L1804

However, the execution order of PHY/DSA functions with newest mainline
makes it not working any more (I've described it in details in the
earlier mail to Vladimir).

> Regards,
> Oleksij

Links:
[1] - https://www.microchip.com/en-us/development-tool/evb-ksz9477-1


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Oleksij Rempel Aug. 29, 2023, 2:42 p.m. UTC | #13
On Tue, Aug 29, 2023 at 02:38:29PM +0200, Lukasz Majewski wrote:
> Hi Oleksij,

...

> Hence, I would prefer to apply the Errata and then somebody, who would
> like to enable EEE can try if it works for him.

ok.

> IMHO, code to fix erratas shall be added unconditionally, without any
> "freedom of choic

This claim is not consistent with the patch. To make it without ability
to enable EEE, you will need to clear all eee_supported bits.
If this HW is really so broken, then it is the we how it should be
fixed.

> > Beside, are you able to reproduce this issue?
> 
> Yes, I can reproduce the issue. I do use two Microchip's development
> boards (KSZ9477-EVB [1]) connected together to test HSR as well as
> communication with HOST PC.

I use KSZ9477-EVB as well.

> The network on this board without this patch is not usable (continually
> I do encounter link up/downs).

My test setup runs currently about two hours. It had 4 link drops on LAN3 and
none on other ports. Swapping cables connected to LAN2 and LAN3 still let the
LAN3 sometimes drop the connection. So far, for example LAN2 works stable and
this is probably the reason why I have not seen this issue before.
After disabling EEE on LAN3 I start getting drops on LAN2.

> Please be also aware, that this errata fix is (implicitly I think)
> already present in the kernel:
> https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/micrel.c#L1804
> 
> However, the execution order of PHY/DSA functions with newest mainline
> makes it not working any more (I've described it in details in the
> earlier mail to Vladimir).

Ok, since it was already not advertised by default, I have nothing against
having default policy to not advertise EEE for this switch.

On other hand, since this functionality is not listed as supported by the
KSZ9477 datasheet (No word about IEEE 802.3az Energy Efficient Ethernet (EEE))
compared to KSZ8565R datasheet (where EEE support is listed) and it is
confirmed to work not stable enough, then it should be disabled properly.
The phydev->supported_eee should be cleared. See ksz9477_get_features().


Regards,
Oleksij
Lukasz Majewski Aug. 29, 2023, 3:29 p.m. UTC | #14
Hi Oleksij,

> On Tue, Aug 29, 2023 at 02:38:29PM +0200, Lukasz Majewski wrote:
> > Hi Oleksij,  
> 
> ...
> 
> > Hence, I would prefer to apply the Errata and then somebody, who
> > would like to enable EEE can try if it works for him.  
> 
> ok.
> 
> > IMHO, code to fix erratas shall be added unconditionally, without
> > any "freedom of choic  
> 
> This claim is not consistent with the patch. To make it without
> ability to enable EEE, you will need to clear all eee_supported bits.
> If this HW is really so broken, then it is the we how it should be
> fixed.
> 
> > > Beside, are you able to reproduce this issue?  
> > 
> > Yes, I can reproduce the issue. I do use two Microchip's development
> > boards (KSZ9477-EVB [1]) connected together to test HSR as well as
> > communication with HOST PC.  
> 
> I use KSZ9477-EVB as well.
> 
> > The network on this board without this patch is not usable
> > (continually I do encounter link up/downs).  
> 
> My test setup runs currently about two hours. It had 4 link drops on
> LAN3 and none on other ports. Swapping cables connected to LAN2 and
> LAN3 still let the LAN3 sometimes drop the connection. So far, for
> example LAN2 works stable and this is probably the reason why I have
> not seen this issue before. After disabling EEE on LAN3 I start
> getting drops on LAN2.
> 

Ok.

> > Please be also aware, that this errata fix is (implicitly I think)
> > already present in the kernel:
> > https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/micrel.c#L1804
> > 
> > However, the execution order of PHY/DSA functions with newest
> > mainline makes it not working any more (I've described it in
> > details in the earlier mail to Vladimir).  
> 
> Ok, since it was already not advertised by default, I have nothing
> against having default policy to not advertise EEE for this switch.
> 

Ok.

> On other hand, since this functionality is not listed as supported by
> the KSZ9477 datasheet (No word about IEEE 802.3az Energy Efficient
> Ethernet (EEE)) compared to KSZ8565R datasheet (where EEE support is
> listed) and it is confirmed to work not stable enough, then it should
> be disabled properly.

I've described this problem in more details here:
https://lore.kernel.org/lkml/20230829132429.529283be@wsk/

-------->8---------
The issue is that ksz9477_config_init() (drivers/net/phy/micrel.c) is
executed AFTER generic phy_probe():
https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/phy_device.c#L3256
in which the EEE advertisement registers are read.

Hence, those registers needs to be cleared earlier - as I do in
ksz9477_setup() in drivers/net/dsa/microchip/ksz9477.

Here the precedence matters ...
----------8<-------------

> The phydev->supported_eee should be cleared.
> See ksz9477_get_features().
> 

Removing the linkmod_and() from ksz9477_get_features():
https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/micrel.c#L1408

doesn't help.

It looks like it is done too late (please read the above e-mail). We
would need to disable the eee support at all for this switch IC or
apply the original version of this patch (I mean clear in-KSZ9477 EEE
advertisement register early).

> 
> Regards,
> Oleksij




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
Oleksij Rempel Aug. 29, 2023, 5:12 p.m. UTC | #15
On Tue, Aug 29, 2023 at 05:29:13PM +0200, Lukasz Majewski wrote:
> Hi Oleksij,
> > On other hand, since this functionality is not listed as supported by
> > the KSZ9477 datasheet (No word about IEEE 802.3az Energy Efficient
> > Ethernet (EEE)) compared to KSZ8565R datasheet (where EEE support is
> > listed) and it is confirmed to work not stable enough, then it should
> > be disabled properly.
> 
> I've described this problem in more details here:
> https://lore.kernel.org/lkml/20230829132429.529283be@wsk/
> 
> -------->8---------
> The issue is that ksz9477_config_init() (drivers/net/phy/micrel.c) is
> executed AFTER generic phy_probe():
> https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/phy_device.c#L3256
> in which the EEE advertisement registers are read.
> 
> Hence, those registers needs to be cleared earlier - as I do in
> ksz9477_setup() in drivers/net/dsa/microchip/ksz9477.
> 
> Here the precedence matters ...
> ----------8<-------------
> 
> > The phydev->supported_eee should be cleared.
> > See ksz9477_get_features().
> > 
> 
> Removing the linkmod_and() from ksz9477_get_features():
> https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/micrel.c#L1408
> 
> doesn't help.

Yes, removing linkmod_and() will not should not help. I said, "The
phydev->supported_eee should be cleared."

For example like this:
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -1400,6 +1400,7 @@ static int ksz9131_config_aneg(struct phy_device *phydev)
 
 static int ksz9477_get_features(struct phy_device *phydev)
 {
+       __ETHTOOL_DECLARE_LINK_MODE_MASK(zero) = { 0, };
        int ret;
 
        ret = genphy_read_abilities(phydev);
@@ -1413,7 +1414,7 @@ static int ksz9477_get_features(struct phy_device *phydev)
         * KSZ8563R should have 100BaseTX/Full only.
         */
        linkmode_and(phydev->supported_eee, phydev->supported,
-                    PHY_EEE_CAP1_FEATURES);
+                    zero);
 
        return 0;
 }

You will need to clear it only on KSZ9477 variants with this bug.
This change is tested and it works on my KSZ9477-EVB.

Regards,
Oleksij
Tristram.Ha@microchip.com Aug. 29, 2023, 9:57 p.m. UTC | #16
> On Fri, Aug 25, 2023 at 06:48:41PM +0000, Tristram.Ha@microchip.com wrote:
> > > > IMHO adding functions to MMD modification would facilitate further
> > > > development (for example LED setup).
> > >
> > > We already have some KSZ9477 specific initialization done in the Micrel
> > > PHY driver under drivers/net/phy/micrel.c, can we converge on the PHY
> > > driver which has a reasonable amount of infrastructure for dealing with
> > > workarounds, indirect or direct MMD accesses etc.?
> >
> > Actually the internal PHY used in the KSZ9897/KSZ9477/KSZ9893 switches
> > are special and only used inside those switches.  Putting all the switch
> > related code in Micrel PHY driver does not really help.  When the switch
> > is reset all those PHY registers need to be set again, but the PHY driver
> > only executes those code during PHY initialization.  I do not know if
> > there is a good way to tell the PHY to re-initialize again.
> 
> Suppose there was a method to tell the PHY driver to re-initialize itself.
> What would be the key points in which the DSA switch driver would need
> to trigger that method? Where is the switch reset at runtime?

Currently the DSA switch driver loads independently and is then
controlled by the main DSA driver.  The switch is reset during
initialization, and later the PHYs are initialized.  I was talking
hypothetically that the switch may need to be reset to correct some
hardware problems, but then there may be no good way to tell the PHYs to
re-initialize.
Florian Fainelli Aug. 29, 2023, 10 p.m. UTC | #17
On 8/29/23 14:57, Tristram.Ha@microchip.com wrote:
>> On Fri, Aug 25, 2023 at 06:48:41PM +0000, Tristram.Ha@microchip.com wrote:
>>>>> IMHO adding functions to MMD modification would facilitate further
>>>>> development (for example LED setup).
>>>>
>>>> We already have some KSZ9477 specific initialization done in the Micrel
>>>> PHY driver under drivers/net/phy/micrel.c, can we converge on the PHY
>>>> driver which has a reasonable amount of infrastructure for dealing with
>>>> workarounds, indirect or direct MMD accesses etc.?
>>>
>>> Actually the internal PHY used in the KSZ9897/KSZ9477/KSZ9893 switches
>>> are special and only used inside those switches.  Putting all the switch
>>> related code in Micrel PHY driver does not really help.  When the switch
>>> is reset all those PHY registers need to be set again, but the PHY driver
>>> only executes those code during PHY initialization.  I do not know if
>>> there is a good way to tell the PHY to re-initialize again.
>>
>> Suppose there was a method to tell the PHY driver to re-initialize itself.
>> What would be the key points in which the DSA switch driver would need
>> to trigger that method? Where is the switch reset at runtime?
> 
> Currently the DSA switch driver loads independently and is then
> controlled by the main DSA driver.  The switch is reset during
> initialization, and later the PHYs are initialized.  I was talking
> hypothetically that the switch may need to be reset to correct some
> hardware problems, but then there may be no good way to tell the PHYs to
> re-initialize.

There is phy_init_hw() which will do just that.
Tristram.Ha@microchip.com Aug. 29, 2023, 10:23 p.m. UTC | #18
> Yes, removing linkmod_and() will not should not help. I said, "The
> phydev->supported_eee should be cleared."
> 
> For example like this:
> --- a/drivers/net/phy/micrel.c
> +++ b/drivers/net/phy/micrel.c
> @@ -1400,6 +1400,7 @@ static int ksz9131_config_aneg(struct phy_device *phydev)
> 
>  static int ksz9477_get_features(struct phy_device *phydev)
>  {
> +       __ETHTOOL_DECLARE_LINK_MODE_MASK(zero) = { 0, };
>         int ret;
> 
>         ret = genphy_read_abilities(phydev);
> @@ -1413,7 +1414,7 @@ static int ksz9477_get_features(struct phy_device
> *phydev)
>          * KSZ8563R should have 100BaseTX/Full only.
>          */
>         linkmode_and(phydev->supported_eee, phydev->supported,
> -                    PHY_EEE_CAP1_FEATURES);
> +                    zero);
> 
>         return 0;
>  }
> 
> You will need to clear it only on KSZ9477 variants with this bug.
> This change is tested and it works on my KSZ9477-EVB.
 
I think this is best for disabling EEE support.
I think before some customers asked for Ethtool EEE support not because
they want to use it but to disable it because of link instability.
KSZ9893/KSZ9563 switches should have the same problem.  The EEE problem
depends on the link partner.  For example my laptop does not have problem
even though EEE is enabled, although I am not sure if EEE is really
active.  The problem here is just using two KSZ9477 switches and
programming those PHY setup values and enabling EEE will make the link
unstable.  Management decided to disable EEE feature to avoid customer
support issues.
Another issue is EEE should be disabled when using 1588 PTP.
Oleksij Rempel Aug. 30, 2023, 6:16 a.m. UTC | #19
On Tue, Aug 29, 2023 at 10:23:26PM +0000, Tristram.Ha@microchip.com wrote:
> > Yes, removing linkmod_and() will not should not help. I said, "The
> > phydev->supported_eee should be cleared."
> > 
> > For example like this:
> > --- a/drivers/net/phy/micrel.c
> > +++ b/drivers/net/phy/micrel.c
> > @@ -1400,6 +1400,7 @@ static int ksz9131_config_aneg(struct phy_device *phydev)
> > 
> >  static int ksz9477_get_features(struct phy_device *phydev)
> >  {
> > +       __ETHTOOL_DECLARE_LINK_MODE_MASK(zero) = { 0, };
> >         int ret;
> > 
> >         ret = genphy_read_abilities(phydev);
> > @@ -1413,7 +1414,7 @@ static int ksz9477_get_features(struct phy_device
> > *phydev)
> >          * KSZ8563R should have 100BaseTX/Full only.
> >          */
> >         linkmode_and(phydev->supported_eee, phydev->supported,
> > -                    PHY_EEE_CAP1_FEATURES);
> > +                    zero);
> > 
> >         return 0;
> >  }
> > 
> > You will need to clear it only on KSZ9477 variants with this bug.
> > This change is tested and it works on my KSZ9477-EVB.
>  
> I think this is best for disabling EEE support.
> I think before some customers asked for Ethtool EEE support not because
> they want to use it but to disable it because of link instability.
> KSZ9893/KSZ9563 switches should have the same problem.  The EEE problem
> depends on the link partner.  For example my laptop does not have problem
> even though EEE is enabled, although I am not sure if EEE is really
> active.  The problem here is just using two KSZ9477 switches and
> programming those PHY setup values and enabling EEE will make the link
> unstable.  Management decided to disable EEE feature to avoid customer
> support issues.
> Another issue is EEE should be disabled when using 1588 PTP.
> 

I'd like to share my thoughts on the concerns raised:

- KSZ9477 & EEE: Disabling EEE for the KSZ9477 makes sense, especially since
the datasheet doesn't list it as supported.

- EEE Support for KSZ9893 & KSZ9563: The datasheets for the KSZ9893 indicate
support for EEE. The erratum recommends making adjustments to the "transmit
Refresh Time for Waketx to meet the IEEE Refresh Time specification" instead of
turning it off completely. The same applies to KSZ9563. We should consider
these adjustments.

- General Experience with KSZ Chips*: From my experience with these chips,
irrespective of the EEE functionality, only the 1000/full and 100/full link
modes tend to be stable enough.

- Responsibility to Customers and Certifications: As a part of the product
supply chain, I believe in our commitment to honesty with our customers. When
we select components for end products, we trust their listed features. For
instance, designing for ENERGY STAR certification requires that all
copper-based physical network ports in an LNE product must be compliant with
IEEE 802.3 Clause 78, which mandates Energy Efficient Ethernet. If Microchip
promotes a KSZ* chip with EEE as a feature, it becomes a cornerstone of the end
product. Negating a key functionality, which was sold and then incorporated
into the product, could risk non-compliance with certification criteria.

- Consistency in Product Representation: If the overarching company decision is
to disable the EEE feature for all chips to preempt potential customer support
issues, our product information should mirror this change. It's crucial that we
neither misrepresent nor over-promise on features. Your deeper insights, given
your involvement with Microchip's strategy, would be most enlightening.

Regards,
Oleksij
Lukasz Majewski Aug. 30, 2023, 8:13 a.m. UTC | #20
Hi Oleksij,

> On Tue, Aug 29, 2023 at 10:23:26PM +0000, Tristram.Ha@microchip.com
> wrote:
> > > Yes, removing linkmod_and() will not should not help. I said, "The
> > > phydev->supported_eee should be cleared."
> > > 
> > > For example like this:
> > > --- a/drivers/net/phy/micrel.c
> > > +++ b/drivers/net/phy/micrel.c
> > > @@ -1400,6 +1400,7 @@ static int ksz9131_config_aneg(struct
> > > phy_device *phydev)
> > > 
> > >  static int ksz9477_get_features(struct phy_device *phydev)
> > >  {
> > > +       __ETHTOOL_DECLARE_LINK_MODE_MASK(zero) = { 0, };
> > >         int ret;
> > > 
> > >         ret = genphy_read_abilities(phydev);
> > > @@ -1413,7 +1414,7 @@ static int ksz9477_get_features(struct
> > > phy_device *phydev)
> > >          * KSZ8563R should have 100BaseTX/Full only.
> > >          */
> > >         linkmode_and(phydev->supported_eee, phydev->supported,
> > > -                    PHY_EEE_CAP1_FEATURES);
> > > +                    zero);
> > > 
> > >         return 0;
> > >  }
> > > 
> > > You will need to clear it only on KSZ9477 variants with this bug.
> > > This change is tested and it works on my KSZ9477-EVB.  
> >  
> > I think this is best for disabling EEE support.
> > I think before some customers asked for Ethtool EEE support not
> > because they want to use it but to disable it because of link
> > instability. KSZ9893/KSZ9563 switches should have the same problem.
> >  The EEE problem depends on the link partner.  For example my
> > laptop does not have problem even though EEE is enabled, although I
> > am not sure if EEE is really active.  The problem here is just
> > using two KSZ9477 switches and programming those PHY setup values
> > and enabling EEE will make the link unstable.  Management decided
> > to disable EEE feature to avoid customer support issues.
> > Another issue is EEE should be disabled when using 1588 PTP.
> >   
> 
> I'd like to share my thoughts on the concerns raised:
> 
> - KSZ9477 & EEE: Disabling EEE for the KSZ9477 makes sense,
> especially since the datasheet doesn't list it as supported.
> 

+1

I will prepare proper patch

> - EEE Support for KSZ9893 & KSZ9563: The datasheets for the KSZ9893
> indicate support for EEE. The erratum recommends making adjustments
> to the "transmit Refresh Time for Waketx to meet the IEEE Refresh
> Time specification" instead of turning it off completely. The same
> applies to KSZ9563. We should consider these adjustments.
> 
> - General Experience with KSZ Chips*: From my experience with these
> chips, irrespective of the EEE functionality, only the 1000/full and
> 100/full link modes tend to be stable enough.
> 
> - Responsibility to Customers and Certifications: As a part of the
> product supply chain, I believe in our commitment to honesty with our
> customers. When we select components for end products, we trust their
> listed features. For instance, designing for ENERGY STAR
> certification requires that all copper-based physical network ports
> in an LNE product must be compliant with IEEE 802.3 Clause 78, which
> mandates Energy Efficient Ethernet. If Microchip promotes a KSZ* chip
> with EEE as a feature, it becomes a cornerstone of the end product.
> Negating a key functionality, which was sold and then incorporated
> into the product, could risk non-compliance with certification
> criteria.
> 
> - Consistency in Product Representation: If the overarching company
> decision is to disable the EEE feature for all chips to preempt
> potential customer support issues, our product information should
> mirror this change. It's crucial that we neither misrepresent nor
> over-promise on features. Your deeper insights, given your
> involvement with Microchip's strategy, would be most enlightening.
> 

For the last two - at least the KSZ9477 Errata was clear about the
issue. Other (switch IC) vendors have even problems to admit that
something is wrong with their silicon design...

> Regards,
> Oleksij




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
diff mbox series

Patch

diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
index cb6aa7c668a8..563f497ba656 100644
--- a/drivers/net/dsa/microchip/ksz9477.c
+++ b/drivers/net/dsa/microchip/ksz9477.c
@@ -1128,6 +1128,44 @@  int ksz9477_enable_stp_addr(struct ksz_device *dev)
 	return 0;
 }
 
+static int ksz9477_errata(struct dsa_switch *ds)
+{
+	struct ksz_device *dev = ds->priv;
+	u16 val;
+	int p;
+
+	/* KSZ9477 Errata DS80000754C
+	 *
+	 * Module 4: Energy Efficient Ethernet (EEE) feature select must be
+	 * manually disabled
+	 *   The EEE feature is enabled by default, but it is not fully
+	 *   operational. It must be manually disabled through register
+	 *   controls. If not disabled, the PHY ports can auto-negotiate
+	 *   to enable EEE, and this feature can cause link drops when linked
+	 *   to another device supporting EEE.
+	 *
+	 *   Only PHY ports (dsa user) [0-4] need to have the EEE advertisement
+	 *   bits cleared.
+	 */
+
+	for (p = 0; p < ds->num_ports; p++) {
+		if (!dsa_is_user_port(ds, p))
+			continue;
+
+		ksz9477_port_mmd_read(dev, p, MMD_DEVICE_ID_EEE_ADV,
+				      MMD_EEE_ADV, &val, 1);
+
+		pr_err("%s: PORT: %d val: 0x%x pc: %d\n", __func__, p, val,
+		       ds->num_ports);
+
+		val &= ~(EEE_ADV_100MBIT | EEE_ADV_1GBIT);
+		ksz9477_port_mmd_write(dev, p, MMD_DEVICE_ID_EEE_ADV,
+				       MMD_EEE_ADV, &val, 1);
+	}
+
+	return 0;
+}
+
 int ksz9477_setup(struct dsa_switch *ds)
 {
 	struct ksz_device *dev = ds->priv;
@@ -1157,7 +1195,7 @@  int ksz9477_setup(struct dsa_switch *ds)
 	/* enable global MIB counter freeze function */
 	ksz_cfg(dev, REG_SW_MAC_CTRL_6, SW_MIB_COUNTER_FREEZE, true);
 
-	return 0;
+	return ksz9477_errata(ds);
 }
 
 u32 ksz9477_get_port_addr(int port, int offset)