diff mbox series

[net] ixgbe: set X550 MDIO speed before talking to PHY

Message ID 81be24c4-a7e4-0761-abf4-204f4849b6eb@lynx.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] ixgbe: set X550 MDIO speed before talking to PHY | expand

Checks

Context Check Description
netdev/cover_letter success Single patches do not need cover letters
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers warning 1 maintainers not CCed: anthony.l.nguyen@intel.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success No Fixes tag
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success No static functions without inline keyword in header files

Commit Message

Cyril Novikov Oct. 29, 2021, 1:03 a.m. UTC
The MDIO bus speed must be initialized before talking to the PHY the first
time in order to avoid talking to it using a speed that the PHY doesn't
support.

This fixes HW initialization error -17 (IXGBE_ERR_PHY_ADDR_INVALID) on
Denverton CPUs (a.k.a. the Atom C3000 family) on ports with a 10Gb network
plugged in. On those devices, HLREG0[MDCSPD] resets to 1, which combined
with the 10Gb network results in a 24MHz MDIO speed, which is apparently
too fast for the connected PHY. PHY register reads over MDIO bus return
garbage, leading to initialization failure.

Signed-off-by: Cyril Novikov <cnovikov@lynx.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 3 +++
 1 file changed, 3 insertions(+)

Reproduced with Linux kernel 4.19 and 5.15-rc7. Can be reproduced using
the following setup:

* Use an Atom C3000 family system with at least one X550 LAN on the SoC
* Disable PXE or other BIOS network initialization if possible
  (the interface must not be initialized before Linux boots)
* Connect a live 10Gb Ethernet cable to an X550 port
* Power cycle (not reset, doesn't always work) the system and boot Linux
* Observe: ixgbe interfaces w/ 10GbE cables plugged in fail with error -17

Comments

Paul Menzel Oct. 29, 2021, 6:47 a.m. UTC | #1
Dear Cyril,


On 29.10.21 03:03, Cyril Novikov wrote:
> The MDIO bus speed must be initialized before talking to the PHY the first
> time in order to avoid talking to it using a speed that the PHY doesn't
> support.
> 
> This fixes HW initialization error -17 (IXGBE_ERR_PHY_ADDR_INVALID) on
> Denverton CPUs (a.k.a. the Atom C3000 family) on ports with a 10Gb network
> plugged in. On those devices, HLREG0[MDCSPD] resets to 1, which combined
> with the 10Gb network results in a 24MHz MDIO speed, which is apparently
> too fast for the connected PHY. PHY register reads over MDIO bus return
> garbage, leading to initialization failure.

Maybe add a Fixes tag?

> Signed-off-by: Cyril Novikov <cnovikov@lynx.com>
> ---
>   drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> Reproduced with Linux kernel 4.19 and 5.15-rc7. Can be reproduced using
> the following setup:
> 
> * Use an Atom C3000 family system with at least one X550 LAN on the SoC
> * Disable PXE or other BIOS network initialization if possible
>    (the interface must not be initialized before Linux boots)
> * Connect a live 10Gb Ethernet cable to an X550 port
> * Power cycle (not reset, doesn't always work) the system and boot Linux
> * Observe: ixgbe interfaces w/ 10GbE cables plugged in fail with error -17

Why not add that to the commit message?

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
> index 9724ffb16518..e4b50c7781ff 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
> @@ -3405,6 +3405,9 @@ static s32 ixgbe_reset_hw_X550em(struct ixgbe_hw *hw)
>   	/* flush pending Tx transactions */
>   	ixgbe_clear_tx_pending(hw);
>   
> +	/* set MDIO speed before talking to the PHY in case it's the 1st time */
> +	ixgbe_set_mdio_speed(hw);
> +
>   	/* PHY ops must be identified and initialized prior to reset */
>   	status = hw->phy.ops.init(hw);
>   	if (status == IXGBE_ERR_SFP_NOT_SUPPORTED ||
> 

Is `ixgbe_set_mdio_speed(hw)` at the end of the function then still needed?


Kind regards,

Paul
Cyril Novikov Oct. 29, 2021, 11:06 p.m. UTC | #2
On 10/28/2021 11:47 PM, Paul Menzel wrote:
> Dear Cyril,
> 
> 
> On 29.10.21 03:03, Cyril Novikov wrote:
>> The MDIO bus speed must be initialized before talking to the PHY the 
>> first
>> time in order to avoid talking to it using a speed that the PHY doesn't
>> support.
>>
>> This fixes HW initialization error -17 (IXGBE_ERR_PHY_ADDR_INVALID) on
>> Denverton CPUs (a.k.a. the Atom C3000 family) on ports with a 10Gb 
>> network
>> plugged in. On those devices, HLREG0[MDCSPD] resets to 1, which combined
>> with the 10Gb network results in a 24MHz MDIO speed, which is apparently
>> too fast for the connected PHY. PHY register reads over MDIO bus return
>> garbage, leading to initialization failure.
> 
> Maybe add a Fixes tag?

This is my first patch submission for Linux kernel. What I read about 
the Fixes tag says it identifies a previous commit that had introduced 
the bug. I have no idea which commit introduced this bug. We saw it in 
4.19 which probably means the bug was always there and is not a 
regression. It's also quite possible the original commit was correct for 
the hardware existing at that time and it only started behaving 
incorrectly with new hardware, so it wasn't actually a bug at the time 
it was submitted. I also don't have the capability or time to bisect 
this problem.

>> Signed-off-by: Cyril Novikov <cnovikov@lynx.com>
>> ---
>>   drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> Reproduced with Linux kernel 4.19 and 5.15-rc7. Can be reproduced using
>> the following setup:
>>
>> * Use an Atom C3000 family system with at least one X550 LAN on the SoC
>> * Disable PXE or other BIOS network initialization if possible
>>    (the interface must not be initialized before Linux boots)
>> * Connect a live 10Gb Ethernet cable to an X550 port
>> * Power cycle (not reset, doesn't always work) the system and boot Linux
>> * Observe: ixgbe interfaces w/ 10GbE cables plugged in fail with error 
>> -17
> 
> Why not add that to the commit message?

I wasn't sure if the reproduction scenario belonged to the commit 
message, and have no problem adding it if you believe it does.

>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c 
>> b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
>> index 9724ffb16518..e4b50c7781ff 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
>> @@ -3405,6 +3405,9 @@ static s32 ixgbe_reset_hw_X550em(struct ixgbe_hw 
>> *hw)
>>       /* flush pending Tx transactions */
>>       ixgbe_clear_tx_pending(hw);
>> +    /* set MDIO speed before talking to the PHY in case it's the 1st 
>> time */
>> +    ixgbe_set_mdio_speed(hw);
>> +
>>       /* PHY ops must be identified and initialized prior to reset */
>>       status = hw->phy.ops.init(hw);
>>       if (status == IXGBE_ERR_SFP_NOT_SUPPORTED ||
>>
> 
> Is `ixgbe_set_mdio_speed(hw)` at the end of the function then still needed?

The code between the two calls issues a global reset to the MAC and 
optionally the link, depending on some flags. That may reset the MDIO 
speed back to the wrong value or, according to the comments in the code, 
may reset the PHY and result in renegotiation and a different link 
speed. So, the MDIO speed setting may require an adjustment. Even if it 
actually doesn't at the moment, doing the second call makes the code 
robust to future software and hardware changes.

--
Cyril
Andrew Lunn Oct. 30, 2021, 5:28 p.m. UTC | #3
On Fri, Oct 29, 2021 at 04:06:26PM -0700, Cyril Novikov wrote:
> On 10/28/2021 11:47 PM, Paul Menzel wrote:
> > Dear Cyril,
> > 
> > 
> > On 29.10.21 03:03, Cyril Novikov wrote:
> > > The MDIO bus speed must be initialized before talking to the PHY the
> > > first
> > > time in order to avoid talking to it using a speed that the PHY doesn't
> > > support.
> > > 
> > > This fixes HW initialization error -17 (IXGBE_ERR_PHY_ADDR_INVALID) on
> > > Denverton CPUs (a.k.a. the Atom C3000 family) on ports with a 10Gb
> > > network
> > > plugged in. On those devices, HLREG0[MDCSPD] resets to 1, which combined
> > > with the 10Gb network results in a 24MHz MDIO speed, which is apparently
> > > too fast for the connected PHY. PHY register reads over MDIO bus return
> > > garbage, leading to initialization failure.
> > 
> > Maybe add a Fixes tag?
> 
> This is my first patch submission for Linux kernel.

Welcome to the community.

> What I read about the
> Fixes tag says it identifies a previous commit that had introduced the bug.
> I have no idea which commit introduced this bug. We saw it in 4.19 which
> probably means the bug was always there and is not a regression. It's also
> quite possible the original commit was correct for the hardware existing at
> that time and it only started behaving incorrectly with new hardware, so it
> wasn't actually a bug at the time it was submitted. I also don't have the
> capability or time to bisect this problem.

From how you describe it, i assume the issue is present for any 10G
links? git blame suggests:

e84db7272798e (Mark Rustad         2016-04-01 12:18:30 -0700 3357) static void ixgbe_set_mdio_speed(struct ixgbe_hw *hw)
e84db7272798e (Mark Rustad         2016-04-01 12:18:30 -0700 3358) {
e84db7272798e (Mark Rustad         2016-04-01 12:18:30 -0700 3359)      u32 hlreg0;
e84db7272798e (Mark Rustad         2016-04-01 12:18:30 -0700 3360) 
e84db7272798e (Mark Rustad         2016-04-01 12:18:30 -0700 3361)      switch (hw->device_id) {
e84db7272798e (Mark Rustad         2016-04-01 12:18:30 -0700 3362)      case IXGBE_DEV_ID_X550EM_X_10G_T:
a83c27e79068c (Don Skidmore        2016-08-17 17:34:07 -0400 3363)      case IXGBE_DEV_ID_X550EM_A_SGMII:
a83c27e79068c (Don Skidmore        2016-08-17 17:34:07 -0400 3364)      case IXGBE_DEV_ID_X550EM_A_SGMII_L:
92ed84300718d (Don Skidmore        2016-08-17 20:34:40 -0400 3365)      case IXGBE_DEV_ID_X550EM_A_10G_T:

commit e84db7272798ed8abb2760a3fcd9c6d89abf99a5
Author: Mark Rustad <mark.d.rustad@intel.com>
Date:   Fri Apr 1 12:18:30 2016 -0700

    ixgbe: Introduce function to control MDIO speed
    
    Move code that controls MDIO speed into a new function because
    there will be more MACs that need the control.
    
    Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

So the fixes would be

Fixes: e84db7272798 ("ixgbe: Introduce function to control MDIO speed")

> > > Signed-off-by: Cyril Novikov <cnovikov@lynx.com>
> > > ---
> > >   drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 3 +++
> > >   1 file changed, 3 insertions(+)
> > > 
> > > Reproduced with Linux kernel 4.19 and 5.15-rc7. Can be reproduced using
> > > the following setup:
> > > 
> > > * Use an Atom C3000 family system with at least one X550 LAN on the SoC
> > > * Disable PXE or other BIOS network initialization if possible
> > >    (the interface must not be initialized before Linux boots)
> > > * Connect a live 10Gb Ethernet cable to an X550 port
> > > * Power cycle (not reset, doesn't always work) the system and boot Linux
> > > * Observe: ixgbe interfaces w/ 10GbE cables plugged in fail with
> > > error -17
> > 
> > Why not add that to the commit message?
> 
> I wasn't sure if the reproduction scenario belonged to the commit message,
> and have no problem adding it if you believe it does.

> > 
> > Is `ixgbe_set_mdio_speed(hw)` at the end of the function then still needed?
> 
> The code between the two calls issues a global reset to the MAC and
> optionally the link, depending on some flags. That may reset the MDIO speed
> back to the wrong value or, according to the comments in the code, may reset
> the PHY and result in renegotiation and a different link speed. So, the MDIO
> speed setting may require an adjustment. Even if it actually doesn't at the
> moment, doing the second call makes the code robust to future software and
> hardware changes.

This is useful information to put in the commit message.

When writing commit messages, try to also think from the perspective
of the person doing the review. What questions are the reviewers
likely to ask, and can those questions be answered in the commit
message, rather than having them asked on the list?

Another use case of the commit message is when it turns out a change
causes a regression. It happens sometimes, and including information
about how you tested your change can be useful for helping fix the
regression. It allows whoever is fixing the regression to also test
your case, or at least something similar.

So in general, more information in the commit messages is better than
less.

     Andrew
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
index 9724ffb16518..e4b50c7781ff 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
@@ -3405,6 +3405,9 @@  static s32 ixgbe_reset_hw_X550em(struct ixgbe_hw *hw)
 	/* flush pending Tx transactions */
 	ixgbe_clear_tx_pending(hw);
 
+	/* set MDIO speed before talking to the PHY in case it's the 1st time */
+	ixgbe_set_mdio_speed(hw);
+
 	/* PHY ops must be identified and initialized prior to reset */
 	status = hw->phy.ops.init(hw);
 	if (status == IXGBE_ERR_SFP_NOT_SUPPORTED ||