diff mbox series

[RESEND,net-next,v3,3/4] net: phy: aquantia: wait for the GLOBAL_CFG to start returning real values

Message ID 20240708075023.14893-4-brgl@bgdev.pl (mailing list archive)
State Accepted
Commit 708405f3e56e3bee086f46a43398ee300c36a1d8
Delegated to: Netdev Maintainers
Headers show
Series net: phy: aquantia: enable support for aqr115c | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 816 this patch: 816
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: ansuelsmth@gmail.com
netdev/build_clang success Errors and warnings before: 821 this patch: 821
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 821 this patch: 821
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 14 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-07-08--15-00 (tests: 694)

Commit Message

Bartosz Golaszewski July 8, 2024, 7:50 a.m. UTC
From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

When the PHY is first coming up (or resuming from suspend), it's
possible that although the FW status shows as running, we still see
zeroes in the GLOBAL_CFG set of registers and cannot determine available
modes. Since all models support 10M, add a poll and wait the config to
become available.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
---
 drivers/net/phy/aquantia/aquantia_main.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Jon Hunter July 18, 2024, 12:23 p.m. UTC | #1
Hi Bartosz,

On 08/07/2024 08:50, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> 
> When the PHY is first coming up (or resuming from suspend), it's
> possible that although the FW status shows as running, we still see
> zeroes in the GLOBAL_CFG set of registers and cannot determine available
> modes. Since all models support 10M, add a poll and wait the config to
> become available.
> 
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> ---
>   drivers/net/phy/aquantia/aquantia_main.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/aquantia/aquantia_main.c b/drivers/net/phy/aquantia/aquantia_main.c
> index 974795bd0860..2c8ba2725a91 100644
> --- a/drivers/net/phy/aquantia/aquantia_main.c
> +++ b/drivers/net/phy/aquantia/aquantia_main.c
> @@ -652,7 +652,13 @@ static int aqr107_fill_interface_modes(struct phy_device *phydev)
>   	unsigned long *possible = phydev->possible_interfaces;
>   	unsigned int serdes_mode, rate_adapt;
>   	phy_interface_t interface;
> -	int i, val;
> +	int i, val, ret;
> +
> +	ret = phy_read_mmd_poll_timeout(phydev, MDIO_MMD_VEND1,
> +					VEND1_GLOBAL_CFG_10M, val, val != 0,
> +					1000, 100000, false);
> +	if (ret)
> +		return ret;
>   
>   	/* Walk the media-speed configuration registers to determine which
>   	 * host-side serdes modes may be used by the PHY depending on the


With the current -next and mainline we are seeing the following issue on
our Tegra234 Jetson AGX Orin platform ...

  Aquantia AQR113C stmmac-0:00: aqr107_fill_interface_modes failed: -110
  tegra-mgbe 6800000.ethernet eth0: __stmmac_open: Cannot attach to PHY (error: -110)


We have tracked it down to this change and looks like our PHY does not
support 10M ...

$ ethtool eth0
Settings for eth0:
         Supported ports: [  ]
         Supported link modes:   100baseT/Full
                                 1000baseT/Full
                                 10000baseT/Full
                                 1000baseKX/Full
                                 10000baseKX4/Full
                                 10000baseKR/Full
                                 2500baseT/Full
                                 5000baseT/Full

The following fixes this for this platform ...

diff --git a/drivers/net/phy/aquantia/aquantia_main.c b/drivers/net/phy/aquantia/aquantia_main.c
index d12e35374231..0b2db486d8bd 100644
--- a/drivers/net/phy/aquantia/aquantia_main.c
+++ b/drivers/net/phy/aquantia/aquantia_main.c
@@ -656,7 +656,7 @@ static int aqr107_fill_interface_modes(struct phy_device *phydev)
         int i, val, ret;
  
         ret = phy_read_mmd_poll_timeout(phydev, MDIO_MMD_VEND1,
-                                       VEND1_GLOBAL_CFG_10M, val, val != 0,
+                                       VEND1_GLOBAL_CFG_100M, val, val != 0,
                                         1000, 100000, false);
         if (ret)
                 return ret;


However, I am not sure if this is guaranteed to work for all?

On a related note, we had also found an issue with this PHY where
the PHY reset is not working quite correctly. What we see is that
when polling for the firmware ID in aqr107_wait_reset_complete()
is that value in the firware ID registers transitions from 0 to
0xffff and then to the firmware ID. We have been testing the
following change to fix this ...

diff --git a/drivers/net/phy/aquantia/aquantia.h b/drivers/net/phy/aquantia/aquantia.h
index 2465345081f8..278e3b167c58 100644
--- a/drivers/net/phy/aquantia/aquantia.h
+++ b/drivers/net/phy/aquantia/aquantia.h
@@ -20,6 +20,7 @@
  #define VEND1_GLOBAL_FW_ID                     0x0020
  #define VEND1_GLOBAL_FW_ID_MAJOR               GENMASK(15, 8)
  #define VEND1_GLOBAL_FW_ID_MINOR               GENMASK(7, 0)
+#define VEND1_GLOBAL_FW_ID_MASK                        GENMASK(15, 0)
  
  #define VEND1_GLOBAL_MAILBOX_INTERFACE1                        0x0200
  #define VEND1_GLOBAL_MAILBOX_INTERFACE1_EXECUTE                BIT(15)
diff --git a/drivers/net/phy/aquantia/aquantia_main.c b/drivers/net/phy/aquantia/aquantia_main.c
index 0b2db486d8bd..5023fd70050d 100644
--- a/drivers/net/phy/aquantia/aquantia_main.c
+++ b/drivers/net/phy/aquantia/aquantia_main.c
@@ -447,7 +447,9 @@ int aqr_wait_reset_complete(struct phy_device *phydev)
         int val;
  
         return phy_read_mmd_poll_timeout(phydev, MDIO_MMD_VEND1,
-                                        VEND1_GLOBAL_FW_ID, val, val != 0,
+                                        VEND1_GLOBAL_FW_ID, val,
+                                        ((val & VEND1_GLOBAL_FW_ID_MASK) != 0 &&
+                                        (val & VEND1_GLOBAL_FW_ID_MASK) != VEND1_GLOBAL_FW_ID_MASK),
                                          20000, 2000000, false);
  }

I have not tried the resume use-case, but curious if this may
also help here?

Cheers
Jon
Bartosz Golaszewski July 18, 2024, 1:04 p.m. UTC | #2
On Thu, Jul 18, 2024 at 2:23 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>
> Hi Bartosz,
>
> On 08/07/2024 08:50, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> >
> > When the PHY is first coming up (or resuming from suspend), it's
> > possible that although the FW status shows as running, we still see
> > zeroes in the GLOBAL_CFG set of registers and cannot determine available
> > modes. Since all models support 10M, add a poll and wait the config to
> > become available.
> >
> > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > ---
> >   drivers/net/phy/aquantia/aquantia_main.c | 8 +++++++-
> >   1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/phy/aquantia/aquantia_main.c b/drivers/net/phy/aquantia/aquantia_main.c
> > index 974795bd0860..2c8ba2725a91 100644
> > --- a/drivers/net/phy/aquantia/aquantia_main.c
> > +++ b/drivers/net/phy/aquantia/aquantia_main.c
> > @@ -652,7 +652,13 @@ static int aqr107_fill_interface_modes(struct phy_device *phydev)
> >       unsigned long *possible = phydev->possible_interfaces;
> >       unsigned int serdes_mode, rate_adapt;
> >       phy_interface_t interface;
> > -     int i, val;
> > +     int i, val, ret;
> > +
> > +     ret = phy_read_mmd_poll_timeout(phydev, MDIO_MMD_VEND1,
> > +                                     VEND1_GLOBAL_CFG_10M, val, val != 0,
> > +                                     1000, 100000, false);
> > +     if (ret)
> > +             return ret;
> >
> >       /* Walk the media-speed configuration registers to determine which
> >        * host-side serdes modes may be used by the PHY depending on the
>
>
> With the current -next and mainline we are seeing the following issue on
> our Tegra234 Jetson AGX Orin platform ...
>
>   Aquantia AQR113C stmmac-0:00: aqr107_fill_interface_modes failed: -110
>   tegra-mgbe 6800000.ethernet eth0: __stmmac_open: Cannot attach to PHY (error: -110)
>
>
> We have tracked it down to this change and looks like our PHY does not
> support 10M ...
>
> $ ethtool eth0
> Settings for eth0:
>          Supported ports: [  ]
>          Supported link modes:   100baseT/Full
>                                  1000baseT/Full
>                                  10000baseT/Full
>                                  1000baseKX/Full
>                                  10000baseKX4/Full
>                                  10000baseKR/Full
>                                  2500baseT/Full
>                                  5000baseT/Full
>
> The following fixes this for this platform ...
>
> diff --git a/drivers/net/phy/aquantia/aquantia_main.c b/drivers/net/phy/aquantia/aquantia_main.c
> index d12e35374231..0b2db486d8bd 100644
> --- a/drivers/net/phy/aquantia/aquantia_main.c
> +++ b/drivers/net/phy/aquantia/aquantia_main.c
> @@ -656,7 +656,7 @@ static int aqr107_fill_interface_modes(struct phy_device *phydev)
>          int i, val, ret;
>
>          ret = phy_read_mmd_poll_timeout(phydev, MDIO_MMD_VEND1,
> -                                       VEND1_GLOBAL_CFG_10M, val, val != 0,
> +                                       VEND1_GLOBAL_CFG_100M, val, val != 0,
>                                          1000, 100000, false);
>          if (ret)
>                  return ret;
>
>
> However, I am not sure if this is guaranteed to work for all?

Ah cr*p. No, I don't think it is. We should take the first supported
mode for a given PHY I think.

>
> On a related note, we had also found an issue with this PHY where
> the PHY reset is not working quite correctly. What we see is that
> when polling for the firmware ID in aqr107_wait_reset_complete()
> is that value in the firware ID registers transitions from 0 to
> 0xffff and then to the firmware ID. We have been testing the
> following change to fix this ...
>
> diff --git a/drivers/net/phy/aquantia/aquantia.h b/drivers/net/phy/aquantia/aquantia.h
> index 2465345081f8..278e3b167c58 100644
> --- a/drivers/net/phy/aquantia/aquantia.h
> +++ b/drivers/net/phy/aquantia/aquantia.h
> @@ -20,6 +20,7 @@
>   #define VEND1_GLOBAL_FW_ID                     0x0020
>   #define VEND1_GLOBAL_FW_ID_MAJOR               GENMASK(15, 8)
>   #define VEND1_GLOBAL_FW_ID_MINOR               GENMASK(7, 0)
> +#define VEND1_GLOBAL_FW_ID_MASK                        GENMASK(15, 0)
>
>   #define VEND1_GLOBAL_MAILBOX_INTERFACE1                        0x0200
>   #define VEND1_GLOBAL_MAILBOX_INTERFACE1_EXECUTE                BIT(15)
> diff --git a/drivers/net/phy/aquantia/aquantia_main.c b/drivers/net/phy/aquantia/aquantia_main.c
> index 0b2db486d8bd..5023fd70050d 100644
> --- a/drivers/net/phy/aquantia/aquantia_main.c
> +++ b/drivers/net/phy/aquantia/aquantia_main.c
> @@ -447,7 +447,9 @@ int aqr_wait_reset_complete(struct phy_device *phydev)
>          int val;
>
>          return phy_read_mmd_poll_timeout(phydev, MDIO_MMD_VEND1,
> -                                        VEND1_GLOBAL_FW_ID, val, val != 0,
> +                                        VEND1_GLOBAL_FW_ID, val,
> +                                        ((val & VEND1_GLOBAL_FW_ID_MASK) != 0 &&
> +                                        (val & VEND1_GLOBAL_FW_ID_MASK) != VEND1_GLOBAL_FW_ID_MASK),
>                                           20000, 2000000, false);
>   }
>
> I have not tried the resume use-case, but curious if this may
> also help here?
>

Interesting. Unfortunately this doesn't help with the suspend/resume
use-case if I revert my offending patch.

Bart

> Cheers
> Jon
>
> --
> nvpublic
Bartosz Golaszewski July 18, 2024, 1:29 p.m. UTC | #3
On Thu, Jul 18, 2024 at 3:04 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>
> On Thu, Jul 18, 2024 at 2:23 PM Jon Hunter <jonathanh@nvidia.com> wrote:
> >
> >
> > With the current -next and mainline we are seeing the following issue on
> > our Tegra234 Jetson AGX Orin platform ...
> >
> >   Aquantia AQR113C stmmac-0:00: aqr107_fill_interface_modes failed: -110
> >   tegra-mgbe 6800000.ethernet eth0: __stmmac_open: Cannot attach to PHY (error: -110)
> >
> >
> > We have tracked it down to this change and looks like our PHY does not
> > support 10M ...
> >
> > $ ethtool eth0
> > Settings for eth0:
> >          Supported ports: [  ]
> >          Supported link modes:   100baseT/Full
> >                                  1000baseT/Full
> >                                  10000baseT/Full
> >                                  1000baseKX/Full
> >                                  10000baseKX4/Full
> >                                  10000baseKR/Full
> >                                  2500baseT/Full
> >                                  5000baseT/Full
> >
> > The following fixes this for this platform ...
> >
> > diff --git a/drivers/net/phy/aquantia/aquantia_main.c b/drivers/net/phy/aquantia/aquantia_main.c
> > index d12e35374231..0b2db486d8bd 100644
> > --- a/drivers/net/phy/aquantia/aquantia_main.c
> > +++ b/drivers/net/phy/aquantia/aquantia_main.c
> > @@ -656,7 +656,7 @@ static int aqr107_fill_interface_modes(struct phy_device *phydev)
> >          int i, val, ret;
> >
> >          ret = phy_read_mmd_poll_timeout(phydev, MDIO_MMD_VEND1,
> > -                                       VEND1_GLOBAL_CFG_10M, val, val != 0,
> > +                                       VEND1_GLOBAL_CFG_100M, val, val != 0,
> >                                          1000, 100000, false);
> >          if (ret)
> >                  return ret;
> >
> >
> > However, I am not sure if this is guaranteed to work for all?
>
> Ah cr*p. No, I don't think it is. We should take the first supported
> mode for a given PHY I think.
>

TBH I only observed the issue on AQR115C. I don't have any other model
to test with. Is it fine to fix it by implementing
aqr115_fill_interface_modes() that would first wait for this register
to return non-0 and then call aqr107_fill_interface_modes()?

Bartosz
Jon Hunter July 18, 2024, 2:08 p.m. UTC | #4
On 18/07/2024 14:29, Bartosz Golaszewski wrote:
> On Thu, Jul 18, 2024 at 3:04 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>>
>> On Thu, Jul 18, 2024 at 2:23 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>>>
>>>
>>> With the current -next and mainline we are seeing the following issue on
>>> our Tegra234 Jetson AGX Orin platform ...
>>>
>>>    Aquantia AQR113C stmmac-0:00: aqr107_fill_interface_modes failed: -110
>>>    tegra-mgbe 6800000.ethernet eth0: __stmmac_open: Cannot attach to PHY (error: -110)
>>>
>>>
>>> We have tracked it down to this change and looks like our PHY does not
>>> support 10M ...
>>>
>>> $ ethtool eth0
>>> Settings for eth0:
>>>           Supported ports: [  ]
>>>           Supported link modes:   100baseT/Full
>>>                                   1000baseT/Full
>>>                                   10000baseT/Full
>>>                                   1000baseKX/Full
>>>                                   10000baseKX4/Full
>>>                                   10000baseKR/Full
>>>                                   2500baseT/Full
>>>                                   5000baseT/Full
>>>
>>> The following fixes this for this platform ...
>>>
>>> diff --git a/drivers/net/phy/aquantia/aquantia_main.c b/drivers/net/phy/aquantia/aquantia_main.c
>>> index d12e35374231..0b2db486d8bd 100644
>>> --- a/drivers/net/phy/aquantia/aquantia_main.c
>>> +++ b/drivers/net/phy/aquantia/aquantia_main.c
>>> @@ -656,7 +656,7 @@ static int aqr107_fill_interface_modes(struct phy_device *phydev)
>>>           int i, val, ret;
>>>
>>>           ret = phy_read_mmd_poll_timeout(phydev, MDIO_MMD_VEND1,
>>> -                                       VEND1_GLOBAL_CFG_10M, val, val != 0,
>>> +                                       VEND1_GLOBAL_CFG_100M, val, val != 0,
>>>                                           1000, 100000, false);
>>>           if (ret)
>>>                   return ret;
>>>
>>>
>>> However, I am not sure if this is guaranteed to work for all?
>>
>> Ah cr*p. No, I don't think it is. We should take the first supported
>> mode for a given PHY I think.
>>
> 
> TBH I only observed the issue on AQR115C. I don't have any other model
> to test with. Is it fine to fix it by implementing
> aqr115_fill_interface_modes() that would first wait for this register
> to return non-0 and then call aqr107_fill_interface_modes()?

I am doing a bit more testing. We have seen a few issues with this PHY 
driver and so I am wondering if we also need something similar for the 
AQR113C variant too.

Interestingly, the product brief for these PHYs [0] do show that both 
the AQR113C and AQR115C both support 10M. So I wonder if it is our 
ethernet controller that is not supporting 10M? I will check on this too.

Jon

[0] 
https://www.marvell.com/content/dam/marvell/en/public-collateral/transceivers/marvell-phys-transceivers-aqrate-gen4-product-brief.pdf
Bartosz Golaszewski July 18, 2024, 2:13 p.m. UTC | #5
On Thu, Jul 18, 2024 at 4:08 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>
>
> On 18/07/2024 14:29, Bartosz Golaszewski wrote:
> > On Thu, Jul 18, 2024 at 3:04 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> >>
> >> On Thu, Jul 18, 2024 at 2:23 PM Jon Hunter <jonathanh@nvidia.com> wrote:
> >>>
> >>>
> >>> With the current -next and mainline we are seeing the following issue on
> >>> our Tegra234 Jetson AGX Orin platform ...
> >>>
> >>>    Aquantia AQR113C stmmac-0:00: aqr107_fill_interface_modes failed: -110
> >>>    tegra-mgbe 6800000.ethernet eth0: __stmmac_open: Cannot attach to PHY (error: -110)
> >>>
> >>>
> >>> We have tracked it down to this change and looks like our PHY does not
> >>> support 10M ...
> >>>
> >>> $ ethtool eth0
> >>> Settings for eth0:
> >>>           Supported ports: [  ]
> >>>           Supported link modes:   100baseT/Full
> >>>                                   1000baseT/Full
> >>>                                   10000baseT/Full
> >>>                                   1000baseKX/Full
> >>>                                   10000baseKX4/Full
> >>>                                   10000baseKR/Full
> >>>                                   2500baseT/Full
> >>>                                   5000baseT/Full
> >>>
> >>> The following fixes this for this platform ...
> >>>
> >>> diff --git a/drivers/net/phy/aquantia/aquantia_main.c b/drivers/net/phy/aquantia/aquantia_main.c
> >>> index d12e35374231..0b2db486d8bd 100644
> >>> --- a/drivers/net/phy/aquantia/aquantia_main.c
> >>> +++ b/drivers/net/phy/aquantia/aquantia_main.c
> >>> @@ -656,7 +656,7 @@ static int aqr107_fill_interface_modes(struct phy_device *phydev)
> >>>           int i, val, ret;
> >>>
> >>>           ret = phy_read_mmd_poll_timeout(phydev, MDIO_MMD_VEND1,
> >>> -                                       VEND1_GLOBAL_CFG_10M, val, val != 0,
> >>> +                                       VEND1_GLOBAL_CFG_100M, val, val != 0,
> >>>                                           1000, 100000, false);
> >>>           if (ret)
> >>>                   return ret;
> >>>
> >>>
> >>> However, I am not sure if this is guaranteed to work for all?
> >>
> >> Ah cr*p. No, I don't think it is. We should take the first supported
> >> mode for a given PHY I think.
> >>
> >
> > TBH I only observed the issue on AQR115C. I don't have any other model
> > to test with. Is it fine to fix it by implementing
> > aqr115_fill_interface_modes() that would first wait for this register
> > to return non-0 and then call aqr107_fill_interface_modes()?
>
> I am doing a bit more testing. We have seen a few issues with this PHY
> driver and so I am wondering if we also need something similar for the
> AQR113C variant too.
>
> Interestingly, the product brief for these PHYs [0] do show that both
> the AQR113C and AQR115C both support 10M. So I wonder if it is our
> ethernet controller that is not supporting 10M? I will check on this too.
>

Oh you have an 113c? I didn't get this. Yeah, weird, all docs say it
should support 10M. In fact all AQR PHYs should hence my initial
change.

Bart

> Jon
>
> [0]
> https://www.marvell.com/content/dam/marvell/en/public-collateral/transceivers/marvell-phys-transceivers-aqrate-gen4-product-brief.pdf
>
>
> --
> nvpublic
Jon Hunter July 18, 2024, 2:49 p.m. UTC | #6
On 18/07/2024 15:13, Bartosz Golaszewski wrote:
> On Thu, Jul 18, 2024 at 4:08 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>>
>>
>> On 18/07/2024 14:29, Bartosz Golaszewski wrote:
>>> On Thu, Jul 18, 2024 at 3:04 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>>>>
>>>> On Thu, Jul 18, 2024 at 2:23 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>>>>>
>>>>>
>>>>> With the current -next and mainline we are seeing the following issue on
>>>>> our Tegra234 Jetson AGX Orin platform ...
>>>>>
>>>>>     Aquantia AQR113C stmmac-0:00: aqr107_fill_interface_modes failed: -110
>>>>>     tegra-mgbe 6800000.ethernet eth0: __stmmac_open: Cannot attach to PHY (error: -110)
>>>>>
>>>>>
>>>>> We have tracked it down to this change and looks like our PHY does not
>>>>> support 10M ...
>>>>>
>>>>> $ ethtool eth0
>>>>> Settings for eth0:
>>>>>            Supported ports: [  ]
>>>>>            Supported link modes:   100baseT/Full
>>>>>                                    1000baseT/Full
>>>>>                                    10000baseT/Full
>>>>>                                    1000baseKX/Full
>>>>>                                    10000baseKX4/Full
>>>>>                                    10000baseKR/Full
>>>>>                                    2500baseT/Full
>>>>>                                    5000baseT/Full
>>>>>
>>>>> The following fixes this for this platform ...
>>>>>
>>>>> diff --git a/drivers/net/phy/aquantia/aquantia_main.c b/drivers/net/phy/aquantia/aquantia_main.c
>>>>> index d12e35374231..0b2db486d8bd 100644
>>>>> --- a/drivers/net/phy/aquantia/aquantia_main.c
>>>>> +++ b/drivers/net/phy/aquantia/aquantia_main.c
>>>>> @@ -656,7 +656,7 @@ static int aqr107_fill_interface_modes(struct phy_device *phydev)
>>>>>            int i, val, ret;
>>>>>
>>>>>            ret = phy_read_mmd_poll_timeout(phydev, MDIO_MMD_VEND1,
>>>>> -                                       VEND1_GLOBAL_CFG_10M, val, val != 0,
>>>>> +                                       VEND1_GLOBAL_CFG_100M, val, val != 0,
>>>>>                                            1000, 100000, false);
>>>>>            if (ret)
>>>>>                    return ret;
>>>>>
>>>>>
>>>>> However, I am not sure if this is guaranteed to work for all?
>>>>
>>>> Ah cr*p. No, I don't think it is. We should take the first supported
>>>> mode for a given PHY I think.
>>>>
>>>
>>> TBH I only observed the issue on AQR115C. I don't have any other model
>>> to test with. Is it fine to fix it by implementing
>>> aqr115_fill_interface_modes() that would first wait for this register
>>> to return non-0 and then call aqr107_fill_interface_modes()?
>>
>> I am doing a bit more testing. We have seen a few issues with this PHY
>> driver and so I am wondering if we also need something similar for the
>> AQR113C variant too.
>>
>> Interestingly, the product brief for these PHYs [0] do show that both
>> the AQR113C and AQR115C both support 10M. So I wonder if it is our
>> ethernet controller that is not supporting 10M? I will check on this too.
>>
> 
> Oh you have an 113c? I didn't get this. Yeah, weird, all docs say it
> should support 10M. In fact all AQR PHYs should hence my initial
> change.


Yes we have an AQR113C. I agree it should support this, but for whatever 
reason this is not advertised. I do see that 10M is advertised as 
supported by the network ...

  Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                       100baseT/Half 100baseT/Full
                                       1000baseT/Full

My PC that is on the same network supports 10M, but just not this Tegra 
device. I am checking to see if this is expected for this device.

Jon
Bartosz Golaszewski July 18, 2024, 2:59 p.m. UTC | #7
On Thu, Jul 18, 2024 at 4:49 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>
>
> On 18/07/2024 15:13, Bartosz Golaszewski wrote:
> > On Thu, Jul 18, 2024 at 4:08 PM Jon Hunter <jonathanh@nvidia.com> wrote:
> >>
> >>
> >> On 18/07/2024 14:29, Bartosz Golaszewski wrote:
> >>> On Thu, Jul 18, 2024 at 3:04 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> >>>>
> >>>> On Thu, Jul 18, 2024 at 2:23 PM Jon Hunter <jonathanh@nvidia.com> wrote:
> >>>>>
> >>>>>
> >>>>> With the current -next and mainline we are seeing the following issue on
> >>>>> our Tegra234 Jetson AGX Orin platform ...
> >>>>>
> >>>>>     Aquantia AQR113C stmmac-0:00: aqr107_fill_interface_modes failed: -110
> >>>>>     tegra-mgbe 6800000.ethernet eth0: __stmmac_open: Cannot attach to PHY (error: -110)
> >>>>>
> >>>>>
> >>>>> We have tracked it down to this change and looks like our PHY does not
> >>>>> support 10M ...
> >>>>>
> >>>>> $ ethtool eth0
> >>>>> Settings for eth0:
> >>>>>            Supported ports: [  ]
> >>>>>            Supported link modes:   100baseT/Full
> >>>>>                                    1000baseT/Full
> >>>>>                                    10000baseT/Full
> >>>>>                                    1000baseKX/Full
> >>>>>                                    10000baseKX4/Full
> >>>>>                                    10000baseKR/Full
> >>>>>                                    2500baseT/Full
> >>>>>                                    5000baseT/Full
> >>>>>
> >>>>> The following fixes this for this platform ...
> >>>>>
> >>>>> diff --git a/drivers/net/phy/aquantia/aquantia_main.c b/drivers/net/phy/aquantia/aquantia_main.c
> >>>>> index d12e35374231..0b2db486d8bd 100644
> >>>>> --- a/drivers/net/phy/aquantia/aquantia_main.c
> >>>>> +++ b/drivers/net/phy/aquantia/aquantia_main.c
> >>>>> @@ -656,7 +656,7 @@ static int aqr107_fill_interface_modes(struct phy_device *phydev)
> >>>>>            int i, val, ret;
> >>>>>
> >>>>>            ret = phy_read_mmd_poll_timeout(phydev, MDIO_MMD_VEND1,
> >>>>> -                                       VEND1_GLOBAL_CFG_10M, val, val != 0,
> >>>>> +                                       VEND1_GLOBAL_CFG_100M, val, val != 0,
> >>>>>                                            1000, 100000, false);
> >>>>>            if (ret)
> >>>>>                    return ret;
> >>>>>
> >>>>>
> >>>>> However, I am not sure if this is guaranteed to work for all?
> >>>>
> >>>> Ah cr*p. No, I don't think it is. We should take the first supported
> >>>> mode for a given PHY I think.
> >>>>
> >>>
> >>> TBH I only observed the issue on AQR115C. I don't have any other model
> >>> to test with. Is it fine to fix it by implementing
> >>> aqr115_fill_interface_modes() that would first wait for this register
> >>> to return non-0 and then call aqr107_fill_interface_modes()?
> >>
> >> I am doing a bit more testing. We have seen a few issues with this PHY
> >> driver and so I am wondering if we also need something similar for the
> >> AQR113C variant too.
> >>
> >> Interestingly, the product brief for these PHYs [0] do show that both
> >> the AQR113C and AQR115C both support 10M. So I wonder if it is our
> >> ethernet controller that is not supporting 10M? I will check on this too.
> >>
> >
> > Oh you have an 113c? I didn't get this. Yeah, weird, all docs say it
> > should support 10M. In fact all AQR PHYs should hence my initial
> > change.
>
>
> Yes we have an AQR113C. I agree it should support this, but for whatever
> reason this is not advertised. I do see that 10M is advertised as
> supported by the network ...
>
>   Link partner advertised link modes:  10baseT/Half 10baseT/Full
>                                        100baseT/Half 100baseT/Full
>                                        1000baseT/Full
>
> My PC that is on the same network supports 10M, but just not this Tegra
> device. I am checking to see if this is expected for this device.
>

I sent a patch for you to test. I think that even if it doesn't fully
fix the issue you're observing, it's worth picking it up as it reduces
the impact of the workaround I introduced.

I'll be off next week so I'm sending it quickly with the hope it will be useful.

Bart
Jon Hunter July 18, 2024, 5:42 p.m. UTC | #8
On 18/07/2024 15:59, Bartosz Golaszewski wrote:

...

>>>>> TBH I only observed the issue on AQR115C. I don't have any other model
>>>>> to test with. Is it fine to fix it by implementing
>>>>> aqr115_fill_interface_modes() that would first wait for this register
>>>>> to return non-0 and then call aqr107_fill_interface_modes()?
>>>>
>>>> I am doing a bit more testing. We have seen a few issues with this PHY
>>>> driver and so I am wondering if we also need something similar for the
>>>> AQR113C variant too.
>>>>
>>>> Interestingly, the product brief for these PHYs [0] do show that both
>>>> the AQR113C and AQR115C both support 10M. So I wonder if it is our
>>>> ethernet controller that is not supporting 10M? I will check on this too.
>>>>
>>>
>>> Oh you have an 113c? I didn't get this. Yeah, weird, all docs say it
>>> should support 10M. In fact all AQR PHYs should hence my initial
>>> change.
>>
>>
>> Yes we have an AQR113C. I agree it should support this, but for whatever
>> reason this is not advertised. I do see that 10M is advertised as
>> supported by the network ...
>>
>>    Link partner advertised link modes:  10baseT/Half 10baseT/Full
>>                                         100baseT/Half 100baseT/Full
>>                                         1000baseT/Full
>>
>> My PC that is on the same network supports 10M, but just not this Tegra
>> device. I am checking to see if this is expected for this device.
>>
> 
> I sent a patch for you to test. I think that even if it doesn't fully
> fix the issue you're observing, it's worth picking it up as it reduces
> the impact of the workaround I introduced.


Thanks! I will test this tonight.

> I'll be off next week so I'm sending it quickly with the hope it will be useful.


OK thanks for letting me know.

Another thought I had, which is also quite timely, is that I have 
recently been testing a patch [0] as I found that this actually resolves 
an issue where we occasionally see our device fail to get an IP address.

This was sent out over a year ago and sadly we failed to follow up :-(

Russell was concerned if this would make the function that was being 
changed fail if it did not have the link (if I am understanding the 
comments correctly). However, looking at the code now, I see that the 
aqr107_read_status() function checks if '!phydev->link' before we poll 
the TX ready status, and so I am wondering if this change is OK? From my 
testing it does work. I would be interested to know if this may also 
resolve your issue?

With this change [0] I have been able to do 500 boots on our board and 
verify that the ethernet controller is able to get an IP address every 
time. Without this change it would fail to get an IP address anywhere 
from 1-100 boots typically.

I will test your patch in the same way, but I am wondering if both are 
trying to address the same sort of issue?

Cheers
Jon

[0] 
https://lore.kernel.org/linux-tegra/20230628124326.55732-3-ruppala@nvidia.com/#t
Bartosz Golaszewski July 18, 2024, 7:05 p.m. UTC | #9
On Thu, Jul 18, 2024 at 7:42 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>
>
> On 18/07/2024 15:59, Bartosz Golaszewski wrote:
>
> ...
>
> >>>>> TBH I only observed the issue on AQR115C. I don't have any other model
> >>>>> to test with. Is it fine to fix it by implementing
> >>>>> aqr115_fill_interface_modes() that would first wait for this register
> >>>>> to return non-0 and then call aqr107_fill_interface_modes()?
> >>>>
> >>>> I am doing a bit more testing. We have seen a few issues with this PHY
> >>>> driver and so I am wondering if we also need something similar for the
> >>>> AQR113C variant too.
> >>>>
> >>>> Interestingly, the product brief for these PHYs [0] do show that both
> >>>> the AQR113C and AQR115C both support 10M. So I wonder if it is our
> >>>> ethernet controller that is not supporting 10M? I will check on this too.
> >>>>
> >>>
> >>> Oh you have an 113c? I didn't get this. Yeah, weird, all docs say it
> >>> should support 10M. In fact all AQR PHYs should hence my initial
> >>> change.
> >>
> >>
> >> Yes we have an AQR113C. I agree it should support this, but for whatever
> >> reason this is not advertised. I do see that 10M is advertised as
> >> supported by the network ...
> >>
> >>    Link partner advertised link modes:  10baseT/Half 10baseT/Full
> >>                                         100baseT/Half 100baseT/Full
> >>                                         1000baseT/Full
> >>
> >> My PC that is on the same network supports 10M, but just not this Tegra
> >> device. I am checking to see if this is expected for this device.
> >>
> >
> > I sent a patch for you to test. I think that even if it doesn't fully
> > fix the issue you're observing, it's worth picking it up as it reduces
> > the impact of the workaround I introduced.
>
>
> Thanks! I will test this tonight.
>
> > I'll be off next week so I'm sending it quickly with the hope it will be useful.
>
>
> OK thanks for letting me know.
>
> Another thought I had, which is also quite timely, is that I have
> recently been testing a patch [0] as I found that this actually resolves
> an issue where we occasionally see our device fail to get an IP address.
>
> This was sent out over a year ago and sadly we failed to follow up :-(
>
> Russell was concerned if this would make the function that was being
> changed fail if it did not have the link (if I am understanding the
> comments correctly). However, looking at the code now, I see that the
> aqr107_read_status() function checks if '!phydev->link' before we poll
> the TX ready status, and so I am wondering if this change is OK? From my
> testing it does work. I would be interested to know if this may also
> resolve your issue?
>
> With this change [0] I have been able to do 500 boots on our board and
> verify that the ethernet controller is able to get an IP address every
> time. Without this change it would fail to get an IP address anywhere
> from 1-100 boots typically.
>
> I will test your patch in the same way, but I am wondering if both are
> trying to address the same sort of issue?
>

The patch you linked does not fix the suspend/resume either. :(

Bartosz

> Cheers
> Jon
>
> [0]
> https://lore.kernel.org/linux-tegra/20230628124326.55732-3-ruppala@nvidia.com/#t
>
> --
> nvpublic
Andrew Lunn July 19, 2024, 3:41 a.m. UTC | #10
> Interestingly, the product brief for these PHYs [0] do show that both the
> AQR113C and AQR115C both support 10M. So I wonder if it is our ethernet
> controller that is not supporting 10M? I will check on this too.

The PHY should enumerate what it supports, look at
genphy_read_abilities(), which reads the Basic Mode Status Register,
and there are bits in there which indicate if the PHY supports 10Half
and 10Full.

The MAC should also be indicating what it support, and ethtool will be
showing you the combination of the two.

	Andrew
Jon Hunter July 19, 2024, 8:39 a.m. UTC | #11
On 18/07/2024 20:05, Bartosz Golaszewski wrote:
> On Thu, Jul 18, 2024 at 7:42 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>>
>>
>> On 18/07/2024 15:59, Bartosz Golaszewski wrote:
>>
>> ...
>>
>>>>>>> TBH I only observed the issue on AQR115C. I don't have any other model
>>>>>>> to test with. Is it fine to fix it by implementing
>>>>>>> aqr115_fill_interface_modes() that would first wait for this register
>>>>>>> to return non-0 and then call aqr107_fill_interface_modes()?
>>>>>>
>>>>>> I am doing a bit more testing. We have seen a few issues with this PHY
>>>>>> driver and so I am wondering if we also need something similar for the
>>>>>> AQR113C variant too.
>>>>>>
>>>>>> Interestingly, the product brief for these PHYs [0] do show that both
>>>>>> the AQR113C and AQR115C both support 10M. So I wonder if it is our
>>>>>> ethernet controller that is not supporting 10M? I will check on this too.
>>>>>>
>>>>>
>>>>> Oh you have an 113c? I didn't get this. Yeah, weird, all docs say it
>>>>> should support 10M. In fact all AQR PHYs should hence my initial
>>>>> change.
>>>>
>>>>
>>>> Yes we have an AQR113C. I agree it should support this, but for whatever
>>>> reason this is not advertised. I do see that 10M is advertised as
>>>> supported by the network ...
>>>>
>>>>     Link partner advertised link modes:  10baseT/Half 10baseT/Full
>>>>                                          100baseT/Half 100baseT/Full
>>>>                                          1000baseT/Full
>>>>
>>>> My PC that is on the same network supports 10M, but just not this Tegra
>>>> device. I am checking to see if this is expected for this device.
>>>>
>>>
>>> I sent a patch for you to test. I think that even if it doesn't fully
>>> fix the issue you're observing, it's worth picking it up as it reduces
>>> the impact of the workaround I introduced.
>>
>>
>> Thanks! I will test this tonight.
>>
>>> I'll be off next week so I'm sending it quickly with the hope it will be useful.
>>
>>
>> OK thanks for letting me know.
>>
>> Another thought I had, which is also quite timely, is that I have
>> recently been testing a patch [0] as I found that this actually resolves
>> an issue where we occasionally see our device fail to get an IP address.
>>
>> This was sent out over a year ago and sadly we failed to follow up :-(
>>
>> Russell was concerned if this would make the function that was being
>> changed fail if it did not have the link (if I am understanding the
>> comments correctly). However, looking at the code now, I see that the
>> aqr107_read_status() function checks if '!phydev->link' before we poll
>> the TX ready status, and so I am wondering if this change is OK? From my
>> testing it does work. I would be interested to know if this may also
>> resolve your issue?
>>
>> With this change [0] I have been able to do 500 boots on our board and
>> verify that the ethernet controller is able to get an IP address every
>> time. Without this change it would fail to get an IP address anywhere
>> from 1-100 boots typically.
>>
>> I will test your patch in the same way, but I am wondering if both are
>> trying to address the same sort of issue?
>>
> 
> The patch you linked does not fix the suspend/resume either. :(


Thanks for testing! I have verified that the patch you sent resolves the 
issue introduced by this patch for Tegra. And likewise this patch does 
not resolve the long-standing issue (not related to this change) that we 
have been observing.

Cheers
Jon
diff mbox series

Patch

diff --git a/drivers/net/phy/aquantia/aquantia_main.c b/drivers/net/phy/aquantia/aquantia_main.c
index 974795bd0860..2c8ba2725a91 100644
--- a/drivers/net/phy/aquantia/aquantia_main.c
+++ b/drivers/net/phy/aquantia/aquantia_main.c
@@ -652,7 +652,13 @@  static int aqr107_fill_interface_modes(struct phy_device *phydev)
 	unsigned long *possible = phydev->possible_interfaces;
 	unsigned int serdes_mode, rate_adapt;
 	phy_interface_t interface;
-	int i, val;
+	int i, val, ret;
+
+	ret = phy_read_mmd_poll_timeout(phydev, MDIO_MMD_VEND1,
+					VEND1_GLOBAL_CFG_10M, val, val != 0,
+					1000, 100000, false);
+	if (ret)
+		return ret;
 
 	/* Walk the media-speed configuration registers to determine which
 	 * host-side serdes modes may be used by the PHY depending on the