[v2,0/3] Fix Armada 38x mvneta lockups when switching speeds

Message ID 20200721143756.GT1605@shell.armlinux.org.uk (mailing list archive)

Message

Russell King (Oracle) July 21, 2020, 2:37 p.m. UTC
Hi,

While testing phylink over the weekend, I found it was possible to
cause the mvneta hardware to lock up in various weird and wonderful
ways by switching the interface speed between 1G and 2.5G repeatedly.
It didn't require rapid switching; one switch every few seconds was
enough.

Symptoms included one or more of:
- Timeout while trying to stop transmit (seen once)
- 2500BASE-X link negotiation failure (fails to exchange the link word).
- Detects lack of sync, but fails to flag 10ms of sync failure.
- SyncOk bit randomly toggles.

Once the hardware gets into a "bad" state, trying to recover it by
using the mvneta GMAC port reset fails to resolve the issue.
Disabling the port also fails to recover it.  The only way to
recover seemed to be via a reboot.

Many approaches to solving this were tried in various combinations
while changing the COMPHY configuration:
- putting the GMAC into reset
- disabling the GMAC port
- augmenting the COMPHY configuration to try to "cleanly" disable
  the COMPHY via phy_power_down() and reconfigure it via
  phy_power_up(), including resetting parts of the COMPHY and
  re-running the RX initialisation.

None of that worked.  It was then discovered from the u-boot sources
that there is an undocumented register that has a lane-specific bit
set at the end of COMPHY initialisation, once the loosely documented
COMPHY setup has completed.

Experimentation with that showed that if the lane-specific bit is
cleared before changing the COMPHY "GEN" configuration, and set
afterwards, mvneta no longer locks up.
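
For illustration only, the ordering described above can be sketched in
plain C against simulated registers.  Everything named here
(sim_comphy, sim_set_conf(), sim_change_gen(), NUM_LANES, and the
one-bit-per-lane layout) is a hypothetical stand-in and not the
driver's code; a real driver would use readl()/writel() on mapped
MMIO rather than struct fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lane count, chosen only for this sketch. */
#define NUM_LANES 6

/* Simulated register state; stands in for MMIO in a real driver. */
struct sim_comphy {
	uint32_t conf;            /* stand-in for the undocumented register */
	uint32_t gen[NUM_LANES];  /* stand-in for each lane's "GEN" setting */
};

/* Set or clear the lane-specific bit (assumed here to be bit <lane>). */
static void sim_set_conf(struct sim_comphy *phy, unsigned int lane,
			 bool enable)
{
	assert(lane < NUM_LANES);

	if (enable)
		phy->conf |= UINT32_C(1) << lane;
	else
		phy->conf &= ~(UINT32_C(1) << lane);
}

/*
 * Change the lane's "GEN" configuration with the lane-specific bit
 * cleared beforehand and set again afterwards.
 */
static void sim_change_gen(struct sim_comphy *phy, unsigned int lane,
			   uint32_t gen)
{
	sim_set_conf(phy, lane, false);  /* clear before reconfiguring */
	phy->gen[lane] = gen;            /* reconfigure the lane */
	sim_set_conf(phy, lane, true);   /* set once reconfigured */
}
```

The point of the sketch is only the bracketing: the bit for the lane
being changed is cleared, the "GEN" value is written, and the bit is
set again, leaving the bits for the other lanes untouched.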

Unfortunately, this undocumented register is not part of the COMPHY
register set that we map - it is located in a region of "System
Registers" which are shared between multiple different devices.

Who should be responsible for mapping this register (mvneta or
COMPHY) was considered; the register is only present on Armada 38x
systems, and seemingly not on Armada 37x or Armada 37xx systems.
It seems that it is a system-level register.  The COMPHYs seem to
be system-specific, so let's make it part of the COMPHY.

With no real information on this register, all we can do is guess
about its function and how to fit it into the system.

 .../bindings/phy/phy-armada38x-comphy.txt          | 10 ++++-
 arch/arm/boot/dts/armada-38x.dtsi                  |  3 +-
 drivers/phy/marvell/phy-armada38x-comphy.c         | 45 ++++++++++++++++++----
 3 files changed, 49 insertions(+), 9 deletions(-)

Comments

Vinod Koul July 21, 2020, 5:28 p.m. UTC | #1
Applied 1 & 3, thanks