diff mbox series

[net,v2] net: phy: microchip: force IRQ polling mode for lan88xx

Message ID 20250416102413.30654-1-fiona.klute@gmx.de (mailing list archive)
State New
Delegated to: Netdev Maintainers
Headers show
Series [net,v2] net: phy: microchip: force IRQ polling mode for lan88xx | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 1 blamed authors not CCed: woojung.huh@microchip.com; 4 maintainers not CCed: linux@armlinux.org.uk hkallweit1@gmail.com woojung.huh@microchip.com andrew@lunn.ch
netdev/build_clang success Errors and warnings before: 2 this patch: 2
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 2 this patch: 2
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 58 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-04-18--09-00 (tests: 916)

Commit Message

Fiona Klute April 16, 2025, 10:24 a.m. UTC
With lan88xx based devices the lan78xx driver can get stuck in an
interrupt loop while bringing the device up, flooding the kernel log
with messages like the following:

lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped

Removing interrupt support from the lan88xx PHY driver forces the
driver to use polling instead, which avoids the problem.

The issue has been observed with Raspberry Pi devices at least since
4.14 (see [1], bug report for their downstream kernel), as well as
with Nvidia devices [2] in 2020, where disabling polling was the
vendor-suggested workaround (together with the claim that phylib
changes in 4.9 made the interrupt handling in lan78xx incompatible).

Iperf reports well over 900Mbits/sec per direction with client in
--dualtest mode, so there does not seem to be a significant impact on
throughput (lan88xx device connected via switch to the peer).

[1] https://github.com/raspberrypi/linux/issues/2447
[2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11

Link: https://lore.kernel.org/0901d90d-3f20-4a10-b680-9c978e04ddda@lunn.ch
Fixes: 792aec47d59d ("add microchip LAN88xx phy driver")
Signed-off-by: Fiona Klute <fiona.klute@gmx.de>
Cc: kernel-list@raspberrypi.com
Cc: stable@vger.kernel.org
---
v2:
- add comment why interrupt functions are missing
- add Fixes reference
v1: https://lore.kernel.org/netdev/20250414152634.2786447-1-fiona.klute@gmx.de/

 drivers/net/phy/microchip.c | 46 +++----------------------------------
 1 file changed, 3 insertions(+), 43 deletions(-)

Comments

Andrew Lunn April 16, 2025, 12:27 p.m. UTC | #1
On Wed, Apr 16, 2025 at 12:24:13PM +0200, Fiona Klute wrote:
> With lan88xx based devices the lan78xx driver can get stuck in an
> interrupt loop while bringing the device up, flooding the kernel log
> with messages like the following:
> 
> lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped
> 
> Removing interrupt support from the lan88xx PHY driver forces the
> driver to use polling instead, which avoids the problem.
> 
> The issue has been observed with Raspberry Pi devices at least since
> 4.14 (see [1], bug report for their downstream kernel), as well as
> with Nvidia devices [2] in 2020, where disabling polling was the
> vendor-suggested workaround (together with the claim that phylib
> changes in 4.9 made the interrupt handling in lan78xx incompatible).
> 
> Iperf reports well over 900Mbits/sec per direction with client in
> --dualtest mode, so there does not seem to be a significant impact on
> throughput (lan88xx device connected via switch to the peer).
> 
> [1] https://github.com/raspberrypi/linux/issues/2447
> [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11
> 
> Link: https://lore.kernel.org/0901d90d-3f20-4a10-b680-9c978e04ddda@lunn.ch
> Fixes: 792aec47d59d ("add microchip LAN88xx phy driver")
> Signed-off-by: Fiona Klute <fiona.klute@gmx.de>
> Cc: kernel-list@raspberrypi.com
> Cc: stable@vger.kernel.org

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew
Fiona Klute April 17, 2025, 9:05 a.m. UTC | #2
Am 16.04.25 um 12:24 schrieb Fiona Klute:
> With lan88xx based devices the lan78xx driver can get stuck in an
> interrupt loop while bringing the device up, flooding the kernel log
> with messages like the following:
> 
> lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped
> 
> Removing interrupt support from the lan88xx PHY driver forces the
> driver to use polling instead, which avoids the problem.
> 
> The issue has been observed with Raspberry Pi devices at least since
> 4.14 (see [1], bug report for their downstream kernel), as well as
> with Nvidia devices [2] in 2020, where disabling polling was the

I noticed I got words mixed up here, needs to be either "disabling 
interrupts" or "forcing polling", not "disabling polling".

Should I re-send, or is that something that can be fixed while applying?

Best regards,
Fiona

> vendor-suggested workaround (together with the claim that phylib
> changes in 4.9 made the interrupt handling in lan78xx incompatible).
> 
> Iperf reports well over 900Mbits/sec per direction with client in
> --dualtest mode, so there does not seem to be a significant impact on
> throughput (lan88xx device connected via switch to the peer).
> 
> [1] https://github.com/raspberrypi/linux/issues/2447
> [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11
> 
> Link: https://lore.kernel.org/0901d90d-3f20-4a10-b680-9c978e04ddda@lunn.ch
> Fixes: 792aec47d59d ("add microchip LAN88xx phy driver")
> Signed-off-by: Fiona Klute <fiona.klute@gmx.de>
> Cc: kernel-list@raspberrypi.com
> Cc: stable@vger.kernel.org
> ---
> v2:
> - add comment why interrupt functions are missing
> - add Fixes reference
> v1: https://lore.kernel.org/netdev/20250414152634.2786447-1-fiona.klute@gmx.de/
> 
>   drivers/net/phy/microchip.c | 46 +++----------------------------------
>   1 file changed, 3 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/net/phy/microchip.c b/drivers/net/phy/microchip.c
> index 0e17cc458efdc..93de88c1c8fd5 100644
> --- a/drivers/net/phy/microchip.c
> +++ b/drivers/net/phy/microchip.c
> @@ -37,47 +37,6 @@ static int lan88xx_write_page(struct phy_device *phydev, int page)
>   	return __phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, page);
>   }
>   
> -static int lan88xx_phy_config_intr(struct phy_device *phydev)
> -{
> -	int rc;
> -
> -	if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
> -		/* unmask all source and clear them before enable */
> -		rc = phy_write(phydev, LAN88XX_INT_MASK, 0x7FFF);
> -		rc = phy_read(phydev, LAN88XX_INT_STS);
> -		rc = phy_write(phydev, LAN88XX_INT_MASK,
> -			       LAN88XX_INT_MASK_MDINTPIN_EN_ |
> -			       LAN88XX_INT_MASK_LINK_CHANGE_);
> -	} else {
> -		rc = phy_write(phydev, LAN88XX_INT_MASK, 0);
> -		if (rc)
> -			return rc;
> -
> -		/* Ack interrupts after they have been disabled */
> -		rc = phy_read(phydev, LAN88XX_INT_STS);
> -	}
> -
> -	return rc < 0 ? rc : 0;
> -}
> -
> -static irqreturn_t lan88xx_handle_interrupt(struct phy_device *phydev)
> -{
> -	int irq_status;
> -
> -	irq_status = phy_read(phydev, LAN88XX_INT_STS);
> -	if (irq_status < 0) {
> -		phy_error(phydev);
> -		return IRQ_NONE;
> -	}
> -
> -	if (!(irq_status & LAN88XX_INT_STS_LINK_CHANGE_))
> -		return IRQ_NONE;
> -
> -	phy_trigger_machine(phydev);
> -
> -	return IRQ_HANDLED;
> -}
> -
>   static int lan88xx_suspend(struct phy_device *phydev)
>   {
>   	struct lan88xx_priv *priv = phydev->priv;
> @@ -528,8 +487,9 @@ static struct phy_driver microchip_phy_driver[] = {
>   	.config_aneg	= lan88xx_config_aneg,
>   	.link_change_notify = lan88xx_link_change_notify,
>   
> -	.config_intr	= lan88xx_phy_config_intr,
> -	.handle_interrupt = lan88xx_handle_interrupt,
> +	/* Interrupt handling is broken, do not define related
> +	 * functions to force polling.
> +	 */
>   
>   	.suspend	= lan88xx_suspend,
>   	.resume		= genphy_resume,
Thangaraj Samynathan April 17, 2025, 2:17 p.m. UTC | #3
Hi Flona,
We haven't observed this issue with our LAN7801 + LAN88xx setup during
testing in our setups. Issue reported  specifically in RPI and NVidea
with which we have not tested recently. Additionally, community
discussions suggest that this issue may have been introduced due to
driver updates in the LAN78xx. There's no hardware errata indicating
any problems with the interrupts in LAN88xx.

If the issue lies in the interrupt handling within the LAN78xx, it's
likely that similar problems could arise with other PHYs that support
interrupt handling. This need to be debugged and addressed from LAN78xx
driver rather than removing interrupt support in LAN88xx. 

Thanks,
Thangaraj Samynathan
On Thu, 2025-04-17 at 11:05 +0200, Fiona Klute wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you
> know the content is safe
> 
> Am 16.04.25 um 12:24 schrieb Fiona Klute:
> > With lan88xx based devices the lan78xx driver can get stuck in an
> > interrupt loop while bringing the device up, flooding the kernel
> > log
> > with messages like the following:
> > 
> > lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped
> > 
> > Removing interrupt support from the lan88xx PHY driver forces the
> > driver to use polling instead, which avoids the problem.
> > 
> > The issue has been observed with Raspberry Pi devices at least
> > since
> > 4.14 (see [1], bug report for their downstream kernel), as well as
> > with Nvidia devices [2] in 2020, where disabling polling was the
> 
> I noticed I got words mixed up here, needs to be either "disabling
> interrupts" or "forcing polling", not "disabling polling".
> 
> Should I re-send, or is that something that can be fixed while
> applying?
> 
> Best regards,
> Fiona
> 
> > vendor-suggested workaround (together with the claim that phylib
> > changes in 4.9 made the interrupt handling in lan78xx
> > incompatible).
> > 
> > Iperf reports well over 900Mbits/sec per direction with client in
> > --dualtest mode, so there does not seem to be a significant impact
> > on
> > throughput (lan88xx device connected via switch to the peer).
> > 
> > [1] https://github.com/raspberrypi/linux/issues/2447
> > [2] 
> > https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11
> > 
> > Link: 
> > https://lore.kernel.org/0901d90d-3f20-4a10-b680-9c978e04ddda@lunn.ch
> > Fixes: 792aec47d59d ("add microchip LAN88xx phy driver")
> > Signed-off-by: Fiona Klute <fiona.klute@gmx.de>
> > Cc: kernel-list@raspberrypi.com
> > Cc: stable@vger.kernel.org
> > ---
> > v2:
> > - add comment why interrupt functions are missing
> > - add Fixes reference
> > v1: 
> > https://lore.kernel.org/netdev/20250414152634.2786447-1-fiona.klute@gmx.de/
> > 
> >   drivers/net/phy/microchip.c | 46 +++-----------------------------
> > -----
> >   1 file changed, 3 insertions(+), 43 deletions(-)
> > 
> > diff --git a/drivers/net/phy/microchip.c
> > b/drivers/net/phy/microchip.c
> > index 0e17cc458efdc..93de88c1c8fd5 100644
> > --- a/drivers/net/phy/microchip.c
> > +++ b/drivers/net/phy/microchip.c
> > @@ -37,47 +37,6 @@ static int lan88xx_write_page(struct phy_device
> > *phydev, int page)
> >       return __phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, page);
> >   }
> > 
> > -static int lan88xx_phy_config_intr(struct phy_device *phydev)
> > -{
> > -     int rc;
> > -
> > -     if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
> > -             /* unmask all source and clear them before enable */
> > -             rc = phy_write(phydev, LAN88XX_INT_MASK, 0x7FFF);
> > -             rc = phy_read(phydev, LAN88XX_INT_STS);
> > -             rc = phy_write(phydev, LAN88XX_INT_MASK,
> > -                            LAN88XX_INT_MASK_MDINTPIN_EN_ |
> > -                            LAN88XX_INT_MASK_LINK_CHANGE_);
> > -     } else {
> > -             rc = phy_write(phydev, LAN88XX_INT_MASK, 0);
> > -             if (rc)
> > -                     return rc;
> > -
> > -             /* Ack interrupts after they have been disabled */
> > -             rc = phy_read(phydev, LAN88XX_INT_STS);
> > -     }
> > -
> > -     return rc < 0 ? rc : 0;
> > -}
> > -
> > -static irqreturn_t lan88xx_handle_interrupt(struct phy_device
> > *phydev)
> > -{
> > -     int irq_status;
> > -
> > -     irq_status = phy_read(phydev, LAN88XX_INT_STS);
> > -     if (irq_status < 0) {
> > -             phy_error(phydev);
> > -             return IRQ_NONE;
> > -     }
> > -
> > -     if (!(irq_status & LAN88XX_INT_STS_LINK_CHANGE_))
> > -             return IRQ_NONE;
> > -
> > -     phy_trigger_machine(phydev);
> > -
> > -     return IRQ_HANDLED;
> > -}
> > -
> >   static int lan88xx_suspend(struct phy_device *phydev)
> >   {
> >       struct lan88xx_priv *priv = phydev->priv;
> > @@ -528,8 +487,9 @@ static struct phy_driver microchip_phy_driver[]
> > = {
> >       .config_aneg    = lan88xx_config_aneg,
> >       .link_change_notify = lan88xx_link_change_notify,
> > 
> > -     .config_intr    = lan88xx_phy_config_intr,
> > -     .handle_interrupt = lan88xx_handle_interrupt,
> > +     /* Interrupt handling is broken, do not define related
> > +      * functions to force polling.
> > +      */
> > 
> >       .suspend        = lan88xx_suspend,
> >       .resume         = genphy_resume,
Andrew Lunn April 17, 2025, 6:11 p.m. UTC | #4
On Thu, Apr 17, 2025 at 02:17:20PM +0000, Thangaraj.S@microchip.com wrote:
> Hi Flona,
> We haven't observed this issue with our LAN7801 + LAN88xx setup during
> testing in our setups. Issue reported  specifically in RPI and NVidea
> with which we have not tested recently. Additionally, community
> discussions suggest that this issue may have been introduced due to
> driver updates in the LAN78xx. There's no hardware errata indicating
> any problems with the interrupts in LAN88xx.
> 
> If the issue lies in the interrupt handling within the LAN78xx, it's
> likely that similar problems could arise with other PHYs that support
> interrupt handling. This need to be debugged and addressed from LAN78xx
> driver rather than removing interrupt support in LAN88xx. 

Hi Thangaraj

It is an easily available platform. Maybe you can get one and try to
debug it?

This is a self contained simple fix. From a pragmatic standpoint, it
solves an issue which is bothering people. If you do find and fix an
issue in the LAN78xx this change can be easily reverted. So i'm
learning towards merging it.

	Andrew
diff mbox series

Patch

diff --git a/drivers/net/phy/microchip.c b/drivers/net/phy/microchip.c
index 0e17cc458efdc..93de88c1c8fd5 100644
--- a/drivers/net/phy/microchip.c
+++ b/drivers/net/phy/microchip.c
@@ -37,47 +37,6 @@  static int lan88xx_write_page(struct phy_device *phydev, int page)
 	return __phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, page);
 }
 
-static int lan88xx_phy_config_intr(struct phy_device *phydev)
-{
-	int rc;
-
-	if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
-		/* unmask all source and clear them before enable */
-		rc = phy_write(phydev, LAN88XX_INT_MASK, 0x7FFF);
-		rc = phy_read(phydev, LAN88XX_INT_STS);
-		rc = phy_write(phydev, LAN88XX_INT_MASK,
-			       LAN88XX_INT_MASK_MDINTPIN_EN_ |
-			       LAN88XX_INT_MASK_LINK_CHANGE_);
-	} else {
-		rc = phy_write(phydev, LAN88XX_INT_MASK, 0);
-		if (rc)
-			return rc;
-
-		/* Ack interrupts after they have been disabled */
-		rc = phy_read(phydev, LAN88XX_INT_STS);
-	}
-
-	return rc < 0 ? rc : 0;
-}
-
-static irqreturn_t lan88xx_handle_interrupt(struct phy_device *phydev)
-{
-	int irq_status;
-
-	irq_status = phy_read(phydev, LAN88XX_INT_STS);
-	if (irq_status < 0) {
-		phy_error(phydev);
-		return IRQ_NONE;
-	}
-
-	if (!(irq_status & LAN88XX_INT_STS_LINK_CHANGE_))
-		return IRQ_NONE;
-
-	phy_trigger_machine(phydev);
-
-	return IRQ_HANDLED;
-}
-
 static int lan88xx_suspend(struct phy_device *phydev)
 {
 	struct lan88xx_priv *priv = phydev->priv;
@@ -528,8 +487,9 @@  static struct phy_driver microchip_phy_driver[] = {
 	.config_aneg	= lan88xx_config_aneg,
 	.link_change_notify = lan88xx_link_change_notify,
 
-	.config_intr	= lan88xx_phy_config_intr,
-	.handle_interrupt = lan88xx_handle_interrupt,
+	/* Interrupt handling is broken, do not define related
+	 * functions to force polling.
+	 */
 
 	.suspend	= lan88xx_suspend,
 	.resume		= genphy_resume,