Message ID | 20250416102413.30654-1-fiona.klute@gmx.de (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [net,v2] net: phy: microchip: force IRQ polling mode for lan88xx | expand |
On Wed, Apr 16, 2025 at 12:24:13PM +0200, Fiona Klute wrote: > With lan88xx based devices the lan78xx driver can get stuck in an > interrupt loop while bringing the device up, flooding the kernel log > with messages like the following: > > lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped > > Removing interrupt support from the lan88xx PHY driver forces the > driver to use polling instead, which avoids the problem. > > The issue has been observed with Raspberry Pi devices at least since > 4.14 (see [1], bug report for their downstream kernel), as well as > with Nvidia devices [2] in 2020, where disabling polling was the > vendor-suggested workaround (together with the claim that phylib > changes in 4.9 made the interrupt handling in lan78xx incompatible). > > Iperf reports well over 900Mbits/sec per direction with client in > --dualtest mode, so there does not seem to be a significant impact on > throughput (lan88xx device connected via switch to the peer). > > [1] https://github.com/raspberrypi/linux/issues/2447 > [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 > > Link: https://lore.kernel.org/0901d90d-3f20-4a10-b680-9c978e04ddda@lunn.ch > Fixes: 792aec47d59d ("add microchip LAN88xx phy driver") > Signed-off-by: Fiona Klute <fiona.klute@gmx.de> > Cc: kernel-list@raspberrypi.com > Cc: stable@vger.kernel.org Reviewed-by: Andrew Lunn <andrew@lunn.ch> Andrew
Am 16.04.25 um 12:24 schrieb Fiona Klute: > With lan88xx based devices the lan78xx driver can get stuck in an > interrupt loop while bringing the device up, flooding the kernel log > with messages like the following: > > lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped > > Removing interrupt support from the lan88xx PHY driver forces the > driver to use polling instead, which avoids the problem. > > The issue has been observed with Raspberry Pi devices at least since > 4.14 (see [1], bug report for their downstream kernel), as well as > with Nvidia devices [2] in 2020, where disabling polling was the I noticed I got words mixed up here, needs to be either "disabling interrupts" or "forcing polling", not "disabling polling". Should I re-send, or is that something that can be fixed while applying? Best regards, Fiona > vendor-suggested workaround (together with the claim that phylib > changes in 4.9 made the interrupt handling in lan78xx incompatible). > > Iperf reports well over 900Mbits/sec per direction with client in > --dualtest mode, so there does not seem to be a significant impact on > throughput (lan88xx device connected via switch to the peer). > > [1] https://github.com/raspberrypi/linux/issues/2447 > [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 > > Link: https://lore.kernel.org/0901d90d-3f20-4a10-b680-9c978e04ddda@lunn.ch > Fixes: 792aec47d59d ("add microchip LAN88xx phy driver") > Signed-off-by: Fiona Klute <fiona.klute@gmx.de> > Cc: kernel-list@raspberrypi.com > Cc: stable@vger.kernel.org > --- > v2: > - add comment why interrupt functions are missing > - add Fixes reference > v1: https://lore.kernel.org/netdev/20250414152634.2786447-1-fiona.klute@gmx.de/ > > drivers/net/phy/microchip.c | 46 +++---------------------------------- > 1 file changed, 3 insertions(+), 43 deletions(-) > > diff --git a/drivers/net/phy/microchip.c b/drivers/net/phy/microchip.c > index 0e17cc458efdc..93de88c1c8fd5 100644 > --- a/drivers/net/phy/microchip.c > +++ b/drivers/net/phy/microchip.c > @@ -37,47 +37,6 @@ static int lan88xx_write_page(struct phy_device *phydev, int page) > return __phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, page); > } > > -static int lan88xx_phy_config_intr(struct phy_device *phydev) > -{ > - int rc; > - > - if (phydev->interrupts == PHY_INTERRUPT_ENABLED) { > - /* unmask all source and clear them before enable */ > - rc = phy_write(phydev, LAN88XX_INT_MASK, 0x7FFF); > - rc = phy_read(phydev, LAN88XX_INT_STS); > - rc = phy_write(phydev, LAN88XX_INT_MASK, > - LAN88XX_INT_MASK_MDINTPIN_EN_ | > - LAN88XX_INT_MASK_LINK_CHANGE_); > - } else { > - rc = phy_write(phydev, LAN88XX_INT_MASK, 0); > - if (rc) > - return rc; > - > - /* Ack interrupts after they have been disabled */ > - rc = phy_read(phydev, LAN88XX_INT_STS); > - } > - > - return rc < 0 ? rc : 0; > -} > - > -static irqreturn_t lan88xx_handle_interrupt(struct phy_device *phydev) > -{ > - int irq_status; > - > - irq_status = phy_read(phydev, LAN88XX_INT_STS); > - if (irq_status < 0) { > - phy_error(phydev); > - return IRQ_NONE; > - } > - > - if (!(irq_status & LAN88XX_INT_STS_LINK_CHANGE_)) > - return IRQ_NONE; > - > - phy_trigger_machine(phydev); > - > - return IRQ_HANDLED; > -} > - > static int lan88xx_suspend(struct phy_device *phydev) > { > struct lan88xx_priv *priv = phydev->priv; > @@ -528,8 +487,9 @@ static struct phy_driver microchip_phy_driver[] = { > .config_aneg = lan88xx_config_aneg, > .link_change_notify = lan88xx_link_change_notify, > > - .config_intr = lan88xx_phy_config_intr, > - .handle_interrupt = lan88xx_handle_interrupt, > + /* Interrupt handling is broken, do not define related > + * functions to force polling. > + */ > > .suspend = lan88xx_suspend, > .resume = genphy_resume,
Hi Flona, We haven't observed this issue with our LAN7801 + LAN88xx setup during testing in our setups. Issue reported specifically in RPI and NVidea with which we have not tested recently. Additionally, community discussions suggest that this issue may have been introduced due to driver updates in the LAN78xx. There's no hardware errata indicating any problems with the interrupts in LAN88xx. If the issue lies in the interrupt handling within the LAN78xx, it's likely that similar problems could arise with other PHYs that support interrupt handling. This need to be debugged and addressed from LAN78xx driver rather than removing interrupt support in LAN88xx. Thanks, Thangaraj Samynathan On Thu, 2025-04-17 at 11:05 +0200, Fiona Klute wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you > know the content is safe > > Am 16.04.25 um 12:24 schrieb Fiona Klute: > > With lan88xx based devices the lan78xx driver can get stuck in an > > interrupt loop while bringing the device up, flooding the kernel > > log > > with messages like the following: > > > > lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped > > > > Removing interrupt support from the lan88xx PHY driver forces the > > driver to use polling instead, which avoids the problem. > > > > The issue has been observed with Raspberry Pi devices at least > > since > > 4.14 (see [1], bug report for their downstream kernel), as well as > > with Nvidia devices [2] in 2020, where disabling polling was the > > I noticed I got words mixed up here, needs to be either "disabling > interrupts" or "forcing polling", not "disabling polling". > > Should I re-send, or is that something that can be fixed while > applying? > > Best regards, > Fiona > > > vendor-suggested workaround (together with the claim that phylib > > changes in 4.9 made the interrupt handling in lan78xx > > incompatible). > > > > Iperf reports well over 900Mbits/sec per direction with client in > > --dualtest mode, so there does not seem to be a significant impact > > on > > throughput (lan88xx device connected via switch to the peer). > > > > [1] https://github.com/raspberrypi/linux/issues/2447 > > [2] > > https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 > > > > Link: > > https://lore.kernel.org/0901d90d-3f20-4a10-b680-9c978e04ddda@lunn.ch > > Fixes: 792aec47d59d ("add microchip LAN88xx phy driver") > > Signed-off-by: Fiona Klute <fiona.klute@gmx.de> > > Cc: kernel-list@raspberrypi.com > > Cc: stable@vger.kernel.org > > --- > > v2: > > - add comment why interrupt functions are missing > > - add Fixes reference > > v1: > > https://lore.kernel.org/netdev/20250414152634.2786447-1-fiona.klute@gmx.de/ > > > > drivers/net/phy/microchip.c | 46 +++----------------------------- > > ----- > > 1 file changed, 3 insertions(+), 43 deletions(-) > > > > diff --git a/drivers/net/phy/microchip.c > > b/drivers/net/phy/microchip.c > > index 0e17cc458efdc..93de88c1c8fd5 100644 > > --- a/drivers/net/phy/microchip.c > > +++ b/drivers/net/phy/microchip.c > > @@ -37,47 +37,6 @@ static int lan88xx_write_page(struct phy_device > > *phydev, int page) > > return __phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, page); > > } > > > > -static int lan88xx_phy_config_intr(struct phy_device *phydev) > > -{ > > - int rc; > > - > > - if (phydev->interrupts == PHY_INTERRUPT_ENABLED) { > > - /* unmask all source and clear them before enable */ > > - rc = phy_write(phydev, LAN88XX_INT_MASK, 0x7FFF); > > - rc = phy_read(phydev, LAN88XX_INT_STS); > > - rc = phy_write(phydev, LAN88XX_INT_MASK, > > - LAN88XX_INT_MASK_MDINTPIN_EN_ | > > - LAN88XX_INT_MASK_LINK_CHANGE_); > > - } else { > > - rc = phy_write(phydev, LAN88XX_INT_MASK, 0); > > - if (rc) > > - return rc; > > - > > - /* Ack interrupts after they have been disabled */ > > - rc = phy_read(phydev, LAN88XX_INT_STS); > > - } > > - > > - return rc < 0 ? rc : 0; > > -} > > - > > -static irqreturn_t lan88xx_handle_interrupt(struct phy_device > > *phydev) > > -{ > > - int irq_status; > > - > > - irq_status = phy_read(phydev, LAN88XX_INT_STS); > > - if (irq_status < 0) { > > - phy_error(phydev); > > - return IRQ_NONE; > > - } > > - > > - if (!(irq_status & LAN88XX_INT_STS_LINK_CHANGE_)) > > - return IRQ_NONE; > > - > > - phy_trigger_machine(phydev); > > - > > - return IRQ_HANDLED; > > -} > > - > > static int lan88xx_suspend(struct phy_device *phydev) > > { > > struct lan88xx_priv *priv = phydev->priv; > > @@ -528,8 +487,9 @@ static struct phy_driver microchip_phy_driver[] > > = { > > .config_aneg = lan88xx_config_aneg, > > .link_change_notify = lan88xx_link_change_notify, > > > > - .config_intr = lan88xx_phy_config_intr, > > - .handle_interrupt = lan88xx_handle_interrupt, > > + /* Interrupt handling is broken, do not define related > > + * functions to force polling. > > + */ > > > > .suspend = lan88xx_suspend, > > .resume = genphy_resume,
On Thu, Apr 17, 2025 at 02:17:20PM +0000, Thangaraj.S@microchip.com wrote: > Hi Flona, > We haven't observed this issue with our LAN7801 + LAN88xx setup during > testing in our setups. Issue reported specifically in RPI and NVidea > with which we have not tested recently. Additionally, community > discussions suggest that this issue may have been introduced due to > driver updates in the LAN78xx. There's no hardware errata indicating > any problems with the interrupts in LAN88xx. > > If the issue lies in the interrupt handling within the LAN78xx, it's > likely that similar problems could arise with other PHYs that support > interrupt handling. This need to be debugged and addressed from LAN78xx > driver rather than removing interrupt support in LAN88xx. Hi Thangaraj It is an easily available platform. Maybe you can get one and try to debug it? This is a self contained simple fix. From a pragmatic standpoint, it solves an issue which is bothering people. If you do find and fix an issue in the LAN78xx this change can be easily reverted. So i'm learning towards merging it. Andrew
diff --git a/drivers/net/phy/microchip.c b/drivers/net/phy/microchip.c index 0e17cc458efdc..93de88c1c8fd5 100644 --- a/drivers/net/phy/microchip.c +++ b/drivers/net/phy/microchip.c @@ -37,47 +37,6 @@ static int lan88xx_write_page(struct phy_device *phydev, int page) return __phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, page); } -static int lan88xx_phy_config_intr(struct phy_device *phydev) -{ - int rc; - - if (phydev->interrupts == PHY_INTERRUPT_ENABLED) { - /* unmask all source and clear them before enable */ - rc = phy_write(phydev, LAN88XX_INT_MASK, 0x7FFF); - rc = phy_read(phydev, LAN88XX_INT_STS); - rc = phy_write(phydev, LAN88XX_INT_MASK, - LAN88XX_INT_MASK_MDINTPIN_EN_ | - LAN88XX_INT_MASK_LINK_CHANGE_); - } else { - rc = phy_write(phydev, LAN88XX_INT_MASK, 0); - if (rc) - return rc; - - /* Ack interrupts after they have been disabled */ - rc = phy_read(phydev, LAN88XX_INT_STS); - } - - return rc < 0 ? rc : 0; -} - -static irqreturn_t lan88xx_handle_interrupt(struct phy_device *phydev) -{ - int irq_status; - - irq_status = phy_read(phydev, LAN88XX_INT_STS); - if (irq_status < 0) { - phy_error(phydev); - return IRQ_NONE; - } - - if (!(irq_status & LAN88XX_INT_STS_LINK_CHANGE_)) - return IRQ_NONE; - - phy_trigger_machine(phydev); - - return IRQ_HANDLED; -} - static int lan88xx_suspend(struct phy_device *phydev) { struct lan88xx_priv *priv = phydev->priv; @@ -528,8 +487,9 @@ static struct phy_driver microchip_phy_driver[] = { .config_aneg = lan88xx_config_aneg, .link_change_notify = lan88xx_link_change_notify, - .config_intr = lan88xx_phy_config_intr, - .handle_interrupt = lan88xx_handle_interrupt, + /* Interrupt handling is broken, do not define related + * functions to force polling. + */ .suspend = lan88xx_suspend, .resume = genphy_resume,
With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling polling was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] https://github.com/raspberrypi/linux/issues/2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/0901d90d-3f20-4a10-b680-9c978e04ddda@lunn.ch Fixes: 792aec47d59d ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <fiona.klute@gmx.de> Cc: kernel-list@raspberrypi.com Cc: stable@vger.kernel.org --- v2: - add comment why interrupt functions are missing - add Fixes reference v1: https://lore.kernel.org/netdev/20250414152634.2786447-1-fiona.klute@gmx.de/ drivers/net/phy/microchip.c | 46 +++---------------------------------- 1 file changed, 3 insertions(+), 43 deletions(-)