Message ID | 20240828204135.6543-1-rosenp@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net-next] net: ag71xx: disable napi interrupts during probe |
On 8/28/2024 1:41 PM, Rosen Penev wrote:
> From: Sven Eckelmann <sven@narfation.org>
>
> ag71xx_probe is registering ag71xx_interrupt as handler for gmac0/gmac1
> interrupts. The handler is trying to use napi_schedule to handle the
> processing of packets. But the netif_napi_add for this device is
> called a lot later in ag71xx_probe.
>
> It can therefore happen that a still running gmac0/gmac1 is triggering the
> interrupt handler with a bit from AG71XX_INT_POLL set in
> AG71XX_REG_INT_STATUS. The handler will then call napi_schedule and the
> napi code will crash the system because the ag->napi is not yet
> initialized.
>
> The gmcc0/gmac1 must be brought in a state in which it doesn't signal a
> AG71XX_INT_POLL related status bits as interrupt before registering the
> interrupt handler. ag71xx_hw_start will take care of re-initializing the
> AG71XX_REG_INT_ENABLE.
>
> Signed-off-by: Sven Eckelmann <sven@narfation.org>
> Signed-off-by: Rosen Penev <rosenp@gmail.com>
> ---

The description reads like a bug fix, so I would expect this to be
targeted to net and have a Fixes tag indicating what commit introduced
the issue, maybe:

Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver")

The change seems reasonable to me otherwise.

> drivers/net/ethernet/atheros/ag71xx.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/net/ethernet/atheros/ag71xx.c b/drivers/net/ethernet/atheros/ag71xx.c
> index 0674a042e8d3..435c4b19acdd 100644
> --- a/drivers/net/ethernet/atheros/ag71xx.c
> +++ b/drivers/net/ethernet/atheros/ag71xx.c
> @@ -1855,6 +1855,12 @@ static int ag71xx_probe(struct platform_device *pdev)
>  	if (!ag->mac_base)
>  		return -ENOMEM;
>
> +	/* ensure that HW is in manual polling mode before interrupts are
> +	 * activated. Otherwise ag71xx_interrupt might call napi_schedule
> +	 * before it is initialized by netif_napi_add.
> +	 */
> +	ag71xx_int_disable(ag, AG71XX_INT_POLL);
> +
>  	ndev->irq = platform_get_irq(pdev, 0);
>  	err = devm_request_irq(&pdev->dev, ndev->irq, ag71xx_interrupt,
>  			       0x0, dev_name(&pdev->dev), ndev);
On Wed, Aug 28, 2024 at 2:05 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>
> On 8/28/2024 1:41 PM, Rosen Penev wrote:
> > From: Sven Eckelmann <sven@narfation.org>
> >
> > ag71xx_probe is registering ag71xx_interrupt as handler for gmac0/gmac1
> > interrupts. The handler is trying to use napi_schedule to handle the
> > processing of packets. But the netif_napi_add for this device is
> > called a lot later in ag71xx_probe.
> >
> > It can therefore happen that a still running gmac0/gmac1 is triggering the
> > interrupt handler with a bit from AG71XX_INT_POLL set in
> > AG71XX_REG_INT_STATUS. The handler will then call napi_schedule and the
> > napi code will crash the system because the ag->napi is not yet
> > initialized.
> >
> > The gmcc0/gmac1 must be brought in a state in which it doesn't signal a
> > AG71XX_INT_POLL related status bits as interrupt before registering the
> > interrupt handler. ag71xx_hw_start will take care of re-initializing the
> > AG71XX_REG_INT_ENABLE.
> >
> > Signed-off-by: Sven Eckelmann <sven@narfation.org>
> > Signed-off-by: Rosen Penev <rosenp@gmail.com>
> > ---
>
> The description reads like a bug fix, so I would expect this to be
> targeted to net and have a Fixes tag indicating what commit introduced
> the issue, maybe:
>
> Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver")
>
> The change seems reasonable to me otherwise.
OTOH there are currently no dual GMAC users upstream. Just single.
>
> > drivers/net/ethernet/atheros/ag71xx.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/atheros/ag71xx.c b/drivers/net/ethernet/atheros/ag71xx.c
> > index 0674a042e8d3..435c4b19acdd 100644
> > --- a/drivers/net/ethernet/atheros/ag71xx.c
> > +++ b/drivers/net/ethernet/atheros/ag71xx.c
> > @@ -1855,6 +1855,12 @@ static int ag71xx_probe(struct platform_device *pdev)
> >  	if (!ag->mac_base)
> >  		return -ENOMEM;
> >
> > +	/* ensure that HW is in manual polling mode before interrupts are
> > +	 * activated. Otherwise ag71xx_interrupt might call napi_schedule
> > +	 * before it is initialized by netif_napi_add.
> > +	 */
> > +	ag71xx_int_disable(ag, AG71XX_INT_POLL);
> > +
> >  	ndev->irq = platform_get_irq(pdev, 0);
> >  	err = devm_request_irq(&pdev->dev, ndev->irq, ag71xx_interrupt,
> >  			       0x0, dev_name(&pdev->dev), ndev);
> -----Original Message-----
> From: Rosen Penev <rosenp@gmail.com>
> Sent: Thursday, August 29, 2024 10:47 AM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: netdev@vger.kernel.org; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com; linux@armlinux.org.uk;
> linux-kernel@vger.kernel.org; o.rempel@pengutronix.de; p.zabel@pengutronix.de
> Subject: Re: [PATCH net-next] net: ag71xx: disable napi interrupts during probe
>
> On Wed, Aug 28, 2024 at 2:05 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
> >
> > On 8/28/2024 1:41 PM, Rosen Penev wrote:
> > > From: Sven Eckelmann <sven@narfation.org>
> > >
> > > ag71xx_probe is registering ag71xx_interrupt as handler for gmac0/gmac1
> > > interrupts. The handler is trying to use napi_schedule to handle the
> > > processing of packets. But the netif_napi_add for this device is
> > > called a lot later in ag71xx_probe.
> > >
> > > It can therefore happen that a still running gmac0/gmac1 is triggering the
> > > interrupt handler with a bit from AG71XX_INT_POLL set in
> > > AG71XX_REG_INT_STATUS. The handler will then call napi_schedule and the
> > > napi code will crash the system because the ag->napi is not yet
> > > initialized.
> > >
> > > The gmcc0/gmac1 must be brought in a state in which it doesn't signal a
> > > AG71XX_INT_POLL related status bits as interrupt before registering the
> > > interrupt handler. ag71xx_hw_start will take care of re-initializing the
> > > AG71XX_REG_INT_ENABLE.
> > >
> > > Signed-off-by: Sven Eckelmann <sven@narfation.org>
> > > Signed-off-by: Rosen Penev <rosenp@gmail.com>
> > > ---
> >
> > The description reads like a bug fix, so I would expect this to be
> > targeted to net and have a Fixes tag indicating what commit introduced
> > the issue, maybe:
> >
> > Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver")
> >
> > The change seems reasonable to me otherwise.
> OTOH there are currently no dual GMAC users upstream. Just single.
>

If that’s the case, updating the description to make that clear would help.
diff --git a/drivers/net/ethernet/atheros/ag71xx.c b/drivers/net/ethernet/atheros/ag71xx.c
index 0674a042e8d3..435c4b19acdd 100644
--- a/drivers/net/ethernet/atheros/ag71xx.c
+++ b/drivers/net/ethernet/atheros/ag71xx.c
@@ -1855,6 +1855,12 @@ static int ag71xx_probe(struct platform_device *pdev)
 	if (!ag->mac_base)
 		return -ENOMEM;
 
+	/* ensure that HW is in manual polling mode before interrupts are
+	 * activated. Otherwise ag71xx_interrupt might call napi_schedule
+	 * before it is initialized by netif_napi_add.
+	 */
+	ag71xx_int_disable(ag, AG71XX_INT_POLL);
+
 	ndev->irq = platform_get_irq(pdev, 0);
 	err = devm_request_irq(&pdev->dev, ndev->irq, ag71xx_interrupt,
 			       0x0, dev_name(&pdev->dev), ndev);