| Message ID | 20220217145527.2696444-1-vincent.whitchurch@axis.com |
|---|---|
| State | Changes Requested |
| Delegated to: | Netdev Maintainers |
| Series | net: stmmac: Enable NAPI before interrupts go live |
On Thu, 17 Feb 2022 15:55:26 +0100 Vincent Whitchurch wrote:
> From: Lars Persson <larper@axis.com>
>
> The stmmac_open function has a race window between enabling the RX
> path and its interrupt to the point where napi_enable is called.
>
> A chatty network with plenty of broadcast/multicast traffic has the
> potential to completely fill the RX ring before the interrupt handler
> is installed. In this scenario the single interrupt taken will find
> napi disabled and the RX ring will not be processed. No further RX
> interrupt will be delivered because the ring is full.
>
> The RX stall could eventually clear because the TX path will trigger a
> DMA interrupt once the tx_coal_frames threshold is reached and then
> NAPI becomes scheduled.

LGTM, although now the ndo_open and ndo_stop paths are not symmetrical.
Is there no way to mask the IRQs so that they don't fire immediately?
More common flow (IMO) would be:
 - request irq
 - mask irq
 - populate rings
 - start dma
 - enable napi
 - unmask irq

Other than the difference in flow between open/stop there may also be
some unpleasantness around restarting tx queues twice with the patch
as is.
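(For illustration, the flow above in code form — a minimal sketch in generic driver terms, not stmmac's actual code: the foo_*() helpers, struct foo_priv and priv->irq are hypothetical stand-ins, while request_irq(), disable_irq(), enable_irq() and napi_enable() are the real kernel APIs from linux/interrupt.h and linux/netdevice.h.)

#include <linux/interrupt.h>
#include <linux/netdevice.h>

/*
 * Minimal sketch of the suggested ndo_open ordering. Note the small
 * window between request_irq() and disable_irq() in which a stray
 * interrupt can still fire -- see IRQF_NO_AUTOEN further down the
 * thread. All foo_*() names are hypothetical.
 */
static int foo_open(struct net_device *dev)
{
	struct foo_priv *priv = netdev_priv(dev);
	int ret;

	/* request irq, then mask it so nothing is serviced yet */
	ret = request_irq(priv->irq, foo_interrupt, 0, dev->name, dev);
	if (ret)
		return ret;
	disable_irq(priv->irq);

	foo_populate_rx_rings(priv);	/* populate rings */
	foo_start_dma(priv);		/* start dma */
	napi_enable(&priv->napi);	/* enable napi */

	/* unmask irq: even a full RX ring is now drained by NAPI */
	enable_irq(priv->irq);
	return 0;
}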
On Fri, Feb 18, 2022 at 05:36:04AM +0100, Jakub Kicinski wrote:
> On Thu, 17 Feb 2022 15:55:26 +0100 Vincent Whitchurch wrote:
> > The stmmac_open function has a race window between enabling the RX
> > path and its interrupt to the point where napi_enable is called.
> >
> > A chatty network with plenty of broadcast/multicast traffic has the
> > potential to completely fill the RX ring before the interrupt handler
> > is installed. In this scenario the single interrupt taken will find
> > napi disabled and the RX ring will not be processed. No further RX
> > interrupt will be delivered because the ring is full.
> >
> > The RX stall could eventually clear because the TX path will trigger a
> > DMA interrupt once the tx_coal_frames threshold is reached and then
> > NAPI becomes scheduled.
>
> LGTM, although now the ndo_open and ndo_stop paths are not symmetrical.
> Is there no way to mask the IRQs so that they don't fire immediately?

The initial enabling of the DMA irqs is done as part of the
->init_chan() callback inside the various variants.  We could use the
->disable_dma_irq() callback to disable the DMA irqs and only enable
them at the end of the init sequence with a new
stmmac_enable_all_dma_irq().  This avoids having to change all the
variants, and there should be no problem in calling ->disable_dma_irq()
after the interrupts have been momentarily enabled in
stmmac_init_chan(), since the DMA is reset before these calls and not
started until later.

Note that I haven't added a symmetrical stmmac_disable_all_dma_irq() at
the top of stmmac_release() before the NAPI disable, since I can't see
that it would do any good there: NAPI can re-enable the DMA irqs.

> More common flow (IMO) would be:
> - request irq
> - mask irq
> - populate rings
> - start dma
> - enable napi
> - unmask irq

I don't think this driver has ever followed this exact sequence, but
before the commit mentioned in the Fixes: tag, the request_irq() was
done before the "start dma" and "enable napi" steps.  That was quite a
while ago, though, and the driver has changed a lot since then and
gained support for many variants and features which I can't test, so I
didn't dare to rewrite the entire init sequence.

> Other than the difference in flow between open/stop there may also be
> some unpleasantness around restarting tx queues twice with the patch
> as is.

New patch below; it works for me (though I don't have hardware with
working suspend/resume support).  I will send it out as a v2 if there
are no objections.  Thanks.
8<-----
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 6708ca2aa4f7..43978558d6c0 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2260,6 +2260,23 @@ static void stmmac_stop_tx_dma(struct stmmac_priv *priv, u32 chan)
 	stmmac_stop_tx(priv, priv->ioaddr, chan);
 }
 
+static void stmmac_enable_all_dma_irq(struct stmmac_priv *priv)
+{
+	u32 rx_channels_count = priv->plat->rx_queues_to_use;
+	u32 tx_channels_count = priv->plat->tx_queues_to_use;
+	u32 dma_csr_ch = max(rx_channels_count, tx_channels_count);
+	u32 chan;
+
+	for (chan = 0; chan < dma_csr_ch; chan++) {
+		struct stmmac_channel *ch = &priv->channel[chan];
+		unsigned long flags;
+
+		spin_lock_irqsave(&ch->lock, flags);
+		stmmac_enable_dma_irq(priv, priv->ioaddr, chan, 1, 1);
+		spin_unlock_irqrestore(&ch->lock, flags);
+	}
+}
+
 /**
  * stmmac_start_all_dma - start all RX and TX DMA channels
  * @priv: driver private structure
@@ -2902,8 +2919,10 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 		stmmac_axi(priv, priv->ioaddr, priv->plat->axi);
 
 	/* DMA CSR Channel configuration */
-	for (chan = 0; chan < dma_csr_ch; chan++)
+	for (chan = 0; chan < dma_csr_ch; chan++) {
 		stmmac_init_chan(priv, priv->ioaddr, priv->plat->dma_cfg, chan);
+		stmmac_disable_dma_irq(priv, priv->ioaddr, chan, 1, 1);
+	}
 
 	/* DMA RX Channel Configuration */
 	for (chan = 0; chan < rx_channels_count; chan++) {
@@ -3759,6 +3778,7 @@ static int stmmac_open(struct net_device *dev)
 
 	stmmac_enable_all_queues(priv);
 	netif_tx_start_all_queues(priv->dev);
+	stmmac_enable_all_dma_irq(priv);
 
 	return 0;
 
@@ -6508,8 +6528,10 @@ int stmmac_xdp_open(struct net_device *dev)
 	}
 
 	/* DMA CSR Channel configuration */
-	for (chan = 0; chan < dma_csr_ch; chan++)
+	for (chan = 0; chan < dma_csr_ch; chan++) {
 		stmmac_init_chan(priv, priv->ioaddr, priv->plat->dma_cfg, chan);
+		stmmac_disable_dma_irq(priv, priv->ioaddr, chan, 1, 1);
+	}
 
 	/* Adjust Split header */
 	sph_en = (priv->hw->rx_csum > 0) && priv->sph;
@@ -6570,6 +6592,7 @@ int stmmac_xdp_open(struct net_device *dev)
 	stmmac_enable_all_queues(priv);
 	netif_carrier_on(dev);
 	netif_tx_start_all_queues(dev);
+	stmmac_enable_all_dma_irq(priv);
 
 	return 0;
 
@@ -7447,6 +7470,7 @@ int stmmac_resume(struct device *dev)
 	stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw);
 
 	stmmac_enable_all_queues(priv);
+	stmmac_enable_all_dma_irq(priv);
 
 	mutex_unlock(&priv->lock);
 	rtnl_unlock();
On 17.02.2022 20:36:04, Jakub Kicinski wrote:
> On Thu, 17 Feb 2022 15:55:26 +0100 Vincent Whitchurch wrote:
> > From: Lars Persson <larper@axis.com>
> >
> > The stmmac_open function has a race window between enabling the RX
> > path and its interrupt to the point where napi_enable is called.
> >
> > A chatty network with plenty of broadcast/multicast traffic has the
> > potential to completely fill the RX ring before the interrupt handler
> > is installed. In this scenario the single interrupt taken will find
> > napi disabled and the RX ring will not be processed. No further RX
> > interrupt will be delivered because the ring is full.
> >
> > The RX stall could eventually clear because the TX path will trigger a
> > DMA interrupt once the tx_coal_frames threshold is reached and then
> > NAPI becomes scheduled.
>
> LGTM, although now the ndo_open and ndo_stop paths are not symmetrical.
> Is there no way to mask the IRQs so that they don't fire immediately?
> More common flow (IMO) would be:
> - request irq
> - mask irq

I think you can merge these, to avoid a race condition, see:

| cbe16f35bee6 genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()

> - populate rings
> - start dma
> - enable napi
> - unmask irq
> Other than the difference in flow between open/stop there may also be
> some unpleasantness around restarting tx queues twice with the patch
> as is.

regards,
Marc
On Fri, 18 Feb 2022 11:45:04 +0100 Marc Kleine-Budde wrote:
> > - request irq
> > - mask irq
>
> I think you can merge these, to avoid a race condition, see:
>
> | cbe16f35bee6 genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()

GTK, finally someone implemented it! :)
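(For reference, a minimal sketch of the merged request+mask step with IRQF_NO_AUTOEN, replacing the request_irq() + disable_irq() pair from the earlier sketch; foo_interrupt() and priv->irq are again hypothetical, while IRQF_NO_AUTOEN, request_irq() and enable_irq() are the real genirq API available since commit cbe16f35bee6.)

#include <linux/interrupt.h>

/*
 * With IRQF_NO_AUTOEN the irq is registered but left disabled, so
 * there is no window between "request irq" and "mask irq" in which
 * the handler could run against a full RX ring and disabled NAPI.
 * foo_interrupt(), priv and dev are hypothetical.
 */
ret = request_irq(priv->irq, foo_interrupt, IRQF_NO_AUTOEN,
		  dev->name, dev);
if (ret)
	return ret;

/* ... populate rings, start dma, napi_enable() ... */

enable_irq(priv->irq);	/* first point at which the handler can run */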
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 6708ca2aa4f7..8bd4123515b0 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3753,11 +3753,12 @@ static int stmmac_open(struct net_device *dev)
 	/* We may have called phylink_speed_down before */
 	phylink_speed_up(priv->phylink);
 
+	stmmac_enable_all_queues(priv);
+
 	ret = stmmac_request_irq(dev);
 	if (ret)
 		goto irq_error;
 
-	stmmac_enable_all_queues(priv);
 	netif_tx_start_all_queues(priv->dev);
 
 	return 0;
@@ -3768,6 +3769,7 @@ static int stmmac_open(struct net_device *dev)
 	for (chan = 0; chan < priv->plat->tx_queues_to_use; chan++)
 		hrtimer_cancel(&priv->tx_queue[chan].txtimer);
 
+	stmmac_disable_all_queues(priv);
 	stmmac_hw_teardown(dev);
 init_error:
 	free_dma_desc_resources(priv);
@@ -6562,12 +6564,13 @@ int stmmac_xdp_open(struct net_device *dev)
 	/* Start Rx & Tx DMA Channels */
 	stmmac_start_all_dma(priv);
 
+	/* Enable NAPI process*/
+	stmmac_enable_all_queues(priv);
+
 	ret = stmmac_request_irq(dev);
 	if (ret)
 		goto irq_error;
 
-	/* Enable NAPI process*/
-	stmmac_enable_all_queues(priv);
 	netif_carrier_on(dev);
 	netif_tx_start_all_queues(dev);
 
@@ -6577,6 +6580,7 @@ int stmmac_xdp_open(struct net_device *dev)
 	for (chan = 0; chan < priv->plat->tx_queues_to_use; chan++)
 		hrtimer_cancel(&priv->tx_queue[chan].txtimer);
 
+	stmmac_disable_all_queues(priv);
 	stmmac_hw_teardown(dev);
 init_error:
 	free_dma_desc_resources(priv);