Message ID | Z7dVp7_InAHe0q_y@shell.armlinux.org.uk (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | net: stmmac: weirdness in stmmac_hw_setup() | expand |
On Thu, Feb 20, 2025 at 04:17:43PM +0000, Russell King (Oracle) wrote: > While looking through the stmmac driver, I've come across some > weirdness in stmmac_hw_setup() which looks completely barmy to me. > > It seems that it follows the initialisation suggested by Synopsys > (as best I can determine from the iMX8MP documentation), even going > as far as to *enable* transmit and receive *before* the network > device has been administratively brought up. stmmac_hw_setup() does > this: > > /* Enable the MAC Rx/Tx */ > stmmac_mac_set(priv, priv->ioaddr, true); > > which sets the TE and RE bits in the MAC configuration register. > > This means that if the network link is active, packets will start > to be received and will be placed into the receive descriptors. > > We won't transmit anything because we won't be placing packets in > the transmit descriptors to be transmitted. > > However, this in stmmac_hw_setup() is just wrong. Can it be deleted > as per the below? > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > index c2913f003fe6..d6e492f523f5 100644 > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > @@ -3493,9 +3493,6 @@ static int stmmac_hw_setup(struct net_device *dev, bool ptp_register) > priv->hw->rx_csum = 0; > } > > - /* Enable the MAC Rx/Tx */ > - stmmac_mac_set(priv, priv->ioaddr, true); > - > /* Set the HW DMA mode and the COE */ > stmmac_dma_operation_mode(priv); > Tested this on Jetson TX2 and couldn't detect a change in behaviour. Booted the device using NFS and ran a few iterations of iperf. Same as before, so: Tested-by: Thierry Reding <treding@nvidia.com>
On Thu, 20 Feb 2025 16:17:43 +0000, "Russell King (Oracle)" wrote: > While looking through the stmmac driver, I've come across some > weirdness in stmmac_hw_setup() which looks completely barmy to me. > > It seems that it follows the initialisation suggested by Synopsys > (as best I can determine from the iMX8MP documentation), even going > as far as to *enable* transmit and receive *before* the network > device has been administratively brought up. stmmac_hw_setup() does > this: > > /* Enable the MAC Rx/Tx */ > stmmac_mac_set(priv, priv->ioaddr, true); > > which sets the TE and RE bits in the MAC configuration register. > > This means that if the network link is active, packets will start > to be received and will be placed into the receive descriptors. > > We won't transmit anything because we won't be placing packets in > the transmit descriptors to be transmitted. > > However, this in stmmac_hw_setup() is just wrong. Can it be deleted > as per the below? > Tested-by: Thierry Reding <treding@nvidia.com> > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > index c2913f003fe6..d6e492f523f5 100644 > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > @@ -3493,9 +3493,6 @@ static int stmmac_hw_setup(struct net_device *dev, bool ptp_register) > priv->hw->rx_csum = 0; > } > > - /* Enable the MAC Rx/Tx */ > - stmmac_mac_set(priv, priv->ioaddr, true); > - > /* Set the HW DMA mode and the COE */ > stmmac_dma_operation_mode(priv); > A better fix here: https://lore.kernel.org/all/tencent_CCC29C4F562F2DEFE48289DB52F4D91BDE05@qq.com/ I can reproduce and confirm that patch works well. I was going to leave a Tested-by: tag on that patch if everything looks good enough, but the author (now copied) never come back.
On Wed, Feb 26, 2025 at 10:43:26AM +0800, Furong Xu wrote: > On Thu, 20 Feb 2025 16:17:43 +0000, "Russell King (Oracle)" wrote: > > > While looking through the stmmac driver, I've come across some > > weirdness in stmmac_hw_setup() which looks completely barmy to me. > > > > It seems that it follows the initialisation suggested by Synopsys > > (as best I can determine from the iMX8MP documentation), even going > > as far as to *enable* transmit and receive *before* the network > > device has been administratively brought up. stmmac_hw_setup() does > > this: > > > > /* Enable the MAC Rx/Tx */ > > stmmac_mac_set(priv, priv->ioaddr, true); > > > > which sets the TE and RE bits in the MAC configuration register. > > > > This means that if the network link is active, packets will start > > to be received and will be placed into the receive descriptors. > > > > We won't transmit anything because we won't be placing packets in > > the transmit descriptors to be transmitted. > > > > However, this in stmmac_hw_setup() is just wrong. Can it be deleted > > as per the below? > > Tested-by: Thierry Reding <treding@nvidia.com> > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > index c2913f003fe6..d6e492f523f5 100644 > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > @@ -3493,9 +3493,6 @@ static int stmmac_hw_setup(struct net_device *dev, bool ptp_register) > > priv->hw->rx_csum = 0; > > } > > > > - /* Enable the MAC Rx/Tx */ > > - stmmac_mac_set(priv, priv->ioaddr, true); > > - > > /* Set the HW DMA mode and the COE */ > > stmmac_dma_operation_mode(priv); > > > > A better fix here: > https://lore.kernel.org/all/tencent_CCC29C4F562F2DEFE48289DB52F4D91BDE05@qq.com/ I don't think that addresses the issue I highlighted above, since it's still calling stmmac_mac_set(, true) in stmmac_hw_setup(), which I believe to be wrong (as per my explanation above.) However, the removal of setting GMAC_CONFIG_TE in the start_tx method looks correct to me, because: - the start_tx method is called via stmmac_start_tx(), which is only called from stmmac_start_tx_dma(). - stmmac_start_tx_dma() is called from: * stmmac_start_all_dma() * stmmac_tx_err() * stmmac_enable_tx_queue() * stmmac_start_all_dma() is called from the end of stmmac_hw_setup(), but we've already called stmmac_mac_set(, true) both before and after the patch in the above URL, so is redundant. Incidentally, this brings the same set of questions I've stated in my initial email, and to me this is wrong. * stmmac_tx_err() can only happen when we are already active (so transmission was already enabled). * stmmac_enable_tx_queue() is called from stmmac_xdp_enable_pool(), which will only call this if netif_running() returns true, implying that the netdev is already adminstratively brought up and thus stmmac_hw_setup() will have been called. Again, this brings the same set of questions I've stated in my initial email, and to me this is wrong. While looking at Simon's comment, he talks about stmmac_xdp_open(), so I just looked at that, and found: netif_carrier_on(dev); Then there's: netif_carrier_off(dev); in stmmac_xdp_release(). These were introduced in commit ac746c8520d9 ("net: stmmac: enhance XDP ZC driver level switching performance"), well after stmmac had been converted to phylink. Phylink documents this: 16. Verify that the driver does not call:: netif_carrier_on() netif_carrier_off() as these will interfere with phylink's tracking of the link state, and cause phylink to omit calls via the :c:func:`mac_link_up` and :c:func:`mac_link_down` methods. So, the presence of these calls will mess up phylink, resulting in mac_link_up() and/or mac_link_down() *NOT* being called at appropriate times. However, stmmac_xdp_(open|release)() doesn't seem to do anything to bring the PHY or PCS up/down. This makes me wonder whether XDP support in the stmmac driver is basically broken, or maybe the netdev needs to already be administratively up (ip li set dev ... up). I don't know XDP as I've never used it. If that is the case, then those netif_carrier_*() calls need removing. Or - I say again - stmmac needs to *stop* using phylink. It's one or the other. A network driver either needs to use phylink properly or not use phylink *at* *all*.
On Wed, Feb 26, 2025 at 10:43:26AM +0800, Furong Xu wrote: > On Thu, 20 Feb 2025 16:17:43 +0000, "Russell King (Oracle)" wrote: > > > While looking through the stmmac driver, I've come across some > > weirdness in stmmac_hw_setup() which looks completely barmy to me. > > > > It seems that it follows the initialisation suggested by Synopsys > > (as best I can determine from the iMX8MP documentation), even going > > as far as to *enable* transmit and receive *before* the network > > device has been administratively brought up. stmmac_hw_setup() does > > this: > > > > /* Enable the MAC Rx/Tx */ > > stmmac_mac_set(priv, priv->ioaddr, true); > > > > which sets the TE and RE bits in the MAC configuration register. > > > > This means that if the network link is active, packets will start > > to be received and will be placed into the receive descriptors. > > > > We won't transmit anything because we won't be placing packets in > > the transmit descriptors to be transmitted. > > > > However, this in stmmac_hw_setup() is just wrong. Can it be deleted > > as per the below? > > Tested-by: Thierry Reding <treding@nvidia.com> > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > index c2913f003fe6..d6e492f523f5 100644 > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > @@ -3493,9 +3493,6 @@ static int stmmac_hw_setup(struct net_device *dev, bool ptp_register) > > priv->hw->rx_csum = 0; > > } > > > > - /* Enable the MAC Rx/Tx */ > > - stmmac_mac_set(priv, priv->ioaddr, true); > > - > > /* Set the HW DMA mode and the COE */ > > stmmac_dma_operation_mode(priv); > > > > A better fix here: > https://lore.kernel.org/all/tencent_CCC29C4F562F2DEFE48289DB52F4D91BDE05@qq.com/ > > I can reproduce and confirm that patch works well. > I was going to leave a Tested-by: tag on that patch if everything looks good enough, > but the author (now copied) never come back. As I now have access to the databook, the initialisation guidance given in appendix E is very clear that at least the RE bit should not be set until initialisation of both DMA and MAC have been completed. RE allows the MAC to start receiving packets and placing them in the FIFO for the DMA block to then transfer them to memory. If the link is down, then there should be no activity on the receive interface, so not setting RE until the link is up should have no effect. Therefore, start_rx methods that set this bit should not be doing so, and we should not set this bit until mac_link_up() has been called. Analysing the code for the RE bit, we have: 1. xxx_set_mac() (called via stmmac_mac_set()) which sets/clears both the RE and TE bits together. This is called with a "true" argument from: 1.1. stmmac_mac_link_up() - correct, the link has come up. 1.2. stmmac_hw_setup() - incorrect, we don't know whether the link is up or down. I think simply removing this would be correct, provided the patches that move phylink_resume() later in stmmac_resume() are merged (will be posted non-RFC only once my current set of stmmac clock changes have been merged.) 1.3. stmmac_xdp_open() - probably incorrect, we don't know whether the link is up or down. stmmac_xdp_open() is only called from stmmac_xdp_set_prog(), and the network interface must be running. This implies that everything has already been initialised, and phylink has been started. stmmac_xdp_open() also messes with the carrier, which will upset phylink. I think the right solution here is to call phylink_stop() in stmmac_xdp_release() and phylink_start() in stmmac_xdp_open(), removing the stmmac_mac_set() and netif_carrier*() calls in both these functions. 2. xxx_pmt() (called via stmmac_pmt()) sets RE if (e.g. dwxgmac2) WAKE_MAGIC or WAKE_UCAST is set. This brings up questions. The only place it's called with a potentially non-zero argument is in stmmac_suspend(). A few lines above is: /* Stop TX/RX DMA */ stmmac_stop_all_dma(priv); which is a reasonable thing to do when suspending. However, the DW databook states clearly that RE should not be set if DMA is not able to empty the FIFO between the MAC and DMA otherwise it will overflow. So... I think this causes overflow there. I'm not sure what effect that has. Has wake-on-lan been tested? 3. xxx_start_rx() (called via stmmac_start_rx()) sets RE along with starting the requested queue. This is called in two places: 3.1. stmmac_test_flowctrl(). It looks to me like this needs the link to already be up, which means that RE must already be set. Note that the call to stmmac_stop_rx() does not clear RE. 3.2. stmmac_start_rx_dma() which has two callers. 3.2.1. stmmac_start_all_dma(). This is called from: 3.2.1.1. the end of stmmac_hw_setup(), which has already called stmmac_mac_set() which would've set the RE bit. Same arguments as for 1.2 for why this is wrong. 3.2.2. the end of stmmac_enable_rx_queue(), which is called from stmmac_xdp_enable_pool(). Very similar to 3.1 above. TE seems to be a very similar story with the exception of xxx_pmt(), except the functions are "tx" rather than "rx". So, my plan with regard to RE is: 1. Add phylink_start() and phylink_stop() into stmmac_xdp_open() and stmmac_xdp_release() to the link is properly managed. 2. Remove setting RE from xxx_start_rx(). 3. Remove stmmac_mac_set(, true) from everywhere except stmmac_mac_link_up(). 4. Remove stmmac_mac_set(, false) from stmmac_release(). This has already called phylink_stop(), which will have synchronously called stmmac_mac_link_down(). 5. With (1) addressed stmmac_mac_set(, false) and netif_carrier_off() both become redundant in stmmac_xdp_release(). 6. Remove stmmac_stop_all_dma() and stmmac_mac_set(, false) in stmmac_dvr_remove(). Not only is this inherently racy (the network device is *still* *published* *to* *userspace* here, but one of the jobs unregister_netdev() does is to take the network device down before unregistering - in other words, .ndo_stop() will have been called and completed before this returns, which will have the effect of calling stmmac_mac_link_down() (which will call stmmac_mac_set(, false), and stmmac_release() will have already called stmmac_stop_all_dma(). (6) is a frequent programming mistake that I see in lots of network drivers. It seems most people assume that unregister_netdev() does nothing, and they need to manually take stop the MAC and take down the link. :( I'm going to leave xxx_pmt() alone for the moment as although that's questionable, I think feedback from people using WoL is necessary.
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index c2913f003fe6..d6e492f523f5 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -3493,9 +3493,6 @@ static int stmmac_hw_setup(struct net_device *dev, bool ptp_register) priv->hw->rx_csum = 0; } - /* Enable the MAC Rx/Tx */ - stmmac_mac_set(priv, priv->ioaddr, true); - /* Set the HW DMA mode and the COE */ stmmac_dma_operation_mode(priv);