diff mbox series

[RFC,net-next,1/5] net: stmmac: call phylink_start() and phylink_stop() in XDP functions

Message ID E1tnfRe-0057S9-6W@rmk-PC.armlinux.org.uk (mailing list archive)
State New
Headers show
Series net: stmmac: fix setting RE and TE inappropriately | expand

Commit Message

Russell King (Oracle) Feb. 27, 2025, 3:05 p.m. UTC
Phylink does not permit drivers to mess with the netif carrier, as
this will de-synchronise phylink with the MAC driver. Moreover,
setting and clearing the TE and RE bits via stmmac_mac_set() in this
path is also wrong as the link may not be up.

Replace the netif_carrier_on(), netif_carrier_off() and
stmmac_mac_set() calls with the appropriate phylink_start() and
phylink_stop() calls, thereby allowing phylink to manage the netif
carrier and TE/RE bits through the .mac_link_up() and .mac_link_down()
methods.

Note that RE should only be set after the DMA is ready to avoid the
receive FIFO between the MAC and DMA blocks overflowing, so
phylink_start() needs to be placed after DMA has been started.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

Comments

Andrew Lunn Feb. 27, 2025, 10:27 p.m. UTC | #1
On Thu, Feb 27, 2025 at 03:05:02PM +0000, Russell King (Oracle) wrote:
> Phylink does not permit drivers to mess with the netif carrier, as
> this will de-synchronise phylink with the MAC driver. Moreover,
> setting and clearing the TE and RE bits via stmmac_mac_set() in this
> path is also wrong as the link may not be up.
> 
> Replace the netif_carrier_on(), netif_carrier_off() and
> stmmac_mac_set() calls with the appropriate phylink_start() and
> phylink_stop() calls, thereby allowing phylink to manage the netif
> carrier and TE/RE bits through the .mac_link_up() and .mac_link_down()
> methods.
> 
> Note that RE should only be set after the DMA is ready to avoid the
> receive FIFO between the MAC and DMA blocks overflowing, so
> phylink_start() needs to be placed after DMA has been started.

Sorry, i don't know enough about XDP to review this :-(

	Andrew
Russell King (Oracle) Feb. 28, 2025, 12:02 a.m. UTC | #2
On Thu, Feb 27, 2025 at 11:27:57PM +0100, Andrew Lunn wrote:
> On Thu, Feb 27, 2025 at 03:05:02PM +0000, Russell King (Oracle) wrote:
> > Phylink does not permit drivers to mess with the netif carrier, as
> > this will de-synchronise phylink with the MAC driver. Moreover,
> > setting and clearing the TE and RE bits via stmmac_mac_set() in this
> > path is also wrong as the link may not be up.
> > 
> > Replace the netif_carrier_on(), netif_carrier_off() and
> > stmmac_mac_set() calls with the appropriate phylink_start() and
> > phylink_stop() calls, thereby allowing phylink to manage the netif
> > carrier and TE/RE bits through the .mac_link_up() and .mac_link_down()
> > methods.
> > 
> > Note that RE should only be set after the DMA is ready to avoid the
> > receive FIFO between the MAC and DMA blocks overflowing, so
> > phylink_start() needs to be placed after DMA has been started.
> 
> Sorry, i don't know enough about XDP to review this :-(

I suspect there aren't many people who could review it.

However, delving into the history, it seems that this commit was
responsible for introducing stmmac_xdp_{open,release}, thus
intrducing the fidding of the netif carrier which is prohibited
by phylink:

commit ac746c8520d9d056b6963ecca8ff1da9929d02f1
Author: Ong Boon Leong <boon.leong.ong@intel.com>
Date:   Thu Nov 11 22:39:49 2021 +0800

    net: stmmac: enhance XDP ZC driver level switching performance

but that commit was wrong for this very reason.

Didn't phylib used to not renegotiate the link if nothing changed
across a phy_stop()..phy_start() ?

I'm wondering whether my commit is in essence reverting this commit.
Furong Xu Feb. 28, 2025, 7:31 a.m. UTC | #3
On Thu, 27 Feb 2025 15:05:02 +0000
"Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> wrote:

> Phylink does not permit drivers to mess with the netif carrier, as
> this will de-synchronise phylink with the MAC driver. Moreover,
> setting and clearing the TE and RE bits via stmmac_mac_set() in this
> path is also wrong as the link may not be up.
> 
> Replace the netif_carrier_on(), netif_carrier_off() and
> stmmac_mac_set() calls with the appropriate phylink_start() and
> phylink_stop() calls, thereby allowing phylink to manage the netif
> carrier and TE/RE bits through the .mac_link_up() and .mac_link_down()
> methods.
> 
> Note that RE should only be set after the DMA is ready to avoid the
> receive FIFO between the MAC and DMA blocks overflowing, so
> phylink_start() needs to be placed after DMA has been started.
> 
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 13796b1c8d7c..84d8b1c9f6d4 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -6887,6 +6887,8 @@ void stmmac_xdp_release(struct net_device *dev)
>  	/* Ensure tx function is not running */
>  	netif_tx_disable(dev);
>  
> +	phylink_stop(priv->phylink);
> +
>  	/* Disable NAPI process */
>  	stmmac_disable_all_queues(priv);
>  
> @@ -6902,14 +6904,10 @@ void stmmac_xdp_release(struct net_device *dev)
>  	/* Release and free the Rx/Tx resources */
>  	free_dma_desc_resources(priv, &priv->dma_conf);
>  
> -	/* Disable the MAC Rx/Tx */
> -	stmmac_mac_set(priv, priv->ioaddr, false);
> -
>  	/* set trans_start so we don't get spurious
>  	 * watchdogs during reset
>  	 */
>  	netif_trans_update(dev);
> -	netif_carrier_off(dev);
>  }
>  
>  int stmmac_xdp_open(struct net_device *dev)
> @@ -6992,25 +6990,25 @@ int stmmac_xdp_open(struct net_device *dev)
>  		tx_q->txtimer.function = stmmac_tx_timer;
>  	}
>  
> -	/* Enable the MAC Rx/Tx */
> -	stmmac_mac_set(priv, priv->ioaddr, true);
> -
>  	/* Start Rx & Tx DMA Channels */
>  	stmmac_start_all_dma(priv);
>  
> +	phylink_start(priv->phylink);
> +
>  	ret = stmmac_request_irq(dev);
>  	if (ret)
>  		goto irq_error;
>  
>  	/* Enable NAPI process*/
>  	stmmac_enable_all_queues(priv);
> -	netif_carrier_on(dev);
>  	netif_tx_start_all_queues(dev);
>  	stmmac_enable_all_dma_irq(priv);
>  
>  	return 0;
>  
>  irq_error:
> +	phylink_stop(priv->phylink);
> +
>  	for (chan = 0; chan < priv->plat->tx_queues_to_use; chan++)
>  		hrtimer_cancel(&priv->dma_conf.tx_queue[chan].txtimer);
>  

XDP programs work like a charm both before and after this patch.

Tested-by: Furong Xu <0x1207@gmail.com>
Andrew Lunn Feb. 28, 2025, 1:14 p.m. UTC | #4
On Fri, Feb 28, 2025 at 03:31:22PM +0800, Furong Xu wrote:
> On Thu, 27 Feb 2025 15:05:02 +0000
> "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> wrote:
> 
> > Phylink does not permit drivers to mess with the netif carrier, as
> > this will de-synchronise phylink with the MAC driver. Moreover,
> > setting and clearing the TE and RE bits via stmmac_mac_set() in this
> > path is also wrong as the link may not be up.
> > 
> > Replace the netif_carrier_on(), netif_carrier_off() and
> > stmmac_mac_set() calls with the appropriate phylink_start() and
> > phylink_stop() calls, thereby allowing phylink to manage the netif
> > carrier and TE/RE bits through the .mac_link_up() and .mac_link_down()
> > methods.
> > 
> > Note that RE should only be set after the DMA is ready to avoid the
> > receive FIFO between the MAC and DMA blocks overflowing, so
> > phylink_start() needs to be placed after DMA has been started.
> > 
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
 
> XDP programs work like a charm both before and after this patch.
> 
> Tested-by: Furong Xu <0x1207@gmail.com>

Thanks for testing this.

Could you give a little details of how you actually tested it?

The issues here is, when is the link set admin up, requiring that
phylink triggers an autoneg etc.

For plain old TCP/IP, you at some point use:

ip link set eth42 up

which will cause the core to call the drivers open() method, which
then triggers phylnk.

The carrier manipulation which this code replaces seems to suggest you
can load an XDP program while the interface is admin down, and that
action of loading the program will implicitly set the carrier up.

Did you test loading an XDP program on an interface which is admin
down?

	Andrew
Russell King (Oracle) Feb. 28, 2025, 2:38 p.m. UTC | #5
On Fri, Feb 28, 2025 at 02:14:29PM +0100, Andrew Lunn wrote:
> On Fri, Feb 28, 2025 at 03:31:22PM +0800, Furong Xu wrote:
> > On Thu, 27 Feb 2025 15:05:02 +0000
> > "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> wrote:
> > 
> > > Phylink does not permit drivers to mess with the netif carrier, as
> > > this will de-synchronise phylink with the MAC driver. Moreover,
> > > setting and clearing the TE and RE bits via stmmac_mac_set() in this
> > > path is also wrong as the link may not be up.
> > > 
> > > Replace the netif_carrier_on(), netif_carrier_off() and
> > > stmmac_mac_set() calls with the appropriate phylink_start() and
> > > phylink_stop() calls, thereby allowing phylink to manage the netif
> > > carrier and TE/RE bits through the .mac_link_up() and .mac_link_down()
> > > methods.
> > > 
> > > Note that RE should only be set after the DMA is ready to avoid the
> > > receive FIFO between the MAC and DMA blocks overflowing, so
> > > phylink_start() needs to be placed after DMA has been started.
> > > 
> > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
>  
> > XDP programs work like a charm both before and after this patch.
> > 
> > Tested-by: Furong Xu <0x1207@gmail.com>
> 
> Thanks for testing this.
> 
> Could you give a little details of how you actually tested it?
> 
> The issues here is, when is the link set admin up, requiring that
> phylink triggers an autoneg etc.
> 
> For plain old TCP/IP, you at some point use:
> 
> ip link set eth42 up
> 
> which will cause the core to call the drivers open() method, which
> then triggers phylnk.
> 
> The carrier manipulation which this code replaces seems to suggest you
> can load an XDP program while the interface is admin down, and that
> action of loading the program will implicitly set the carrier up.

It won't. See
drivers/net/ethernet/stmicro/stmmac/stmmac_xdp.c::stmmac_xdp_set_prog():

        if_running = netif_running(dev);

        need_update = !!priv->xdp_prog != !!prog;

        if (if_running && need_update)
                stmmac_xdp_release(dev);

	...

        if (if_running && need_update)
                stmmac_xdp_open(dev);

stmmac_xdp_open() and stmmac_xdp_release() will only be called if
netif_running() returns true as explained yesterday (but maybe
without enough detail). This tests __LINK_STATE_START in dev->state,
which is set just before calling .ndo_open(), and cleared if it fails
or before calling .ndo_stop().

Thus, netif_running() indicates whether the net core thinks the
device is adminsitratively "up".

Therefore, in the old code, if the programme is changed while the
device is administratively down, netif_running() will return false,
if_running will be false, and neither stmmac_xdp_release() nor
stmmac_xdp_open() will be called.

Hope that addresses your concern.

Thanks.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 13796b1c8d7c..84d8b1c9f6d4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -6887,6 +6887,8 @@  void stmmac_xdp_release(struct net_device *dev)
 	/* Ensure tx function is not running */
 	netif_tx_disable(dev);
 
+	phylink_stop(priv->phylink);
+
 	/* Disable NAPI process */
 	stmmac_disable_all_queues(priv);
 
@@ -6902,14 +6904,10 @@  void stmmac_xdp_release(struct net_device *dev)
 	/* Release and free the Rx/Tx resources */
 	free_dma_desc_resources(priv, &priv->dma_conf);
 
-	/* Disable the MAC Rx/Tx */
-	stmmac_mac_set(priv, priv->ioaddr, false);
-
 	/* set trans_start so we don't get spurious
 	 * watchdogs during reset
 	 */
 	netif_trans_update(dev);
-	netif_carrier_off(dev);
 }
 
 int stmmac_xdp_open(struct net_device *dev)
@@ -6992,25 +6990,25 @@  int stmmac_xdp_open(struct net_device *dev)
 		tx_q->txtimer.function = stmmac_tx_timer;
 	}
 
-	/* Enable the MAC Rx/Tx */
-	stmmac_mac_set(priv, priv->ioaddr, true);
-
 	/* Start Rx & Tx DMA Channels */
 	stmmac_start_all_dma(priv);
 
+	phylink_start(priv->phylink);
+
 	ret = stmmac_request_irq(dev);
 	if (ret)
 		goto irq_error;
 
 	/* Enable NAPI process*/
 	stmmac_enable_all_queues(priv);
-	netif_carrier_on(dev);
 	netif_tx_start_all_queues(dev);
 	stmmac_enable_all_dma_irq(priv);
 
 	return 0;
 
 irq_error:
+	phylink_stop(priv->phylink);
+
 	for (chan = 0; chan < priv->plat->tx_queues_to_use; chan++)
 		hrtimer_cancel(&priv->dma_conf.tx_queue[chan].txtimer);