diff mbox series

[net,2/3] net: enetc: fix the issues of XDP_REDIRECT feature

Message ID 20240919084104.661180-3-wei.fang@nxp.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series net: enetc: fix some issues of XDP | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 11 of 11 maintainers
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 20 this patch: 20
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 116 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-09-20--15-00 (tests: 764)

Commit Message

Wei Fang Sept. 19, 2024, 8:41 a.m. UTC
When testing the XDP_REDIRECT function on the LS1028A platform, we
found a very reproducible issue that the Tx frames can no longer be
sent out even if XDP_REDIRECT is turned off. Specifically, if there
is a lot of traffic on Rx direction, when XDP_REDIRECT is turned on,
the console may display some warnings like "timeout for tx ring #6
clear", and all redirected frames will be dropped, the detaild log
is as follows.

root@ls1028ardb:~# ./xdp-bench redirect eno0 eno2
Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc)
[203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear
[204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear
[204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear
eno0->eno2     1420505 rx/s       1420590 err,drop/s      0 xmit/s
  xmit eno0->eno2    0 xmit/s     1420590 drop/s     0 drv_err/s     15.71 bulk-avg
eno0->eno2     1420484 rx/s       1420485 err,drop/s      0 xmit/s
  xmit eno0->eno2    0 xmit/s     1420485 drop/s     0 drv_err/s     15.71 bulk-avg

By analyzing the XDP_REDIRECT implementation of enetc driver, we
found two problems. First, enetc driver will reconfigure Tx and
Rx BD rings when a bpf program is installed or uninstalled, but
there is no mechanisms to block the redirected frames when enetc
driver reconfigures BD rings. So introduce ENETC_TX_DOWN flag to
prevent the redirected frames to be attached to Tx BD rings.

Second, Tx BD rings are disabled first in enetc_stop() and then
wait for empty. This operation is not safe while the Tx BD ring
is actively transmitting frames, and will cause the ring to not
be empty and hardware exception. As described in the block guide
of LS1028A NETC, software should only disable an active ring after
all pending ring entries have been consumed (i.e. when PI = CI).
Disabling a transmit ring that is actively processing BDs risks
a HW-SW race hazard whereby a hardware resource becomes assigned
to work on one or more ring entries only to have those entries be
removed due to the ring becoming disabled. So the correct behavior
is that the software stops putting frames on the Tx BD rings (this
is what ENETC_TX_DOWN does), then waits for the Tx BD rings to be
empty, and finally disables the Tx BD rings.

Fixes: c33bfaf91c4c ("net: enetc: set up XDP program under enetc_reconfigure()")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Fang <wei.fang@nxp.com>
---
 drivers/net/ethernet/freescale/enetc/enetc.c | 43 ++++++++++++++++----
 drivers/net/ethernet/freescale/enetc/enetc.h |  1 +
 2 files changed, 35 insertions(+), 9 deletions(-)

Comments

Maciej Fijalkowski Sept. 20, 2024, 12:50 p.m. UTC | #1
On Thu, Sep 19, 2024 at 04:41:03PM +0800, Wei Fang wrote:
> When testing the XDP_REDIRECT function on the LS1028A platform, we
> found a very reproducible issue that the Tx frames can no longer be
> sent out even if XDP_REDIRECT is turned off. Specifically, if there
> is a lot of traffic on Rx direction, when XDP_REDIRECT is turned on,
> the console may display some warnings like "timeout for tx ring #6
> clear", and all redirected frames will be dropped, the detaild log
> is as follows.
> 
> root@ls1028ardb:~# ./xdp-bench redirect eno0 eno2
> Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc)
> [203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear
> [204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear
> [204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear
> eno0->eno2     1420505 rx/s       1420590 err,drop/s      0 xmit/s
>   xmit eno0->eno2    0 xmit/s     1420590 drop/s     0 drv_err/s     15.71 bulk-avg
> eno0->eno2     1420484 rx/s       1420485 err,drop/s      0 xmit/s
>   xmit eno0->eno2    0 xmit/s     1420485 drop/s     0 drv_err/s     15.71 bulk-avg
> 
> By analyzing the XDP_REDIRECT implementation of enetc driver, we
> found two problems. First, enetc driver will reconfigure Tx and
> Rx BD rings when a bpf program is installed or uninstalled, but
> there is no mechanisms to block the redirected frames when enetc
> driver reconfigures BD rings. So introduce ENETC_TX_DOWN flag to
> prevent the redirected frames to be attached to Tx BD rings.
> 
> Second, Tx BD rings are disabled first in enetc_stop() and then
> wait for empty. This operation is not safe while the Tx BD ring
> is actively transmitting frames, and will cause the ring to not
> be empty and hardware exception. As described in the block guide
> of LS1028A NETC, software should only disable an active ring after
> all pending ring entries have been consumed (i.e. when PI = CI).
> Disabling a transmit ring that is actively processing BDs risks
> a HW-SW race hazard whereby a hardware resource becomes assigned
> to work on one or more ring entries only to have those entries be
> removed due to the ring becoming disabled. So the correct behavior
> is that the software stops putting frames on the Tx BD rings (this
> is what ENETC_TX_DOWN does), then waits for the Tx BD rings to be
> empty, and finally disables the Tx BD rings.
> 
> Fixes: c33bfaf91c4c ("net: enetc: set up XDP program under enetc_reconfigure()")
> Cc: stable@vger.kernel.org
> Signed-off-by: Wei Fang <wei.fang@nxp.com>

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

> ---
>  drivers/net/ethernet/freescale/enetc/enetc.c | 43 ++++++++++++++++----
>  drivers/net/ethernet/freescale/enetc/enetc.h |  1 +
>  2 files changed, 35 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
> index 56e59721ec7d..5830c046cb7d 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.c
> @@ -902,6 +902,7 @@ static bool enetc_clean_tx_ring(struct enetc_bdr *tx_ring, int napi_budget)
>  
>  	if (unlikely(tx_frm_cnt && netif_carrier_ok(ndev) &&
>  		     __netif_subqueue_stopped(ndev, tx_ring->index) &&
> +		     !test_bit(ENETC_TX_DOWN, &priv->flags) &&
>  		     (enetc_bd_unused(tx_ring) >= ENETC_TXBDS_MAX_NEEDED))) {
>  		netif_wake_subqueue(ndev, tx_ring->index);
>  	}
> @@ -1377,6 +1378,9 @@ int enetc_xdp_xmit(struct net_device *ndev, int num_frames,
>  	int xdp_tx_bd_cnt, i, k;
>  	int xdp_tx_frm_cnt = 0;
>  
> +	if (unlikely(test_bit(ENETC_TX_DOWN, &priv->flags)))
> +		return -ENETDOWN;
> +
>  	enetc_lock_mdio();
>  
>  	tx_ring = priv->xdp_tx_ring[smp_processor_id()];
> @@ -2223,18 +2227,24 @@ static void enetc_enable_rxbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
>  	enetc_rxbdr_wr(hw, idx, ENETC_RBMR, rbmr);
>  }
>  
> -static void enetc_enable_bdrs(struct enetc_ndev_priv *priv)
> +static void enetc_enable_rx_bdrs(struct enetc_ndev_priv *priv)
>  {
>  	struct enetc_hw *hw = &priv->si->hw;
>  	int i;
>  
> -	for (i = 0; i < priv->num_tx_rings; i++)
> -		enetc_enable_txbdr(hw, priv->tx_ring[i]);
> -
>  	for (i = 0; i < priv->num_rx_rings; i++)
>  		enetc_enable_rxbdr(hw, priv->rx_ring[i]);
>  }
>  
> +static void enetc_enable_tx_bdrs(struct enetc_ndev_priv *priv)
> +{
> +	struct enetc_hw *hw = &priv->si->hw;
> +	int i;
> +
> +	for (i = 0; i < priv->num_tx_rings; i++)
> +		enetc_enable_txbdr(hw, priv->tx_ring[i]);
> +}
> +
>  static void enetc_disable_rxbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
>  {
>  	int idx = rx_ring->index;
> @@ -2251,7 +2261,16 @@ static void enetc_disable_txbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
>  	enetc_txbdr_wr(hw, idx, ENETC_TBMR, 0);
>  }
>  
> -static void enetc_disable_bdrs(struct enetc_ndev_priv *priv)
> +static void enetc_disable_rx_bdrs(struct enetc_ndev_priv *priv)
> +{
> +	struct enetc_hw *hw = &priv->si->hw;
> +	int i;
> +
> +	for (i = 0; i < priv->num_rx_rings; i++)
> +		enetc_disable_rxbdr(hw, priv->rx_ring[i]);
> +}
> +
> +static void enetc_disable_tx_bdrs(struct enetc_ndev_priv *priv)
>  {
>  	struct enetc_hw *hw = &priv->si->hw;
>  	int i;
> @@ -2259,8 +2278,6 @@ static void enetc_disable_bdrs(struct enetc_ndev_priv *priv)
>  	for (i = 0; i < priv->num_tx_rings; i++)
>  		enetc_disable_txbdr(hw, priv->tx_ring[i]);
>  
> -	for (i = 0; i < priv->num_rx_rings; i++)
> -		enetc_disable_rxbdr(hw, priv->rx_ring[i]);
>  }
>  
>  static void enetc_wait_txbdr(struct enetc_hw *hw, struct enetc_bdr *tx_ring)
> @@ -2452,6 +2469,8 @@ void enetc_start(struct net_device *ndev)
>  
>  	enetc_setup_interrupts(priv);
>  
> +	enetc_enable_tx_bdrs(priv);
> +
>  	for (i = 0; i < priv->bdr_int_num; i++) {
>  		int irq = pci_irq_vector(priv->si->pdev,
>  					 ENETC_BDR_INT_BASE_IDX + i);
> @@ -2460,9 +2479,11 @@ void enetc_start(struct net_device *ndev)
>  		enable_irq(irq);
>  	}
>  
> -	enetc_enable_bdrs(priv);
> +	enetc_enable_rx_bdrs(priv);
>  
>  	netif_tx_start_all_queues(ndev);
> +
> +	clear_bit(ENETC_TX_DOWN, &priv->flags);
>  }
>  EXPORT_SYMBOL_GPL(enetc_start);
>  
> @@ -2520,9 +2541,11 @@ void enetc_stop(struct net_device *ndev)
>  	struct enetc_ndev_priv *priv = netdev_priv(ndev);
>  	int i;
>  
> +	set_bit(ENETC_TX_DOWN, &priv->flags);
> +
>  	netif_tx_stop_all_queues(ndev);
>  
> -	enetc_disable_bdrs(priv);
> +	enetc_disable_rx_bdrs(priv);
>  
>  	for (i = 0; i < priv->bdr_int_num; i++) {
>  		int irq = pci_irq_vector(priv->si->pdev,
> @@ -2535,6 +2558,8 @@ void enetc_stop(struct net_device *ndev)
>  
>  	enetc_wait_bdrs(priv);
>  
> +	enetc_disable_tx_bdrs(priv);
> +
>  	enetc_clear_interrupts(priv);
>  }
>  EXPORT_SYMBOL_GPL(enetc_stop);
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
> index 97524dfa234c..fb7d98d57783 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.h
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.h
> @@ -325,6 +325,7 @@ enum enetc_active_offloads {
>  
>  enum enetc_flags_bit {
>  	ENETC_TX_ONESTEP_TSTAMP_IN_PROGRESS = 0,
> +	ENETC_TX_DOWN,
>  };
>  
>  /* interrupt coalescing modes */
> -- 
> 2.34.1
> 
>
Vladimir Oltean Sept. 27, 2024, 2:41 p.m. UTC | #2
On Thu, Sep 19, 2024 at 04:41:03PM +0800, Wei Fang wrote:
> @@ -2251,7 +2261,16 @@ static void enetc_disable_txbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
>  	enetc_txbdr_wr(hw, idx, ENETC_TBMR, 0);
>  }
>  
> -static void enetc_disable_bdrs(struct enetc_ndev_priv *priv)
> +static void enetc_disable_rx_bdrs(struct enetc_ndev_priv *priv)
> +{
> +	struct enetc_hw *hw = &priv->si->hw;
> +	int i;
> +
> +	for (i = 0; i < priv->num_rx_rings; i++)
> +		enetc_disable_rxbdr(hw, priv->rx_ring[i]);
> +}
> +
> +static void enetc_disable_tx_bdrs(struct enetc_ndev_priv *priv)
>  {
>  	struct enetc_hw *hw = &priv->si->hw;
>  	int i;
> @@ -2259,8 +2278,6 @@ static void enetc_disable_bdrs(struct enetc_ndev_priv *priv)
>  	for (i = 0; i < priv->num_tx_rings; i++)
>  		enetc_disable_txbdr(hw, priv->tx_ring[i]);
>  

Please do not leave a blank line here. In the git tree after applying
this patch, that blank line appears at the end of enetc_disable_tx_bdrs().

> -	for (i = 0; i < priv->num_rx_rings; i++)
> -		enetc_disable_rxbdr(hw, priv->rx_ring[i]);
>  }
Wei Fang Sept. 29, 2024, 1:35 a.m. UTC | #3
> -----Original Message-----
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> Sent: 2024年9月27日 22:42
> To: Wei Fang <wei.fang@nxp.com>
> Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; Claudiu Manoil <claudiu.manoil@nxp.com>;
> ast@kernel.org; daniel@iogearbox.net; hawk@kernel.org;
> john.fastabend@gmail.com; linux-kernel@vger.kernel.org;
> netdev@vger.kernel.org; bpf@vger.kernel.org; stable@vger.kernel.org;
> imx@lists.linux.dev
> Subject: Re: [PATCH net 2/3] net: enetc: fix the issues of XDP_REDIRECT feature
> 
> On Thu, Sep 19, 2024 at 04:41:03PM +0800, Wei Fang wrote:
> > @@ -2251,7 +2261,16 @@ static void enetc_disable_txbdr(struct enetc_hw
> *hw, struct enetc_bdr *rx_ring)
> >  	enetc_txbdr_wr(hw, idx, ENETC_TBMR, 0);
> >  }
> >
> > -static void enetc_disable_bdrs(struct enetc_ndev_priv *priv)
> > +static void enetc_disable_rx_bdrs(struct enetc_ndev_priv *priv)
> > +{
> > +	struct enetc_hw *hw = &priv->si->hw;
> > +	int i;
> > +
> > +	for (i = 0; i < priv->num_rx_rings; i++)
> > +		enetc_disable_rxbdr(hw, priv->rx_ring[i]);
> > +}
> > +
> > +static void enetc_disable_tx_bdrs(struct enetc_ndev_priv *priv)
> >  {
> >  	struct enetc_hw *hw = &priv->si->hw;
> >  	int i;
> > @@ -2259,8 +2278,6 @@ static void enetc_disable_bdrs(struct
> enetc_ndev_priv *priv)
> >  	for (i = 0; i < priv->num_tx_rings; i++)
> >  		enetc_disable_txbdr(hw, priv->tx_ring[i]);
> >
> 
> Please do not leave a blank line here. In the git tree after applying
> this patch, that blank line appears at the end of enetc_disable_tx_bdrs().
> 
Thanks for reminder, it's weird that the checkpatch.pl did not raise a
warning here.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 56e59721ec7d..5830c046cb7d 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -902,6 +902,7 @@  static bool enetc_clean_tx_ring(struct enetc_bdr *tx_ring, int napi_budget)
 
 	if (unlikely(tx_frm_cnt && netif_carrier_ok(ndev) &&
 		     __netif_subqueue_stopped(ndev, tx_ring->index) &&
+		     !test_bit(ENETC_TX_DOWN, &priv->flags) &&
 		     (enetc_bd_unused(tx_ring) >= ENETC_TXBDS_MAX_NEEDED))) {
 		netif_wake_subqueue(ndev, tx_ring->index);
 	}
@@ -1377,6 +1378,9 @@  int enetc_xdp_xmit(struct net_device *ndev, int num_frames,
 	int xdp_tx_bd_cnt, i, k;
 	int xdp_tx_frm_cnt = 0;
 
+	if (unlikely(test_bit(ENETC_TX_DOWN, &priv->flags)))
+		return -ENETDOWN;
+
 	enetc_lock_mdio();
 
 	tx_ring = priv->xdp_tx_ring[smp_processor_id()];
@@ -2223,18 +2227,24 @@  static void enetc_enable_rxbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
 	enetc_rxbdr_wr(hw, idx, ENETC_RBMR, rbmr);
 }
 
-static void enetc_enable_bdrs(struct enetc_ndev_priv *priv)
+static void enetc_enable_rx_bdrs(struct enetc_ndev_priv *priv)
 {
 	struct enetc_hw *hw = &priv->si->hw;
 	int i;
 
-	for (i = 0; i < priv->num_tx_rings; i++)
-		enetc_enable_txbdr(hw, priv->tx_ring[i]);
-
 	for (i = 0; i < priv->num_rx_rings; i++)
 		enetc_enable_rxbdr(hw, priv->rx_ring[i]);
 }
 
+static void enetc_enable_tx_bdrs(struct enetc_ndev_priv *priv)
+{
+	struct enetc_hw *hw = &priv->si->hw;
+	int i;
+
+	for (i = 0; i < priv->num_tx_rings; i++)
+		enetc_enable_txbdr(hw, priv->tx_ring[i]);
+}
+
 static void enetc_disable_rxbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
 {
 	int idx = rx_ring->index;
@@ -2251,7 +2261,16 @@  static void enetc_disable_txbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
 	enetc_txbdr_wr(hw, idx, ENETC_TBMR, 0);
 }
 
-static void enetc_disable_bdrs(struct enetc_ndev_priv *priv)
+static void enetc_disable_rx_bdrs(struct enetc_ndev_priv *priv)
+{
+	struct enetc_hw *hw = &priv->si->hw;
+	int i;
+
+	for (i = 0; i < priv->num_rx_rings; i++)
+		enetc_disable_rxbdr(hw, priv->rx_ring[i]);
+}
+
+static void enetc_disable_tx_bdrs(struct enetc_ndev_priv *priv)
 {
 	struct enetc_hw *hw = &priv->si->hw;
 	int i;
@@ -2259,8 +2278,6 @@  static void enetc_disable_bdrs(struct enetc_ndev_priv *priv)
 	for (i = 0; i < priv->num_tx_rings; i++)
 		enetc_disable_txbdr(hw, priv->tx_ring[i]);
 
-	for (i = 0; i < priv->num_rx_rings; i++)
-		enetc_disable_rxbdr(hw, priv->rx_ring[i]);
 }
 
 static void enetc_wait_txbdr(struct enetc_hw *hw, struct enetc_bdr *tx_ring)
@@ -2452,6 +2469,8 @@  void enetc_start(struct net_device *ndev)
 
 	enetc_setup_interrupts(priv);
 
+	enetc_enable_tx_bdrs(priv);
+
 	for (i = 0; i < priv->bdr_int_num; i++) {
 		int irq = pci_irq_vector(priv->si->pdev,
 					 ENETC_BDR_INT_BASE_IDX + i);
@@ -2460,9 +2479,11 @@  void enetc_start(struct net_device *ndev)
 		enable_irq(irq);
 	}
 
-	enetc_enable_bdrs(priv);
+	enetc_enable_rx_bdrs(priv);
 
 	netif_tx_start_all_queues(ndev);
+
+	clear_bit(ENETC_TX_DOWN, &priv->flags);
 }
 EXPORT_SYMBOL_GPL(enetc_start);
 
@@ -2520,9 +2541,11 @@  void enetc_stop(struct net_device *ndev)
 	struct enetc_ndev_priv *priv = netdev_priv(ndev);
 	int i;
 
+	set_bit(ENETC_TX_DOWN, &priv->flags);
+
 	netif_tx_stop_all_queues(ndev);
 
-	enetc_disable_bdrs(priv);
+	enetc_disable_rx_bdrs(priv);
 
 	for (i = 0; i < priv->bdr_int_num; i++) {
 		int irq = pci_irq_vector(priv->si->pdev,
@@ -2535,6 +2558,8 @@  void enetc_stop(struct net_device *ndev)
 
 	enetc_wait_bdrs(priv);
 
+	enetc_disable_tx_bdrs(priv);
+
 	enetc_clear_interrupts(priv);
 }
 EXPORT_SYMBOL_GPL(enetc_stop);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 97524dfa234c..fb7d98d57783 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -325,6 +325,7 @@  enum enetc_active_offloads {
 
 enum enetc_flags_bit {
 	ENETC_TX_ONESTEP_TSTAMP_IN_PROGRESS = 0,
+	ENETC_TX_DOWN,
 };
 
 /* interrupt coalescing modes */