diff mbox series

[v3,net,3/3] net: enetc: disable IRQ after Rx and Tx BD rings are disabled

Message ID 20241009090327.146461-4-wei.fang@nxp.com (mailing list archive)
State Superseded
Series net: enetc: fix some issues of XDP

Commit Message

Wei Fang Oct. 9, 2024, 9:03 a.m. UTC
When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC
on LS1028A, it was found that if the command was re-run multiple times,
Rx could not receive the frames, and the result of xdp-bench showed
that the rx rate was 0.

root@ls1028ardb:~# ./xdp-bench tx eno0
Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc)
Summary                      2046 rx/s                  0 err,drop/s
Summary                         0 rx/s                  0 err,drop/s
Summary                         0 rx/s                  0 err,drop/s
Summary                         0 rx/s                  0 err,drop/s

By observing the Rx PIR and CIR registers, we found that CIR is always
equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring
is full and can no longer accommodate other Rx frames. Therefore, we
can conclude that the problem is caused by the Rx BD ring not being
cleaned up.

Further analysis of the code revealed that the Rx BD ring will only
be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met.
Therefore, some debug logs were added to the driver and the current
values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx
BD ring was full. The logs are as follows.

[  178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140
[  178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110
[  178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110

From the results, we can see that the max value of xdp_tx_in_flight
has reached 2140. However, the size of the Rx BD ring is only 2048.
This is clearly abnormal, so we checked the code again and found that
xdp_tx_in_flight did not drop to 0 when the bpf program was uninstalled
and was not reset when the bpf program was installed again. The
root cause is that the IRQ is disabled too early in enetc_stop(),
resulting in enetc_recycle_xdp_tx_buff() not being called, therefore,
xdp_tx_in_flight is not cleared.

Fixes: ff58fda09096 ("net: enetc: prioritize ability to go down over packet processing")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Fang <wei.fang@nxp.com>
---
v2 changes:
1. Modify the title and rephrase the commit message.
2. Use the new solution as described in the title
v3 changes:
no changes.
---
 drivers/net/ethernet/freescale/enetc/enetc.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Comments

Vladimir Oltean Oct. 9, 2024, 12:09 p.m. UTC | #1
On Wed, Oct 09, 2024 at 05:03:27PM +0800, Wei Fang wrote:
> When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC
> on LS1028A, it was found that if the command was re-run multiple times,
> Rx could not receive the frames, and the result of xdo-bench showed

xdp-bench

> that the rx rate was 0.
> 
> root@ls1028ardb:~# ./xdp-bench tx eno0
> Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc)
> Summary                      2046 rx/s                  0 err,drop/s
> Summary                         0 rx/s                  0 err,drop/s
> Summary                         0 rx/s                  0 err,drop/s
> Summary                         0 rx/s                  0 err,drop/s
> 
> By observing the Rx PIR and CIR registers, we found that CIR is always
> equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring
> is full and can no longer accommodate other Rx frames. Therefore, we
> can conclude that the problem is caused by the Rx BD ring not being
> cleaned up.
> 
> Further analysis of the code revealed that the Rx BD ring will only
> be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met.
> Therefore, some debug logs were added to the driver and the current
> values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx
> BD ring was full. The logs are as follows.
> 
> [  178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140
> [  178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110
> [  178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110
> 
> From the results, we can see that the max value of xdp_tx_in_flight
> has reached 2140. However, the size of the Rx BD ring is only 2048.
> This is incredible, so we checked the code again and found that
> xdp_tx_in_flight did not drop to 0 when the bpf program was uninstalled
> and it was not reset when the bfp program was installed again.

Please make it clear that this is more general and it happens whenever
enetc_stop() is called.

> The root cause is that the IRQ is disabled too early in enetc_stop(),
> resulting in enetc_recycle_xdp_tx_buff() not being called, therefore,
> xdp_tx_in_flight is not cleared.

I feel that the problem is not so much the IRQ, as the NAPI (softirq),
really. Under heavy traffic we don't even get that many hardirqs (if any),
but NAPI just reschedules itself because of the budget which constantly
gets exceeded. Please make this also clear in the commit title,
something like "net: enetc: disable NAPI only after TX rings are empty".

I would restate the problem as: "The root cause is that we disable NAPI
too aggressively, without having waited for the pending XDP_TX frames to
be transmitted, and their buffers recycled, so that the xdp_tx_in_flight
counter can naturally drop to zero. Later, enetc_free_tx_ring() does
free those stale, untransmitted XDP_TX packets, but it is not coded up
to also reset the xdp_tx_in_flight counter, hence the manifestation of
the bug."

And then we should have a paragraph that describes the solution as well.
"One option would be to cover this extra condition in enetc_free_tx_ring(),
but now that the ENETC_TX_DOWN exists, we have created a window at the
beginning of enetc_stop() where NAPI can still be scheduled, but any
concurrent enqueue will be blocked. Therefore, we can call enetc_wait_bdrs()
and enetc_disable_tx_bdrs() with NAPI still scheduled, and it is
guaranteed that this will not wait indefinitely, but instead give us an
indication that the pending TX frames have orderly dropped to zero.
Only then should we call napi_disable().

This way, enetc_free_tx_ring() becomes entirely redundant and can be
dropped as part of subsequent cleanup.

The change also refactors enetc_start() so that it looks like the mirror
opposite procedure of enetc_stop()."

I think describing the problem and solution in these terms gives the
reviewers more versed in NAPI a better chance of understanding what is
going on and what we are trying to achieve.
Wei Fang Oct. 9, 2024, 12:56 p.m. UTC | #2
> -----Original Message-----
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> Sent: October 9, 2024 20:10
> To: Wei Fang <wei.fang@nxp.com>
> Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; Claudiu Manoil <claudiu.manoil@nxp.com>;
> ast@kernel.org; daniel@iogearbox.net; hawk@kernel.org;
> john.fastabend@gmail.com; linux-kernel@vger.kernel.org;
> netdev@vger.kernel.org; bpf@vger.kernel.org; stable@vger.kernel.org;
> imx@lists.linux.dev; rkannoth@marvell.com; maciej.fijalkowski@intel.com;
> sbhatta@marvell.com
> Subject: Re: [PATCH v3 net 3/3] net: enetc: disable IRQ after Rx and Tx BD rings
> are disabled
> 
> On Wed, Oct 09, 2024 at 05:03:27PM +0800, Wei Fang wrote:
> > When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC
> > on LS1028A, it was found that if the command was re-run multiple times,
> > Rx could not receive the frames, and the result of xdo-bench showed
> 
> xdp-bench
> 
> > that the rx rate was 0.
> >
> > root@ls1028ardb:~# ./xdp-bench tx eno0
> > Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc)
> > Summary                      2046 rx/s                  0 err,drop/s
> > Summary                         0 rx/s                  0 err,drop/s
> > Summary                         0 rx/s                  0 err,drop/s
> > Summary                         0 rx/s                  0 err,drop/s
> >
> > By observing the Rx PIR and CIR registers, we found that CIR is always
> > equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring
> > is full and can no longer accommodate other Rx frames. Therefore, we
> > can conclude that the problem is caused by the Rx BD ring not being
> > cleaned up.
> >
> > Further analysis of the code revealed that the Rx BD ring will only
> > be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met.
> > Therefore, some debug logs were added to the driver and the current
> > values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx
> > BD ring was full. The logs are as follows.
> >
> > [  178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140
> > [  178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110
> > [  178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110
> >
> > From the results, we can see that the max value of xdp_tx_in_flight
> > has reached 2140. However, the size of the Rx BD ring is only 2048.
> > This is incredible, so we checked the code again and found that
> > xdp_tx_in_flight did not drop to 0 when the bpf program was uninstalled
> > and it was not reset when the bfp program was installed again.
> 
> Please make it clear that this is more general and it happens whenever
> enetc_stop() is called.
> 
> > The root cause is that the IRQ is disabled too early in enetc_stop(),
> > resulting in enetc_recycle_xdp_tx_buff() not being called, therefore,
> > xdp_tx_in_flight is not cleared.
> 
> I feel that the problem is not so much the IRQ, as the NAPI (softirq),
> really. Under heavy traffic we don't even get that many hardirqs (if any),
> but NAPI just reschedules itself because of the budget which constantly
> gets exceeded. Please make this also clear in the commit title,
> something like "net: enetc: disable NAPI only after TX rings are empty".
> 
> I would restate the problem as: "The root cause is that we disable NAPI
> too aggressively, without having waited for the pending XDP_TX frames to
> be transmitted, and their buffers recycled, so that the xdp_tx_in_flight
> counter can naturally drop to zero. Later, enetc_free_tx_ring() does
> free those stale, untransmitted XDP_TX packets, but it is not coded up
> to also reset the xdp_tx_in_flight counter, hence the manifestation of
> the bug."
> 
> And then we should have a paragraph that describes the solution as well.
> "One option would be to cover this extra condition in enetc_free_tx_ring(),
> but now that the ENETC_TX_DOWN exists, we have created a window at the
> beginning of enetc_stop() where NAPI can still be scheduled, but any
> concurrent enqueue will be blocked. Therefore, we can call enetc_wait_bdrs()
> and enetc_disable_tx_bdrs() with NAPI still scheduled, and it is
> guaranteed that this will not wait indefinitely, but instead give us an
> indication that the pending TX frames have orderly dropped to zero.
> Only then should we call napi_disable().
> 
> This way, enetc_free_tx_ring() becomes entirely redundant and can be
> dropped as part of subsequent cleanup.
> 
> The change also refactors enetc_start() so that it looks like the mirror
> opposite procedure of enetc_stop()."
> 
> I think describing the problem and solution in these terms gives the
> reviewers more versed in NAPI a better chance of understanding what is
> going on and what we are trying to achieve.

Thanks for helping rephrase the commit message, I will apply it in
the next version.

Patch

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 52da10f62430..c09370eab319 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -2474,8 +2474,6 @@  void enetc_start(struct net_device *ndev)
 
 	enetc_setup_interrupts(priv);
 
-	enetc_enable_tx_bdrs(priv);
-
 	for (i = 0; i < priv->bdr_int_num; i++) {
 		int irq = pci_irq_vector(priv->si->pdev,
 					 ENETC_BDR_INT_BASE_IDX + i);
@@ -2484,6 +2482,8 @@  void enetc_start(struct net_device *ndev)
 		enable_irq(irq);
 	}
 
+	enetc_enable_tx_bdrs(priv);
+
 	enetc_enable_rx_bdrs(priv);
 
 	netif_tx_start_all_queues(ndev);
@@ -2552,6 +2552,10 @@  void enetc_stop(struct net_device *ndev)
 
 	enetc_disable_rx_bdrs(priv);
 
+	enetc_wait_bdrs(priv);
+
+	enetc_disable_tx_bdrs(priv);
+
 	for (i = 0; i < priv->bdr_int_num; i++) {
 		int irq = pci_irq_vector(priv->si->pdev,
 					 ENETC_BDR_INT_BASE_IDX + i);
@@ -2561,10 +2565,6 @@  void enetc_stop(struct net_device *ndev)
 		napi_disable(&priv->int_vector[i]->napi);
 	}
 
-	enetc_wait_bdrs(priv);
-
-	enetc_disable_tx_bdrs(priv);
-
 	enetc_clear_interrupts(priv);
 }
 EXPORT_SYMBOL_GPL(enetc_stop);