
[net,v1] mlxbf_gige: fix receive packet race condition

Message ID 20220908202853.21725-1-davthompson@nvidia.com (mailing list archive)
State: Changes Requested
Delegated to: Netdev Maintainers

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 20 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

David Thompson Sept. 8, 2022, 8:28 p.m. UTC
Under heavy traffic, the BF2 Gigabit interface can
become unresponsive for periods of time (several minutes)
before eventually recovering.  This is due to a possible
race condition in the mlxbf_gige_rx_packet function, where
the function exits with producer and consumer indices equal
but there are remaining packet(s) to be processed. In order
to prevent this situation, disable receive DMA during the
processing of received packets.

Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: David Thompson <davthompson@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c | 8 ++++++++
 1 file changed, 8 insertions(+)
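
The race described in the commit message can be illustrated with a minimal,
self-contained C sketch. This is an editorial illustration, not the driver's
actual code; rx_ring, hw_producer_idx and rx_packet are hypothetical names.

#include <stdbool.h>
#include <stdint.h>

struct rx_ring {
	uint16_t consumer_idx;              /* driver-owned */
	volatile uint16_t *hw_producer_idx; /* advanced by the DMA engine */
};

/* Process one packet; returns true while packets appear to remain. */
static bool rx_packet(struct rx_ring *ring)
{
	uint16_t prod = *ring->hw_producer_idx; /* sample the HW index */

	if (ring->consumer_idx == prod)
		return false; /* ring looks empty */

	/* ... hand the packet at consumer_idx up to the stack ... */
	ring->consumer_idx++;

	/*
	 * Race window: if the DMA engine advances the producer index
	 * here, after the sample above, the caller's loop can exit with
	 * the indices apparently equal even though a packet is pending.
	 * With the RX interrupt already acknowledged, nothing re-arms
	 * the poll and the interface stalls until it somehow recovers.
	 */
	return ring->consumer_idx != prod;
}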

Comments

Jakub Kicinski Sept. 19, 2022, 9:17 p.m. UTC | #1
On Thu, 8 Sep 2022 16:28:53 -0400 David Thompson wrote:
> Under heavy traffic, the BF2 Gigabit interface can
> become unresponsive for periods of time (several minutes)
> before eventually recovering.  This is due to a possible
> race condition in the mlxbf_gige_rx_packet function, where
> the function exits with producer and consumer indices equal
> but there are remaining packet(s) to be processed. In order
> to prevent this situation, disable receive DMA during the
> processing of received packets.

Pausing Rx DMA seems a little drastic; is the capacity of the NIC buffer
large enough to sink the traffic while the stack drains the ring?

Could you provide a little more detail on what the HW issue is?
Is there no less intrusive way we can fix it?
David Thompson Oct. 25, 2022, 4:31 p.m. UTC | #2
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Monday, September 19, 2022 5:18 PM
> To: David Thompson <davthompson@nvidia.com>
> Cc: davem@davemloft.net; edumazet@google.com; pabeni@redhat.com;
> netdev@vger.kernel.org; cai.huoqing@linux.dev; brgl@bgdev.pl; Liming Sun
> <limings@nvidia.com>; Asmaa Mnebhi <asmaa@nvidia.com>
> Subject: Re: [PATCH net v1] mlxbf_gige: fix receive packet race condition
> 
> On Thu, 8 Sep 2022 16:28:53 -0400 David Thompson wrote:
> > Under heavy traffic, the BF2 Gigabit interface can become unresponsive
> > for periods of time (several minutes) before eventually recovering.
> > This is due to a possible race condition in the mlxbf_gige_rx_packet
> > function, where the function exits with producer and consumer indices
> > equal but there are remaining packet(s) to be processed. In order to
> > prevent this situation, disable receive DMA during the processing of
> > received packets.
> 
> Pausing Rx DMA seems a little drastic; is the capacity of the NIC buffer large
> enough to sink the traffic while the stack drains the ring?
> 
> Could you provide a little more detail on what the HW issue is?
> Is there no less intrusive way we can fix it?

Thank you for your insight, Jakub.  I will review this patch and see if
the issue can be solved without pausing the DMA process.

FYI, a little background on the DMA operation in hardware:

Pausing RX DMA prevents new packets from being written to memory.
Incoming packets are instead written to a 20KB buffer (they are not forwarded to memory and there is no consumer index update). Once this buffer is full, packets are dropped.

Thanks, Dave
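
As an editorial back-of-envelope check of Jakub's capacity question, assuming
the 20KB figure above and a 1 Gb/s line rate (the program and its constants
are illustrative, not from the thread):

#include <stdio.h>

int main(void)
{
	const double line_rate_bps = 1e9;      /* gigabit line rate */
	const double buffer_bytes = 20 * 1024; /* 20KB buffer, per the reply above */

	/* Worst case: the buffer fills at full line rate while RX DMA is paused. */
	double fill_time_us = buffer_bytes * 8.0 / line_rate_bps * 1e6;

	printf("~%.0f us to fill 20KB at 1 Gb/s\n", fill_time_us);
	return 0;
}

This prints roughly ~164 us, i.e. in the worst case the stack has on the
order of 160 microseconds to drain the ring before the buffer fills and
drops begin.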

Patch

diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
index afa3b92a6905..1490fbc74169 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
@@ -299,6 +299,10 @@ int mlxbf_gige_poll(struct napi_struct *napi, int budget)
 
 	mlxbf_gige_handle_tx_complete(priv);
 
+	data = readq(priv->base + MLXBF_GIGE_RX_DMA);
+	data &= ~MLXBF_GIGE_RX_DMA_EN;
+	writeq(data, priv->base + MLXBF_GIGE_RX_DMA);
+
 	do {
 		remaining_pkts = mlxbf_gige_rx_packet(priv, &work_done);
 	} while (remaining_pkts && work_done < budget);
@@ -314,6 +318,10 @@ int mlxbf_gige_poll(struct napi_struct *napi, int budget)
 		data = readq(priv->base + MLXBF_GIGE_INT_MASK);
 		data &= ~MLXBF_GIGE_INT_MASK_RX_RECEIVE_PACKET;
 		writeq(data, priv->base + MLXBF_GIGE_INT_MASK);
+
+		data = readq(priv->base + MLXBF_GIGE_RX_DMA);
+		data |= MLXBF_GIGE_RX_DMA_EN;
+		writeq(data, priv->base + MLXBF_GIGE_RX_DMA);
 	}
 
 	return work_done;
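
The two hunks above repeat the same read-modify-write sequence on the RX DMA
control register. As a sketch only (not part of the submitted patch), the
pattern could be factored into a helper; MLXBF_GIGE_RX_DMA and
MLXBF_GIGE_RX_DMA_EN are taken from the diff, while the helper name is
hypothetical:

/* Editorial sketch; mirrors the read-modify-write pattern in the diff. */
static void mlxbf_gige_rx_dma_set(struct mlxbf_gige *priv, bool enable)
{
	u64 data = readq(priv->base + MLXBF_GIGE_RX_DMA);

	if (enable)
		data |= MLXBF_GIGE_RX_DMA_EN;
	else
		data &= ~MLXBF_GIGE_RX_DMA_EN;

	writeq(data, priv->base + MLXBF_GIGE_RX_DMA);
}

With RX DMA disabled for the duration of the drain loop, the hardware cannot
advance the producer index mid-loop, so the exit condition (producer and
consumer indices equal) can no longer be invalidated by a late-arriving
packet; RX DMA is re-enabled on the same path that unmasks the receive
interrupt.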