
[net,v1] mlxbf_gige: fix receive packet race condition

Message ID 20220908202853.21725-1-davthompson@nvidia.com (mailing list archive)
State: Changes Requested
Delegated to: Netdev Maintainers

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 20 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

David Thompson Sept. 8, 2022, 8:28 p.m. UTC
Under heavy traffic, the BF2 Gigabit interface can
become unresponsive for periods of time (several minutes)
before eventually recovering.  This is due to a possible
race condition in the mlxbf_gige_rx_packet function, where
the function exits with producer and consumer indices equal
but there are remaining packet(s) to be processed. In order
to prevent this situation, disable receive DMA during the
processing of received packets.

Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: David Thompson <davthompson@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c | 8 ++++++++
 1 file changed, 8 insertions(+)
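
The race described in the commit message can be illustrated with a minimal,
self-contained C sketch. This is an editorial illustration, not the driver's
actual code; rx_ring, hw_producer_idx and rx_packet are hypothetical names.

#include <stdbool.h>
#include <stdint.h>

struct rx_ring {
	uint16_t consumer_idx;              /* driver-owned */
	volatile uint16_t *hw_producer_idx; /* advanced by the DMA engine */
};

/* Process one packet; returns true while packets appear to remain. */
static bool rx_packet(struct rx_ring *ring)
{
	uint16_t prod = *ring->hw_producer_idx; /* sample the HW index */

	if (ring->consumer_idx == prod)
		return false; /* ring looks empty */

	/* ... hand the packet at consumer_idx up to the stack ... */
	ring->consumer_idx++;

	/*
	 * Race window: if the DMA engine advances the producer index
	 * here, after the sample above, the caller's loop can exit with
	 * the indices apparently equal even though a packet is pending.
	 * With the RX interrupt already acknowledged, nothing re-arms
	 * the poll and the interface stalls until it somehow recovers.
	 */
	return ring->consumer_idx != prod;
}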

Comments

Jakub Kicinski Sept. 19, 2022, 9:17 p.m. UTC | #1
On Thu, 8 Sep 2022 16:28:53 -0400 David Thompson wrote:
> Under heavy traffic, the BF2 Gigabit interface can
> become unresponsive for periods of time (several minutes)
> before eventually recovering.  This is due to a possible
> race condition in the mlxbf_gige_rx_packet function, where
> the function exits with producer and consumer indices equal
> but there are remaining packet(s) to be processed. In order
> to prevent this situation, disable receive DMA during the
> processing of received packets.

Pausing Rx DMA seems a little drastic; is the capacity of the NIC buffer
large enough to sink the traffic while the stack drains the ring?

Could you provide a little more detail on what the HW issue is?
Is there no less intrusive way we can fix it?
David Thompson Oct. 25, 2022, 4:31 p.m. UTC | #2
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Monday, September 19, 2022 5:18 PM
> To: David Thompson <davthompson@nvidia.com>
> Cc: davem@davemloft.net; edumazet@google.com; pabeni@redhat.com;
> netdev@vger.kernel.org; cai.huoqing@linux.dev; brgl@bgdev.pl; Liming Sun
> <limings@nvidia.com>; Asmaa Mnebhi <asmaa@nvidia.com>
> Subject: Re: [PATCH net v1] mlxbf_gige: fix receive packet race condition
> 
> On Thu, 8 Sep 2022 16:28:53 -0400 David Thompson wrote:
> > Under heavy traffic, the BF2 Gigabit interface can become unresponsive
> > for periods of time (several minutes) before eventually recovering.
> > This is due to a possible race condition in the mlxbf_gige_rx_packet
> > function, where the function exits with producer and consumer indices
> > equal but there are remaining packet(s) to be processed. In order to
> > prevent this situation, disable receive DMA during the processing of
> > received packets.
> 
> Pausing Rx DMA seems a little drastic; is the capacity of the NIC buffer large
> enough to sink the traffic while the stack drains the ring?
> 
> Could you provide a little more detail on what the HW issue is?
> Is there no less intrusive way we can fix it?

Thank you for your insight, Jakub.  I will review this patch and see if
the issue can be solved without pausing the DMA process.

FYI, a little background on the DMA operation in hardware:

Pausing RX DMA prevents new packets from being written to memory.
Incoming packets are instead written to a 20KB buffer (they are not forwarded to memory and there is no consumer index update). Once this buffer is full, packets are dropped.

Thanks, Dave
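
As an editorial back-of-envelope check of Jakub's capacity question, assuming
the 20KB figure above and a 1 Gb/s line rate (the program and its constants
are illustrative, not from the thread):

#include <stdio.h>

int main(void)
{
	const double line_rate_bps = 1e9;      /* gigabit line rate */
	const double buffer_bytes = 20 * 1024; /* 20KB buffer, per the reply above */

	/* Worst case: the buffer fills at full line rate while RX DMA is paused. */
	double fill_time_us = buffer_bytes * 8.0 / line_rate_bps * 1e6;

	printf("~%.0f us to fill 20KB at 1 Gb/s\n", fill_time_us);
	return 0;
}

This prints roughly ~164 us, i.e. in the worst case the stack has on the
order of 160 microseconds to drain the ring before the buffer fills and
drops begin.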

Patch

diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
index afa3b92a6905..1490fbc74169 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
@@ -299,6 +299,10 @@ int mlxbf_gige_poll(struct napi_struct *napi, int budget)
 
 	mlxbf_gige_handle_tx_complete(priv);
 
+	data = readq(priv->base + MLXBF_GIGE_RX_DMA);
+	data &= ~MLXBF_GIGE_RX_DMA_EN;
+	writeq(data, priv->base + MLXBF_GIGE_RX_DMA);
+
 	do {
 		remaining_pkts = mlxbf_gige_rx_packet(priv, &work_done);
 	} while (remaining_pkts && work_done < budget);
@@ -314,6 +318,10 @@ int mlxbf_gige_poll(struct napi_struct *napi, int budget)
 		data = readq(priv->base + MLXBF_GIGE_INT_MASK);
 		data &= ~MLXBF_GIGE_INT_MASK_RX_RECEIVE_PACKET;
 		writeq(data, priv->base + MLXBF_GIGE_INT_MASK);
+
+		data = readq(priv->base + MLXBF_GIGE_RX_DMA);
+		data |= MLXBF_GIGE_RX_DMA_EN;
+		writeq(data, priv->base + MLXBF_GIGE_RX_DMA);
 	}
 
 	return work_done;
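
The two hunks above repeat the same read-modify-write sequence on the RX DMA
control register. As a sketch only (not part of the submitted patch), the
pattern could be factored into a helper; MLXBF_GIGE_RX_DMA and
MLXBF_GIGE_RX_DMA_EN are taken from the diff, while the helper name is
hypothetical:

/* Editorial sketch; mirrors the read-modify-write pattern in the diff. */
static void mlxbf_gige_rx_dma_set(struct mlxbf_gige *priv, bool enable)
{
	u64 data = readq(priv->base + MLXBF_GIGE_RX_DMA);

	if (enable)
		data |= MLXBF_GIGE_RX_DMA_EN;
	else
		data &= ~MLXBF_GIGE_RX_DMA_EN;

	writeq(data, priv->base + MLXBF_GIGE_RX_DMA);
}

With RX DMA disabled for the duration of the drain loop, the hardware cannot
advance the producer index mid-loop, so the exit condition (producer and
consumer indices equal) can no longer be invalidated by a late-arriving
packet; RX DMA is re-enabled on the same path that unmasks the receive
interrupt.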