[net-next,5/5] mlxsw: pci: Use NAPI for event processing

Message ID	2a2eeac12c4d70b366f4e4afdfabdf133359acd2.1714134205.git.petrm@nvidia.com (mailing list archive)
State	Accepted
Delegated to:	Netdev Maintainers
Headers	show Received: from NAM11-DM6-obe.outbound.protection.outlook.com (mail-dm6nam11on2071.outbound.protection.outlook.com [40.107.223.71]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1AFE2146D56 for <netdev@vger.kernel.org>; Fri, 26 Apr 2024 12:45:26 +0000 (UTC) Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.160 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.160; helo=mail.nvidia.com; pr=C From: Petr Machata <petrm@nvidia.com> To: "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, <netdev@vger.kernel.org> CC: Ido Schimmel <idosch@nvidia.com>, Petr Machata <petrm@nvidia.com>, "Amit Cohen" <amcohen@nvidia.com>, <mlxsw@nvidia.com> Subject: [PATCH net-next 5/5] mlxsw: pci: Use NAPI for event processing Date: Fri, 26 Apr 2024 14:42:26 +0200 Message-ID: <2a2eeac12c4d70b366f4e4afdfabdf133359acd2.1714134205.git.petrm@nvidia.com> In-Reply-To: <cover.1714134205.git.petrm@nvidia.com> References: <cover.1714134205.git.petrm@nvidia.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain
Series	mlxsw: Improve events processing performance \| expand [net-next,0/5] mlxsw: Improve events processing performance [net-next,1/5] mlxsw: pci: Handle up to 64 Rx completions in tasklet [net-next,2/5] mlxsw: pci: Ring RDQ and CQ doorbells once per several completions [net-next,3/5] mlxsw: pci: Initialize dummy net devices for NAPI [net-next,4/5] mlxsw: pci: Reorganize 'mlxsw_pci_queue' structure [net-next,5/5] mlxsw: pci: Use NAPI for event processing

Message ID

2a2eeac12c4d70b366f4e4afdfabdf133359acd2.1714134205.git.petrm@nvidia.com (mailing list archive)

State

Accepted

Delegated to:

Netdev Maintainers

Headers

Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates
 216.228.117.160 as permitted sender) receiver=protection.outlook.com;
 client-ip=216.228.117.160; helo=mail.nvidia.com; pr=C
From: Petr Machata <petrm@nvidia.com>
To: "David S. Miller" <davem@davemloft.net>, Eric Dumazet
	<edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni
	<pabeni@redhat.com>, <netdev@vger.kernel.org>
CC: Ido Schimmel <idosch@nvidia.com>, Petr Machata <petrm@nvidia.com>, "Amit
 Cohen" <amcohen@nvidia.com>, <mlxsw@nvidia.com>
Subject: [PATCH net-next 5/5] mlxsw: pci: Use NAPI for event processing
Date: Fri, 26 Apr 2024 14:42:26 +0200
Message-ID: 
 <2a2eeac12c4d70b366f4e4afdfabdf133359acd2.1714134205.git.petrm@nvidia.com>
In-Reply-To: <cover.1714134205.git.petrm@nvidia.com>
References: <cover.1714134205.git.petrm@nvidia.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 26 Apr 2024 12:45:22.6703
 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 f7e8bf69-4d4a-4c41-d5ed-08dc65eebd07
X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
 TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.117.160];Helo=[mail.nvidia.com]
X-MS-Exchange-CrossTenant-AuthSource: 
	CY4PEPF0000EE3B.namprd03.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: SA1PR12MB6800
X-Patchwork-Delegate: kuba@kernel.org

Series

mlxsw: Improve events processing performance | expand

Context	Check	Description
netdev/series_format	success	Posting correctly formatted
netdev/tree_selection	success	Clearly marked for net-next
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 926 this patch: 926
netdev/build_tools	success	No tools touched, skip
netdev/cc_maintainers	success	CCed 6 of 6 maintainers
netdev/build_clang	success	Errors and warnings before: 937 this patch: 937
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 937 this patch: 937
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 170 lines checked
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
netdev/contest	success	net-next-2024-04-26--21-00 (tests: 992)

Context

Check

Description

netdev/series_format

success

Posting correctly formatted

netdev/tree_selection

success

Clearly marked for net-next

netdev/ynl

success

Generated files up to date; no warnings/errors; no diff in generated;

netdev/fixes_present

success

Fixes tag not required for -next series

netdev/header_inline

success

No static functions without inline keyword in header files

netdev/build_32bit

success

Errors and warnings before: 926 this patch: 926

netdev/build_tools

success

No tools touched, skip

netdev/cc_maintainers

success

CCed 6 of 6 maintainers

netdev/build_clang

success

Errors and warnings before: 937 this patch: 937

netdev/verify_signedoff

success

Signed-off-by tag matches author and committer

netdev/deprecated_api

success

None detected

netdev/check_selftest

success

No net selftest shell script

netdev/verify_fixes

success

No Fixes tag

netdev/build_allmodconfig_warn

success

Errors and warnings before: 937 this patch: 937

netdev/checkpatch

success

total: 0 errors, 0 warnings, 0 checks, 170 lines checked

netdev/build_clang_rust

success

No Rust files in patch. Skipping build

netdev/kdoc

success

Errors and warnings before: 0 this patch: 0

netdev/source_inline

success

Was 0 now: 0

netdev/contest

success

net-next-2024-04-26--21-00 (tests: 992)

Commit Message

Petr Machata April 26, 2024, 12:42 p.m. UTC

From: Amit Cohen <amcohen@nvidia.com>

Spectrum ASICs only support a single interrupt, that means that all the
events are handled by one IRQ (interrupt request) handler. Once an
interrupt is received, we schedule tasklet to handle events from EQ and
then schedule tasklets to handle completions from CQs. Tasklet runs in
softIRQ (software IRQ) context, and will be run on the same CPU which
scheduled it. That means that today we use only one CPU to handle all the
packets (both network packets and EMADs) from hardware.

This can be improved using NAPI. The idea is to use NAPI instance per
CQ, which is mapped 1:1 to DQ (RDQ or SDQ). NAPI poll method can be run
in kernel thread, so then the driver will be able to handle WQEs in several
CPUs. Convert the existing code to use NAPI APIs.

Add NAPI instance as part of 'struct mlxsw_pci_queue' and initialize it
as part of CQs initialization. Set the appropriate poll method and dummy
net device, according to queue number, similar to tasklet setup. For CQs
which are used for completions of RDQ, use Rx poll method and
'napi_dev_rx', which is set as 'threaded'. It means that Rx poll method
will run in kernel context, so several RDQs will be handled in parallel.
For CQs which are used for completions of SDQ, use Tx poll method and
'napi_dev_tx', this method will run in softIRQ context, as it is
recommended in NAPI documentation, as Tx packets' processing is short task.

Convert mlxsw_pci_cq_{rx,tx}_tasklet() to poll methods. Handle 'budget'
argument - ignore it in Tx poll method, as it is recommended to not limit
Tx processing. For Rx processing, handle up to 'budget' completions.
Return 'work_done' which is the amount of completions that were handled.

Handle the following cases:
1. After processing 'budget' completions, the driver still has work to do:
   Return work-done = budget. In that case, the NAPI instance will be
   polled again (without the need to be rescheduled). Do not re-arm the
   queue, as NAPI will handle the reschedule, so we do not have to involve
   hardware to send an additional interrupt for the completions that should
   be processed.

2. Event processing has been completed:
   Call napi_complete_done() to mark NAPI processing as completed, which
   means that the poll method will not be rescheduled. Re-arm the queue,
   as all completions were handled.

   In case that poll method handled exactly 'budget' completions, return
   work-done = budget -1, to distinguish from the case that driver still
   has completions to handle. Otherwise, return the amount of completions
   that were handled.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/pci.c | 96 ++++++++++++++++++-----
 1 file changed, 77 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 7724f9a61479..bf66d996e32e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -87,7 +87,7 @@  struct mlxsw_pci_queue {
 		struct {
 			enum mlxsw_pci_cqe_v v;
 			struct mlxsw_pci_queue *dq;
-			struct tasklet_struct tasklet;
+			struct napi_struct napi;
 		} cq;
 		struct {
 			struct tasklet_struct tasklet;
@@ -684,16 +684,29 @@  static char *mlxsw_pci_cq_sw_cqe_get(struct mlxsw_pci_queue *q)
 	return elem;
 }
 
-#define MLXSW_PCI_CQ_MAX_HANDLE 64
+static bool mlxsw_pci_cq_cqe_to_handle(struct mlxsw_pci_queue *q)
+{
+	struct mlxsw_pci_queue_elem_info *elem_info;
+	bool owner_bit;
+
+	elem_info = mlxsw_pci_queue_elem_info_consumer_get(q);
+	owner_bit = mlxsw_pci_cqe_owner_get(q->u.cq.v, elem_info->elem);
+	return !mlxsw_pci_elem_hw_owned(q, owner_bit);
+}
 
-static void mlxsw_pci_cq_rx_tasklet(struct tasklet_struct *t)
+static int mlxsw_pci_napi_poll_cq_rx(struct napi_struct *napi, int budget)
 {
-	struct mlxsw_pci_queue *q = from_tasklet(q, t, u.cq.tasklet);
+	struct mlxsw_pci_queue *q = container_of(napi, struct mlxsw_pci_queue,
+						 u.cq.napi);
 	struct mlxsw_pci_queue *rdq = q->u.cq.dq;
 	struct mlxsw_pci *mlxsw_pci = q->pci;
-	int items = 0;
+	int work_done = 0;
 	char *cqe;
 
+	/* If the budget is 0, Rx processing should be skipped. */
+	if (unlikely(!budget))
+		return 0;
+
 	while ((cqe = mlxsw_pci_cq_sw_cqe_get(q))) {
 		u16 wqe_counter = mlxsw_pci_cqe_wqe_counter_get(cqe);
 		u8 sendq = mlxsw_pci_cqe_sr_get(q->u.cq.v, cqe);
@@ -712,22 +725,44 @@  static void mlxsw_pci_cq_rx_tasklet(struct tasklet_struct *t)
 		mlxsw_pci_cqe_rdq_handle(mlxsw_pci, rdq,
 					 wqe_counter, q->u.cq.v, cqe);
 
-		if (++items == MLXSW_PCI_CQ_MAX_HANDLE)
+		if (++work_done == budget)
 			break;
 	}
 
 	mlxsw_pci_queue_doorbell_consumer_ring(mlxsw_pci, q);
 	mlxsw_pci_queue_doorbell_producer_ring(mlxsw_pci, rdq);
-	mlxsw_pci_queue_doorbell_arm_consumer_ring(mlxsw_pci, q);
+
+	if (work_done < budget)
+		goto processing_completed;
+
+	/* The driver still has outstanding work to do, budget was exhausted.
+	 * Return exactly budget. In that case, the NAPI instance will be polled
+	 * again.
+	 */
+	if (mlxsw_pci_cq_cqe_to_handle(q))
+		goto out;
+
+	/* The driver processed all the completions and handled exactly
+	 * 'budget'. Return 'budget - 1' to distinguish from the case that
+	 * driver still has completions to handle.
+	 */
+	if (work_done == budget)
+		work_done--;
+
+processing_completed:
+	if (napi_complete_done(napi, work_done))
+		mlxsw_pci_queue_doorbell_arm_consumer_ring(mlxsw_pci, q);
+out:
+	return work_done;
 }
 
-static void mlxsw_pci_cq_tx_tasklet(struct tasklet_struct *t)
+static int mlxsw_pci_napi_poll_cq_tx(struct napi_struct *napi, int budget)
 {
-	struct mlxsw_pci_queue *q = from_tasklet(q, t, u.cq.tasklet);
+	struct mlxsw_pci_queue *q = container_of(napi, struct mlxsw_pci_queue,
+						 u.cq.napi);
 	struct mlxsw_pci_queue *sdq = q->u.cq.dq;
 	struct mlxsw_pci *mlxsw_pci = q->pci;
-	int credits = q->count >> 1;
-	int items = 0;
+	int work_done = 0;
 	char *cqe;
 
 	while ((cqe = mlxsw_pci_cq_sw_cqe_get(q))) {
@@ -752,11 +787,21 @@  static void mlxsw_pci_cq_tx_tasklet(struct tasklet_struct *t)
 		mlxsw_pci_cqe_sdq_handle(mlxsw_pci, sdq,
 					 wqe_counter, q->u.cq.v, ncqe);
 
-		if (++items == credits)
-			break;
+		work_done++;
 	}
 
+	/* If the budget is 0 napi_complete_done() should never be called. */
+	if (unlikely(!budget))
+		goto processing_completed;
+
+	work_done = min(work_done, budget - 1);
+	if (unlikely(!napi_complete_done(napi, work_done)))
+		goto out;
+
+processing_completed:
 	mlxsw_pci_queue_doorbell_arm_consumer_ring(mlxsw_pci, q);
+out:
+	return work_done;
 }
 
 static enum mlxsw_pci_cq_type
@@ -772,17 +817,29 @@  mlxsw_pci_cq_type(const struct mlxsw_pci *mlxsw_pci,
 	return MLXSW_PCI_CQ_RDQ;
 }
 
-static void mlxsw_pci_cq_tasklet_setup(struct mlxsw_pci_queue *q,
-				       enum mlxsw_pci_cq_type cq_type)
+static void mlxsw_pci_cq_napi_setup(struct mlxsw_pci_queue *q,
+				    enum mlxsw_pci_cq_type cq_type)
 {
+	struct mlxsw_pci *mlxsw_pci = q->pci;
+
 	switch (cq_type) {
 	case MLXSW_PCI_CQ_SDQ:
-		tasklet_setup(&q->u.cq.tasklet, mlxsw_pci_cq_tx_tasklet);
+		netif_napi_add(mlxsw_pci->napi_dev_tx, &q->u.cq.napi,
+			       mlxsw_pci_napi_poll_cq_tx);
 		break;
 	case MLXSW_PCI_CQ_RDQ:
-		tasklet_setup(&q->u.cq.tasklet, mlxsw_pci_cq_rx_tasklet);
+		netif_napi_add(mlxsw_pci->napi_dev_rx, &q->u.cq.napi,
+			       mlxsw_pci_napi_poll_cq_rx);
 		break;
 	}
+
+	napi_enable(&q->u.cq.napi);
+}
+
+static void mlxsw_pci_cq_napi_teardown(struct mlxsw_pci_queue *q)
+{
+	napi_disable(&q->u.cq.napi);
+	netif_napi_del(&q->u.cq.napi);
 }
 
 static int mlxsw_pci_cq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
@@ -817,7 +874,7 @@  static int mlxsw_pci_cq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
 	err = mlxsw_cmd_sw2hw_cq(mlxsw_pci->core, mbox, q->num);
 	if (err)
 		return err;
-	mlxsw_pci_cq_tasklet_setup(q, mlxsw_pci_cq_type(mlxsw_pci, q));
+	mlxsw_pci_cq_napi_setup(q, mlxsw_pci_cq_type(mlxsw_pci, q));
 	mlxsw_pci_queue_doorbell_consumer_ring(mlxsw_pci, q);
 	mlxsw_pci_queue_doorbell_arm_consumer_ring(mlxsw_pci, q);
 	return 0;
@@ -826,6 +883,7 @@  static int mlxsw_pci_cq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
 static void mlxsw_pci_cq_fini(struct mlxsw_pci *mlxsw_pci,
 			      struct mlxsw_pci_queue *q)
 {
+	mlxsw_pci_cq_napi_teardown(q);
 	mlxsw_cmd_hw2sw_cq(mlxsw_pci->core, q->num);
 }
 
@@ -886,7 +944,7 @@  static void mlxsw_pci_eq_tasklet(struct tasklet_struct *t)
 	cq_count = mlxsw_pci->num_cqs;
 	for_each_set_bit(cqn, active_cqns, cq_count) {
 		q = mlxsw_pci_cq_get(mlxsw_pci, cqn);
-		tasklet_schedule(&q->u.cq.tasklet);
+		napi_schedule(&q->u.cq.napi);
 	}
 }

[net-next,5/5] mlxsw: pci: Use NAPI for event processing

Checks

Commit Message

Patch