From patchwork Fri Aug 30 16:25:00 2024
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Lorenzo Bianconi, Daniel Xu, John Fastabend,
    Jesper Dangaard Brouer, Martin KaFai Lau, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, bpf@vger.kernel.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH bpf-next 1/9] firmware/psci: fix missing '%u' format literal in kthread_create_on_cpu() Date: Fri, 30 Aug 2024 18:25:00 +0200 Message-ID: <20240830162508.1009458-2-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com> References: <20240830162508.1009458-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net kthread_create_on_cpu() always requires format string to contain one '%u' at the end, as it automatically adds the CPU ID when passing it to kthread_create_on_node(). The former doesn't marked as __printf() as it's not printf-like itself, which effectively hides this from the compiler. If you convert this function to printf-like, you'll see the following: In file included from drivers/firmware/psci/psci_checker.c:15: drivers/firmware/psci/psci_checker.c: In function 'suspend_tests': drivers/firmware/psci/psci_checker.c:401:48: warning: too many arguments for format [-Wformat-extra-args] 401 | "psci_suspend_test"); | ^~~~~~~~~~~~~~~~~~~ drivers/firmware/psci/psci_checker.c:400:32: warning: data argument not used by format string [-Wformat-extra-args] 400 | (void *)(long)cpu, cpu, | ^ 401 | "psci_suspend_test"); | ~~~~~~~~~~~~~~~~~~~ Add the missing format literal to fix this. Now the corresponding kthread will be named as "psci_suspend_test-", as it's meant by kthread_create_on_cpu(). Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202408141012.KhvKaxoh-lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202408141243.eQiEOQQe-lkp@intel.com Fixes: ea8b1c4a6019 ("drivers: psci: PSCI checker module") Cc: stable@vger.kernel.org # 4.10+ Signed-off-by: Alexander Lobakin Acked-by: Daniel Xu --- drivers/firmware/psci/psci_checker.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/firmware/psci/psci_checker.c b/drivers/firmware/psci/psci_checker.c index 116eb465cdb4..ecc511c745ce 100644 --- a/drivers/firmware/psci/psci_checker.c +++ b/drivers/firmware/psci/psci_checker.c @@ -398,7 +398,7 @@ static int suspend_tests(void) thread = kthread_create_on_cpu(suspend_test_thread, (void *)(long)cpu, cpu, - "psci_suspend_test"); + "psci_suspend_test-%u"); if (IS_ERR(thread)) pr_err("Failed to create kthread on CPU %d\n", cpu); else From patchwork Fri Aug 30 16:25:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 13785326 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4EAC61BA265; Fri, 30 Aug 2024 16:25:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.7 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725035147; cv=none; b=AwUF0mm1gmoi4MYr3ML6oG/ot+mSs/j8ZYSs2qP8SxvoXzEuSKAoPRDHc1UlQ35wCel/UpSB/0hQeaWeNXGsQBCDVy6VJ/+D0LKIsoCZBRhlDihGYf3FoNlkvtsZBNgIjImTT/20UryRQWGEogr+dEyydFoQzDxioHrmjLq5Ygg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725035147; c=relaxed/simple; 
From patchwork Fri Aug 30 16:25:01 2024
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Lorenzo Bianconi, Daniel Xu, John Fastabend,
    Jesper Dangaard Brouer, Martin KaFai Lau, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, bpf@vger.kernel.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next 2/9] kthread: allow vararg kthread_{create,run}_on_cpu()
Date: Fri, 30 Aug 2024 18:25:01 +0200
Message-ID: <20240830162508.1009458-3-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>
References: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

Currently, kthread_{create,run}_on_cpu() don't support varargs like
kthread_create{,_on_node}() do, which makes them less convenient to
use. Convert them to take varargs as the last argument. The only
difference is that they always append the CPU ID at the end and thus
require the format string to have an excess '%u' at the end. That's
still true; the compiler, however, will now correctly point it out
when it's missing.
One more nice side effect is that you can now use the underscored
__kthread_create_on_cpu() if you want to override that rule and not
have the CPU ID at the end of the name. The current callers are not
affected in any way.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/kthread.h | 51 ++++++++++++++++++++++++++---------------
 kernel/kthread.c        | 22 ++++++++++--------
 2 files changed, 45 insertions(+), 28 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index b11f53c1ba2e..27a94e691948 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -27,11 +27,21 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 #define kthread_create(threadfn, data, namefmt, arg...) \
 	kthread_create_on_node(threadfn, data, NUMA_NO_NODE, namefmt, ##arg)
-
-struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
-					  void *data,
-					  unsigned int cpu,
-					  const char *namefmt);
+__printf(4, 5)
+struct task_struct *__kthread_create_on_cpu(int (*threadfn)(void *data),
+					    void *data, unsigned int cpu,
+					    const char *namefmt, ...);
+
+#define kthread_create_on_cpu(threadfn, data, cpu, namefmt, ...)	   \
+	_kthread_create_on_cpu(threadfn, data, cpu, __UNIQUE_ID(cpu_),	   \
+			       namefmt, ##__VA_ARGS__)
+
+#define _kthread_create_on_cpu(threadfn, data, cpu, uc, namefmt, ...) ({  \
+	u32 uc = (cpu);							   \
+									   \
+	__kthread_create_on_cpu(threadfn, data, uc, namefmt,		   \
+				##__VA_ARGS__, uc);			   \
+})
 
 void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk);
 bool set_kthread_struct(struct task_struct *p);
@@ -62,25 +72,28 @@ bool kthread_is_per_cpu(struct task_struct *k);
  * @threadfn: the function to run until signal_pending(current).
  * @data: data ptr for @threadfn.
  * @cpu: The cpu on which the thread should be bound,
- * @namefmt: printf-style name for the thread. Format is restricted
- *	     to "name.*%u". Code fills in cpu number.
+ * @namefmt: printf-style name for the thread. Must have an excess '%u'
+ *	     at the end as kthread_create_on_cpu() fills in CPU number.
  *
  * Description: Convenient wrapper for kthread_create_on_cpu()
  * followed by wake_up_process(). Returns the kthread or
  * ERR_PTR(-ENOMEM).
  */
-static inline struct task_struct *
-kthread_run_on_cpu(int (*threadfn)(void *data), void *data,
-		   unsigned int cpu, const char *namefmt)
-{
-	struct task_struct *p;
-
-	p = kthread_create_on_cpu(threadfn, data, cpu, namefmt);
-	if (!IS_ERR(p))
-		wake_up_process(p);
-
-	return p;
-}
+#define kthread_run_on_cpu(threadfn, data, cpu, namefmt, ...)		   \
+	_kthread_run_on_cpu(threadfn, data, cpu, __UNIQUE_ID(task_),	   \
+			    namefmt, ##__VA_ARGS__)
+
+#define _kthread_run_on_cpu(threadfn, data, cpu, ut, namefmt, ...)	   \
+({									   \
+	struct task_struct *ut;						   \
+									   \
+	ut = kthread_create_on_cpu(threadfn, data, cpu, namefmt,	   \
+				   ##__VA_ARGS__);			   \
+	if (!IS_ERR(ut))						   \
+		wake_up_process(ut);					   \
+									   \
+	ut;								   \
+})
 
 void free_kthread_struct(struct task_struct *k);
 void kthread_bind(struct task_struct *k, unsigned int cpu);

diff --git a/kernel/kthread.c b/kernel/kthread.c
index f7be976ff88a..e9da0115fb2b 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -559,23 +559,27 @@ void kthread_bind(struct task_struct *p, unsigned int cpu)
 EXPORT_SYMBOL(kthread_bind);
 
 /**
- * kthread_create_on_cpu - Create a cpu bound kthread
+ * __kthread_create_on_cpu - Create a cpu bound kthread
  * @threadfn: the function to run until signal_pending(current).
  * @data: data ptr for @threadfn.
  * @cpu: The cpu on which the thread should be bound,
- * @namefmt: printf-style name for the thread. Format is restricted
- *	     to "name.*%u". Code fills in cpu number.
+ * @namefmt: printf-style name for the thread. Must have an excess '%u'
+ *	     at the end as kthread_create_on_cpu() fills in CPU number.
  *
  * Description: This helper function creates and names a kernel thread
  */
-struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
-					  void *data, unsigned int cpu,
-					  const char *namefmt)
+struct task_struct *__kthread_create_on_cpu(int (*threadfn)(void *data),
+					    void *data, unsigned int cpu,
+					    const char *namefmt, ...)
 {
 	struct task_struct *p;
+	va_list args;
+
+	va_start(args, namefmt);
+	p = __kthread_create_on_node(threadfn, data, cpu_to_node(cpu), namefmt,
+				     args);
+	va_end(args);
 
-	p = kthread_create_on_node(threadfn, data, cpu_to_node(cpu), namefmt,
-				   cpu);
 	if (IS_ERR(p))
 		return p;
 	kthread_bind(p, cpu);
@@ -583,7 +587,7 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
 	to_kthread(p)->cpu = cpu;
 	return p;
 }
-EXPORT_SYMBOL(kthread_create_on_cpu);
+EXPORT_SYMBOL(__kthread_create_on_cpu);
 
 void kthread_set_per_cpu(struct task_struct *k, int cpu)
 {
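A usage sketch of the converted API (hypothetical caller, reusing
my_test_thread from the sketch after patch 1; spawn_named_worker and
the "worker/%s-%u" name are made up): only the extra argument is passed
explicitly, while the macro itself appends the CPU ID for the trailing
'%u'.

#include <linux/err.h>
#include <linux/kthread.h>

static int spawn_named_worker(unsigned int cpu, const char *name)
{
	struct task_struct *t;

	/*
	 * Only @name is consumed by '%s'; kthread_run_on_cpu() appends
	 * @cpu for the final '%u', giving e.g. "worker/eth0-2" on CPU 2.
	 */
	t = kthread_run_on_cpu(my_test_thread, NULL, cpu, "worker/%s-%u",
			       name);

	return PTR_ERR_OR_ZERO(t);
}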
From patchwork Fri Aug 30 16:25:02 2024
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Lorenzo Bianconi, Daniel Xu, John Fastabend,
    Jesper Dangaard Brouer, Martin KaFai Lau, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, bpf@vger.kernel.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next 3/9] net: napi: add ability to create CPU-pinned
 threaded NAPI
Date: Fri, 30 Aug 2024 18:25:02 +0200
Message-ID: <20240830162508.1009458-4-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>
References: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

From: Lorenzo Bianconi

Add netif_napi_add_percpu() to pin a NAPI in threaded mode to a
particular CPU. This means that if the NAPI is not threaded, it runs
as usual, but after switching to threaded mode, it will always run on
the specified CPU.
It's not meant to be used in drivers, but might be useful when
creating percpu threaded NAPIs, for example, to replace percpu
kthreads or workers where a NAPI context is needed. The already
existing netif_napi_add*() are not affected in any way.
Signed-off-by: Lorenzo Bianconi
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Daniel Xu
---
 include/linux/netdevice.h | 35 +++++++++++++++++++++++++++++++++--
 net/core/dev.c            | 18 +++++++++++++-----
 2 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ca5f0dda733b..4d6fb0ccdea1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -377,6 +377,7 @@ struct napi_struct {
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	int			irq;
+	int			thread_cpuid;
 };
 
 enum {
@@ -2619,8 +2620,18 @@ static inline void netif_napi_set_irq(struct napi_struct *napi, int irq)
  */
 #define NAPI_POLL_WEIGHT 64
 
-void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
-			   int (*poll)(struct napi_struct *, int), int weight);
+void netif_napi_add_weight_percpu(struct net_device *dev,
+				  struct napi_struct *napi,
+				  int (*poll)(struct napi_struct *, int),
+				  int weight, int thread_cpuid);
+
+static inline void netif_napi_add_weight(struct net_device *dev,
+					 struct napi_struct *napi,
+					 int (*poll)(struct napi_struct *, int),
+					 int weight)
+{
+	netif_napi_add_weight_percpu(dev, napi, poll, weight, -1);
+}
 
 /**
  * netif_napi_add() - initialize a NAPI context
@@ -2665,6 +2676,26 @@ static inline void netif_napi_add_tx(struct net_device *dev,
 	netif_napi_add_tx_weight(dev, napi, poll, NAPI_POLL_WEIGHT);
 }
 
+/**
+ * netif_napi_add_percpu() - initialize a CPU-pinned threaded NAPI context
+ * @dev: network device
+ * @napi: NAPI context
+ * @poll: polling function
+ * @thread_cpuid: CPU which this NAPI will be pinned to
+ *
+ * Variant of netif_napi_add() which pins the NAPI to the specified CPU. No
+ * changes in the "standard" mode, but in case with the threaded one, this
+ * NAPI will always be run on the passed CPU no matter where scheduled.
+ */
+static inline void netif_napi_add_percpu(struct net_device *dev,
+					 struct napi_struct *napi,
+					 int (*poll)(struct napi_struct *, int),
+					 int thread_cpuid)
+{
+	netif_napi_add_weight_percpu(dev, napi, poll, NAPI_POLL_WEIGHT,
+				     thread_cpuid);
+}
+
 /**
  * __netif_napi_del - remove a NAPI context
  * @napi: NAPI context

diff --git a/net/core/dev.c b/net/core/dev.c
index 98bb5f890b88..93ca3df8e9dd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1428,8 +1428,13 @@ static int napi_kthread_create(struct napi_struct *n)
 	 * TASK_INTERRUPTIBLE mode to avoid the blocked task
 	 * warning and work with loadavg.
 	 */
-	n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
-				n->dev->name, n->napi_id);
+	if (n->thread_cpuid >= 0)
+		n->thread = kthread_run_on_cpu(napi_threaded_poll, n,
+					       n->thread_cpuid, "napi/%s-%u",
+					       n->dev->name);
+	else
+		n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
+					n->dev->name, n->napi_id);
 	if (IS_ERR(n->thread)) {
 		err = PTR_ERR(n->thread);
 		pr_err("kthread_run failed with err %d\n", err);
@@ -6640,8 +6645,10 @@ void netif_queue_set_napi(struct net_device *dev, unsigned int queue_index,
 }
 EXPORT_SYMBOL(netif_queue_set_napi);
 
-void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
-			   int (*poll)(struct napi_struct *, int), int weight)
+void netif_napi_add_weight_percpu(struct net_device *dev,
+				  struct napi_struct *napi,
+				  int (*poll)(struct napi_struct *, int),
+				  int weight, int thread_cpuid)
 {
 	if (WARN_ON(test_and_set_bit(NAPI_STATE_LISTED, &napi->state)))
 		return;
@@ -6664,6 +6671,7 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 	napi->poll_owner = -1;
 #endif
 	napi->list_owner = -1;
+	napi->thread_cpuid = thread_cpuid;
 	set_bit(NAPI_STATE_SCHED, &napi->state);
 	set_bit(NAPI_STATE_NPSVC, &napi->state);
 	list_add_rcu(&napi->dev_list, &dev->napi_list);
@@ -6677,7 +6685,7 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 	dev->threaded = false;
 	netif_napi_set_irq(napi, -1);
 }
-EXPORT_SYMBOL(netif_napi_add_weight);
+EXPORT_SYMBOL(netif_napi_add_weight_percpu);
 
 void napi_disable(struct napi_struct *n)
 {
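A sketch of how a non-driver user might consume the new helper (this is
the pattern cpumap adopts in the next patch); the dummy device, the
percpu NAPI array and the poll callback are assumptions of this
example, not part of the patch:

#include <linux/netdevice.h>

/* pin one threaded NAPI per possible CPU on a dummy netdev */
static void add_pinned_napis(struct net_device *dummy_dev,
			     struct napi_struct __percpu *napis,
			     int (*poll)(struct napi_struct *, int))
{
	int cpu;

	dummy_dev->threaded = true;

	for_each_possible_cpu(cpu) {
		struct napi_struct *napi = per_cpu_ptr(napis, cpu);

		/* in threaded mode, this NAPI always runs on @cpu */
		netif_napi_add_percpu(dummy_dev, napi, poll, cpu);
		napi_enable(napi);
	}
}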
From patchwork Fri Aug 30 16:25:03 2024
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Lorenzo Bianconi, Daniel Xu, John Fastabend,
    Jesper Dangaard Brouer, Martin KaFai Lau, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, bpf@vger.kernel.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next 4/9] bpf: cpumap: use CPU-pinned threaded NAPI
 w/GRO instead of kthread
Date: Fri, 30 Aug 2024 18:25:03 +0200
Message-ID: <20240830162508.1009458-5-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>
References: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

From: Lorenzo Bianconi

Currently, cpumap uses its own kthread which processes
cpumap-redirected frames in batches of 8, without any weighting (but
with rescheduling points). The resulting skbs get passed to the stack
via netif_receive_skb_list(), which means no GRO happens.

In order to enable GRO, remove all the custom kthread logic from
cpumap and use CPU-pinned threaded NAPIs with the default weight of
64. When a cpumap is created, a new logical netdevice called "cpumap"
is created to run these NAPIs on (one per map). Then, a percpu
threaded NAPI context is created for each cpumap entry, IOW for each
specified CPU. Instead of wake_up_process(), the NAPI is now scheduled
and runs as usual: with a budget of 64, napi_complete_done() is called
if the budget is not exhausted. Frames are still processed in batches
of 8. Instead of netif_receive_skb_list(), napi_gro_receive() is now
used.

Alex's tests with a UDP trafficgen and small frame size:

               no GRO   GRO
 baseline      2.7      N/A    Mpps
 threaded GRO  2.3      4      Mpps
 diff          -14      +48    %

Daniel's tests with neper's TCP RR show a +14% throughput
increase [0].

Currently, GRO on cpumap is limited by the fact that the checksum
status is unknown, as &xdp_frame doesn't carry such metadata. Once
there is a way to pass it from the drivers, the boost will be much
bigger.
Cc: Daniel Xu
Link: https://lore.kernel.org/bpf/merfatcdvwpx2lj4j2pahhwp4vihstpidws3jwljwazhh76xkd@t5vsh4gvk4mh [0]
Signed-off-by: Lorenzo Bianconi
Co-developed-by: Alexander Lobakin
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 kernel/bpf/cpumap.c | 167 ++++++++++++++++++++------------------------
 1 file changed, 76 insertions(+), 91 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index fbdf5a1aabfe..d1cfa4111727 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -28,14 +28,10 @@
 #include <linux/sched.h>
 #include <linux/workqueue.h>
-#include <linux/kthread.h>
 #include <linux/completion.h>
 #include <trace/events/xdp.h>
 #include <linux/btf_ids.h>
 
-#include <linux/netdevice.h>	/* netif_receive_skb_list */
-#include <linux/etherdevice.h>	/* eth_type_trans */
-
 /* General idea: XDP packets getting XDP redirected to another CPU,
  * will maximum be stored/queued for one driver ->poll() call. It is
  * guaranteed that queueing the frame and the flush operation happen on
@@ -56,20 +52,22 @@ struct xdp_bulk_queue {
 
 /* Struct for every remote "destination" CPU in map */
 struct bpf_cpu_map_entry {
-	u32 cpu;    /* kthread CPU and map index */
+	u32 cpu;    /* NAPI thread CPU and map index */
 	int map_id; /* Back reference to map */
 
 	/* XDP can run multiple RX-ring queues, need __percpu enqueue store */
 	struct xdp_bulk_queue __percpu *bulkq;
 
-	/* Queue with potential multi-producers, and single-consumer kthread */
+	/*
+	 * Queue with potential multi-producers and single-consumer
+	 * NAPI thread
+	 */
 	struct ptr_ring *queue;
-	struct task_struct *kthread;
 
 	struct bpf_cpumap_val value;
 	struct bpf_prog *prog;
+	struct napi_struct napi;
 
-	struct completion kthread_running;
 	struct rcu_work free_work;
 };
 
@@ -77,12 +75,15 @@ struct bpf_cpu_map {
 	struct bpf_map map;
 	/* Below members specific for map type */
 	struct bpf_cpu_map_entry __rcu **cpu_map;
+	/* Dummy netdev to run threaded NAPI */
+	struct net_device *napi_dev;
 };
 
 static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 {
 	u32 value_size = attr->value_size;
 	struct bpf_cpu_map *cmap;
+	struct net_device *dev;
 
 	/* check sanity of attributes */
 	if (attr->max_entries == 0 || attr->key_size != 4 ||
@@ -105,19 +106,34 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	cmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *
 					   sizeof(struct bpf_cpu_map_entry *),
 					   cmap->map.numa_node);
-	if (!cmap->cpu_map) {
-		bpf_map_area_free(cmap);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!cmap->cpu_map)
+		goto free_cmap;
+
+	dev = bpf_map_area_alloc(struct_size(dev, priv, 0), NUMA_NO_NODE);
+	if (!dev)
+		goto free_cpu_map;
+
+	init_dummy_netdev(dev);
+	strscpy(dev->name, "cpumap");
+	dev->threaded = true;
+
+	cmap->napi_dev = dev;
 
 	return &cmap->map;
+
+free_cpu_map:
+	bpf_map_area_free(cmap->cpu_map);
+free_cmap:
+	bpf_map_area_free(cmap);
+
+	return ERR_PTR(-ENOMEM);
 }
 
 static void __cpu_map_ring_cleanup(struct ptr_ring *ring)
 {
 	/* The tear-down procedure should have made sure that queue is
 	 * empty. See __cpu_map_entry_replace() and work-queue
-	 * invoked cpu_map_kthread_stop(). Catch any broken behaviour
+	 * invoked __cpu_map_entry_free(). Catch any broken behaviour
 	 * gracefully and warn once.
 	 */
 	void *ptr;
 
@@ -244,7 +260,6 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
 	if (!rcpu->prog)
 		return xdp_n;
 
-	rcu_read_lock_bh();
 	bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
 
 	nframes = cpu_map_bpf_prog_run_xdp(rcpu, frames, xdp_n, stats);
@@ -256,62 +271,45 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
 		cpu_map_bpf_prog_run_skb(rcpu, list, stats);
 
 	bpf_net_ctx_clear(bpf_net_ctx);
-	rcu_read_unlock_bh(); /* resched point, may call do_softirq() */
 
 	return nframes;
 }
 
-static int cpu_map_kthread_run(void *data)
+static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 {
-	struct bpf_cpu_map_entry *rcpu = data;
-	unsigned long last_qs = jiffies;
+	struct xdp_cpumap_stats stats = {}; /* zero stats */
+	u32 done = 0, kmem_alloc_drops = 0;
+	struct bpf_cpu_map_entry *rcpu;
 
-	complete(&rcpu->kthread_running);
-	set_current_state(TASK_INTERRUPTIBLE);
+	rcu_read_lock();
+	rcpu = container_of(napi, typeof(*rcpu), napi);
 
-	/* When kthread gives stop order, then rcpu have been disconnected
-	 * from map, thus no new packets can enter. Remaining in-flight
-	 * per CPU stored packets are flushed to this queue. Wait honoring
-	 * kthread_stop signal until queue is empty.
-	 */
-	while (!kthread_should_stop() || !__ptr_ring_empty(rcpu->queue)) {
-		struct xdp_cpumap_stats stats = {}; /* zero stats */
-		unsigned int kmem_alloc_drops = 0, sched = 0;
+	while (likely(done < budget)) {
 		gfp_t gfp = __GFP_ZERO | GFP_ATOMIC;
 		int i, n, m, nframes, xdp_n;
 		void *frames[CPUMAP_BATCH];
+		struct sk_buff *skb, *tmp;
 		void *skbs[CPUMAP_BATCH];
 		LIST_HEAD(list);
 
-		/* Release CPU reschedule checks */
-		if (__ptr_ring_empty(rcpu->queue)) {
-			set_current_state(TASK_INTERRUPTIBLE);
-			/* Recheck to avoid lost wake-up */
-			if (__ptr_ring_empty(rcpu->queue)) {
-				schedule();
-				sched = 1;
-				last_qs = jiffies;
-			} else {
-				__set_current_state(TASK_RUNNING);
-			}
-		} else {
-			rcu_softirq_qs_periodic(last_qs);
-			sched = cond_resched();
-		}
+		if (__ptr_ring_empty(rcpu->queue))
+			break;
 
 		/*
 		 * The bpf_cpu_map_entry is single consumer, with this
-		 * kthread CPU pinned. Lockless access to ptr_ring
+		 * NAPI thread CPU pinned. Lockless access to ptr_ring
 		 * consume side valid as no-resize allowed of queue.
 		 */
-		n = __ptr_ring_consume_batched(rcpu->queue, frames,
-					       CPUMAP_BATCH);
+		n = min(budget - done, CPUMAP_BATCH);
+		n = __ptr_ring_consume_batched(rcpu->queue, frames, n);
+		done += n;
+
 		for (i = 0, xdp_n = 0; i < n; i++) {
 			void *f = frames[i];
 			struct page *page;
 
 			if (unlikely(__ptr_test_bit(0, &f))) {
-				struct sk_buff *skb = f;
+				skb = f;
 
 				__ptr_clear_bit(0, &skb);
 				list_add_tail(&skb->list, &list);
@@ -340,12 +338,10 @@ static int cpu_map_kthread_run(void *data)
 			}
 		}
 
-		local_bh_disable();
 		for (i = 0; i < nframes; i++) {
 			struct xdp_frame *xdpf = frames[i];
-			struct sk_buff *skb = skbs[i];
 
-			skb = __xdp_build_skb_from_frame(xdpf, skb,
+			skb = __xdp_build_skb_from_frame(xdpf, skbs[i],
							 xdpf->dev_rx);
 			if (!skb) {
 				xdp_return_frame(xdpf);
@@ -354,17 +350,23 @@ static int cpu_map_kthread_run(void *data)
 			list_add_tail(&skb->list, &list);
 		}
 
-		netif_receive_skb_list(&list);
-
-		/* Feedback loop via tracepoint */
-		trace_xdp_cpumap_kthread(rcpu->map_id, n, kmem_alloc_drops,
-					 sched, &stats);
-		local_bh_enable(); /* resched point, may call do_softirq() */
+		list_for_each_entry_safe(skb, tmp, &list, list) {
+			skb_list_del_init(skb);
+			napi_gro_receive(napi, skb);
+		}
 	}
 
-	__set_current_state(TASK_RUNNING);
-	return 0;
+	rcu_read_unlock();
+
+	/* Feedback loop via tracepoint */
+	trace_xdp_cpumap_kthread(rcpu->map_id, done, kmem_alloc_drops, 0,
+				 &stats);
+
+	if (done < budget)
+		napi_complete_done(napi, done);
+
+	return done;
 }
 
 static int __cpu_map_load_bpf_program(struct bpf_cpu_map_entry *rcpu,
@@ -394,6 +396,7 @@ __cpu_map_entry_alloc(struct bpf_map *map, struct bpf_cpumap_val *value,
 {
 	int numa, err, i, fd = value->bpf_prog.fd;
 	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
+	const struct bpf_cpu_map *cmap;
 	struct bpf_cpu_map_entry *rcpu;
 	struct xdp_bulk_queue *bq;
 
@@ -432,29 +435,13 @@ __cpu_map_entry_alloc(struct bpf_map *map, struct bpf_cpumap_val *value,
 	if (fd > 0 && __cpu_map_load_bpf_program(rcpu, map, fd))
 		goto free_ptr_ring;
 
-	/* Setup kthread */
-	init_completion(&rcpu->kthread_running);
-	rcpu->kthread = kthread_create_on_node(cpu_map_kthread_run, rcpu, numa,
-					       "cpumap/%d/map:%d", cpu,
-					       map->id);
-	if (IS_ERR(rcpu->kthread))
-		goto free_prog;
-
-	/* Make sure kthread runs on a single CPU */
-	kthread_bind(rcpu->kthread, cpu);
-	wake_up_process(rcpu->kthread);
-
-	/* Make sure kthread has been running, so kthread_stop() will not
-	 * stop the kthread prematurely and all pending frames or skbs
-	 * will be handled by the kthread before kthread_stop() returns.
-	 */
-	wait_for_completion(&rcpu->kthread_running);
+	cmap = container_of(map, typeof(*cmap), map);
+	netif_napi_add_percpu(cmap->napi_dev, &rcpu->napi, cpu_map_napi_poll,
+			      cpu);
+	napi_enable(&rcpu->napi);
 
 	return rcpu;
 
-free_prog:
-	if (rcpu->prog)
-		bpf_prog_put(rcpu->prog);
 free_ptr_ring:
 	ptr_ring_cleanup(rcpu->queue, NULL);
 free_queue:
@@ -477,11 +464,12 @@ static void __cpu_map_entry_free(struct work_struct *work)
 	 */
 	rcpu = container_of(to_rcu_work(work), struct bpf_cpu_map_entry,
			    free_work);
 
-	/* kthread_stop will wake_up_process and wait for it to complete.
-	 * cpu_map_kthread_run() makes sure the pointer ring is empty
+	/* napi_disable() will wait for the NAPI poll to complete.
+	 * cpu_map_napi_poll() makes sure the pointer ring is empty
 	 * before exiting.
 	 */
-	kthread_stop(rcpu->kthread);
+	napi_disable(&rcpu->napi);
+	netif_napi_del(&rcpu->napi);
 
 	if (rcpu->prog)
 		bpf_prog_put(rcpu->prog);
@@ -498,8 +486,8 @@
  * __cpu_map_entry_free() in a separate workqueue after waiting for an RCU grace
  * period. This means that (a) all pending enqueue and flush operations have
  * completed (because of the RCU callback), and (b) we are in a workqueue
- * context where we can stop the kthread and wait for it to exit before freeing
- * everything.
+ * context where we can stop the NAPI thread and wait for it to exit before
+ * freeing everything.
  */
 static void __cpu_map_entry_replace(struct bpf_cpu_map *cmap,
				    u32 key_cpu, struct bpf_cpu_map_entry *rcpu)
@@ -579,9 +567,7 @@ static void cpu_map_free(struct bpf_map *map)
 	 */
 	synchronize_rcu();
 
-	/* The only possible user of bpf_cpu_map_entry is
-	 * cpu_map_kthread_run().
-	 */
+	/* The only possible user of bpf_cpu_map_entry is cpu_map_napi_poll() */
 	for (i = 0; i < cmap->map.max_entries; i++) {
 		struct bpf_cpu_map_entry *rcpu;
 
@@ -589,9 +575,10 @@ static void cpu_map_free(struct bpf_map *map)
 		if (!rcpu)
 			continue;
 
-		/* Stop kthread and cleanup entry directly */
+		/* Stop NAPI thread and cleanup entry directly */
 		__cpu_map_entry_free(&rcpu->free_work.work);
 	}
+	bpf_map_area_free(cmap->napi_dev);
 	bpf_map_area_free(cmap->cpu_map);
 	bpf_map_area_free(cmap);
 }
@@ -753,7 +740,7 @@ int cpu_map_generic_redirect(struct bpf_cpu_map_entry *rcpu,
 	if (ret < 0)
 		goto trace;
 
-	wake_up_process(rcpu->kthread);
+	napi_schedule(&rcpu->napi);
 trace:
 	trace_xdp_cpumap_enqueue(rcpu->map_id, !ret, !!ret, rcpu->cpu);
 	return ret;
@@ -765,8 +752,6 @@ void __cpu_map_flush(struct list_head *flush_list)
 
 	list_for_each_entry_safe(bq, tmp, flush_list, flush_node) {
 		bq_flush_to_queue(bq);
-
-		/* If already running, costs spin_lock_irqsave + smb_mb */
-		wake_up_process(bq->obj->kthread);
+		napi_schedule(&bq->obj->napi);
 	}
 }
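Distilled to its core, the consumer side the patch switches to looks
as follows; this is only a sketch, with get_ring() and build_and_gro()
as placeholders for the ptr_ring lookup and the skb-building logic
from the hunks above:

#include <linux/ptr_ring.h>
#include <linux/netdevice.h>

/* budget-driven NAPI poll replacing the old open-coded kthread loop */
static int pinned_napi_poll(struct napi_struct *napi, int budget)
{
	struct ptr_ring *ring = get_ring(napi);	/* placeholder lookup */
	int done = 0;
	void *frame;

	while (done < budget && (frame = __ptr_ring_consume(ring))) {
		build_and_gro(napi, frame);	/* placeholder: skb + GRO */
		done++;
	}

	/*
	 * Ran out of work before the budget: re-arm the NAPI. Producers
	 * call napi_schedule() after enqueueing to the ring, as
	 * cpu_map_generic_redirect() and __cpu_map_flush() do above.
	 */
	if (done < budget)
		napi_complete_done(napi, done);

	return done;
}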
From patchwork Fri Aug 30 16:25:04 2024
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Lorenzo Bianconi, Daniel Xu, John Fastabend,
    Jesper Dangaard Brouer, Martin KaFai Lau, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, bpf@vger.kernel.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next 5/9] bpf: cpumap: reuse skb array instead of a
 linked list to chain skbs
Date: Fri, 30 Aug 2024 18:25:04 +0200
Message-ID: <20240830162508.1009458-6-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>
References: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

cpumap still uses a linked list to store the skbs it passes to the
stack. Now that listified Rx has been dropped in favor of
napi_gro_receive(), the linked list is just unneeded overhead. Inside
the polling loop, we already have an array of skbs. Let's reuse it for
the skbs passed to cpumap (generic XDP) and call napi_gro_receive()
directly in case of XDP_PASS when a program is installed to the map
itself. Don't list regular xdp_frames at all; just call
napi_gro_receive() right after building an skb.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 kernel/bpf/cpumap.c | 55 +++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 29 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index d1cfa4111727..d7206f3f6e80 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -150,21 +150,23 @@ static void __cpu_map_ring_cleanup(struct ptr_ring *ring)
 }
 
 static void cpu_map_bpf_prog_run_skb(struct bpf_cpu_map_entry *rcpu,
-				     struct list_head *listp,
+				     void **skbs, u32 skb_n,
				     struct xdp_cpumap_stats *stats)
 {
-	struct sk_buff *skb, *tmp;
 	struct xdp_buff xdp;
 	u32 act;
 	int err;
 
-	list_for_each_entry_safe(skb, tmp, listp, list) {
+	for (u32 i = 0; i < skb_n; i++) {
+		struct sk_buff *skb = skbs[i];
+
 		act = bpf_prog_run_generic_xdp(skb, &xdp, rcpu->prog);
 		switch (act) {
 		case XDP_PASS:
+			napi_gro_receive(&rcpu->napi, skb);
+			stats->pass++;
 			break;
 		case XDP_REDIRECT:
-			skb_list_del_init(skb);
 			err = xdp_do_generic_redirect(skb->dev, skb, &xdp,
						      rcpu->prog);
 			if (unlikely(err)) {
@@ -181,8 +183,7 @@ static void cpu_map_bpf_prog_run_skb(struct bpf_cpu_map_entry *rcpu,
 			trace_xdp_exception(skb->dev, rcpu->prog, act);
 			fallthrough;
 		case XDP_DROP:
-			skb_list_del_init(skb);
-			kfree_skb(skb);
+			napi_consume_skb(skb, true);
 			stats->drop++;
 			return;
 		}
@@ -251,8 +252,8 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
 #define CPUMAP_BATCH 8
 
 static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
-				int xdp_n, struct xdp_cpumap_stats *stats,
-				struct list_head *list)
+				int xdp_n, void **skbs, u32 skb_n,
+				struct xdp_cpumap_stats *stats)
 {
 	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	int nframes;
@@ -267,8 +268,8 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
 	if (stats->redirect)
 		xdp_do_flush();
 
-	if (unlikely(!list_empty(list)))
-		cpu_map_bpf_prog_run_skb(rcpu, list, stats);
+	if (unlikely(skb_n))
+		cpu_map_bpf_prog_run_skb(rcpu, skbs, skb_n, stats);
 
 	bpf_net_ctx_clear(bpf_net_ctx);
 
@@ -288,9 +289,7 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 		gfp_t gfp = __GFP_ZERO | GFP_ATOMIC;
 		int i, n, m, nframes, xdp_n;
 		void *frames[CPUMAP_BATCH];
-		struct sk_buff *skb, *tmp;
 		void *skbs[CPUMAP_BATCH];
-		LIST_HEAD(list);
 
 		if (__ptr_ring_empty(rcpu->queue))
 			break;
@@ -304,15 +303,15 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 		n = __ptr_ring_consume_batched(rcpu->queue, frames, n);
 		done += n;
 
-		for (i = 0, xdp_n = 0; i < n; i++) {
+		for (i = 0, xdp_n = 0, m = 0; i < n; i++) {
 			void *f = frames[i];
 			struct page *page;
 
 			if (unlikely(__ptr_test_bit(0, &f))) {
-				skb = f;
+				struct sk_buff *skb = f;
 
 				__ptr_clear_bit(0, &skb);
-				list_add_tail(&skb->list, &list);
+				skbs[m++] = skb;
 				continue;
 			}
 
@@ -327,19 +326,22 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 		}
 
 		/* Support running another XDP prog on this CPU */
-		nframes = cpu_map_bpf_prog_run(rcpu, frames, xdp_n, &stats, &list);
-		if (nframes) {
-			m = kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
-						  gfp, nframes, skbs);
-			if (unlikely(m == 0)) {
-				for (i = 0; i < nframes; i++)
-					skbs[i] = NULL; /* effect: xdp_return_frame */
-				kmem_alloc_drops += nframes;
-			}
+		nframes = cpu_map_bpf_prog_run(rcpu, frames, xdp_n, skbs, m,
+					       &stats);
+		if (!nframes)
+			continue;
+
+		m = kmem_cache_alloc_bulk(net_hotdata.skbuff_cache, gfp,
+					  nframes, skbs);
+		if (unlikely(!m)) {
+			for (i = 0; i < nframes; i++)
+				skbs[i] = NULL; /* effect: xdp_return_frame */
+			kmem_alloc_drops += nframes;
 		}
 
 		for (i = 0; i < nframes; i++) {
 			struct xdp_frame *xdpf = frames[i];
+			struct sk_buff *skb;
 
 			skb = __xdp_build_skb_from_frame(xdpf, skbs[i],
							 xdpf->dev_rx);
@@ -348,11 +350,6 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 			if (!skb) {
 				xdp_return_frame(xdpf);
 				continue;
 			}
 
-			list_add_tail(&skb->list, &list);
-		}
-
-		list_for_each_entry_safe(skb, tmp, &list, list) {
-			skb_list_del_init(skb);
 			napi_gro_receive(napi, skb);
 		}
 	}
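The shape of the change, side by side (a fragment distilled from the
hunks above, not additional code):

	/* before: skbs chained through a list_head, unlinked one by one */
	list_for_each_entry_safe(skb, tmp, &list, list) {
		skb_list_del_init(skb);
		napi_gro_receive(napi, skb);
	}

	/* after: the on-stack array is indexed directly, no (un)linking */
	for (u32 i = 0; i < skb_n; i++)
		napi_gro_receive(napi, skbs[i]);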
From patchwork Fri Aug 30 16:25:05 2024
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Lorenzo Bianconi, Daniel Xu, John Fastabend,
    Jesper Dangaard Brouer, Martin KaFai Lau, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, bpf@vger.kernel.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next 6/9] net: skbuff: introduce napi_skb_cache_get_bulk()
Date: Fri, 30 Aug 2024 18:25:05 +0200
Message-ID: <20240830162508.1009458-7-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>
References: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

Add a function to get an array of skbs from the NAPI percpu cache.
It's supposed to be a drop-in replacement for
kmem_cache_alloc_bulk(skbuff_head_cache, GFP_ATOMIC) and
xdp_alloc_skb_bulk(GFP_ATOMIC). The difference (apart from the
requirement to call it only from the BH) is that it tries to use as
many NAPI cache entries for skbs as possible, and allocate new ones
only if needed.

The logic is as follows:

* there are enough skbs in the cache: decache them and return to the
  caller;
* not enough: try refilling the cache first. If there are now enough
  skbs, return;
* still not enough: try allocating skbs directly to the output array
  with %GFP_ZERO, maybe we'll be able to get some. If there are now
  enough, return;
* still not enough: return as many as we were able to obtain.

Most of the time, if called from the NAPI polling loop, the first case
happens, sometimes (rarely) the second one. The third and the fourth
happen only under heavy memory pressure. It can save significant
amounts of CPU cycles if there are GRO cycles and/or Tx completion
cycles (anything that descends to napi_skb_cache_put()) happening on
this CPU.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/skbuff.h |  1 +
 net/core/skbuff.c      | 62 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index cf8f6ce06742..2bc3ca79bc6e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1304,6 +1304,7 @@ struct sk_buff *build_skb_around(struct sk_buff *skb,
				 void *data, unsigned int frag_size);
 void skb_attempt_defer_free(struct sk_buff *skb);
 
+u32 napi_skb_cache_get_bulk(void **skbs, u32 n);
 struct sk_buff *napi_build_skb(void *data, unsigned int frag_size);
 struct sk_buff *slab_build_skb(void *data);

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a52638363ea5..0a34f3aa00d1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -366,6 +366,68 @@ static struct sk_buff *napi_skb_cache_get(void)
 	return skb;
 }
 
+/**
+ * napi_skb_cache_get_bulk - obtain a number of zeroed skb heads from the cache
+ * @skbs: pointer to an at least @n-sized array to fill with skb pointers
+ * @n: number of entries to provide
+ *
+ * Tries to obtain @n &sk_buff entries from the NAPI percpu cache and writes
+ * the pointers into the provided array @skbs. If there are less entries
+ * available, tries to replenish the cache and bulk-allocates the diff from
+ * the MM layer if needed.
+ * The heads are being zeroed with either memset() or %__GFP_ZERO, so they are
+ * ready for {,__}build_skb_around() and don't have any data buffers attached.
+ * Must be called *only* from the BH context.
+ *
+ * Return: number of successfully allocated skbs (@n if no actual allocation
+ *	   needed or kmem_cache_alloc_bulk() didn't fail).
+ */
+u32 napi_skb_cache_get_bulk(void **skbs, u32 n)
+{
+	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	u32 bulk, total = n;
+
+	local_lock_nested_bh(&napi_alloc_cache.bh_lock);
+
+	if (nc->skb_count >= n)
+		goto get;
+
+	/* Not enough cached skbs. Try refilling the cache first */
+	bulk = min(NAPI_SKB_CACHE_SIZE - nc->skb_count, NAPI_SKB_CACHE_BULK);
+	nc->skb_count += kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
+					       GFP_ATOMIC | __GFP_NOWARN, bulk,
+					       &nc->skb_cache[nc->skb_count]);
+	if (likely(nc->skb_count >= n))
+		goto get;
+
+	/* Still not enough. Bulk-allocate the missing part directly, zeroed */
+	n -= kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
+				   GFP_ATOMIC | __GFP_ZERO | __GFP_NOWARN,
+				   n - nc->skb_count, &skbs[nc->skb_count]);
+	if (likely(nc->skb_count >= n))
+		goto get;
+
+	/* kmem_cache didn't allocate the number we need, limit the output */
+	total -= n - nc->skb_count;
+	n = nc->skb_count;
+
+get:
+	for (u32 base = nc->skb_count - n, i = 0; i < n; i++) {
+		u32 cache_size = kmem_cache_size(net_hotdata.skbuff_cache);
+
+		skbs[i] = nc->skb_cache[base + i];
+
+		kasan_mempool_unpoison_object(skbs[i], cache_size);
+		memset(skbs[i], 0, offsetof(struct sk_buff, tail));
+	}
+
+	nc->skb_count -= n;
+	local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
+
+	return total;
+}
+EXPORT_SYMBOL_GPL(napi_skb_cache_get_bulk);
+
 static inline void __finalize_skb_around(struct sk_buff *skb, void *data,
					 unsigned int size)
 {
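A usage sketch of the new helper (hypothetical caller mirroring how
cpumap consumes it in the next patch; build_batch and the batch size
of 8 are assumptions of this example):

#include <linux/skbuff.h>
#include <net/xdp.h>

/* build skbs for a batch of xdp_frames from (mostly) cached heads */
static u32 build_batch(struct napi_struct *napi, struct xdp_frame **frames,
		       u32 n)
{
	void *skbs[8];	/* assumes n <= 8, a hypothetical batch size */
	u32 i, got;

	/* BH context only; may return fewer than @n under memory pressure */
	got = napi_skb_cache_get_bulk(skbs, n);

	for (i = 0; i < got; i++) {
		/* each head is zeroed and ready to be built around */
		struct sk_buff *skb;

		skb = __xdp_build_skb_from_frame(frames[i], skbs[i],
						 frames[i]->dev_rx);
		if (skb)
			napi_gro_receive(napi, skb);
		/* on failure, return the frame via xdp_return_frame() */
	}

	return got;
}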
From patchwork Fri Aug 30 16:25:06 2024
X-Patchwork-Id: 13785331
From: Alexander Lobakin
Subject: [PATCH bpf-next 7/9] bpf: cpumap: switch to napi_skb_cache_get_bulk()
Date: Fri, 30 Aug 2024 18:25:06 +0200
Message-ID: <20240830162508.1009458-8-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

Now that cpumap uses GRO, which drops unused skb heads to the NAPI cache,
use napi_skb_cache_get_bulk() to try to reuse cached entries and lower
the pressure on the MM layer. The polling loop already runs in BH
context, so the switch is safe from that perspective.

The better GRO aggregates packets, the fewer new skbs need to be
allocated. If an aggregated skb contains 16 frags, 15 skb heads were
returned to the cache, so the next 15 skbs will be built without
allocating anything.
The same trafficgen UDP GRO test now shows:

                GRO off   GRO on
threaded GRO      2.3       4      Mpps
thr bulk GRO      2.4       4.7    Mpps
diff              +4       +17     %

Comparing to the baseline cpumap:

baseline          2.7       N/A    Mpps
thr bulk GRO      2.4       4.7    Mpps
diff             -11       +74     %

Signed-off-by: Alexander Lobakin
---
 kernel/bpf/cpumap.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index d7206f3f6e80..992f4e30a589 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -286,7 +286,6 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 	rcpu = container_of(napi, typeof(*rcpu), napi);
 
 	while (likely(done < budget)) {
-		gfp_t gfp = __GFP_ZERO | GFP_ATOMIC;
 		int i, n, m, nframes, xdp_n;
 		void *frames[CPUMAP_BATCH];
 		void *skbs[CPUMAP_BATCH];
@@ -331,8 +330,7 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 		if (!nframes)
 			continue;
 
-		m = kmem_cache_alloc_bulk(net_hotdata.skbuff_cache, gfp,
-					  nframes, skbs);
+		m = napi_skb_cache_get_bulk(skbs, nframes);
 		if (unlikely(!m)) {
 			for (i = 0; i < nframes; i++)
 				skbs[i] = NULL; /* effect: xdp_return_frame */
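A semantic detail in this hunk is worth spelling out: kmem_cache_alloc_bulk()
is all-or-nothing (it returns the full count or 0), whereas
napi_skb_cache_get_bulk() may return any count between 0 and nframes, with
only that many leading pointers valid. A purely illustrative sketch of
explicit partial-return handling, not part of the patch:

	/* Illustrative only: the first 'm' entries of skbs[] are valid;
	 * the rest were never allocated, so mark them so the caller
	 * falls back to xdp_return_frame() for those frames.
	 */
	m = napi_skb_cache_get_bulk(skbs, nframes);
	for (i = m; i < nframes; i++)
		skbs[i] = NULL;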
From patchwork Fri Aug 30 16:25:07 2024
X-Patchwork-Id: 13785332
From: Alexander Lobakin
Subject: [PATCH bpf-next 8/9] veth: use napi_skb_cache_get_bulk() instead of xdp_alloc_skb_bulk()
Date: Fri, 30 Aug 2024 18:25:07 +0200
Message-ID: <20240830162508.1009458-9-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

Now that skbs can be bulk-allocated from the NAPI cache, use
napi_skb_cache_get_bulk() in veth as well instead of allocating directly
from the kmem cache. veth already uses NAPI for Rx processing, so the
switch is safe from the context perspective, and since veth also uses
GRO, reusing the NAPI cache makes a real difference here.
Signed-off-by: Alexander Lobakin
---
 drivers/net/veth.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 18148e068aa0..774f226666c8 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -684,8 +684,7 @@ static void veth_xdp_rcv_bulk_skb(struct veth_rq *rq, void **frames,
 	void *skbs[VETH_XDP_BATCH];
 	int i;
 
-	if (xdp_alloc_skb_bulk(skbs, n_xdpf,
-			       GFP_ATOMIC | __GFP_ZERO) < 0) {
+	if (unlikely(!napi_skb_cache_get_bulk(skbs, n_xdpf))) {
 		for (i = 0; i < n_xdpf; i++)
 			xdp_return_frame(frames[i]);
 		stats->rx_drops += n_xdpf;
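Note that the conversion also flips the return convention:
xdp_alloc_skb_bulk() returned 0 on success and -ENOMEM on failure, while
napi_skb_cache_get_bulk() returns the number of skbs obtained, so the
error path now triggers on a zero count. Zeroing, previously requested
via __GFP_ZERO, happens inside the NAPI cache helper itself. Side by
side, with the 'drop' label standing in for the real cleanup code:

	/* Before: 0 / -ENOMEM, caller passes gfp flags incl. __GFP_ZERO */
	if (xdp_alloc_skb_bulk(skbs, n_xdpf, GFP_ATOMIC | __GFP_ZERO) < 0)
		goto drop;

	/* After: count obtained, 0 means total failure; no gfp argument,
	 * the helper zeroes each skb head itself
	 */
	if (unlikely(!napi_skb_cache_get_bulk(skbs, n_xdpf)))
		goto drop;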
From patchwork Fri Aug 30 16:25:08 2024
X-Patchwork-Id: 13785333
From: Alexander Lobakin
Subject: [PATCH bpf-next 9/9] xdp: remove xdp_alloc_skb_bulk()
Date: Fri, 30 Aug 2024 18:25:08 +0200
Message-ID: <20240830162508.1009458-10-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

The only user was veth, which now uses napi_skb_cache_get_bulk(). That
function is exported and is now the preferred way of bulk-allocating
skbs, so remove this helper.

Signed-off-by: Alexander Lobakin
---
 include/net/xdp.h |  1 -
 net/core/xdp.c    | 10 ----------
 2 files changed, 11 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index e6770dd40c91..bd3363e384b2 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -245,7 +245,6 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct net_device *dev);
 struct sk_buff *xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					 struct net_device *dev);
-int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp);
 struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf);
 
 static inline
diff --git a/net/core/xdp.c b/net/core/xdp.c
index bcc5551c6424..34d057089d20 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -584,16 +584,6 @@ void xdp_warn(const char *msg, const char *func, const int line)
 };
 EXPORT_SYMBOL_GPL(xdp_warn);
 
-int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp)
-{
-	n_skb = kmem_cache_alloc_bulk(net_hotdata.skbuff_cache, gfp, n_skb, skbs);
-	if (unlikely(!n_skb))
-		return -ENOMEM;
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(xdp_alloc_skb_bulk);
-
 struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct sk_buff *skb,
 					   struct net_device *dev)