From patchwork Fri Aug 30 16:25:00 2024
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Lorenzo Bianconi, Daniel Xu, John Fastabend,
    Jesper Dangaard Brouer, Martin KaFai Lau, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, bpf@vger.kernel.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH bpf-next 1/9] firmware/psci: fix missing '%u' format literal in kthread_create_on_cpu() Date: Fri, 30 Aug 2024 18:25:00 +0200 Message-ID: <20240830162508.1009458-2-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com> References: <20240830162508.1009458-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: bpf@iogearbox.net kthread_create_on_cpu() always requires format string to contain one '%u' at the end, as it automatically adds the CPU ID when passing it to kthread_create_on_node(). The former doesn't marked as __printf() as it's not printf-like itself, which effectively hides this from the compiler. If you convert this function to printf-like, you'll see the following: In file included from drivers/firmware/psci/psci_checker.c:15: drivers/firmware/psci/psci_checker.c: In function 'suspend_tests': drivers/firmware/psci/psci_checker.c:401:48: warning: too many arguments for format [-Wformat-extra-args] 401 | "psci_suspend_test"); | ^~~~~~~~~~~~~~~~~~~ drivers/firmware/psci/psci_checker.c:400:32: warning: data argument not used by format string [-Wformat-extra-args] 400 | (void *)(long)cpu, cpu, | ^ 401 | "psci_suspend_test"); | ~~~~~~~~~~~~~~~~~~~ Add the missing format literal to fix this. Now the corresponding kthread will be named as "psci_suspend_test-", as it's meant by kthread_create_on_cpu(). Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202408141012.KhvKaxoh-lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202408141243.eQiEOQQe-lkp@intel.com Fixes: ea8b1c4a6019 ("drivers: psci: PSCI checker module") Cc: stable@vger.kernel.org # 4.10+ Signed-off-by: Alexander Lobakin Acked-by: Daniel Xu --- drivers/firmware/psci/psci_checker.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/firmware/psci/psci_checker.c b/drivers/firmware/psci/psci_checker.c index 116eb465cdb4..ecc511c745ce 100644 --- a/drivers/firmware/psci/psci_checker.c +++ b/drivers/firmware/psci/psci_checker.c @@ -398,7 +398,7 @@ static int suspend_tests(void) thread = kthread_create_on_cpu(suspend_test_thread, (void *)(long)cpu, cpu, - "psci_suspend_test"); + "psci_suspend_test-%u"); if (IS_ERR(thread)) pr_err("Failed to create kthread on CPU %d\n", cpu); else From patchwork Fri Aug 30 16:25:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Lobakin X-Patchwork-Id: 13785326 X-Patchwork-Delegate: bpf@iogearbox.net Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4EAC61BA265; Fri, 30 Aug 2024 16:25:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.7 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725035147; cv=none; b=AwUF0mm1gmoi4MYr3ML6oG/ot+mSs/j8ZYSs2qP8SxvoXzEuSKAoPRDHc1UlQ35wCel/UpSB/0hQeaWeNXGsQBCDVy6VJ/+D0LKIsoCZBRhlDihGYf3FoNlkvtsZBNgIjImTT/20UryRQWGEogr+dEyydFoQzDxioHrmjLq5Ygg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725035147; c=relaxed/simple; 
From patchwork Fri Aug 30 16:25:01 2024
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Lorenzo Bianconi, Daniel Xu, John Fastabend,
    Jesper Dangaard Brouer, Martin KaFai Lau, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, bpf@vger.kernel.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next 2/9] kthread: allow vararg kthread_{create,run}_on_cpu()
Date: Fri, 30 Aug 2024 18:25:01 +0200
Message-ID: <20240830162508.1009458-3-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>
References: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

Currently, kthread_{create,run}_on_cpu() don't support varargs like
kthread_create{,_on_node}() do, which makes them less convenient to
use. Convert them to take varargs as the last argument. The only
difference is that they always append the CPU ID at the end and thus
require the format string to have an excess '%u' at the end. That's
still true; the compiler, however, will now correctly point it out
when it's missing.
One more nice side effect is that you can now use the underscored
__kthread_create_on_cpu() if you want to override that rule and not
have the CPU ID at the end of the name. The current callers are not
affected in any way.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/kthread.h | 51 ++++++++++++++++++++++++++---------------
 kernel/kthread.c        | 22 ++++++++++--------
 2 files changed, 45 insertions(+), 28 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index b11f53c1ba2e..27a94e691948 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -27,11 +27,21 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 #define kthread_create(threadfn, data, namefmt, arg...) \
 	kthread_create_on_node(threadfn, data, NUMA_NO_NODE, namefmt, ##arg)
-
-struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
-					  void *data,
-					  unsigned int cpu,
-					  const char *namefmt);
+__printf(4, 5)
+struct task_struct *__kthread_create_on_cpu(int (*threadfn)(void *data),
+					    void *data, unsigned int cpu,
+					    const char *namefmt, ...);
+
+#define kthread_create_on_cpu(threadfn, data, cpu, namefmt, ...)	   \
+	_kthread_create_on_cpu(threadfn, data, cpu, __UNIQUE_ID(cpu_),	   \
+			       namefmt, ##__VA_ARGS__)
+
+#define _kthread_create_on_cpu(threadfn, data, cpu, uc, namefmt, ...) ({  \
+	u32 uc = (cpu);							   \
+									   \
+	__kthread_create_on_cpu(threadfn, data, uc, namefmt,		   \
+				##__VA_ARGS__, uc);			   \
+})
 
 void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk);
 bool set_kthread_struct(struct task_struct *p);
@@ -62,25 +72,28 @@ bool kthread_is_per_cpu(struct task_struct *k);
  * @threadfn: the function to run until signal_pending(current).
  * @data: data ptr for @threadfn.
  * @cpu: The cpu on which the thread should be bound,
- * @namefmt: printf-style name for the thread. Format is restricted
- *	     to "name.*%u". Code fills in cpu number.
+ * @namefmt: printf-style name for the thread. Must have an excess '%u'
+ *	     at the end as kthread_create_on_cpu() fills in CPU number.
  *
  * Description: Convenient wrapper for kthread_create_on_cpu()
  * followed by wake_up_process(). Returns the kthread or
  * ERR_PTR(-ENOMEM).
  */
-static inline struct task_struct *
-kthread_run_on_cpu(int (*threadfn)(void *data), void *data,
-		   unsigned int cpu, const char *namefmt)
-{
-	struct task_struct *p;
-
-	p = kthread_create_on_cpu(threadfn, data, cpu, namefmt);
-	if (!IS_ERR(p))
-		wake_up_process(p);
-
-	return p;
-}
+#define kthread_run_on_cpu(threadfn, data, cpu, namefmt, ...)		   \
+	_kthread_run_on_cpu(threadfn, data, cpu, __UNIQUE_ID(task_),	   \
+			    namefmt, ##__VA_ARGS__)
+
+#define _kthread_run_on_cpu(threadfn, data, cpu, ut, namefmt, ...)	   \
+({									   \
+	struct task_struct *ut;						   \
+									   \
+	ut = kthread_create_on_cpu(threadfn, data, cpu, namefmt,	   \
+				   ##__VA_ARGS__);			   \
+	if (!IS_ERR(ut))						   \
+		wake_up_process(ut);					   \
+									   \
+	ut;								   \
+})
 
 void free_kthread_struct(struct task_struct *k);
 void kthread_bind(struct task_struct *k, unsigned int cpu);

diff --git a/kernel/kthread.c b/kernel/kthread.c
index f7be976ff88a..e9da0115fb2b 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -559,23 +559,27 @@ void kthread_bind(struct task_struct *p, unsigned int cpu)
 EXPORT_SYMBOL(kthread_bind);
 
 /**
- * kthread_create_on_cpu - Create a cpu bound kthread
+ * __kthread_create_on_cpu - Create a cpu bound kthread
  * @threadfn: the function to run until signal_pending(current).
  * @data: data ptr for @threadfn.
  * @cpu: The cpu on which the thread should be bound,
- * @namefmt: printf-style name for the thread. Format is restricted
- *	     to "name.*%u". Code fills in cpu number.
+ * @namefmt: printf-style name for the thread. Must have an excess '%u'
+ *	     at the end as kthread_create_on_cpu() fills in CPU number.
  *
  * Description: This helper function creates and names a kernel thread
  */
-struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
-					  void *data, unsigned int cpu,
-					  const char *namefmt)
+struct task_struct *__kthread_create_on_cpu(int (*threadfn)(void *data),
+					    void *data, unsigned int cpu,
+					    const char *namefmt, ...)
 {
 	struct task_struct *p;
+	va_list args;
+
+	va_start(args, namefmt);
+	p = __kthread_create_on_node(threadfn, data, cpu_to_node(cpu), namefmt,
+				     args);
+	va_end(args);
 
-	p = kthread_create_on_node(threadfn, data, cpu_to_node(cpu), namefmt,
-				   cpu);
 	if (IS_ERR(p))
 		return p;
 	kthread_bind(p, cpu);
@@ -583,7 +587,7 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
 	to_kthread(p)->cpu = cpu;
 	return p;
 }
-EXPORT_SYMBOL(kthread_create_on_cpu);
+EXPORT_SYMBOL(__kthread_create_on_cpu);
 
 void kthread_set_per_cpu(struct task_struct *k, int cpu)
 {
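A usage sketch of the converted API (hypothetical caller, reusing
my_test_thread from the sketch after patch 1; spawn_named_worker and
the "worker/%s-%u" name are made up): only the extra argument is passed
explicitly, while the macro itself appends the CPU ID for the trailing
'%u'.

#include <linux/err.h>
#include <linux/kthread.h>

static int spawn_named_worker(unsigned int cpu, const char *name)
{
	struct task_struct *t;

	/*
	 * Only @name is consumed by '%s'; kthread_run_on_cpu() appends
	 * @cpu for the final '%u', giving e.g. "worker/eth0-2" on CPU 2.
	 */
	t = kthread_run_on_cpu(my_test_thread, NULL, cpu, "worker/%s-%u",
			       name);

	return PTR_ERR_OR_ZERO(t);
}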
From patchwork Fri Aug 30 16:25:02 2024
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Lorenzo Bianconi, Daniel Xu, John Fastabend,
    Jesper Dangaard Brouer, Martin KaFai Lau, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, bpf@vger.kernel.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next 3/9] net: napi: add ability to create CPU-pinned
 threaded NAPI
Date: Fri, 30 Aug 2024 18:25:02 +0200
Message-ID: <20240830162508.1009458-4-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>
References: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

From: Lorenzo Bianconi

Add netif_napi_add_percpu() to pin a NAPI in threaded mode to a
particular CPU. This means that if the NAPI is not threaded, it runs
as usual, but after switching to threaded mode, it will always run on
the specified CPU.
It's not meant to be used in drivers, but might be useful when
creating percpu threaded NAPIs, for example, to replace percpu
kthreads or workers where a NAPI context is needed. The already
existing netif_napi_add*() are not affected in any way.
Signed-off-by: Lorenzo Bianconi
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Daniel Xu
---
 include/linux/netdevice.h | 35 +++++++++++++++++++++++++++++++++--
 net/core/dev.c            | 18 +++++++++++++-----
 2 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ca5f0dda733b..4d6fb0ccdea1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -377,6 +377,7 @@ struct napi_struct {
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	int			irq;
+	int			thread_cpuid;
 };
 
 enum {
@@ -2619,8 +2620,18 @@ static inline void netif_napi_set_irq(struct napi_struct *napi, int irq)
  */
 #define NAPI_POLL_WEIGHT 64
 
-void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
-			   int (*poll)(struct napi_struct *, int), int weight);
+void netif_napi_add_weight_percpu(struct net_device *dev,
+				  struct napi_struct *napi,
+				  int (*poll)(struct napi_struct *, int),
+				  int weight, int thread_cpuid);
+
+static inline void netif_napi_add_weight(struct net_device *dev,
+					 struct napi_struct *napi,
+					 int (*poll)(struct napi_struct *, int),
+					 int weight)
+{
+	netif_napi_add_weight_percpu(dev, napi, poll, weight, -1);
+}
 
 /**
  * netif_napi_add() - initialize a NAPI context
@@ -2665,6 +2676,26 @@ static inline void netif_napi_add_tx(struct net_device *dev,
 	netif_napi_add_tx_weight(dev, napi, poll, NAPI_POLL_WEIGHT);
 }
 
+/**
+ * netif_napi_add_percpu() - initialize a CPU-pinned threaded NAPI context
+ * @dev: network device
+ * @napi: NAPI context
+ * @poll: polling function
+ * @thread_cpuid: CPU which this NAPI will be pinned to
+ *
+ * Variant of netif_napi_add() which pins the NAPI to the specified CPU. No
+ * changes in the "standard" mode, but in case with the threaded one, this
+ * NAPI will always be run on the passed CPU no matter where scheduled.
+ */
+static inline void netif_napi_add_percpu(struct net_device *dev,
+					 struct napi_struct *napi,
+					 int (*poll)(struct napi_struct *, int),
+					 int thread_cpuid)
+{
+	netif_napi_add_weight_percpu(dev, napi, poll, NAPI_POLL_WEIGHT,
+				     thread_cpuid);
+}
+
 /**
  * __netif_napi_del - remove a NAPI context
  * @napi: NAPI context

diff --git a/net/core/dev.c b/net/core/dev.c
index 98bb5f890b88..93ca3df8e9dd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1428,8 +1428,13 @@ static int napi_kthread_create(struct napi_struct *n)
 	 * TASK_INTERRUPTIBLE mode to avoid the blocked task
 	 * warning and work with loadavg.
 	 */
-	n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
-				n->dev->name, n->napi_id);
+	if (n->thread_cpuid >= 0)
+		n->thread = kthread_run_on_cpu(napi_threaded_poll, n,
+					       n->thread_cpuid, "napi/%s-%u",
+					       n->dev->name);
+	else
+		n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
+					n->dev->name, n->napi_id);
 	if (IS_ERR(n->thread)) {
 		err = PTR_ERR(n->thread);
 		pr_err("kthread_run failed with err %d\n", err);
@@ -6640,8 +6645,10 @@ void netif_queue_set_napi(struct net_device *dev, unsigned int queue_index,
 }
 EXPORT_SYMBOL(netif_queue_set_napi);
 
-void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
-			   int (*poll)(struct napi_struct *, int), int weight)
+void netif_napi_add_weight_percpu(struct net_device *dev,
+				  struct napi_struct *napi,
+				  int (*poll)(struct napi_struct *, int),
+				  int weight, int thread_cpuid)
 {
 	if (WARN_ON(test_and_set_bit(NAPI_STATE_LISTED, &napi->state)))
 		return;
@@ -6664,6 +6671,7 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 	napi->poll_owner = -1;
 #endif
 	napi->list_owner = -1;
+	napi->thread_cpuid = thread_cpuid;
 	set_bit(NAPI_STATE_SCHED, &napi->state);
 	set_bit(NAPI_STATE_NPSVC, &napi->state);
 	list_add_rcu(&napi->dev_list, &dev->napi_list);
@@ -6677,7 +6685,7 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 	dev->threaded = false;
 	netif_napi_set_irq(napi, -1);
 }
-EXPORT_SYMBOL(netif_napi_add_weight);
+EXPORT_SYMBOL(netif_napi_add_weight_percpu);
 
 void napi_disable(struct napi_struct *n)
 {
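A sketch of how a non-driver user might consume the new helper (this is
the pattern cpumap adopts in the next patch); the dummy device, the
percpu NAPI array and the poll callback are assumptions of this
example, not part of the patch:

#include <linux/netdevice.h>

/* pin one threaded NAPI per possible CPU on a dummy netdev */
static void add_pinned_napis(struct net_device *dummy_dev,
			     struct napi_struct __percpu *napis,
			     int (*poll)(struct napi_struct *, int))
{
	int cpu;

	dummy_dev->threaded = true;

	for_each_possible_cpu(cpu) {
		struct napi_struct *napi = per_cpu_ptr(napis, cpu);

		/* in threaded mode, this NAPI always runs on @cpu */
		netif_napi_add_percpu(dummy_dev, napi, poll, cpu);
		napi_enable(napi);
	}
}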
From patchwork Fri Aug 30 16:25:03 2024
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Lorenzo Bianconi, Daniel Xu, John Fastabend,
    Jesper Dangaard Brouer, Martin KaFai Lau, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, bpf@vger.kernel.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next 4/9] bpf: cpumap: use CPU-pinned threaded NAPI
 w/GRO instead of kthread
Date: Fri, 30 Aug 2024 18:25:03 +0200
Message-ID: <20240830162508.1009458-5-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>
References: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

From: Lorenzo Bianconi

Currently, cpumap uses its own kthread which processes
cpumap-redirected frames in batches of 8, without any weighting (but
with rescheduling points). The resulting skbs get passed to the stack
via netif_receive_skb_list(), which means no GRO happens.

In order to enable GRO, remove all the custom kthread logic from
cpumap and use CPU-pinned threaded NAPIs with the default weight of
64. When a cpumap is created, a new logical netdevice called "cpumap"
is created to run these NAPIs on (one per map). Then, a percpu
threaded NAPI context is created for each cpumap entry, IOW for each
specified CPU. Instead of wake_up_process(), the NAPI is now scheduled
and runs as usual: with a budget of 64, napi_complete_done() is called
if the budget is not exhausted. Frames are still processed in batches
of 8. Instead of netif_receive_skb_list(), napi_gro_receive() is now
used.

Alex's tests with a UDP trafficgen and small frame size:

               no GRO   GRO
 baseline      2.7      N/A    Mpps
 threaded GRO  2.3      4      Mpps
 diff          -14      +48    %

Daniel's tests with neper's TCP RR show a +14% throughput
increase [0].

Currently, GRO on cpumap is limited by the fact that the checksum
status is unknown, as &xdp_frame doesn't carry such metadata. Once
there is a way to pass it from the drivers, the boost will be much
bigger.
Cc: Daniel Xu
Link: https://lore.kernel.org/bpf/merfatcdvwpx2lj4j2pahhwp4vihstpidws3jwljwazhh76xkd@t5vsh4gvk4mh [0]
Signed-off-by: Lorenzo Bianconi
Co-developed-by: Alexander Lobakin
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 kernel/bpf/cpumap.c | 167 ++++++++++++++++++++------------------------
 1 file changed, 76 insertions(+), 91 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index fbdf5a1aabfe..d1cfa4111727 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -28,14 +28,10 @@
 #include <linux/sched.h>
 #include <linux/workqueue.h>
-#include <linux/kthread.h>
 #include <linux/completion.h>
 #include <trace/events/xdp.h>
 #include <linux/btf_ids.h>
 
-#include <linux/netdevice.h>	/* netif_receive_skb_list */
-#include <linux/etherdevice.h>	/* eth_type_trans */
-
 /* General idea: XDP packets getting XDP redirected to another CPU,
  * will maximum be stored/queued for one driver ->poll() call. It is
  * guaranteed that queueing the frame and the flush operation happen on
@@ -56,20 +52,22 @@ struct xdp_bulk_queue {
 
 /* Struct for every remote "destination" CPU in map */
 struct bpf_cpu_map_entry {
-	u32 cpu;    /* kthread CPU and map index */
+	u32 cpu;    /* NAPI thread CPU and map index */
 	int map_id; /* Back reference to map */
 
 	/* XDP can run multiple RX-ring queues, need __percpu enqueue store */
 	struct xdp_bulk_queue __percpu *bulkq;
 
-	/* Queue with potential multi-producers, and single-consumer kthread */
+	/*
+	 * Queue with potential multi-producers and single-consumer
+	 * NAPI thread
+	 */
 	struct ptr_ring *queue;
-	struct task_struct *kthread;
 
 	struct bpf_cpumap_val value;
 	struct bpf_prog *prog;
+	struct napi_struct napi;
 
-	struct completion kthread_running;
 	struct rcu_work free_work;
 };
 
@@ -77,12 +75,15 @@ struct bpf_cpu_map {
 	struct bpf_map map;
 	/* Below members specific for map type */
 	struct bpf_cpu_map_entry __rcu **cpu_map;
+	/* Dummy netdev to run threaded NAPI */
+	struct net_device *napi_dev;
 };
 
 static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 {
 	u32 value_size = attr->value_size;
 	struct bpf_cpu_map *cmap;
+	struct net_device *dev;
 
 	/* check sanity of attributes */
 	if (attr->max_entries == 0 || attr->key_size != 4 ||
@@ -105,19 +106,34 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 	cmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *
 					   sizeof(struct bpf_cpu_map_entry *),
 					   cmap->map.numa_node);
-	if (!cmap->cpu_map) {
-		bpf_map_area_free(cmap);
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!cmap->cpu_map)
+		goto free_cmap;
+
+	dev = bpf_map_area_alloc(struct_size(dev, priv, 0), NUMA_NO_NODE);
+	if (!dev)
+		goto free_cpu_map;
+
+	init_dummy_netdev(dev);
+	strscpy(dev->name, "cpumap");
+	dev->threaded = true;
+
+	cmap->napi_dev = dev;
 
 	return &cmap->map;
+
+free_cpu_map:
+	bpf_map_area_free(cmap->cpu_map);
+free_cmap:
+	bpf_map_area_free(cmap);
+
+	return ERR_PTR(-ENOMEM);
 }
 
 static void __cpu_map_ring_cleanup(struct ptr_ring *ring)
 {
 	/* The tear-down procedure should have made sure that queue is
 	 * empty. See __cpu_map_entry_replace() and work-queue
-	 * invoked cpu_map_kthread_stop(). Catch any broken behaviour
+	 * invoked __cpu_map_entry_free(). Catch any broken behaviour
 	 * gracefully and warn once.
 	 */
 	void *ptr;
 
@@ -244,7 +260,6 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
 	if (!rcpu->prog)
 		return xdp_n;
 
-	rcu_read_lock_bh();
 	bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
 
 	nframes = cpu_map_bpf_prog_run_xdp(rcpu, frames, xdp_n, stats);
@@ -256,62 +271,45 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
 		cpu_map_bpf_prog_run_skb(rcpu, list, stats);
 
 	bpf_net_ctx_clear(bpf_net_ctx);
-	rcu_read_unlock_bh(); /* resched point, may call do_softirq() */
 
 	return nframes;
 }
 
-static int cpu_map_kthread_run(void *data)
+static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 {
-	struct bpf_cpu_map_entry *rcpu = data;
-	unsigned long last_qs = jiffies;
+	struct xdp_cpumap_stats stats = {}; /* zero stats */
+	u32 done = 0, kmem_alloc_drops = 0;
+	struct bpf_cpu_map_entry *rcpu;
 
-	complete(&rcpu->kthread_running);
-	set_current_state(TASK_INTERRUPTIBLE);
+	rcu_read_lock();
+	rcpu = container_of(napi, typeof(*rcpu), napi);
 
-	/* When kthread gives stop order, then rcpu have been disconnected
-	 * from map, thus no new packets can enter. Remaining in-flight
-	 * per CPU stored packets are flushed to this queue. Wait honoring
-	 * kthread_stop signal until queue is empty.
-	 */
-	while (!kthread_should_stop() || !__ptr_ring_empty(rcpu->queue)) {
-		struct xdp_cpumap_stats stats = {}; /* zero stats */
-		unsigned int kmem_alloc_drops = 0, sched = 0;
+	while (likely(done < budget)) {
 		gfp_t gfp = __GFP_ZERO | GFP_ATOMIC;
 		int i, n, m, nframes, xdp_n;
 		void *frames[CPUMAP_BATCH];
+		struct sk_buff *skb, *tmp;
 		void *skbs[CPUMAP_BATCH];
 		LIST_HEAD(list);
 
-		/* Release CPU reschedule checks */
-		if (__ptr_ring_empty(rcpu->queue)) {
-			set_current_state(TASK_INTERRUPTIBLE);
-			/* Recheck to avoid lost wake-up */
-			if (__ptr_ring_empty(rcpu->queue)) {
-				schedule();
-				sched = 1;
-				last_qs = jiffies;
-			} else {
-				__set_current_state(TASK_RUNNING);
-			}
-		} else {
-			rcu_softirq_qs_periodic(last_qs);
-			sched = cond_resched();
-		}
+		if (__ptr_ring_empty(rcpu->queue))
+			break;
 
 		/*
 		 * The bpf_cpu_map_entry is single consumer, with this
-		 * kthread CPU pinned. Lockless access to ptr_ring
+		 * NAPI thread CPU pinned. Lockless access to ptr_ring
 		 * consume side valid as no-resize allowed of queue.
 		 */
-		n = __ptr_ring_consume_batched(rcpu->queue, frames,
-					       CPUMAP_BATCH);
+		n = min(budget - done, CPUMAP_BATCH);
+		n = __ptr_ring_consume_batched(rcpu->queue, frames, n);
+		done += n;
+
 		for (i = 0, xdp_n = 0; i < n; i++) {
 			void *f = frames[i];
 			struct page *page;
 
 			if (unlikely(__ptr_test_bit(0, &f))) {
-				struct sk_buff *skb = f;
+				skb = f;
 
 				__ptr_clear_bit(0, &skb);
 				list_add_tail(&skb->list, &list);
@@ -340,12 +338,10 @@ static int cpu_map_kthread_run(void *data)
 			}
 		}
 
-		local_bh_disable();
 		for (i = 0; i < nframes; i++) {
 			struct xdp_frame *xdpf = frames[i];
-			struct sk_buff *skb = skbs[i];
 
-			skb = __xdp_build_skb_from_frame(xdpf, skb,
+			skb = __xdp_build_skb_from_frame(xdpf, skbs[i],
							 xdpf->dev_rx);
 			if (!skb) {
 				xdp_return_frame(xdpf);
@@ -354,17 +350,23 @@ static int cpu_map_kthread_run(void *data)
 			list_add_tail(&skb->list, &list);
 		}
 
-		netif_receive_skb_list(&list);
-
-		/* Feedback loop via tracepoint */
-		trace_xdp_cpumap_kthread(rcpu->map_id, n, kmem_alloc_drops,
-					 sched, &stats);
-		local_bh_enable(); /* resched point, may call do_softirq() */
+		list_for_each_entry_safe(skb, tmp, &list, list) {
+			skb_list_del_init(skb);
+			napi_gro_receive(napi, skb);
+		}
 	}
 
-	__set_current_state(TASK_RUNNING);
-	return 0;
+	rcu_read_unlock();
+
+	/* Feedback loop via tracepoint */
+	trace_xdp_cpumap_kthread(rcpu->map_id, done, kmem_alloc_drops, 0,
+				 &stats);
+
+	if (done < budget)
+		napi_complete_done(napi, done);
+
+	return done;
 }
 
 static int __cpu_map_load_bpf_program(struct bpf_cpu_map_entry *rcpu,
@@ -394,6 +396,7 @@ __cpu_map_entry_alloc(struct bpf_map *map, struct bpf_cpumap_val *value,
 {
 	int numa, err, i, fd = value->bpf_prog.fd;
 	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
+	const struct bpf_cpu_map *cmap;
 	struct bpf_cpu_map_entry *rcpu;
 	struct xdp_bulk_queue *bq;
 
@@ -432,29 +435,13 @@ __cpu_map_entry_alloc(struct bpf_map *map, struct bpf_cpumap_val *value,
 	if (fd > 0 && __cpu_map_load_bpf_program(rcpu, map, fd))
 		goto free_ptr_ring;
 
-	/* Setup kthread */
-	init_completion(&rcpu->kthread_running);
-	rcpu->kthread = kthread_create_on_node(cpu_map_kthread_run, rcpu, numa,
-					       "cpumap/%d/map:%d", cpu,
-					       map->id);
-	if (IS_ERR(rcpu->kthread))
-		goto free_prog;
-
-	/* Make sure kthread runs on a single CPU */
-	kthread_bind(rcpu->kthread, cpu);
-	wake_up_process(rcpu->kthread);
-
-	/* Make sure kthread has been running, so kthread_stop() will not
-	 * stop the kthread prematurely and all pending frames or skbs
-	 * will be handled by the kthread before kthread_stop() returns.
-	 */
-	wait_for_completion(&rcpu->kthread_running);
+	cmap = container_of(map, typeof(*cmap), map);
+	netif_napi_add_percpu(cmap->napi_dev, &rcpu->napi, cpu_map_napi_poll,
+			      cpu);
+	napi_enable(&rcpu->napi);
 
 	return rcpu;
 
-free_prog:
-	if (rcpu->prog)
-		bpf_prog_put(rcpu->prog);
 free_ptr_ring:
 	ptr_ring_cleanup(rcpu->queue, NULL);
 free_queue:
@@ -477,11 +464,12 @@ static void __cpu_map_entry_free(struct work_struct *work)
 	 */
 	rcpu = container_of(to_rcu_work(work), struct bpf_cpu_map_entry,
			    free_work);
 
-	/* kthread_stop will wake_up_process and wait for it to complete.
-	 * cpu_map_kthread_run() makes sure the pointer ring is empty
+	/* napi_disable() will wait for the NAPI poll to complete.
+	 * cpu_map_napi_poll() makes sure the pointer ring is empty
 	 * before exiting.
 	 */
-	kthread_stop(rcpu->kthread);
+	napi_disable(&rcpu->napi);
+	netif_napi_del(&rcpu->napi);
 
 	if (rcpu->prog)
 		bpf_prog_put(rcpu->prog);
@@ -498,8 +486,8 @@
  * __cpu_map_entry_free() in a separate workqueue after waiting for an RCU grace
  * period. This means that (a) all pending enqueue and flush operations have
  * completed (because of the RCU callback), and (b) we are in a workqueue
- * context where we can stop the kthread and wait for it to exit before freeing
- * everything.
+ * context where we can stop the NAPI thread and wait for it to exit before
+ * freeing everything.
  */
 static void __cpu_map_entry_replace(struct bpf_cpu_map *cmap,
				    u32 key_cpu, struct bpf_cpu_map_entry *rcpu)
@@ -579,9 +567,7 @@ static void cpu_map_free(struct bpf_map *map)
 	 */
 	synchronize_rcu();
 
-	/* The only possible user of bpf_cpu_map_entry is
-	 * cpu_map_kthread_run().
-	 */
+	/* The only possible user of bpf_cpu_map_entry is cpu_map_napi_poll() */
 	for (i = 0; i < cmap->map.max_entries; i++) {
 		struct bpf_cpu_map_entry *rcpu;
 
@@ -589,9 +575,10 @@ static void cpu_map_free(struct bpf_map *map)
 		if (!rcpu)
 			continue;
 
-		/* Stop kthread and cleanup entry directly */
+		/* Stop NAPI thread and cleanup entry directly */
 		__cpu_map_entry_free(&rcpu->free_work.work);
 	}
+	bpf_map_area_free(cmap->napi_dev);
 	bpf_map_area_free(cmap->cpu_map);
 	bpf_map_area_free(cmap);
 }
@@ -753,7 +740,7 @@ int cpu_map_generic_redirect(struct bpf_cpu_map_entry *rcpu,
 	if (ret < 0)
 		goto trace;
 
-	wake_up_process(rcpu->kthread);
+	napi_schedule(&rcpu->napi);
 trace:
 	trace_xdp_cpumap_enqueue(rcpu->map_id, !ret, !!ret, rcpu->cpu);
 	return ret;
@@ -765,8 +752,6 @@ void __cpu_map_flush(struct list_head *flush_list)
 
 	list_for_each_entry_safe(bq, tmp, flush_list, flush_node) {
 		bq_flush_to_queue(bq);
-
-		/* If already running, costs spin_lock_irqsave + smb_mb */
-		wake_up_process(bq->obj->kthread);
+		napi_schedule(&bq->obj->napi);
 	}
 }
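Distilled to its core, the consumer side the patch switches to looks
as follows; this is only a sketch, with get_ring() and build_and_gro()
as placeholders for the ptr_ring lookup and the skb-building logic
from the hunks above:

#include <linux/ptr_ring.h>
#include <linux/netdevice.h>

/* budget-driven NAPI poll replacing the old open-coded kthread loop */
static int pinned_napi_poll(struct napi_struct *napi, int budget)
{
	struct ptr_ring *ring = get_ring(napi);	/* placeholder lookup */
	int done = 0;
	void *frame;

	while (done < budget && (frame = __ptr_ring_consume(ring))) {
		build_and_gro(napi, frame);	/* placeholder: skb + GRO */
		done++;
	}

	/*
	 * Ran out of work before the budget: re-arm the NAPI. Producers
	 * call napi_schedule() after enqueueing to the ring, as
	 * cpu_map_generic_redirect() and __cpu_map_flush() do above.
	 */
	if (done < budget)
		napi_complete_done(napi, done);

	return done;
}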
From patchwork Fri Aug 30 16:25:04 2024
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Lorenzo Bianconi, Daniel Xu, John Fastabend,
    Jesper Dangaard Brouer, Martin KaFai Lau, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, bpf@vger.kernel.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next 5/9] bpf: cpumap: reuse skb array instead of a
 linked list to chain skbs
Date: Fri, 30 Aug 2024 18:25:04 +0200
Message-ID: <20240830162508.1009458-6-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>
References: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

cpumap still uses a linked list to store the skbs it passes to the
stack. Now that listified Rx has been dropped in favor of
napi_gro_receive(), the linked list is just unneeded overhead. Inside
the polling loop, we already have an array of skbs. Let's reuse it for
the skbs passed to cpumap (generic XDP) and call napi_gro_receive()
directly in case of XDP_PASS when a program is installed to the map
itself. Don't list regular xdp_frames at all; just call
napi_gro_receive() right after building an skb.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 kernel/bpf/cpumap.c | 55 +++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 29 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index d1cfa4111727..d7206f3f6e80 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -150,21 +150,23 @@ static void __cpu_map_ring_cleanup(struct ptr_ring *ring)
 }
 
 static void cpu_map_bpf_prog_run_skb(struct bpf_cpu_map_entry *rcpu,
-				     struct list_head *listp,
+				     void **skbs, u32 skb_n,
				     struct xdp_cpumap_stats *stats)
 {
-	struct sk_buff *skb, *tmp;
 	struct xdp_buff xdp;
 	u32 act;
 	int err;
 
-	list_for_each_entry_safe(skb, tmp, listp, list) {
+	for (u32 i = 0; i < skb_n; i++) {
+		struct sk_buff *skb = skbs[i];
+
 		act = bpf_prog_run_generic_xdp(skb, &xdp, rcpu->prog);
 		switch (act) {
 		case XDP_PASS:
+			napi_gro_receive(&rcpu->napi, skb);
+			stats->pass++;
 			break;
 		case XDP_REDIRECT:
-			skb_list_del_init(skb);
 			err = xdp_do_generic_redirect(skb->dev, skb, &xdp,
						      rcpu->prog);
 			if (unlikely(err)) {
@@ -181,8 +183,7 @@ static void cpu_map_bpf_prog_run_skb(struct bpf_cpu_map_entry *rcpu,
 			trace_xdp_exception(skb->dev, rcpu->prog, act);
 			fallthrough;
 		case XDP_DROP:
-			skb_list_del_init(skb);
-			kfree_skb(skb);
+			napi_consume_skb(skb, true);
 			stats->drop++;
 			return;
 		}
@@ -251,8 +252,8 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
 #define CPUMAP_BATCH 8
 
 static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
-				int xdp_n, struct xdp_cpumap_stats *stats,
-				struct list_head *list)
+				int xdp_n, void **skbs, u32 skb_n,
+				struct xdp_cpumap_stats *stats)
 {
 	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	int nframes;
@@ -267,8 +268,8 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
 	if (stats->redirect)
 		xdp_do_flush();
 
-	if (unlikely(!list_empty(list)))
-		cpu_map_bpf_prog_run_skb(rcpu, list, stats);
+	if (unlikely(skb_n))
+		cpu_map_bpf_prog_run_skb(rcpu, skbs, skb_n, stats);
 
 	bpf_net_ctx_clear(bpf_net_ctx);
 
@@ -288,9 +289,7 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 		gfp_t gfp = __GFP_ZERO | GFP_ATOMIC;
 		int i, n, m, nframes, xdp_n;
 		void *frames[CPUMAP_BATCH];
-		struct sk_buff *skb, *tmp;
 		void *skbs[CPUMAP_BATCH];
-		LIST_HEAD(list);
 
 		if (__ptr_ring_empty(rcpu->queue))
 			break;
@@ -304,15 +303,15 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 		n = __ptr_ring_consume_batched(rcpu->queue, frames, n);
 		done += n;
 
-		for (i = 0, xdp_n = 0; i < n; i++) {
+		for (i = 0, xdp_n = 0, m = 0; i < n; i++) {
 			void *f = frames[i];
 			struct page *page;
 
 			if (unlikely(__ptr_test_bit(0, &f))) {
-				skb = f;
+				struct sk_buff *skb = f;
 
 				__ptr_clear_bit(0, &skb);
-				list_add_tail(&skb->list, &list);
+				skbs[m++] = skb;
 				continue;
 			}
 
@@ -327,19 +326,22 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 		}
 
 		/* Support running another XDP prog on this CPU */
-		nframes = cpu_map_bpf_prog_run(rcpu, frames, xdp_n, &stats, &list);
-		if (nframes) {
-			m = kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
-						  gfp, nframes, skbs);
-			if (unlikely(m == 0)) {
-				for (i = 0; i < nframes; i++)
-					skbs[i] = NULL; /* effect: xdp_return_frame */
-				kmem_alloc_drops += nframes;
-			}
+		nframes = cpu_map_bpf_prog_run(rcpu, frames, xdp_n, skbs, m,
+					       &stats);
+		if (!nframes)
+			continue;
+
+		m = kmem_cache_alloc_bulk(net_hotdata.skbuff_cache, gfp,
+					  nframes, skbs);
+		if (unlikely(!m)) {
+			for (i = 0; i < nframes; i++)
+				skbs[i] = NULL; /* effect: xdp_return_frame */
+			kmem_alloc_drops += nframes;
 		}
 
 		for (i = 0; i < nframes; i++) {
 			struct xdp_frame *xdpf = frames[i];
+			struct sk_buff *skb;
 
 			skb = __xdp_build_skb_from_frame(xdpf, skbs[i],
							 xdpf->dev_rx);
@@ -348,11 +350,6 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 			if (!skb) {
 				xdp_return_frame(xdpf);
 				continue;
 			}
 
-			list_add_tail(&skb->list, &list);
-		}
-
-		list_for_each_entry_safe(skb, tmp, &list, list) {
-			skb_list_del_init(skb);
 			napi_gro_receive(napi, skb);
 		}
 	}
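The shape of the change, side by side (a fragment distilled from the
hunks above, not additional code):

	/* before: skbs chained through a list_head, unlinked one by one */
	list_for_each_entry_safe(skb, tmp, &list, list) {
		skb_list_del_init(skb);
		napi_gro_receive(napi, skb);
	}

	/* after: the on-stack array is indexed directly, no (un)linking */
	for (u32 i = 0; i < skb_n; i++)
		napi_gro_receive(napi, skbs[i]);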
From patchwork Fri Aug 30 16:25:05 2024
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Alexander Lobakin, Lorenzo Bianconi, Daniel Xu, John Fastabend,
    Jesper Dangaard Brouer, Martin KaFai Lau, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, bpf@vger.kernel.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next 6/9] net: skbuff: introduce napi_skb_cache_get_bulk()
Date: Fri, 30 Aug 2024 18:25:05 +0200
Message-ID: <20240830162508.1009458-7-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>
References: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

Add a function to get an array of skbs from the NAPI percpu cache.
It's supposed to be a drop-in replacement for
kmem_cache_alloc_bulk(skbuff_head_cache, GFP_ATOMIC) and
xdp_alloc_skb_bulk(GFP_ATOMIC). The difference (apart from the
requirement to call it only from the BH) is that it tries to use as
many NAPI cache entries for skbs as possible, and allocate new ones
only if needed.

The logic is as follows:

* there are enough skbs in the cache: decache them and return to the
  caller;
* not enough: try refilling the cache first. If there are now enough
  skbs, return;
* still not enough: try allocating skbs directly to the output array
  with %GFP_ZERO, maybe we'll be able to get some. If there are now
  enough, return;
* still not enough: return as many as we were able to obtain.

Most of the time, if called from the NAPI polling loop, the first case
happens, sometimes (rarely) the second one. The third and the fourth
happen only under heavy memory pressure. It can save significant
amounts of CPU cycles if there are GRO cycles and/or Tx completion
cycles (anything that descends to napi_skb_cache_put()) happening on
this CPU.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/skbuff.h |  1 +
 net/core/skbuff.c      | 62 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index cf8f6ce06742..2bc3ca79bc6e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1304,6 +1304,7 @@ struct sk_buff *build_skb_around(struct sk_buff *skb,
				 void *data, unsigned int frag_size);
 void skb_attempt_defer_free(struct sk_buff *skb);
 
+u32 napi_skb_cache_get_bulk(void **skbs, u32 n);
 struct sk_buff *napi_build_skb(void *data, unsigned int frag_size);
 struct sk_buff *slab_build_skb(void *data);

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a52638363ea5..0a34f3aa00d1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -366,6 +366,68 @@ static struct sk_buff *napi_skb_cache_get(void)
 	return skb;
 }
 
+/**
+ * napi_skb_cache_get_bulk - obtain a number of zeroed skb heads from the cache
+ * @skbs: pointer to an at least @n-sized array to fill with skb pointers
+ * @n: number of entries to provide
+ *
+ * Tries to obtain @n &sk_buff entries from the NAPI percpu cache and writes
+ * the pointers into the provided array @skbs. If there are less entries
+ * available, tries to replenish the cache and bulk-allocates the diff from
+ * the MM layer if needed.
+ * The heads are being zeroed with either memset() or %__GFP_ZERO, so they are
+ * ready for {,__}build_skb_around() and don't have any data buffers attached.
+ * Must be called *only* from the BH context.
+ *
+ * Return: number of successfully allocated skbs (@n if no actual allocation
+ *	   needed or kmem_cache_alloc_bulk() didn't fail).
+ */
+u32 napi_skb_cache_get_bulk(void **skbs, u32 n)
+{
+	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	u32 bulk, total = n;
+
+	local_lock_nested_bh(&napi_alloc_cache.bh_lock);
+
+	if (nc->skb_count >= n)
+		goto get;
+
+	/* Not enough cached skbs. Try refilling the cache first */
+	bulk = min(NAPI_SKB_CACHE_SIZE - nc->skb_count, NAPI_SKB_CACHE_BULK);
+	nc->skb_count += kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
+					       GFP_ATOMIC | __GFP_NOWARN, bulk,
+					       &nc->skb_cache[nc->skb_count]);
+	if (likely(nc->skb_count >= n))
+		goto get;
+
+	/* Still not enough. Bulk-allocate the missing part directly, zeroed */
+	n -= kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
+				   GFP_ATOMIC | __GFP_ZERO | __GFP_NOWARN,
+				   n - nc->skb_count, &skbs[nc->skb_count]);
+	if (likely(nc->skb_count >= n))
+		goto get;
+
+	/* kmem_cache didn't allocate the number we need, limit the output */
+	total -= n - nc->skb_count;
+	n = nc->skb_count;
+
+get:
+	for (u32 base = nc->skb_count - n, i = 0; i < n; i++) {
+		u32 cache_size = kmem_cache_size(net_hotdata.skbuff_cache);
+
+		skbs[i] = nc->skb_cache[base + i];
+
+		kasan_mempool_unpoison_object(skbs[i], cache_size);
+		memset(skbs[i], 0, offsetof(struct sk_buff, tail));
+	}
+
+	nc->skb_count -= n;
+	local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
+
+	return total;
+}
+EXPORT_SYMBOL_GPL(napi_skb_cache_get_bulk);
+
 static inline void __finalize_skb_around(struct sk_buff *skb, void *data,
					 unsigned int size)
 {
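A usage sketch of the new helper (hypothetical caller mirroring how
cpumap consumes it in the next patch; build_batch and the batch size
of 8 are assumptions of this example):

#include <linux/skbuff.h>
#include <net/xdp.h>

/* build skbs for a batch of xdp_frames from (mostly) cached heads */
static u32 build_batch(struct napi_struct *napi, struct xdp_frame **frames,
		       u32 n)
{
	void *skbs[8];	/* assumes n <= 8, a hypothetical batch size */
	u32 i, got;

	/* BH context only; may return fewer than @n under memory pressure */
	got = napi_skb_cache_get_bulk(skbs, n);

	for (i = 0; i < got; i++) {
		/* each head is zeroed and ready to be built around */
		struct sk_buff *skb;

		skb = __xdp_build_skb_from_frame(frames[i], skbs[i],
						 frames[i]->dev_rx);
		if (skb)
			napi_gro_receive(napi, skb);
		/* on failure, return the frame via xdp_return_frame() */
	}

	return got;
}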
From patchwork Fri Aug 30 16:25:06 2024
X-Patchwork-Id: 13785331
From: Alexander Lobakin
Subject: [PATCH bpf-next 7/9] bpf: cpumap: switch to napi_skb_cache_get_bulk()
Date: Fri, 30 Aug 2024 18:25:06 +0200
Message-ID: <20240830162508.1009458-8-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

Now that cpumap uses GRO, which drops unused skb heads to the NAPI cache,
use napi_skb_cache_get_bulk() to try to reuse cached entries and lower
the pressure on the MM layer. The polling loop already runs in BH
context, so the switch is safe from that perspective.

The better GRO aggregates packets, the fewer new skbs need to be
allocated. If an aggregated skb contains 16 frags, 15 skb heads were
returned to the cache, so the next 15 skbs will be built without
allocating anything.
The same trafficgen UDP GRO test now shows:

                GRO off   GRO on
threaded GRO      2.3       4      Mpps
thr bulk GRO      2.4       4.7    Mpps
diff              +4       +17     %

Comparing to the baseline cpumap:

baseline          2.7       N/A    Mpps
thr bulk GRO      2.4       4.7    Mpps
diff             -11       +74     %

Signed-off-by: Alexander Lobakin
---
 kernel/bpf/cpumap.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index d7206f3f6e80..992f4e30a589 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -286,7 +286,6 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 	rcpu = container_of(napi, typeof(*rcpu), napi);
 
 	while (likely(done < budget)) {
-		gfp_t gfp = __GFP_ZERO | GFP_ATOMIC;
 		int i, n, m, nframes, xdp_n;
 		void *frames[CPUMAP_BATCH];
 		void *skbs[CPUMAP_BATCH];
@@ -331,8 +330,7 @@ static int cpu_map_napi_poll(struct napi_struct *napi, int budget)
 		if (!nframes)
 			continue;
 
-		m = kmem_cache_alloc_bulk(net_hotdata.skbuff_cache, gfp,
-					  nframes, skbs);
+		m = napi_skb_cache_get_bulk(skbs, nframes);
 		if (unlikely(!m)) {
 			for (i = 0; i < nframes; i++)
 				skbs[i] = NULL; /* effect: xdp_return_frame */
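A semantic detail in this hunk is worth spelling out: kmem_cache_alloc_bulk()
is all-or-nothing (it returns the full count or 0), whereas
napi_skb_cache_get_bulk() may return any count between 0 and nframes, with
only that many leading pointers valid. A purely illustrative sketch of
explicit partial-return handling, not part of the patch:

	/* Illustrative only: the first 'm' entries of skbs[] are valid;
	 * the rest were never allocated, so mark them so the caller
	 * falls back to xdp_return_frame() for those frames.
	 */
	m = napi_skb_cache_get_bulk(skbs, nframes);
	for (i = m; i < nframes; i++)
		skbs[i] = NULL;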
From patchwork Fri Aug 30 16:25:07 2024
X-Patchwork-Id: 13785332
From: Alexander Lobakin
Subject: [PATCH bpf-next 8/9] veth: use napi_skb_cache_get_bulk() instead of xdp_alloc_skb_bulk()
Date: Fri, 30 Aug 2024 18:25:07 +0200
Message-ID: <20240830162508.1009458-9-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

Now that skbs can be bulk-allocated from the NAPI cache, use
napi_skb_cache_get_bulk() in veth as well instead of allocating directly
from the kmem cache. veth already uses NAPI for Rx processing, so the
switch is safe from the context perspective, and since veth also uses
GRO, reusing the NAPI cache makes a real difference here.
Signed-off-by: Alexander Lobakin
---
 drivers/net/veth.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 18148e068aa0..774f226666c8 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -684,8 +684,7 @@ static void veth_xdp_rcv_bulk_skb(struct veth_rq *rq, void **frames,
 	void *skbs[VETH_XDP_BATCH];
 	int i;
 
-	if (xdp_alloc_skb_bulk(skbs, n_xdpf,
-			       GFP_ATOMIC | __GFP_ZERO) < 0) {
+	if (unlikely(!napi_skb_cache_get_bulk(skbs, n_xdpf))) {
 		for (i = 0; i < n_xdpf; i++)
 			xdp_return_frame(frames[i]);
 		stats->rx_drops += n_xdpf;
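Note that the conversion also flips the return convention:
xdp_alloc_skb_bulk() returned 0 on success and -ENOMEM on failure, while
napi_skb_cache_get_bulk() returns the number of skbs obtained, so the
error path now triggers on a zero count. Zeroing, previously requested
via __GFP_ZERO, happens inside the NAPI cache helper itself. Side by
side, with the 'drop' label standing in for the real cleanup code:

	/* Before: 0 / -ENOMEM, caller passes gfp flags incl. __GFP_ZERO */
	if (xdp_alloc_skb_bulk(skbs, n_xdpf, GFP_ATOMIC | __GFP_ZERO) < 0)
		goto drop;

	/* After: count obtained, 0 means total failure; no gfp argument,
	 * the helper zeroes each skb head itself
	 */
	if (unlikely(!napi_skb_cache_get_bulk(skbs, n_xdpf)))
		goto drop;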
From patchwork Fri Aug 30 16:25:08 2024
X-Patchwork-Id: 13785333
From: Alexander Lobakin
Subject: [PATCH bpf-next 9/9] xdp: remove xdp_alloc_skb_bulk()
Date: Fri, 30 Aug 2024 18:25:08 +0200
Message-ID: <20240830162508.1009458-10-aleksander.lobakin@intel.com>
In-Reply-To: <20240830162508.1009458-1-aleksander.lobakin@intel.com>

The only user was veth, which now uses napi_skb_cache_get_bulk(). That
function is exported and is now the preferred way of bulk-allocating
skbs, so remove this helper.

Signed-off-by: Alexander Lobakin
---
 include/net/xdp.h |  1 -
 net/core/xdp.c    | 10 ----------
 2 files changed, 11 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index e6770dd40c91..bd3363e384b2 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -245,7 +245,6 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct net_device *dev);
 struct sk_buff *xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					 struct net_device *dev);
-int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp);
 struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf);
 
 static inline
diff --git a/net/core/xdp.c b/net/core/xdp.c
index bcc5551c6424..34d057089d20 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -584,16 +584,6 @@ void xdp_warn(const char *msg, const char *func, const int line)
 };
 EXPORT_SYMBOL_GPL(xdp_warn);
 
-int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp)
-{
-	n_skb = kmem_cache_alloc_bulk(net_hotdata.skbuff_cache, gfp, n_skb, skbs);
-	if (unlikely(!n_skb))
-		return -ENOMEM;
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(xdp_alloc_skb_bulk);
-
 struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct sk_buff *skb,
 					   struct net_device *dev)